Disclaimer: I recently attended Storage Field Day 15. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
WekaIO recently presented at Storage Field Day 15. It’s not the first time I’ve heard from them, and you can read my initial thoughts on them here. You can see their videos from Storage Field Day 15 here, and download a PDF copy of my rough notes from here.
Enter The Matrix
Fine, I just rewatched The Matrix on the plane home. But any company with Matrix in the product name is going to get a few extra points from me. So what is it, Neo?
- Fully coherent POSIX file system that delivers local file system performance;
- Distributed coding, more resilient at scale, fast rebuilds, end-to-end data protection;
- Instantaneous snapshots, clones, tiering to S3, partial file rehydration;
- InfiniBand or Ethernet, Hyper-converged or Dedicated Storage Server; and
- Bare-metal, containerised, or running in a VM.
There’s an on-premises version and one built for public cloud use.
Liran Zvibel (Co-founder and CEO) took us through some of the key features of the architecture.
Software based for dynamic scalability
- Software scales to thousands of nodes and trillions of records;
- Significantly more scalable than any appliance offering; and
- Metadata scales to thousands of servers.
Patented erasure coding technology
- Uses 66% less NVMe capacity than triple replication;
- Fully distributed data and metadata for best parallelism / performance; and
- Snapshots for “free” with no performance impact.
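To put the capacity claim in context, here's a rough back-of-the-envelope sketch of how erasure coding compares with triple replication for raw capacity. The stripe geometries below (4+2, 8+2, 16+2) are my own illustrative assumptions, not WekaIO's actual layouts, and the function names are made up for the example.

```python
# Rough capacity comparison: erasure coding vs. N-way replication.
# The stripe geometries used here are illustrative assumptions only;
# they are not taken from WekaIO's presentation.

def raw_per_usable_replication(copies: int) -> float:
    """Raw capacity needed per unit of usable capacity with N full copies."""
    return float(copies)

def raw_per_usable_erasure(data: int, parity: int) -> float:
    """Raw capacity needed per unit of usable capacity for a data+parity stripe."""
    return (data + parity) / data

def saving_vs_replication(data: int, parity: int, copies: int = 3) -> float:
    """Fraction of raw capacity saved by erasure coding relative to replication."""
    return 1 - raw_per_usable_erasure(data, parity) / raw_per_usable_replication(copies)

if __name__ == "__main__":
    for data, parity in [(4, 2), (8, 2), (16, 2)]:
        saving = saving_vs_replication(data, parity)
        print(f"{data}+{parity}: {saving:.1%} less raw capacity than 3x replication")
```

The wider the stripe, the closer the saving gets to two thirds, which is the ceiling you hit when parity overhead becomes negligible next to three full copies. That suggests the 66% figure is best read as a best-case number for wide stripes.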
Integrated tiering in a single namespace
- Allows for an unlimited namespace, which is critical for deep learning; and
- Enables backup and cloud bursting to public cloud.
I Know Kung Fu
[Look, I’m just going to torture the Matrix analogy for a little longer, so bear with me]. So what do I do with all of this performance in a storage subsystem? Well, the key focus areas for WekaIO include:
- Machine learning / AI;
- Digital Radiology / Pathology;
- Algorithmic Trading; and
- Genomic Sequencing and Analytics.
Most of these workloads deal with millions of files and very large capacities, and are very sensitive to latency. There’s also a cool use case for media and entertainment environments that’s worth checking out if you’re into that sort of thing.
Thoughts
WekaIO are aiming to do about 30% of their sales directly, meaning they lean heavily on the channel. Both HPE and Penguin Computing are OEM providers, and obviously there’s also a software-only play with the AWS version. They’re talking about delivering some very big numbers when it comes to performance, but my favourite thing about them is the focus on being able to access the same data through all interfaces, and quickly.
WekaIO make some strong claims about their ability to deliver a fast and scalable file system solution, but they certainly have the pedigree to deliver a solution that meets a number of those claims. There are some nice features, such as the ability to add servers with different profiles to the cluster and to run nodes in hyper-converged mode. When it comes down to it, performance is defined by the number of cores available. If you add more compute, you get more performance.
In my mind, the solution isn’t for everyone right now, but if you have a requirement for a performance-focused, massively parallel, scale-out storage solution with the ability to combine NVMe and S3, you could do worse than to check out what WekaIO can do.