Welcome to my semi-regular, random news post in a short format. This is #26. I was going to start naming them after my favourite basketball players. This one could be the Korver edition, for example. I don’t think that’ll last though. We’ll see. I’ll stop rambling now.
Do you know Cody? Cody’s a smart guy, and very good at expressing technical things on his blog. This article on deploying the Pure Storage OVA using PowerShell is a good example of that.
InfiniteIO has been doing some cool stuff. I spoke to them recently and will be writing something about them in the near future. In the meantime, here’s their most recent press release.
I wrote about Excelero recently, but neglected to mention some of what it’s been doing with NVIDIA. You can read more about that here.
It’s been a little while since I last wrote about Excelero. I recently had the opportunity to catch up with Josh Goldenhar and Tom Leyden and thought I’d share some of my thoughts here.
NVMe Performance Good, But Challenging
NVMe has really delivered storage performance improvements in recent times.
All The Kids Are Doing It
Great performance:
Up to 1.2M IOPS, 6GB/s per drive
Ultra-low latency (20μs)
Game changer for data-intensive workloads:
Mission-Critical Databases
Analytical Processing
AI and Machine Learning
But It’s Not Always What You’d Expect
IOPS and Bandwidth Utilisation
Applications struggle to use local NVMe performance beyond 3-4 drives
Stranded IOPS and / or bandwidth = poor ROI
Sharing is the Logical Answer, with local latency
Physical disaggregation is often operationally desirable
24 Drive servers are common and readily available
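To put rough numbers on the stranded performance argument, here is a minimal sketch using the figures quoted above. It assumes every drive hits the "up to" 1.2M IOPS number and that local applications can only drive four of the 24 drives (the upper end of the 3-4 drive range), so treat the result as a back-of-the-envelope illustration rather than a measurement. The function names are mine.

```python
# Figures taken from the text above; real drives and workloads will vary.
PER_DRIVE_IOPS = 1_200_000   # "up to 1.2M IOPS" per drive
DRIVES_PER_SERVER = 24       # common 24-drive server
USABLE_DRIVES = 4            # assumption: upper end of the 3-4 drive range


def stranded_iops(per_drive: int, installed: int, usable: int) -> int:
    """IOPS that are installed but that local applications cannot consume."""
    return per_drive * max(installed - usable, 0)


def utilisation(installed: int, usable: int) -> float:
    """Fraction of installed drive performance local applications can use."""
    return min(usable, installed) / installed


if __name__ == "__main__":
    stranded = stranded_iops(PER_DRIVE_IOPS, DRIVES_PER_SERVER, USABLE_DRIVES)
    print(f"Stranded IOPS: {stranded:,}")
    print(f"Utilisation: {utilisation(DRIVES_PER_SERVER, USABLE_DRIVES):.1%}")
```

Under those assumptions, twenty of the 24 drives are effectively idle from a performance point of view, which is the poor ROI that sharing the drives over the network is meant to address.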
Data Protection Desired
NVMe performs, but by itself offers no data protection
Local data protection does not protect against server failures
Some NVMe-over-fabrics solutions offer controller-based data protection, but they limit IOPS and bandwidth, and sacrifice latency.
Enterprise-ready: RAID 1/0, High Availability with fast failover, Thin Provisioning, CRC
Flexible Deployment Models
There are a few different ways you can deploy Excelero.
Converged – Local NVMe drives in Application Servers
Single, unified storage pool
NVMesh initiator and client on all nodes
NVMesh bypasses server CPU
Various protection levels
No dedicated storage servers needed
Linearly scalable
Highest aggregate bandwidth
Top-of-Rack Flash
Single, unified storage pool
NVMesh Target runs on dedicated storage nodes
NVMesh Client runs on application servers
Applications get performance of local NVMe storage
Various Protection Levels
Linearly scalable
Data Protection
There are also a number of options when it comes to data resiliency.
[image courtesy of Excelero]
Networking Options
You can choose either TCP/IP or RDMA. TCP/IP incurs a latency hit, but it works with any NIC (and your existing infrastructure). RDMA has super low latency, but is only available on a limited subset of NICs.
NVEdge Then?
Excelero described NVEdge as “block storage software for building NVMe Flash Arrays for demanding workflows such as AI, ML and databases in the Cloud and at the Edge”.
Scale-up architecture
High NVMe AFA performance, leveraging NVMe-oF
Full bandwidth performance even at 4K block size
High availability, supporting:
Dual-port NVMe drives
Dual controllers (with fast failover, less than 100ms)
Active / active controller operation and active/passive logical volume access
Data services include:
RAID 1/0 data protection
Thin Provisioning: thousands of striped volumes of up to 1PB each
Enterprise grade block checksums (CRC 16/32/64).
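As an illustration of what per-block checksums buy you, here is a minimal sketch that computes a CRC-32 for each 4K block and then finds the blocks that no longer match. It uses Python's standard zlib.crc32 and is not Excelero's implementation. The block size and function names are assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 4096  # assumption: 4K blocks, as in the block size mentioned above


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one CRC-32 per fixed-size block of data."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def find_corrupt_blocks(
    data: bytes, expected: list[int], block_size: int = BLOCK_SIZE
) -> list[int]:
    """Return the indexes of blocks whose CRC-32 no longer matches."""
    actual = checksum_blocks(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    payload = bytes(range(256)) * 64        # 16 KiB, i.e. four 4K blocks
    stored = checksum_blocks(payload)       # checksums kept alongside the data

    damaged = bytearray(payload)
    damaged[5000] ^= 0xFF                   # flip one byte inside block 1

    print(find_corrupt_blocks(bytes(damaged), stored))
```

The point is that the checksum is stored when the block is written and checked when it is read, so silent corruption is caught at block granularity rather than going unnoticed.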
Hardware Compatibility?
Supported Platforms
x86-based systems for higher aggregate performance
SmartNIC-based architectures for lower power & cost
HW Requirements
Each controller has PCIe connectivity to all drives
Controllers can communicate over a network
Controllers communicate over both the network and drive pairs to identify connectivity (failure) issues
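Why check over both the network and the drives? With two independent paths, a controller can tell the difference between a dead peer and a dead link. The sketch below is purely hypothetical (the states and the classify_peer function are mine, not Excelero's), but it shows the kind of decision that two heartbeat paths make possible.

```python
from enum import Enum


class PeerState(Enum):
    HEALTHY = "healthy"
    NETWORK_FAULT = "network path down, peer still alive"
    DRIVE_PATH_FAULT = "drive path down, peer still alive"
    PEER_DOWN = "peer controller down"


def classify_peer(network_ok: bool, drive_path_ok: bool) -> PeerState:
    """Classify the peer controller from two independent heartbeat paths.

    Hypothetical logic: only when BOTH paths are silent do we assume the
    peer itself has failed and that a failover is warranted.
    """
    if network_ok and drive_path_ok:
        return PeerState.HEALTHY
    if drive_path_ok:
        return PeerState.NETWORK_FAULT
    if network_ok:
        return PeerState.DRIVE_PATH_FAULT
    return PeerState.PEER_DOWN


if __name__ == "__main__":
    for net in (True, False):
        for drive in (True, False):
            print(net, drive, classify_peer(net, drive).value)
```

With a single heartbeat path, a pulled network cable looks exactly like a crashed controller, and both controllers might try to take over the same volumes.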
Supported Networking
RDMA (InfiniBand or Ethernet)
TCP/IP networking
Thoughts and Further Reading
NVMe has been a good news story for folks struggling with the limitations of the SAS protocol. I’ve waxed lyrical in the past about how impressed I was with Excelero’s offering. Not every workload is necessarily suited to NVMesh though, and NVEdge is an interesting approach to solving that problem. Where NVMesh provides a tonne of flexibility when it comes to deployment options and the hardware used, NVEdge doubles down on availability and performance for different workloads.
NVMe isn’t a handful of magic beans that will instantly transform your storage workloads. You need to be able to feed it to really get value from it, and you need to be able to protect it too. It comes down to understanding what it is you’re trying to achieve with your applications, rather than just splashing cash on the latest storage protocol in the hope that it will make your business more money.
At this point I’d make some comment about data being the new oil, but I don’t really have enough background in the resources sector to be able to carry that analogy much further than that. Instead I’ll say this: data (in all of its various incarnations) is likely very important to your business. It might be something relatively straightforward like seismic data, financial results, or policy documents, or it may be the value that you can extract from that data by having fast access to a lot of it. Whatever you’re doing with it, you’re likely investing in hardware and software that helps you get to that value. Excelero appears to have focused on ensuring that the ability to access data in a timely fashion isn’t the thing that holds you back from achieving your data value goals.
There are three key features that have been added to NVMesh.
MeshConnect – adding support for traditional network technologies TCP/IP and Fibre Channel, giving NVMesh the widest selection of supported protocols and fabrics of software-defined storage platforms along with already supported InfiniBand, RoCE v2, RDMA and NVMe-oF.
MeshProtect – offering flexible protection levels for differing application needs, including mirrored and parity-based redundancy.
MeshInspect – with performance analytics for pinpointing anomalies quickly and at scale.
Performance
Excelero have said that NVMesh delivers “shared NVMe at local performance and 90+% storage efficiency that helps further drive down the cost per GB”.
Protection
There’s also a range of protection options available now. Excelero tell me that you can start at level 0 (no protection, lowest latency) all the way to “MeshProtect 10+2 (distributed dual parity)”. This allows customers to “choose their preferred level of performance and protection. [While] Distributing data redundancy services eliminates the storage controller bottleneck.”
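To make the trade-off between protection levels concrete, here is a small sketch of the raw capacity efficiency of a few layouts, calculated as data units divided by total units. This is plain arithmetic under my own simplifying assumptions (it ignores metadata, spares and anything else the product does), and the layout labels other than "10+2" are mine.

```python
def storage_efficiency(data_units: int, redundancy_units: int) -> float:
    """Usable fraction of raw capacity for a data + redundancy layout."""
    if data_units <= 0 or redundancy_units < 0:
        raise ValueError("need at least one data unit and no negative redundancy")
    return data_units / (data_units + redundancy_units)


if __name__ == "__main__":
    layouts = {
        "no protection": (1, 0),
        "mirrored": (1, 1),            # every block stored twice
        "10+2 dual parity": (10, 2),   # 10 data units, 2 parity units
    }
    for name, (data, redundancy) in layouts.items():
        print(f"{name}: {storage_efficiency(data, redundancy):.1%}")
```

Mirroring costs half of the raw capacity, while a wide parity stripe gives up far less in exchange for more work on writes and rebuilds.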
Visibility
One of my favourite things about NVMesh 2 is the MeshInspect feature, with a “built-in statistical collection and display, stored in a scalable NoSQL database”.
[image courtesy of Excelero]
Thoughts And Further Reading
Excelero emerged from stealth mode at Storage Field Day 12. I was impressed with their offering back then, and they continue to add features while focussing on delivering top notch performance via a software-only solution. It feels like there’s a lot of attention on NVMe-based storage solutions, and with good reason. These things can go really, really fast. There are a bunch of startups with an NVMe story, and the bigger players are all delivering variations on these solutions as well.
Excelero seem well placed to capitalise on this market interest, and their decision to focus on a software-only play seems wise, particularly given that some of the standards, such as NVMe over TCP, haven’t been fully ratified yet. This approach will also appeal to the aspirational hyperscalers, because they can build their own storage solution, source their own devices, and still benefit from a fast software stack that can deliver performance in spades. Excelero also supports a wide range of transports now, with the addition of NVMe over FC and TCP support.
NVMesh 2 looks to be smoothing some of the rougher edges that were present with version 1, and I’m pumped to see the focus on enhanced visibility via MeshInspect. In my opinion these kinds of tools are critical to the uptake of solutions such as NVMesh in both the enterprise and cloud markets. The broadening of the connectivity story, as well as the enhanced resiliency options, make this something worth investigating. If you’d like to read more, you can access a white paper here (registration required).
Disclaimer: I recently attended Storage Field Day 12. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Here are some notes from Excelero’s presentation at Storage Field Day 12. You can view the video here and download my rough notes here.
So what do they do?
Something called NVMesh Server SAN. And it’s pretty wild from what I saw during the demo. But first, what is NVMesh Server SAN? It’s basically magical software-defined block storage designed for NVMe.
[image courtesy of Excelero]
Benefits
So what’s so special about this? Well, it provides:
A virtual SAN solution that is optimised for NVMe, and 100% software-based (using the proper hardware, of course);
“Unified NVMe” by pooling NVMe across the network at what appear to the host as local speeds and latencies;
Really, really low CPU usage on the storage target;
Flexibility, as it can be deployed as a virtual, distributed non-volatile array and supports both converged and disaggregated architectures; and
Efficiency, with performance scaling linearly at close to 100% efficiency.
According to Excelero, some of the key NVMesh benefits include:
maximum utilisation of NVMe flash devices by creating a single pool of high performance block storage (they really flog these devices);
no data localisation for scale-out applications;
predictable application performance – no noisy neighbours; and
making storage as efficient as the optimised hardware platform (such as Open19).
What does this mean for your enterprise applications? You get access to:
higher performance for random IO intensive enterprise applications;
a flexible architecture to support multiple workloads;
lower operating costs through deployment efficiency and easy serviceability; and
all your data is “local” with no application changes. This is a mighty fine trick.
Excelero’s solution also helps with high-performance computing (HPC) environments, offering:
massive performance: high IOPS and bandwidth, low latency;
unlimited scalability, supports analytics for massive data sets; and
lowest cost/IOP.
Excelero Software Components
[image courtesy of Excelero]
The Centralized Management component:
runs as a Node.js application on top of MongoDB;
pools drives, provisions volumes and monitors stuff; and
transforms drives from raw storage into a pool.
It’s built as a scale-out service to support huge deployments and offers some standard integration, including RESTful API access for seamless provisioning. There’s also a client block driver with the kernel module presenting logical volumes via the block driver API.
From a performance perspective, it interacts directly with drives via RDDA or NVMf, offering single hop access to the data, minimising latency overhead, and maximising throughput and IOPS. As a result of this you get consistent access to shared volumes spread across remote drives anywhere in the DC. The solution offers “RAIN” data protection (cross-node / rack) for standard servers, and from a scalability perspective there’s point to point communication with management and targets, simple discovery, and no broadcasts.
Topology Manager:
Performs cluster management to ensure high availability;
Manages volume life cycle and failure recovery operations; and
Uses Raft protocol to ensure data consistency – avoiding “split brain” scenarios.
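The reason Raft avoids split brain is its majority rule: a leader can only be elected (and changes can only be committed) with agreement from more than half of the cluster. Here is a minimal sketch of that quorum arithmetic. It is not a Raft implementation and says nothing about how the Topology Manager is actually built. The function names are mine.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one node")
    return cluster_size // 2 + 1


def can_elect_leader(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition can elect a leader only if it holds a majority."""
    return reachable_nodes >= quorum_size(cluster_size)


if __name__ == "__main__":
    cluster = 5
    # A network fault splits the cluster into groups of 3 and 2 nodes.
    for partition in (3, 2):
        print(partition, can_elect_leader(partition, cluster))
```

Because two disjoint partitions can never both hold a majority, at most one side keeps making decisions, and the other side waits until the partition heals.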
Key Takeaways
Excelero is a “Software-defined block storage solution” – using standard servers with state of the art flash components and leveraging an intuitive management portal;
Excelero offers virtual SAN for NVMe – pooling NVMe over the network at local speeds to maximise utilisation. By making all data local you move the compute, not the data;
Scale-Out Server SAN – scales performance & capacity linearly, across DCs without limits – enabling just-in-time storage orchestration; and
Converged and disaggregated architectures – no noisy neighbours through full logical disaggregation of storage and compute – grow storage or compute independently.
Feelings and Further Reading
Excelero came out of stealth mode during Storage Field Day 12. Ray was mighty impressed with what he saw, as was I. I was also mightily impressed with the relatively inexpensive nature of the hardware that they used to demonstrate the solution. Every SDS solution has a reasonably strict hardware compatibility list. In this case, it makes a lot of sense, as Excelero’s patented RDDA technology contributes a lot to the performance and success of the solution. It’s also NVMe over Fabrics ready, so as this gains traction the requirement for RDDA will potentially fade away.
Super-fast storage solutions based on NVMe are a lot like big data and bad reality TV shows. They’re front and centre in a lot of conversations around the water cooler but a lot of people aren’t exactly sure what they are or what they should make of them. While Dell recently put the bullet through DSSD, it doesn’t mean that the technology or the requirement for this kind of solution doesn’t exist. What it does demonstrate is that these kinds of solutions can be had in the data centre for a reasonably inexpensive investment in hardware coupled with some really smart software. Version 1.1 is still raw in places, and it will be a while before we see widespread adoption of these types of solutions in the enterprise data centre (people like to wrap data services around these kinds of solutions). That said, if you have the need for speed right now, it might be a good idea to reach out to the Excelero folks and have a conversation.
You’ll notice my title was a bit misleading, as I don’t have pricing information in this post. Ray did some rough calculations in his article, and you should talk to Excelero to find out more. As an aside, it’s also worth checking out Yuval Bachar’s Storage Field Day presentation on Open19. It was riveting stuff.