Disclaimer: I recently attended Storage Field Day 20. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Non-Volatile Memory Express, known more commonly as NVMe, is a protocol designed for high performance SSD storage access. In the olden days, we used to associate fibre channel and iSCSI networking options with high performance block storage. Okay, maybe not the 1Gbps iSCSI stuff, but you know what I mean. Time has passed, and the storage networking landscape has changed significantly with the introduction of All-Flash and NVMe. But NVMe’s adoption hasn’t been all smooth sailing. There have been plenty of vendors willing to put drives in storage arrays that support NVMe while doing some translation on the backend that negated the real benefits of NVMe. And, like many new technologies, it’s been a gradual process to get end-to-end NVMe in place, because enterprises, and the vendors that sell to them, only move so fast. Some vendors support NVMe, but only over FC. Others have adopted the protocol to run over RoCEv2. There’s also NVMe-TCP, in case you weren’t confused enough about what you could use. I’m doing a poor job of explaining this, so you should really just head over to Dr J Metz’s article on NVMe for beginners at SNIA.
Cisco Are Ready For Anything
As you’ve hopefully started to realise, you’ll see a whole bunch of NVMe implementations available in storage fabrics, along with a large number of enterprises continuing to have conversations about and deploy new storage equipment that uses traditional block fabrics, such as iSCSI or FC or, perish the thought, FCoE. The cool thing about Cisco MDS is that it supports all this crazy and more. If you’re running the latest and greatest NVMe end-to-end implementation and have some old block-only 8Gbps FC box sitting in the corner, they can likely help you with connectivity. The diagram below hopefully demonstrates that point.
[image courtesy of Cisco]
Thoughts and Further Reading
Very early in my storage career, I attended a session on MDS at Cisco Networkers Live (when they still ran those types of events in Brisbane). Being fairly new to storage, and running a smallish network of one FC4700 and 8 Unix hosts, I’d tended to focus more on the storage part of the equation rather than the network part of the SAN. Cisco was still relatively new to the storage world at that stage, and it felt a lot like it had adopted a very network-centric view of the storage world. I was a little confused why all the talk was about backplanes and port density, as I was more interested in the optimal RAID configuration for mail server volumes and how I should protect the data being stored on this somewhat sensitive piece of storage. As time went on, I was invariably exposed to larger and larger environments where decisions around core and edge storage networking devices started to become more and more critical to getting optimal performance out of the environment. A lot of the information I was exposed to in that early MDS session started to make more sense (particularly as I was tasked with deploying larger and larger MDS-based fabrics).
Things have obviously changed quite a bit since those heady days of a network upstart making waves in the storage world. We’ve seen increases in network speeds become more and more common in the data centre, and we’re no longer struggling to get as many IOPS as we can out of 5400 RPM PATA drives with an interposer and some slightly weird firmware. What has become apparent, I think, is the importance of the fabric when it comes to getting access to storage resources in a timely fashion, and with the required performance. As enterprises scale up and out, and more and more hosts and applications connect to centralised storage resources, it doesn’t matter how fast those storage resources are if there’s latency in the fabric.
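To put some rough numbers on that last point, here is a minimal sketch of how fabric latency eats into what the host actually sees. The figures and the function name `fabric_share` are made-up assumptions for illustration, not measurements from any Cisco or NVMe product.

```python
def fabric_share(device_latency_us, fabric_latency_us):
    """Return (total latency, fraction of that total spent in the fabric)."""
    total = device_latency_us + fabric_latency_us
    return total, fabric_latency_us / total


# Hypothetical figures: a 100us flash read behind a quiet and a congested fabric.
for fabric_us in (10, 400):
    total, share = fabric_share(100, fabric_us)
    print(f"fabric {fabric_us}us: total {total}us, {share:.0%} spent in the fabric")
```

The media latency is the same in both cases; only the fabric term changes, which is the whole argument for paying attention to the network part of the SAN.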
The SAN still has a place in the enterprise, despite what the DAS huggers will tell you, and you can get some great performance out of your SAN if you architect it appropriately. Cisco certainly seems to have an option for pretty much everything when it comes to storage (and network) fabrics. It also has a great story when it comes to fabric visibility, and the scale and performance at the top end of its MDS range is pretty impressive. In my mind, though, the key really is the variety of options available when building a storage network. It’s something that shouldn’t be underestimated given the plethora of options available in the market.
There are three key features that have been added to NVMesh.
MeshConnect – adding support for traditional network technologies TCP/IP and Fibre Channel, giving NVMesh the widest selection of supported protocols and fabrics of software-defined storage platforms along with already supported InfiniBand, RoCE v2, RDMA and NVMe-oF.
MeshProtect – offering flexible protection levels for differing application needs, including mirrored and parity-based redundancy.
MeshInspect – with performance analytics for pinpointing anomalies quickly and at scale.
Excelero have said that NVMesh delivers “shared NVMe at local performance and 90+% storage efficiency that helps further drive down the cost per GB”.
There’s also a range of protection options available now. Excelero tell me that you can start at level 0 (no protection, lowest latency) all the way to “MeshProtect 10+2 (distributed dual parity)”. This allows customers to “choose their preferred level of performance and protection. [While] Distributing data redundancy services eliminates the storage controller bottleneck.”
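As a rough illustration of the trade-off between protection levels, the sketch below works out how much raw capacity is left for data in a generic "data + redundancy" layout. This is plain k/(k+m) arithmetic and not Excelero's own accounting; the function name and the mirror example are my assumptions, and it doesn't attempt to reproduce the efficiency figure quoted above, which may be measured differently.

```python
def usable_fraction(data_units, redundancy_units):
    """Fraction of raw capacity left for data in a data+redundancy layout."""
    return data_units / (data_units + redundancy_units)


# Layout names are illustrative; only the 10+2 shape comes from the text.
layouts = {
    "no protection (1+0)": (1, 0),
    "2-way mirror (1+1)": (1, 1),
    "dual parity (10+2)": (10, 2),
}
for name, (data, redundancy) in layouts.items():
    print(f"{name}: {usable_fraction(data, redundancy):.1%} of raw capacity usable")
```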
One of my favourite things about NVMesh 2 is the MeshInspect feature, with a “built-in statistical collection and display, stored in a scalable NoSQL database”.
[image courtesy of Excelero]
Thoughts And Further Reading
Excelero emerged from stealth mode at Storage Field Day 12. I was impressed with their offering back then, and they continue to add features while focussing on delivering top notch performance via a software-only solution. It feels like there’s a lot of attention on NVMe-based storage solutions, and with good reason. These things can go really, really fast. There are a bunch of startups with an NVMe story, and the bigger players are all delivering variations on these solutions as well.
Excelero seem well placed to capitalise on this market interest, and their decision to focus on a software-only play seems wise, particularly given that some of the standards, such as NVMe over TCP, haven’t been fully ratified yet. This approach will also appeal to the aspirational hyperscalers, because they can build their own storage solution, source their own devices, and still benefit from a fast software stack that can deliver performance in spades. Excelero also supports a wide range of transports now, with the addition of NVMe over FC and TCP support.
NVMesh 2 looks to be smoothing some of the rougher edges that were present with version 1, and I’m pumped to see the focus on enhanced visibility via MeshInspect. In my opinion these kinds of tools are critical to the uptake of solutions such as NVMesh in both the enterprise and cloud markets. The broadening of the connectivity story, as well as the enhanced resiliency options, make this something worth investigating. If you’d like to read more, you can access a white paper here (registration required).
Disclaimer: I recently attended Storage Field Day 17. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
They maintain a strict focus on the SMB and Enterprise ROBO markets, and aren’t looking to be the next big thing in the enterprise any time soon.
So What’s All This About NVMe [over Fabrics]?
According to Max and the team, NVMe over Fabrics is “the next big thing in [network] storage”. Here’s a photo of Max saying just that.
Why Hate SAS?
It’s not that people hate SAS, it’s just that the SAS protocol was designed for disk, and NVMe was designed for Flash devices.
| SAS (iSCSI / iSER) | NVMe [over Fabrics] |
| --- | --- |
| Complex driver built around archaic SCSI | Simple driver built around block device (R/W) |
| Single short queue per controller | One device = one controller, no bottlenecks |
| Single short queue per device | Many long queues per device |
| Serialised access, locks | Non-serialised access, no locks |
| | Many-to-Many, true Point-to-Point |
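The queueing difference in the comparison above can be sketched as a toy model. This is only an illustration of "one short locked queue" versus "many long queues, no locks": the class names and the queue depths (32 and 1024) are assumptions I've picked for the example, not figures from StarWind or from either specification.

```python
import threading
from collections import deque


class SingleQueueController:
    """One short queue shared by every submitter, guarded by a lock."""

    def __init__(self, depth=32):
        self.depth = depth
        self.queue = deque()
        self.lock = threading.Lock()

    def submit(self, command):
        with self.lock:  # every submitter serialises on this lock
            if len(self.queue) >= self.depth:
                return False  # queue full, caller has to retry later
            self.queue.append(command)
            return True


class MultiQueueDevice:
    """One long queue per submitting core, so no shared lock is needed."""

    def __init__(self, cores, depth=1024):
        self.depth = depth
        self.queues = [deque() for _ in range(cores)]

    def submit(self, core, command):
        queue = self.queues[core]  # only this core touches this queue
        if len(queue) >= self.depth:
            return False
        queue.append(command)
        return True


def accepted_commands(total=100, cores=4):
    """Submit `total` commands to each model and count how many are accepted."""
    single = SingleQueueController()
    multi = MultiQueueDevice(cores)
    single_ok = sum(single.submit(i) for i in range(total))
    multi_ok = sum(multi.submit(i % cores, i) for i in range(total))
    return single_ok, multi_ok


print(accepted_commands())
```

With nothing draining the queues, the single short queue fills up almost immediately while the per-core queues absorb the whole burst, which is the bottleneck the comparison is getting at.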
You Do You, Boo
StarWind have developed their own NVMe SPDK for Windows Server (as Intel doesn’t currently provide one). In early development they had some problems with high CPU overheads. CPU might be a “cheap resource”, but you still don’t want to use up 8 cores dishing out IO for a single device. They’ve managed to move a lot of the work to user space and cut down on core consumption. They’ve also built their own Linux (CentOS) based initiator for NVMe over Fabrics. They’ve developed an NVMe-oF initiator for Windows by combining a Linux initiator and stub driver in the hypervisor. “We found the elegant way to bring missing SPDK functionality to Windows Server: Run it in a VM with proper OS! First benefit – CPU is used more efficiently”. They’re looking to do something similar with ESXi in the very near future.
Thoughts And Further Reading
I like to think of StarWind as the little company from Ukraine that can. They have a long, rich heritage in developing novel solutions to everyday storage problems in the data centre. They’re not necessarily trying to take over the world, but they’ve demonstrated before that they have an ability to deliver solutions that are unique (and sometimes pioneering) in the marketplace. They’ve spent a lot of time developing software storage solutions over the years, so it makes sense that they’d be interested to see what they could do with the latest storage protocols and devices. And if you’ve ever met Max and Anton (and the rest of their team), it makes even more sense that they wouldn’t necessarily wait around for Intel to release a Windows-based SPDK to see what type of performance they could get out of these fancy new Flash devices.
All of the big storage companies are coming out with various NVMe-based products, and a number are delivering NVMe over Fabrics solutions as well. There’s a whole lot of legacy storage that continues to dominate the enterprise and SMB storage markets, but I think it’s clear from presentations such as StarWind’s that the future is going to look a lot different in terms of the performance available to applications (both at the core and edge).
You can check out this primer on NVMe over Fabrics here, and the ratified 1.0a specification can be viewed here. Ray Lucchesi, as usual, does a much better job than I do of explaining things, and shares his thoughts here.
I recently had the opportunity to hear about Pavilion Data Systems from VR Satish, CTO, and Jeff Sosa, VP of Products. I thought I’d put together a brief overview of their offering, as NVMe-based systems are currently the new hotness in the storage world.
It’s a Box!
And a pretty cool looking one at that. Here’s what it looks like from the front.
[image courtesy of Pavilion Data]
The storage platform is built from standard components, including x86 processors and U.2 NVMe SSDs. A big selling point, in Pavilion’s opinion, is that there are no custom ASICs and no FPGAs in the box. There are three different models available (the datasheet is here), with different connectivity and capacity options.
From a capacity perspective, you can start at 14TB and get all the way to 1PB in 4RU. The box can start at 18 NVMe drives and (growing by increments of 18) goes to 72 drives. It runs RAID 6 and presents the drives as virtual volumes to the hosts. Here’s a look at the box from a top-down perspective.
[image courtesy of Pavilion Data]
There’s a list of supported NVMe SSDs that you can use with the box, if you wanted to source those elsewhere. On the right hand side (the back of the box) are the IO controllers. You can start at 4 and go up to 20 in a box. There are also 2 management modules and 4 power supplies for resiliency.
[image courtesy of Pavilion Data]
You can see in the above diagram that connectivity is also a big part of the story, with each pair of controllers offering 4x 100GbE ports.
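To get a feel for what that connectivity adds up to, here is a back-of-the-envelope sketch using only the numbers in the text (4 to 20 controllers, 4x 100GbE per controller pair). It gives theoretical line rate rather than delivered throughput, and the function name is my own.

```python
PORTS_PER_CONTROLLER_PAIR = 4  # from the text: 4x 100GbE per pair of controllers
PORT_SPEED_GBPS = 100          # gigabits per second per port


def aggregate_line_rate(controllers):
    """Return (port count, total Gb/s, total GB/s) for an even controller count."""
    if controllers % 2:
        raise ValueError("controllers are counted in pairs")
    ports = (controllers // 2) * PORTS_PER_CONTROLLER_PAIR
    gbps = ports * PORT_SPEED_GBPS
    return ports, gbps, gbps / 8  # 8 bits per byte


for count in (4, 20):
    ports, gbps, gbytes = aggregate_line_rate(count)
    print(f"{count} controllers: {ports} ports, {gbps} Gb/s (about {gbytes:.0f} GB/s) line rate")
```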
Sure. It’s a box but it needs something to run it. Each controller runs a customised flavour of Linux and delivers a number of the features you’d expect from a storage array, including:
Active-active controller support
Space-efficient snapshots and clones
There’re also plans afoot for encryption support in the near future. Pavilion have also focused on making operations simple, providing support for RESTful API orchestration, OpenStack Cinder, Kubernetes, DMTF RedFish and SNIA Swordfish. They’ve also gone to some lengths to ensure that standard NVMe/F drivers will work for host connectivity.
Thoughts and Further Reading
Pavilion Data has been around since 2014 and the leadership group has some great heritage in the storage and networking industry. They tell me they wanted to move away from the traditional approach to storage arrays (the dual controller, server-based platform) to something that delivered great performance at scale. The platform has more in common with high performance networking devices than with high performance storage arrays, and this is by design. They tell me they really wanted to deliver a solution that wasn’t the bottleneck when it came to realising the performance capabilities of the NVMe architecture. The numbers being punted around are certainly impressive. And I’m a big fan of the approach, in terms of both throughput and footprint.
The webscale folks running apps like MySQL and Cassandra and MongoDB (and other products with similarly awful names) are doing a few things differently to the enterprise bods. Firstly, they’re more likely to wear jeans and sneakers to the office (something that drives me nuts) and they’re leveraging DAS heavily because it gives them high performance storage options for latency-sensitive situations. The advent of NVMe and NVMe over Fabrics takes away the requirement for DAS (although I’m not sure they’ll start to wear proper office attire any time soon) by delivering storage at the scale and performance they need. As a result of this, you can buy 1RU servers with compute instead of 2RU servers full of fast disk. There’s an added benefit as organisations tend to assign longer lifecycles to their storage systems, so systems like the one from Pavilion are going to have a place in the DC for five years, not 2.5 – 3 years. Suddenly lifecycling your hosts becomes simpler as well. This is good news for the jeans and t-shirt set and the beancounters alike.
NVMe (and NVMe over Fabrics) has been a hot topic for a little while now, and you’re only going to hear more about it. Those bright minds at Gartner are calling it “Shared Accelerated Storage” and you know if they’re talking about it then the enterprise folks will cotton on in a few years and suddenly it will be everywhere. In the meantime, check out Chris M. Evans’ article on NVMe over Fabrics and Chris Mellor also did an interesting piece at El Reg. The market is becoming more crowded each month and I’m interested to see how Pavilion fare.
Disclaimer: I recently attended Dell Technologies World 2018. My flights, accommodation and conference pass were paid for by Dell Technologies via the Press, Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
[image courtesy of Dell EMC]
Dell EMC today announced PowerMax. Described as the next generation of VMAX, it’s been designed from the ground up to support NVMe. It’s being pitched as suitable for both traditional applications, such as:
Relational Databases; and
And “next generation applications”, such as:
Real time analytics;
From a performance perspective, Dell EMC tell me this thing can do 10M IOPS. It’s also been benchmarked delivering 25% better response time using NVMe Flash (compared to a VMAX AF using SAS Flash) and 50% better response time using NVMe SCM (Storage Class Memory). They also say that you can get 150GB/s out of a single system.
End to End NVMe
NVMe over Fabric Ready (soon)
NVMe based drives (dual ported) – Flash and SCM (*soon)
NVMe-based Disk Array Enclosure
Industry standard technology
[Image courtesy of Dell EMC]
Scalability and Density
Starts small, and scales up and out.
Capacity starts at 13TB (effective)
As small as 10U
Scales from 1 Brick
Scales to 8 Bricks
4PB (effective) per system
[Image courtesy of Dell EMC]
From a storage efficiency perspective, a number of features you’d hope for are there:
Inline dedupe and compression – 5:1 data reduction across the PowerMax
No performance impact
Works with all data services enabled
Can be turned on or off by application
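"Effective" capacity only means something relative to a data reduction ratio, so here is a small sketch of the arithmetic. It assumes the effective figures quoted earlier are based on the 5:1 ratio, which Dell EMC didn't confirm to me, and the function name is hypothetical.

```python
def physical_needed_tb(effective_tb, reduction_ratio):
    """Physical capacity (TB) needed to present `effective_tb` at a given ratio."""
    return effective_tb / reduction_ratio


# 13TB and 4PB (4000TB) effective are the figures quoted above; 5:1 is the claimed ratio.
for effective in (13, 4000):
    physical = physical_needed_tb(effective, 5)
    print(f"{effective}TB effective at 5:1 needs roughly {physical:g}TB physical")
```

If your data only reduces at 2:1, the same physical capacity presents far less effective capacity, so it's worth testing with your own workloads.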
There are two different models: the PowerMax 2000 and PowerMax 8000.
PowerMax 2000
1.7M IOPS (RRH-8K)
1PB effective Capacity
1 to 2 PowerBricks
PowerMax 8000
10M IOPS (RRH-8K)
4PB effective Capacity
1 to 8 PowerBricks
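IOPS figures are easier to compare once they're turned into bandwidth. The sketch below does that conversion for the two quoted numbers, assuming "RRH-8K" means 8 KiB random read hits (my reading, not something confirmed in the briefing); the function name is hypothetical.

```python
def throughput_gb_per_s(iops, block_bytes=8 * 1024):
    """Convert an IOPS figure at a fixed block size into decimal GB/s."""
    return iops * block_bytes / 1e9


# The two IOPS figures quoted above, assuming 8 KiB per I/O.
for iops in (1_700_000, 10_000_000):
    print(f"{iops:,} IOPS at 8 KiB is about {throughput_gb_per_s(iops):.1f} GB/s")
```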
PowerMax Software comes in two editions:
The Pro edition gives you all of the above and
The PowerMax is available from May 7, 2018.
Dell EMC tell me the VMAX 250 and 950 series aren’t going away any time soon, but there will be tools made available to migrate from those platforms if you decide to put some PowerMax on the floor. PowerMax is an interesting platform with a lot of potential, hype around the quoted performance numbers notwithstanding. It seems like it takes a lot of floor tiles compared to some other NVMe-based alternatives, although this may be down to the scale of the platform. It stands to reason that the kind of folks interested in this offering are the same ones that were interested in VMAX All Flash. I’d be curious to see what the compatibility matrices look like for the existing VMAX tools when compared to the PowerMax, although I do imagine that they’d be a bit more careful about this than they have been with the midrange products.
You can read the press release from Dell EMC here, and there’s a blog post on PowerMax here.
I recently had the opportunity to have a call with Julie Herd about what E8 Storage have been up to and thought I’d share my thoughts here. I’ll admit it was a very quick chat because the announcement needed little explanation, but it’s sometimes the simple things that are worth noting.
E8 are positioning this primarily as a response to requirements from HPC customers. While some people think IB is dead, there has been a big investment in the technology in HPC environments, and this allows E8 to get into that market without upsetting the apple cart too much. They’re certainly delivering the kind of storage performance that HPC folks would be interested in, so this seems like a sensible solution. They tell me there’s no difference in terms of latency or performance between the IB and RoCE offerings, and it’s really just about a common transport for those users that need it. The cool thing about E8, of course, is that there’s also a software-only version of their offering available, in case you have a particular tin vendor that you’d like to build your super fast NVMe/F storage platform on. You can read the full announcement here.
E8 recently announced the launch of its new E8 Storage Software only product offering for a selected range of pre-qualified servers from leading vendors such as Dell, HP and Lenovo. Built with “standard” components, such as RoCE and standard 2.5″ NVMe SSDs, the E8 Storage Software connects up to 96 host servers to each E8 Storage controller, each linked concurrently to shared storage to deliver unprecedented petabyte scalability.
Thoughts and Further Reading
It’s been interesting to see a number of high-performance storage offerings released as software-only plays. While this announcement isn’t exactly the same as Kaminario’s recent exit from the hardware market, it does mark a significant shift in the underlying technology driving modern software solutions. In the past, storage consumers were beholden to their vendor’s choice of silicon (and associated supply chain). Nowadays, however, we’re seeing a lot more focus on the value that can be delivered beyond the hardware. It no longer matters as much to your storage vendor what the bezel looks like. What is important is that their code is driving your storage solution.
The benefit of this is that you can now, potentially, enjoy an improved consumer experience by leveraging your existing server purchasing arrangements (assuming you’re still in the business of buying tin) to extend your storage footprint as well. The downside is that you’re still possibly going to be a bit limited by the range of hardware supported by software vendors. This is going to be heavily influenced by what hardware the software vendors can get into their integration lab to test, and sometimes you’ll find (much like the early days of VMware ESX) that the latest revision of your favourite bit of kit may not always be supported by your preferred storage provider.
In any case, E8, according to Herd, don’t expect to flip their business model on its head. Instead, they foresee their hardware business continuing along as before, with this software-only offering merely augmenting their capability for customers who are into that kind of thing. You can’t run software without hardware, but it’s nice to be able to choose which hardware you want to deploy. I’m all for it, and I’m looking forward to seeing what else E8 have up their sleeve in the next little while.
Disclaimer: I recently attended VMworld 2017 – US. My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
You can view the video of Kingston‘s presentation at Tech Field Day Extra VMworld US 2017 here, and download a PDF copy of my rough notes from here.
It’s A Protocol, Not Media
NVMe has been around for a few years now, and some people confuse it with a new kind of media that they plug into their servers. But it’s not really; it’s just a standard specification for accessing Flash media via the PCI Express bus. There’re a bunch of reasons why you might choose to use NVMe instead of SAS, including lower latency and less CPU overhead. My favourite thing about it though is the plethora of form factors available to use. Kingston touched on these in their presentation at Tech Field Day Extra recently. You can get them in half-height, half-length (HHHL) add-in cards (AIC), U.2 (2.5″) and M.2 sizes. To give you an idea of the use cases for each of these, Kingston suggested the following applications:
HHHL (AIC) card
Server / DC applications
Direct-attached, server backplane, just a bunch of flash (JBOF)
White box and OEM-branded
Notebooks, desktops, workstations
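Because NVMe is a protocol rather than a type of media, the operating system sees the same kind of controller whichever form factor you plug in. As a rough sketch, assuming a Linux host where the NVMe driver exposes controllers under /sys/class/nvme with `model` and `transport` attributes, you could list them like this (the function name is mine, and the root path is a parameter so it can be pointed elsewhere):

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return one dict per NVMe controller directory found under sysfs_root."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or no NVMe driver loaded
    controllers = []
    for ctrl in sorted(root.iterdir()):
        info = {"name": ctrl.name}
        for attr in ("model", "transport"):
            attr_file = ctrl / attr
            # Attributes are small text files; missing ones are reported as None.
            info[attr] = attr_file.read_text().strip() if attr_file.is_file() else None
        controllers.append(info)
    return controllers


for controller in list_nvme_controllers():
    print(controller)
```

On a machine without NVMe devices (or without that sysfs layout) the function simply returns an empty list.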
It’s Pretty Fast
NVMe has proven to be pretty fast, and a number of companies are starting to develop products that leverage the protocol in an extremely efficient manner. Couple that with the rise of NVMe/F solutions and you’ve got some pretty cool stuff coming to market. The price is also becoming a lot more reasonable, with Kingston telling us that their DCP1000 NVMe HHHL comes in at around “$0.85 – $0.90 per GB at the moment”. It’s obviously not as cheap as things that spin at 7200RPM but the speed is mighty fine. Kingston also noted that the 2.5″ form factor would be hanging around for some time yet, as customers appreciated the serviceability of the form factor.
Flash media has been slowly but surely taking over the world for a little while now. The cost per GB is reducing (slowly, but surely), and the range of form factors means there’s something for everyone’s needs. Protocol advancements such as NVMe make things even easier, particularly at the high end of town. It’s also been interesting to see these “high end” solutions trickle down to affordable form factors such as PCIe add-in cards. With the relative ubiquity of operating system driver support, NVMe has become super accessible. The interesting thing to watch now is how we effectively leverage these advancements in protocol technologies. Will we use them to make interesting advances in platforms and data access? Or will we keep using the same software architectures we fell in love with 15 years ago (albeit with dramatically improved performance specifications)?
Conclusion and Further Reading
I’ll admit it took me a little while to come up with something to write about after the Kingston presentation. Not because I don’t like them or didn’t find their content interesting. Rather, I felt like I was heading down the path of delivering another corporate backgrounder coupled with speeds and feeds and I know they have better qualified people to deliver that messaging to you (if that’s what you’re into). Kingston do a whole range of memory-related products across a variety of focus areas. That’s all well and good but you probably already knew that. Instead, I thought I could focus a little on the magic behind the magic. The Flash era of storage has been absolutely fascinating to witness, and I think it’s only going to get more interesting over the next few years. If you’re into this kind of thing but need a more comprehensive primer on NVMe, I recommend you check out J Metz’s article on the Cisco blog. It’s a cracking yarn and enlightening to boot. Data Centre Journal also provide a thorough overview here.
Disclaimer: I recently attended Storage Field Day 12. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Here are some notes from Intel‘s presentation at Storage Field Day 12. You can view the video here and download my rough notes here.
I/O Can Be Hard Work
With the advent of NVM Express, things go pretty fast nowadays. Or, at least, faster than they used to with those old-timey spinning disks we’ve loved for so long. According to Intel, systems with multiple NVMe SSDs are now capable of performing millions of I/Os per second. Which is great, but it results in many cores of software overhead with a kernel-based interrupt-driven driver model. The answer, according to Intel, is the Storage Performance Development Kit (SPDK). The SPDK enables more CPU cycles for storage services, with lower I/O latency. The great news is that there’s now almost no premium on capacity to do IOPS with a system. So how does this help in the real world?
Real World Applications?
SPDK VM I/O Efficiency
The SPDK offers some excellent performance improvements when dishing up storage to VMs.
Enable disaggregation and migration of VMs using remote storage
Improves storage virtualisation and flexibility
Works with KVM/QEMU
NVMe over Fabrics
Utilises NVM Express (NVMe) Polled Mode Driver
Reduced overhead per NVMe I/O
RDMA Queue Pair Polling
No interrupt overhead
Connections pinned to CPU cores
No synchronisation overhead
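The polled-mode points above can be sketched roughly as follows. This is a toy model in Python and not SPDK code: the `QueuePair` class, `device_step`, `poll_completions` and `run_poller` are hypothetical names I've made up, and the "device" is just a pair of deques. The idea it tries to show is that a single owner thread keeps checking its own completion queue instead of waiting for an interrupt, so there is nothing to lock.

```python
from collections import deque


class QueuePair:
    """Toy submission/completion queue pair owned by exactly one poller."""

    def __init__(self):
        self.submission = deque()
        self.completion = deque()

    def submit(self, lba):
        self.submission.append(lba)

    def device_step(self):
        # Stand-in for the device: complete one outstanding command, if any.
        if self.submission:
            self.completion.append(self.submission.popleft())


def poll_completions(queue_pair, max_completions=32):
    """Drain up to max_completions entries without blocking or sleeping."""
    done = []
    while queue_pair.completion and len(done) < max_completions:
        done.append(queue_pair.completion.popleft())
    return done


def run_poller(queue_pair, expected):
    """Spin until `expected` commands have completed; return them and the spin count."""
    finished = []
    spins = 0
    while len(finished) < expected:
        queue_pair.device_step()  # in real life the hardware does this on its own
        finished.extend(poll_completions(queue_pair))
        spins += 1
    return finished, spins


qp = QueuePair()
for lba in range(5):
    qp.submit(lba)
print(run_poller(qp, expected=5))
```

The cost of this approach is a core that spins even when there's no I/O, which is why pinning connections to dedicated cores goes hand in hand with polling.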
NVMe-oF Key Takeaways
Preserves the latency-optimised NVMe protocol through network hops
Potentially radically efficient, depending on implementation
Actually fabric agnostic: InfiniBand, RDMA, TCP/IP, FC … all ok!
Underlying protocol for existing and emerging technologies
Using SPDK, can integrate NVMe and NVMe-oF directly into applications
VM I/O Efficiency Key Takeaways
Huge improvement in latency for VM workloads
Application-level sees 3-4X performance gains
Application unmodified: it’s all under the covers
Virtuous cycle with VM density
Fully compatible with NVMe-oF!
Further Reading and Conclusion
Intel said during the presentation that “[p]eople find ways of consuming resources you provide to them”. This is true, and one of the reasons I became interested in storage early in my career. What’s been most interesting about the last few years’ worth of storage developments (as we’ve moved beyond spinning disks and simple file systems to super fast flash subsystems and massively scaled out object storage systems) is that people are still really only interested in having lots of storage that is fast and reliable. The technologies talked about during this presentation obviously aren’t showing up in consumer products just yet, but it’s an interesting insight into the direction the market is heading. I’m mighty excited about NVMe over Fabrics and looking forward to this technology being widely adopted in the data centre.
If you’ve had the opportunity to watch the video from Storage Field Day 12 (and some other appearances by Intel Storage at Tech Field Day events), you’ll quickly understand that I’ve barely skimmed the surface of what Intel are doing in the storage space, and just how much is going on before your precious bits are hitting the file system / object store / block device. NVMe is the new way of doing things fast, and I think Intel are certainly pioneering the advancement of this technology through real-world applications. This is, after all, the key piece of the puzzle – understanding how to take blazingly fast technology and apply a useful programmatic framework that companies can build upon to deliver useful outcomes.
For another perspective, have a look at Chan’s article here. You also won’t go wrong checking out Glenn’s post here.