Cisco MDS, NVMe, and Flexibility

Disclaimer: I recently attended Storage Field Day 20.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Cisco recently presented at Storage Field Day 20. You can see videos of the presentation here, and download my rough notes from here.


NVMe, Yeah You Know Me

Non-Volatile Memory Express, known more commonly as NVMe, is a protocol designed for high-performance access to SSD storage. In the olden days, we used to associate Fibre Channel and iSCSI networking options with high-performance block storage. Okay, maybe not the 1Gbps iSCSI stuff, but you know what I mean. Time has passed, and the storage networking landscape has changed significantly with the introduction of All-Flash and NVMe. But NVMe’s adoption hasn’t been all smooth sailing. There have been plenty of vendors willing to put NVMe drives in their storage arrays while doing some translation on the back end that negated the real benefits of NVMe. And, like many new technologies, it’s been a gradual process to get end-to-end NVMe in place, because enterprises, and the vendors that sell to them, only move so fast. Some vendors support NVMe, but only over FC. Others have adopted the protocol to run over RoCEv2. There’s also NVMe-TCP, in case you weren’t confused enough about what you could use. I’m doing a poor job of explaining this, so you should really just head over to Dr J Metz’s article on NVMe for beginners at SNIA.


Cisco Are Ready For Anything

As you’ve hopefully started to realise, you’ll see a whole bunch of NVMe implementations available in storage fabrics, along with a large number of enterprises continuing to talk about and deploy new storage equipment that uses traditional block fabrics, such as iSCSI or FC or, perish the thought, FCoE. The cool thing about Cisco MDS is that it supports all of this craziness and more. If you’re running the latest and greatest end-to-end NVMe implementation and have some old block-only 8Gbps FC box sitting in the corner, Cisco can likely help you with connectivity. The diagram below hopefully demonstrates that point.

[image courtesy of Cisco]


Thoughts and Further Reading

Very early in my storage career, I attended a session on MDS at Cisco Networkers Live (when they still ran those types of events in Brisbane). Being fairly new to storage, and running a smallish network of one FC4700 and 8 Unix hosts, I’d tended to focus more on the storage part of the equation than the network part of the SAN. Cisco was still relatively new to the storage world at that stage, and it felt a lot like it had adopted a very network-centric view of storage. I was a little confused about why all the talk was about backplanes and port density, as I was more interested in the optimal RAID configuration for mail server volumes and how I should protect the data being stored on this somewhat sensitive piece of storage. As time went on, I was inevitably exposed to larger and larger environments where decisions around core and edge storage networking devices became more and more critical to getting optimal performance out of the environment. A lot of the information I was exposed to in that early MDS session started to make more sense (particularly as I was tasked with deploying larger and larger MDS-based fabrics).

Things have obviously changed quite a bit since those heady days of a network upstart making waves in the storage world. We’ve seen increases in network speeds become more and more common in the data centre, and we’re no longer struggling to get as many IOPS as we can out of 5400 RPM PATA drives with an interposer and some slightly weird firmware. What has become apparent, I think, is the importance of the fabric when it comes to getting access to storage resources in a timely fashion, and with the required performance. As enterprises scale up and out, and more and more hosts and applications connect to centralised storage resources, it doesn’t matter how fast those storage resources are if there’s latency in the fabric.

The SAN still has a place in the enterprise, despite what the DAS huggers will tell you, and you can get some great performance out of your SAN if you architect it appropriately. Cisco certainly seems to have an option for pretty much everything when it comes to storage (and network) fabrics. It also has a great story when it comes to fabric visibility, and the scale and performance at the top end of its MDS range is pretty impressive. In my mind, though, the key really is the variety of options available when building a storage network. It’s something that shouldn’t be underestimated given the plethora of options available in the market.

Stellus Is Doing Something With All That Machine Data

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Stellus Technologies recently came out of stealth mode. I had the opportunity to see the company present at Storage Field Day 19 and thought I’d share my thoughts here. You can grab a copy of my notes here.


Company Background

Jeff Treuhaft (CEO) spent a little time discussing the company’s background and its development to date.

  • Founded in 2016
  • Data Path architecture developed in 2017
  • Data path validations in 2018
  • First customer deployments in 2019
  • Commercial availability in 2020


The Problem

What’s the problem Stellus is trying to solve, then? There’s been a huge rise in unstructured data (driven in large part by AI/ML workloads) and an exponential increase in the size of data sources that enterprises are working with. There have also been significant increases in performance requirements for unstructured data. This has been driven primarily by:

  • Life sciences;
  • Media and entertainment; and
  • IoT.

The result is that the storage solutions supporting these workloads need to:

  • Offer scalable, consistent performance;
  • Support common global namespaces;
  • Work with variable file sizes;
  • Deliver high throughput;
  • Ensure that there are no parallel access penalties;
  • Easily manage data over time; and
  • Function as a data system of record.

It’s Stellus’s belief that “[c]urrent competitors have built legacy file systems at the time when spinning disk and building private data centres were the focus”.


Stellus Data Platform

Bala Ganeshan (CTO and VP of Engineering) walked the delegates through the Stellus Data Platform.

Design Goals

  • Parallelism
  • Scale
  • Throughput
  • Constant performance
  • Decoupling capacity and performance
  • Independently scale performance and capacity on commodity hardware
  • Distributed, share-everything, KV-based data model
  • Data path ready for new memories
  • Consistently high performance even as the system scales

File System as Software

  • Stores unstructured data closest to its native format: objects
  • Data Services provided on Stellus objects
  • Stateless – state lives in key-value stores (see the sketch below)
  • User mode enables:
    • On-premises
    • Cloud
    • Hybrid
  • Independent of custom hardware and the kernel

Stellus doesn’t currently have deduplication capability built in.
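
Stellus didn’t share low-level implementation details during the session, but the general shape of “stateless file services with all state in key-value stores” is easy enough to sketch. The Python below is purely illustrative – the class names (KVStore, FileService) and key layout are mine, not Stellus’s – and it just shows why a front end with no local state can run on-premises, in the cloud, or both, against the same stores.

```python
# Hypothetical sketch only: KVStore and FileService are illustrative names,
# not Stellus APIs. The point is that the file service keeps no local state,
# so any instance can serve any request against the shared key-value stores.

import hashlib
import json


class KVStore:
    """Stand-in for a native key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class FileService:
    """Stateless front end: file data and metadata live entirely in KV stores."""

    def __init__(self, kv):
        self.kv = kv  # the only dependency; no local caches or tables

    def write_file(self, path, payload):
        # Store the data close to its native form: an object keyed by content hash.
        obj_key = "obj:" + hashlib.sha256(payload).hexdigest()
        self.kv.put(obj_key, payload)
        # File metadata is just another (small) key-value entry.
        meta = {"path": path, "object": obj_key, "size": len(payload)}
        self.kv.put("meta:" + path, json.dumps(meta).encode())

    def read_file(self, path):
        meta = json.loads(self.kv.get("meta:" + path))
        return self.kv.get(meta["object"])


kv = KVStore()
instance_a = FileService(kv)   # two instances sharing the same stores --
instance_b = FileService(kv)   # on-premises, in the cloud, or both
instance_a.write_file("/genomics/sample.bam", b"...reads...")
print(instance_b.read_file("/genomics/sample.bam"))
```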

Algorithmic Data Locality and Data Services

  • Enables scale by algorithmically determining location – no cluster-wide maps (see the sketch below)
  • Built for resilience to multiple failures – pets vs. cattle
  • Understands topology of persistent stores
  • Architecture maintains versions – enables data services such as snapshots
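
The notes above gloss over what “algorithmically determining location” might look like in practice. Stellus didn’t go into specifics, so the sketch below uses rendezvous (highest-random-weight) hashing purely as a stand-in: every node can compute an object’s home from the key and the list of stores alone, with no cluster-wide placement map to maintain, and losing a store only affects the objects that lived on it.

```python
# Illustrative only: Stellus hasn't published its placement algorithm, so this
# uses rendezvous (highest-random-weight) hashing as a stand-in to show how a
# location can be computed from the key alone, with no cluster-wide map.

import hashlib


def placement(object_key, stores, replicas=3):
    """Return the stores that should hold object_key, computed on the fly."""
    def weight(store):
        digest = hashlib.sha256(f"{store}:{object_key}".encode()).hexdigest()
        return int(digest, 16)

    # Every node runs the same function and gets the same answer, so there is
    # no central table to keep consistent. If a store dies, it simply drops out
    # of the candidate list and only its objects move (cattle, not pets).
    return sorted(stores, key=weight, reverse=True)[:replicas]


stores = [f"kv-store-{i}" for i in range(8)]
print(placement("obj:4f2a9c", stores))
print(placement("obj:4f2a9c", [s for s in stores if s != "kv-store-5"]))
```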

Key-Value-over-NVMe Fabrics

  • Decoupling data services from persistence requires a transport
  • Architecture maintains native data structure – objects
  • NVMe-over-Fabrics protocol enhanced to transport KV commands (sketched below)
  • Transport independent
    • RDMA
    • TCP/IP
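
To make the “transport independent” point a bit more concrete, here’s a rough sketch of KV commands travelling over interchangeable transports. The command layout and class names are invented for illustration – this isn’t the actual NVMe-oF KV command set – but it shows why the data services layer doesn’t need to care whether RDMA or TCP/IP sits underneath.

```python
# Invented for illustration: not the NVMe-oF KV command set, just the shape of
# the idea -- KV commands that don't care which fabric carries them.

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class KVCommand:
    opcode: str        # e.g. "STORE" or "RETRIEVE"
    key: bytes
    value: bytes = b""


class Transport(ABC):
    """The data services layer only ever sees this interface."""

    @abstractmethod
    def send(self, cmd):
        ...


class TcpTransport(Transport):
    def send(self, cmd):
        # A real implementation would frame the command onto a TCP connection
        # to the target; here we just show that the caller doesn't change.
        print(f"TCP   -> {cmd.opcode} {cmd.key!r}")


class RdmaTransport(Transport):
    def send(self, cmd):
        print(f"RDMA  -> {cmd.opcode} {cmd.key!r}")


def store_object(transport, key, value):
    # Same call, either fabric.
    transport.send(KVCommand("STORE", key, value))


store_object(TcpTransport(), b"obj:4f2a9c", b"payload")
store_object(RdmaTransport(), b"obj:4f2a9c", b"payload")
```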

Native Key-Value Stores

  • Unstructured data is generally immutable
  • Updates result in new objects (see the sketch below)
  • Available in different sizes and with different performance characteristics
  • Uses application-specific KV stores for:
    • Immutable data
    • Short-lived updates
    • Metadata
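
The “updates result in new objects” idea maps nicely onto a versioned put, where nothing is ever overwritten in place and older versions stick around to back data services like snapshots. Again, this is just my sketch of the pattern, not Stellus code.

```python
# My sketch of the pattern, not Stellus code: updates never overwrite in place,
# they land under a new versioned key, and old versions remain addressable.


class ImmutableObjectStore:
    def __init__(self):
        self._objects = {}   # (name, version) -> bytes; entries are never mutated
        self._latest = {}    # name -> most recent version number

    def put(self, name, data):
        version = self._latest.get(name, 0) + 1
        self._objects[(name, version)] = data   # the "update" is a new object
        self._latest[name] = version
        return version

    def get(self, name, version=None):
        # Reading an older version is how snapshot-style data services fall
        # out of the model almost for free.
        return self._objects[(name, version or self._latest[name])]


store = ImmutableObjectStore()
v1 = store.put("sensor-log", b"reading: 20C")
store.put("sensor-log", b"reading: 21C")
print(store.get("sensor-log"))        # latest -> b'reading: 21C'
print(store.get("sensor-log", v1))    # the older version is still there
```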


Thoughts and Further Reading

Every new company emerging from stealth has a good story to tell. And they all want it to be a memorable one. I think Stellus certainly has a good story to tell in terms of how it’s taking newer technologies to solve more modern storage problems. Not every workload requires massive amounts of scalability at the storage layer. But for those that do, it can be hard to solve that problem with traditional storage architectures. The key-value implementation from Stellus allows it to do some interesting stuff with larger drives, and I can see how this will have appeal as we move towards the use of larger and larger SSDs to store data, particularly as a large proportion of modern storage workloads leverage unstructured data.

More and more NVMe-oF solutions are hitting the market now. I think this is a sign that evolving workload requirements are pushing the capabilities of traditional storage solutions. A lot of the data we’re dealing with is coming from machines, not people. It’s not about how I derive value from a spreadsheet. It’s about how I derive value from terabytes of log data from Internet of Things devices. This requires scale – in terms of both capacity and performance. Using key-value over NVMe-oF is an interesting approach to the challenge – one that I’m keen to explore further as Stellus makes its way in the market. In the meantime, check out Chris Evans’s article on Stellus over at Architecting IT.