Intel Optane And The DAOS Storage Engine

Disclaimer: I recently attended Storage Field Day 20.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Intel recently presented at Storage Field Day 20. You can see videos of the presentation here, and download my rough notes from here.

 

Intel Optane Persistent Memory

If you’re a diskslinger, you’ve very likely heard of Intel Optane. You may have even heard of Intel Optane Persistent Memory. It’s a little different to Optane SSD, and Intel describes it as “memory technology that delivers a unique combination of affordable large capacity and support for data persistence”. It looks a lot like DRAM, but the capacity is greater, and there’s data persistence across power losses. This all sounds pretty cool, but isn’t it just another form factor for fast storage? Sort of, but the application of the engineering behind the product is where I think it starts to get really interesting.
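
If you're wondering what that persistence actually looks like to software, here's a minimal sketch using Intel's PMDK (libpmem). This wasn't part of the presentation, so treat it as illustrative only: the /mnt/pmem0 path assumes a filesystem mounted with DAX on a PMem namespace.

    /* Map a file on a DAX filesystem directly into memory, store to it,
     * and make the store durable. Build with: cc pmem_hello.c -lpmem */
    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Create and map a 4KiB file backed by persistent memory. */
        char *addr = pmem_map_file("/mnt/pmem0/example", 4096,
                                   PMEM_FILE_CREATE, 0666,
                                   &mapped_len, &is_pmem);
        if (addr == NULL) {
            perror("pmem_map_file");
            return 1;
        }

        /* This is just a memory write - no read()/write() system calls. */
        strcpy(addr, "this string survives a power loss");

        /* An ordinary store isn't durable until caches are flushed. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);   /* CPU cache-flush instructions */
        else
            pmem_msync(addr, mapped_len);     /* msync() fallback on non-PMem */

        pmem_unmap(addr, mapped_len);
        return 0;
    }

The key difference from a normal mmap is the flush: pmem_persist() uses CPU cache-flush instructions rather than an msync() system call, which is what keeps the latency in memory territory rather than storage territory.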

 

Enter DAOS

Distributed Asynchronous Object Storage (DAOS) is described by Intel as “an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications”. It’s essentially a software stack built from the ground up to take advantage of the crazy speeds you can achieve with Optane, and to do so at scale. There’s a handy overview of the architecture available on Intel’s website. Traditional object (and other) storage systems haven’t really been built to take advantage of Optane in quite the same way DAOS has.

[image courtesy of Intel]

There are some cool features built into DAOS, including:

  • Ultra-fine grained, low-latency, and true zero-copy I/O
  • Advanced data placement to account for fault domains
  • Software-managed redundancy supporting both replication and erasure code with online rebuild
  • End-to-end (E2E) data integrity
  • Scalable distributed transactions with guaranteed data consistency and automated recovery
  • Dataset snapshot capability
  • Security framework to manage access control to storage pools
  • Software-defined storage management to provision, configure, modify, and monitor storage pools

Exciting? Sure is. There’s also integration with Lustre. The best thing about this is that you can grab it from GitHub under the Apache 2.0 license.
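
If you're curious what the programming model looks like, here's a minimal sketch against the libdaos C API. Treat it as a shape rather than a drop-in example: the pool label "tank" is my assumption (you'd create the pool with the dmg tool first), and the exact signatures have moved around between DAOS releases.

    /* Connect to an existing DAOS pool and disconnect again.
     * Build (roughly): cc daos_hello.c -ldaos */
    #include <daos.h>
    #include <stdio.h>

    int main(void)
    {
        daos_handle_t poh;                   /* pool handle */

        int rc = daos_init();                /* initialise the DAOS library */
        if (rc != 0)
            return rc;

        /* Connect read-write to the pool labelled "tank" in the default
         * DAOS system; the call blocks because the event argument is NULL. */
        rc = daos_pool_connect("tank", NULL, DAOS_PC_RW, &poh, NULL, NULL);
        if (rc == 0) {
            printf("connected to pool\n");
            daos_pool_disconnect(poh, NULL);
        } else {
            fprintf(stderr, "pool connect failed: %d\n", rc);
        }

        daos_fini();
        return rc;
    }

In practice you'd layer the object, array, or key-value APIs (or the POSIX support) on top of this, but pool connect and disconnect is where everything starts.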

 

Thoughts And Further Reading

Object storage is in its relative infancy when compared to some of the storage architectures out there. It was designed to be highly scalable and generally does a good job of cheap and deep storage at “web scale”. It’s my opinion that object storage becomes even more interesting as a storage solution when you put a whole bunch of really fast storage media behind it. I’ve seen some media companies do this with great success, and there are a few of the bigger vendors out there starting to push the all-flash object story. Even then, though, many of the more popular object storage systems aren’t necessarily optimised for products like Intel Optane PMem. This is what makes DAOS so interesting – the ability for the storage to fundamentally do what it needs to do at massive scale, and to go as fast as the media will allow. You don’t need to worry as much about whether the software is optimised for the media it sits on, because the folks developing it have access to the team that developed the hardware.

The other thing I really like about this project is that it’s open source. This tells me that Intel are focused both on Optane being successful and on the industry making the most of the hardware it’s putting out there. It’s a smart move – come up with some super fast media, and then give the market as much help as possible to squeeze the most out of it.

You can grab the admin guide from here, and check out the roadmap here. Intel has plans to release a new version every 6 months, and I’m really looking forward to seeing this thing gain traction. For another perspective on DAOS and Intel Optane, check out David Chapa’s article here.

 

 

Intel’s Form Factor Is A Factor

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

The Intel Optane team recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. I urge you to check out the videos, as there was a heck of a lot of stuff in there. But rather than talk about benchmarks and the SPDK, I’m going to focus on what’s happening with Intel’s approach to storage in terms of the form factor.

 

Of Form Factors And Other Matters Of Import

An Abbreviated History Of Drive Form Factors

Let’s start with a little bit of history to get you going. IBM introduced the first hard drive – the IBM 350 disk storage unit – in 1956. Over time we’ve gone from a variety of big old drives to smaller form factors. I’m not old enough to reminisce about the Winchester drives, but I do remember the 5.25″ drives in the XT. Wikipedia is as good a place to start as any if you’re interested in knowing more about hard drives. In any case, we now have the following prevailing form factors in use for drive storage:

  • 3.5″ drives – still reasonably common in desktop computers and “cheap and deep” storage systems;
  • 2.5″ drives (SFF) – popular in laptops and used as a “dense” form factor for a variety of server and storage solutions;
  • U.2 – mainstream PCIe SSD form factor that has the same dimensions as 2.5″ drives; and
  • M.2 – designed for laptops and tablets.

Challenges

There are a number of challenges associated with the current drive form factors. The most notable of these is density. Drive (and storage) vendors have been struggling for years to cram more and more devices into smaller spaces whilst also increasing device capacities. This has led to problems with cooling, power, and overall reliability. Basically, there’s only so much you can put in 1RU without the whole lot melting.

 

A Ruler? Wait, what?

Intel’s “Ruler” is a ruler-shaped drive based on the EDSFF (Enterprise and Datacenter SSD Form Factor) specification. There’s a tech brief you can view here. There are a few different versions (basically long and short), and it still leverages NVMe via PCIe.

[image courtesy of Intel]

It’s Denser

You can cram a lot of these things in a 1RU server, as Super Micro demonstrated a few months ago.

  • Up to 32 E1.L 9.5mm drives per 1RU
  • Up to 48 E1.S drives per 1RU

Which means you could be looking at around a petabyte of raw storage in 1RU: 32 x 32TB E1.L drives works out to 1,024TB before any data protection overheads. This number is only going to go up as capacities increase. Instead of half a rack of 4TB SSDs, you can do it all in 1RU.

It’s Cooler

Cooling has been a problem for storage systems for some time. A number of storage vendors have found out the hard way that jamming a bunch of drives in a small enclosure has a cost in terms of power and cooling. Intel tell us that they’ve had some (potentially) really good results with the E1.L and E1.S based on testing to date (in comparison to traditional SSDs). They talked about:

  • Up to 2x less airflow needed per E1.L 9.5mm SSD vs. U.2 15mm (based on Intel’s internal simulation results); and
  • Up to 3x less airflow needed per E1.S SSD vs. U.2 7mm.

Still Serviceable

You can also replace these things when they break. Intel say the drives are:

  • Fully front serviceable with an integrated pull latch;
  • Fitted with integrated, programmable LEDs; and
  • Capable of remote, drive-specific power cycling.

 

Thoughts And Further Reading

SAN and NAS became popular in the data centre because you could jam a whole bunch of drives in a central location and you weren’t limited by what a single server could support. For some workloads though, having storage decoupled from the server can be problematic either in terms of latency, bandwidth, or both. Some workloads need their storage as close to the processor as possible. Technologies such as NVMe over Fabrics are addressing that issue to an extent, and other vendors are working to bring the compute closer to the storage. But some people just want to do what they do, and they need more and more storage to do it. I think the “ruler” form factor is an interesting approach to the issue traditionally associated with cramming a bunch of capacity in a small space. It’s probably going to be some time before you see this kind of thing in data centres as a matter of course, because it takes a long time to change the way that people design their servers to accommodate new standards. Remember how long it took for SFF drives to become as common in the DC as they are? No? Well it took a while. Server designs are sometimes developed years (or at least months) ahead of their release to the market. That said, I think Intel have come up with a really cool idea here, and if they can address the cooling and capacity issues as well as they say they can, this will likely take off. Of course, the idea of having 1PB of data sitting in 1RU should be at least a little scary in terms of failure domains, but I’m sure someone will work that out. It’s just physics after all, isn’t it?

There’s also an interesting article at The Register on the newer generation of drive form factors that’s worth checking out.

Intel Are Putting Technology To Good Use

Disclaimer: I recently attended Storage Field Day 12.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

Here are some notes from Intel‘s presentation at Storage Field Day 12. You can view the video here and download my rough notes here.

 

I/O Can Be Hard Work

With the advent of NVM Express, things go pretty fast nowadays. Or, at least, faster than they used to with those old-timey spinning disks we’ve loved for so long. According to Intel, systems with multiple NVMe SSDs are now capable of performing millions of I/Os per second. Which is great, but with a kernel-based, interrupt-driven driver model it can take many CPU cores’ worth of software overhead to keep up. The answer, according to Intel, is the Storage Performance Development Kit (SPDK). The SPDK frees up CPU cycles for storage services while delivering lower I/O latency. The great news is that there’s now almost no capacity premium to pay to get IOPS out of a system. So how does this help in the real world?
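
The core of that claim is the polled mode driver: rather than sleeping on an interrupt, the application submits an I/O from user space and then spins on the completion queue. Here's a rough sketch of that loop using SPDK's NVMe driver API. It assumes the namespace and queue pair have already been set up (via spdk_nvme_probe() and spdk_nvme_ctrlr_alloc_io_qpair(), as in SPDK's hello_world example), and that buf was allocated with spdk_zmalloc().

    #include <stdbool.h>
    #include "spdk/nvme.h"

    static bool io_done;

    /* Completion callback - invoked from within process_completions(),
     * on the submitting core, not from an interrupt handler. */
    static void write_complete(void *arg, const struct spdk_nvme_cpl *cpl)
    {
        io_done = true;
    }

    static void write_one_block(struct spdk_nvme_ns *ns,
                                struct spdk_nvme_qpair *qpair, void *buf)
    {
        io_done = false;

        /* Queue a one-block write at LBA 0; this returns immediately. */
        if (spdk_nvme_ns_cmd_write(ns, qpair, buf, 0 /* LBA */,
                                   1 /* block count */, write_complete,
                                   NULL, 0) != 0)
            return;

        /* Poll for completion: no interrupts, no context switches, just
         * repeatedly checking the completion queue from user space. */
        while (!io_done)
            spdk_nvme_qpair_process_completions(qpair, 0);
    }

Burning a core on that while loop sounds wasteful, but at millions of IOPS it works out far cheaper than taking an interrupt and a context switch per I/O – that’s where the reclaimed CPU cycles come from.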

 

Real World Applications?

SPDK VM I/O Efficiency

The SPDK offers some excellent performance improvements when dishing up storage to VMs.

  • NVMe ephemeral storage
  • SPDK-based 3rd party storage services

Leverage existing infrastructure for:

  • QEMU vhost-scsi;
  • QEMU/DPDK vhost-net-user.

Features and benefits

  • High performance storage virtualisation
  • Reduced VM exits
  • Lower latency
  • Increased VM density
  • Reduced tail latencies
  • Higher throughput

Intel say that Ali Cloud sees a ~300% improvement in IOPS and latency using SPDK.

 

VM Ephemeral Storage

  • Improves storage virtualisation
  • Works with KVM/QEMU
  • 6x efficiency vs kernel vhost
  • 10x efficiency vs QEMU virtio
  • Increased VM density

 

SPDK and NVMe over Fabrics

SPDK also works a treat with NVMe over Fabrics.

VM Remote Storage

  • Enable disaggregation and migration of VMs using remote storage
  • Improves storage virtualisation and flexibility
  • Works with KVM/QEMU

 

NVMe over Fabrics

Key features and their benefits:

  • Utilises the NVM Express (NVMe) polled mode driver – reduced overhead per NVMe I/O;
  • RDMA queue pair polling – no interrupt overhead; and
  • Connections pinned to CPU cores – no synchronisation overhead.

 

NVMe-oF Key Takeaways

  • Preserves the latency-optimised NVMe protocol through network hops
  • Potentially radically efficient, depending on implementation
  • Actually fabric agnostic: InfiniBand, RDMA, TCP/IP, FC … all ok!
  • Underlying protocol for existing and emerging technologies
  • Using SPDK, you can integrate NVMe and NVMe-oF directly into applications – see the sketch below
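
To illustrate that last point, here's a hedged sketch of an NVMe-oF connection using SPDK's NVMe driver: it looks almost identical to attaching a local PCIe SSD – only the transport ID changes. The address, port, and subsystem NQN below are placeholders for your own target.

    #include <stdio.h>
    #include "spdk/nvme.h"

    int main(void)
    {
        struct spdk_env_opts opts;
        struct spdk_nvme_transport_id trid = {0};
        struct spdk_nvme_ctrlr *ctrlr;

        /* Initialise the SPDK environment (hugepages, PCI access, etc.). */
        spdk_env_opts_init(&opts);
        if (spdk_env_init(&opts) < 0)
            return 1;

        /* Describe the remote target; swap RDMA for TCP with trtype:TCP. */
        if (spdk_nvme_transport_id_parse(&trid,
                "trtype:RDMA adrfam:IPv4 traddr:192.168.0.10 "
                "trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1") != 0)
            return 1;

        /* Same driver and same I/O path as a local NVMe device. */
        ctrlr = spdk_nvme_connect(&trid, NULL, 0);
        if (ctrlr == NULL) {
            fprintf(stderr, "failed to connect to NVMe-oF target\n");
            return 1;
        }

        printf("attached; %u namespace(s) found\n",
               spdk_nvme_ctrlr_get_num_ns(ctrlr));

        spdk_nvme_detach(ctrlr);
        return 0;
    }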

 

VM I/O Efficiency Key Takeaways

  • Huge improvement in latency for VM workloads
  • Application-level sees 3-4X performance gains
  • Application unmodified: it’s all under the covers
  • Virtuous cycle with VM density
  • Fully compatible with NVMe-oF!

 

Further Reading and Conclusion

Intel said during the presentation that “[p]eople find ways of consuming resources you provide to them”. This is true, and one of the reasons I became interested in storage early in my career. What’s been most interesting about the last few years’ worth of storage developments (as we’ve moved beyond spinning disks and simple file systems to super fast flash subsystems and massively scaled-out object storage systems) is that people are still really only interested in having lots of storage that is fast and reliable. The technologies talked about during this presentation obviously aren’t showing up in consumer products just yet, but they’re an interesting insight into the direction the market is heading. I’m mighty excited about NVMe over Fabrics and looking forward to this technology being widely adopted in the data centre.

If you’ve had the opportunity to watch the video from Storage Field Day 12 (and some other appearances by Intel Storage at Tech Field Day events), you’ll quickly understand that I’ve barely skimmed the surface of what Intel are doing in the storage space, and just how much is going on before your precious bits are hitting the file system / object store / block device. NVMe is the new way of doing things fast, and I think Intel are certainly pioneering the advancement of this technology through real-world applications. This is, after all, the key piece of the puzzle – understanding how to take blazingly fast technology and apply a useful programmatic framework that companies can build upon to deliver useful outcomes.

For another perspective, have a look at Chan’s article here. You also won’t go wrong checking out Glenn’s post here.