Intel – It’s About Getting The Right Kind Of Fast At The Edge

Disclaimer: I recently attended Storage Field Day 22.  Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Intel recently presented at Storage Field Day 22. You can see videos of the presentation here, and download my rough notes from here.

 

The Problem

A lot of countries have used lockdowns as a way to combat the community transmission of COVID-19. Apparently, this has led to an uptick in the consumption of streaming media services. If you’re somewhat familiar with streaming media services, you’ll understand that your favourite episode of Hogan’s Heroes isn’t being delivered from a giant storage device sitting in the bowels of your streaming media provider’s data centre. Instead, it’s invariably being delivered to your device from a content delivery network (CDN) node.

 

Content Delivery What?

CDNs are not a new concept. The idea is that you have a bunch of web servers geographically distributed delivering content to users who are also geographically distributed. Think of it as a way to cache things closer to your end users. There are many reasons why this can be a good idea. Your content will load faster for users if it resides on servers in roughly the same area as them. Your bandwidth costs are generally a bit cheaper, as you’re not transmitting as much data from your core all the way out to the end user. Instead, those end users are getting the content from something close to them. You can potentially also deliver more versions of content (in terms of resolution) easily. It can also be beneficial in terms of resiliency and availability – an outage on one part of your network, say in Palo Alto, doesn’t need to necessarily impact end users living in Sydney. Cloudflare does a fair bit with CDNs, and there’s a great overview of the technology here.
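
To make the caching idea a bit more concrete, here’s a minimal sketch of what an edge node does on every request: serve the object from its local cache on a hit, otherwise fetch it from the origin, keep a copy, and serve that. It’s deliberately tiny – a handful of in-memory slots and a stubbed-out origin fetch – and every name in it is made up for illustration rather than taken from any particular CDN.

```c
#include <stdio.h>
#include <string.h>

/* A toy edge cache: a few slots keyed by URL, evicting the least
 * recently used entry when full. Real CDNs layer this over DRAM,
 * flash and disk, and honour TTLs and cache-control headers. */

#define SLOTS 4

struct entry {
    char url[128];
    char body[256];
    unsigned long last_used;
};

static struct entry cache[SLOTS];
static unsigned long tick;

/* Stand-in for a round trip back to the origin data centre. */
static void fetch_from_origin(const char *url, char *body, size_t len)
{
    snprintf(body, len, "<content for %s>", url);
    printf("MISS %s (fetched from origin)\n", url);
}

static const char *serve(const char *url)
{
    int i, victim = 0;

    for (i = 0; i < SLOTS; i++) {
        if (strcmp(cache[i].url, url) == 0) {
            cache[i].last_used = ++tick;
            printf("HIT  %s (served from the edge)\n", url);
            return cache[i].body;
        }
    }

    /* Not cached: evict the least recently used slot and refill it. */
    for (i = 1; i < SLOTS; i++)
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;

    snprintf(cache[victim].url, sizeof(cache[victim].url), "%s", url);
    fetch_from_origin(url, cache[victim].body, sizeof(cache[victim].body));
    cache[victim].last_used = ++tick;
    return cache[victim].body;
}

int main(void)
{
    serve("/shows/hogans-heroes/s01e01.ts");   /* miss: pulled from the origin */
    serve("/shows/hogans-heroes/s01e01.ts");   /* hit: never leaves the edge */
    return 0;
}
```

The second request never leaves the edge, which is where the latency and bandwidth savings described above come from.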

 

Isn’t All Content Delivery The Same?

Not really. As Intel covered in its Storage Field Day presentation, there are some differences between the performance requirements of video on demand (VoD) and live-linear streaming CDN solutions.

Live-Linear Edge Cache

Live-linear video streaming is similar to the broadcast model used in television. It’s basically programming content streamed 24/7, rather than stuff that the user has to search for. Several minutes of content are typically cached to accommodate out-of-sync users and pause / rewind activities. You can read a good explanation of live-linear streaming here.

[image courtesy of Intel]

In the example above, Intel Optane PMem was used to address the needs of live-linear streaming.

  • Live-linear workloads consume a lot of memory capacity to maintain a short-lived video buffer.
  • Intel Optane PMem is less expensive than DRAM.
  • Intel Optane PMem has extremely high endurance, to handle frequent overwrite.
  • Flexible deployment options – Memory Mode or App-Direct, consuming zero drive slots.

With this solution they were able to achieve better channel and stream density per server than with DRAM-based solutions.
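
One way to picture that short-lived buffer is as a fixed ring of segment slots that the live stream overwrites continuously, which is exactly why endurance (and cheap capacity) matters for the media underneath it. The sketch below is purely illustrative – the slot count, segment size, and function names are invented for the example, not drawn from Intel’s solution.

```c
#include <stdio.h>
#include <string.h>

/* A rolling buffer holding the most recent few minutes of one channel.
 * With (say) 2-second segments, 128 slots is roughly four minutes of
 * video, which is what lets a viewer pause, rewind, or sit slightly
 * behind the live edge. */

#define SLOTS 128

struct segment {
    unsigned long seq;     /* position in the live stream        */
    char data[64];         /* stand-in for encoded video payload */
};

static struct segment ring[SLOTS];

/* Every new segment overwrites the oldest one -- the buffer is
 * rewritten end to end every few minutes, around the clock. */
static void ingest(unsigned long seq, const char *payload)
{
    struct segment *slot = &ring[seq % SLOTS];
    slot->seq = seq;
    snprintf(slot->data, sizeof(slot->data), "%s", payload);
}

/* A paused or rewinding viewer asks for an older segment, which can
 * be served as long as it hasn't rolled out of the window yet. */
static const char *replay(unsigned long seq)
{
    struct segment *slot = &ring[seq % SLOTS];
    return (slot->seq == seq) ? slot->data : NULL;
}

int main(void)
{
    char payload[64];
    unsigned long seq;

    for (seq = 1; seq <= 300; seq++) {
        snprintf(payload, sizeof(payload), "segment-%lu", seq);
        ingest(seq, payload);
    }

    printf("seq 250: %s\n", replay(250) ? replay(250) : "gone");
    printf("seq 10:  %s\n", replay(10) ? replay(10) : "gone (overwritten)");
    return 0;
}
```

Holding that buffer in DRAM gets expensive quickly; the point of the bullets above is that PMem can hold it more cheaply, cope with the constant overwrite, and do so without consuming any drive slots.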

Video on Demand (VoD)

VoD providers typically offer a large library of content allowing users to view it at any time (e.g. Netflix and Disney+). VoD servers are a little different to live-linear streaming CDNs. They:

  • Typically require large capacity and drive fanout for performance / failure domains; and
  • Have a read-intensive workload, with typically large IOs.

[image courtesy of Intel]

 

Thoughts and Further Reading

I first encountered the magic of CDNs years ago when working in a data centre that hosted some Akamai infrastructure. Windows Server updates were super zippy, and it actually saved me from having to spend a lot of time standing in the cold aisle. Fast forward about 15 years, and CDNs are being used for all kinds of content delivery on the web. With whatever the heck this is in terms of the new normal, folks are putting more and more strain on those CDNs by streaming high-quality, high-bandwidth TV and movie titles into their homes (except in backwards places like Australia). As a result, content providers are constantly searching for ways to tweak the throughput of these CDNs to serve more and more customers, and deliver more bandwidth to those users.

I’ve barely skimmed the surface of how CDNs help providers deliver content more effectively to end users. What I did find interesting about this presentation was that it reinforced the idea that different workloads require different infrastructure solutions to deliver the right outcomes. It sounds simple when I say it like this, but I guess I’ve thought about streaming video CDNs as being roughly the same all over the place. Clearly they aren’t, and it’s not just a matter of jamming some SSDs into 1RU servers and hoping that your content will be delivered faster to punters. It’s important to understand that Intel Optane PMem and Intel 3D NAND SSDs can give you different results depending on what you’re trying to do, with PMem arguably giving you better value for money (per GB) than DRAM. There are some great papers on this topic available on the Intel website. You can read more here and here.

Intel Optane – Challenges and Triumphs

Disclaimer: I recently attended Storage Field Day 21.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Intel recently presented at Storage Field Day 21. You can see videos of the presentation here, and download my rough notes from here.

 

Alive and Kicking

Kristie Mann, Sr. Director Products, Intel Optane Group, kicked off the session by telling us that “Intel Optane is alive and well”. I don’t think anyone thought it was really going away, particularly given the effort that folks inside Intel have gone to in getting this product to market. But from a consumer perspective, it’s potentially been a little confusing.

[image courtesy of Intel]

In terms of data centre penetration, it’s been a different story, and taking Optane from idea to reality has been quite a journey. It was also noted that the “strong uptake of PMem in HPC was unexpected”, but no doubt welcome.

 

Learnings

Some of the other learnings that were covered as part of the session were as follows.

Software Really Matters

It’s one thing to come out with some super cool hardware that is absolutely amazing, but it’s quite another to get software support for that hardware. Unfortunately, the hardware doesn’t give you much without the software, no matter how well it performs. While this has been something of a challenge for Optane until recent times, there’s definitely been more noise from the big ISVs about enhanced Optane support.

IaaS Adoption

Adoption in IaaS has not been great, mainly due to some uneven performance. This will only improve as the software support matures. But the IaaS market can be tough for a bunch of reasons. IaaS vendors are trying to do things at a certain price point. That doesn’t mean that they’re still making people run VMs on spinning disk (hopefully), but rolling out All-Flash support for platforms is something that’s only going to be done when the $/GB makes sense for the providers. You also might have seen in the field that IaaS providers are pretty particular about performance and quality of service. That makes sense when you’re trying to host a whole bunch of different workloads at large scale. So it’s understandable that they’d be somewhat cautious about launching new media types on their platforms without running through a whole bunch of performance and integration testing. I’m not saying they’re not going to get there, they just may not be the first cabs off the rank.

Can you spell OEM?

OEM qualifications have been slow to date with Optane. This is key to getting the product out there. Enterprise folks don’t like to buy this kind of thing until their favourite Tier 1 vendors are offering it as a default option in their server / storage array / fabric switch. If Dell has the Optane Inside sticker (not a real thing, but you know what I mean), the infrastructure architects inside large government entities are more likely to get on board.

Battling The Status Quo

Status quo thinking makes it hard for people to understand that Optane isn’t just memory or storage. This has been something of a problem for Intel since Optane became a thing. I’m still having conversations with people and running up against significant confusion about the difference between PMem and Optane SSD. I think that’s going to improve as time goes on, but it can make things difficult when it comes to broad market penetration.

Thoughts and Further Reading

I don’t want people reading this to think that I’m down on Intel and what it’s done with Optane. If anything, I’m really into it. I enjoyed the presentation at Storage Field Day 21 tremendously, and not just because my friend Howard was on the panel repping VAST Data. It’s unusual that a vendor as big as Intel would be so frank about some of the challenges that it’s faced with getting new media to market. But I think it’s the willingness to share some of this information that demonstrates how committed Intel is to Optane moving forward. I was lucky enough to speak to Intel Senior Fellow Al Fazio about the Optane journey, and it was clear that there’s a whole lot of innovation and sweat that goes into making a product like this work.

Some folks think that these panel presentations are marketing disguised as a presentation. Invariably, the reference customers are friendly with the company, and you’ll only ever hear good stories. But I think those stories from those customers are still extremely powerful. After all, having a customer jump on a session to tell the world about how good your product has been means you’ve done something right. As a consumer of these products, I find these kinds of testimonials invaluable. Ultimately, products are successful in the market when they serve the market’s needs. From what I can see, Intel Optane is on its way to meeting those needs, and it has a bright future.

Intel Optane And The DAOS Storage Engine

Disclaimer: I recently attended Storage Field Day 20.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Intel recently presented at Storage Field Day 20. You can see videos of the presentation here, and download my rough notes from here.

 

Intel Optane Persistent Memory

If you’re a diskslinger, you’ve very likely heard of Intel Optane. You may have even heard of Intel Optane Persistent Memory. It’s a little different to Optane SSD, and Intel describes it as “memory technology that delivers a unique combination of affordable large capacity and support for data persistence”. It looks a lot like DRAM, but the capacity is greater, and there’s data persistence across power losses. This all sounds pretty cool, but isn’t it just another form factor for fast storage? Sort of, but the application of the engineering behind the product is where I think it starts to get really interesting.
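
To give a feel for what “looks like DRAM but persists” means in practice, here’s a hedged sketch of the App Direct style of access: the application memory-maps a file on a DAX-mounted PMem filesystem and then just uses loads and stores, flushing when it needs durability. The /mnt/pmem path is hypothetical, and production code would normally go through PMDK (libpmem / libpmemobj) rather than raw mmap() and msync(), so read this as the shape of the idea rather than a reference implementation.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

int main(void)
{
    /* Hypothetical file on a DAX-mounted Optane PMem namespace
     * configured in App Direct mode. */
    const char *path = "/mnt/pmem/counter";

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, REGION_SIZE) != 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* Map the region into the address space; with DAX the stores
     * below hit the media directly, with no page cache in the way. */
    void *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Ordinary loads and stores: the data structure lives "in memory"
     * but survives a power loss once it has been flushed. */
    unsigned long *counter = (unsigned long *)region;
    (*counter)++;

    /* PMDK would use optimised cache-line flushes (e.g. pmem_persist);
     * msync() is the portable stand-in used here. */
    if (msync(region, REGION_SIZE, MS_SYNC) != 0)
        perror("msync");

    printf("counter is now %lu\n", *counter);

    munmap(region, REGION_SIZE);
    close(fd);
    return 0;
}
```

Run it twice and the counter keeps climbing across power cycles, which is the behaviour DRAM can’t give you. In Memory Mode, by contrast, the operating system just sees a bigger pool of volatile memory and no application changes are needed.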

 

Enter DAOS

Distributed Asynchronous Object Storage (DAOS) is described by Intel as “an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications”. It’s essentially a software stack built from the ground up to take advantage of the crazy speeds you can achieve with Optane, and at scale. There’s a handy overview of the architecture available on Intel’s website. Traditional object (and other) storage systems haven’t really been built to take advantage of Optane in quite the same way DAOS has.

[image courtesy of Intel]

There are some cool features built into DAOS, including:

  • Ultra-fine grained, low-latency, and true zero-copy I/O
  • Advanced data placement to account for fault domains
  • Software-managed redundancy supporting both replication and erasure code with online rebuild
  • End-to-end (E2E) data integrity
  • Scalable distributed transactions with guaranteed data consistency and automated recovery
  • Dataset snapshot capability
  • Security framework to manage access control to storage pools
  • Software-defined storage management to provision, configure, modify, and monitor storage pools

Exciting? Sure is. There’s also integration with Lustre. The best thing about this is that you can grab it from GitHub under the Apache 2.0 license.
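
One practical note before moving on: as well as its native object API, DAOS presents a POSIX view of a container through its DFS layer and the dfuse client, so existing applications can talk to it without modification. The sketch below is just ordinary file I/O aimed at a hypothetical dfuse mount point – the path (and the pool and container behind it) are assumptions for illustration, not a statement about how any particular DAOS system is laid out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical mount point: a DAOS container exposed as a POSIX
     * namespace via dfuse (the DAOS FUSE client). */
    const char *path = "/daos/mycont/results/run-001.dat";
    const char msg[] = "hello from a plain POSIX application\n";

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Regular write()/fsync(); underneath, dfuse translates this into
     * object I/O against the scale-out DAOS storage engine. */
    if (write(fd, msg, sizeof(msg) - 1) < 0)
        perror("write");
    if (fsync(fd) != 0)
        perror("fsync");

    close(fd);
    return 0;
}
```

Applications chasing the ultra-fine-grained, zero-copy I/O described above would code against the native library directly, but the POSIX path is a low-friction way to get started.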

 

Thoughts And Further Reading

Object storage is in its relative infancy when compared to some of the storage architectures out there. It was designed to be highly scalable and generally does a good job of cheap and deep storage at “web scale”. It’s my opinion that object storage becomes even more interesting as a storage solution when you put a whole bunch of really fast storage media behind it. I’ve seen some media companies do this with great success, and there are a few of the bigger vendors out there starting to push the All-Flash object story. Even then, though, many of the more popular object storage systems aren’t necessarily optimised for products like Intel Optane PMEM. This is what makes DAOS so interesting – the ability for the storage to fundamentally do what it needs to do at massive scale, and have it go as fast as the media will let it go. You don’t need to worry as much about the storage architecture being optimised for the storage it will sit on, because the folks developing it have access to the team that developed the hardware.

The other thing I really like about this project is that it’s open source. This tells me that Intel are focused both on Optane being successful and on the industry making the most of the hardware it’s putting out there. It’s a smart move – come up with some super fast media, and then give the market as much help as possible to squeeze the most out of it.

You can grab the admin guide from here, and check out the roadmap here. Intel has plans to release a new version every 6 months, and I’m really looking forward to seeing this thing gain traction. For another perspective on DAOS and Intel Optane, check out David Chapa’s article here.

 

 

Intel’s Form Factor Is A Factor

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

The Intel Optane team recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. I urge you to check out the videos, as there was a heck of a lot of stuff in there. But rather than talk about benchmarks and the SPDK, I’m going to focus on what’s happening with Intel’s approach to storage in terms of the form factor.

 

Of Form Factors And Other Matters Of Import

An Abbreviated History Of Drive Form Factors

Let’s start with a little bit of history to get you going. IBM introduced the first hard drive – the IBM 350 disk storage unit – in 1956. Over time we’ve gone from a variety of big old drives to smaller form factors. I’m not old enough to reminisce about the Winchester drives, but I do remember the 5.25″ drives in the XT. Wikipedia provides as good a place to start as any if you’re interested in knowing more about hard drives. In any case, we now have the following prevailing form factors in use as hard drive storage:

  • 3.5″ drives – still reasonably common in desktop computers and “cheap and deep” storage systems;
  • 2.5″ drives (SFF) – popular in laptops and used as a “dense” form factor for a variety of server and storage solutions;
  • U.2 – mainstream PCIe SSD form factor that has the same dimensions as 2.5″ drives; and
  • M.2 – designed for laptops and tablets.

Challenges

There are a number of challenges associated with the current drive form factors. The most notable of these is the density issue. Drive (and storage) vendors have been struggling for years to cram more and more devices into smaller spaces whilst also increasing device capacities. This has led to problems with cooling, power, and overall reliability. Basically, there’s only so much you can put in 1RU without the whole lot melting.

 

A Ruler? Wait, what?

Intel’s “Ruler” is a ruler-like, long (or short) drive based on the EDSFF (Enterprise and Datacenter SSD Form Factor) specification. There’s a tech brief you can view here. There are a few different versions (basically long and short), and it still leverages NVMe via PCIe.

[image courtesy of Intel]

It’s Denser

You can cram a lot of these things in a 1RU server, as Super Micro demonstrated a few months ago.

  • Up to 32 E1.L 9.5mm drives per 1RU
  • Up to 48 E1.S drives per 1RU

Which means you could be looking at around a petabyte of raw storage in 1RU (using 32TB E1.L drives). This number is only going to go up as capacities increase. Instead of half a rack of 4TB SSDs, you can do it all in 1RU.

It’s Cooler

Cooling has been a problem for storage systems for some time. A number of storage vendors have found out the hard way that jamming a bunch of drives in a small enclosure has a cost in terms of power and cooling. Intel tell us that they’ve had some (potentially) really good results with the E1.L and E1.S based on testing to date (in comparison to traditional SSDs). They talked about:

  • Up to 2x less airflow needed per E1.L 9.5mm SSD vs. U.2 15mm (based on Intel’s internal simulation results); and
  • Up to 3x less airflow needed per E1.S SSD vs. U.2 7mm.

Still Serviceable

You can also replace these things when they break. Intel say they’re:

  • Fully front serviceable with an integrated pull latch;
  • Support integrated, programmable LEDs; and
  • Support remote, drive-specific power cycling.

 

Thoughts And Further Reading

SAN and NAS became popular in the data centre because you could jam a whole bunch of drives in a central location and you weren’t limited by what a single server could support. For some workloads though, having storage decoupled from the server can be problematic either in terms of latency, bandwidth, or both. Some workloads need their storage as close to the processor as possible. Technologies such as NVMe over Fabrics are addressing that issue to an extent, and other vendors are working to bring the compute closer to the storage. But some people just want to do what they do, and they need more and more storage to do it. I think the “ruler” form factor is an interesting approach to the issue traditionally associated with cramming a bunch of capacity in a small space. It’s probably going to be some time before you see this kind of thing in data centres as a matter of course, because it takes a long time to change the way that people design their servers to accommodate new standards. Remember how long it took for SFF drives to become as common in the DC as they are? No? Well it took a while. Server designs are sometimes developed years (or at least months) ahead of their release to the market. That said, I think Intel have come up with a really cool idea here, and if they can address the cooling and capacity issues as well as they say they can, this will likely take off. Of course, the idea of having 1PB of data sitting in 1RU should be at least a little scary in terms of failure domains, but I’m sure someone will work that out. It’s just physics after all, isn’t it?

There’s also an interesting article at The Register on the newer generation of drive form factors that’s worth checking out.

Intel Are Putting Technology To Good Use

Disclaimer: I recently attended Storage Field Day 12.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

Here are some notes from Intel‘s presentation at Storage Field Day 12. You can view the video here and download my rough notes here.

 

I/O Can Be Hard Work

With the advent of NVM Express, things go pretty fast nowadays. Or, at least, faster than they used to with those old-timey spinning disks we’ve loved for so long. According to Intel, systems with multiple NVMe SSDs are now capable of performing millions of I/Os per second. Which is great, but it results in many cores of software overhead with a kernel-based, interrupt-driven driver model. The answer, according to Intel, is the Storage Performance Development Kit (SPDK). The SPDK enables more CPU cycles for storage services, with lower I/O latency. The great news is that there’s now almost no premium on capacity to do IOPS with a system. So how does this help in the real world?
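
To show what “no interrupts, just polling” looks like from the application’s side, here’s a rough sketch along the lines of SPDK’s public NVMe driver API (in the spirit of its hello_world example): the process claims the device in user space, submits a read, and then spins on the queue pair reaping completions itself, with no system call or context switch in the I/O path. Treat it as a sketch rather than a reference – exact setup details vary between SPDK releases and most error handling is omitted.

```c
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;
static struct spdk_nvme_ns *g_ns;
static bool g_done;

/* Called for each NVMe device found during the probe; returning true
 * asks SPDK to attach its userspace driver to the device. */
static bool probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                     struct spdk_nvme_ctrlr_opts *opts)
{
    return true;
}

static void attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                      struct spdk_nvme_ctrlr *ctrlr,
                      const struct spdk_nvme_ctrlr_opts *opts)
{
    g_ctrlr = ctrlr;
    g_ns = spdk_nvme_ctrlr_get_ns(ctrlr, 1);    /* first namespace */
}

static void read_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
    g_done = true;
}

int main(int argc, char **argv)
{
    struct spdk_env_opts opts;

    spdk_env_opts_init(&opts);
    opts.name = "polled_read";
    if (spdk_env_init(&opts) < 0) {
        return 1;
    }

    /* Enumerate local PCIe NVMe devices and bind the userspace driver. */
    if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 ||
        g_ns == NULL) {
        fprintf(stderr, "no NVMe namespace found\n");
        return 1;
    }

    struct spdk_nvme_qpair *qpair =
        spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);

    /* DMA-able buffer big enough for one logical block. */
    void *buf = spdk_zmalloc(spdk_nvme_ns_get_sector_size(g_ns), 0x1000,
                             NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

    /* Submit a read of LBA 0, then poll for its completion -- no
     * interrupt, no kernel, just the CPU asking "are we there yet?". */
    spdk_nvme_ns_cmd_read(g_ns, qpair, buf, 0, 1, read_done, NULL, 0);
    while (!g_done) {
        spdk_nvme_qpair_process_completions(qpair, 0);
    }
    printf("read of LBA 0 completed via polling\n");

    spdk_free(buf);
    spdk_nvme_ctrlr_free_io_qpair(qpair);
    spdk_nvme_detach(g_ctrlr);
    return 0;
}
```

Burning a core on that polling loop sounds wasteful, but at millions of I/Os per second it works out far cheaper than taking an interrupt and a context switch for every completion, which is where the reclaimed CPU cycles and lower latency come from.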

 

Real World Applications?

SPDK VM I/O Efficiency

The SPDK offers some excellent performance improvements when dishing up storage to VMs.

  • NVMe ephemeral storage
  • SPDK-based 3rd party storage services

Leverage existing infrastructure for:

  • QEMU vhost-scsi;
  • QEMU/DPDK vhost-user net.

Features and benefits

  • High performance storage virtualisation
  • Reduced VM exit
  • Lower latency
  • Increased VM density
  • Reduced tail latencies
  • Higher throughput

Intel say that Ali Cloud sees a ~300% improvement in IOPS and latency using SPDK.

 

VM Ephemeral Storage

  • Improves storage virtualisation
  • Works with KVM/QEMU
  • 6x efficiency vs kernel host
  • 10x efficiency vs QEMU virtio
  • Increased VM density

 

SPDK and NVMe over Fabrics

SPDK also works a treat with NVMe over Fabrics.

VM Remote Storage

  • Enable disaggregation and migration of VMs using remote storage
  • Improves storage virtualisation and flexibility
  • Works with KVM/QEMU

 

NVMe over Fabrics

Feature and benefit pairs from the presentation:

  • Utilises NVM Express (NVMe) Polled Mode Driver – reduced overhead per NVMe I/O
  • RDMA Queue Pair Polling – no interrupt overhead
  • Connections pinned to CPU cores – no synchronisation overhead

 

NVMe-oF Key Takeaways

  • Preserves the latency-optimised NVMe protocol through network hops
  • Potentially radically efficient, depending on implementation
  • Actually fabric agnostic: InfiniBand, RDMA, TCP/IP, FC … all ok!
  • Underlying protocol for existing and emerging technologies
  • Using SPDK, can integrate NVMe and NVMe-oF directly into applications

 

VM I/O Efficiency Key Takeaways

  • Huge improvement in latency for VM workloads
  • Application-level sees 3-4X performance gains
  • Application unmodified: it’s all under the covers
  • Virtuous cycle with VM density
  • Fully compatible with NVMe-oF!

 

Further Reading and Conclusion

Intel said during the presentation that “[p]eople find ways of consuming resources you provide to them”. This is true, and one of the reasons I became interested in storage early in my career. What’s been most interesting about the last few years’ worth of storage developments (as we’ve moved beyond spinning disks and simple file systems to super fast flash subsystems and massively scaled out object storage systems) is that people are still really only interested in having lots of storage that is fast and reliable. The technologies talked about during this presentation obviously aren’t showing up in consumer products just yet, but it’s an interesting insight into the direction the market is heading. I’m mighty excited about NVMe over Fabrics and looking forward to this technology being widely adopted in the data centre.

If you’ve had the opportunity to watch the video from Storage Field Day 12 (and some other appearances by Intel Storage at Tech Field Day events), you’ll quickly understand that I’ve barely skimmed the surface of what Intel are doing in the storage space, and just how much is going on before your precious bits are hitting the file system / object store / block device. NVMe is the new way of doing things fast, and I think Intel are certainly pioneering the advancement of this technology through real-world applications. This is, after all, the key piece of the puzzle – understanding how to take blazingly fast technology and apply a useful programmatic framework that companies can build upon to deliver useful outcomes.

For another perspective, have a look at Chan’s article here. You also won’t go wrong checking out Glenn’s post here.