Panasas Overview

A good friend of mine is friends with someone who works at Panasas and suggested I might like to hear from them. I had the opportunity to speak to some of the team, and I thought I’d write a brief overview of what they do. Hopefully I’ll get the chance to cover them in more depth in the future, as I think they’re doing some pretty neat stuff.

 

It’s HPC, But Not As You Know It

I don’t often like to include that slide where the vendor compares themselves to other players in the market. In this case, though, I thought Panasas’s positioning of themselves as “commercial” HPC versus traditional HPC storage (and versus enterprise scale-out NAS) is an interesting one. We talked through this a little, and my impression is that they’re starting to deal more and more with non-traditional, HPC-like use cases, such as media and entertainment, oil and gas, genomics, and so forth. A number of these workloads fall outside HPC in the sense that traditional HPC has lived almost exclusively in government and the academic sphere. The roots are clearly in HPC, but there are “enterprise” elements creeping in, such as ease of use (at scale) and improved management functionality.

[image courtesy of Panasas]

 

Technology

It’s Really Parallel

The real value in Panasas’s offering is the parallel access to the storage: the more nodes you add, the more performance improves. In a serial system, a client accesses data via a single node in the cluster, regardless of how many nodes are available. In a parallel system, such as this one, a client accesses data that is spread across multiple nodes.
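
To make that a bit more concrete, here’s a rough conceptual sketch in Python – very much my own illustration, not Panasas code – of a client reassembling a file whose stripes live on several storage nodes. The node names and stripe map are invented for the example; the point is simply that the parallel path fans reads out across every node holding a stripe, while the serial path funnels everything through a single node.

```python
# Conceptual sketch only (not Panasas code): parallel vs serial access to a
# file whose stripes are spread across multiple storage nodes.
from concurrent.futures import ThreadPoolExecutor

def read_stripe(node, stripe_id):
    # Placeholder for a network read of one stripe from one storage node.
    return f"<stripe {stripe_id} from {node}>"

def read_file_parallel(stripe_map):
    # stripe_map: ordered list of (node, stripe_id) pairs for one file.
    # Every stripe is fetched concurrently, so throughput scales with nodes.
    with ThreadPoolExecutor(max_workers=len(stripe_map)) as pool:
        return "".join(pool.map(lambda s: read_stripe(*s), stripe_map))

def read_file_serial(stripe_map, gateway_node):
    # Every stripe is funnelled through one node, which caps throughput
    # regardless of how many nodes are in the cluster.
    return "".join(read_stripe(gateway_node, sid) for _, sid in stripe_map)

stripes = [("node1", 0), ("node2", 1), ("node3", 2)]
print(read_file_parallel(stripes))
print(read_file_serial(stripes, "node1"))
```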

 

What About The Hardware?

The current offering from Panasas is called ActiveStor. The platform comprises PanFS running on Director Blades and Storage Blades. Here’s a picture of the Director Blades (ASD-100) and the Storage Blades (ASH-100). The Director has transitioned to a 2U4N form factor (it used to sit in the blade chassis).

[image courtesy of Panasas]

 

Director Nodes are the Control Plane of PanFS, and handle:

  • Metadata processing: directories, file names, access control checks, timestamps, etc.
  • Use of a transaction log to ensure the atomicity and durability of structural changes
  • Coordination of client system actions to ensure a single-system view and data cache coherence
  • Membership of the “realm” (Panasas’s name for the storage cluster), realm self-repair, etc.
  • Realm maintenance: file reconstruction, automatic capacity balancing, scrubbing, etc.

Storage Nodes are the Data Plane of PanFS, and deal with:

  • Storage of bulk user data for the realm, accessed in parallel by client systems
  • Storage of all the system’s metadata on behalf of the Director Nodes, without operating on it
  • An API based upon the T10 SCSI “Object-Based Storage Device” standard that Panasas helped define

Storage nodes offer a variety of HDD capacities (4TB, 6TB, 8TB, 10TB, or 12TB) and SSD capacities (480GB, 960GB, or 1.9TB), depending on the type of workload you’re dealing with. The SSD is used for metadata and for files smaller than 60KB; everything else is stored on the larger drives.
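
Here’s a trivial sketch of that placement policy as I understand it – the function name and the metadata flag are mine, purely for illustration, not anything from PanFS:

```python
# Illustrative sketch of the placement policy described above (not PanFS code):
# metadata and files under 60KB go to SSD, everything else goes to HDD.
SMALL_FILE_LIMIT = 60 * 1024  # 60KB threshold mentioned in the briefing

def choose_tier(size_bytes, is_metadata=False):
    if is_metadata or size_bytes < SMALL_FILE_LIMIT:
        return "SSD"
    return "HDD"

assert choose_tier(4 * 1024) == "SSD"             # small file -> SSD
assert choose_tier(0, is_metadata=True) == "SSD"  # metadata -> SSD
assert choose_tier(1024 * 1024) == "HDD"          # 1MB file -> HDD
```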

 

DirectFlow Protocol

DirectFlow is a big part of what differentiates Panasas from your average scale-out NAS offering. It does some stuff that’s pretty cool, including:

  • Support for parallel delivery of data to / from Storage Nodes
  • Support for fully POSIX-compliant semantics, unlike NFS and SMB
  • Support for strong data cache-coherency across client systems

It’s a proprietary protocol between clients and ActiveStor components, and there’s an installable kernel module for each client system (Linux and macOS). They tell me that pNFS is based upon DirectFlow, and that they had a hand in defining it.
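
To illustrate what strong cache coherency actually buys you over NFS-style close-to-open consistency, here’s a generic, hypothetical POSIX test you could run across two clients sharing the same mount. None of this is DirectFlow code, and the /mnt/panfs path is an assumption – the idea is just that on a strongly coherent file system the reader should see each appended record almost as soon as the writer flushes it, whereas NFS only guarantees visibility once the file has been closed and reopened.

```python
# Hypothetical cross-client visibility check (generic POSIX code, not DirectFlow).
# Run "python coherency.py --role writer" on one client and
# "python coherency.py --role reader" on a second client sharing the same mount.
import argparse
import os
import time

PATH = "/mnt/panfs/coherency.log"  # assumed path on a shared mount

def writer():
    with open(PATH, "a") as f:
        for i in range(10):
            f.write(f"record {i} written at {time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())
            time.sleep(1)

def reader(duration=15):
    seen = 0
    deadline = time.time() + duration
    while time.time() < deadline:
        try:
            with open(PATH) as f:
                lines = f.readlines()
        except FileNotFoundError:
            lines = []
        for line in lines[seen:]:
            print(f"seen at {time.time():.3f}: {line.strip()}")
        seen = len(lines)
        time.sleep(0.2)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--role", choices=["writer", "reader"], required=True)
    args = parser.parse_args()
    writer() if args.role == "writer" else reader()
```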

 

Resilience

Scale-out NAS is exciting, but we enterprise types want to know about resilience. It’s all fun and games until someone fat-fingers a file, or a disk dies. Well, Panasas, as it happens, have a little heritage when it comes to disk resilience. They use an N + 2 RAID 6 (10 wide + P & Q). You could have more disks working for you, but this number seems to work best for Panasas customers. In terms of realms, there are 3, 5 or 7 “rep sets” per realm. There’s also a “realm president”, and every Director has a backup Director. There’s also:

  • Per-file erasure coding of striped files allows the whole cluster to help rebuild a file after a failure;
  • Data protection only needs to be rebuilt on the affected files, rather than on entire drive(s); and
  • The percentage of files in the cluster affected by any given failure approaches zero at scale (there’s a rough sketch of this below).
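
That last point is easy to sanity-check with some back-of-the-envelope arithmetic. If each file is striped across roughly a dozen drives (say 10 data components plus P and Q) and stripes are placed uniformly across the realm, the expected fraction of files touched by a single drive failure is just the stripe width divided by the total drive count. The numbers below are my own illustrative assumptions, not Panasas sizing guidance:

```python
# Back-of-the-envelope sketch (illustrative assumptions, not Panasas figures):
# with per-file striping, a drive failure only affects files that have a
# component on that drive, so the affected fraction shrinks as the realm grows.
def fraction_of_files_affected(total_drives, stripe_width=12):
    # Assuming each file is striped across `stripe_width` drives (e.g. 10 data
    # components plus P and Q) placed uniformly at random, the chance that a
    # given file touches any one failed drive is stripe_width / total_drives.
    return min(1.0, stripe_width / total_drives)

for drives in (12, 120, 1200):
    pct = fraction_of_files_affected(drives)
    print(f"{drives:5d} drives -> roughly {pct:.1%} of files touched by one drive failure")
```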

 

Thoughts and Further Reading

I’m the first to admit that my storage experience to date has been firmly rooted in the enterprise space. But, much like my fascination with infrastructure associated with media and entertainment, I fancy myself as an HPC-friendly storage guy. This is for no other reason than that I think HPC workloads are pretty cool, and they tend to scale beyond what I normally see in the enterprise space (keeping in mind that I work in a smallish market). You say genomics to someone, or AI, and they’re enthusiastic about the outcomes. You say SQL 2012 to someone and they’re not as interested.

Panasas are positioning themselves as being suitable, primarily, for commercial HPC storage requirements. They have a strong heritage with traditional HPC workloads, and they seem to have a few customers using their systems for more traditional, enterprise-like NAS deployments as well. This convergence of commercial HPC, traditional HPC, and enterprise NAS requirements has presented some interesting challenges, but it seems like Panasas have addressed those in the latest iteration of their hardware. Dealing with stonking great big amounts of data at scale is a challenge for plenty of storage vendors, but Panasas have demonstrated an ability to adapt to the evolving demands of their core market. I’m looking forward to seeing the next incarnation of their platform, and how they incorporate technologies such as InfiniBand into their offering.

There’s a good white paper available on the Panasas architecture that you can access here (registration required). El Reg also has some decent coverage of the current hardware offering here.

WekaIO Have Been Busy – Really Busy

WekaIO recently announced Version 3.1 of their Matrix software, and I had the good fortune to catch up with David Hiatt. We’d spoken a little while ago when WekaIO came out of stealth and they’ve certainly been busy in the interim. In fact, they’ve been busy to the point that I thought it was worth putting together a brief overview of what’s new.

 

What Is WekaIO?

WekaIO have been around since 2013, gaining their first customers in 2016. They’ve had 17 patents filed, 45 identified, and 8 issued. Their focus has primarily been on delivering, in their words, the “highest performance file system targeted at compute intensive applications”. They deliver a fully POSIX-compliant file system that can run on bare metal, hypervisors, Docker, or in the public or private cloud.

[image courtesy of WekaIO]

Some of the key features of the architecture are that it’s distributed, resilient at scale, can perform fast rebuilds, and provides end-to-end protection. Right now, their key use cases include genomics, machine learning, media rendering, semiconductors, and financial trading and analytics. The company has staff coming from XIV, NetApp, IBM, EMC, and Intel, amongst others.

 

So What’s New?

Well, there’s been a bit going on:

 

Matrix Version 3.1 – Much Better Than Matrix Revolutions

Not that that’s too hard to do. But there have been a bunch of new features added to WekaIO’s Matrix software. Here’s a summary of the new features.

  • Network Redundancy – binding network links and load balancing
  • InfiniBand – native support for InfiniBand
  • Multiple File Systems – logical partitioning allows more granular allocation of performance and capacity
  • Cluster Scaling – dynamically shrink and grow clusters
  • NVMe – native support for NVMe devices
  • Snapshots and Clones – high performance with 4K granularity
  • Snap to Object Store – saves snapshot metadata to the object store (OBS)
  • Deployment in AWS – install and run Matrix on EC2 clusters

David also took me through what look like some very, very good SPECsfs2014 Software Build results, particularly when compared with some competitive solutions. He also walked me through the Marketplace configurator. This is really cool stuff – flexible and easy to use. You can check out a demo of it here.

 

Conclusion

All the cool kids are doing stuff with AWS. And that’s fine. But I really like that WekaIO also make stuff easy to run on-premises. And they also make it really fast. Because sometimes you just need to run stuff near you, and sometimes there needs to be an awful lot of it. WekaIO’s model is flexible, with the annual subscription approach and lack of maintenance contracts bound to appeal to a lot of people. The great thing is that it’s easy to manage, easy to scale, and supports all the file protocols you’d be interested in. There’s a bunch of (configurable) resiliency built in, and support for hybrid workloads if required.

With a Formula One slide including customer testimonials from the likes of DreamWorks and SDSC, I get the impression that WekaIO are up to something pretty cool. Plus, I really enjoy chatting to David about what’s going on in the world of highly scalable file systems, and I’m looking forward to our next call in a few months’ time to see what they’ve been up to. I suspect there’s little chance they’ll be sitting still.

Scale Computing Announces Support For Hybrid Storage and Other Good Things

[image: Scale Computing logo]

If you’re unfamiliar with Scale Computing, they’re a hyperconverged infrastructure (HCI) vendor out of Indianapolis delivering a solution aimed squarely at the small to mid-size market. They’ve been around since 2008, and launched their HC3 platform in 2012. They have around 1600 customers, and about 6000 units deployed in the field. Justin Warren provides a nice overview here as part of his research for Storage Field Day 5, while Trevor Pott wrote a comprehensive review for El Reg that you can read here. I was fortunate enough to get a briefing from Scale Computing’s Alan Conboy and thought it worthy of putting pen to paper, so to speak.

 

So What Is Scale Computing?

Scale describes the HC3 as a scale-out system. It has the following features:

  • 3 or more nodes – fully automated Active/Active architecture;
  • Clustered virtualization compute platform with no virtualization licensing (KVM-based, not VMware);
  • Protocol-less pooled storage resources that eliminate external storage requirements entirely, with no SAN or VSA;
  • +60% efficiency gains built into the IO path – Scale made much of this in my briefing, and it certainly looks good on paper;
  • A self-healing, self-load-balancing cluster – the nodes talk directly to each other; and
  • Scale’s State Machine technology, which makes the cluster self-aware with no need for external management servers – so no out-of-band management servers. When you’ve done as many vSphere deployments as I have, this becomes very appealing.

You can read a bit more about how it all hangs together here. Here’s a simple diagram of how it looks from a networking perspective. Each node has 4 NICs, with two ports going to the back-end and two to the front-end. You can read up on recommended network switches here.

[image: HC3 networking overview]

Each node contains:

  • 8 to 40 vCores;
  • 32 to 512GB of VM memory;
  • Quad network interface ports in 1GbE or 10GbE; and
  • 4 or 8 spindles in 7.2K, 10K, or 15K RPM, with SSD as a tier.

Here’s an overview of the different models, along with list prices in $US. You can check out the specification sheet here.

[image: HC3 node models and list prices]

 

So What’s New?

Flash. Scale tell me “it’s not being used as a simple cache, but as a proper, fluid tier of storage to meet the needs of a growing and changing SMB to SME market”. There are some neat features that have been built into the interface, and I was able to test these during the briefing with Scale. In a nutshell, there’s a level of granularity that the IT generalist should be pleased with. You can:

  • Set different priorities for VMs on a per-virtual-disk basis;
  • Change these on the fly as needed;
  • Make use of SLC SSD as a storage tier, not just a cache; and
  • Keep unnecessary workloads off the SSD tier completely (there’s a conceptual sketch of this below).
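
To be clear about how I’m picturing that last point, here’s a purely conceptual sketch – mine, not Scale’s HEAT code – of how a per-virtual-disk priority could combine with a block’s heat score to decide SSD placement, with a priority of zero keeping a workload off the flash tier entirely. The priority scale, heat score, and threshold are all invented for the example:

```python
# Conceptual illustration only (not Scale's HEAT implementation): combine a
# per-virtual-disk priority with a block heat score to decide SSD placement.
def ssd_eligible(block_heat, vdisk_priority, threshold=10.0):
    # A priority of zero pins the virtual disk to the spinning tier; higher
    # priorities lower the effective bar for promotion to the SSD tier.
    if vdisk_priority == 0:
        return False
    return block_heat * vdisk_priority >= threshold

print(ssd_eligible(block_heat=3.0, vdisk_priority=4))  # hot block, high priority -> True
print(ssd_eligible(block_heat=3.0, vdisk_priority=0))  # workload kept off SSD -> False
```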

Scale is deploying its new HyperCore Enhanced Automated Tiering (HEAT) technology across the HC3 product line and is introducing a flash storage tier as part of its HC2150 and HC4150 appliances. Scale tell me the new appliances are “[a]vailable in 4- or 8-drive units”, and that “Scale’s latest offerings include one 400 or 800GB SSD with three NL-SAS HDD in 1-6TB capacities and memory up to 256GB, or two 400 or 800GB SSD with 6 NL-SAS HDD in 1-2TB capacities and up to 512 GB memory respectively. Network connectivity for either system is achieved through two 10GbE SFP+ ports per node”.

It’s also worth noting that the new products can be used to form new clusters, or they can be added to existing HC3 clusters. Existing workloads on those clusters will automatically utilize the new storage tier when the new nodes are added. You can read more on what’s new here.

 

Further Reading and Feelings

As someone who deals with reasonably complex infrastructure builds as part of my day job, it was refreshing to get a briefing from a company whose focus is on simplicity for a certain market segment, rather than trying to be the HCI vendor everyone goes to. I was really impressed with the intuitive nature of the interface, the simplicity with which tasks could be achieved, and the thought that’s gone into the architecture. The price, for what it offers, is very competitive as well, particularly in the face of more traditional compute + storage stacks aimed at SMEs. I’m working with Scale to get myself some more stick time in the near future, and am looking forward to reporting back with the results.