Storage Field Day 7 – Day 2 – Springpath

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Springpath presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Springpath website that covers some of what they presented.


Company Overview

Springpath (formerly StorVisor) came out of stealth in February, just before Storage Field Day 7.

Ravi Parthasarathy, VP of Product Management, presented an overview of the company.

Springpath is essentially storage software that runs on commodity hardware to deliver an enterprise-class solution. It offers:

  • Enterprise grade, scale-out capability;
  • Maximum simplicity; and
  • A completely software-based approach.

To be enterprise grade, Springpath have focussed on:

  • Robustness, resiliency and data integrity
  • Data mirroring and automatic rebalancing
  • Flash / memory performance
  • Native, space efficient snapshots
  • VM / VVOL / File granularity
  • Inline deduplication and compression
  • Lower $/GB using high capacity 7.2K RPM drives

As for maximum simplicity, Springpath have aimed to:

  • Leverage existing management tools
  • Provide a zero learning curve
  • Avoid legacy storage complexity
  • Enable rapid provisioning of applications
  • Provide cloud-based auto-support monitoring
  • Deliver proactive alerts and rapid resolution

They also offer “software economics”:

  • Choose your (prescribed) servers
  • Choose your platform (VMware 5.5 and above – OpenStack and KVM will be offered in beta shortly)
  • Annual subscriptions, per server, including support
  • Any server, any capacity
  • Upgrade your servers without a “software tax”
  • Scale out compute or performance or capacity
  • Just-in-time scaling in small increments

Sounds pretty good so far.



Here’s a photo of Mallik Mahalingam presenting. Mallik is one of the co-founders of Springpath, did a lot of work on I/O at VMware previously and is, in my opinion, an excellent table tennis player.


The Springpath Data Platform:

  • Is 100% software;
  • Provides elastic scaling;
  • Is enterprise grade; and
  • Integrates into existing management tools.

It is, ostensibly, data management and storage software on commodity hardware, without compromising features, scale or performance.

Springpath had the following design goals for the platform:

  • Scale out performance and capacity linearly;
  • Scale out the caching tier independently from the capacity tier, without losing data management features;
  • Leverage flash for performance and low speed hard disks for capacity;
  • Maximise utilisation of free space in flash or hard disks when nodes join or leave the cluster;
  • Maximise space usage using inline compression and inline deduplication in all tiers;
  • Provide pointer-based file level snapshots and clones;
  • Support a variety of platforms (VMware, KVM, Hyper-V, Containers …); and
  • Leverage existing management applications and frameworks.

Springpath offers a scale-out, distributed file system:

  • You can start with as few as 3 servers;
  • The software cluster installs in minutes;
  • Add servers, one or more at a time;
  • Distribute and rebalance data across servers automatically;
  • Retire older servers as required; and
  • Independent scaling of compute, cache or capacity.
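Springpath didn't go into how data is placed across servers, but "add servers one at a time and rebalance automatically" is the behaviour consistent hashing is known for. The following is a minimal sketch under that assumption only; `HashRing`, the `stripe-N` keys and the 64 ring points per server are hypothetical and not Springpath's implementation.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # Stable 64-bit position; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping data keys to servers (hypothetical sketch)."""

    def __init__(self, servers, points_per_server=64):
        self.points_per_server = points_per_server
        self.positions = []  # sorted ring positions
        self.owner = {}      # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server claims several ring points so load spreads evenly.
        for i in range(self.points_per_server):
            pos = ring_hash(f"{server}#{i}")
            self.owner[pos] = server
            bisect.insort(self.positions, pos)

    def remove_server(self, server: str) -> None:
        self.positions = [p for p in self.positions if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def locate(self, key: str) -> str:
        # A key belongs to the first server point clockwise from its hash.
        i = bisect.bisect(self.positions, ring_hash(key)) % len(self.positions)
        return self.owner[self.positions[i]]


keys = [f"stripe-{n}" for n in range(10_000)]
ring = HashRing(["server-1", "server-2", "server-3"])
before = {k: ring.locate(k) for k in keys}
ring.add_server("server-4")
after = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of stripes moved")
```

Going from three servers to four should move roughly a quarter of the stripes, and only onto the new server; a naive `hash % server_count` scheme would move most of them.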


The Springpath platform is built on the HALO architecture – Hardware Agnostic Log-Structured Objects.
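HALO's on-disk format wasn't covered in detail, so as a generic illustration of what "log-structured" means, here's a minimal sketch: every write, including an overwrite, is appended to the end of a log, an index points at the latest version, and stale versions are reclaimed by a compaction pass. `LogStructuredStore` and its methods are hypothetical. Sequential appends like this are the usual reason log-structured layouts are described as flash- and compression-friendly, since variable-sized (for example, compressed) objects can simply be appended rather than fitted into fixed slots.

```python
from dataclasses import dataclass, field


@dataclass
class LogStructuredStore:
    """Append-only log plus an in-memory index (hypothetical sketch, not HALO's format)."""
    log: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # key -> (offset, length) of latest version

    def put(self, key: str, value: bytes) -> None:
        # Never overwrite in place: append the new version and repoint the index.
        offset = len(self.log)
        self.log += value
        self.index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        return bytes(self.log[offset:offset + length])

    def live_bytes(self) -> int:
        return sum(length for _, length in self.index.values())

    def compact(self) -> None:
        # Reclaim stale versions by rewriting only live data, sequentially.
        fresh = LogStructuredStore()
        for key in self.index:
            fresh.put(key, self.get(key))
        self.log, self.index = fresh.log, fresh.index


store = LogStructuredStore()
store.put("vm1/block0", b"A" * 4096)
store.put("vm1/block0", b"B" * 4096)  # an overwrite becomes an append
print(len(store.log), store.live_bytes())  # 8192 4096
store.compact()
print(len(store.log))  # 4096
```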


Here’s the rough outline of the elements of the HALO architecture:

Data Access Layer

  • VMware
  • ESXi

The Springpath Data Platform offers (or will offer) support for:

  • KVM
  • NFS/Cinder/Nova/Glance
  • Hyper-V
  • SMB

Data Distribution

  • Avoid controller hotspots
  • Leverage cache across all SSDs in the cluster

Data Virtualisation – Caching

  • Striping across and within VMs
  • Each stripe is routed to one of the cache vNodes
  • The aim was to “decouple the ability to serve the data from the location that you’re serving it from”
  • Rebalances cache on node addition or removal
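To make the striping idea concrete, here's a minimal sketch assuming a fixed stripe size and a simple hash of the stripe identifier to pick a cache vNode. The 1 MiB `STRIPE_SIZE`, `stripe_id()` and `route()` are hypothetical; Springpath didn't state a stripe size or routing function. It shows why a single busy VM doesn't create a controller hotspot: its sequential writes land on many vNodes.

```python
import hashlib
from collections import Counter

STRIPE_SIZE = 1 << 20  # hypothetical 1 MiB stripe; no size was given in the presentation


def stripe_id(vm: str, offset: int) -> str:
    # A byte offset within a VM's virtual disk falls into exactly one stripe.
    return f"{vm}:{offset // STRIPE_SIZE}"


def route(stripe: str, cache_vnodes: list) -> str:
    # Deterministically map a stripe to a cache vNode (simple hash-mod placement).
    h = int.from_bytes(hashlib.sha256(stripe.encode()).digest()[:8], "big")
    return cache_vnodes[h % len(cache_vnodes)]


vnodes = [f"cache-vnode-{i}" for i in range(8)]
# One VM writing 64 MiB sequentially in 64 KiB I/Os.
load = Counter(
    route(stripe_id("vm-42", off), vnodes) for off in range(0, 64 << 20, 64 << 10)
)
print(sorted(load.items()))
```

Hash-mod placement keeps the sketch short, but it would reshuffle most stripes whenever a vNode is added or removed; rebalancing cache on node addition or removal without a full reshuffle would call for something like consistent hashing.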

Data Virtualisation

  • Write back caching to SSDs with mirroring
    • all writes to cache vNodes go to a write log on SSD
    • synchronously mirror one or two copies for HA
    • acknowledge after mirror writes are complete
  • Maximum write size is 64K
  • De-staging of write log (write log is currently 2GB)
    • writes are de-staged from write log to data and metadata vNodes
    • data and metadata are mirrored to one or two nodes for high availability
    • data can be de-staged to a local or different server based on available space
  • Uniform Space Utilisation
    • utilise free capacity when new nodes are added
    • faster rebuilds
  • Read caching
    • data is cached in both memory and SSD for reads
    • misses are fetched from HDDs on any node in the cluster
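The write path described above (append to an SSD write log, mirror synchronously, acknowledge only after the mirrors complete, then de-stage) can be modelled in a few lines. This is a single-process sketch under stated assumptions: the 64K and 2GB figures come from the presentation, but `WriteLog`, `CacheVNode`, de-staging only when the log is full, and a single dict standing in for the mirrored data/metadata vNodes are all hypothetical simplifications. Space-aware placement and the memory read cache are omitted.

```python
from dataclasses import dataclass, field

MAX_WRITE = 64 * 1024         # maximum write size, as presented
LOG_CAPACITY = 2 * 1024 ** 3  # current write log size, as presented


@dataclass
class WriteLog:
    """An SSD write log, modelled as an in-memory list of (offset, data) records."""
    records: list = field(default_factory=list)
    used: int = 0

    def append(self, offset: int, data: bytes) -> None:
        self.records.append((offset, data))
        self.used += len(data)

    def clear(self) -> None:
        self.records.clear()
        self.used = 0


@dataclass
class CacheVNode:
    mirrors: list                                # write logs on one or two other servers
    log_capacity: int = LOG_CAPACITY
    log: WriteLog = field(default_factory=WriteLog)
    backing: dict = field(default_factory=dict)  # stand-in for data/metadata vNodes

    def write(self, offset: int, data: bytes) -> str:
        if len(data) > MAX_WRITE:
            raise ValueError("write exceeds the 64K maximum")
        if self.log.used + len(data) > self.log_capacity:
            self.destage()  # make room before accepting the new write
        # Append to the local log and every mirror; only then acknowledge.
        for target in (self.log, *self.mirrors):
            target.append(offset, data)
        return "ack"

    def destage(self) -> None:
        # Replay records oldest-first so later writes win, then free log space.
        for offset, data in self.log.records:
            self.backing[offset] = data
        for target in (self.log, *self.mirrors):
            target.clear()

    def read(self, offset: int) -> bytes:
        # Newest data first: scan the write log backwards, then fall back to backing.
        for off, data in reversed(self.log.records):
            if off == offset:
                return data
        return self.backing[offset]


node = CacheVNode(mirrors=[WriteLog(), WriteLog()], log_capacity=128 * 1024)
node.write(0, b"a" * 4096)
node.write(0, b"b" * 4096)
print(node.read(0)[:1], node.mirrors[0].used)  # b'b' 8192
```

In a real cluster the mirror appends are network writes to other servers, which is why acknowledging only after they complete is what makes the write log survive a node failure.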

Data Optimisation

  • Inline dedupe and compression
    • inline deduplication across memory, SSD and HDD
    • striping enables dedupe across files
    • inline compression on SSD and HDD
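Inline deduplication and compression usually come down to fingerprinting each chunk as it's written, storing (and compressing) only chunks not seen before, and recording references for the rest. Here's a minimal sketch of that general technique; the 4K `CHUNK` size, SHA-256 fingerprints, zlib compression and the `DedupStore` class are assumptions, not details Springpath disclosed. Because the chunk index is global rather than per file, identical data in different files (say, two VMs cloned from one template) is stored once, which is the cross-file dedupe the slide refers to.

```python
import hashlib
import zlib

CHUNK = 4096  # hypothetical dedupe granularity; the presentation didn't state one


class DedupStore:
    """Inline dedupe + compression over a single global chunk index (sketch)."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> compressed chunk, stored once
        self.refcount = {}  # fingerprint -> references (what a delete path would consult)
        self.files = {}     # file name -> ordered list of fingerprints

    def write_file(self, name: str, data: bytes) -> None:
        fingerprints = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            fp = hashlib.sha256(piece).hexdigest()
            if fp not in self.chunks:
                # Inline: only never-seen chunks are compressed and stored.
                self.chunks[fp] = zlib.compress(piece)
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read_file(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


template = bytes(range(256)) * 64  # 16 KiB of sample "template" data
store = DedupStore()
store.write_file("vm1.vmdk", template)
store.write_file("vm2.vmdk", template + b"x" * CHUNK)
print(len(store.chunks), store.stored_bytes())
```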

Data Management

  • Native Snapshots
    • Pointer Based Snapshots – fast creation and deletion, no consolidation overhead
    • Fine-grained or coarse-grained – VM-level or VM folder level
    • VAAI / Cinder integrated – quiesced and crash-consistent
    • Use vCenter Snapshot Manager
    • Policy Based – schedules, retention period
  • Native Clones
    • pointer based writeable snapshots
    • VM-level
    • VAAI integrated
    • Batch version GUI – clone names, use customisation spec
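"Pointer based" snapshots and clones copy references to existing blocks rather than the blocks themselves, so creating one is a metadata operation, and deleting one just drops its references. There's nothing to merge back into the live disk, which is presumably what "no consolidation overhead" refers to. Here's a minimal sketch of the idea; `VDisk` and its methods are hypothetical and not Springpath's data structures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VDisk:
    """A virtual disk as a map of block number -> reference to immutable block data."""
    blocks: dict = field(default_factory=dict)
    read_only: bool = False

    def write(self, block_no: int, data: bytes) -> None:
        if self.read_only:
            raise PermissionError("snapshots are read-only")
        # Repoint this disk's entry; blocks shared with snapshots are never modified.
        self.blocks[block_no] = data

    def read(self, block_no: int) -> Optional[bytes]:
        return self.blocks.get(block_no)

    def snapshot(self) -> "VDisk":
        # Copy pointers only; no block data is duplicated.
        return VDisk(dict(self.blocks), read_only=True)

    def clone(self) -> "VDisk":
        # A clone is simply a writable snapshot.
        return VDisk(dict(self.blocks))


disk = VDisk()
disk.write(0, b"v1")
snap, clone = disk.snapshot(), disk.clone()
disk.write(0, b"v2")
print(disk.read(0), snap.read(0), clone.read(0))  # b'v2' b'v1' b'v1'
```

Copying a flat dict is still proportional to the number of blocks; production designs typically share tree-structured metadata so even that copy is avoided.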


Closing Thoughts and Further Reading

Springpath provided the following summary of their offering, pairing each platform element with its benefit:

  • Log structured layout – flash endurance, compression friendly, faster rebuilds
  • Data virtualisation – scale performance and capacity independently, eliminate hotspots
  • Data distribution – granular scaling and rebalancing
  • Data services – fast, efficient snapshots and clones
  • Integrated management – reduced management

I’m a fan of “software economics” when it’s done properly. I like what Springpath are doing and think they’re taking the right approach to software-defined storage, buzzword and all. It remains to be seen whether they can make their way in what’s becoming a crowded hyper-converged space, but they seem to be making all the right noises. I recommend you check out Keith’s preview blog post on Springpath, as well as Cormac’s typically comprehensive write-up here.