Storage Field Day 7 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

This is a quick post to say thanks once again to Stephen, Claire and the presenters at Storage Field Day 7. I had a great time, learnt a lot, and didn’t get much sleep. For easy reference, here’s a list of the posts I did covering the event (not necessarily in chronological order).

Storage Field Day – I’ll be at SFD7

Storage Field Day 7 – Day 0

Storage Field Day 7 – Day 1 – Catalogic Software

Storage Field Day 7 – Day 1 – Kaminario

Storage Field Day 7 – Day 1 – Primary Data

Storage Field Day 7 – Day 2 – VMware

Storage Field Day 7 – Day 2 – Connected Data

Storage Field Day 7 – Day 2 – Springpath

Storage Field Day 7 – Day 3 – Cloudian

Storage Field Day 7 – Day 3 – Exablox

Storage Field Day 7 – Day 3 – Maxta

Storage Field Day 7 – (Fairly) Full Disclosure

Also, here’s a number of links to posts by my fellow delegates. They’re all switched-on people, and you’d do well to check out what they’re writing about. I’ll try and update this list as more posts are published. But if it gets stale, the SFD7 landing page has updated links.

 

Ray Lucchesi

Data virtualization surfaces

Transporter, a private Dropbox in a tower

Object store and hybrid clouds at Cloudian

 

Enrico Signoretti

It’s storage showtime! #SFD7

Storage Field Day 7, links and live stream

When looking good is no longer enough

File Transporter, private Sync&Share made easy

Thinking different about storage

Rumors, strategies and facts about Hyper-converged

 

Mark May

I’m going to Storage Field Day 7!

It’s almost time! #SFD7 is next week!

Day 0 of SFD7 – Yankee Gift Swap and delegate dinner

Goodbye to Storage Field Day 7

Storage Field Day 7 – Primary Data

 

Christopher Kusek

I’ll be attending Storage Field Day 7 – Now with Clear Containers!

 

Jon Klaus

Storage Field Day 7, here I come!

Storage Field Day 7 is about to start!

Storage Field Day 7 – Catalogic ECX reducing copy data sprawl

Storage Field Day 7 – Exablox OneBlox: scale-out NAS for SME

 

Vipin V.K

It’s Storage Field Day again…! – #SFD7

 

Keith Townsend

Kaminario – Storage Field Day 7 Preview

Maxta – Storage Field Day 7 Preview

Primary Data – Storage Field Day 7

Springpath – Storage Field Day 7 Preview

Transporter – Storage Field Day 7 Preview

VMware – Storage Field Day 7 Preview

Exablox – Storage Field Day 7 Preview

Cloudian – Storage Field Day 7 Preview

Catalogic Software – Storage Field Day 7 Preview

CopyData yeah… Long live Data Virtualization

Hyperconverged vendor Maxta announces SDN partnership

 

Chris M Evans

Storage Field Day 7 – 11-13 March 2015

Storage Field Day 7 – Initial Thoughts

SFD7 – Catalogic Software Addresses Data Copy Management

SFD7 – Connected Data, Transporter and Private “Cloud” Storage

SFD7 – Primary Data and Data Virtualisation

 

Arjan Timmerman

The Storage Field Day 7 Delegates

Software Defined Dockerized Springpath HALO at #SFD7

 

Finally, thanks again to Stephen, Claire (and Tom in absentia). It was a great few days and I really valued the opportunity I was given to attend.


Storage Field Day 7 – Day 2 – VMware

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the VMware presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the VMware website that covers some of what they presented.

 

Overview

I’d like to say a few things about the presentation. Firstly, it was held in the “Rubber Chicken” Room at VMware HQ.

Secondly, Rawlinson was there, but we ran out of time to hear him present. This seems to happen each time I see him in real life. Still, it’s not every day you get to hear Christos Karamanolis (@XtosK) talk about this stuff, so I’ll put my somewhat weird @PunchingClouds fanboy thing to the side for the moment.

[Image: Christos Karamanolis (@XtosK) presenting]

Thirdly, and I’ll be upfront about this, I was a bit disappointed that VMware didn’t go outside some fairly fixed parameters as far as what they could and couldn’t talk about with regards to Virtual SAN. I understand that mega software companies have to be a bit careful about what they can say publicly, but I had hoped for something fresher in this presentation. In any case, I’ve included my notes on Christos’s view on the VSAN architecture – I hope it’s useful.

 

Architecture

VMware adopted the following principles when designing VSAN.

Hyper-converged

  • Compute + storage scalability
  • Unobtrusive to existing data centre architecture
  • Distributed software running on every host
  • Pools local storage (flash + HDD) on hosts (virtual shared datastore)
  • Symmetric architecture – no single point of failure, no bottleneck

The hypervisor opens up new opportunities, with the virtualisation platform providing:

  • Visibility to individual VMs and application storage
  • Manages all applications’ resource requirements
  • Sits directly in the I/O path
  • A global view of underlying infrastructure
  • Supports an extensive hardware compatibility list (HCL)

Critical paths in ESX kernel

The cluster service allows for

  • Fast failure detection
  • High performance (especially for writes)

The data path provides

  • Low latency
  • Minimal CPU per IO
  • Minimal Mem consumption
  • Physical access to devices

This equals minimal impact on consolidation rates. This is a Good Thing™.

Optimised network protocol

As ESXi is both the “consumer” and “producer” of data, there is no need for a standard data access protocol.

Per-object coordinator = client

  • Distributed “metadata server”
  • Transactions span only object distribution

Efficient reliable data transport (RDT)

  • Protocol agnostic (now TCP/IP)
  • RDMA friendly

Standard protocol for external access?

Two tiers of storage: Hybrid

Optimise the cost of physical storage resources

  • HDDs: cheap capacity, expensive IOPS
  • Flash: expensive capacity, cheap IOPS

Combine best of both worlds

  • Performance from flash (read cache + write back)
  • Capacity from HDD (capacity tier)

Optimise workload per tier

  • Random IO to flash (high IOPS)
  • Sequential IO to HDD (high throughput)

Storage is organised in disk groups (a flash device plus magnetic disks) – up to 5 disk groups per host, each with 1 SSD + up to 7 HDDs – and the disk group is the fault domain. 70% of the flash is used as read cache, 30% as write buffer. Writes are accumulated, then destaged in a magnetic-disk-friendly fashion (proximal IO – writing blocks within a certain number of cylinders). The filesystem on the magnetic disks is slightly different to the one on the SSDs; it uses the back end of the Virsto filesystem, but not the log-structured filesystem component.
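
To put some numbers on that 70/30 split, here’s a quick sketch of the cache split for a single disk group. It’s my own illustration of the arithmetic, not VMware tooling or an official sizing formula.

```python
# Quick arithmetic on the 70/30 flash split described above.
# Illustrative only - not VMware tooling or an official sizing formula.
def cache_split(flash_gb: float) -> dict:
    """Return the read cache / write buffer split for one disk group."""
    return {
        "read_cache_gb": round(flash_gb * 0.70, 1),
        "write_buffer_gb": round(flash_gb * 0.30, 1),
    }

# e.g. a 400GB SSD fronting a disk group of up to 7 HDDs
print(cache_split(400))  # {'read_cache_gb': 280.0, 'write_buffer_gb': 120.0}
```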

Distributed caching

Flash device: cache of disk group (70% read cache, 30% write-back buffer)

No caching on “local” flash where VM runs

  • Flash latencies 100x network latencies
  • No data transfers, no perf hit during VM migration
  • Better overall flash utilisation (most expensive resource)

Use local cache when it matters

  • In-memory CBRC (RAM << Network latency)
  • Lots of block sharing (VDI)
  • More options in the future …

Deduplicated RAM-based caching

Object-based storage

  • VM consists of a number of objects – each object individually distributed
  • VSAN doesn’t know about VMs and VMDKs
  • Up to 62TB useable
  • Single namespace, multiple mount points
  • VMFS created in sub-namespace

The VM Home directory object is formatted with VMFS to allow a VM’s config files to be stored on it. Mounted under the root dir vsanDatastore.

  • Availability policy reflected on number of replicas
  • Performance policy may include a stripe width per replica
  • Object “components” may reside in different disks and / or hosts
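
To make the relationship between policy and layout above a little more concrete, here’s a minimal sketch. It assumes the general VSAN rule of thumb (replicas = failures to tolerate + 1, each replica striped across the stripe width) and deliberately ignores witness components, so treat it as an illustration rather than exact component accounting.

```python
# Minimal sketch of policy -> layout, assuming replicas = FTT + 1 and each
# replica striped across "stripe width" components. Witnesses are ignored.
def data_components(failures_to_tolerate: int, stripe_width: int) -> int:
    replicas = failures_to_tolerate + 1
    return replicas * stripe_width

# FTT=1, stripe width=2 -> 2 replicas x 2 stripes = 4 data components
print(data_components(failures_to_tolerate=1, stripe_width=2))
```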

VSAN cluster = vSphere cluster

Ease of management

  • Piggyback on vSphere management workflow, e.g. EMM (Enter Maintenance Mode)
  • Ensure coherent configuration of hosts in vSphere cluster

Adapt to the customer’s data centre architecture while working with network topology constraints.

Maintenance mode – planned downtime.

Three options:

  • Ensure accessibility;
  • Full data migration; and
  • No data migration.

HA Integration

VM-centric monitoring and troubleshooting

VMODL APIs

  • Configure, manage, monitor

Policy compliance reporting

Combination of tools for monitoring in 5.5

  • CLI commands
  • Ruby vSphere console
  • VSAN observer

More to come soon …

Real *software* defined storage

Software + hardware – component based (individual components), Virtual SAN ready node (40 OEM validated server configurations are ready for VSAN deployment)

VMware EVO:RAIL = Hyper-converged infrastructure

It’s a big task to get all of this working with everything (supporting the entire vSphere HCL).

 

Closing Thoughts and Further Reading

I like VSAN. And I like that VMware are working so hard at getting it right. I don’t like some of the bs that goes with their marketing of the product, but I think it has its place in the enterprise and is only going to go from strength to strength with the amount of resources VMware is throwing at it. In the meantime, check out Keith’s background post on VMware here. In my opinion, you can’t go past Cormac’s posts on VSAN if you want a technical deep dive. Also, buy his book.

Storage Field Day 7 – Day 3 – Cloudian

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Cloudian presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Cloudian website that covers some of what they presented.

 

Overview

Michael Tso, CEO and co-founder of Cloudian, provided us with a brief overview of the company. It was founded about 4 years ago, and a lot of the staff’s background is in hyper-scale messaging systems for big telcos. They now have about 65 staff.

Cloudian offers a software version as well as a hardware appliance that runs their HyperStore software. The hardware appliance comes in 3 different flavours:

  • Entry Level;
  • Capacity Optimised; and
  • Performance Optimised.

The software is supported on RedHat and CentOS.

 

Architecture

Paul Turner, Chief Marketing and Product Officer, gave us an introduction to the architecture behind Cloudian. Their focus is on commodity servers that provide scale-out capability, are durable, and are simple to use. “If you don’t make it dead easy to add nodes or remove nodes on the fly you don’t have a good platform”.

The platform uses

  • Erasure Coding;
  • Replication; and
  • Compression

Here’s a picture of what’s inside:

[Image: Cloudian – What’s Inside]

Features include:

  • Natively S3;
  • Hybrid Storage Cloud;
  • Extreme durability;
  • Multi-tenant;
  • Geo-distribution;
  • Scale out;
  • Intelligence in Software;
  • Smart Support;
  • Data Protection;
  • QoS;
  • Programmable; and
  • Billing and Reporting.

They also make use of an Adaptive Policy Engine (multi-tenant, continuous, adaptive, policy engine), which offers:

  • Policy controlled virtual storage pools (buckets like Amazon);
  • Scale / reduce storage on demand;
  • Multi-tenanted with many application tenants on same infrastructure;
  • Dynamically adjust protection policies;
  • Optimise for small objects by policy; and
  • Cloud archiving by virtual pool.
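
Because the platform is natively S3 and the virtual storage pools behave like Amazon-style buckets, standard S3 tooling should work against a HyperStore endpoint. Here’s a hedged sketch using boto3; the endpoint URL and credentials are placeholders for whatever your own deployment provides, not Cloudian documentation.

```python
import boto3  # pip install boto3

# Hypothetical endpoint and credentials - substitute the values your
# HyperStore deployment hands out. A sketch only, not Cloudian docs.
s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.example.internal",  # assumption
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="backups")
s3.put_object(Bucket="backups", Key="hello.txt", Body=b"hello from on-prem S3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```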

 

Here’s a diagram of the logical architecture.

[Image: Cloudian logical architecture]

They use Cassandra as the core metadata and distribution mechanism. Why Cassandra? Well it’s

Scalable

  • Supports 1000s of nodes
  • Adds capacity by adding nodes to running system
  • Distributed shared-nothing P2P architecture, with no single point of failure

Reliable

  • Data durability, synced to disk
  • Resilient to network or hardware failures
  • Multi-DC replication
  • Tuneable data consistency level

Provides Features such as

  • Vnodes, TTL, secondary indexes, compression, encryption

Performant

  • Write path especially fast

Multiple data protection policies, including:

  • NoSQL DB, Replicas, Erasure Coding

Policy features

  • ACL, QoS, Tiering, versioning, etc.

vnodes

  • Vnodes are mapped to physical disks, so a single disk failure only affects the vnodes on that disk;
  • A maximum of 256 vnodes per physical node, with no token management – tokens are randomly assigned;
  • Parallel I/O across nodes;
  • Increased repair speed in case of disk or node failure; and
  • Allows heterogeneous machines in a cluster.
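
To illustrate the vnode idea, here’s a toy model of my own (not Cloudian’s or Cassandra’s implementation): tokens are assigned randomly, each physical disk carries a handful of vnodes, and a single disk failure only touches the vnodes mapped to that disk.

```python
import random
from collections import defaultdict

# Toy model of vnodes: random token assignment, many vnodes per disk.
random.seed(42)
ring = {}  # token -> (physical node, disk)
for node in ["node-a", "node-b", "node-c"]:
    for disk in range(4):
        for _ in range(16):            # a handful of vnodes per disk
            ring[random.getrandbits(32)] = (node, disk)

by_owner = defaultdict(list)
for token, owner in ring.items():
    by_owner[owner].append(token)

# A single disk failure only affects the vnodes mapped to that disk;
# repairs can then stream from many peers in parallel.
affected = len(by_owner[("node-a", 0)])
print(f"{affected} of {len(ring)} vnodes affected by one disk failure")
```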

 

Further Reading and Final Thoughts

If you’re doing a bit with cloud storage, I think these guys are worth checking out. I particularly like the use case for Cloudian deployed as an on-premises S3 cloud behind the firewall. There’s also a Community Edition available for download. You can use HyperStore Community Edition software for:

  • Product evaluation;
  • Testing HyperStore software features in a single or multi-node install; and
  • Building 10TB object storage systems free of charge.

I think that’s pretty neat. I also recommend checking out Keith’s preview of Cloudian.

 

Transporter Revisited – Part 2 – Testing and Final Thoughts

Disclaimer: I recently received a second Transporter (Individual) unit from Connected Data in Australia to review how synchronisation worked between individual units on a LAN and WAN, amongst other things. I provided my own hard drive for the second unit. Big thanks to Philippe from Connected Data in Australia for reaching out to me in the first place and Josh from Kayell for organising the unit to be sent my way.

I recently wrote about adding a second Transporter to my home network. This post covers the results of some of the scenarios I wanted to look at from a functionality perspective. The scenarios I worked through included:

  • Photo data sync between laptop / Transporter over a LAN and WAN;
  • Video data sync between Transporters over a LAN;
  • Sharing video files to a non-Transporter user; and
  • Accessing files using the mobile app on iOS.

 

Photo data sync

Photos were pretty easy to move around. For my test data I used a 450MB folder of photos of sneakers. I copied it to the Transporter and noticed within a minute that the Transporter client on my Mac was picking up the changes.


Once the folder was on the Transporter and syncing with my devices I then had a copy of the photos in a number of locations. Pretty simple stuff.

 

Video data sync

Let’s be clear – copying large files to the Transporter, even over a LAN connection, can be slow. There’s a lot getting in the way of this being a speedy operation, including the fact that the Transporter itself just isn’t a blazingly fast unit. So don’t waste your money putting in a flash drive, or think that this is going to be the right tool in a video rendering workflow – because I don’t think it is.

However, if you keep things simple, and count on stuff taking a little while, you can certainly do a bit with this unit. I copied about 40GB of files directly to the Transporter. It took close to an hour to complete, but I expected it would. While that was happening, a few other things happened. Firstly, the other Transporter on my LAN got the message that there was stuff on the Transporter that should be synchronised. Cool.
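
For context, here’s some rough arithmetic on that transfer, assuming “close to an hour” means roughly 60 minutes.

```python
# Back-of-the-envelope only: ~40GB copied in roughly an hour.
size_mb = 40 * 1024
seconds = 60 * 60
mb_per_s = size_mb / seconds
print(f"~{mb_per_s:.0f} MB/s, or ~{mb_per_s * 8:.0f} Mbit/s")  # ~11 MB/s, ~91 Mbit/s
```

That’s well under gigabit line rate, which supports the point above: the unit itself, rather than the LAN, is the limiting factor.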


 

Secondly, I could then decide to share the files on a limited basis, either via the Transporter application, or via SMB if I really wanted to. You can read the Transporter FAQ on SMB here.


Note that even when you turn on SMB on the unit, you won’t see the files until you share the folder in question.


Once you’ve done that, you’ll see the files via SMB.


 

Sharing video files

So now I have about 40GB of video files on my Transporters. What if I want to share those with someone who doesn’t have a Transporter? It’s pretty simple. I can right-click on the file I want to share, create a link, and then send links to people I want to share the files with. Note, however, that links are generated on a per-file basis. You’re better off just inviting people to access the shared folder.


Another cool thing you can do is control which Transporters will store the files you’re sharing.


I tested playback of the video over both Wi-Fi and a LAN connection. The video files were standard definition MPEG files in PAL (720 x 576) format running at about 4Mbps. They played back reasonably well, with only minor choppiness. Still, as far as a simple way to distribute a bunch of files goes, this is one of the easiest ways I’ve found to do it, particularly when it comes to sharing with people outside the network.

 

Accessing data over mobile

The Transporter mobile app is a snap to use, and works well on both the phone and iPad. I only tested the iOS version, so I can’t tell you how the other flavours behave.


You can do cool things like setting it to automatically upload photos. I can see that this is going to be a handy feature when I’m travelling and don’t necessarily have my Crashplan-protected laptop with me.


 

Final Thoughts

I gushed about Transporter when I first came across it at SFD7, and after testing the use of multiple units, I’m still a fan. For the most part, it does what it says on the box, and it’s a snap to set up. The key thing for me is the mobile access and the ability to securely share files with the outside world in a controlled fashion. I like that there’s a nod to SMB in there, and the ability to create read-only shares as required. I also like that my daughters (both of whom use their iPads heavily for school work) can easily access files at home and at school without bloody e-mailing them to me all the time. I’m giving it two thumbs up – it does what I need it to do. Obviously your mileage might vary.

 

 

Transporter Revisited – Part 1 – Introduction

Disclaimer: I recently received a second Transporter (Individual) unit from Connected Data in Australia to review how synchronisation worked between individual units on a LAN and WAN, amongst other things. I provided my own hard drive for the second unit. Big thanks to Philippe from Connected Data in Australia for reaching out to me in the first place and Josh from Kayell for organising the unit to be sent my way.

Firstly, if you’d like some background on Connected Data, check out my Storage Field Day 7 post here.

Secondly, the Transporter User Guide is the best resource to get started with the Transporter. Most operations with the Transporter are pretty simple, although the user guide provides some useful background on how and why things work the way they do.

My second Transporter arrived without a hard drive, so I went and bought a 1TB 2.5″ WD Blue drive [WD10JPVX] to use with it. I chose this model because it was the same as the one in my first unit, it was reasonably priced, and I’d had some good experiences with WD drives recently. The drive you choose is up to you, although going with an SSD will not improve the performance of the unit. You can find a list of the drive requirements here. Connected Data have also developed a useful video entitled “Transporter Hard Drive Installation Video”.

Once you’ve got your drive in, you’ll want to add the Transporter to your account. If you need assistance with this, the Quick Start Guide is a pretty handy place to start in my opinion.

Once you’re all set up, you can get to the interesting bit – sharing data between Transporters and other users. The first thing to understand is whether you want to store files only on your Transporter, or whether you want the files to sync to the machines you’ve installed the client on as well. The differences between the Transporter folder and Library are covered fairly comprehensively here. Broadly speaking, if I had some photos I wanted to keep a copy of on my Transporter, I’d probably copy them to the Transporter folder and have them synchronise with my laptop and any other Transporters in my control. If I wanted to copy GBs of video, for example, I’d probably store that in the Transporter Library. This would keep the files only on the Transporters, not my laptop as well. Note that the mobile application only downloads files as they’re accessed – it doesn’t automatically download files.
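
That rule of thumb can be summed up in a few lines. The function and parameter names below are my own, purely for illustration of the folder-versus-Library decision.

```python
# My own shorthand for the folder vs Library decision described above.
def suggest_destination(want_local_copies: bool) -> str:
    if want_local_copies:
        # Syncs to the Transporter *and* every machine running the client.
        return "Transporter folder"
    # Stays on the Transporters only; clients and the mobile app fetch on demand.
    return "Transporter Library"

print(suggest_destination(want_local_copies=True))   # photos
print(suggest_destination(want_local_copies=False))  # GBs of video
```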

Note also that I’m not super interested in performance from a synchronisation perspective, as I’m hamstrung on the WAN side by a pretty awful ADSL connection at my house. What I did want to cover, however, was a few of the different ways files could be accessed and moved around using these units. These are the scenarios I looked at testing:

  • Photo data sync between laptop / Transporter over a LAN and WAN;
  • Video data sync between Transporters over a LAN;
  • Sharing video files to a non-Transporter user; and
  • Accessing files using the mobile app on iOS.

It’s not super scientific, but I was looking at scenarios that I thought would be useful to me as a consumer. Note, also, that for the large data tests, I had the Transporter units and laptop sitting on the same gigabit LAN. In the next post I’ll be running through the results of the testing.

Storage Field Day 7 – Day 3 – Maxta

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Maxta presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Maxta website that covers some of what they presented.

 

Company Overview

Yoram Novick, CEO of Maxta, took us through a little of the company’s history and an overview of the products they offer.

Founded in 2009, Maxta “maximises the promise of hyper-convergence” through:

  • Choice;
  • Simplicity;
  • Scalability; and
  • Cost.

They currently offer a buzzword-compliant storage platform via their MxSP product, while producing hyper-converged appliances via the MaxDeploy platform. They’re funded by Andreessen Horowitz, Intel Capital, and Tenaya Capital amongst others and are seeking to “[a]lign the storage construct with the abstraction layer”. They do this through:

  • Dramatically simplified management;
  • “World class” VM-level data services;
  • Eliminating storage arrays and storage networking; and
  • Leveraging flash / disk and capacity optimization.

 

Solutions

MaxDeploy is Maxta’s Hyper-Converged Appliance, running on a combination of preconfigured servers and Maxta software. Maxta suggest you can go from zero to running VMs in 15 minutes. They offer peace of mind through:

  • Interoperability;
  • Ease of ordering and deployment; and
  • Predictability of performance.

MxSP is Maxta’s Software-Defined Storage product. Not surprisingly, it is software only, and offered via a perpetual license or via subscription. Like a number of SDS products, the benefits are as follows:

  • Flexibility
    • DIY – your choice in hardware
    • Works with existing infrastructure – no forklift upgrades
  • Full-featured
    • Enterprise class data services
    • Support latest and greatest technologies
  • Customised configuration for users
    • Major server vendors supported
    • Proposed configuration validated
    • Fulfilled by partners

 

Architecture

[Image: Maxta MaxDeploy architecture]

The Maxta Architecture is built around the following key features:

Data Services

  • Data integrity
  • Data protection / snapshots / clones
  • High availability
  • Capacity optimisation (thin / deduplication / compression)
  • Linear scalability

Simplicity

  • VM-centric
  • Tight integration with orchestration software / tools
  • Policy based management
  • Multi-hypervisor support (VMware, KVM, OpenStack integration)

What’s the value proposition?

  • Maximise choice – any server, hypervisor, storage, workload
  • Maximise IT simplicity – manage VMs, not storage
  • Maximise Cost Savings – standard components and capacity optimisation
  • Provide high levels of data resiliency, availability and protection

I get the impression that Maxta thought a bit about data layout, with the following points being critical to the story:

  • Cluster-wide capacity balancing
  • Favours placement of new data on new / under-utilised disks / nodes
  • Periodic rebalancing across disks / nodes
  • Proactive data relocation
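
As a rough illustration of the “favour under-utilised disks / nodes” idea above, here’s a simple placement heuristic. It’s my own sketch, not Maxta’s actual algorithm.

```python
# Illustrative placement heuristic only - not Maxta's actual code.
def place_replicas(node_utilisation: dict, copies: int = 2) -> list:
    """node_utilisation maps node name -> fraction of capacity used."""
    ranked = sorted(node_utilisation, key=node_utilisation.get)  # least used first
    return ranked[:copies]

cluster = {"node1": 0.72, "node2": 0.41, "node3": 0.55, "node4": 0.38}
print(place_replicas(cluster))  # ['node4', 'node2']
```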

 

Closing Thoughts and Further Reading

I like Maxta’s story. I like the two-pronged approach they’ve taken with their product set, and appreciate the level of thought they’ve put into their architecture. I have no idea how much this stuff costs, so I can’t say whether it represents good value or not, but on the basis of the presentation I saw I certainly think they’re worth looking at if you’re looking to get into either mega-converged appliances or buzzword-storage platforms. You should also check out Keith’s preview blog post on Maxta here, while Cormac did a great write-up late last year that is well worth checking out.

 

Storage Field Day 7 – Day 2 – Springpath

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Springpath presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Springpath website that covers some of what they presented.

 

Company Overview

Springpath (formerly StorVisor) came out of stealth in February, just before Storage Field Day 7.

Ravi Parthasarathy, VP of Product Management, presented an overview of the company.

Springpath is essentially storage software deployed on commodity hardware to provide an enterprise-class solution. It offers:

  • Enterprise-grade, scale-out capability;
  • Maximum simplicity; and
  • A completely software-based approach.

By enterprise grade, Springpath have focussed on:

  • Robustness, resiliency and data integrity
  • Data mirroring and automatic rebalancing
  • Flash / memory performance
  • Native, space efficient snapshots
  • VM / VVOL / File granularity
  • Inline deduplication and compression
  • Lower $/GB using high capacity 7.2K RPM drives

As for maximum simplicity, Springpath have aimed to:

  • Leverage existing mgmt tools
  • Provide for zero learning curve
  • No legacy storage complexity
  • Rapid provisioning of applications
  • Cloud based auto-support monitoring
  • Proactive alerts and rapid resolution

They also offer “software economics”:

  • Choose your (prescribed) servers
  • Choose your platform (VMware 5.5 and above – OpenStack and KVM will be offered in beta shortly)
  • Annual subscriptions, per server, including support
  • Any server, any capacity
  • Upgrade your servers without a “software tax”
  • Scale out compute or performance or capacity
  • Just-in-time scaling in small increments

Sounds pretty good so far.

 

Architecture

Here’s a photo of Mallik Mahalingam presenting. Mallik is one of the co-founders of Springpath, did a lot of work on I/O at VMware previously and is, in my opinion, an excellent table tennis player.

[Photo: Mallik Mahalingam presenting]

The Springpath Data Platform is:

  • 100% software;
  • Provides elastic scaling;
  • Enterprise grade; and
  • Integrates into existing management tools.

It is, ostensibly, data management and storage software on commodity hardware, without compromising features, scale or performance.

Springpath had the following design goals for the platform:

  • Scale out performance and capacity linearly;
  • Scale out the caching tier independently from the capacity tier, without losing data management features;
  • Leverage flash for performance and low speed hard disks for capacity;
  • Maximise utilisation of free space in flash or hard disks, when nodes appear / disappear in cluster;
  • Maximise space usage using inline compression and inline deduplication in all tiers;
  • Provide pointer-based file level snapshots and clones;
  • Support a variety of platforms (VMware, KVM, Hyper-V, Containers …); and
  • Leverage existing management applications and frameworks.

Springpath offers a scale out and distributed file system capability:

  • You can start with as few as 3 servers;
  • The software cluster installs in minutes;
  • Add servers, one or more at a time;
  • Distribute and rebalance data across servers automatically;
  • Retire older servers as required; and
  • Independent scaling of compute, cache or capacity.

 

The Springpath platform is built on the HALO Architecture – Hardware Agnostic Log-Structured Objects

[Image: Springpath HALO architecture]

Here’s the rough outline of the elements of the HALO architecture:

Data Access Layer

  • VMware
  • ESXi
  • NFS/VAAI/VVOL

The Springpath Data Platform offers (or will offer) support for:

  • KVM
  • NFS/Cinder/Nova/Glance
  • Hyper-V
  • SMB

Data Distribution

  • Avoid controller hotspots
  • Leverage cache across all SSDs in the cluster

Data Virtualisation – Caching

  • Striping across and within VMs
  • Take a stripe and route it to one of the cache vNodes
  • Wanted to “decouple the ability to serve the data from the location that you’re serving it from”
  • Rebalances cache on node addition or removal

Data Virtualisation

  • Write back caching to SSDs with mirroring
    • all writes to cache vNodes go to a write log on SSD
    • synchronously mirror one or two copies for HA
    • acknowledge after mirror writes are complete
  • Maximum write size is 64K
  • De-staging of write log (write log is currently 2GB)
    • writes are de-staged from write log to data and metadata vNodes
    • data and metadata are mirrored to one or two nodes for high availability
    • data can be de-staged to a local or different server based on available space
  • Uniform Space Utilisation
    • utilise free capacity when new nodes are added
    • faster rebuilds
  • Read caching
    • data is cached in both memory and SSD for reads
    • misses are fetched from HDDs from any node in the cluster
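
Pulling the write path above together, here’s a highly simplified sketch of the flow as described: write to the cache vNode’s SSD write log, mirror synchronously, acknowledge, and de-stage later. All of the names are mine; none of this is Springpath API.

```python
# Simplified sketch of the write-back flow above. Not Springpath code.
MAX_WRITE = 64 * 1024           # maximum write size noted above

class CacheVNode:
    def __init__(self, mirrors):
        self.write_log = []     # stands in for the 2GB SSD write log
        self.mirrors = mirrors  # one or two peers, per the HA description

    def write(self, offset, data):
        for start in range(0, len(data), MAX_WRITE):
            chunk = data[start:start + MAX_WRITE]
            self.write_log.append((offset + start, chunk))
            for peer in self.mirrors:                 # synchronous mirroring
                peer.write_log.append((offset + start, chunk))
        return "ack"            # acknowledged only after mirror writes complete

    def destage(self, capacity_tier: list):
        # Writes are later drained to data/metadata vNodes on HDD.
        capacity_tier.extend(self.write_log)
        self.write_log.clear()

peer = CacheVNode(mirrors=[])
primary = CacheVNode(mirrors=[peer])
print(primary.write(0, b"x" * (3 * MAX_WRITE)), len(peer.write_log))  # ack 3
```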

Data Optimisation

  • Inline dedupe and compression
    • inline, dedupe of memory, SSD and HDD
    • striping enables dedupe across files
    • inline compression on SSD and HDD
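
Here’s a toy illustration of the inline dedupe idea described above: identical chunks are stored once and referenced by their content hash. The chunk size and hash choice are my assumptions for the example, not Springpath’s actual values.

```python
import hashlib

# Toy inline dedupe: identical chunks are stored once, referenced by hash.
CHUNK = 4096
store = {}      # content hash -> chunk bytes
manifest = []   # ordered hashes describing the logical data

def ingest(data: bytes):
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha1(chunk).hexdigest()
        store.setdefault(digest, chunk)   # only new content consumes space
        manifest.append(digest)

ingest(b"A" * CHUNK * 3 + b"B" * CHUNK)   # three identical chunks dedupe to one
print(len(manifest), "chunks referenced,", len(store), "chunks stored")  # 4, 2
```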

Data Management

  • Native Snapshots
    • Pointer Based Snapshots – fast creations and deletions, no consolidation overhead
    • Fine-grained or coarse-grained – VM-level or VM folder level
    • VAAI / Cinder integrated – quiesced and crash-consistent
    • Use vCenter Snapshot Manager
    • Policy Based – schedules, retention period
  • Native Clones
    • pointer based writeable snapshots
    • VM-level
    • VAAI integrated
    • Batch version GUI – clone names, use customisation spec

 

Closing Thoughts and Further Reading

Springpath provided the following summary of their offering.

Technology

  • Log structured layout
  • Data virtualisation
  • Data distribution
  • Data services
  • Integrated management

Benefits

  • Flash endurance, compression friendly, faster rebuilds
  • Scale performance and capacity independently, eliminate hotspots
  • Granular scaling and rebalancing
  • Fast efficient snapshots and clones
  • Reduced management

I’m a fan of “software economics” when it’s done properly. I like what Springpath are doing and think they’re taking the right approach to buzzword storage offerings / software-defined storage. It remains to be seen whether they can make their way in what’s becoming a crowded hyper-converged space, but they seem to be making all the right noises. I recommend you check out Keith’s preview blog post on Springpath, as well as Cormac’s typically comprehensive write-up here.

 

Storage Field Day 7 – Day 1 – Primary Data

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Primary Data presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Primary Data website that covers some of what they presented.

Company Overview

Here’s a slightly wonky photo of Lance Smith providing the company overview.

[Photo: Lance Smith providing the company overview]

 

If you haven’t heard of Primary Data before, they came out of stealth in November. Their primary goal is “Automated data mobility through data virtualisation” with a focus on intelligent, policy-driven automation for storage environments. Some of the key principles driving the development of the product are:

  • Dynamic Data Mobility – see and place data across all storage resources within a single global data space;
  • Policy-driven agility – non-disruptive, policy-based data movement;
  • Intelligent automation – automated real-time infrastructure dynamically aligns supply and demand;
  • Linear scalability – performance and capacity scale linearly and incrementally; and
  • Global compatibility – a single hardware-agnostic solution enhances coexisting legacy IT and modern scale-out and hybrid cloud architectures.

 

Architecture

David Flynn then launched into an engaging whiteboard session on the Primary Data architecture.

[Photo: David Flynn’s whiteboard session on the Primary Data architecture]

With storage, you have three needs – Performance, Price, Protection (Fast, Safe, Cheap). As most of us know but few of us wish to admit, you can’t have all three at the same time. This isn’t going to change, according to David. Indeed, the current approach amounts to managing data via the storage container that holds it – “the tail wagging the dog”, as David put it.

So how does Primary Data get around the problem? Separate the metadata from the data.

Primary Data:

  • Uses pNFS client;
  • Offers file on file, on block, on object, on DAS;
  • Block as file;
  • Object; and
  • Splits the metadata and control path off to the side.
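
The core idea of splitting the metadata / control path from the data path can be sketched like this. The names are mine, not Primary Data’s; it simply shows a client asking “where is it?” once and then doing I/O directly against the target.

```python
# Illustrative only - the metadata/data split, not Primary Data's product.
class MetadataService:
    """Knows where data lives and what policy applies; never touches the data."""
    def __init__(self):
        self.placement = {}            # path -> storage target name

    def set_objective(self, path, target):
        self.placement[path] = target  # e.g. keep hot data on a flash tier

    def locate(self, path):
        return self.placement[path]

class Client:
    """Asks the metadata service once, then reads straight from the target."""
    def __init__(self, mds, targets):
        self.mds, self.targets = mds, targets

    def read(self, path):
        return self.targets[self.mds.locate(path)][path]

mds = MetadataService()
targets = {"flash": {"/db/redo.log": b"hot data"}, "object": {}}
mds.set_objective("/db/redo.log", "flash")
print(Client(mds, targets).read("/db/redo.log"))
```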

Primary Data also claim that 80% of IOPS to primary storage is to storage that doesn’t need to exist after a crash (temp, scratch, swap, etc).

David talked about how, when VMware first did virtualisation, there were a few phases:

1. Utilisation – This was the “doorknocker” use case that got people interested in virtualisation.

2. Manageability – this is what got people sticking with virtualisation.

Now along comes Primary Data, doing Data Virtualisation that also offers:

3. Performance.

Because, once you’ve virtualised the data, the problem becomes setting objectives for the storage based on the needs of the data. This is where Primary Data claim that their policy-based automation really helps organisations get the most from their storage platforms, and thus, their applications and data.

 

Closing Thoughts and Further Reading

Primary Data have some great pedigree and a lot of prior experience in the storage industry. There’s a lot more to the product than I’ve covered here, and it’s worth your while revisiting the video presentation they did at SFD7. They’ve taken an interesting approach, and I’m looking forward to hearing more about how it goes for them when they start shipping GA code (which they expect to do later this year).

 

Mark has a good write-up here, while Keith’s preview blog post is here and his excellent post-presentation discussion post can be found here.

Storage Field Day 7 – Day 1 – Kaminario

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Kaminario presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Kaminario website that covers some of what they presented.

 

Overview

Dani Golan, CEO of Kaminario, gave us a quick overview of the company. They’ve recently launched the 5th generation of their all-flash array (AFA), with the majority (80%) of customers being in the midrange (revenue of $100M – $5B).

The entry level for the solution is 20TB, with the average capacity being between 50 and 150TB. The largest implementation runs to 1.5PB.

Use cases are primarily:

  • VDI / Virtualisation;
  • Analytics; and
  • OLTP.

Kaminario state that they’re balanced across all verticals and offer general purpose storage.

 

Architecture

Kaminario state that architecture is key. I think we’re all agreed on that point. Kaminario’s design goals are to:

  • scale easily and cost-efficiently; and
  • provide the lowest overhead on the storage system to fulfil the customer’s needs.

Kaminario want to offer capacity, performance and flexibility. They do this by offering scale up and scale out.

Customers want somewhere in between best $/capacity and best $/performance.

The K2 basic building block (K-blocks, not 2K blocks) is:

  • Off the shelf hardware;
  • 2x K-nodes (1U server);
  • Infiniband;
  • SSD Shelf (24 SSDs – 2RU); and
  • SSD expansion shelf (24 SSDs – 2RU).

Here’s a diagram of the K2 scale up model.

[Image: K2 scale-up model]

And here’s what it looks like when you scale out.

[Image: K2 scale-out model]

I want to do both! Sure, here’s what scale up and out looks like.

[Image: K2 scale-up and scale-out]

In the K2 scale-out architecture:

  • Data is spread across all nodes;
  • Metadata is spread across all nodes;
  • Provides the ability to mix and match different generations of servers and SSDs;
  • Offers global deduplication; and
  • Provides resiliency for multiple simultaneous failures.

Data is protected against block (nodes and storage) failure, but the system will go down to secure the data.

As for metadata scalability, modern data reduction means fine-grained metadata:

  • A pointer per 4KB of addressable capacity; and
  • A signature per 4KB of unique data.
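
Some rough numbers show why that granularity matters. The pointer and signature sizes below are my assumptions, purely to illustrate the scale involved.

```python
# Back-of-the-envelope metadata sizing at 4KB granularity. Assumed sizes.
TB = 1024 ** 4
addressable = 100 * TB                      # example system capacity
pointer_bytes = 8                           # assumed pointer size
signature_bytes = 8                         # assumed weak-hash size

pointers = addressable // 4096              # one pointer per 4KB addressable
unique_blocks = addressable // 2 // 4096    # assume 2:1 dedupe
metadata_gb = (pointers * pointer_bytes + unique_blocks * signature_bytes) / 1024 ** 3
print(f"~{metadata_gb:.0f} GB of metadata")  # ~300 GB for 100TB addressable
```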

According to Kaminario, reducing the metadata footprint is crucial.

  • The adaptive block size architecture means fewer pointers;
  • Deduplication with weak hash reduces signature footprint; and
  • Density per node is critical.

K-RAID

[Image: Kaminario K-RAID layout]

K-RAID is Kaminario’s interpretation of RAID 6, and works thusly:

  • 2P + Q – 2 R5 groups, single Q parity on them;
  • Fully rotating, RAID is fully balanced;
  • Fully automatic, no manual configuration; and
  • High utilisation (87.5%), no dedicated spares.
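
The 87.5% figure lines up with the 24-SSD shelf mentioned earlier, if you count the two P parities (one per RAID 5 group) plus the single Q as three drives’ worth of overhead. A quick check of that arithmetic, under that assumption:

```python
# Rough arithmetic on K-RAID utilisation, assuming a 24-SSD shelf.
ssds_per_shelf = 24
parity_equivalent = 2 + 1                  # 2 x P (one per R5 group) + 1 x Q
usable_fraction = (ssds_per_shelf - parity_equivalent) / ssds_per_shelf
print(f"{usable_fraction:.1%}")            # 87.5%
```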

The K2 architecture also offers the following data reduction technologies:

Deduplication

  • Global and adaptive;
  • Selective – can be turned off per volume; and
  • Weak hash and compare – low metadata and CPU footprint, fits well with flash.

Compression

  • Byte-aligned compression;
  • Adaptive block size – large chunks are stored contiguously, each 4k compressed separately;
  • Standard LZ4 algorithm; and
  • Optimized zero elimination.
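
As a small illustration of the “each 4K compressed separately” point, here’s a sketch using the standard LZ4 algorithm via the python-lz4 package. The chunking and sample data are mine; this isn’t Kaminario code.

```python
import lz4.frame  # pip install lz4

# Compress each 4KB piece on its own, so a small read only has to
# decompress a small piece. Illustration only, not Kaminario code.
CHUNK = 4096
data = b"repetitive sample data " * 800
pieces = [lz4.frame.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
print(f"{len(data)} bytes in, {sum(len(p) for p in pieces)} bytes out "
      f"across {len(pieces)} independently compressed pieces")
```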

From a resiliency perspective, K2 supports:

  • Two concurrent SSD failures per shelf;
  • Consistent, predictable and high performance under failure; and
  • Fast SSD firmware upgrades.

The architecture currently scales to 8 K-Blocks, with the sweet spot being around 2 – 4 K-Blocks. I strongly recommend you check out the Kaminario architecture white paper – it’s actually very informative.

 

Final Thoughts and Further Reading

I first came across Kaminario at VMworld last year, and I liked what they had to say. Their presentation at SFD7 backs that up for me, along with the reading I’ve done and the conversations I’ve had with people from the company. I like the approach, but I think they have a bit of an uphill battle to crack what seems to be a fairly congested AFA market. With a little bit more marketing, they might yet get there. Yes, I said more marketing. While we all like to criticise the marketing of products by IT vendors, I think it’s still a fairly critical piece of the overall solution puzzle, particularly when it comes to getting in front of customers who want to spend money. But that’s just my view. In any case, Enrico did a great write-up on Kaminario – you can read it here. I also recommend checking out Keith’s preview blog of Kaminario.

Storage Field Day 7 – Day 1 – Catalogic Software

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Catalogic Software presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Catalogic Software website that covers some of what they presented.

 

Overview

According to their website, “ECX is an intelligent copy data management [IDM] platform that allows you to manage, orchestrate and analyze your copy data lifecycle across your enterprise and cloud”. If you’ve ever delivered storage in an enterprise environment before, you’ll understand that copy data management (CDM) is something that can have a significant impact on your infrastructure, and it’s not always something people do well, or even understand.

Ed Walls, CEO of Catalogic, talked a bit about current challenges – growth, manageability and business agility. We’re drowning in a deluge of copy data, with most of these copies sitting completely idle. This observation certainly aligns with my experience in a number of environments.

Catalogic’s IDM is a combination of your storage (currently only NetApp) and a CDM platform (provided via an agentless, downloadable VM). You can use this platform to provide “copy data leverage”, enabling orchestration and automation of your copy data. Catalogic also state that this enables you to:

  • Simplify business processes with ‘copy data’ / ‘use data’ workflows;
  • Extract more value from your copy data services;
  • Provide protection compliance / snapshots; and
  • File analytics / Search, Report and Analyse.

In addition to this, Catalogic spoke about ECX’s ability to provide:

  • Next-generation Data protection, with instant recovery and disaster recovery leveraging snap data;
  • Killer App for Hybrid Cloud, enabling business to leverage cloud “scale and economics”; and
  • Copy Data Analytics with snapshots, file analytics, protection compliance. This gives you the ability to search, report and analyse.

It’s not in-line, but rather uses public APIs to orchestrate. In this scenario, tape’s not dead; it’s just not used for operational recovery. You can use it for archive instead.

 

Architecture

The basic architecture is as follows:

  • Layer 0 – OS Services (Linux)
  • Layer 1 – Core Services – NoSQL (MongoDB) amongst them, scheduler, reporting, directory, licence management, index search, web, Java / REST, DBMS (PostgreSQL), messaging
  • ECX MGMT REST APIs
  • Layer 2 – Management Services – account, policy, job, catalog, report, resource, event, alert, provision, search
  • Layer 3 – Policy-based Services – NTAP catalog, VMware catalog, NTAP CDM, VMware CDM
  • HTTPS
  • Layer 4 – Presentation Services

Here’s a picture that takes those dot points, and adds visualisation.

[Image: Catalogic ECX architecture]

 

Demo

Catalogic went through a live demo with us, and it *looks* reasonably straightforward. A few things to note:

  • Configure – uses a provider model (one-time registration process for the NTAP controller or VMware)
  • ECX is an abstraction layer – workflow, notification, submit
  • Uses a site-based model
  • You can have a VMs and Templates or Datastore view

 

[Image: Catalogic ECX demo]

 

  • VM snapshots are quiesced sequentially
  • Creating trees of snapshots via workflow
  • Everything is driven via REST API
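
Since everything is driven via the REST API, a scripted workflow would look roughly like the sketch below. The endpoint paths and payloads are purely hypothetical placeholders meant to show the register-a-provider-then-submit-a-job shape of the flow; they are not Catalogic’s documented API.

```python
import requests

BASE = "https://ecx.example.internal/api"   # hypothetical appliance URL
session = requests.Session()
session.verify = False                      # lab appliance with a self-signed cert

# Hypothetical endpoints and payloads - not Catalogic's documented API.
login = session.post(f"{BASE}/login", json={"user": "admin", "password": "***"})
session.headers["Authorization"] = f"Bearer {login.json().get('token', '')}"

# One-time provider registration (the provider model mentioned above),
# then submit a policy-driven job.
session.post(f"{BASE}/providers", json={"type": "netapp", "host": "ntap01"})
job = session.post(f"{BASE}/jobs", json={"policy": "daily-snap", "action": "start"})
print(job.status_code)
```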

Is it a replacement for backup? No. But businesses are struggling with traditional backup and recovery methods, and the combination of snapshots and tape is appealing for some people. As Catalogic put it, it “doesn’t replace it, but reduces the dependency on backups”.

In my opinion, searching the catalogue is pretty cool. They don’t crack open the VMDK to catalogue yet, but it’s been requested by a lot of people and is on their radar.

 

Final Thoughts and Further Reading

There’s a lot to like about ECX in my opinion, although a number of delegates (myself included) were mildly disappointed that this is currently tied to NetApp. Catalogic, in their defence, are well aware of this as a limitation and are working really hard to broaden the storage platform support.

The cataloguing capability of the product looked great in the demo I saw, and I know I have a few customers who could benefit from a different approach to CDM. Or, more accurately, who would be better off if they had any approach at all.

Keith had some interesting thoughts on CDM as a potential precursor to data virtualisation here, as well as a preview post here – both of which are worth checking out.