Random Short Take #22

Oh look, another semi-regular listicle of random news items that might be of some interest.

  • I was at Pure Storage’s //Accelerate conference last week, and heard a lot of interesting news. This piece from Chris M. Evans on FlashArray//C was particularly insightful.
  • Storage Field Day 18 was a little while ago, but that doesn’t mean that the things that were presented there are no longer of interest. Stephen Foskett wrote a great piece on IBM’s approach to data protection with Spectrum Protect Plus that’s worth a read.
  • Speaking of data protection, it’s not just for big computers. Preston wrote a great article on the iOS recovery process that you can read here. As someone who had to recently recover my phone, I agree entirely with the idea that re-downloading apps from the app store is not a recovery process.
  • NetApp were recently named a leader in the Gartner Magic Quadrant for Primary Storage. Say what you will about the MQ, a lot of folks are still reading this report and using it to help drive their decision-making activities. You can grab a copy of the report from NetApp here. Speaking of NetApp, I’m happy to announce that I’m now a member of the NetApp A-Team. I’m looking forward to doing a lot more with NetApp in terms of both my day job and the blog.
  • Tom has been on a roll lately, and this article on IT hero culture, and this one on celebrity keynote speakers, both made for great reading.
  • VMworld US was a little while ago, but Anthony‘s wrap-up post had some great content, particularly if you’re working a lot with Veeam.
  • WekaIO have just announced some work they’re doing with the Aiden Lab at the Baylor College of Medicine that looks pretty cool.
  • Speaking of analyst firms, this article from Justin over at Forbes brought up some good points about these reports and how some of them are delivered.

Storage Field Day 18 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

This is a quick post to say thanks once again to Stephen and Ben, and the presenters at Storage Field Day 18. I had a super fun and educational time. For easy reference, here’s a list of the posts I did covering the events (they may not match the order of the presentations).

Storage Field Day – I’ll Be At Storage Field Day 18

Storage Field Day 18 – Day 0

Storage Field Day 18 – (Fairly) Full Disclosure

Cohesity Is (Data)Locked In

NetApp And The Space In Between

StorPool And The Death of Hardware-Defined Storage

IBM Spectrum Protect Plus – More Than Meets The Eye

Western Digital Are Keeping Composed

VAST Data – No More Tiers Means No More Tears?

WekaIO Continues To Evolve

Datera and the Rise of Enterprise Software-Defined Storage

 

Also, here’s a number of links to posts by my fellow delegates (in no particular order). They’re all very smart people, and you should check out their stuff, particularly if you haven’t before. I’ll attempt to keep this updated as more posts are published. But if it gets stale, the Storage Field Day 18 landing page will have updated links.

 

Becky Elliott (@BeckyLElliott)

California Dreamin’ My Way to Storage Field Day 18

A VAST-ly Different Storage Story

 

Chin-Fah Heoh (@StorageGaga)

A Storage Field 18 I will go – for the fun of it

VAST Data must be something special

Catch up (fast) – IBM Spectrum Protect Plus

Clever Cohesity

Storpool – Block storage managed well

Bridges to the clouds and more – NetApp NDAS

WekaIO controls their performance destiny

The full force of Western Digital

 

Chris M Evans (@ChrisMEvans)

Podcast #3 – Chris & Matt review the SFD18 presenters

Exploiting secondary data with NDAS from NetApp

VAST Data launches with new scale-out storage platform

Can the WekaIO Matrix file system be faster than DAS?

#91 – Storage Field Day 18 in Review

 

Erik Ableson (@EAbleson)

SFD18-Western Digital

Vast Data at Storage Field Day 18

 

Ray Lucchesi (@RayLucchesi)

StorPool, fast storage for fast times

For data that never rests, NetApp NDAS

 

Jon Klaus (@JonKlaus)

My brain will be melting at Storage Field Day 18!

Faster and bigger SSDs enable us to talk about something else than IOps

How To: Clone Windows 10 from SATA SSD to M.2 SSD (& fix inaccessible boot device)

The fast WekaIO file system saves you money!

Put all your data on flash with VAST Data

 

Enrico Signoretti (@ESignoretti)

A Packed Field Day

Democratizing Data Management

How IBM is rethinking its data protection line-up

NetApp, cloudier than ever

Voices in Data Storage – Episode 10: A Conversation with Boyan Ivanov

Voices in Data Storage – Episode 11: A Conversation with Renen Hallak

Voices in Data Storage – Episode 12: A Conversation with Bill Borsari

 

Josh De Jong (@EuroBrew)

 

Matthew Leib (@MBLeib)

I Am So Looking Forward to #SFD18

#SFD18 introduces us to VAST Data

Dual Actuator drives: An interesting trend

Weka.IO and my first official briefing

Cohesity: More on the real value of data

 

Max Mortillaro (@DarkkAvenger)

Storage Field Day 18 – It’s As Intense As Storage Field Day Gets

Storage Field Day 18 – Fifty Shades of Disclosure

Cohesity – The Gold Standard in Data Management

EP17 – Storpool: Being the best in Block Based storage – with Boyan Ivanov

Developing Data Protection Solutions in the Era of Data Management

Western Digital : Innovation in 3D NAND and Low Latency Flash NAND

 

Paul L. Woodward Jr (@ExploreVM)

Storage Field Day 18, Here I Come!

 

[photo courtesy of Stephen Foskett]

Datera and the Rise of Enterprise Software-Defined Storage

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Datera recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.

 

Enterprise Software-Defined Storage

Datera position themselves as delivering “Enterprise Software-Defined Storage”. But what does that really mean? Enterprise IT gives you:

  • High Performance
  • Enterprise Features
    • QoS
    • Fault Domains
    • Stretched Cluster
    • L3 Networking
    • Deduplication
    • Replication
  • HA
  • Resiliency

Software-defined storage gives you:

  • Automation
  • DC Awareness Agility
  • Continuous Availability
  • Targeted Data Placement
  • Continuous Optimisation
  • Rapid technology adoption

Combine both of these and you get Datera.

[image courtesy of Datera]

 

Why Datera?

There are some other features built in to the platform that differentiate Datera’s offering, including:

  • L3 Networking – Datera brings standard protocols with modern networking to data centre storage. Resources are designed to float to allow for agility, availability, and scalability.
  • Policy-based Operations – Datera was built from day 1 with policy controls and policy templates to ease operations at scale while maintaining agility and availability.
  • Targeted Data Placement – ensures data is distributed correctly across the physical infrastructure to meet policies around performance, availability, and data protection while controlling cost (a rough sketch of what such a policy might look like follows this list).
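Datera’s actual policy engine and APIs are their own, so treat the following as nothing more than a hedged illustration of the general idea: a storage policy is declared as data (replica count, media class, a QoS floor), and a placement function picks nodes that satisfy it while spreading copies across fault domains. All of the names and fields below are hypothetical.

```python
# Hypothetical sketch of policy-driven placement (illustrative only, not Datera's API).
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    replicas: int      # number of copies to keep
    media: str         # "flash" or "hybrid"
    min_iops: int      # QoS floor a node must be able to honour

@dataclass
class Node:
    name: str
    media: str
    free_iops: int
    fault_domain: str

def place_volume(policy: StoragePolicy, nodes: list[Node]) -> list[Node]:
    """Pick nodes that satisfy the policy, one replica per fault domain."""
    candidates = [n for n in nodes if n.media == policy.media and n.free_iops >= policy.min_iops]
    chosen, used_domains = [], set()
    for node in sorted(candidates, key=lambda n: -n.free_iops):
        if node.fault_domain not in used_domains:
            chosen.append(node)
            used_domains.add(node.fault_domain)
        if len(chosen) == policy.replicas:
            return chosen
    raise RuntimeError("policy cannot be satisfied with the available nodes")

gold = StoragePolicy(replicas=3, media="flash", min_iops=10_000)
cluster = [
    Node("node-a", "flash", 50_000, "rack-1"),
    Node("node-b", "flash", 30_000, "rack-2"),
    Node("node-c", "flash", 20_000, "rack-3"),
    Node("node-d", "hybrid", 80_000, "rack-1"),
]
print([n.name for n in place_volume(gold, cluster)])  # ['node-a', 'node-b', 'node-c']
```

The point isn’t the placement logic itself; it’s that the desired outcome lives in the policy rather than in a runbook, which is what lets this sort of platform keep honouring intent as hardware comes and goes.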

 

Thoughts and Further Reading

I’ve waxed lyrical about Datera’s intent-based approach previously. I like the idea that they’re positioning themselves as “Enterprise SDS”. While my day job is now at a service provider, I spent a lot of time in enterprise shops getting crusty applications to keep on running, as best as they could, on equally crusty storage arrays. Something like Datera comes along with a cool hybrid storage approach and the enterprise guys get a little nervous. They want replication, they want resiliency, they want to apply QoS policies to it.

The software-defined data centre is the darling architecture of the private cloud world. Everyone wants to work with infrastructure that can be easily automated, highly available, and extremely scalable. Historically, some of these features have flown in the face of what the enterprise wants: stability, performance, resiliency. The enterprise guys aren’t super keen on updating platforms in the middle of the day. They want to buy multiples of infrastructure components. And they want multiple sets of infrastructure protecting applications. They aren’t that far away from those software-defined folks in any case.

The ability to combine continuous optimisation with high availability is a neat part of Datera’s value proposition. Like a number of software-defined storage solutions, the ability to rapidly iterate new features within the platform, while maintaining that “enterprise” feel in terms of stability and resiliency, is a pretty cool thing. Datera are working hard to bring the best of both worlds together, and managing to deliver the agility that enterprise wants, while maintaining the availability within the infrastructure that they crave.

I’ve spoken at length before about the brutally slow pace of working in some enterprise storage shops. Operations staff are constantly being handed steamers from under-resourced or inexperienced project delivery staff. Change management people are crippling the pace. And the CIO wants to know why you’ve not moved your SQL 2005 environment to AWS. There are some very good reasons why things work the way they do (and also some very bad ones), and innovation can be painfully hard to make happen in these environments. The private cloud kids, on the other hand, are all in on the fast paced, fail fast, software-defined life. They’ve theoretically got it all humming along without a whole lot of involvement on a daily basis. Sure, they’re living on the edge (do I sound old and curmudgeonly yet?). In my opinion, Datera are doing a pretty decent job of bringing these two worlds together. I’m looking forward to seeing what they do in the next 12 months to progress that endeavour.

WekaIO Continues To Evolve

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

WekaIO recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here. I’ve written about WekaIO before, and you can read those posts here and here.

 

WekaIO

Barbara Murphy described WekaIO Matrix as “the fastest, most scalable parallel file system for AI and technical compute workloads that ensure applications never wait for data”.

 

What They Do

So what exactly does WekaIO Matrix do?

  • WekaIO Matrix is a software-defined storage solution that runs on bare metal, VMs, or containers, on-premises or in the cloud;
  • Fully-coherent POSIX file system that’s faster than a local file system;
  • Distributed Coding, More Resilient at Scale, Fast Rebuilds, End-to-End Data Protection; and
  • InfiniBand or Ethernet, Converged or Dedicated, on-premises or cloud.

[image courtesy of WekaIO]

 

Lots of Features

WekaIO Matrix now has a bunch of features, including:

  • Support for S3, SMB, and NFS protocols;
  • Cloud backup, Snapshots, Clones, and Snap-2-Obj;
  • Active Directory support and authentication;
  • POSIX;
  • Network High Availability;
  • Encryption;
  • Quotas;
  • HDFS; and
  • Tiering.

Flexible deployment models

  • Appliance model – compute and storage on separate infrastructure; and
  • Converged model – compute and storage on shared infrastructure.

Both models are cloud native because “[e]verybody wants the ability to be able to move to the cloud, or leverage the cloud”.

 

Architectural Considerations

WekaIO is focused on delivering super fast storage via NVMe-oF, and say that NFS and SMB deliver legacy protocol support for convenience.

The Front-End

WekaIO front-ends are cluster-aware

  • Incoming read requests are optimised based on data location and load conditions – incoming writes can go anywhere
  • Metadata fully distributed
  • No redirects required

SR-IOV optimises network access, and WekaIO accesses NVMe Flash directly

  • Bypassing the kernel leads to better performance.

The Back-End

The WekaIO parallel clustered file system offers:

  • Optimised flash-native data placement
    • Not designed for HDD
    • No “cylinder groups” or other HDD-era anachronisms – data protection is similar to erasure coding (EC)
    • 3-16 data drives, +2 or +4 parity drives
    • Optional hot spares – uses a “virtual” hot spare

Global namespace = hot tier + Object storage tier

  • Tiering to S3-API Object storage
    • Additional capacity with lower cost per GB
    • Files are shared to the object storage layer (parallelised access optimises performance and simplifies partial or offset reads)

WekaIO uses the S3-API as its equivalent of “SCSI” for HDD.
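WekaIO didn’t go too deep into the tiering mechanics, but the “parallelised access simplifies partial or offset reads” point is easy to illustrate with plain S3 semantics: the S3 API supports ranged GETs, so an offset read of a tiered file only needs to pull back the relevant byte range rather than rehydrating the whole file. A minimal boto3 sketch, with the bucket and key names made up for the example:

```python
# Illustration of offset reads against an S3-compatible object tier (not WekaIO's code).
import boto3

s3 = boto3.client("s3")  # assumes credentials and endpoint are configured in the environment

def read_offset(bucket: str, key: str, offset: int, length: int) -> bytes:
    """Fetch only the requested byte range of an object via an HTTP Range header."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={offset}-{offset + length - 1}",
    )
    return resp["Body"].read()

# e.g. read 1MiB starting 4GiB into a tiered file, without pulling the whole thing back
chunk = read_offset("example-weka-tier", "datasets/genome.bam", offset=4 << 30, length=1 << 20)
print(len(chunk))
```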

 

Conclusion and Further Reading

I like the WekaIO story. They take away a lot of the overheads associated with non-DAS storage through the use of a file system and control of the hardware. You can make DAS run really fast, but it’s invariably limited to the box that it’s in. Scale-out pools of storage still have a place, particularly in the enterprise, and WekaIO are demonstrating that the performance is there for the applications that need it. There’s a good story in terms of scale, performance, and enterprise resilience features.

Perhaps you like what you see with WekaIO Matrix but don’t want to run stuff on-premises? There’s a good story to be had with Matrix on AWS as well. You’ll be able to get some serious performance, and chances are it will fit in nicely with your cloud-native application workflow.

WekaIO continues to evolve, and I like seeing the progress they’ve been making to this point. It’s not always easy to convince the DAS folks that you can deliver a massively parallel file system and storage solution based on commodity hardware, but WekaIO are giving it a real shake. I recommend checking out Chris M. Evans’s take on WekaIO as well.

VAST Data – No More Tiers Means No More Tears?

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

VAST Data recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.

 

VAST Enough?

VAST Data have a solution that basically offers massive scale with Tier 1 performance, without the cost traditionally associated with Tier 1 storage.

Foundational Pieces

Some of the key pieces of the solution are technologies that weren’t commonly available until recently, including:

  • NVMe-oF – DC-scale storage protocol that enables remote NVMe devices to be accessed with direct attached performance.
  • QLC Flash – A new Flash architecture that costs less than enterprise Flash while delivering enterprise levels of performance.
  • Storage Class Memory – Persistent, NVMe memory that can be used to reliably buffer perfect writes to QLC and create large, global metadata structures to enable added efficiency.

If you read their blog post, you’ll notice that there are some interesting ideas behind the VAST Data solution, including the ideas that:

  • Flash is the only media that can be used to bring the cost of storage under what people pay today for HDD-based systems.
  • NFS and S3 can be used for applications that up until now required a level of performance that could only come from block storage.
  • Low-endurance QLC flash can be used for even the most transactional of workloads.
  • Storage computing can be disaggregated from storage media to enable greater simplicity than shared-nothing and hyper-converged architectures.
  • Data protection codes can reduce overhead to only 2% while enabling levels of resiliency 10 orders of magnitude more than classic RAID.
  • Compressed files provide evidence that data can be reduced further when viewed on a global scale.
  • Parallel storage architectures can be built without any amount of code parallelism.
  • Customers can build shared storage architectures that can compose and assign dedicated performance and security isolation to tenants on the fly.
  • One well-engineered, scalable storage system can be ‘universal’ and can enable a diverse array of workloads and requirements.

Architecture

[image courtesy of VAST Data]

  • VAST Servers – A cluster can be built with 2 to 10,000 stateless servers. Servers can be collocated with applications as containers and made to auto-scale with application demand.
  • NVMe Fabric – A scalable, shared-everything cluster can be built by connecting every server and device in the cluster over commodity data center networks (Ethernet or InfiniBand).
  • NVMe Enclosures – Highly-Available NVMe Enclosures manage over one usable PB per RU. Enclosures can be scaled independent of Servers and clusters can be built to manage exabytes.

Rapid Rebuild Encoding

VAST codes accelerate rebuild speed by using a new type of algorithm that gets faster with more redundancy data. Everything is fail-in-place.

  • 150+4: 3x faster than HDD erasure rebuilds, 2.7% overhead
  • 500+10: 2x faster than HDD erasure rebuilds, 2% overhead

Additional redundancy enables an MTBF of over 100,000 years at scale.

Read more about that here.
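As a quick sanity check on those overhead figures, protection overhead in a wide stripe is simply parity divided by data, which is why stripes this wide can get away with 2–3% overhead where a traditional 4+2 erasure code costs 50%. A few lines of Python reproduce the numbers:

```python
# Back-of-the-envelope stripe overhead: parity drives divided by data drives.
def overhead(data_drives: int, parity_drives: int) -> float:
    return parity_drives / data_drives

for d, p in [(150, 4), (500, 10), (4, 2)]:
    print(f"{d}+{p}: {overhead(d, p):.1%} overhead")

# 150+4: 2.7% overhead
# 500+10: 2.0% overhead
# 4+2: 50.0% overhead   <- a traditional narrow erasure code, for comparison
```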

Global Data Reduction

  • Data is fingerprinted in large blocks after the write is persisted in SCM
  • Fingerprints are compared to measure relative distance, similar chunks are clustered
  • Clustered data is compressed together; byte-level deltas are extracted & stored

Read more about that here.
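VAST haven’t published the gory details of their similarity hashing, so treat the following as a generic sketch of the idea described above rather than their algorithm: fingerprint each chunk, group chunks whose fingerprints collide, then store one compressed reference chunk per group plus a compressed byte-level delta for everything else in the group.

```python
# Generic sketch of similarity-based reduction (illustrative only, not VAST's algorithm).
import zlib

def fingerprint(chunk: bytes) -> int:
    """A deliberately weak 'similarity' fingerprint: CRC of every 64th byte,
    so near-identical chunks tend to land in the same group."""
    return zlib.crc32(chunk[::64])

def reduce_chunks(chunks: list[bytes]) -> dict:
    groups: dict[int, list[bytes]] = {}
    for c in chunks:
        groups.setdefault(fingerprint(c), []).append(c)
    stored = 0
    for members in groups.values():
        reference = members[0]
        stored += len(zlib.compress(reference))      # one compressed reference chunk per group
        for other in members[1:]:
            # Approximate a byte-level delta by compressing the XOR against the
            # reference; if the chunks really are similar this is mostly zeros.
            delta = bytes(a ^ b for a, b in zip(reference, other))
            stored += len(zlib.compress(delta))
    return {"raw_bytes": sum(map(len, chunks)), "stored_bytes": stored}

base = bytes(range(256)) * 128               # a 32KiB chunk
similar = base[:-16] + b"\x00" * 16          # a nearly identical chunk
print(reduce_chunks([base, similar, base]))  # stored_bytes ends up a fraction of raw_bytes
```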

Deployment Options

  • Full Appliance – VAST-provided turn-key appliance
  • Software-Defined – enclosures and container software
  • Software-only – run VAST SW on certified QLC hardware

 

Specifications

The storage is the VAST DF-5615 Active / Active NVMe Enclosure.

[image courtesy of VAST Data]

 

  • I/O Modules: 2 x Active/Active I/O Modules
  • I/O Connectivity: 4 x 100Gb Ethernet or 4 x 100Gb InfiniBand
  • Management (optional): 4 x 1GbE
  • NVMe Flash Storage: 44 x 15.36TB QLC Flash
  • NVMe Persistent Memory: 12 x 1.5TB U.2 Devices
  • Dimensions (without cable mgmt.): 2U Rackmount – H: 3.2”, W: 17.6”, D: 37.4”
  • Weight: 85 lbs.
  • Power Supplies: 4 x 1500W
  • Power Consumption: 1200W Avg / 1450W Max
  • Maximum Scale: Up to 1,000 Enclosures

 

Compute is housed in the VAST Quad Server Chassis.

[image courtesy of VAST Data]

 

  • Servers: 4 x Stateless VAST Servers
  • I/O Connectivity: 8 x 50Gb Ethernet or 4 x 100Gb InfiniBand
  • Management (optional): 4 x 1GbE
  • Physical CPU Cores: 80 x 2.4 GHz
  • Memory: 32 x 32GB 2400 MHz RDIMM
  • Dimensions: 2U Rackmount – H: 3.42”, W: 17.24”, D: 28.86”
  • Weight: 78 lbs.
  • Power Supplies: 2 x 1600W
  • Power Consumption: 750W Avg / 900W Max
  • Maximum Scale: Up to 10,000 VAST Servers

 

Thoughts And Other Reading

One of my favourite things about the VAST Data story is the fact that they’re all in on a greenfield approach to storage architecture. Their ace in the hole is that they’re leveraging Persistent Memory, QLC and NVMe-oF to make it all work. Coupled with the disaggregated shared everything architecture, this seems to me like a fresh approach to storage. There are also some flexible options available for deployment. I haven’t seen what the commercials look like for this solution, so I can’t put my hand on my heart and tell you that this will be cheaper than a mechanical drive based solution. That said, the folks working at VAST have some good experience with doing smart things with Flash, and if anyone can make this work, they can. I look forward to reading more about VAST Data, particularly when they get some more customers that can publicly talk about what they’re doing. It also helps that my friend Howard has joined the company. In my opinion that says a lot about what they have to offer.

VAST Data have published a reasonably comprehensive overview of their solution that can be found here. There’s also a good overview of VAST Data by Chris Mellor that you can read here. You can also read more from Chris here, and here. Glenn K. Lockwood provides one of the best overviews of VAST Data, which you can read here.

Western Digital Are Keeping Composed

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Western Digital recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.

 

Getting Composed

Scott Hamilton (Senior Director, Product Management) spoke to the delegates about Western Digital’s vision for composable infrastructure. I’m the first to admit that I haven’t really paid enough attention to composability in the recent past, although I do know that it messes with my computer’s spell check mechanism – so it must be new and disruptive.

There’s Work To Be Done

Hamilton spoke a little about the increasingly dynamic workloads in the DC, with a recent study showing that:

  • 45% of compute hours and storage capacity are utilised
  • 70% report inefficiencies in the time required to provision compute and storage resources

There are clearly greater demands on:

  • Scalability
  • Efficiency
  • Agility
  • Performance

Path to Composability

I remember a few years ago when I was presenting to customers about hyper-converged solutions. I’d talk about the path to HCI, with build it yourself being the first step, followed by converged, and then hyper-converged. The path to Composable is similar, with converged, and hyper-converged being the precursor architectures in the modern DC.

Converged

  • Preconfigured hardware / software for a specific application and workload (think EMC Vblock or NetApp FlexPod)

Hyper-Converged

  • Software-defined with deeper levels of abstraction and automation (think Nutanix or EMC’s VxRail)

Composable

  • Disaggregated compute and storage resources
  • Shared pool of resources that can be composed and made available on demand

[image courtesy of Western Digital]

The idea is that you have a bunch of disaggregated resources that can really be used as a pool for various applications or hosts. In this architecture, there are:

  • No physical systems – only composed systems;
  • No established hierarchy – CPU doesn’t own the GPU or the memory; and
  • All elements are peers on the network and they communicate with each other.

 

Can You See It?

Western Digital outlined their vision for composable infrastructure thusly:

Composable Infrastructure Vision

  • Open – open in both form factor and API for management and orchestration of composable resources
  • Scalable – independent performance and capacity scaling from rack-level to multi-rack
  • Disaggregated – true disaggregation of storage and compute for independent scaling to maximise efficiency and agility, and to reduce TCO
  • Extensible – flash, disk and future composable entities can be independently scaled, managed and shared over the same fabric

Western Digital’s Open Composability API is also designed for DC Composability, with:

  • Logical composability of resources abstracted from the underlying physical hardware; and
  • Discovery, assembly, and composition of self-virtualised resources via peer-to-peer communication.

The idea is that it enables virtual system composition of existing HCI and next-generation SCI environments. It also:

  • Future-proofs the transition from hyper-converged to disaggregated architectures; and
  • Complements existing Redfish / Swordfish usage

You can read more about OpenFlex here. There’s also an excellent technical brief from Western Digital that you can access here.
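Western Digital didn’t walk the API through line by line, and I won’t pretend to reproduce it here. But because it’s positioned as complementary to Redfish / Swordfish, here’s a hedged sketch of what discovering disaggregated resources over that style of REST interface looks like. The hostname and credentials are made up, and the exact resource paths exposed by OpenFlex gear may well differ.

```python
# Sketch of walking a Redfish/Swordfish-style management API (illustrative only).
import requests

BASE = "https://openflex-demo.example.com"   # hypothetical management endpoint
session = requests.Session()
session.verify = False                        # lab gear often presents self-signed certs
session.auth = ("admin", "password")          # placeholder credentials

def get(path: str) -> dict:
    return session.get(f"{BASE}{path}", timeout=10).json()

# The Redfish service root advertises the collections the device exposes.
root = get("/redfish/v1/")
print("Service:", root.get("Name"))

# Enumerate chassis the same way you would on any Redfish endpoint; Swordfish
# adds storage-centric collections (storage services, pools, volumes) alongside.
for member in get("/redfish/v1/Chassis").get("Members", []):
    chassis = get(member["@odata.id"])
    print(chassis.get("Id"), chassis.get("Model"))
```

The appeal of building on an open schema like this is that composing or releasing a resource becomes just another authenticated REST call, which is exactly what higher-level orchestration wants to consume.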

 

OpenFlex Composable Infrastructure

We’re talking about infrastructure to support an architecture though. In this instance, Western Digital offer the:

  • OpenFlex F3000 – Fabric device and enclosure; and
  • OpenFlex D3000 – High capacity for big data

 

F3000 and E3000

The F3000 and E3000 (F is for Flash Fabric and E is for Enclosure) have the following specifications:

  • Dual-port, high-performance, low-latency, fabric-attached SSD
  • 3U enclosure with 10 dual-port slots offering up to 614TB
  • Self-virtualised device with up to 256 namespaces for dynamic provisioning
  • Multiple storage tiers over the same wire – Flash and Disk accessed via NVMe-oF

D3000

The D3000 (D is for Disk / Dense) is as follows:

  • Dual-port fabric-attached high-capacity device to balance cost and capacity
  • 1U network addressable device offering up to 168TB
  • Self-virtualised device with up to 256 namespaces for dynamic provisioning
  • Multiple storage tiers over the same wire – Flash and Disk accessed via NVMe-oF

You can get a better look at them here.

 

Thoughts and Further Reading

Western Digital covered an awful lot of ground in their presentation at Storage Field Day 18. I like the story behind a lot of what they’re selling, particularly the storage part of it. I’m still playing wait and see when it comes to the composability story. I’m a massive fan of the concept. It’s my opinion that virtualisation gave us an inkling of what could be done in terms of DC resource consumption, but there’s still an awful lot of resources wasted in modern deployments. Technologies such as containers help a bit with that resource control issue, but I’m not sure the enterprise can effectively leverage them in their current iteration, primarily because the enterprise is very, well, enterprise-y.

Composability, on the other hand, might just be the kind of thing that can free the average enterprise IT shop from the shackles of resource management ineptitude that they’ve traditionally struggled with. Much like the public cloud has helped (and created consumption problems), so too could composable infrastructure. This is assuming that we don’t try and slap older style thinking on top of the infrastructure. I’ve seen environments where operations staff needed to submit change requests to perform vMotions of VMs from one host to another. So, like anything, some super cool technology isn’t going to magically fix your broken processes. But the idea is so cool, and if companies like Western Digital can continue to push the boundaries of what’s possible with the infrastructure, there’s at least a chance that things will improve.

If you’d like to read more about the storage-y part of Western Digital, check out Chin-Fah’s post here, Erik’s post here, and Jon’s post here. There was also some talk about dual actuator drives as well. Matt Leib wrote some thoughts on that. Look for more in this space, as I think it’s starting to really heat up.

IBM Spectrum Protect Plus – More Than Meets The Eye

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

IBM recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.

 

We Want A Lot From Data Protection

Data protection isn’t just about periodic protection of applications or files any more. Or, at the very least, we seem to want more than that from our data protection solutions. We want:

  • Application / data recovery – providing data availability;
  • Disaster Recovery – recovering from a minor to major data loss;
  • BCP – reducing the risk to the business, employees, market perception;
  • Application / data reuse – utilise for new routes to market; and
  • Cyber resiliency – recover the business from a compromised attack.

There’s a lot to cover there. And it could be argued that you’d need five different solutions to meet those requirements successfully. With IBM Spectrum Protect Plus (SPP) though, you’re able to meet a number of those requirements.

 

There’s Much That Can Be Done

IBM are positioning SPP as a tool that can help you extend your protection options beyond the traditional periodic data protection solution. You can use it for:

  • Data management / operational recovery – modernised and expanded use cases with instant data access and instant recovery leveraging snapshots;
  • Backup – traditional backup / recovery using streaming backups; and
  • Archive – long-term data retention / compliance, corporate governance.

 

Key Design Principles

Easy Setup

  • Deploy Anywhere: virtual appliance, cloud, bare metal;
  • Zero touch application agents;
  • Automated deployment for IBM Cloud for VMware; and
  • IBM SPP Blueprints.

The benefits of this include:

  • Easy to get started;
  • Reduced deployment costs; and
  • Hybrid and multi-cloud configurations.

Protect

  • Protect databases and applications hosted on-premises or in cloud;
  • Incremental forever using native hypervisor, database, and OS APIs; and
  • Efficient data reduction using deduplication and compression.

The benefits of this include:

  • Efficiency through reduced storage and network usage;
  • Stringent RPO compliance with a reduced backup window; and
  • Application backup with multi-cloud portability.

Manage

  • Centralised, SLA-driven management;
  • Simple, secure RBAC-based user self-service; and
  • Lifecycle management of space efficient point-in-time snapshots.

The benefits of this include:

  • Lower TCO by reducing operational costs;
  • Consistent management / governance of multi-cloud environments; and
  • Secure by design with RBAC.
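IBM publish blueprints for SPP rather than anything like the pseudo-config below, so take this purely as a hedged illustration of what the “Centralised, SLA-driven management” item above means in practice: the policy declares the objective (RPO, retention, target), and the platform’s job is to schedule work and flag anything that drifts outside it. All of the field names here are made up.

```python
# Illustration of SLA-driven protection management (hypothetical fields, not SPP's schema).
from datetime import datetime, timedelta

sla_policies = {
    "gold":   {"rpo": timedelta(hours=4),  "retention_days": 35, "target": "vsnap-pool-1"},
    "silver": {"rpo": timedelta(hours=24), "retention_days": 14, "target": "vsnap-pool-2"},
}

def out_of_sla(last_recovery_point: datetime, policy_name: str) -> bool:
    """True if the workload's newest recovery point is older than the SLA's RPO allows."""
    return datetime.utcnow() - last_recovery_point > sla_policies[policy_name]["rpo"]

print(out_of_sla(datetime.utcnow() - timedelta(hours=2), "gold"))   # False - inside RPO
print(out_of_sla(datetime.utcnow() - timedelta(hours=9), "gold"))   # True  - breach, raise an alert
```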

Recover, Reuse

  • Instant access / sandbox for DevOps and test environments;
  • Recover applications in cloud or data centre; and
  • Global file search and recovery.

The benefits of this include:

  • Improved RTO via instant access;
  • Eliminate time spent finding the right copy (file search across all snapshots with a globally indexed namespace);
  • Data reuse (versus backup as just an insurance policy); and
  • Improved agility: efficiently capture and use a copy of production data for test.

 

One Workflow, Multiple Use Cases

There’s a lot you can do with SPP, and the following diagram shows the breadth of the solution.

[image courtesy of IBM]

 

Thoughts and Further Reading

When I first encountered IBM SPP at Storage Field Day 15, I was impressed with their approach to policy-driven protection. It’s my opinion that we’re asking more and more of modern data protection solutions. We don’t just want to use them as insurance for our data and applications any more. We want to extract value from the data. We want to use the data as part of test and development workflows. And we want to manipulate the data we’re protecting in ways that have proven difficult in years gone by. It’s not just about having a secondary copy of an important file sitting somewhere safe. Nor is it just about using that data to refresh an application so we can test it with current business problems. It’s all of those things and more. This adds complexity to the solution, as many people who’ve administered data protection solutions have found out over the years. To this end, IBM have worked hard with SPP to ensure that it’s a relatively simple process to get up and running, and that you can do what you need out of the box with minimal fuss.

If you’re already operating in the IBM ecosystem, a solution like SPP can make a lot of sense, as there are some excellent integration points available with other parts of the IBM portfolio. That said, there’s no reason you can’t benefit from SPP as a standalone offering. All of the normal features you’d expect in a modern data protection platform are present, and there’s good support for enhanced protection use cases, such as analytics.

Enrico had some interesting thoughts on IBM’s data protection lineup here, and Chin-Fah had a bit to say here.

StorPool And The Death of Hardware-Defined Storage

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

StorPool recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.

 

StorPool?

StorPool delivers block storage software. Fundamentally, it “pools the attached storage (hard disks or SSDs) of standard servers to create a single pool of shared block storage. The StorPool software is installed on each server in the cluster and combines the performance and capacity of all drives attached to the servers into one global namespace”. There’s a useful technical overview that you can read here.

[image courtesy of StorPool]

StorPool position themselves as a software company delivering scale-out, block storage software. They say they’ve been doing this before SDS / SDN / SDDC & “marketing-defined storage” were popular terms. The idea is that it is always delivered as a working storage solution on customer’s hardware. There are a few ways that the solution can be used, including:

  1. Fully-managed software + 24/7/365 support, SLAs, etc.;
  2. On HCL-compatible hardware; or
  3. As a pre-integrated solution.

Data Integrity

The kind of data management features you’d expect from modern storage systems are present here as well, including:

  • Thin provisioning / reclaim;
  • Copy on Write snapshots, clones; and
  • Changed block tracking, incremental recovery, and transfer.

There’s also support for multi-site deployments:

  • Connect 2 or more StorPool clusters over public Internet; and
  • Send snapshots between clusters for backup and DR.
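StorPool’s wire protocol and snapshot format are their own, so the following is only a generic sketch of what changed block tracking plus incremental transfer means in practice: hash each block of two snapshots of the same volume, and only the blocks whose hashes differ need to cross the wire to the remote cluster.

```python
# Generic changed-block-tracking sketch (illustrative only, not StorPool's implementation).
import hashlib

BLOCK = 4096  # bytes per block

def block_hashes(image: bytes) -> list[bytes]:
    return [hashlib.sha256(image[i:i + BLOCK]).digest() for i in range(0, len(image), BLOCK)]

def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Indices of blocks that differ between two snapshots of the same volume."""
    return [i for i, (a, b) in enumerate(zip(block_hashes(old), block_hashes(new))) if a != b]

snap1 = bytes(1024 * 1024)              # 1MiB snapshot, all zeros
snap2 = bytearray(snap1)
snap2[8192:8200] = b"newdata!"          # dirty a handful of bytes in one block
delta = changed_blocks(snap1, bytes(snap2))
print(delta)                            # [2] -> only block 2 changed
print(f"transfer {len(delta) * BLOCK} bytes instead of {len(snap2)}")
```

Real implementations track dirty blocks as writes happen rather than re-hashing whole volumes, but the payoff is the same: DR traffic is proportional to the change rate, not the volume size.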

Developed from Scratch

One of the cool things about StorPool is that the whole thing has been developed from scratch. They use their own on-disk format, protocol, quorum, client, etc. They’ve had systems running in production for 6+ years, as well as:

  • Numerous 1PB+ flash systems;
  • 17 major releases; and
  • Global customers.

Who Uses It?

So who uses StorPool? Their target customers are companies building private and public clouds, including:

  • Service Providers and folk operating public clouds; and
  • Enterprises and various private cloud implementations.

That’s obviously a fairly broad spectrum of potential customers, but I think that speaks somewhat to the potential versatility of software-defined solutions.

 

Thoughts and Further Reading

“Software-defined” storage solutions have become more and more popular in the last few years. Customers seem to be getting more comfortable with using and supporting their own hardware (up to a point), and vendors seem to be more willing to position these kinds of solutions as viable, production-ready platforms. It helps tremendously, in my opinion, that a lot of the heavy lifting previously done with dedicated silicon on traditional storage systems can now be done by a core on an x86 or ARM-based CPU. And there seem to be a lot more cores going around, giving vendors the option to do a lot more with these software-defined systems too.

There are a number of benefits to adopting software-defined solutions, including the ability to move from one hardware supplier to another without the need to dramatically change the operation environment. There’s a good story to be had in terms of updates too, and it’s no secret that people like that they aren’t tied to the vendor’s professional services arm to get installations done in quite the same way they perhaps were with dedicated storage arrays. It’s important to remember, though, that software isn’t magic. If you throw cruddy hardware at a solution like StorPool, it’s not going to somehow exceed the limitations of that hardware. You still need to give it some grunt to get some good performance in return. That said, there are plenty of examples where software-defined solutions can be improved dramatically through code optimisations, without changing hardware at all.

The point of all this is that, whilst I don’t really think hardware-defined storage solutions are going anywhere for the moment, companies like StorPool are certainly delivering compelling solutions in code that mean you don’t need to be constrained by what the big box storage vendors are selling you. StorPool have put some careful consideration into the features they offer with their platform, and have also focused heavily on the possible performance that could be achieved with the solution. There’s a good resilience story there, and it seems to be very service provider-friendly. Of course, everyone’s situation is different, and not everyone will get what they need from something like StorPool. But if you’re in the market for a distributed block storage system, and have a particular hankering to run it on your own, preferred, flavour of hardware, something like StorPool is certainly worthy of further investigation. If you want to dig in a little more, I recommend checking out the resources section on the StorPool website – it’s packed with useful information. And have a look at Ray’s article as well.

NetApp And The Space In Between

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

NetApp recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.

 

Bye, Dave

We were lucky enough to have Dave Hitz (now “Founder Emeritus” at NetApp) spend time with us on his last day in the office. I’ve only met him a few times but I’ve always enjoyed listening to his perspectives on what’s happening in the industry.

Cloud First?

In a previous life I worked in a government department architecting storage and virtualisation solutions for a variety of infrastructure scenarios. The idea, generally speaking, was that those solutions would solve particular business problems, or at least help to improve the processes to resolve those problems. At some point, probably late 2008 or early 2009, we started to talk about developing a “Cloud First” architecture policy, with the idea being that we would resolve to adopt cloud technologies where we could, and reduce our reliance on on-premises solutions as time passed. The beauty of working in enterprise environments is that things can take an awfully long time to happen, so that policy didn’t really come into effect until some years later.

So what does cloud first really mean? It’s possibly not as straightforward as having a “virtualisation first” policy. With the virtualisation first approach, there was a simple qualification process we undertook to determine whether a particular workload was suited to run on our virtualisation platform. This involved all the standard stuff, like funding requirements, security constraints, anticipated performance needs, and licensing concerns. We then pushed the workload one of two ways. With cloud though, there are a few more ways you can skin the cat, and it’s becoming more obvious to me that cloud means different things to different people. Some people want to push workloads to the cloud because they have a requirement to reduce their capital expenditure. Some people have to move to cloud because the CIO has determined that there needs to be a reduction in the workforce managing infrastructure activities. Some people go to cloud because they saw a cool demo at a technology conference. Some people go to cloud because their peers in another government department told them it would be easy to do. The common thread is that “people’s paths to the cloud can be so different”.

Can your workload even run in the cloud? Hitz gave us a great example of some stuff that just can’t (a printing press). The printing press needs to pump out jobs at a certain time of the day every day. It’s not going to necessarily benefit from elastic scalability for its compute workload. The workloads driving the presses would likely run a static workload.

Should it run in the cloud?

It’s a good question to ask. Most of the time, I’d say the answer is yes. This isn’t just because I work for a telco selling cloud products. There are a tonne of benefits to be had in running various, generic workloads in the cloud. Hitz suggests, though, that the “should it” question is a corporate strategy question, and I think he’s spot on. When you embed “cloud first” in your infrastructure architecture, you’re potentially impacting a bunch of stuff outside of infrastructure architecture, including financial models, workforce management, and corporate security postures. It doesn’t have to be a big deal, but it’s something that people sometimes don’t think about. And just because you start with that as your mantra, doesn’t mean you need to end up in cloud.

Does It Feel Cloudy?

Cloudy? It’s my opinion that NetApp’s cloud story is underrated. But, as Hitz noted, they’ve had the occasional misstep. When they first introduced Cloud ONTAP, Anthony Lye said it “didn’t smell like cloud”. Instead, Hitz told us he said it “feels like a product for storage administrators”. Cloudy people don’t want that, and they don’t want to talk to storage administrators. Some cloudy people were formerly storage folks, and some have never had the misfortune of managing over-provisioned midrange arrays at scale. Cloud comes in all different flavours, but it’s clear that just shoving a traditional on-premises product on a public cloud provider’s infrastructure isn’t really as cloudy as we’d like to think.

 

Bridging The Gap

NetApp are focused now on “finding the space between the old and the new, and understanding that you’ll have both for a long time”, and that’s the strategy moving forward. They’re not just working on cloud-only solutions, and they have no plans to ditch their on-premises offerings. Indeed, as Hitz noted in his presentation, “having good cloudy solutions will help them gain share in on-premises footprint”. It’s a good strategy, as the on-premises market will be around for some time to come (do you like how vague that is?). It’s been my belief for some time that companies, like NetApp, that can participate effectively in both the on-premises and cloud markets will be successful.

 

Thoughts and Further Reading

So why did I clumsily paraphrase a How To Destroy Angels song title and ramble on about the good old days of my career in this article instead of waxing lyrical about Charlotte Brooks’s presentation on NetApp Data Availability Services? I’m not exactly sure. I do recommend checking out Charlotte’s demo and presentation, because she’s really quite good at getting the message across, and NDAS looks pretty interesting.

Perhaps I spent the time focusing on the “cloud first” conversation because it was Dave Hitz, and it’s likely the last time I’ll see him presenting in this kind of forum. But whether it was Dave or not, conversations like this one are important, in my opinion. It often feels like we’re putting the technology ahead of the why. I’m a big fan of cloud first, but I’m an even bigger fan of people understanding the impact that their technology decisions can have on the business they’re working for. It’s nice to see a vendor who can comfortably operate on both sides of the equation having this kind of conversation, and I think it’s one that more businesses need to be having with their vendors and their internal staff.

Cohesity Is (Data)Locked In

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Cohesity recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.

 

The Cohesity Difference?

Cohesity covered a number of different topics in its presentation, and I thought I’d outline some of the Cohesity features before I jump into the meat and potatoes of my article. Some of the key things you get with Cohesity are:

  • Global space efficiency;
  • Data mobility;
  • Data resiliency & compliance;
  • Instant mass restore; and
  • Apps integration.

I’m going to cover 3 of the 5 here, and you can check the videos for details of the Cohesity MarketPlace and the Instant Mass Restore demonstration.

Global Space Efficiency

One of the big selling points for the Cohesity data platform is the ability to deliver data reduction and small file optimisation.

  • Global deduplication
    • Modes: inline, post-process
  • Archive to cloud is also deduplicated
  • Compression
    • Zstandard algorithm (read more about that here)
  • Small file optimisation
    • Better performance for reads and writes
    • Benefits from deduplication and compression
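Cohesity’s data reduction pipeline is internal to the platform, but the choice of Zstandard is easy to appreciate in isolation: it delivers zlib-class (or better) ratios at a fraction of the CPU cost, and the compression level knob maps nicely onto the inline versus post-process modes mentioned above. A quick sketch using the Python zstandard bindings (nothing Cohesity-specific here):

```python
# Quick Zstandard illustration (pip install zstandard); nothing Cohesity-specific.
import zstandard as zstd

sample = b"metadata,record,value\n" * 50_000      # repetitive, backup-like data

for level in (1, 3, 19):                          # fast "inline" levels vs a heavier "post-process" level
    compressed = zstd.ZstdCompressor(level=level).compress(sample)
    print(f"level {level:>2}: {len(sample)} -> {len(compressed)} bytes")

# Round trip, to show the reduction is lossless.
restored = zstd.ZstdDecompressor().decompress(compressed)
assert restored == sample
```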

Data Mobility

There’s also an excellent story when it comes to data mobility, with the platform delivering the following data mobility features:

  • Data portability across clouds
  • Multi-cloud replication and archival (1:many)
  • Integrated indexing and search across locations

You also get simultaneous, multi-protocol access and a comprehensive set of file permissions to work with.

 

But What About Archives And Stuff?

Okay, so all of that stuff is really cool, and I could stop there and you’d probably be happy enough that Cohesity delivers the goods when it comes to a secondary storage platform that delivers a variety of features. In my opinion, though, it gets a lot more interesting when you have a look at some of the archival features that are built into the platform.

Flexible Archive Solutions

  • Archive either on-premises or to cloud;
  • Policy-driven archival schedules for long-term data retention;
  • Data can be retrieved to the same or a different Cohesity cluster; and
  • Archived data is subject to further deduplication.

Data Resiliency and Compliance – ensures data integrity

  • Erasure coding;
  • Highly available; and
  • DataLock and legal hold.

Achieving Compliance with File-level DataLock

In my opinion, DataLock is where it gets interesting in terms of archive compliance.

  • DataLock enables WORM functionality at a file level;
  • DataLock adheres to regulatory acts;
  • Can automatically lock a file after a period of inactivity;
  • Files can be locked manually by setting file attributes;
  • Minimum and maximum retention times can be set; and
  • Cohesity provides a unique RBAC role for Data Security administration.
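Cohesity didn’t walk through the exact attribute calls in the session, so what follows is a hedged sketch of the general pattern rather than their documented interface: WORM-capable NAS platforms commonly let a client lock a file by flipping it to read-only, with the desired retention expiry signalled via a file attribute such as atime. The “future atime encodes retention” convention below is an assumption for illustration; check the platform’s documentation for the real mechanism.

```python
# Generic sketch of locking a file via attributes on a WORM-capable share.
# The "future atime encodes retention" convention is an assumption for illustration,
# not a documented Cohesity interface.
import os
import stat
import time

def lock_file(path: str, retain_days: int) -> None:
    retain_until = time.time() + retain_days * 86400
    # Signal the desired retention expiry, then drop write permission to trigger the lock.
    os.utime(path, (retain_until, os.stat(path).st_mtime))
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)   # read-only for everyone

lock_file("/mnt/compliance-share/invoices-2019.csv", retain_days=7 * 365)   # hypothetical path
```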

DataLock on Backups

  • DataLock enables WORM functionality;
  • Prevent changes by locking Snapshots;
  • Applied via backup policy; and
  • Operations performed by Data Security administrators.

 

Ransomware Detection

Cohesity also recently announced the ability to detect ransomware from within Helios. The approach taken is as follows: Prevent. Detect. Respond.

Prevent

There’s some good stuff built into the platform to help prevent ransomware in the first place, including:

  • Immutable file system
  • DataLock (WORM)
  • Multi-factor authentication

Detect

  • Machine-driven anomaly detection (backup data, unstructured data)
  • Automated alert

Respond

  • Scalable file system to store years’ worth of backup copies
  • Google-like global actionable search
  • Instant mass restore
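Helios obviously uses something far more sophisticated than anything I can reproduce in a blog post, but as a sketch of the idea behind the “machine-driven anomaly detection” bullet above: flagging a backup whose daily change rate sits several standard deviations away from its recent baseline already catches the classic ransomware signature, a sudden spike in changed (encrypted) data.

```python
# Toy anomaly check on daily changed-data sizes (illustrative only, not Helios).
from statistics import mean, stdev

def is_anomalous(history_gb: list[float], today_gb: float, threshold: float = 3.0) -> bool:
    """Flag today's change rate if it sits more than `threshold` standard deviations
    above the recent baseline."""
    mu, sigma = mean(history_gb), stdev(history_gb)
    return sigma > 0 and (today_gb - mu) / sigma > threshold

baseline = [12.1, 11.8, 13.0, 12.4, 12.9, 11.5, 12.7]   # GB changed per nightly backup
print(is_anomalous(baseline, 12.6))    # False - a normal night
print(is_anomalous(baseline, 240.0))   # True  - suspicious spike, raise an alert
```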

 

Thoughts and Further Reading

The conversation with Cohesity got a little spirited in places at Storage Field Day 18. This isn’t unusual, as Cohesity has had some problems in the past with various folks not getting what they’re on about. Is it data protection? Is it scale-out NAS? Is it an analytics platform? There’s a lot going on here, and plenty of people (both inside and outside Cohesity) have had a chop at articulating the real value of the solution. I’m not here to tell you what it is or isn’t. I do know that a lot of the cool stuff with Cohesity wasn’t readily apparent to me until I actually had some stick time with the platform and had a chance to see some of its key features in action.

The DataLock / Security and Compliance piece is interesting to me though. I’m continually asking vendors what they’re doing in terms of archive platforms. A lot of them look at me like I’m high. Why wouldn’t you just use software to dump your old files up to the cloud or onto some cheap and deep storage in your data centre? After all, aren’t we all using software-defined data centres now? That’s certainly an option, but what happens when that data gets zapped? What if the storage platform you’re using, or the software you’re using to store the archive data, goes bad and deletes the data you’re managing with it? Features such as DataLock can help with protecting you from some really bad things happening.

I don’t believe that data protection data should be treated as an “archive” as such, although I think that data protection platform vendors such as Cohesity are well placed to deliver “archive-like” solutions for enterprises that need to retain protection data for long periods of time. I still think that pushing archive data to another, dedicated, tier is a better option than simply calling old protection data “archival”. Given Cohesity’s NAS capabilities, it makes sense that they’d be an attractive storage target for dedicated archive software solutions.

I like what Cohesity have delivered to date in terms of a platform that can be used to deliver data insights to derive value for the business. I think sometimes the message is a little muddled, but in my opinion some of that is because everyone’s looking for something different from these kinds of platforms. And these kinds of platforms can do an awful lot of things nowadays, thanks in part to some pretty smart software and some grunty hardware. You can read some more about Cohesity’s Security and Compliance story here,  and there’s a fascinating (if a little dated) report from Cohasset Associates on Cohesity’s compliance capabilities that you can access here. My good friend Keith Townsend also provided some thoughts on Cohesity that you can read here.