Storage Field Day 18 was a little while ago, but that doesn’t mean that the things that were presented there are no longer of interest. Stephen Foskett wrote a great piece on IBM’s approach to data protection with Spectrum Protect Plus that’s worth a read.
Speaking of data protection, it’s not just for big computers. Preston wrote a great article on the iOS recovery process that you can read here. As someone who had to recently recover my phone, I agree entirely with the idea that re-downloading apps from the app store is not a recovery process.
NetApp were recently named a leader in the Gartner Magic Quadrant for Primary Storage. Say what you will about the MQ, a lot of folks are still reading this report and using it to help drive their decision-making activities. You can grab a copy of the report from NetApp here. Speaking of NetApp, I’m happy to announce that I’m now a member of the NetApp A-Team. I’m looking forward to doing a lot more with NetApp in terms of both my day job and the blog.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
This is a quick post to say thanks once again to Stephen and Ben, and the presenters at Storage Field Day 18. I had a super fun and educational time. For easy reference, here’s a list of the posts I did covering the events (they may not match the order of the presentations).
Also, here’s a number of links to posts by my fellow delegates (in no particular order). They’re all very smart people, and you should check out their stuff, particularly if you haven’t before. I’ll attempt to keep this updated as more posts are published. But if it gets stale, the Storage Field Day 18 landing page will have updated links.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Datera recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.
Enterprise Software-Defined Storage
Datera position themselves as delivering “Enterprise Software-Defined Storage”. But what does that really mean? Enterprise IT gives you:
High Performance
Enterprise Features
QoS
Fault Domains
Stretched Cluster
L3 Networking
Deduplication
Replication
HA
Resiliency
Software-defined storage gives you:
Automation
DC Awareness
Agility
Continuous Availability
Targeted Data Placement
Continuous Optimisation
Rapid technology adoption
Combine both of these and you get Datera.
[image courtesy of Datera]
Why Datera?
There are some other features built in to the platform that differentiate Datera’s offering, including:
L3 Networking – Datera brings standard protocols with modern networking to data centre storage. Resources are designed to float to allow for agility, availability, and scalability.
Policy-based Operations – Datera was built from day one with policy controls and policy templates to ease operations at scale while maintaining agility and availability (there’s a rough sketch of the idea after this list).
Targeted Data Placement – ensures data is distributed correctly across the physical infrastructure to meet policies around performance, availability, and data protection while controlling cost.
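To make the policy idea a little more concrete, here’s a rough sketch of what intent-based, policy-driven provisioning looks like in general. None of the names or fields below come from Datera’s actual API; they’re purely illustrative.

```python
# Hypothetical illustration of intent-based, policy-driven provisioning.
# These names are not Datera's API; they just show the general shape of
# "describe the outcome once, let the platform place the data".

from dataclasses import dataclass


@dataclass
class StoragePolicy:
    name: str
    replicas: int       # number of copies to maintain
    placement: str      # e.g. "all-flash" or "hybrid"
    iops_min: int       # QoS floor
    iops_max: int       # QoS ceiling
    stretched: bool     # span fault domains / sites


# A policy template captures intent once...
gold = StoragePolicy(
    name="gold-db",
    replicas=3,
    placement="all-flash",
    iops_min=5000,
    iops_max=20000,
    stretched=True,
)


def provision_volume(policy: StoragePolicy, size_gb: int) -> dict:
    """Pretend request body a controller might derive from the policy."""
    return {
        "size_gb": size_gb,
        "replica_count": policy.replicas,
        "media_preference": policy.placement,
        "qos": {"iops_min": policy.iops_min, "iops_max": policy.iops_max},
        "fault_domain_spread": "multi-site" if policy.stretched else "single-site",
    }


print(provision_volume(gold, size_gb=500))
```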
Thoughts and Further Reading
I’ve waxed lyrical about Datera’s intent-based approach previously. I like the idea that they’re positioning themselves as “Enterprise SDS”. While my day job is now at a service provider, I spent a lot of time in enterprise shops getting crusty applications to keep on running, as best as they could, on equally crusty storage arrays. Something like Datera comes along with a cool hybrid storage approach and the enterprise guys get a little nervous. They want replication, they want resiliency, they want to apply QoS policies to it.
The software-defined data centre is the darling architecture of the private cloud world. Everyone wants to work with infrastructure that can be easily automated, highly available, and extremely scalable. Historically, some of these features have flown in the face of what the enterprise wants: stability, performance, resiliency. The enterprise guys aren’t super keen on updating platforms in the middle of the day. They want to buy multiples of infrastructure components. And they want multiple sets of infrastructure protecting applications. They aren’t that far away from those software-defined folks in any case.
The ability to combine continuous optimisation with high availability is a neat part of Datera’s value proposition. Like a number of software-defined storage solutions, the ability to rapidly iterate new features within the platform, while maintaining that “enterprise” feel in terms of stability and resiliency, is a pretty cool thing. Datera are working hard to bring the best of both worlds together, and managing to deliver the agility that enterprise wants, while maintaining the availability within the infrastructure that they crave.
I’ve spoken at length before about the brutally slow pace of working in some enterprise storage shops. Operations staff are constantly being handed steamers from under-resourced or inexperienced project delivery staff. Change management people are crippling the pace. And the CIO wants to know why you’ve not moved your SQL 2005 environment to AWS. There are some very good reasons why things work the way they do (and also some very bad ones), and innovation can be painfully hard to make happen in these environments. The private cloud kids, on the other hand, are all in on the fast paced, fail fast, software-defined life. They’ve theoretically got it all humming along without a whole lot of involvement on a daily basis. Sure, they’re living on the edge (do I sound old and curmudgeonly yet?). In my opinion, Datera are doing a pretty decent job of bringing these two worlds together. I’m looking forward to seeing what they do in the next 12 months to progress that endeavour.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
WekaIO recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here. I’ve written about WekaIO before, and you can read those posts here and here.
WekaIO
Barbara Murphy described WekaIO Matrix as “the fastest, most scalable parallel file system for AI and technical compute workloads that ensure applications never wait for data”.
What They Do
So what exactly does WekaIO Matrix do?
WekaIO Matrix is a software-defined storage solution that runs on anything from bare metal to VMs and containers, on-premises or in the cloud;
Fully-coherent POSIX file system that’s faster than a local file system;
Distributed Coding, More Resilient at Scale, Fast Rebuilds, End-to-End Data Protection; and
InfiniBand or Ethernet, Converged or Dedicated, on-premises or cloud.
[image courtesy of WekaIO]
Lots of Features
WekaIO Matrix now has a bunch of features, including:
Support for S3, SMB, and NFS protocols;
Cloud backup, Snapshots, Clones, and Snap-2-Obj;
Active Directory support and authentication;
POSIX;
Network High Availability;
Encryption;
Quotas;
HDFS; and
Tiering.
Flexible deployment models
Appliance model – compute and storage on separate infrastructure; and
Converged model – compute and storage on shared infrastructure.
Both models are cloud native because “[e]verybody wants the ability to be able to move to the cloud, or leverage the cloud”
Architectural Considerations
WekaIO are focused on delivering super fast storage via NVMe-oF, and say that NFS and SMB deliver legacy protocol support for convenience.
The Front-End
WekaIO front-ends are cluster-aware
Incoming read requests are optimised based on data location and load conditions – incoming writes can go anywhere
No “cylinder groups” or other anachronisms – data protection is similar to erasure coding (EC)
3-16 data drives, +2 or +4 parity drives
Optional hot spares – uses a “virtual” hot spare
Global namespace = hot tier + Object storage tier
Tiering to S3-API Object storage
Additional capacity with lower cost per GB
Files are shared to the object storage layer (parallelised access optimises performance and simplifies partial or offset reads)
WekaIO uses the S3-API as its equivalent of “SCSI” for HDD.
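To illustrate the tiering idea, here’s a toy model of a global namespace that spans a hot flash tier and an S3-style object tier. This is not WekaIO’s implementation; it’s just a sketch of the general concept: the file system, not the application, decides where the bytes live, and offset reads are served regardless of tier.

```python
# Toy two-tier global namespace (hot NVMe tier + S3 object tier).
# Not WekaIO's implementation; just the concept of tiering under one namespace.

HOT_TIER = {}      # path -> bytes held on the flash tier
OBJECT_TIER = {}   # path -> bytes held on S3-compatible storage (stand-in dict)


def write(path: str, data: bytes) -> None:
    HOT_TIER[path] = data                 # new writes land on the hot tier


def tier_out(path: str) -> None:
    """Demote a cold file to the object tier; the namespace path is unchanged."""
    OBJECT_TIER[path] = HOT_TIER.pop(path)


def read(path: str, offset: int, length: int) -> bytes:
    """Partial (offset) read served from whichever tier holds the data."""
    data = HOT_TIER.get(path) or OBJECT_TIER[path]
    return data[offset:offset + length]


write("/ai/dataset.bin", b"x" * 1_000_000)
tier_out("/ai/dataset.bin")               # capacity tier, lower cost per GB
print(len(read("/ai/dataset.bin", offset=4096, length=8192)))  # still one namespace
```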
Conclusion and Further Reading
I like the WekaIO story. They take away a lot of the overheads associated with non-DAS storage through the use of a file system and control of the hardware. You can make DAS run really fast, but it’s invariably limited to the box that it’s in. Scale-out pools of storage still have a place, particularly in the enterprise, and WekaIO are demonstrating that the performance is there for the applications that need it. There’s a good story in terms of scale, performance, and enterprise resilience features.
Perhaps you like what you see with WekaIO Matrix but don’t want to run stuff on-premises? There’s a good story to be had with Matrix on AWS as well. You’ll be able to get some serious performance, and chances are it will fit in nicely with your cloud-native application workflow.
WekaIO continues to evolve, and I like seeing the progress they’ve been making to this point. It’s not always easy to convince the DAS folks that you can deliver a massively parallel file system and storage solution based on commodity hardware, but WekaIO are giving it a real shake. I recommend checking out Chris M. Evans’s take on WekaIO as well.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
VAST Data have a solution that basically offers massive scale with Tier 1 performance, without the cost traditionally associated with Tier 1 storage.
Foundational Pieces
Some of the key pieces of the solution are technologies that weren’t commonly available until recently, including:
NVMe-oF – DC-scale storage protocol that enables remote NVMe devices to be accessed with direct attached performance.
QLC Flash – A new Flash architecture that costs less than enterprise Flash while delivering enterprise levels of performance.
Storage Class Memory – Persistent, NVMe memory that can be used to reliably buffer perfect writes to QLC and create large, global metadata structures to enable added efficiency.
If you read their blog post, you’ll notice that there are some interesting ideas behind the VAST Data solution, including the ideas that:
Flash is the only media that can be used to bring the cost of storage under what people pay today for HDD-based systems.
NFS and S3 can be used for applications that up until now required a level of performance that could only come from block storage.
Low-endurance QLC flash can be used for even the most transactional of workloads.
Storage computing can be disaggregated from storage media to enable greater simplicity than shared-nothing and hyper-converged architectures.
Data protection codes can reduce overhead to only 2% while enabling levels of resiliency 10 orders of magnitude greater than classic RAID.
Compressed files provide evidence that data can be reduced further when viewed on a global scale.
Parallel storage architectures can be built without any amount of code parallelism.
Customers can build shared storage architectures that can compose and assign dedicated performance and security isolation to tenants on the fly.
One well-engineered, scalable storage system can be ‘universal’ and can enable a diverse array of workloads and requirements.
Architecture
[image courtesy of VAST Data]
VAST Servers – A cluster can be built with 2 to 10,000 stateless servers. Servers can be collocated with applications as containers and made to auto-scale with application demand.
NVMe Fabric – A scalable, shared-everything cluster can be built by connecting every server and device in the cluster over commodity data center networks (Ethernet or InfiniBand).
NVMe Enclosures – Highly-Available NVMe Enclosures manage over one usable PB per RU. Enclosures can be scaled independent of Servers and clusters can be built to manage exabytes.
Rapid Rebuild Encoding
VAST codes accelerate rebuild speed by using a new type of algorithm that gets faster with more redundancy data. Everything is fail-in-place.
150+4: 3x faster than HDD erasure rebuilds, 2.7% overhead
500+10: 2x faster than HDD erasure rebuilds, 2% overhead. Additional redundancy enables an MTBF of over 100,000 years at scale.
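The overhead figures above are easy to sanity check: for an N+K stripe, the protection overhead is simply K divided by N. A quick calculation, with a small classic RAID-6 group thrown in for contrast:

```python
# Checking the protection overhead numbers quoted above: for an N+K stripe,
# overhead is simply K / N (redundancy data divided by user data).

def overhead(data_strips: int, parity_strips: int) -> float:
    return parity_strips / data_strips


print(f"150+4 : {overhead(150, 4):.1%} overhead")            # ~2.7%
print(f"500+10: {overhead(500, 10):.1%} overhead")           # 2.0%
print(f"4+2 classic RAID-6: {overhead(4, 2):.0%} overhead")  # 50%, for contrast
```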
Software-Defined – enclosures and container software
Software-only – run VAST SW on certified QLC hardware
Specifications
The storage is the VAST DF-5615 Active / Active NVMe Enclosure.
[image courtesy of VAST Data]
I/O Modules
2 x Active/Active IO Modules
I/O Connectivity
4 x 100Gb Ethernet or 4 x 100Gb InfiniBand
Management (optional)
4 x 1GbE
NVMe Flash Storage
44 x 15.36TB QLC Flash
NVMe Persistent Memory
12 x 1.5TB U.2 Devices
Dimensions (without cable mgmt.)
2U Rackmount
H: 3.2”, W: 17.6”, D: 37.4”
Weight
85 lbs.
Power Supplies
4 x 1500W
Power Consumption
1200W Avg / 1450W Max
Maximum Scale
Up to 1,000 Enclosures
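For a rough sense of the density on offer, the raw numbers from the specification above work out as follows. These are raw figures only; usable capacity will differ once the 2–2.7% protection overhead and any data reduction are factored in.

```python
# Back-of-the-envelope figures from the DF-5615 specification above.
# Raw capacity only; protection overhead and data reduction are not modelled.

qlc_drives, qlc_tb_each = 44, 15.36
scm_drives, scm_tb_each = 12, 1.5
rack_units = 2
max_enclosures = 1000

raw_qlc_tb = qlc_drives * qlc_tb_each    # 675.84 TB of raw QLC flash per enclosure
raw_scm_tb = scm_drives * scm_tb_each    # 18 TB of storage class memory per enclosure

print(f"Raw QLC per enclosure: {raw_qlc_tb:.2f} TB ({raw_qlc_tb / rack_units:.1f} TB per RU)")
print(f"Raw SCM per enclosure: {raw_scm_tb:.1f} TB")
print(f"Raw QLC at {max_enclosures:,} enclosures: {raw_qlc_tb * max_enclosures / 1000:.0f} PB")
```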
Compute is housed in the VAST Quad Server Chassis.
[image courtesy of VAST Data]
Servers
4 x Stateless VAST Servers
I/O Connectivity
8 x 50Gb Ethernet or 4 x 100Gb InfiniBand
Management (optional)
4 x 1GbE
Physical CPU Cores
80 x 2.4 GHz
Memory
32 x 32GB 2400 MHz RDIMM
Dimensions
2U Rackmount
H: 3.42”, W: 17.24”, D: 28.86”
Weight
78 lbs.
Power Supplies
2 x 1600W
Power Consumption
750W Avg / 900W Max
Maximum Scale
Up to 10,000 VAST Servers
Thoughts And Other Reading
One of my favourite things about the VAST Data story is the fact that they’re all in on a greenfield approach to storage architecture. Their ace in the hole is that they’re leveraging Persistent Memory, QLC and NVMe-oF to make it all work. Coupled with the disaggregated shared everything architecture, this seems to me like a fresh approach to storage. There are also some flexible options available for deployment. I haven’t seen what the commercials look like for this solution, so I can’t put my hand on my heart and tell you that this will be cheaper than a mechanical drive based solution. That said, the folks working at VAST have some good experience with doing smart things with Flash, and if anyone can make this work, they can. I look forward to reading more about VAST Data, particularly when they get some more customers that can publicly talk about what they’re doing. It also helps that my friend Howard has joined the company. In my opinion that says a lot about what they have to offer.
VAST Data have published a reasonably comprehensive overview of their solution that can be found here. There’s also a good overview of VAST Data by Chris Mellor that you can read here. You can also read more from Chris here, and here. Glenn K. Lockwood provides one of the best overviews of VAST Data, which you can read here.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Scott Hamilton (Senior Director, Product Management) spoke to the delegates about Western Digital’s vision for composable infrastructure. I’m the first to admit that I haven’t really paid enough attention to composability in the recent past, although I do know that it messes with my computer’s spell check mechanism – so it must be new and disruptive.
There’s Work To Be Done
Hamilton spoke a little about the increasingly dynamic workloads in the DC, with a recent study showing that:
45% of compute hours and storage capacity are utilised
70% report inefficiencies in the time required to provision compute and storage resources
There are clearly greater demands on:
Scalability
Efficiency
Agility
Performance
Path to Composability
I remember a few years ago when I was presenting to customers about hyper-converged solutions. I’d talk about the path to HCI, with build it yourself being the first step, followed by converged, and then hyper-converged. The path to Composable is similar, with converged, and hyper-converged being the precursor architectures in the modern DC.
Converged
Preconfigured hardware / software for a specific application and workload (think EMC Vblock or NetApp FlexPod)
Hyper-Converged
Software-defined with deeper levels of abstraction and automation (think Nutanix or EMC’s VxRail)
Composable
Disaggregated compute and storage resources
Shared pool of resources that can be composed and made available on demand
[image courtesy of Western Digital]
The idea is that you have a bunch of disaggregated resources that can be used as a pool for various applications or hosts. In this architecture, there are:
No physical systems – only composed systems;
No established hierarchy – CPU doesn’t own the GPU or the memory; and
All elements are peers on the network and they communicate with each other.
Can You See It?
Western Digital outlined their vision for composable infrastructure thusly:
Composable Infrastructure Vision
Open – open in both form factor and API for management and orchestration of composable resources
Scalable – independent performance and capacity scaling from rack-level to multi-rack
Disaggregated – true disaggregation of storage and compute for independent scaling to maximise efficiency and agility and to reduce TCO
Extensible – flash, disk, and future composable entities can be independently scaled, managed, and shared over the same fabric
Western Digital’s Open Composability API is also designed for DC Composability, with:
Logical composability of resources abstracted from the underlying physical hardware; and
Discovery, assembly, and composition of self-virtualised resources via peer-to-peer communication.
The idea is that it enables virtual system composition of existing HCI and next-generation SCI environments. It also:
Future proofs the transition from hyper-converged to disaggregated architectures
Complements existing Redfish / Swordfish usage
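To make the idea of composition a little more tangible, here’s a hypothetical sketch of what asking the fabric for a logical system (rather than a physical box) might look like. The structure and field names are invented for illustration; they are not Western Digital’s actual OpenFlex API.

```python
# Hypothetical sketch of "composing" a system from disaggregated pools.
# The payload below is invented for illustration; it is not the OpenFlex API.

import json


def compose_system(cpu_cores: int, memory_gb: int, flash_tb: int, gpus: int = 0) -> str:
    """Build a request that asks the fabric for a logical system, not a box."""
    request = {
        "compute": {"cores": cpu_cores, "memory_gb": memory_gb},
        "storage": {"nvme_of_flash_tb": flash_tb},
        "accelerators": {"gpus": gpus},
        # No physical hierarchy is implied: each element is a peer on the fabric
        # and is bound to this composed system only for its lifetime.
        "lifetime": "until-decomposed",
    }
    return json.dumps(request, indent=2)


print(compose_system(cpu_cores=32, memory_gb=256, flash_tb=20, gpus=2))
```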
You can read more about OpenFlex here. There’s also an excellent technical brief from Western Digital that you can access here.
OpenFlex Composable Infrastructure
We’re talking about infrastructure to support an architecture though. In this instance, Western Digital offer the:
OpenFlex F3000 – Fabric device and enclosure; and
OpenFlex D3000 – High capacity for big data
F3000 and E3000
The F3000 (F is for Flash Fabric) is the fabric device itself, and the E3000 (E is for Enclosure) is the enclosure that houses it.
Thoughts and Further Reading
Western Digital covered an awful lot of ground in their presentation at Storage Field Day 18. I like the story behind a lot of what they’re selling, particularly the storage part of it. I’m still playing wait and see when it comes to the composability story. I’m a massive fan of the concept. It’s my opinion that virtualisation gave us an inkling of what could be done in terms of DC resource consumption, but there’s still an awful lot of resources wasted in modern deployments. Technologies such as containers help a bit with that resource control issue, but I’m not sure the enterprise can effectively leverage them in their current iteration, primarily because the enterprise is very, well, enterprise-y.
Composability, on the other hand, might just be the kind of thing that can free the average enterprise IT shop from the shackles of resource management ineptitude that they’ve traditionally struggled with. Much like the public cloud has helped (and created consumption problems), so too could composable infrastructure. This is assuming that we don’t try and slap older style thinking on top of the infrastructure. I’ve seen environments where operations staff needed to submit change requests to perform vMotions of VMs from one host to another. So, like anything, some super cool technology isn’t going to magically fix your broken processes. But the idea is so cool, and if companies like Western Digital can continue to push the boundaries of what’s possible with the infrastructure, there’s at least a chance that things will improve.
If you’d like to read more about the storage-y part of Western Digital, check out Chin-Fah’s post here, Erik’s post here, and Jon’s post here. There was also some talk about dual actuator drives as well. Matt Leib wrote some thoughts on that. Look for more in this space, as I think it’s starting to really heat up.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
IBM recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.
We Want A Lot From Data Protection
Data protection isn’t just about periodic protection of applications or files any more. Or, at the very least, we seem to want more than that from our data protection solutions. We want:
Application / data recovery – providing data availability;
Disaster Recovery – recovering from a minor to major data loss;
BCP – reducing the risk to the business, employees, market perception;
Application / data reuse – utilise for new routes to market; and
Cyber resiliency – recover the business from a compromise or attack.
There’s a lot to cover there. And it could be argued that you’d need five different solutions to meet those requirements successfully. With IBM Spectrum Protect Plus (SPP) though, you’re able to meet a number of those requirements.
There’s Much That Can Be Done
IBM are positioning SPP as a tool that can help you extend your protection options beyond the traditional periodic data protection solution. You can use it for:
Data management / operational recovery – modernised and expanded use cases with instant data access and instant recovery leveraging snapshots;
Backup – traditional backup / recovery using streaming backups; and
Archive – long-term data retention / compliance, corporate governance.
Key Design Principles
Easy Setup
Deploy Anywhere: virtual appliance, cloud, bare metal;
Zero touch application agents;
Automated deployment for IBM Cloud for VMware;
Protect databases and applications hosted on-premises or in cloud;
Incremental forever using native hypervisor, database, and OS APIs; and
Efficient data reduction using deduplication and compression.
The benefits of this include:
Efficiency through reduced storage and network usage;
Stringent RPO compliance with a reduced backup window; and
Application backup with multi-cloud portability.
Manage
Centralised, SLA-driven management;
Simple, secure RBAC based user self service; and
Lifecycle management of space efficient point-in-time snapshots.
The benefits of this include:
Lower TCO by reducing operational costs;
Consistent management / governance of multi-cloud environments; and
Secure by design with RBAC.
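To give a feel for what SLA-driven management means in practice, here’s an illustrative policy object. It isn’t SPP’s actual policy schema; it just shows the shape of “define the SLA once, then assign workloads to it” rather than scheduling backup jobs per workload.

```python
# Illustrative SLA-style protection policy. Not Spectrum Protect Plus's actual
# policy schema; it only shows "define the SLA once, assign workloads to it".

from dataclasses import dataclass, field
from typing import List


@dataclass
class SLAPolicy:
    name: str
    backup_frequency_hours: int     # how often incremental-forever copies run
    retention_days: int             # operational recovery window
    archive_after_days: int         # when copies roll to long-term retention
    replicate_to: str               # secondary site or cloud target
    workloads: List[str] = field(default_factory=list)


gold = SLAPolicy(
    name="Gold",
    backup_frequency_hours=4,
    retention_days=30,
    archive_after_days=90,
    replicate_to="dr-site-or-cloud",
)

# Assigning a workload means it inherits the SLA; no per-VM job definitions.
gold.workloads += ["vm:erp-db-01", "db:oracle-prod", "vm:web-tier-*"]
print(f"{gold.name}: every {gold.backup_frequency_hours}h, keep {gold.retention_days}d, "
      f"{len(gold.workloads)} workload assignments")
```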
Recover, Reuse
Instant access / sandbox for DevOps and test environments;
Recover applications in cloud or data centre; and
Global file search and recovery.
The benefits of this include:
Improved RTO via instant access;
Eliminate time spent finding the right copy (file search across all snapshots with a globally indexed namespace);
Data reuse (versus backup as just an insurance policy); and
Improved agility; efficiently capture and use copies of production data for test.
One Workflow, Multiple Use Cases
There’s a lot you can do with SPP, and the following diagram shows the breadth of the solution.
[image courtesy of IBM]
Thoughts and Further Reading
When I first encountered IBM SPP at Storage Field Day 15, I was impressed with their approach to policy-driven protection. It’s my opinion that we’re asking more and more of modern data protection solutions. We don’t just want to use them as insurance for our data and applications any more. We want to extract value from the data. We want to use the data as part of test and development workflows. And we want to manipulate the data we’re protecting in ways that have proven difficult in years gone by. It’s not just about having a secondary copy of an important file sitting somewhere safe. Nor is it just about using that data to refresh an application so we can test it with current business problems. It’s all of those things and more. This adds complexity to the solution, as many people who’ve administered data protection solutions have found out over the years. To this end, IBM have worked hard with SPP to ensure that it’s a relatively simple process to get up and running, and that you can do what you need out of the box with minimal fuss.
If you’re already operating in the IBM ecosystem, a solution like SPP can make a lot of sense, as there are some excellent integration points available with other parts of the IBM portfolio. That said, there’s no reason you can’t benefit from SPP as a standalone offering. All of the normal features you’d expect in a modern data protection platform are present, and there’s good support for enhanced protection use cases, such as analytics.
Enrico had some interesting thoughts on IBM’s data protection lineup here, and Chin-Fah had a bit to say here.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
StorPool recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.
StorPool?
StorPool delivers block storage software. Fundamentally, it “pools the attached storage (hard disks or SSDs) of standard servers to create a single pool of shared block storage. The StorPool software is installed on each server in the cluster and combines the performance and capacity of all drives attached to the servers into one global namespace”. There’s a useful technical overview that you can read here.
[image courtesy of StorPool]
StorPool position themselves as a software company delivering scale-out, block storage software. They say they’ve been doing this since before SDS / SDN / SDDC and “marketing-defined storage” were popular terms. The solution is always delivered as a working storage system on the customer’s hardware, and there are a few different ways it can be consumed.
The kind of data management features you’d expect from modern storage systems are present here as well, including:
Thin provisioning / reclaim;
Copy-on-write snapshots and clones; and
Changed block tracking, incremental recovery, and transfer.
There’s also support for multi-site deployments:
Connect 2 or more StorPool clusters over public Internet; and
Send snapshots between clusters for backup and DR.
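As a rough illustration of why changed block tracking makes multi-site replication cheap, here’s a toy model of shipping only the delta between two snapshots to a remote cluster. It isn’t StorPool’s protocol or on-disk format; it’s just the general idea behind incremental snapshot transfer.

```python
# Toy illustration of changed block tracking and incremental snapshot transfer
# between two clusters. Not StorPool's protocol; just why the deltas stay small.

def changed_blocks(prev_snap: dict, curr_snap: dict) -> dict:
    """Return only the blocks that differ between two point-in-time snapshots."""
    return {
        block: data
        for block, data in curr_snap.items()
        if prev_snap.get(block) != data
    }


# Snapshots modelled as {block_number: block_contents}
snap_1 = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
snap_2 = {0: b"AAAA", 1: b"BETA", 2: b"CCCC", 3: b"DDDD"}  # one change, one new block

delta = changed_blocks(snap_1, snap_2)
remote_cluster = dict(snap_1)     # DR site already holds the previous snapshot
remote_cluster.update(delta)      # ...so only the delta crosses the WAN

print(f"Blocks shipped: {sorted(delta)}")                   # [1, 3]
print(f"Remote matches source: {remote_cluster == snap_2}")  # True
```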
Developed from Scratch
One of the cool things about StorPool is that the whole thing has been developed from scratch. They use their own on-disk format, protocol, quorum, client, etc. They’ve had systems running in production for 6+ years, as well as:
Numerous 1PB+ flash systems;
17 major releases; and
Global customers.
Who Uses It?
So who uses StorPool? Their target customers are companies building private and public clouds, including:
Service Providers and folk operating public clouds; and
Enterprises and various private cloud implementations.
That’s obviously a fairly broad spectrum of potential customers, but I think that speaks somewhat to the potential versatility of software-defined solutions.
Thoughts and Further Reading
“Software-defined” storage solutions have become more and more popular in the last few years. Customers seem to be getting more comfortable with using and supporting their own hardware (up to a point), and vendors seem to be more willing to position these kinds of solutions as viable, production-ready platforms. It helps tremendously, in my opinion, that a lot of the heavy lifting previously done with dedicated silicon on traditional storage systems can now be done by a core on an x86 or ARM-based CPU. And there seem to be a lot more cores going around, giving vendors the option to do a lot more with these software-defined systems too.
There are a number of benefits to adopting software-defined solutions, including the ability to move from one hardware supplier to another without the need to dramatically change the operation environment. There’s a good story to be had in terms of updates too, and it’s no secret that people like that they aren’t tied to the vendor’s professional services arm to get installations done in quite the same way they perhaps were with dedicated storage arrays. It’s important to remember, though, that software isn’t magic. If you throw cruddy hardware at a solution like StorPool, it’s not going to somehow exceed the limitations of that hardware. You still need to give it some grunt to get some good performance in return. That said, there are plenty of examples where software-defined solutions can be improved dramatically through code optimisations, without changing hardware at all.
The point of all this is that, whilst I don’t really think hardware-defined storage solutions are going anywhere for the moment, companies like StorPool are certainly delivering compelling solutions in code that mean you don’t need to be constrained by what the big box storage vendors are selling you. StorPool have put some careful consideration into the features they offer with their platform, and have also focused heavily on the possible performance that could be achieved with the solution. There’s a good resilience story there, and it seems to be very service provider-friendly. Of course, everyone’s situation is different, and not everyone will get what they need from something like StorPool. But if you’re in the market for a distributed block storage system, and have a particular hankering to run it on your own, preferred, flavour of hardware, something like StorPool is certainly worthy of further investigation. If you want to dig in a little more, I recommend checking out the resources section on the StorPool website – it’s packed with useful information. And have a look at Ray’s article as well.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
NetApp recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.
Bye, Dave
We were lucky enough to have Dave Hitz (now “Founder Emeritus” at NetApp) spend time with us on his last day in the office. I’ve only met him a few times but I’ve always enjoyed listening to his perspectives on what’s happening in the industry.
Cloud First?
In a previous life I worked in a government department architecting storage and virtualisation solutions for a variety of infrastructure scenarios. The idea, generally speaking, was that those solutions would solve particular business problems, or at least help to improve the processes to resolve those problems. At some point, probably late 2008 or early 2009, we started to talk about developing a “Cloud First” architecture policy, with the idea being that we would resolve to adopt cloud technologies where we could, and reduce our reliance on on-premises solutions as time passed. The beauty of working in enterprise environments is that things can take an awfully long time to happen, so that policy didn’t really come into effect until some years later.
So what does cloud first really mean? It’s possibly not as straightforward as having a “virtualisation first” policy. With the virtualisation first approach, there was a simple qualification process we undertook to determine whether a particular workload was suited to run on our virtualisation platform. This involved all the standard stuff, like funding requirements, security constraints, anticipated performance needs, and licensing concerns. We then pushed the workload one of two ways. With cloud though, there are a few more ways you can skin the cat, and it’s becoming more obvious to me that cloud means different things to different people. Some people want to push workloads to the cloud because they have a requirement to reduce their capital expenditure. Some people have to move to cloud because the CIO has determined that there needs to be a reduction in the workforce managing infrastructure activities. Some people go to cloud because they saw a cool demo at a technology conference. Some people go to cloud because their peers in another government department told them it would be easy to do. The common thread is that “people’s paths to the cloud can be so different”.
Can your workload even run in the cloud? Hitz gave us a great example of some stuff that just can’t (a printing press). The printing press needs to pump out jobs at a certain time of the day every day. It’s not going to necessarily benefit from elastic scalability for its compute workload. The workloads driving the presses would likely run a static workload.
Should it run in the cloud?
It’s a good question to ask. Most of the time, I’d say the answer is yes. This isn’t just because I work for a telco selling cloud products. There are a tonne of benefits to be had in running various, generic workloads in the cloud. Hitz suggests, though, that the “should it” question is a corporate strategy question, and I think he’s spot on. When you embed “cloud first” in your infrastructure architecture, you’re potentially impacting a bunch of stuff outside of infrastructure architecture, including financial models, workforce management, and corporate security postures. It doesn’t have to be a big deal, but it’s something that people sometimes don’t think about. And just because you start with that as your mantra, doesn’t mean you need to end up in cloud.
Does It Feel Cloudy?
Cloudy? It’s my opinion that NetApp’s cloud story is underrated. But, as Hitz noted, they’ve had the occasional misstep. When they first introduced Cloud ONTAP, Anthony Lye said it “didn’t smell like cloud”. Instead, Hitz told us he said it “feels like a product for storage administrators”. Cloudy people don’t want that, and they don’t want to talk to storage administrators. Some cloudy people were formerly storage folks, and some have never had the misfortune of managing over-provisioned midrange arrays at scale. Cloud comes in all different flavours, but it’s clear that just shoving a traditional on-premises product on a public cloud provider’s infrastructure isn’t really as cloudy as we’d like to think.
Bridging The Gap
NetApp are focused now on “finding the space between the old and the new, and understanding that you’ll have both for a long time”, and that’s what they’re focusing on moving forward. They’re not just working on cloud-only solutions, and they have no plans to ditch their on-premises offerings. Indeed, as Hitz noted in his presentation, “having good cloudy solutions will help them gain share in on-premises footprint”. It’s a good strategy, as the on-premises market will be around for some time to come (do you like how vague that is?). It’s been my belief for some time that companies, like NetApp, that can participate in both the on-premises and cloud market effectively will be successful.
Thoughts and Further Reading
So why did I clumsily paraphrase a How To Destroy Angels song title and ramble on about the good old days of my career in this article instead of waxing lyrical about Charlotte Brooks’s presentation on NetApp Data Availability Services? I’m not exactly sure. I do recommend checking out Charlotte’s demo and presentation, because she’s really quite good at getting the message across, and NDAS looks pretty interesting.
Perhaps I spent the time focusing on the “cloud first” conversation because it was Dave Hitz, and it’s likely the last time I’ll see him presenting in this kind of forum. But whether it was Dave or not, conversations like this one are important, in my opinion. It often feels like we’re putting the technology ahead of the why. I’m a big fan of cloud first, but I’m an even bigger fan of people understanding the impact that their technology decisions can have on the business they’re working for. It’s nice to see a vendor who can comfortably operate on both sides of the equation having this kind of conversation, and I think it’s one that more businesses need to be having with their vendors and their internal staff.
Disclaimer: I recently attended Storage Field Day 18. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Cohesity recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.
The Cohesity Difference?
Cohesity covered a number of different topics in its presentation, and I thought I’d outline some of the Cohesity features before I jump into the meat and potatoes of my article. Some of the key things you get with Cohesity are:
Global space efficiency;
Data mobility;
Data resiliency & compliance;
Instant mass restore; and
Apps integration.
I’m going to cover 3 of the 5 here, and you can check the videos for details of the Cohesity MarketPlace and the Instant Mass Restore demonstration.
Global Space Efficiency
One of the big selling points for the Cohesity data platform is the ability to deliver data reduction and small file optimisation.
There’s also an excellent story when it comes to data mobility, with the platform delivering the following data mobility features:
Data portability across clouds
Multi-cloud replication and archival (1:many)
Integrated indexing and search across locations
You also get simultaneous, multi-protocol access and a comprehensive set of file permissions to work with.
But What About Archives And Stuff?
Okay, so all of that stuff is really cool, and I could stop there and you’d probably be happy enough that Cohesity delivers the goods when it comes to a secondary storage platform that delivers a variety of features. In my opinion, though, it gets a lot more interesting when you have a look at some of the archival features that are built into the platform.
Flexible Archive Solutions
Archive either on-premises or to cloud;
Policy-driven archival schedules for long-term data retention;
Data can be retrieved to the same or a different Cohesity cluster; and
Archived data is subject to further deduplication.
Data Resiliency and Compliance – ensures data integrity
Erasure coding;
Highly available; and
DataLock and legal hold.
Achieving Compliance with File-level DataLock
In my opinion, DataLock is where it gets interesting in terms of archive compliance.
DataLock enables WORM functionality at a file level;
DataLock adheres to regulatory acts;
Can automatically lock a file after a period of inactivity;
Files can be locked manually by setting file attributes;
Minimum and maximum retention times can be set; and
Cohesity provides a unique RBAC role for Data Security administration.
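Here’s a minimal model of the WORM behaviour described in the list above, purely as an illustration: once a retention attribute is set on a file, writes and deletes are refused until the retention clock runs out. It is not Cohesity’s DataLock implementation or API.

```python
# Minimal model of file-level WORM semantics. Not Cohesity's DataLock code or
# API; it only demonstrates the behaviour: once retention is set, modification
# and deletion are refused until the retention period expires.

import time


class WormFile:
    def __init__(self, data: bytes):
        self.data = data
        self.locked_until = 0.0          # epoch seconds; 0 means not locked

    def lock(self, retention_seconds: int) -> None:
        """Manually lock the file (e.g. by setting a retention attribute)."""
        self.locked_until = max(self.locked_until, time.time() + retention_seconds)

    def write(self, data: bytes) -> None:
        if time.time() < self.locked_until:
            raise PermissionError("file is under WORM retention; write refused")
        self.data = data

    def delete(self) -> None:
        if time.time() < self.locked_until:
            raise PermissionError("file is under WORM retention; delete refused")
        self.data = b""


doc = WormFile(b"quarterly results")
doc.lock(retention_seconds=7 * 365 * 24 * 3600)   # e.g. a seven-year retention period
try:
    doc.write(b"tampered")
except PermissionError as e:
    print(e)
```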
DataLock on Backups
DataLock enables WORM functionality;
Prevent changes by locking Snapshots;
Applied via backup policy; and
Operations performed by Data Security administrators.
Scalable file system to store years’ worth of backup copies
Google-like global actionable search
Instant mass restore
Thoughts and Further Reading
The conversation with Cohesity got a little spirited in places at Storage Field Day 18. This isn’t unusual, as Cohesity has had some problems in the past with various folks not getting what they’re on about. Is it data protection? Is it scale-out NAS? Is it an analytics platform? There’s a lot going on here, and plenty of people (both inside and outside Cohesity) have had a chop at articulating the real value of the solution. I’m not here to tell you what it is or isn’t. I do know that a lot of the cool stuff with Cohesity wasn’t readily apparent to me until I actually had some stick time with the platform and had a chance to see some of its key features in action.
The DataLock / Security and Compliance piece is interesting to me though. I’m continually asking vendors what they’re doing in terms of archive platforms. A lot of them look at me like I’m high. Why wouldn’t you just use software to dump your old files up to the cloud or onto some cheap and deep storage in your data centre? After all, aren’t we all using software-defined data centres now? That’s certainly an option, but what happens when that data gets zapped? What if the storage platform you’re using, or the software you’re using to store the archive data, goes bad and deletes the data you’re managing with it? Features such as DataLock can help with protecting you from some really bad things happening.
I don’t believe that data protection data should be treated as an “archive” as such, although I think that data protection platform vendors such as Cohesity are well placed to deliver “archive-like” solutions for enterprises that need to retain protection data for long periods of time. I still think that pushing archive data to another, dedicated, tier is a better option than simply calling old protection data “archival”. Given Cohesity’s NAS capabilities, it makes sense that they’d be an attractive storage target for dedicated archive software solutions.
I like what Cohesity have delivered to date in terms of a platform that can be used to deliver data insights to derive value for the business. I think sometimes the message is a little muddled, but in my opinion some of that is because everyone’s looking for something different from these kinds of platforms. And these kinds of platforms can do an awful lot of things nowadays, thanks in part to some pretty smart software and some grunty hardware. You can read some more about Cohesity’s Security and Compliance story here, and there’s a fascinating (if a little dated) report from Cohasset Associates on Cohesity’s compliance capabilities that you can access here. My good friend Keith Townsend also provided some thoughts on Cohesity that you can read here.