Cohesity Is (Data)Locked In

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Cohesity recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.

 

The Cohesity Difference?

Cohesity covered a number of different topics in its presentation, and I thought I’d outline some of the Cohesity features before I jump into the meat and potatoes of my article. Some of the key things you get with Cohesity are:

  • Global space efficiency;
  • Data mobility;
  • Data resiliency & compliance;
  • Instant mass restore; and
  • Apps integration.

I’m going to cover 3 of the 5 here, and you can check the videos for details of the Cohesity MarketPlace and the Instant Mass Restore demonstration.

Global Space Efficiency

One of the big selling points for the Cohesity data platform is the ability to deliver data reduction and small file optimisation.

  • Global deduplication
    • Modes: inline, post-process
  • Archive to cloud is also deduplicated
  • Compression
    • Zstandard algorithm (read more about that here)
  • Small file optimisation
    • Better performance for reads and writes
    • Benefits from deduplication and compression

Data Mobility

There’s also an excellent story when it comes to data mobility, with the platform delivering the following features:

  • Data portability across clouds
  • Multi-cloud replication and archival (1:many)
  • Integrated indexing and search across locations

You also get simultaneous, multi-protocol access and a comprehensive set of file permissions to work with.

 

But What About Archives And Stuff?

Okay, so all of that stuff is really cool, and I could stop there and you’d probably be happy enough that Cohesity delivers the goods as a feature-rich secondary storage platform. In my opinion, though, it gets a lot more interesting when you look at some of the archival features built into the platform.

Flexible Archive Solutions

  • Archive either on-premises or to cloud;
  • Policy-driven archival schedules for long-term data retention;
  • Data can be retrieved to the same or a different Cohesity cluster; and
  • Archived data is subject to further deduplication.

Data Resiliency and Compliance – ensures data integrity

  • Erasure coding;
  • Highly available; and
  • DataLock and legal hold.

Achieving Compliance with File-level DataLock

In my opinion, DataLock is where it gets interesting in terms of archive compliance.

  • DataLock enables WORM functionality at a file level;
  • DataLock is designed to meet regulatory requirements (such as SEC Rule 17a-4(f));
  • Can automatically lock a file after a period of inactivity;
  • Files can be locked manually by setting file attributes (there’s a rough sketch of this after the list);
  • Minimum and maximum retention times can be set; and
  • Cohesity provides a unique RBAC role for Data Security administration.
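
To give you a feel for what the manual lock might look like from a client, here’s a rough sketch. The mount path is made up, and the “set the access time to the retention expiry, then mark the file read-only” pattern is the general convention used by WORM-capable NAS platforms rather than something I’ve lifted from Cohesity’s documentation, so check the docs for the exact attribute semantics DataLock expects.

```bash
# Hypothetical example: manually lock a file on a WORM-enabled view.
# The path is made up, and the atime/read-only convention is an assumption
# based on common WORM NAS behaviour - confirm the specifics with Cohesity.

# Set the file's access time to the intended retention expiry (1 Jan 2026).
touch -a -t 202601010000 /mnt/datalock-view/contracts/agreement.pdf

# Remove write permissions to commit the lock.
chmod a-w /mnt/datalock-view/contracts/agreement.pdf
```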

DataLock on Backups

  • DataLock enables WORM functionality;
  • Prevent changes by locking Snapshots;
  • Applied via backup policy; and
  • Operations performed by Data Security administrators.

 

Ransomware Detection

Cohesity also recently announced the ability to detect ransomware from within Helios. The approach taken is as follows: Prevent. Detect. Respond.

Prevent

There’s some good stuff built into the platform to help prevent ransomware in the first place, including:

  • Immutable file system
  • DataLock (WORM)
  • Multi-factor authentication

Detect

  • Machine-driven anomaly detection (backup data, unstructured data)
  • Automated alert

Respond

  • Scalable file system to store years’ worth of backup copies
  • Google-like global actionable search
  • Instant mass restore

 

Thoughts and Further Reading

The conversation with Cohesity got a little spirited in places at Storage Field Day 18. This isn’t unusual, as Cohesity has had some problems in the past with various folks not getting what they’re on about. Is it data protection? Is it scale-out NAS? Is it an analytics platform? There’s a lot going on here, and plenty of people (both inside and outside Cohesity) have had a chop at articulating the real value of the solution. I’m not here to tell you what it is or isn’t. I do know that a lot of the cool stuff with Cohesity wasn’t readily apparent to me until I actually had some stick time with the platform and had a chance to see some of its key features in action.

The DataLock / Security and Compliance piece is interesting to me though. I’m continually asking vendors what they’re doing in terms of archive platforms. A lot of them look at me like I’m high. Why wouldn’t you just use software to dump your old files up to the cloud or onto some cheap and deep storage in your data centre? After all, aren’t we all using software-defined data centres now? That’s certainly an option, but what happens when that data gets zapped? What if the storage platform you’re using, or the software you’re using to store the archive data, goes bad and deletes the data you’re managing with it? Features such as DataLock can help protect you from some of these really bad things happening.

I don’t believe that data protection data should be treated as an “archive” as such, although I think that data protection platform vendors such as Cohesity are well placed to deliver “archive-like” solutions for enterprises that need to retain protection data for long periods of time. I still think that pushing archive data to another, dedicated, tier is a better option than simply calling old protection data “archival”. Given Cohesity’s NAS capabilities, it makes sense that they’d be an attractive storage target for dedicated archive software solutions.

I like what Cohesity have delivered to date: a platform that can be used to surface data insights and derive value for the business. I think sometimes the message is a little muddled, but in my opinion some of that is because everyone’s looking for something different from these kinds of platforms. And these kinds of platforms can do an awful lot of things nowadays, thanks in part to some pretty smart software and some grunty hardware. You can read some more about Cohesity’s Security and Compliance story here, and there’s a fascinating (if a little dated) report from Cohasset Associates on Cohesity’s compliance capabilities that you can access here. My good friend Keith Townsend also provided some thoughts on Cohesity that you can read here.

Rubrik Basics – Archival Locations

I’ve been doing some work with Rubrik in our lab and thought it worth covering some of the basic features that I think are pretty neat. In this edition of Rubrik Basics, I thought I’d quickly cover off how to get started with the Archival Locations feature. You can read the datasheet here.

 

Rubrik and Archiving Policies

So what can you do with Archival Locations? Well, the idea is that you can copy data to another location for safe-keeping. Normally this data will live in that location for a longer period than it will in the on-premises Brik you’re using. You might, for example, keep data on your appliance for 30 days, and have archive data living in a cloud location for another 2 years.

 

Archival Location Support

Rubrik supports a variety of Archival Locations, including:

  • Public Cloud: Amazon Web Services S3, S3-IA, S3-RRS and Glacier; Microsoft Azure Blob Storage LRS, ZRS and GRS; Google Cloud Platform Nearline, Coldline, Multi-Regional and Regional; (also includes support for Government Cloud Options in AWS and Azure);
  • Private Cloud (S3 Object Store): Basho Riak, Cleversafe, Cloudian, EMC ECS, Hitachi Content Platform, IIJ GIO, Red Hat Ceph, Scality;
  • NFS: Any NFS v3 Compliant Target; and
  • Tape: All Major Tape Vendors via QStar.

What’s cool is that multiple, active archival locations can be configured for a Rubrik cluster. You can then select an archival location when an SLA policy is created or edited. This is particularly useful when you have a number of different tenants hosted on the same Brik.

 

Setup

To set up an Archival Location, click on the “Gear” icon in the Rubrik interface (in this example I’m using Rubrik CDM 4.1) and select “Archival Locations”.

Click on the + sign.

You can then choose the archival type, selecting from Amazon S3 (or Glacier), Azure, Google Cloud Platform, NFS or Tape (via QStar). In this example I’m setting up an Amazon S3 bucket.

You then need to select the Region and Storage Class, and provide your AWS Access Key, Secret Key and S3 Bucket.

You also need to choose the encryption type. I’m not using an external KMS in our lab, so I’ve used OpenSSL to generate an RSA key.
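
The general form of the command looks something like this (the key length and filename are assumptions on my part, so check the Rubrik documentation for what your version of CDM expects):

```bash
# Generate a 2048-bit RSA private key in PEM format for the archival location.
# The filename is arbitrary; keep the key somewhere safe, as you'll need it
# to read back anything that's been archived with it.
openssl genrsa -out rubrik_archive_key.pem 2048
```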

Once you’ve run that command, paste in the contents of the PEM file.

Once you’ve added the location, you’ll see it listed, along with some high level statistics.

Once you have an Archival Location configured, you can add it to existing SLA Domains, or use it when you create a new SLA Domain.

Instant Archive

The Instant Archive feature can also be used to immediately queue a task to copy a new snapshot to a specified archival location. Note that the Instant Archive feature does not change the amount of time that a snapshot is retained locally on the Rubrik cluster. The Retention On Brik setting determines how long a snapshot is kept on the Rubrik cluster.

 

Thoughts

Rubrik’s Data Archival is flexible as well as simple to use. It’s easy to set up and works as promised. There’s also a bunch of stuff happening within the Rubrik environment that means you can access protection data across multiple locations, so you might find that a combination of a Rubrik Brik and some cheap and deep NFS storage is a good option for storing backup data for an extended period of time. You might also think about using this feature as a way to do data mobility or disaster recovery, depending on the type of disaster you’re trying to recover from.

Rubrik – Cloud Data What?

I’ve done a few posts on Cohesity in the past, and I have some friends who work at Rubrik. So it seemed like a good idea to put up a short article on what Rubrik do. Thanks to Andrew Miller at Rubrik for helping out with the background info.

 

The Platform

It’s converged hardware and software (called “Briks”); there are different models, but the 2RU, four-node configuration is the most common.

[image via Rubrik’s website]

The Rubrik solution:

  • Is fundamentally built on a scale-out architecture;
  • Provides a built-in backup application/catalogue with deduplication and compression;
  • Uses a custom file system, distributed task scheduler, distributed metadata, and so on;
  • Delivers cloud-native archiving that is policy driven at the core (declarative rather than imperative);
  • Can leverage cloud-native archive targets (with native hooks into AWS, Azure, and so on);
  • Has a custom VSS provider to help with STUN (super VMware friendly);
  • Has provided a native, REST-based API since day one (there’s a rough sketch of this after the list), and along with vSphere (VADP, CBT, NBDSSL) handles SQL and Linux natively (there’s apparently more to come on that front); and
  • There’s an edge appliance for ROBO, amongst other things.
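
Given the API-first approach gets talked about a lot, here’s a rough sketch of what driving the cluster from the command line might look like. The endpoint paths and field names are assumptions based on Rubrik’s v1 REST API conventions rather than something I’ve copied from the documentation, so use the API Explorer on your own cluster as the authoritative reference.

```bash
# Rough sketch only: endpoints and response fields are assumptions based on
# Rubrik's v1 REST API conventions - check the API Explorer on your cluster.
CLUSTER="rubrik.lab.local"

# Authenticate with basic credentials and grab a session token.
TOKEN=$(curl -sk -u admin:password -X POST "https://${CLUSTER}/api/v1/session" | jq -r '.token')

# List the names of the SLA Domains configured on the cluster.
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${CLUSTER}/api/v1/sla_domain" | jq -r '.data[].name'
```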

 

Cloud Data Management

Rubrik position their solution as “Cloud Data Management”.

In a similar fashion to Cohesity, Rubrik are focused on a lot more than just backup and recovery or copy data management. There’s plenty you can do around archive and compliance, and Rubrik tell me the search capabilities are pretty good too.

It also works well with technologies such as VMware vSAN. Chris Wahl and Cormac Hogan wrote a whitepaper on the integration that you can get here (registration required).

 

Thoughts

As you can see from this post there’s a lot to look into with Rubrik (and Cohesity for that matter) and I’ve really only barely scratched the surface. The rising popularity of smarter secondary storage solutions such as these points to a desire in the marketplace to get sprawling data under control via policy rather than simple tiers of disk. This is a good thing. Add in the heavy focus on API-based control and I think we’re in for exciting times (or as exciting as this kind of stuff gets in any case). If you’re interested in some of what you can do with Rubrik there’s a playlist on YouTube with some demos that give a reasonable view of what you can do. I’m hoping to dig a little deeper into the Rubrik solution in the next little while, and I’m particularly interested to see what it can do from an archiving perspective, so stay tuned.