Rubrik – Cloud Data What?

I’ve done a few posts on Cohesity in the past, and I have some friends who work at Rubrik. So it seemed like a good idea to put up a short article on what Rubrik do. Thanks to Andrew Miller at Rubrik for helping out with the background info.


The Platform

It’s converged hardware and software (called “Briks” – there are different models, but the 2RU, four-node configuration is the most common).

[image via Rubrik’s website]

The Rubrik solution:

  • Is fundamentally built on a scale out architecture;
  • Provides a built-in backup application/catalogue with deduplication and compression;
  • Uses a custom file system, distributed task scheduler, distributed metadata, etc;
  • Delivers cloud native archiving, policy driven at the core (declarative rather than imperative);
  • Can leverage cloud native archive (with native hooks into AWS/Azure/etc.);
  • Has a custom VSS provider to help with STUN (super VMware friendly);
  • Has provided a native REST-based API since day one, and along with vSphere support (VADP, CBT, NBDSSL), handles SQL and Linux natively (there’s apparently more to come on that front); and
  • There’s an edge appliance for ROBO, amongst other things.
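That API-first approach is worth a quick illustration. Here’s a minimal sketch of what driving a platform like this over REST might look like – note that the endpoint path, field names, and SLA identifier below are hypothetical, purely for illustration, not Rubrik’s actual API:

```python
import base64
import json

# Hypothetical sketch only: endpoint paths and field names are illustrative,
# not Rubrik's actual API. The point is that every operation the UI performs
# maps to a plain REST call you could script against.

def basic_auth_header(user: str, password: str) -> dict:
    """Build an HTTP Basic Auth header for the API call."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def on_demand_snapshot_request(vm_id: str, sla_id: str) -> tuple:
    """Build the (path, JSON body) for a hypothetical on-demand snapshot call."""
    path = f"/api/v1/vmware/vm/{vm_id}/snapshot"
    body = json.dumps({"slaId": sla_id})
    return path, body

path, body = on_demand_snapshot_request("vm-101", "gold-sla")
print(path)  # /api/v1/vmware/vm/vm-101/snapshot
```

The request itself would then be one HTTP POST with that path, body, and header – the kind of thing that slots straight into any automation tooling.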


Cloud Data Management

Rubrik position their solution as “Cloud Data Management”.

In a similar fashion to Cohesity, Rubrik are focused on more than just backup and recovery or copy data management. There’s a bunch of stuff you can do around archive and compliance, and Rubrik tell me the search capabilities are pretty good too.

It also works well with technologies such as VMware vSAN. Chris Wahl and Cormac Hogan wrote a whitepaper on the integration that you can get here (registration required).



As you can see from this post, there’s a lot to look into with Rubrik (and Cohesity for that matter), and I’ve really only scratched the surface. The rising popularity of smarter secondary storage solutions such as these points to a desire in the marketplace to get sprawling data under control via policy rather than simple tiers of disk. This is a good thing. Add in the heavy focus on API-based control and I think we’re in for exciting times (or as exciting as this kind of stuff gets, in any case).

If you’re interested in some of what you can do with Rubrik, there’s a playlist on YouTube with demos that give a reasonable overview. I’m hoping to dig a little deeper into the Rubrik solution in the next little while, and I’m particularly interested to see what it can do from an archiving perspective, so stay tuned.

Cohesity Continues to Evolve


I’ve been following Cohesity for some time now, having covered a number of their product announcements and seen them in action at Storage Field Day 8. They announced version 3.0 at the end of June, and Gaetan Castelein kindly offered to give me a briefing on where they’re at in the lead-up to VMworld US.


What’s a Cohesity?

Cohesity’s goal is to take the complexity out of secondary storage. They argue that SDS has done a good job of this on primary storage platforms, but that we’ve all ignored the issues around running secondary storage. Their primary vehicle for this is Cohesity DataPlatform, combined with Cohesity DataProtect. Cohesity cover a number of use cases for the platform, and I thought it might be handy to go over them here.


Use Case 1 – DataPlatform as a “better backup target”


Cohesity are taking aim at the likes of Data Domain, and are keen to replace them as backup targets. Cohesity tell me that DataPlatform offers the following features:

  • Scale-out platform (with no single point of failure), simple capacity planning, no forklift upgrades;
  • Global deduplication;
  • Native cloud integration;
  • High performance with parallelized ingest; and
  • QoS and multitenancy.

These all seem like nice things to have.


Use Case 2 – Simpler Data Protection


Cohesity tell me that the DataPlatform also makes a great option for VMware-based backups, providing data protection folks with the ability to leverage the following features:

  • Converged infrastructure with single pane of glass;
  • Policy-based automation;
  • Fast SLAs (15 min RPOs and instantaneous RTOs); and
  • Productive data (instant clones for test/dev, deep visibility into the data for indexing, custom analytics, etc).

While the single pane of glass often becomes the single pain, the last point about making data productive is particularly important, depending on the environment you’re working in. There are a tonne of enterprises out there where people follow some mighty cumbersome processes just to run analytics on snapshots of their data. Any platform that makes this easier and more accessible seems like a great idea.


Use Case 3 – NFS & SMB Interfaces


You can also use the DataPlatform for file consolidation. Cohesity have even started positioning VMware VSAN as your primary storage platform (great for running VMs), with Cohesity providing secondary storage delivered over SMB or NFS. You can read more about this here.


Use Case 4 – Test/Dev


Cohesity’s first foray into the market revolved around providing enhanced capabilities for developers, and this remains a key selling point of the platform, with a full set of APIs exposed (which can be easily leveraged for use with Chef, Puppet, etc).
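To give a feel for what that looks like in practice, here’s a hedged sketch of how a tool like Chef or Puppet (or any CI job) might request a test/dev clone via such an API. The endpoint shape and field names are my own invention for illustration, not Cohesity’s actual API:

```python
import json

# Illustrative only: because the platform exposes a full REST API, a
# config-management tool or CI pipeline can request a test/dev clone of a
# protected VM instead of a human restoring from backup. The field names
# below are hypothetical, not Cohesity's actual schema.

def clone_task_spec(vm_name: str, snapshot_id: str, target_network: str) -> str:
    """Build the JSON body for a hypothetical 'clone for test/dev' call."""
    return json.dumps({
        "name": f"dev-clone-{vm_name}",
        "sourceSnapshotId": snapshot_id,
        "powerOn": True,
        "network": target_network,
    })

spec = json.loads(clone_task_spec("app01", "snap-42", "dev-vlan"))
print(spec["name"])  # dev-clone-app01
```

The appeal is that the clone spec is just data – easy to template in Chef/Puppet, version in git, and fire at the API on every pipeline run.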


Use Case 5 – Analytics


Analytics have also been a major part of Cohesity’s early forays into secondary storage, with native reporting providing:

  • Utilization metrics (storage utilization, capacity forecasting); and
  • Performance metrics (ingest rates, data reduction, IOPS, latency).

There’s also content indexing and search, providing data indexing (index upon ingest, VM and file metadata, files within VMs), and “Google-like” search. You can also access an analytics workbench with built-in MapReduce.
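As a toy illustration of “index upon ingest”, here’s a sketch of building an inverted index over file metadata as it arrives, so that search becomes a lookup rather than a scan. The tokenisation is deliberately simplistic – real platforms obviously do this at vastly larger scale:

```python
from collections import defaultdict

# Toy model of index-on-ingest: as file metadata arrives from each VM,
# tokenise the path into an inverted index. A later "Google-like" search
# is then a set lookup per query token, not a scan of every file record.

def ingest(index: dict, vm: str, path: str) -> None:
    """Add one file's metadata to the inverted index at ingest time."""
    for token in path.lower().replace("/", " ").replace(".", " ").split():
        index[token].add((vm, path))

index = defaultdict(set)
ingest(index, "vm01", "/var/log/app/errors.log")
ingest(index, "vm02", "/home/dev/errors.txt")

# Search: every VM/file pair whose path contains the token "errors".
hits = index["errors"]
print(sorted(vm for vm, _ in hits))  # ['vm01', 'vm02']
```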


What Have You Done For Me Lately?

The Cohesity 3.0 announcement included a bunch of expanded application and OS integrations, with a particular focus on SQL, Exchange, SharePoint, MS Windows, Linux, and Oracle DBs (RMAN and remote adapter). Here’s a table that Cohesity provided that covers off a lot of the new features.


In addition to the DataProtect enhancements, a number of enhancements have been made to both the DataPlatform and File Services components of the product. I’m particularly interested in the ROBO solution, and I think this could end up being a very clever attempt by Cohesity at capturing the secondary storage market at a very broad level.




Cohesity have been moving ahead in leaps and bounds, and I’ve been impressed by what they’ve had to say, and the development of their narrative compared to some of the earlier messaging. It remains to be seen whether they’ll get to where they want to be, but I think they’re giving it a good shake. They’ll be present at VMworld US next week (Booth 827), where you can hear more about what they’re doing with VSAN and vRealize Automation.

Storage Field Day 7 – Day 1 – Catalogic Software

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Catalogic Software presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Catalogic Software website that covers some of what they presented.



According to their website, “ECX is an intelligent copy data management [IDM] platform that allows you to manage, orchestrate and analyze your copy data lifecycle across your enterprise and cloud”. If you’ve ever delivered storage in an enterprise environment before, you’ll understand that copy data management (CDM) is something that can have a significant impact on your infrastructure, and it’s not always something people do well, or even understand.

Ed Walls, CEO of Catalogic, talked a bit about current challenges – growth, manageability, business agility. We’re drowning in a deluge of copy data, with most of these copies sitting completely idle. This observation certainly aligns with my experience in a number of environments.

Catalogic’s IDM is a combination of your storage (currently only NetApp) and a CDM platform (provided via an agentless, downloadable VM). You can use this platform to provide “copy data leverage”, enabling orchestration and automation of your copy data. Catalogic also state that this enables you to:

  • Simplify business processes with ‘copy data’ / ‘use data’ workflows;
  • Extract more value from your copy data services;
  • Provide protection compliance / snapshots; and
  • Provide file analytics, with search, reporting and analysis.

In addition to this, Catalogic spoke about ECX’s ability to provide:

  • Next-generation data protection, with instant recovery and disaster recovery leveraging snap data;
  • A killer app for hybrid cloud, enabling businesses to leverage cloud “scale and economics”; and
  • Copy data analytics with snapshots, file analytics, and protection compliance, giving you the ability to search, report and analyse.

It’s not in-line, but rather uses public APIs to orchestrate. In this scenario, tape’s not dead, it’s just not used for operational recovery. You can use it for archive instead.



The basic architecture is as follows:

  • Layer 0 – OS Services (Linux)
  • Layer 1 – Core Services – NoSQL (MongoDB) amongst them, scheduler, reporting, directory, licence management, index search, web, Java / REST, DBMS (PostgreSQL), messaging
  • Layer 2 – Management Services – account, policy, job, catalog, report, resource, event, alert, provision, search
  • Layer 3 – Policy-based Services – NTAP catalog, VMware catalog, NTAP CDM, VMware CDM
  • Layer 4 – Presentation Services

Here’s a picture that takes those dot points, and adds visualisation.




Catalogic went through a live demo with us, and it *looks* reasonably straightforward. A few things to note:

  • Configure – uses a provider model (one-time registration process for the NTAP controller or VMware)
  • ECX is an abstraction layer – workflow, notification, submit
  • Uses a site-based model
  • You can use a VMs and Templates view or a Datastore view




  • VM snapshots are quiesced sequentially
  • Creating trees of snapshots via workflow
  • Everything is driven via REST API
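As a rough sketch of what “creating trees of snapshots via workflow” might look like, here’s a toy model where a parent snapshot fans out into child copies for different uses. Names and structure are illustrative only, not Catalogic’s actual data model:

```python
# Toy model only: a workflow takes one parent snapshot of a source VM and
# fans it out into named child copies (e.g. for dev and test), each tracked
# against its parent so the whole tree can be reported on or torn down.

def snapshot_tree(source_vm: str, children: list) -> dict:
    """Map a parent snapshot name to its derived child copies."""
    parent = f"snap-{source_vm}"
    return {parent: [f"{parent}-{use}" for use in children]}

tree = snapshot_tree("db01", ["dev", "test"])
print(tree)  # {'snap-db01': ['snap-db01-dev', 'snap-db01-test']}
```

Since everything is driven via the REST API, each node in a tree like this would simply correspond to another API call in the workflow.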

Is it a replacement for backup? No. But businesses are struggling with traditional backup and recovery methods, and the combination of snapshots and tape is appealing for some people. As Catalogic put it, it “doesn’t replace it, but reduces the dependency on backups”.

In my opinion, searching the catalogue is pretty cool. They don’t crack open the VMDK to catalogue yet, but it’s been requested by a lot of people and is on their radar.


Final Thoughts and Further Reading

There’s a lot to like about ECX in my opinion, although a number of delegates (myself included) were mildly disappointed that this is currently tied to NetApp. Catalogic, in their defence, are well aware of this as a limitation and are working really hard to broaden the storage platform support.

The cataloguing capability of the product looked great in the demo I saw, and I know I have a few customers who could benefit from a different approach to CDM. Or, more accurately, it would be better if they had any approach at all.

Keith had some interesting thoughts on CDM as a potential precursor to data virtualisation here, as well as a preview post here – both of which are worth checking out.