Cohesity Continues to Evolve


I’ve been following Cohesity for some time now. I’ve covered a number of their product announcements and seen them in action at Storage Field Day 8. They announced version 3.0 at the end of June, and Gaetan Castelein kindly offered to give me a brief on where they’re at in the lead up to VMworld US.


What’s a Cohesity?

Cohesity’s goal is to take the complexity out of secondary storage. They argue that SDS has done a good job of this on primary storage platforms, but we’ve all ignored the issues around running secondary storage. The primary vehicle for this is Cohesity DataPlatform, combined with Cohesity DataProtect. Cohesity have a number of use cases for the platform that they cover, and I thought it might be handy to go over these here.


Use Case 1 – DataPlatform as a “better backup target”


Cohesity are taking aim at the likes of Data Domain, and are keen to replace them as backup targets. Cohesity tell me that DataPlatform offers the following features:

  • Scale-out platform (with no single point of failure), simple capacity planning, no forklift upgrades;
  • Global deduplication;
  • Native cloud integration;
  • High performance with parallelized ingest; and
  • QoS and multitenancy.

These all seem like nice things to have.


Use Case 2 – Simpler Data Protection


Cohesity tell me that the DataPlatform also makes a great option for VMware-based backups, providing data protection folks with the ability to leverage the following features:

  • Converged infrastructure with single pane of glass;
  • Policy-based automation;
  • Fast SLAs (15 min RPOs and instantaneous RTOs); and
  • Productive data (instant clones for test/dev, deep visibility into the data for indexing, custom analytics, etc).

While the single pane of glass often becomes the single pain, the last point about making data productive, depending on the environment you’re working in, is particularly important. There’re a tonne of enterprises out there where people are following some mighty cumbersome processes on snapshots of data to do analytics on the data. Any platform that makes this easier and more accessible seems like a great idea.


Use Case 3 – NFS & SMB Interfaces


You can also use the DataPlatform for file consolidation. Cohesity have even started positioning a combination of VMware VSAN as your primary storage platform (great for running VMs), with Cohesity offering secondary storage and the ability to deliver it over SMB or NFS. You can read more about this here.


Use Case 4 – Test/Dev


Cohesity’s first foray into the market revolved around providing enhanced capabilities for developers, and this remains a key selling point of the platform, with a full set of APIs exposed (which can be easily leveraged for use with Chef, Puppet, etc).


Use Case 5 – Analytics
Analytics have also been a major part of Cohesity’s early forays into secondary storage, with native reporting providing:

  • Utilization metrics (storage utilization, capacity forecasting); and
  • Performance metrics (ingest rates, data reduction, IOPS, latency).

There’s also content indexing and search, providing data indexing (index upon ingest, VM and file metadata, files within VMs), and “Google-like” search. You can also access an analytics workbench with built-in MapReduce.
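To make “index upon ingest” a little more concrete, here’s a toy sketch in Python of an inverted index that gets updated as each file arrives, so a search never has to rescan the data. To be clear, this is my own illustration and not how Cohesity actually does it: the IngestIndex class, its method names and the sample VM names are all made up for the example.

```python
from collections import defaultdict


class IngestIndex:
    """Toy inverted index that is updated as each file is ingested.

    Hypothetical illustration only; not Cohesity's implementation.
    """

    def __init__(self):
        # token -> set of (vm_name, file_path) pairs containing that token
        self._postings = defaultdict(set)

    def ingest(self, vm_name, file_path):
        """Index a file's metadata (VM name and path) at ingest time."""
        tokens = self._tokenise(vm_name) | self._tokenise(file_path)
        for token in tokens:
            self._postings[token].add((vm_name, file_path))

    def search(self, query):
        """Return the entries that match every token in the query."""
        tokens = self._tokenise(query)
        if not tokens:
            return set()
        results = None
        for token in tokens:
            hits = self._postings.get(token, set())
            results = hits if results is None else results & hits
        return set(results)

    @staticmethod
    def _tokenise(text):
        # Treat anything that isn't a letter or digit as a separator
        cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
        return set(cleaned.split())


index = IngestIndex()
index.ingest("sql-prod-01", "/var/log/backup.log")
index.ingest("web-02", "/etc/nginx/nginx.conf")
matches = index.search("backup log")  # only the sql-prod-01 entry matches
```

The point of doing the work at ingest is that the search side becomes a handful of set intersections, which is roughly what makes a “Google-like” search over backup data feel instant.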


What Have You Done For Me Lately?

The Cohesity 3.0 announcement brought a bunch of expanded application and OS integrations, with a particular focus on SQL, Exchange, SharePoint, MS Windows, Linux, Oracle DBs (RMAN and remote adapter). Here’s a table that Cohesity provided that covers off a lot of the new features.


In addition to the DataProtect enhancements, a number of enhancements have been made to both the DataPlatform and File Services components of the product. I’m particularly interested in the ROBO solution, and I think this could end up being a very clever attempt by Cohesity at capturing the secondary storage market at a very broad level.




Cohesity have been moving ahead in leaps and bounds, and I’ve been impressed by what they’ve had to say, and the development of their narrative compared to some of the earlier messaging. It remains to be seen whether they’ll get to where they want to be, but I think they’re giving it a good shake. They’ll be present at VMworld US next week (Booth 827), where you can hear more about what they’re doing with VSAN and vRealize Automation.

Caringo Announces SwarmNFS

Caringo recently announced SwarmNFS, and I recently had the opportunity to be briefed by Caringo’s Adrian J Herrera (VP Marketing). If you’re not familiar with Caringo, their main platform is Swarm, which “provides a platform for data protection, management, organization and search at massive scale”. You can read an overview of Swarm here, and there’s also a technical overview here.


So what is it?

SwarmNFS is a “stateless Linux process that integrates directly with Caringo Swarm. It delivers a global namespace across NFSv4, HTTP, SCSP (Caringo’s protocol), S3, and HDFS, delivering data distribution and data management at scale”.

SwarmNFS is basically an NFS server modified with proprietary code. It is:

  • Stateless and lightweight;
  • Has no caching or spooling;
  • Supports parallel data streaming; and
  • Has no single point of failure, with built-in high availability.

Caringo tell me this makes it a whole lot easier to centralise, distribute and manage data, while using far fewer resources than a traditional file gateway. You can run it as either a Linux process, an appliance or via a VM. Caringo also tell me that, since they connect directly into Swarm, there are fewer bottlenecks than the traditional approach using gateways, FUSE and proxies.


Everything in the UI can be done via the API as well, and it has support for multi-tenancy. As I mentioned before, there’s a global namespace with “Universal Access”, meaning that files can be written, read and edited through any interface (NFSv4, SCSP/HTTP, S3, HDFS). Having been a protocol prisoner in previous roles it’s nice to think that there’s a different way to do things.


What do I use it for?

You can use this for all kinds of stuff. Adrian ran me through some use cases, including:

  • Media and entertainment (think media streaming / content delivery); and
  • Street view type image storage.

One of the key things here is that, because the platform uses NFS, a lot of application re-work doesn’t necessarily need to occur to take advantage of the object storage platform. In my opinion this is a pretty cool feature of the platform, and one that should definitely see people look at SwarmNFS fairly seriously when evaluating their object storage options.



Caringo are doing some really cool stuff. If you haven’t checked out FileFly before, it’s also worth a look. The capabilities of the Swarm platform are growing at a rapid pace. And the storage world is becoming more object and less block and file as each day passes. Enrico’s been telling me that for ages now, and everything I’m seeing supports that. Caringo’s approach to metadata – storing metadata with the object itself – also means you can do a bunch of cool stuff with it fairly easily, like replicating it, applying erasure coding to it, and so forth. The upshot is that now the data’s truly portable. So, if you’re object-curious but still hang out with file types, maybe SwarmNFS might be a nice compromise for everyone.


Datera Announces Integration With Google’s Kubernetes


It’s the season for interesting announcements in the storage world. I don’t post about everything I get briefed on, but I do like to put up information on things I think are pretty cool. I first came across Datera at Storage Field Day 10. You can read my write-up on them here. I’m a fan of what they’re doing, and their platform is developing at quite a pace. So I was pleased to get a message from their CEO Marc Fleischmann wanting to tell me about a new integration they’ve developed for Google’s Kubernetes. Rather than go into it here, I thought it simpler to link to their press release and an article on the Datera blog.

According to Datera, this integration gives Kubernetes some additional grunt, including the ability to automatically:

  • Tailor runtime storage capabilities for each stateful application;
  • Scale applications; and
  • Isolate and protect them with dedicated storage segments.

I think it’s worth checking out the excellent demo video which covers a lot of the capability. If you’re looking to add some scalable, persistent storage to your Kubernetes deployment, Datera might be just what you need.




Tech Field Day – I’ll Be At TFD Extra at VMworld US 2016


Sure, the title is a bit of a mouthful. But I think it gets the point across. I mentioned recently that I’ll be heading to the US in less than a week for VMworld. This is a quick post to say that I’ll also have the opportunity to participate in my first Tech Field Day Extra event while at VMworld.  If you haven’t heard of the very excellent Tech Field Day events, you should check them out. You can also check back on the TFDx website during the event as there’ll likely be video streaming along with updated links to additional content. You can also see the list of delegates and event-related articles that they’ve published.

I think it’s a great line-up of companies this time around, with some I’m familiar with and some not so much. I’m attending the Tuesday session and will be hearing from ClearSky Storage, NooBaa and Paessler.


It should be a lot of fun!

Testing Tintri’s Lightning Lab and Pizza

Disclaimer: I was offered a pizza to write this post.  I haven’t taken up the offer yet, but I will be.


I had the opportunity to test drive Tintri’s “Lightning Lab” about six months ago and the nice folks at Tintri thought I might like to post about my experiences. They’ve offered me a pizza for my troubles which, coincidentally, ties in nicely with their current promotion “The Tintri Pizza Challenge“. If you’re in the US or Canada it’s worth checking it out.

In any case, the Lightning Lab is Tintri’s internet accessible lab that showcases a number of its arrays and provides you with an opportunity to take their gear for a spin. From a hardware perspective it’s pretty well provisioned, with T5060, T880, T620 & T540 arrays, along with a Dell R720 host with 128GB of RAM and 2 Dell R610 servers with 48GB of RAM. From a software perspective, the version of the lab I used had VMware vSphere 5.5U2b installed, but I believe this has since been updated. There’s also a functional version of Tintri Global Center, and both the Web Client Plug-in and the vROps plugin configured. Networking wise, management runs over a 1GbE Dell switch, with data travelling via a 10GbE Arista switch.


Global Center has a pretty neat login screen. Like all good admins, I use many dots in my password too.


There’s a bunch of stuff I could show from the interface, but one of my favourite bits is the ability to see an aggregated view of your deployed VMstores.


The interface is simple to operate and painfully colourful too. It’s also simple to navigate and makes it really easy to get a quick view of what’s going on in your environment without having to do a lot of digging.



There’s a lot more I could write about Tintri. If you’re aligned with their use case (NFS-only), they have a compelling offering that’s worth checking out. The Lightning Lab is an excellent tool to take their platform for a spin and gain a good understanding of just what you can do with the VMstore and Global Center. I think these kinds of offerings are great, and not just because there’s pizza involved. If more storage vendors read this and think that they should be doing something like this, then that’s a great thing. I’ve barely scratched the surface, so you should head over to Andrea Mauro’s blog and check out his thorough write-up of his Lightning Lab experience.

OT – Top 78

Eric Siebert recently published (okay, fine, it was three weeks ago) the full results of the Top vBlog voting. I was pleased to find I’d made a jump up from last year.


I’ve previously changed my tune on asking for votes in this competition, not because I don’t think it’s a good bit of fun, but I think there’re a bunch of other bloggers you should be voting for. A few people like to huff and puff about it being a popularity contest, but if nothing else I’ve found these types of lists (and Eric’s site in general) to be extremely useful when tracking down links to things on the internet that I know I need but can’t remember how I googled them in the first place. A lot of work goes into the site, so thanks Eric, and please keep it up! Thanks also to anyone who did throw a vote my way, I do actually appreciate it.

Storage Field Day 10 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


This is a quick post to say thanks once again to Stephen, Tom, Megan and the presenters at Storage Field Day 10. I had an enjoyable and educational time. For easy reference, here’s a list of the posts I did covering the event (they may not match the order of the presentations).

Storage Field Day – I’ll Be At SFD10

Storage Field Day 10 – Day 0

Storage Field Day 10 – (Fairly) Full Disclosure

Kaminario are doing some stuff we’ve seen before, but that’s okay

Pure Storage really aren’t a one-trick pony

Tintri Keep Doing What They Do, And Well

Nimble Storage are Relentless in Their Pursuit of Support Excellence

Cloudian Does Object Smart and at Scale

Exablox Isn’t Just Pretty Hardware

It’s Hedvig, not Hedwig

The Cool Thing About Datera Is Intent

Data Virtualisation is More Than Just Migration for Primary Data


Also, here’s a number of links to posts by my fellow delegates (and Tom!). They’re all really quite smart, and you should check out their stuff, particularly if you haven’t before. I’ll try to keep this updated as more posts are published. But if it gets stale, the SFD10 landing page has updated links.


Chris M Evans (@ChrisMEvans)

Storage Field Day 10 Preview: Hedvig

Storage Field Day 10 Preview: Primary Data

Storage Field Day 10 Preview: Exablox

Storage Field Day 10 Preview: Nimble Storage

Storage Field Day 10 Preview: Datera

Storage Field Day 10 Preview: Tintri

Storage Field Day 10 Preview: Pure Storage

Storage Field Day 10 Preview: Kaminario

Storage Field Day 10 Preview: Cloudian

Object Storage: Validating S3 Compatibility


Ray Lucchesi (@RayLucchesi)

Surprises in flash storage IO distributions from 1 month of Nimble Storage customer base

Has triple parity Raid time come?

Pure Storage FlashBlade well positioned for next generation storage

Exablox, bring your own disk storage

Hedvig storage system, Docker support & data protection that spans data centers


Jon Klaus (@JonKlaus)

I will be flying out to Storage Field Day 10!

Ready for Storage Field Day 10!

Simplicity with Kaminario Healthshield & QoS

Breaking down storage silos with Primary Data DataSphere

Cloudian Hyperstore: manage more PBs with less FTE

FlashBlade: custom hardware still makes sense


Enrico Signoretti (@ESignoretti)

VM-aware storage, is it still a thing?

Scale-out, flash, files and objects. How cool is Pure’s FlashBlade?


Josh De Jong (@EuroBrew)


Max Mortillaro (@DarkkAvenger)

Follow us live at Storage Field Day 10

Primary Data: a true Software-defined Storage platform?

If you’re going to SFD10 be sure to wear microdrives in your hair

Hedvig Deep Dive – Is software-defined the future of storage?

Pure Storage’s FlashBlade – Against The Grain

Pure Storage Flashblade is now available!


Gabe Maentz (@GMaentz)

Heading to Tech Field Day


Arjan Timmerman (@ArjanTim)

We’re almost live…

Datera: Elastic Data Fabric


Francesco Bonetti (@FBonez)

EXABLOX – A different and smart approach to NAS for SMB


Marco Broeken (@MBroeken)


Rick Schlander (@VMRick)

Storage Field Day 10 Next Week

Hedvig Overview


Tom Hollingsworth (@networkingnerd)

Flash Needs a Highway


Finally, thanks again to Stephen, Tom, Megan (and Claire in absentia). It was an educational and enjoyable few days and I really valued the opportunity I was given to attend.


Data Virtualisation is More Than Just Migration for Primary Data

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


Before I get started, you can find a link to my raw notes on Primary Data’s presentation here. You can also see videos of the presentation here. I’ve seen Primary Data present at SFD7 and SFD8, and I’ve typically been impressed with their approach to Software-Defined Storage (SDS) and data virtualisation generally. And I’m also quite a fan of David Flynn‘s whiteboarding chops.



Data Virtualisation is More Than Just Migration

Primary Data spent some time during their presentation at SFD10 talking about Data Migration vs Data Mobility.


[image courtesy of Primary Data]

Data migration can be a real pain to manage. It’s quite often a manual process and is invariably tied to the capabilities of the underlying storage platform hosting the data. The cool thing about Primary Data’s solution is that it offers dynamic data mobility, aligning “data’s needs (objectives) with storage capabilities (service levels) through automated mobility, arbitrated by economic value and reported as compliance”. Sounds like a mouthful, but it’s a nice way of defining pretty much what everyone’s been trying to achieve with storage virtualisation solutions for the last decade or longer.

What I like about this approach is that it’s data-centric, rather than focused on the storage platform. Primary Data supports “anything that can be presented to Linux as a block device”, so the options to deploy this stuff are fairly broad. Once you’ve presented your data to DSX, there are some smart service level objectives (SLOs) that can be applied to the data. These can be broken down into the categories of protection, performance, and price/penalty:


Protection

  • Durability
  • Availability
  • Recoverability – Security
  • Priority
  • Sovereignty

Performance

  • IOPS / Bandwidth / Latency – Read / Write
  • Sustained / Burst

Price / Penalty

  • Per File
  • Per Byte
  • Per Operation

Access Control can also be applied to your data. With Primary Data, “[e]very storage container is a landlord with floorspace to lease and utilities available (capacity and performance)”.
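To get a feel for what “aligning data’s needs with storage capabilities, arbitrated by economic value” might look like, here’s a small sketch in Python. It’s very much my own simplification and not Primary Data’s implementation: the Objective and StorageTier classes, their fields, and the sample tiers and prices are all assumptions I’ve made up for illustration. The idea is simply to filter out the storage that can’t meet the objectives, then pick the cheapest of whatever is left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """What the data needs (hypothetical fields for illustration)."""
    min_iops: int
    max_latency_ms: float
    min_durability_nines: int


@dataclass(frozen=True)
class StorageTier:
    """What a storage container offers, and what it costs."""
    name: str
    iops: int
    latency_ms: float
    durability_nines: int
    price_per_gb: float


def meets(tier, objective):
    """True if the tier's service level satisfies every objective."""
    return (
        tier.iops >= objective.min_iops
        and tier.latency_ms <= objective.max_latency_ms
        and tier.durability_nines >= objective.min_durability_nines
    )


def place(objective, tiers):
    """Pick the cheapest compliant tier, or None if nothing complies."""
    compliant = [t for t in tiers if meets(t, objective)]
    return min(compliant, key=lambda t: t.price_per_gb, default=None)


# Made-up tiers and prices, purely for the example
tiers = [
    StorageTier("flash", iops=100_000, latency_ms=0.5, durability_nines=5, price_per_gb=0.80),
    StorageTier("hybrid", iops=20_000, latency_ms=5.0, durability_nines=5, price_per_gb=0.30),
    StorageTier("cloud", iops=2_000, latency_ms=50.0, durability_nines=11, price_per_gb=0.03),
]

database = Objective(min_iops=10_000, max_latency_ms=10.0, min_durability_nines=5)
chosen = place(database, tiers)  # flash and hybrid both comply; hybrid is cheaper
```

Re-running the placement whenever the objectives or the available tiers change is, very loosely, the difference between mobility and a one-off migration. A None result is the “non-compliant” case you’d want reported rather than silently ignored.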


Further Reading and Final Thoughts

I like the approach to data virtualisation that Primary Data have taken. There are a number of tools on the market that claim to fully virtualise storage and offer mobility across platforms. Some of them do it well, and some focus more on the benefits provided around ease of migration from one platform to another.

That said, there’s certainly some disagreement in the market place on whether Primary Data could be considered a fully-fledged SDS solution. Be that as it may, I really like the focus on data, rather than silos of storage. I’m also a big fan of applying SLOs to data, particularly when it can be automated to improve the overall performance of the solution and make the data more accessible and, ultimately, more valuable.

Primary Data has a bunch of use cases that extend beyond data mobility as well, including deployment options ranging from Hyperconverged, software-defined NAS and clustering across existing storage platforms. Primary Data want to “do for storage what VMware did for compute”. I think the approach they’ve taken has certainly gotten them on the right track, and the platform has matured greatly in the last few years.

If you’re after some alternative (and better thought out) posts on Primary Data, you can read Jon’s post here. Max also did a good write-up here, while Chris M. Evans did a nice preview post on Primary Data that you can find here.

The Cool Thing About Datera Is Intent

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


Before I get started, you can find a link to my raw notes on Datera‘s presentation here. You can also see videos of their presentation here.


What’s a Datera?

Datera’s Elastic Data Fabric is “software defined storage appliance that takes over the hardware”. It’s currently available in two flavours:

  • Software available with qualified hardware (this is prescriptive, and currently based on a SuperMicro platform); and
  • Can be licensed as software-only as well with 2 SKUs available in 50TB or 100TB chunks.


What Can I Do With a Datera?


[image courtesy of Datera]

There are a couple of features that make Datera pretty cool, including:

  • Intent defined – you can use templates to enable intelligent placement of application data;
  • Economic flexibility – heterogeneous nodes can be deployed in the same cluster (capacity, performance, media type);
  • Works with an API first or Dev/Ops model – treating your infrastructure as code, programmable/composable;
  • Multi-tenant capability – this includes network isolation and QoS features;
  • Infrastructure awareness – auto-forming, optimal allocation of infrastructure resources.


What Do You Mean “Intent”?

According to Datera, Application Intent is “[a] way of describing what your application wants and then letting the system allocate the data”. You can define the following capabilities with an application template:

  • Policies for management (e.g. QoS) – data redundancy, data protection, data placement;
  • Storage template – defines how many volumes you want and the size you want; and
  • Pools of resources that will be consumed.

I think this is a great approach, and really provides the infrastructure operator with a fantastic level of granularity when it comes to deploying their applications.

Datera don’t use RAID, currently using 1->5 replication (synchronous) within the cluster to protect data. Snapshots are copy on write (at an application intent level).
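If it helps, here’s how I picture an application template as a data structure. This is a rough sketch in Python and not Datera’s actual schema or API: the AppTemplate and VolumeSpec classes, their field names and the sample values are all hypothetical, and I’m assuming that “1->5 replication” means you can ask for anywhere from one to five replicas.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeSpec:
    """One volume requested by the storage template."""
    name: str
    size_gb: int


@dataclass
class AppTemplate:
    """Hypothetical application template: policies, volumes, resource pool."""
    name: str
    replicas: int   # data redundancy policy
    max_iops: int   # QoS policy
    pool: str       # pool of resources that will be consumed
    volumes: list = field(default_factory=list)

    def validate(self):
        # Assumption: 1->5 replication means between one and five replicas
        if not 1 <= self.replicas <= 5:
            raise ValueError("replicas must be between 1 and 5")
        if not self.volumes:
            raise ValueError("template needs at least one volume")

    def raw_capacity_gb(self):
        """Capacity consumed once every volume has been replicated."""
        return self.replicas * sum(v.size_gb for v in self.volumes)


# Made-up example: a database with a data volume and a log volume
mysql = AppTemplate(
    name="mysql",
    replicas=3,
    max_iops=5000,
    pool="all-flash",
    volumes=[VolumeSpec("data", 100), VolumeSpec("logs", 20)],
)
mysql.validate()
needed = mysql.raw_capacity_gb()  # 3 replicas x (100 + 20) GB = 360 GB
```

The appeal is that the operator describes what the application wants once, and every deployment from that template gets the same policies without anyone hand-carving volumes.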

Further Reading and Final Thoughts

I know I’ve barely scratched the surface of some of the capabilities of the Datera platform. I am super enthusiastic about the concept of Application Intent, particularly as it relates to scale-out, software-defined storage platforms. I think we spend a lot of time talking about how fast product X can go, and why technology Y is the best at emitting long beeps or performing firmware downgrades. We tend to forget about why we’re buying product X or deploying technology Y. It’s to run the business, isn’t it? Whether it’s teaching children or saving lives or printing pamphlets, the “business” is the reason we need the applications, and thus the reason we need the infrastructure to power those applications. So it’s nice to see vendors such as Datera (and others) working hard to build application-awareness as a core capability of their architecture. When I spoke to Datera, they had four customers announced, with more than 10 “not announced”. They’re obviously keen to get traction, and as their product improves and more people get to know about them, I’ve no doubt that this number will increase dramatically.

While I haven’t had stick-time with the product, and thus can’t talk to the performance or otherwise, I can certainly vouch for the validity of the approach from an architectural perspective. If you’re looking to read up on software-defined storage, I wouldn’t hesitate to recommend Enrico‘s recent post on the topic. Chris M. Evans also did a great write-up on Datera as part of his extensive series of SFD10 preview posts – you can check it out here. Finally, if you ever need to get my attention in presentations, the phrase “no more data migration orgies” seems to be a sure-fire way of getting me to listen.

It’s Hedvig, not Hedwig

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


Before I get started, you can find a link to my raw notes on Hedvig‘s presentation here. You can also see videos of the presentation here.


It’s Hedvig, not Hedwig

I’m not trying to be a smart arse. But when you have a daughter who’s crazy about Harry Potter, it’s hard not to think about Hedwig when seeing the Hedvig brand name. I’m sure in time I’ll learn not to do this.

If you’re unfamiliar with Hedvig, it’s software-defined storage. The Hedvig Distributed Storage Platform is made up of standard servers and the Hedvig software.

Some of the key elements of the Hedvig solution are as follows:

  • Software is completely decoupled from commodity hardware;
  • Application-specific storage policies; and
  • Automated and API-driven.



Hedvig took us through their 7 core capabilities, which were described as follows:

  • Seamless scaling with x86 or ARM (haven’t seen an ARM-64 deployment yet);
  • Hyperconverged and hyperscale architectures (can mix and match in the same cluster);
  • Support for any hypervisor, container or OS (Xen, KVM, HyperV, ESX, containers, OpenStack, bare-metal Windows or Linux);
  • Block (iSCSI), file (NFS) and object (S3, SWIFT) protocols in one platform;
  • Enterprise features: dedupe, compression, tiering, caching, snaps/clones;
  • Granular feature provisioning per virtual disk; and
  • Multi-DC and cloud replication.




The Hedvig solution is comprised of the following key components:

  • Hedvig Storage Proxy – presents the block and file storage; runs as VM, container, or bare metal;
  • Hedvig Storage Service – forms an elastic cluster using commodity servers and/or cloud infrastructure; and
  • RESTful APIs – provides object access via S3 or Swift, instruments control and data plane.


How Does It Work?

This is oversimplifying things, but here’s roughly how it works:

  • Create and present virtual disks to the application tier;
  • Hedvig Storage Proxy captures and directs I/O to storage cluster;
  • Hedvig Storage Service distributes and replicates data across nodes;
  • The cluster caches and balances across nodes and racks; and
  • The cluster replicates for DR across DCs and/or clouds.
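The “distributes and replicates data across nodes” and “balances across nodes and racks” steps are the interesting bit, so here’s a toy sketch in Python of one way to pick replica locations so that copies land on different racks where possible. I want to stress that this is not Hedvig’s algorithm. The place_replicas function, the hash-based ranking and the node and rack names are all my own assumptions for the sake of illustration.

```python
import hashlib


def place_replicas(block_id, nodes, replicas):
    """Pick `replicas` nodes for a block, preferring distinct racks.

    `nodes` maps node name -> rack name. Hypothetical sketch only.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")

    # Rank nodes by a hash of (block, node): placement is deterministic,
    # and different blocks end up with different node orderings.
    ordered = sorted(
        nodes,
        key=lambda node: hashlib.sha256(f"{block_id}:{node}".encode()).hexdigest(),
    )

    chosen, used_racks = [], set()
    # First pass: at most one replica per rack
    for node in ordered:
        if len(chosen) == replicas:
            break
        if nodes[node] not in used_racks:
            chosen.append(node)
            used_racks.add(nodes[node])
    # Second pass: if there are fewer racks than replicas, fill the
    # remainder from nodes that haven't been chosen yet
    for node in ordered:
        if len(chosen) == replicas:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


# Made-up cluster: four nodes spread over three racks
cluster = {"node1": "rackA", "node2": "rackA", "node3": "rackB", "node4": "rackC"}
targets = place_replicas("vdisk1-block42", cluster, replicas=3)
```

With three racks and three replicas, every copy ends up on a different rack, so losing a rack costs you one copy at most. A real system would also weigh capacity, load and data centre location, which is where the caching, balancing and DR steps come in.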


Use Cases?

So where would you use Hedvig? According to Hedvig, they’re seeing uptake in a number of both “traditional” and “new” areas:


Traditional workloads

  • Server virtualisation
  • Backup and BC/DR
  • VDI

New workloads

  • Production clouds
  • Test/Dev
  • Big data/IoT


Further Reading and Final Thoughts

Before I wrap up, a quick shout-out to Chris Kranz for his use of Hedvig flavoured magnetic props during his whiteboard session – it was great. Here’s a shonky photo of Chris.


Avinash Lakshman is a super smart dude with a tonne of experience in doing cloud and storage things at great scale. He doesn’t believe that traditional storage has a future. When you watch the video of the Hedvig presentation at SFD10 you get a real feel for where the company’s coming from. The hyper-functional API access versus the GUI that looks a little rough around the edges certainly gives away the heritage of this product. That said, I think Avinash and Hedvig are onto a good thing here. The “traditional” storage architectures are indeed dying, as much as we might enjoy the relative simplicity of selling someone a dual-controller, midrange, block array with limited scalability.

As with many of these solutions I feel like we’re on the cusp of seeing something really cool being developed right in front of us. For some of us, the use cases won’t strike a chord, and the need for this level of scalability may not be there. But if you’re all in on SDS, Hedvig certainly has some compelling pieces of the puzzle that I think are worthy of further investigation.

The Hedvig website contains a wealth of information. You should also check out Chris M. Evans‘s SFD10 preview post on Hedvig here, while Rick Schlander did a great overview post that I recommend reading. Max did a really good deep dive post, along with a higher level view that you can see here.