I’ve covered the Cohesity appliance deployment in a howto article previously. I’ve also made use of the VMware-compatible Virtual Edition in our lab to test things like cluster-to-cluster replication and cloud tiering. The benefits of virtual appliances are numerous. They’re generally easy to deploy, don’t need dedicated hardware, can be re-deployed quickly when you break something, and can be a quick and easy way to validate a particular process or idea. They can also be a problem with regard to performance, and are at the mercy of the platform administrator to a point. But aren’t we all? With 6.1, Cohesity have made available a clustered virtual edition (the snappily titled Cohesity Cluster Virtual Edition ESXi). If you have access to the documentation section of the Cohesity support site, there’s a PDF you can download that explains everything. I won’t go into too much detail but there are a few things to consider before you get started.
Just like the non-clustered virtual edition, there are small and large configurations to choose from. The small configuration supports up to 8TB for the Data disk, and the large configuration up to 16TB. The small configuration supports 4 vCPUs and 16GB of memory, while the large configuration supports 8 vCPUs and 32GB of memory.
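As a quick reference, here’s my summary of the two configurations in a few lines of Python (this is just my reading of the sizing doc, not anything official):

```python
# My summary of the Clustered VE sizing options, as per the Cohesity doc PDF.
CONFIGS = {
    "small": {"vcpus": 4, "memory_gb": 16, "max_data_disk_tb": 8},
    "large": {"vcpus": 8, "memory_gb": 32, "max_data_disk_tb": 16},
}

def pick_config(data_tb: float) -> str:
    """Pick the smallest configuration whose Data disk limit covers data_tb."""
    for name, spec in CONFIGS.items():
        if data_tb <= spec["max_data_disk_tb"]:
            return name
    raise ValueError(f"{data_tb}TB exceeds the large configuration limit")

print(pick_config(3))   # a 3TB Data disk fits the small configuration
```

Obviously you’d size with your Cohesity team rather than a ten-line helper, but it captures the choice.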
Once you’ve deployed the appliance, you’ll need to add the Metadata disk and Data disk to each VM. The Metadata disk should be between 512GB and 1TB. For the large configuration, you can also apparently configure 2x 512GB disks, but I haven’t tried this. The Data disk needs to be between 512GB and 8TB for the small configuration and up to 16TB for the large configuration (with support for 2x 8TB disks). Cohesity recommends that these are formatted as Thick Provision Lazy Zeroed and deployed in Independent – Persistent mode. Each disk should be attached to its own SCSI controller as well, so you’ll have the system disk on SCSI 0:0, the Metadata disk on SCSI 1:0, and so on.
I did discover a weird issue when deploying the appliance on a Pure Storage FA-450 array in the lab. In vSphere this particular array’s datastore type is identified by vCenter as “Flash”. For my testing I had a 512GB Metadata disk and 3TB Data disk configured on the same datastore, with the three nodes living on three different datastores on the FlashArray. This caused errors with the cluster configuration, with the configuration wizard complaining that my SSD volumes were too big.
I moved the Data disk (with storage vMotion) to an all-flash Nimble array (that for some reason was identified by vSphere as “HDD”) and the problem disappeared. Interestingly, I didn’t have this problem with the single node configuration of 6.0.1 deployed with the same configuration. I raised a ticket with Cohesity support and they got back to me stating that this was expected behaviour in 6.1.0a. They tell me, however, that they’ve modified the behaviour of the configuration routine in an upcoming version so fools like me can run virtualised secondary storage on primary storage.
You can configure the appliance for increased resiliency at the Storage Domain level as well. If you go to Platform – Cluster – Storage Domains you can modify the DefaultStorageDomain (and other ones that you may have created). Depending on the size of the cluster you’ve deployed, you can choose the number of failures to tolerate and whether or not you want erasure coding enabled.
You can also decide whether you want EC to be a post-process activity or something that happens inline.
Once you’ve deployed a minimum of three copies of the Clustered VE, you’ll need to manually add Metadata and Data disks to each VM. The specifications for these are listed above. Fire up the VMs and go to the IP of one of the nodes. You’ll need to log in as the admin user with the appropriate password and you can then start the cluster configuration.
This bit is pretty much the same as any Cohesity cluster deployment, and you’ll need to specify things like a hostname for the cluster partition. As always, it’s a good idea to ensure your DNS records are up to date. You can get away with using IP addresses but, frankly, people will talk about you behind your back if you do.
At this point you can also decide to enable encryption at the cluster level. If you decide not to enable it you can do this on a per-Domain basis later.
Click on Create Cluster and you should see something like the following screen.
Once the cluster is created, you can hit the virtual IP you’ve configured, or any one of the attached nodes, to log in to the cluster. Once you log in, you’ll need to agree to the EULA and enter a license key.
The availability of virtual appliance versions for storage and data protection solutions isn’t a new idea, but it’s certainly one I’m a big fan of. These things give me an opportunity to test new code releases in a controlled environment before pushing updates into my production environment. It can help with validating different replication topologies quickly, and validating other configuration ideas before putting them into the wild (or in front of customers). Of course, the performance may not be up to scratch for some larger environments, but for smaller deployments and edge or remote office solutions, you’re only limited by the available host resources (which can be substantial in a lot of cases). The addition of a clustered version of the virtual edition for ESXi and Hyper-V is a welcome sight for those of us still deploying on-premises Cohesity solutions (I think the Azure version has been clustered for a few revisions now). It gets around the main issue of resiliency by having multiple copies running, and can also address some of the performance concerns associated with running virtual versions of the appliance. There are a number of reasons why it may not be the right solution for you, and you should work with your Cohesity team to size any solution to fit your environment. But if you’re running Cohesity in your environment already, talk to your account team about how you can leverage the virtual edition. It really is pretty neat. I’ll be looking into the resiliency of the solution in the near future and will hopefully be able to post my findings in the next few weeks.
I’ve been doing some work with Cohesity in our lab and thought it worth covering some of the basic features that I think are pretty neat. In this edition of Cohesity Basics, I thought I’d quickly cover off how to exclude VMs from protection jobs based on assigned tags. In this example I’m using version 6.0.1b_release-20181014_14074e50 (a “feature release”).
The first step is to find the VM in vCenter that you want to exclude from a protection job. Right-click on the VM and select Tags & Custom Attributes. Click on Assign Tag.
In the Assign Tag window, click on the New Tag icon.
Assign a name to the new tag, and add a description if that’s what you’re into.
In this example, I’ve created a tag called “COH-Test”, and put it in the “Backup” category.
Now go to the protection job you’d like to edit.
Click on the Tag icon on the right-hand side. You can then select the tag you created in vCenter. Note that you may need to refresh your vCenter source for this new tag to be reflected.
When you select the tag, you can choose to Auto Protect or Exclude the VM based on the applied tags.
If you drill in to the objects in the protection job, you can see that the VM I wanted to exclude from this job has been excluded based on the assigned tag.
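Conceptually, the exclusion logic boils down to a simple filter over assigned tags. Here’s a rough Python sketch of how I think about it (the VM and tag names are just my lab examples, and this is my mental model rather than Cohesity’s actual implementation):

```python
# Rough model of tag-based exclusion in a protection job: a VM stays in the
# job if none of its vCenter tags appear on the job's exclusion list.
def protected_vms(vms: dict, excluded_tags: set) -> list:
    """vms maps VM name -> set of assigned tag names."""
    return [name for name, tags in vms.items() if not (tags & excluded_tags)]

lab = {
    "sql-01": set(),
    "web-01": {"Production"},
    "test-vm": {"COH-Test"},   # tagged for exclusion, as in the example above
}
print(protected_vms(lab, {"COH-Test"}))  # test-vm drops out of the job
```

The nice thing about driving this from vCenter tags is that the virtualisation admins control inclusion without touching the backup product at all.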
I’ve written enthusiastically about Cohesity’s Auto Protect feature previously. Sometimes, though, you need to exclude VMs from protection jobs. Using tags is a quick and easy way to do this, and it’s something that your virtualisation admin team will be happy to use too.
What Is It?
If we’re not talking about the god and personification of the Sun, what are we talking about? Cohesity tells me that Helios is a “SaaS-based data and application orchestration and management solution”.
[image courtesy of Cohesity]
Here is the high-level architecture of Helios. There are three main features:
- Multi-cluster management – Control all your Cohesity clusters located on-premises, in the cloud or at the edge from a single dashboard;
- SmartAssist – Gives critical global operational data to the IT admin; and
- Machine Learning Engine – Gives IT Admins machine-driven intelligence so that they can make an informed decision.
All of this happens when Helios collects, anonymises, aggregates, and analyses globally available metadata and gives actionable recommendations to IT Admins.
Multi-cluster management is just that: the ability to manage more than one cluster through a unified UI. The cool thing is that you can roll out policies or make upgrades across all your locations and clusters with a single click. It also provides you with the ability to monitor your Cohesity infrastructure in real-time, as well as being able to search and generate reports on the global infrastructure. Finally, there’s an aggregated, simple to use dashboard.
SmartAssist is a feature that provides you with the ability to have smart management of SLAs in the environment. The concept is that if you configure two protection jobs in the environment with competing requirements, the job with the higher SLA will get priority. I like this idea as it prevents people doing silly things with protection jobs.
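The priority logic, at least as I understand the pitch, is easy to sketch: when jobs compete for resources, the one with the tighter SLA wins. A toy Python illustration (hypothetical job names and numbers, and nothing like Cohesity’s actual scheduler):

```python
# Toy model of SLA-based prioritisation: when two protection jobs contend,
# run the one with the shorter RPO (the tighter SLA) first.
jobs = [
    {"name": "dev-vms", "rpo_hours": 24},
    {"name": "prod-sql", "rpo_hours": 1},
]

def run_order(jobs: list) -> list:
    return [j["name"] for j in sorted(jobs, key=lambda j: j["rpo_hours"])]

print(run_order(jobs))  # prod-sql's tighter SLA gets it scheduled first
```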
The Machine Learning part of the solution provides a number of things, including insights into capacity consumption. And proactive wellness? It’s not a pitch for some dodgy natural health product, but instead gives you the ability to perform:
- Configuration validations, preventing you from doing silly things in your environment;
- Blacklist version control, stopping known problematic software releases spreading too far in the wild; and
- Hardware health checks, ensuring things are happy with your hardware (important in a software-defined world).
Thoughts and Further Reading
There’s a lot more going on with Helios, but I’d like to have some stick time with it before I have a lot more to say about it. People are perhaps going to be quick to compare this with other SaaS offerings, but I think Cohesity might be doing some different things, with a bit of a different approach. You can’t go five minutes on the Internet without hearing about how ML is changing the world. If nothing else, this solution delivers a much needed consolidated view of the Cohesity environment. This seems like an obvious thing, but probably hasn’t been necessary until Cohesity landed the type of customers that had multiple clusters installed all over the place.
I also really like the concept of a feature like SmartAssist. There’s only so much guidance you can give people before they have to do some thinking for themselves. Unfortunately, there are still enough environments in the wild where people are making the wrong decision about what priority to place on jobs in their data protection environment. SmartAssist can do a lot to take away the possibility that things will go awry from an SLA perspective.
I deployed Cohesity Cloud Edition in Microsoft Azure recently and took a few notes. I’m the first to admit that I’m completely hopeless when it comes to fumbling my way about Azure, so this probably won’t seem as convoluted a process to you as it did to me. If you have access to the documentation section of the Cohesity support site, there’s a PDF you can download that explains everything. I won’t go into too much detail but there are a few things to consider. There’s also a handy solution brief on the Cohesity website that sheds a bit more light on the solution.
The installation requires a Linux VM to be set up in Azure (a small one – DS1_V2 Standard). Just like in the physical world, you need to think about how many nodes you want to deploy in Azure (this will be determined largely by how much you’re trying to protect). As part of the setup you edit a Cohesity-provided JSON file with a whole bunch of cool stuff like Application IDs and Keys and Tenant IDs.
Specify the subscription ID for the subscription used to store the resources of the Cohesity Cluster.
WARNING: The subscription account must have owner permissions for the specified subscription.
Specify the Application ID assigned by Azure during the service principal creation process.
Specify the Application key generated by Azure during the service principal creation process that is used for authentication.
Specify the unique Tenant ID assigned by Azure.
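Put together, the file ends up looking something like this. The field names here are illustrative only – I’m going from memory, so check the actual Cohesity-provided JSON for the exact keys:

```json
{
  "subscriptionId": "<subscription-id>",
  "applicationId": "<service-principal-app-id>",
  "applicationKey": "<service-principal-key>",
  "tenantId": "<azure-tenant-id>"
}
```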
The Linux VM then goes off and builds the cluster in the location you specify with the details you’ve specified. If you haven’t done so already, you’ll need to create a Service Principal as well. Microsoft has some useful documentation on that here.
One thing to keep in mind is that, at this stage, “Cohesity does not support the native backup of Microsoft Azure VMs. To back up a cloud VM (such as a Microsoft Azure VM), install the Cohesity agent on the cloud VM and create a Physical Server Protection Job that backs up the VM”. So you’ll see that, even if you add Azure as a source, you won’t be able to perform VM backups in the same way you would with vSphere workloads, as “Cloud Edition only supports registering a Microsoft Azure Cloud for converting and cloning VMware VMs. The registered Microsoft Azure Cloud is where the VMs are cloned to”. This is the same across most public cloud platforms, as Microsoft, Amazon and friends aren’t terribly interested in giving out that kind of access to the likes of Cohesity or Rubrik. Still, if you’ve got the right networking configuration in place, you can back up your Azure VMs either to the Cloud Edition or to an on-premises instance (if that works better for you).
I’m on the fence about “Cloud Editions” of data protection products, but I do understand why they’ve come to be a thing. Enterprises have insisted on a lift and shift approach to moving workloads to public cloud providers and have then panicked about being able to protect them, because the applications they’re running aren’t cloud-native and don’t necessarily work well across multiple geos. And that’s fine, but there’s obviously an overhead associated with running cloud editions of data protection solutions. And it feels like you’re just putting off the inevitable requirement to re-do the whole solution. I’m all for leveraging public cloud – it can be a great resource to get things done effectively without necessarily investing a bunch of money in your own infrastructure. But you need to re-factor your apps for it to really make sense. Otherwise you find yourself deploying point solutions in the cloud in order to avoid doing the not so cool stuff.
I’m not saying that this type of solution doesn’t have a place. I just wish it didn’t need to be like this sometimes …
I’ve been doing some work with Cohesity in our lab and thought it worth covering some of the basic features that I think are pretty neat. In this edition of Cohesity Basics, I thought I’d quickly cover off how to get started with the “Cloud Tier” feature. You can read about Cohesity’s cloud integration approach here. El Reg did a nice write-up on the capability when it was first introduced as well.
What Is It?
Cohesity have a number of different technologies that integrate with the cloud, including Cloud Archive and Cloud Tier. With Cloud Archive you can send copies of snapshots up to the cloud to keep as a copy separate to the backup data you might have replicated to a secondary appliance. This is useful if you have some requirement to keep a monthly or six-monthly copy somewhere for compliance reasons. Cloud Tier is an overflow technology that allows you to have cold data migrated to a cloud target when the capacity of your environment exceeds 80%. Note that “coldness” is defined in this instance as older than 60 days. That is, you can’t just pump a lot of data into your appliance to see how this works (trust me on that). The coldness level is configurable, but I recommend you engage with Cohesity support before you go down that track. It’s also important to note that once you turn on Cloud Tier for a View Box, you can’t turn it off again.
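To make the trigger conditions concrete, here’s a little Python sketch of the logic as I understand it. The 80% capacity threshold and 60-day default coldness come from the behaviour described above; the code itself is purely illustrative:

```python
# Illustrative model of when Cloud Tier would consider moving data: the
# cluster must be over the capacity threshold AND the data must be "cold".
CAPACITY_THRESHOLD = 0.80   # tiering kicks in above 80% used
COLDNESS_DAYS = 60          # default; talk to Cohesity support before changing

def eligible_for_tiering(used_pct: float, data_age_days: int) -> bool:
    return used_pct > CAPACITY_THRESHOLD and data_age_days > COLDNESS_DAYS

print(eligible_for_tiering(0.85, 90))   # True: full-ish cluster, cold data
print(eligible_for_tiering(0.85, 10))   # False: recently written data stays local
```

This is also why you can’t just pump a pile of fresh data into the appliance to watch it tier – it’s too young to qualify.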
How Do I?
Here’s how to get started in 10 steps or less. Apologies if the quality of some of these screenshots is not great. The first thing to do is register an External Target on your appliance. In this example I’m running version 5.0.1 of the platform on a Cohesity Virtual Edition VM. Click on Protection – External Target.
Under External Targets you’ll see any External Targets you’ve already configured. Select Register External Target.
You’ll need to give it a name and choose whether you’re using it for Archival or Cloud Tier. This choice also impacts some of the types of available targets. You can’t, for example, configure a NAS or QStar target for use with Cloud Tier.
Selecting Cloud Tier will provide you with more cloudy targets, such as Google, AWS and Azure.
In this example, I’ve selected S3 (having already created the bucket I wanted to test with). You need to know the Bucket name, Region, Access Key ID and your Secret Access Key.
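Bucket naming has bitten me before, so I tend to sanity-check the name locally before punching it into the UI. Here’s a quick Python check against the standard S3 naming rules (my own helper, nothing to do with Cohesity):

```python
import re

# Basic S3 bucket naming rules: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name)) and ".." not in name

print(valid_bucket_name("cohesity-cloudtier-lab"))  # True
print(valid_bucket_name("Bad_Bucket"))              # False (uppercase/underscore)
```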
If the details are all correct, clicking Register will succeed; if you’ve provided the wrong credentials, it won’t. You then need to enable Cloud Tier on the View Box. Go to Platform – Cluster.
Click on View Boxes and then click on the three dots on the right to Edit the View Box configuration.
You then can toggle Cloud Tier and select the External Target you want to use for Cloud Tier.
Once everything is configured (and assuming you have some cold data to move to the cloud and your appliance is over 80% full) you can click on the cluster dashboard and you’ll see an overview of Cloud Tier storage in the Storage part of the overview.
All the kids are getting into cloud nowadays, and Cohesity is no exception. I like this feature because it can help with managing capacity on your on-premises appliance, particularly if you’ve had a sudden influx of data into the environment, or you have a lot of old data that you likely won’t be accessing. You still need to think about your egress charges (if you need to get those cold blocks back) and you need to think about what the cost of that S3 bucket (or whatever you’re using) really is. I don’t see the default coldness level being a problem, as you’d hope that you sized your appliance well enough to cope with a certain amount of growth.
Features like this demonstrate both a willingness on behalf of Cohesity to embrace cloud technologies, as well as a focus on ease of use when it comes to reasonably complicated activities like moving protection data to an alternative location. My thinking is that you wouldn’t necessarily want to find yourself in the position of having to suddenly shunt a bunch of cold data to a cloud location if you can help it (although I haven’t done the maths on which is a better option) but it’s nice to know that the option is available and easy enough to set up.
This one falls into the category of “unlikely that it will happen to you but might be worth noting”. I’ve been working with some Cohesity gear in the lab recently and came across a warning, not an error, when I was doing a SQL backup.
But before I get to that, it’s important to share the context of the testing. With Cohesity, there’s some support for protecting Microsoft SQL workloads that live on Windows Failover Clusters (as well as AAGs – but that’s a story for another time). You configure these separately from your virtual sources, and you install an agent on each node in the cluster. In my test environment I’ve created a simple two-node Windows Failover Cluster based on Windows 2016. It has some shared disk and a heartbeat network (a tip of the hat to Windows clusters of yore). I’ve cheated, because it’s virtualised, but needs must and all that. I’m running SQL 2014 on top of this. It took me a little while to get that working properly, mainly because I’m a numpty with SQL. I finally had everything set up when I noticed the following warning after each SQL protection job ran.
I was a bit confused as I had set the databases to full recovery mode. Of course, the more it happened, the more I got frustrated. I fiddled about with permissions on the cluster, manual maintenance jobs, database roles and all manner of things I shouldn’t be touching. I even went for a short walk. The thing I didn’t do, though, was click the arrow on the left-hand side of the job. That expands the job run details so you can read more about what happened. If I’d done that, I would have seen this error straight away. And the phrase “No databases available for log backup” would have made more sense.
And I would have realised that the reason I was getting the log backup warning was because it was skipping the system databases and, as I didn’t have any other databases deployed, it wasn’t doing any log backups. This is an entirely unlikely scenario in the real world, because you’ll be backing up SQL clusters that have data on them. If they don’t have data on them, they’re likely low value items and won’t get protected. The only situation where you might come across this is if you’re testing your infrastructure before deploying data to it. I resolved the issue by creating a small database. The log backups then went through without issue.
For reference, the DataPlatform version I’m using is 5.0.1.
Disclaimer: I recently attended Storage Field Day 15. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Cohesity recently presented at Storage Field Day 15. It’s not the first time I’ve spoken about them, and you can read a few of my articles on them here and here. You can see their videos from Storage Field Day 15 here, and download a PDF copy of my rough notes from here.
The Data Centre Is Boring
Well, not boring exactly. Okay, it’s a little boring. Cohesity talk a lot about the concept of secondary storage and, in their view, most of the storage occupying the DC is made up of secondary storage. Think of your primary storage tier as your applications, and your secondary storage as being comprised of:
- Archival data;
- Analytics;
- Test/Dev workloads; and
- File shares.
In other words, it’s a whole lot of unstructured data. Cohesity like to talk about the “storage iceberg”, and it’s a pretty reasonable analogy for what’s happening.
[Image courtesy of Cohesity]
Cohesity don’t see all this secondary data as simply a steaming pile of unmanaged chaos and pain. Instead, they see it as a potential opportunity for modernisation. The secondary storage market has delivered, in Cohesity’s view, an opportunity to “[c]lean up the mess left by enterprise backup products”. The idea is that you can use an “Apple-like UI”, operating at “Google-like scale”, to consolidate workloads on the Cohesity DataPlatform and then take advantage of copy data management to really extract value from that data.
The Cohesity Difference
So what differentiates Cohesity from other players in the secondary storage space?
- Global Space Efficiency
  - Variable-length dedupe
  - Erasure coding
- Multi-workload isolation
  - Noisy neighbour prevention
- Instant Mass Restore
  - Any point in time
  - Highly available
- Data Resiliency
  - Strict consistency
  - Ensures data integrity
- Cloud/Apps Integration
  - Universal access
I’ve been fortunate enough to have some hands-on experience with the Cohesity solution and can attest that these features (particularly things like storage efficiency and resiliency) aren’t just marketing. There are some other neat features, such as public cloud support with AWS and Azure, that are also worthy of further investigation.
Thoughts And Further Reading
There’s a lot to like about Cohesity’s approach to leveraging secondary storage in the data centre. For a very long time, the value of secondary storage hasn’t been at the forefront of enterprise analytics activities. Or, more bluntly put, copy data management has been something of an ongoing fiasco, with a number of different tools and groups within organisations being required to draw value from the data that’s just sitting there. Cohesity don’t like to position themselves simply as a storage target for data protection, because the DataPlatform is certainly capable of doing a lot more than that. While the messaging has occasionally been confusing, the drive of the company to deliver a comprehensive data management solution that extends beyond traditional solutions shouldn’t be underestimated. Coupled with a relentless focus on ease of use and scalability, the Cohesity offering looks to be a great way of digging into the “dark data” in your organisation to make sense of what’s going on.
There are still situations where Cohesity may not be the right fit (at the moment), particularly if you have requirements around non-x86 workloads or particularly finicky (read: legacy) enterprise applications. That said, Cohesity are working tirelessly to add new features to the solution at a rapid pace, and are looking to close the gap between themselves and some of the more established players in the market. The value here, however, isn’t just in the extensive data protection capability, but also in the analytics that can be leveraged to provide further insight into your organisation’s protected data. It’s sometimes not immediately obvious why you need to be mining your unstructured data for information. But get yourself the right tools and the right people and you can discover a whole lot of very useful (and sometimes scary) information about your organisation that you wouldn’t otherwise know. And it’s that stuff that lies beneath the surface that can have a real impact on your organisation’s success. Even if it is a little boring.
I’ve been doing some work with Cohesity in our lab and thought it worth covering some of the basic features that I think are pretty neat. In this edition of Cohesity Basics, I thought I’d quickly cover off the “Auto Protect” feature. If you read their white paper on data protection, you’ll find the following line: “As new virtual machines are added, they are auto discovered and included in the protection policy that meets the desired SLAs”. It seems like a pretty cool feature, and was introduced in version 4.0. I wanted to find out a bit more about how it works.
What Is It?
Auto Protect will “protect new VMs that are added to a selected parent Object (such as a Datacenter, Folder, Cluster or Host)”. The idea behind this is that you can add a source and have Cohesity automatically protect all of the VMs in a folder, cluster, etc. The cool thing is that it will also protect any new VMs added to that source.
When you’re adding Objects to a Protection Job, you can select what to auto protect. In the screenshot below you can see that the Datacenter in my vCenter has Auto Protect turned off.
The good news is that you can explicitly exclude Objects as well. Here’s what the various icons mean.
[Image courtesy of Cohesity]
When you create a Protection Job in Cohesity you add Objects to the job. If you select to Auto Protect this Object, anything under that Object will automatically be protected. Every time the Protection Job runs, if the Object hierarchy has been refreshed on the Cohesity Cluster, new VMs are also backed up even though the new VM has not been manually included in the Protection Job. There are two ways that the Object hierarchy gets refreshed. It is automatically done every 4 hours by the cluster. If you’re in a hurry though, you can do it manually. Go to Protection -> Sources and click on the Source you’d like to refresh. There’s a refresh button to click on and you’ll see your new Objects showing up.
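To picture how the hierarchy behaves, here’s a toy Python model of the auto-protect/exclude logic. This is my own simplification – the real Object model is much richer than this:

```python
# Toy model: a VM is protected if the nearest explicitly-marked Object on its
# path (itself, folder, cluster, datacenter, ...) is auto-protected rather
# than excluded.
def is_protected(path: list, marks: dict) -> bool:
    """path: Object names from the VM up to the root; marks: name -> 'auto'|'exclude'."""
    for obj in path:
        if obj in marks:
            return marks[obj] == "auto"
    return False  # nothing marked anywhere up the tree -> not protected

marks = {"Datacenter": "auto", "vm-template-01": "exclude"}
print(is_protected(["new-vm", "ClusterA", "Datacenter"], marks))          # True
print(is_protected(["vm-template-01", "ClusterA", "Datacenter"], marks))  # False
```

The first case is why new VMs get picked up automatically: they inherit the auto-protect mark from the Datacenter without being named in the job.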
Why Wouldn’t You?
As part of my testing, I’ve been creating “catchall” Protection Jobs and adding all the VMs in the environment into the jobs. But we have some VMware NSX Controller VMs in our lab, and VMware “only supports backing up the NSX Edge and controller through the NSX Manager”. Not only that, but it simply won’t work.
In any case, you can use FTP to back up your NSX VMs if you really feel like that’s something you want to do. More info on that is here. You also want to be careful that you’re not backing up stuff you don’t need to, such as clones and odds and sods. Should I try protecting the Cohesity Virtual Edition appliance VM? I don’t know about that …
I generally prefer data protection configurations that “protect everything and exclude as required”. While Auto Protect is turned off by default, it’s simple enough to turn on when you get started. And it’s a great feature, particularly in dynamic environments where there’s no automation of data protection when new workloads are provisioned (a problem for another time). Hat tip to my Cohesity SE Pete Marfatia for pointing this feature out to me.