VMware Cloud Disaster Recovery – Using A Script VM

This is a quick post covering the steps required to configure a script VM for use in a recovery plan with VMware Cloud Disaster Recovery (VCDR). Why would you want to do this? You might be running a recovery for a Linux VM and you need to run a script to update the DNS settings of the VM once it’s powered on at another site. Or you might have a site-specific application that needs to be installed. Whatever. The point is that VCDR gives you the ability to do that via the Script VM. You can read the documentation on the feature here.

Firstly, you configure the Script VM as part of the Recovery Plan creation process. Specify the name of the VM and the vCenter it’s hosted on.

Under Recovery steps, click on Add Step to add a step to the recovery process.

When you add the step, you’ll want to add an action for the post-recovery phase.

You can then select “Run script on the Script VM”.

At this point you can specify the full path to the script file, keeping in mind that Windows looks different to Linux. You can also set a timeout for the script.

And that’s pretty much it. Remember that you’ll need working DNS, or, failing that, valid IP addresses for things to work.
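To make the DNS example a bit more concrete, here is a minimal sketch of the kind of helper a post-recovery script might use to build new resolver settings for a Linux VM. Everything in it is an assumption for illustration: the function names, the nameserver addresses, and the search domain are hypothetical, and how the Script VM actually reaches the recovered VM (SSH, configuration management, and so on) is outside the sketch.

```python
from pathlib import Path


def build_resolv_conf(nameservers, search_domain=None):
    """Return resolv.conf-style content for the given nameservers."""
    lines = []
    if search_domain:
        lines.append(f"search {search_domain}")
    lines.extend(f"nameserver {ns}" for ns in nameservers)
    return "\n".join(lines) + "\n"


def write_resolv_conf(path, nameservers, search_domain=None):
    """Write the generated content to `path` (hypothetical target file)."""
    Path(path).write_text(build_resolv_conf(nameservers, search_domain))


# Example values are placeholders, not real recovery-site settings.
example = build_resolv_conf(["192.0.2.10", "192.0.2.11"], "recovery.example.com")
```

The path you give VCDR in the recovery step would then point at whatever wrapper script calls something like this on the Script VM.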

VMware Cloud on AWS – Check TRIM/UNMAP

This is a really quick follow-up to one of my TMCHAM articles on TRIM/UNMAP on VMware Cloud on AWS. In short, a customer wanted to know whether TRIM/UNMAP had been enabled on one of their clusters, as they’d requested. The good news is it’s easy enough to find out. On your cluster, go to Configure. Under vSAN, you’ll see Services. Expand the Advanced Options section and you’ll see whether TRIM/UNMAP has been enabled for the cluster or not.

VMware Cloud Disaster Recovery – Ransomware Recovery Activation

One of the cool features of VMware Cloud Disaster Recovery (VCDR) is the Enhanced Ransomware Recovery capability. This is a quick post to talk through how to turn it on in your VCDR environment, and things you need to consider.


Organization Settings

The first step is to enable the ransomware services integration in your VCDR dashboard. You’ll need to be an Organisation owner to do this. Go to Settings, and click on Ransomware Recovery Services.

You’ll then have the option to select where the data analysis is performed.

You’ll also need to tick some boxes to ensure that you understand that an appliance will be deployed in each of your Recovery SDDCs, Windows VMs will get a sensor installed, and some preinstalled sensors may clash with Carbon Black.

Click on Activate and it will take a few moments. If it takes much longer than that, you’ll need to talk to someone in support.

Once the analysis integration is activated, you can then activate NSX Advanced Firewall. Page 245 of the PDF documentation covers this better than I can, but note that NSX Advanced Firewall is a chargeable service (if you don’t already have a subscription attached to your Recovery SDDC). There’s some great documentation here on what you do and don’t have access to if you allow the activation of NSX Advanced Firewall.

Like your favourite TV chef would say, here’s one I’ve prepared earlier.

Recovery Plan Configuration

Once the services integration is done, you can configure Ransomware Recovery on a per Recovery Plan basis.

Start by selecting Activate ransomware recovery. You’ll then need to acknowledge that this is a chargeable feature.

You can also choose whether you want to use integrated analysis (i.e. Carbon Black Cloud), and if you want to manually remove other security sensors when you recover. You can, also, choose to use your own tools if you need to.

And that’s it from a configuration perspective. The actual recovery bit? A story for another time.

VMware Cloud Disaster Recovery – Firewall Ports

I published an article a while ago on getting started with VMware Cloud Disaster Recovery (VCDR). One thing I didn’t cover in any real depth was the connectivity requirements between on-premises and the VCDR service. VMware has worked pretty hard to ensure this is streamlined for users, but it’s still something you need to pay attention to. I was helping a client work through this process for a proof of concept recently and thought I’d cover it off more clearly here. The diagram below highlights the main components you need to look at, being:

  • The Cloud File System (frequently referred to as the SCFS)
  • The VMware Cloud DR SaaS Orchestrator (the Orchestrator); and
  • VMware Cloud DR Auto-support.

It’s important to note that the first two services are assigned IP addresses when you enable the service in the Cloud Service Console, and the Auto-support service has three public IP addresses that you need to be able to communicate with. All of this happens outbound over TCP 443. The Auto-support service is not required, but it is strongly recommended, as it makes troubleshooting issues with the service much easier, and provides VMware with an opportunity to proactively resolve cases. Network connectivity requirements are documented here.

[image courtesy of VMware]

So how do I know my firewall rules are working? The first sign that there might be a problem is that the DRaaS Connector deployment will fail to communicate with the Orchestrator at some point (usually towards the end), and you’ll see a message similar to the following. “ERROR! VMware Cloud DR authentication is not configured. Contact support.”

How can you troubleshoot the issue? Fortunately, we have a tool called the DRaaS Connector Connectivity Check CLI that you can run to check what’s not working. In this instance, we suspected an issue with outbound communication, and ran the following command on the console of the DRaaS Connector to check:

drc network test --scope cloud

This returned a status of “reachable” for the Orchestrator and Auto-support services, but the SCFS was unreachable. Some negotiations with the firewall team, and we were up and running.
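If you want a rough second opinion from another machine on the same network segment, a plain outbound TCP test can give you a first hint. The sketch below is not the DRaaS Connector tool and doesn’t replicate its checks; it only tries to open a TCP connection on port 443. The function names are hypothetical, and you would need to supply the addresses assigned to your own Orchestrator and SCFS.

```python
import socket


def tcp_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, port=443, timeout=5.0):
    """Map each endpoint label to 'reachable' or 'unreachable'."""
    return {
        name: "reachable" if tcp_reachable(host, port, timeout) else "unreachable"
        for name, host in endpoints.items()
    }
```

You would call `check_endpoints` with a dictionary of labels to addresses, for example `{"orchestrator": "<your Orchestrator IP>", "scfs": "<your SCFS IP>"}`. A successful TCP handshake only tells you the firewall path is open, not that the service is healthy.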

Note, also, that VMware supports the use of proxy servers for communicating with Auto-support services, but I don’t believe we support the use of a proxy for Orchestrator and SCFS communications. If you’re worried about VCDR using up all your bandwidth, you can throttle it. Details on how to do that can be found here. We recommend a minimum of 100Mbps, but you can go as low as 20Mbps if required.
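To get a feel for what those bandwidth numbers mean in practice, here is a back-of-the-envelope calculation. It is simple arithmetic only, with a hypothetical function name: it assumes decimal units, and an optional efficiency factor stands in for protocol overhead. It ignores anything VCDR does to reduce the amount of data sent.

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Hours needed to move data_gb (decimal GB) over a link of link_mbps."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_gb * 8 * 1000**3                  # GB -> bits
    bits_per_second = link_mbps * 1000**2 * efficiency
    return bits / bits_per_second / 3600


# 450 GB of changed data at the recommended and minimum link speeds.
at_100 = transfer_hours(450, 100)
at_20 = transfer_hours(450, 20)
```

With these assumptions, 450 GB takes about 10 hours at 100Mbps and about 50 hours at 20Mbps, which is worth keeping in mind before you throttle too hard.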

QNAP – Expand Volume With Larger Drives

This is one of those articles I’ve been meaning to post for a while, simply because I forget every time I do it how to do it properly. One way to expand the capacity of your QNAP NAS (non-disruptively) is to replace the drives one at a time with larger capacity drives. It’s recommended that you follow this process, rather than just ripping the drives out one by one and waiting for the RAID Group to expand. It’s a simple enough process to follow, although the QNAP UI has always struck me as a little on the confusing side to navigate, so I took some pictures. Note that this was done on QNAP firmware

Firstly, go to Storage/Snapshots under Storage in the Control Panel. Click on Manage.

Select the Storage Pool you want to expand, and click on Manage again.

This will give you a drop-down menu. Select Replace Disks One by One.

Now select the disk you want to replace and click on Change.

Once you’ve done this for all of the disks (and it will take some time to rebuild depending on a variety of factors), click on Expand Capacity. It will ask you if you’re sure and hopefully you’ll click OK.

It’ll take a while for the RAID Group to synchronise.
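If you’re wondering why the extra space only shows up once every disk has been swapped, the general RAID arithmetic explains it: in a parity RAID group, each member only contributes as much as the smallest disk. The sketch below assumes a single-parity (RAID 5 style) group, which is my assumption rather than anything specific to this QNAP, and the function name is hypothetical.

```python
def usable_capacity_tb(disk_sizes_tb, parity_disks=1):
    """Usable capacity of a parity RAID group, limited by the smallest disk."""
    if len(disk_sizes_tb) <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (len(disk_sizes_tb) - parity_disks) * min(disk_sizes_tb)


before = usable_capacity_tb([4, 4, 4, 4])    # original 4 TB drives
partway = usable_capacity_tb([8, 8, 8, 4])   # three drives replaced so far
after = usable_capacity_tb([8, 8, 8, 8])     # all drives replaced
```

In this example the usable capacity stays at 12 TB until the last 4 TB drive is replaced, and only then jumps to 24 TB.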

You’ll notice then that, while the Storage Pool has expanded, the Volume is still the original size. Select the Volume and click on Manage.

Now you can click on Resize Volume.

The Wizard will give you information on the Storage Pool capacity and give you the option to set the new capacity of the volume. I usually click on Set to Max.

It will warn you about that. Click on OK because you like to live on the edge.

It will take a little while, but eventually your Volume will have expanded to fill the space.


Rubrik Basics – Multi-tenancy – Create An Organization

I covered multi-tenancy with Rubrik some time ago, but things have certainly advanced since then. One of the useful features of Rubrik CDM (and something that’s really required for Envoy to make sense) is the Organizations feature. This is the way in which you can use a combination of LDAP sources, roles, and tenant workloads to deliver a packaged multi-tenancy feature to organisations either within or external to your company. In this article I’ll run through the basics of setting up an Organization. If you’d like to see how it can be applied in a practical sense, it’s worth checking out my post on deploying Rubrik Envoy.

It starts, as these things often do, by clicking on the gear in the Rubrik CDM UI. Select Organizations (located under Access Management).

Click on Create Organization.

You’ll want to give it a name, and think about whether you want to give your tenant the ability to do per-tenant access control.

You’ll want an Org Admin Role to have particular abilities, and you might like to get fancy and add in some additional roles that will have some other capabilities.

At this point you’ll get to select which users you want in your Organization.

Hopefully you’ve added the tenant’s LDAP source to your environment already.

And it’s worth thinking about what users and / or groups you’ll be using from that LDAP source to populate your Organization’s user list.

You’ll also need to consider which role will be assigned to these users (rather than relying on Global Admins to do things for tenants).

You can then assign particular resources, including VMs, vApps, and so forth.

You can also select what SLA Domains the Organization has access to, as well as Archival locations, and replication targets and sources. This becomes important in a multi-tenanted environment as you don’t want folks putting data where they shouldn’t.

At this point you can download the Rubrik Envoy OVA, deploy it, and connect it to your Organization.

And then you’re done. Well, normally you would be, but I didn’t select a whole lot of objects in this example. Click Finish and you’re on your way.

Assuming you’ve assigned your roles correctly, when your tenant logs in, he or she will only be able to see and control resources that belong to that particular Organization.


Rubrik Basics – Envoy Deployment

I’ve recently been doing some work with Rubrik Envoy in the lab and thought I’d run through the basics. There’s a new document outlining the process on the articles page.


Why Envoy?

This page explains it better than I do, but Envoy is essentially a way for service providers to deliver Rubrik services to customers sitting on networks that are isolated from the Rubrik environment. Why would you need to do this? There are all kinds of reasons why you don’t want to give your tenants direct access to your data protection resources, and most of these revolve around security (even if your Rubrik environment is secured appropriately). As many SPs will also tell you, bringing private networks from a tenant / edge into your core is usually not a great experience either.

At a high level, it looks like this.

In this example, Tenant A sits on a private network, and the Envoy Tenant Network is The Rubrik Routable Network on the Envoy appliance is, and the data management interface on the Rubrik cluster is The Envoy appliance talks to tenant hosts over ports 12800 and 12801. The Rubrik cluster communicates with Envoy over ports 7500 and 7501. The only time the tenant network communicates with the Rubrik cluster is when the Envoy / Rubrik UI is used by the tenant. This is accessed over a port specified when the Organization is created (see below), and the Envoy to cluster communication is over port 443.
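If you need to hand the port requirements above to a firewall team, it can help to write them down as data first. The sketch below only encodes the flows described in this post; the labels and helper names are hypothetical, and the tenant UI port is left out because it is chosen when the Organization is created.

```python
from typing import NamedTuple, Tuple


class Flow(NamedTuple):
    source: str
    destination: str
    ports: Tuple[int, ...]


# Flows as described in the text; labels are illustrative only.
ENVOY_FLOWS = [
    Flow("envoy", "tenant-hosts", (12800, 12801)),
    Flow("rubrik-cluster", "envoy", (7500, 7501)),
    Flow("envoy", "rubrik-cluster", (443,)),
]


def ports_from(source, flows=ENVOY_FLOWS):
    """Return the sorted destination ports for flows initiated by `source`."""
    return sorted(p for f in flows if f.source == source for p in f.ports)


def describe(flows=ENVOY_FLOWS):
    """Return one human-readable line per flow."""
    return [
        f"{f.source} -> {f.destination}: TCP {', '.join(map(str, f.ports))}"
        for f in flows
    ]
```

Printing `describe()` gives a short list you can paste into a change request, and `ports_from("envoy")` shows what the appliance itself needs to initiate.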

Other Notes

Envoy isn’t a data mover in its current iteration, but rather a way for SPs to present some self-service capabilities to tenants in a controlled fashion without relying on third-party portals or network translation tools. So if you had a bunch of workloads sitting in a tenant’s environment, you’d be better served deploying Rubrik Air / Edge appliances and then replicating that data into the core. If your tenant has a vCenter environment with a few VMs, you can use the Rubrik Backup Service to back up those VMs, but you couldn’t set up vCenter as a source for the tenant unless you opened up networks between your environments by some other means and added it to your Rubrik cluster. This would be ugly at best.

Note also that the deployment assumes you’re creating an Organization in the Rubrik appliance that will be used to isolate the tenant’s data and access from other tenants in the environment. To get hold of the Envoy OVA appliance and credentials, you need to run through the Organization creation process and connect the Envoy appliance when prompted. You’ll also need to ensure that you’ve configured Roles correctly for your tenant’s environment.

If, for some reason, you need to change or view the IP configuration of the Envoy appliance, it’s important to note that the articles on the Rubrik support site are a little out of step with CentOS 7 (i.e. written for Ubuntu). I don’t know whether this is because I’m using Rubrik Air appliances in the lab, but I think it’s maybe just a shift. In any case, to get IP information, you need to log in to the console and go to /etc/sysconfig/network-scripts. You’ll find a couple of files (ifcfg-eth0 and ifcfg-eth1) that will tell you whether you’ve made a boo boo with your configuration or not.
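Those ifcfg files are plain KEY=VALUE text, so if you’ve copied one off the appliance and want to sanity-check it, a few lines of Python will do. This is a minimal sketch with a hypothetical function name: the sample content uses made-up documentation addresses, and whether Python is available on the appliance itself is something I haven’t assumed.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines from an ifcfg-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that isn't an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Made-up sample content using documentation address ranges.
SAMPLE = """\
# hypothetical ifcfg-eth0
DEVICE=eth0
BOOTPROTO="static"
IPADDR=192.0.2.20
NETMASK=255.255.255.0
GATEWAY=192.0.2.1
"""

config = parse_ifcfg(SAMPLE)
```

From there, checking `config["IPADDR"]` and `config["GATEWAY"]` against what you meant to enter is a quick way to spot a typo.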



I’m the first to admit it took a little while to understand the utility of something like Envoy. Most SPs struggle to deliver self-service capabilities for services that don’t always do network multi-tenancy very well. This is a good step in the direction of solving some of the problems associated with that. It’s also important to understand that, if your tenant has workloads sitting in VMware Cloud Director, for example, they’ll be accessing Rubrik resources in a different fashion. As I mentioned before, if there is a bit to protect on the edge site, it’s likely a better option to deploy a virtualised Rubrik appliance or a smaller cluster and replicate that data. In any case, I’ll update this post if I come across anything else useful.

Rubrik Basics – Rubrik CDM Upgrades With Polaris – Part 2

This is the second part of the super exciting article “Rubrik CDM Upgrades With Polaris”. In the first episode, I connected my Polaris tenancy to a valid Rubrik Support account so it could check for CDM upgrades. In this post, I’ll be covering the actual update process using Polaris. Hold on to your hats.

To get started, log in to Polaris, click on the Gear icon, and select CDM Upgrades.

If there’s a new version of CDM available for deployment, you’ll see it listed in the dashboard. In this example, my test Edge cluster has an update available (5.3.1-p3). Happy days!

You’ll need to get this update downloaded to the cluster you want to install it on first. Click on the ellipsis and select Download.

You can then choose to download the release from Rubrik or locally.

Click on the version you want to download and click Next.

You then get the opportunity to confirm the download. Click on Confirm to do this.

It will then let you know that it’s working on it.

Once the update has downloaded, you’ll see “Ready for upgrade” on the dashboard.

Select the cluster you’d like to upgrade and click on Upgrade.

At this point, you’ll get the option to schedule the upgrade, and choose to roll back if the upgrade fails for some reason.

Confirm the upgrade and you’ll be on your way.

Polaris lets you know that it’s working on it.

You can see the progress in the dashboard.

When it’s done, it’s done.

And that’s it. This highlights the utility of something like Polaris, particularly when you’re managing a large number of clusters and need to keep things in tip-top shape.

Rubrik Basics – Rubrik CDM Upgrades With Polaris – Part 1

I decided to break this article into 2 parts. Not because it’s super epic or particularly complicated, but because there are a lot of screenshots and it just looks weird if I put it in one big thing. Should it have been a downloadable article? Sure, probably. But here we are. It’s been some time since I ran through the Rubrik CDM upgrade process (on physical hardware no less). I didn’t have access to Polaris GPS at that time, and thought it would be useful to run through what it looks like to perform platform upgrades via that rather than the CLI. This post covers the process of configuring Polaris to check for CDM updates, and the second post covers deploying those updates to Rubrik clusters.

Log in to your Polaris dashboard, click on the Gear icon, and select CDM Upgrades.

Click on Connect to Support Portal to enter your Rubrik support account details. This lets your Polaris instance communicate freely with the Rubrik Support Portal.

You’ll need a valid support account to connect.

If you’ve guessed your password successfully, you’ll get a message at the bottom of the screen letting you know as much.

If your environment was already fairly up to date, you may not see anything listed in the CDM Upgrades dashboard.

And that’s it for Part 1. I can hear you asking “how could it get any more exciting than this, Dan?”. I know, it’s pretty great. Just wait until I run you through deploying an update in this post.

Rubrik Basics – Add A VMware Cloud Director Instance

You’ve deployed your Rubrik virtual appliance (technically I should have used Air but let’s just go with it) and now you want to protect a VMware Cloud Director instance. When you add an instance, Rubrik automatically discovers all of the components of your VCD environment, including:

  • Organizations;
  • Organization virtual datacenters;
  • vApps; and
  • Virtual machines.

You can protect vApps by assigning the SLA Domain at various levels in the VCD hierarchy, and also by assigning it to individual VMs. vApp protection also protects vApp metadata including networks, boot order, and the access list. There are a few limitations with vApp protection to keep in mind as well.

  • Virtual machines in a vApp: Maximum of 128 virtual machines in a vApp. To protect a vApp with more than 128 virtual machines, use the exclude function to reduce the number protected.
  • Mounts: The Rubrik cluster performs all mounts for vApps at the virtual machine level.
  • Backup exclusion: Protection of vApps does not include Cloud Director Object Metadata.
  • Autodiscovery: Rubrik CDM ignores the Cloud Director auto discovery feature.

There’s good support for multi-tenancy and RBAC as well. There’s a bunch of other stuff I could write about VCD and Rubrik but let’s just get started on adding an instance. Click on the Gear and select “vCD Instances”.

Then click on “Add vCD Account”.


You’ll then have the opportunity to enter your credentials.

I use all dots for my password too.

Once you’ve added the instance you’ll see it listed under “All vCD Instances”.

If you look under “Virtual Machines” you should see any vApps associated with the instance listed under “vCD Apps”. In this example my tenancy only has one vApp deployed.

And that’s it. This all gets a lot more interesting when you start messing about with the Rubrik VCD plug-in and the API, but that’s a story for another time.