VMware Cloud on AWS – TMCHAM – Part 5 – VM Management

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around managing VMs running on the VMware-managed VMware Cloud on AWS platform, and talk about vCenter plugins and what that looks like when you move across to VMware Cloud on AWS.

How Can I Access vCenter?

VMware vCenter has been around since Hector was a pup, and the good news is that it can be used to manage your VMware Cloud on AWS environment. It’s accessible via a few different methods, including PowerCLI. If you want to access the HTML5 UI via the cloud console, you’ll need to ensure there’s a firewall rule in place to allow access via your Management gateway – the official documentation is here. If the rule has already been created and you just need to add your IP to the mix, here’s the process.

The first step is to find out your public IP address. I use WhatIsMyIP.com to do this.

In your console, go to Networking & Security -> Inventory -> Groups.

Under Groups, make sure you select Management Groups.

You’ll find a Group that was created that stores the IP information of folks wanting to access vCenter. In this example, we’ve called it “SET Home IP Addresses”.

Click on the vertical ellipsis and click Edit.

Click on the IPs section.

You’ll then see a spot where you can enter your IP address. You can do a single address or enter a range, as shown below.

Click Apply and then click Save to save the rule. Now you should be able to open vCenter.

Can I run RVTools and other scripts on my VMC environment?

Yes, you can run RVTools against your environment. In terms of privilege levels with VMware Cloud on AWS, you get CloudAdmin. The level of access is outlined here. It’s important to understand these privilege levels, because some things will and won’t work as a result of these.

Can I lockdown my VMs using PowerShell?

You will have the ability to set these advanced settings on your VMs in the SDDC, but this is limited to per-VM, rather than on a per-cluster basis. So if you normally ran a script on a pre-VM basis to harden the VM config, you’d need to run that on each VM individually, rather than on a per-cluster level.

What about vCenter plugins?

We don’t have a concept of vCenter plugins in VMware Cloud on AWS, so there are different ways to get the information you’d normally need. vROps, for example, has the ability to look at VMware Cloud on AWS, using either the on-premises version or the cloud version. There’s information on that here, but note that the plugin isn’t supported with VMC vCenter.

What about my Site Recovery Manager plugin? The mechanism for managing this will change depending on whether you’re using SRaaS or VCDR to protect your workloads. There’s some good info on SRaaS here, and some decent VCDR information here. Again, there is no plugin available, but the element managers are available via the cloud console.  

What about NSX-V? VMware Cloud on AWS is all NSX-T, and you can access the NSX Manager via the cloud console.

Conclusion

A big part of the reason people like VMware Cloud on AWS is that the management experience doesn’t differ significantly from what you get VMware Cloud Foundation of VMware Validated Designs on-premises. That said, there are a few things that do change when you move to VMware Cloud on AWS. Things like plugins don’t exist, but you can still run many of the scripts you know and love against the platform. Remember, though, it is a fully managed service, so some of the stuff you used to run against your on-premises environment is no longer necessary.

VMware – VMworld 2017 – STO2063BU – Architecting Site Recovery Manager to Meet Your Recovery Goals

Disclaimer: I recently attended VMworld 2017 – US.  My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “STO2063BU – Architecting Site Recovery Manager to Meet Your Recovery Goals” presented by GS Khalsa. You can grab a PDF version of them from here. It’s mainly bullet points, but I’m sure you know the drill.

 

Terminology

  • RPO – Last viable restore point
  • RTO – How long it will take before all functionality is recovered

You should break these down to an application, or a service tier level.

 

Protection Groups and Recovery Plans

What is a Protection Group?

Group of VMS that will be recovered together

  • Application
  • Department
  • System Type
  • ?

Different depending on replication type

A VM can only belong to one Protection Group

 

How do protection groups fit into Recovery Plans?

[Image via https://blogs.vmware.com/vsphere/2015/05/srm-protection-group-design.html]

 

vSphere Replication Protection Groups

  • Group VMs as desired into Protection Groups
  • What storage they are located on doesn’t matter

 

Array Based Protection Groups

If you want your protection groups to align to your applications – you’ll need to shuffle storage around

 

Policy Driven Protection

  • New style Protection Group leveraging storage profiles
  • High level of automation compared to traditional protection groups
  • Policy based approach reduces OpEx
  • Simpler integration of VM provisioning, migration, and decommissioning

This was introduced in SRM 6.1

 

How Should You Organise Your Protection Groups?

More Protection Groups

  • Higher RTO
  • Easier testing
  • Only what is needed
  • More granular and complex

Fewer Protection Groups

  • Lower RTO
  • Less granular, complex and flexible

This varies by customer and will be dictated by the appropriate combination of complexity and flexibility

 

Topologies

SRM Supports Multiple DR Topologies

Active-Passive Failover

  • Dedicated resources for recovery

Active-Active Failover

  • Run low priority apps on recovery infrastructure

Bi-directional Failover

  • Production applications at both sites
  • Each site acts as the recovery site for the other

Multi-site

  • Many-to-one failover
  • Useful for Remote Office / Branch Office

 

Enhanced Topology Support

There are a few different topologies that are supported.

[Images via https://blogs.vmware.com/virtualblocks/2016/07/28/srm-multisite/]

  • 10 SRM pairs per vCenter
  • A VM can only be protected once

 

SRM & Stretched Storage

[Image via https://blogs.vmware.com/virtualblocks/2015/09/01/srm-6-1-whats-new/]

Supported as of SRM 6.1

 

SRM and vSAN Stretched Cluster

[Image via https://blogs.vmware.com/virtualblocks/2015/08/31/whats-new-vmware-virtual-san-6-1/]

Failover to the third site (not the 2 sites comprising the cluster)

 

Enhanced Linked Mode

You can find more information on Enhanced Linked Mode here. It makes it easier to manage your environment and was introduced in vSphere 6.0.

 

Impacts to RTO

Decision Time

How long does it take to decide to failover?

 

IP Customisation

Workflow without customisation

  • Power on VM and wait for VMtools heartbeats

Workflow with IP customisation

  • Power on VM with network disconnected
  • Customise IP utilising VMtools
  • Power off VM
  • Power on VM and wait for VMtools heartbeats

Alternatives

  • Stretched Layer 2
  • Move VLAN / Subnet

It’s going to take some time to do when you failover a guest

 

Priorities and Dependencies vs Priorities Only

Organisation for lower RTO

  • Fewer / larger NFS datastore / LUNs
  • Fewer protection groups
  • Don’t replicate VM swap files
  • Fewer recovery plans

 

VM Configuration

  • VMware Tools installed in all VMs
  • Suspend VMS on Recovery vs PowerOff VMs
  • Array-based replication vs vSphere Replication

 

Recovery Site Sizing

  • vCenter sizing – it works harder than you think
  • Number of hosts – more is better
  • Enable DRS – why wouldn’t you?
  • Different recovery plans target different clusters

 

Recommendations

Be Clear with the Business

What is / are their

  • RPOs?
  • RTOs?
  • Cost of downtime?
  • Application priorities?
  • Units of failover?
  • Externalities?

Do you have Executive buy-in?

 

Risk with Infrequent DR Plan Testing

  • Parallel and cutover tests provide the best verification, but are very resource intensive and time consuming
  • Cutover tests are disruptive, may take days to complete and leaves the business at risk

 

Frequent DR Testing Reduces Risk

  • Increased confidence that the plan will work
  • Recovery can be tested at anytime without impact to production

 

Test Network

Use VLAN or isolated network for test environment

  • Default “auto” setting does not allow VM communication between hosts

Different PortGroup can be specified in SRM for test vs actual run

  • Specified in Network Mapping and / or Recovery Plan

 

Test Network – Multiple Options

Two Options

  • Disconnect NSX Uplink (this can be easily scripted)
  • Use NSX to create duplicate “Test” networks

RTO = dollars

*Demos

 

Conclusion and Further Reading

I enjoy these kind of sessions, as they provide a nice overview of the product capabilities that ties in well with business requirements. SRM is a pretty neat solution, and something you might consider using if you need to move workload from one DC to another. If you’re after a technical overview of Site Recovery Manager 6.5, this site is pretty good too. 4.5 stars.

New book on VMware SRM now available

Good news, “Disaster Recovery Using VMware vSphere Replication and vCenter Site Recovery Manager – Second Edition” has just been released via Packt Publishing. It was written by Abhilash G B and I had the pleasure of serving as the technical reviewer. While SRM 6.5 has just been announced this is nonetheless a handy manual with some great guidance (and pictures!) on how to effectively use SRM with both array-based and vSphere Replication-based protection. There’s an ebook version available for purchase with a print copy also available for order.

6096en_5349_disasterrecoveryusingvmwarevspherereplicationandvcentersiterecoverymanagersecond

Brisbane VMUG – November 2015

The November Brisbane VMUG will be held on Thursday 19th November at EMC’s office in the city (Level 11, 345 Queen Street, Brisbane) from 4 – 6 pm. It’s sponsored by EMC and should be a ripper.

Here’s the agenda:

  • Latest SRM Features and VMworld Highlights – VMware
  • EMC VPLEX Metro with Site Recovery Manager Presentation
  • EMC Demonstration
  • Pizza and Refreshments (I promise)

You can find out more information and register for the event here. I hope to see you there. Also, if you’re interested in sponsoring one of these events, please get in touch with me and I can help make it happen.

EMC MirrorView – New Article

I wrote a brief article on configuring EMC MirrorView from scratch through to being ready for VMware SRM usage. It’s not really from scratch, because I don’t go through the steps required to load the MirrorView enabler on the frame. But I figured that’s more a CE-type activity in any case. I hope to follow it up with a brief article on the SRM side of things. You can find my other articles here. Enjoy!