Book Review – VMware Cloud on AWS Blueprint

Late last year I was approached by the folks at Packt Publishing to be a technical reviewer on a book about VMware Cloud on AWS. I was happy to be involved as VMC is something I’ve been working with quite a bit since I started at VMware. Fast forward to a few months ago and I received my reviewer copy (yes, an actual book, although you can also buy a PDF or access it via a subscription) of VMware Cloud on AWS Blueprint, written by Oleg Ulyanov, Michael Schwartzman, and Harsha Sanku. I thought I’d do a quick review of the book here, as I think it’s something worth diving into if you’re looking at running, or already run, VMware Cloud on AWS.

 

What’s In The Book?

The book weighs in at 388 pages, and is divided into 12 chapters, covering the foundational aspects of the VMware Cloud on AWS service, along with everything you need to know to stand up the service, connect to it, run applications on it, and secure it. You’ll get the full view of what nodes go into the software-defined data centre (SDDC), what connectivity options you have available, and the best ways to put it all together. There are also chapters covering native AWS integrations, automation, as well as a chapter on the (now maybe only historically interesting) VMware Cloud on AWS Outposts offering. In short, it covers quite a lot of ground, and fills in a lot of detail that could otherwise be confusing for the first-time user of VMware Cloud on AWS. I’m the first to admit that I’m not the best when it comes to advanced networking and security concepts, being more of a disk slinger in the past than someone who focusses on things that go ping, so I found the chapter on understanding networking and security configurations to be truly helpful. Additionally, there was a great chapter on best practice advice, along with guidance on how to avoid common mistakes when deploying and using VMware Cloud on AWS.

 

Why Read It?

So why bother reading a book about a solution that you’ve probably already deployed? Because chances are there’s going to be some information in there that you haven’t come across, or hadn’t considered when you deployed your VMware Cloud on AWS solution. I’m a big believer in the documentation being able to get you so far, but it is books by specialists that can really open up a topic for you and allow you to see things from a different viewpoint. You might have just deployed your first SDDC, or you might have 20 of them running across multiple AWS Regions. I think you’ll still get some benefits from reading this book. Even if you’re not looking to leverage VMware Cloud on AWS, this book will give you some great insights into how a well-architected, mature, infrastructure-as-a-service offering looks, and provides some great perspectives on design considerations and things to look out for. The authors all have years of field experience, and know what they’re talking about. It was a real pleasure to be involved with this project, and I recommend you check it out.

Random Short Take #94

Welcome to Random Short Take #94. Let’s get random.

Random Short Take #93

Welcome to Random Short Take #93. If it’s old news can it still be called news? Maybe olds. Let’s get random.

Brisbane VMUG – Lunch and Learn – April 2024

The April 2024 edition of the Brisbane VMUG meeting will be held on Wednesday 24th April at the Attura Head Office (Level 9, 116 Adelaide Street, Brisbane, Queensland, 4000) from 12pm – 1:30pm. It’s sponsored by Cloud Ready Solutions and promises to be a great session.

Here’s the agenda:

  • Potential threats to virtual environments
  • Protecting virtual workloads with NAKIVO
    • VM backup
    • Ransomware protection
    • Instant recovery
    • Disaster recovery
    • IT Monitoring for VMware vSphere
    • MSP Console
  • Best practices for virtual data protection
    • The 3-2-1-1-0 strategy
    • Automated workflows
    • Storage efficiency
    • Flexible retention
    • Backup security
    • Backup vs. replication
    • Case studies
    • Technical demo
  • Q&A session

Cloud Ready Solutions has gone to great lengths to make sure this will be a fun and informative session and I’m really looking forward to hearing about NAKIVO. You can find out more information and register for the event here. I hope to see you there. Also, if you’re interested in sponsoring one of these events, please get in touch with me and I can help make it happen.

Disaster Recovery – Do It Yourself or As-a-Service?

I don’t normally do articles off the back of someone else’s podcast episode, but I was listening to W. Curtis Preston and Prasanna Malaiyandi discuss “To DRaaS or Not to DRaaS” on The Backup Wrap-up a little while ago, and thought it was worth diving into a bit on the old weblog. In my day job I talk to many organisations about their Disaster Recovery (DR) requirements. I’ve been doing this a while, and I’ve seen the conversation change somewhat over the last few years. This article will barely skim the surface on what you need to think about, so I recommend you listen to Curtis and Prasanna’s podcast, and while you’re at it, buy Curtis’s book. And, you know, get to know your environment. And what your environment needs.

 

The Basics

As Curtis says in the podcast, “[b]ackup is hard, recovery is harder, and disaster recovery is an advanced form of recovery”. DR is rarely just a matter of having another copy of your data safely stored away from wherever your primary workloads are hosted. Sure, that’s a good start, but it’s not just about getting that data back, or the servers, it’s the connectivity as well. So you have a bunch of servers running somewhere, and you’ve managed to get some / all / most of your data back. Now what? How do your users connect to that data? How do they authenticate? How long will it take to reconfigure your routing? What if your gateway doesn’t exist any more? A lot of customers think of DR in a similar fashion to the way they treat their backup and recovery approach. Sure, it’s super important that you understand how you can meet your recovery point objectives and your recovery time objectives, but there are some other things you’ll need to worry about too.
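To make the RPO / RTO distinction a bit more concrete, here’s a rough Python sketch. It’s purely illustrative (the function and field names are mine, not from any product): worst-case data loss is roughly one replication interval, and worst-case downtime is your end-to-end recovery time, and both need to sit inside the agreed targets.

```python
from dataclasses import dataclass


@dataclass
class DrTarget:
    rpo_minutes: int  # maximum tolerable data loss
    rto_minutes: int  # maximum tolerable downtime


def meets_targets(replication_interval_min: int,
                  estimated_recovery_min: int,
                  target: DrTarget) -> dict:
    """Worst-case data loss is roughly one replication interval;
    worst-case downtime is the end-to-end recovery time."""
    return {
        "rpo_ok": replication_interval_min <= target.rpo_minutes,
        "rto_ok": estimated_recovery_min <= target.rto_minutes,
    }


# A 30-minute replication cadence against a 15-minute RPO fails,
# even though the recovery time itself is comfortably inside the RTO:
print(meets_targets(30, 120, DrTarget(rpo_minutes=15, rto_minutes=240)))
# {'rpo_ok': False, 'rto_ok': True}
```

The point being that meeting one target tells you nothing about the other, and neither tells you anything about routing, authentication, or gateways.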

 

What’s A Disaster?

Natural

What kind of disaster are you looking to recover from? In the olden days (let’s say about 15 years ago), I was talking to clients about natural disasters in the main. What happens when the Brisbane River has a one-in-a-hundred-year flood for the second time in 5 years? Are your data centres above river level? What about your generators? What if there’s a fire? What if you live somewhere near the ocean and there’s a big old tsunami heading your way? My friends across the ditch know all about how not to build data centres on fault lines.

Accidental

Things have evolved to cover operational considerations too. I like to joke with my customers about the backhoe operator cutting through your data centre’s fibre connection, but this is usually the reason why data centres have multiple providers. And there’s always human error to contend with. As data gets more concentrated, and organisations look to do things more efficiently (i.e. some of them are going cheap on infrastructure investments), the risk of messing up a single component can have a bigger impact than it would have previously. You can architect around some of this, for sure, but a lot of businesses are not investing in those areas and just leaving it up to luck.

Bad Hacker, Stop It

More recently, however, the conversation I’ve been having with folks across a variety of industries has been more about some kind of human malfeasance. Whether it’s bad actors (in hoodies and darkened rooms no doubt) looking to infect your environment with malware, or someone not paying attention and unleashing ransomware into your environment, we’re seeing more and more of this kind of activity having a real impact on organisations of all shapes and sizes. So not only do you need to be able to get your data back, you need to know that it’s clean and not going to re-infect your production environment.

A Real Disaster

And how pragmatic are you going to be when considering your recovery scenarios? When I first started in the industry, the company I worked for talked about their data centres being outside the blast radius (assuming someone wanted to drop a bomb in the middle of the Brisbane CBD). That’s great as far as it goes, but are all of your operational staff going to be around to start the recovery activities? Are any of them going to be interested in trying to recover your SQL environment if they have family and friends who’ve been impacted by some kind of earth-shattering disaster? Probably not so much. In Australia many organisations have started looking at having some workloads in Sydney, and recovery in Melbourne, or Brisbane for that matter. All cities on the Eastern seaboard. Is that enough? What if Oz gets wiped out? Should you have a copy of the data stored in Singapore? Is data sovereignty a problem for you? Will anyone care if things are going that badly for the country?

 

Can You Do It Better?

It’s not just about being able to recover your workloads either, it’s about how you can reduce the complexity and cost of that recovery. This is key to both the success of the recovery, and the likelihood of it not getting hit with relentless budget cuts. How fast do you want to be able to recover? How much data do you want to recover? To what level? I often talk to people about them shutting down their test and development environments during a DR event. Unless your whole business is built on software development, it doesn’t always make sense to have your user acceptance testing environment up and running in your DR environment.

The cool thing about public cloud is that there’s a lot more flexibility when it comes to making decisions about how much you want to run and where you want to run it. The cloud isn’t everything though. As Prasanna mentions, if the cloud can’t run all of your workloads (think mainframe and non-x86, for example), it’s pointless to try and use it as a recovery platform. But to misquote Jason Statham’s character Turkish in Snatch – “What do I know about DR?” And if you don’t really know about DR, you should think about getting someone who does involved.

 

What If You Can’t Do It?

And what if you want to DRaaS your SaaS? Chances are high that you probably can’t do much more than rely on your existing backup and recovery processes for your SaaS platforms. There’s not much chance that you can take that Salesforce data and put it somewhere else when Salesforce has a bad day. What you can do, however, is be sure that you back that Salesforce data up so that when someone accidentally lets the bad guys in you’ve got data you can recover from.

 

Thoughts

The optimal way to approach DR is an interesting problem to try and solve, and like most things in technology, there’s no real one-size-fits-all approach that you can take. I haven’t even really touched on the pros and cons of as-a-Service offerings versus rolling your own either. Despite my love of DIY (thanks mainly to punk rock, not home improvement), I’m generally more of a fan of getting the experts to do it. They’re generally going to have more experience, and the scale, to do what needs to be done efficiently (if not always economically). That said, it’s never as simple as just offloading the responsibility to a third-party. You might have a service provider delivering a service for you, but ultimately you’ll always be accountable for your DR outcomes, even if you’re not directly responsible for the mechanics of it.

VMware Cloud on AWS – TMCHAM – Part 13 – Delete the SDDC

Following on from my article on host removal, in this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover SDDC removal on the VMware-managed VMware Cloud on AWS platform. Don’t worry, I haven’t lost my mind in a post-acquisition world. Rather, this is some of the info you’ll find useful if you’ve been running a trial or a proof of concept (as opposed to a pilot) deployment of VMware Cloud Disaster Recovery (VCDR) and / or VMware Cloud on AWS and want to clean some stuff up when you’re all done.

 

Process

Firstly, if you’re using VCDR and want to deactivate the deployment, the steps to perform are outlined here, and I’ve copied the main bits from that page below.

  1. Remove all DRaaS Connectors from all protected sites. See Remove a DRaaS Connector from a Protected Site.
  2. Delete all recovery SDDCs. See Delete a Recovery SDDC.
  3. Deactivate the recovery region from the Global DR Console. (Do this step last.) See Deactivate a Recovery Region. Usage charges for VMware Cloud DR are not stopped until this step is completed.

Funnily enough, as I was writing this, someone zapped our lab for reasons. So this is what a Region deactivation looks like in the VCDR UI.

Note that it’s important you perform these steps in that order, or you’ll have more cleanup work to do to get everything looking nice and tidy. I have witnessed firsthand someone doing it the other way and it’s not pretty. Note also that if your Recovery SDDC had services such as HCX connected, you should hold off deleting the Recovery SDDC until you’ve cleaned that bit up.
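Since the ordering of those teardown steps is the bit people get wrong, here’s a tiny Python sketch that encodes the documented sequence as data and sanity-checks a planned run against it. This is purely illustrative (the step names are mine, and nothing here calls a real API); the real work happens in the VCDR UI.

```python
# The documented VCDR teardown sequence, in order. Deactivating the
# recovery region must come last -- that's the step that stops billing.
TEARDOWN_ORDER = [
    "remove_draas_connectors",
    "delete_recovery_sddcs",
    "deactivate_recovery_region",
]


def validate_teardown(steps: list[str]) -> bool:
    """Return True only if the planned steps appear in the documented order."""
    indices = [TEARDOWN_ORDER.index(s) for s in steps if s in TEARDOWN_ORDER]
    return indices == sorted(indices)


print(validate_teardown(["remove_draas_connectors",
                         "delete_recovery_sddcs",
                         "deactivate_recovery_region"]))  # True

print(validate_teardown(["deactivate_recovery_region",
                         "remove_draas_connectors"]))     # False -- extra cleanup awaits
```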

Secondly, if you have other workloads deployed in a VMware Cloud on AWS SDDC and want to remove a PoC SDDC, there are a few steps that you will need to follow.

If you’ve been using HCX to test migrations or network extension, you’ll need to follow these steps to remove it. Note that this should be initiated from the source side, and your HCX deployment should be in good order before you start (site pairings functioning, etc). You might also wish to remove a vCenter Cloud Gateway, and you can find information on that process here.

Finally, there are some AWS activities that you might want to undertake to clean everything up. These include:

  • Removing VIFs attached to your AWS VPC.
  • Deleting the VPC (this will likely be required if your organisation has a policy about how PoC deployments are managed).
  • Tidying up any on-premises routing and firewall rules that may have been put in place for the PoC activity.

And that’s it. There’s not a lot to it, but tidying everything up after a PoC will ensure that you avoid any unexpected costs popping up in the future.

VMware Cloud on AWS – TMCHAM – Part 12 – Host Removal

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover host removal on the VMware-managed VMware Cloud on AWS platform. This is a fairly brief post, as there’s not a lot to say about the process, but I’ve had enough questions about it that I thought it was worth covering.

 

Background

I’ve written about Elastic DRS (EDRS) in VMware Cloud on AWS previously. It’s something you can’t turn off, and the Baseline policy does a good job of making sure you don’t get in hot water from a storage perspective. That said, there might be occasions where you want to remove a host or two to scale in your cluster manually. This might happen after a cluster conversion, or you may have had a rapid scale-out event and you have now removed whatever workloads caused that scale-out event to occur.

 

Process

The process to remove a host is documented here. Note that it is a sequential process, with one host being removed at a time. Depending on the number of hosts in your cluster, you may need to adjust your storage and fault tolerance policies as well. To start the process, go to your cloud console and select the SDDC you want to remove the hosts from. If there’s only one cluster, you can click on Remove Hosts under Actions. If there are multiple clusters in the SDDC, you’ll need to select the cluster you want to remove the host from.

You’ll then be advised that you need to understand what you’re doing (and acknowledge that), and you may be advised to change your default storage policies as well. More info on those policies is here.

Once you kick off the process, the cluster will be evaluated to ensure that removing hosts will not violate the applicable EDRS policies. VMs will be migrated off the host when it’s put into maintenance mode, and billing will be stopped for that host.
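For a rough feel of the kind of evaluation that matters here, this Python sketch estimates whether scaling in by one host would blow through a storage utilisation threshold. It’s illustrative only: the 75% figure is my assumption for this example, not a documented EDRS value, so check the actual policy thresholds on your SDDC before relying on anything like this.

```python
def can_remove_host(num_hosts: int,
                    raw_tib_per_host: float,
                    used_tib: float,
                    max_storage_pct: float = 75.0) -> bool:
    """Rough feasibility check before scaling in a cluster by one host.

    The 75% threshold is an assumption for illustration -- check the
    EDRS policy on your own SDDC for the real scale-out trigger.
    """
    if num_hosts <= 2:
        # A standard cluster can't shrink below the minimum host count.
        return False
    remaining_capacity = raw_tib_per_host * (num_hosts - 1)
    projected_pct = used_tib / remaining_capacity * 100
    return projected_pct <= max_storage_pct


# Four hosts with 20 TiB raw each and 40 TiB used: removing one leaves
# 60 TiB of capacity at ~66.7% utilisation, so removal looks feasible.
print(can_remove_host(4, 20.0, 40.0))  # True

# Three hosts, same usage: removing one would put the cluster at 100%.
print(can_remove_host(3, 20.0, 40.0))  # False
```

The service performs its own (much more thorough) evaluation when you kick off the removal, which is the one that counts.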

And that’s it. Pretty straightforward.

VMware – vExpert 2024

I’m very happy to have been listed as a vExpert for 2024. This is the twelfth time that they’ve forgotten to remove my name from the list (even I didn’t think I’d keep doing that “joke”). You can read more about it here. Thanks again to Corey Romero, the vExpert PROs, and the VMware by Broadcom Community and Advocacy Team for making this kind of thing happen. And thanks also to the vExpert community for being such a great community to be part of. Congratulations to you (whether this is your first or thirteenth time). There’s been a lot happening in and around VMware recently, and I’m happy that programs like this can continue to exist.

VMware Cloud on AWS – What’s New – February 2024

It’s been a little while since I posted an update on what’s new with VMware Cloud on AWS, so I thought I’d share some of the latest news.

 

M7i.metal-24xl Announced

It’s been a few months since it was announced at AWS re:Invent 2023, but the M7i.metal-24xl (one of the catchier host types I’ve seen) is going to change the way we approach storage-heavy VMC on AWS deployments.

What is it?

It’s a host without local storage. There are 48 physical cores (96 logical cores with Hyper-Threading enabled). It has 384 GiB memory. The key point is that there are flexible NFS storage options to choose from – VMware Cloud Flex Storage or Amazon FSx for NetApp ONTAP. There’s support for up to 37.5 Gbps networking speed, and it supports always-on memory encryption using Intel Total Memory Encryption (TME).

Why?

Some of the potential use cases for this kind of host type are as follows:

  • CPU Intensive workloads
    • Image processing
    • Video encoding
    • Gaming servers
  • AI/ML Workloads
    • Code Generation
    • Natural Language Processing
    • Classical Machine Learning
    • Workloads with limited resource requirements
  • Web and application servers
    • Microservices/Management services
    • Secondary data stores/database applications
  • Ransomware & Disaster Recovery
    • Modern Ransomware Recovery
    • Next-gen DR
    • Compliance and Risk Management

Other Notes

New (greenfield) customers can deploy the M7i.metal-24xl in the first cluster using 2-16 nodes. Existing (brownfield) customers can deploy the M7i.metal-24xl in secondary clusters in the same SDDC. In terms of connectivity, we recommend you take advantage of VPC peering for your external storage connectivity. Note that there is no support for multi-AZ deployments, nor is there support for single node deployments. If you’d like to know more about the M7i.metal-24xl, there’s an excellent technical overview here.
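The placement rules in that paragraph can be sketched as a simple validator. This is illustrative only (the function and parameter names are mine, and the limits should always be confirmed against the current service documentation, as they do change):

```python
def m7i_deployment_ok(greenfield: bool, cluster_index: int,
                      num_hosts: int, multi_az: bool) -> tuple[bool, str]:
    """Sketch of the M7i.metal-24xl placement rules described above.

    cluster_index 0 is the first cluster in the SDDC. Illustrative only --
    confirm the real limits against the current service documentation.
    """
    if multi_az:
        return False, "multi-AZ (stretched) deployments are not supported"
    if num_hosts == 1:
        return False, "single-node deployments are not supported"
    if greenfield and cluster_index == 0:
        if not (2 <= num_hosts <= 16):
            return False, "first cluster must have 2-16 nodes"
        return True, "ok"
    if not greenfield and cluster_index >= 1:
        return True, "ok (secondary cluster in existing SDDC)"
    return False, "unsupported combination"


print(m7i_deployment_ok(True, 0, 4, False))    # (True, 'ok')
print(m7i_deployment_ok(True, 0, 20, False))   # first cluster too large
print(m7i_deployment_ok(False, 1, 3, False))   # (True, 'ok (secondary cluster in existing SDDC)')
```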

 

vSAN Express Storage Architecture on VMware Cloud on AWS

SDDC Version 1.24 was announced in November 2023, and with that came support for vSAN Express Storage Architecture (ESA) on VMC on AWS. There’s some great info on what’s included in the 1.24 release here, but I thought I’d focus on some of the key constraints you need to look at when considering ESA in your VMC on AWS environment.

Currently, the following restrictions apply to vSAN ESA in VMware Cloud on AWS:

  • vSAN ESA is available for clusters using i4i hosts only.
  • vSAN ESA is not supported with stretched clusters.
  • vSAN ESA is not supported with 2-host clusters.
  • After you have deployed a cluster, you cannot convert from vSAN ESA to vSAN OSA or vice versa.

So why do it? There are plenty of reasons, including better performance, enhanced resource efficiency, and several improvements in terms of speed and resiliency. You can read more about it here.

VMware Cloud Disaster Recovery Updates

There have also been some significant changes to VCDR, with the recent announcement that we now support a 15-minute Recovery Point Objective (down from 30 minutes). There have also been a number of enhancements to the ransomware recovery capability, including automatic Linux security sensor installation in the recovery workflow (trust me, once you’ve done it manually a few times you’ll appreciate this). With all the talk of supplemental storage above, it should be noted that “VMware Cloud DR does not support recovering VMs to VMware Cloud on AWS SDDC with NFS-mounted external datastores including Amazon FSx for NetApp datastores, Cloud Control Volumes or VMware Cloud Flex Storage”. Just in case you had an idea that this might be something you want to do.

 

Thoughts

Much of the news about VMware has been around the acquisition by Broadcom. It certainly was news. In the meantime, however, the VMware Cloud on AWS product and engineering teams have continued to work on releasing innovative features and incremental improvements. The encouraging thing about this is that they are listening to customers and continuing to adapt the solution architecture to satisfy those requirements. This is a good thing for both existing and potential customers. If you looked at VMware Cloud on AWS three years ago and ruled it out, I think it’s worth looking at again.

Random Short Take #92

Happy New Year. Some of this news is old news now, but I’m posting it anyway. Let’s get random.