Random Short Take #71

Welcome to Random Short Take #71. A bit of home IT in this one. Let’s get random.

VMware Cloud on AWS – TMCHAM – Part 3 – SDDC Lifecycle

In this episode of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around the lifecycle of the VMware-managed VMware Cloud on AWS platform, and what customers need to know to make sense of it all.

 

The SDDC

If you talk to VMware folks about VMware Cloud on AWS, you’ll hear a lot of talk about software-defined data centres (SDDCs). This is the logical construct in place that you use within your Organization to manage your hosts and clusters, in much the same fashion as you would your on-premises workloads. Unlike most on-premises workloads, however, the feeding and watering of the SDDC, from a software currency perspective, is done by VMware.

Release Notes

If you’ve read the VMware Cloud on AWS Release Notes, you’ll see something like this at the start:

“Beginning with the SDDC version 1.11 release, odd-numbered releases of the SDDC software are optional and available for new SDDC deployments only. By default, all new SDDC deployments and upgrades will use the most recent even-numbered release. If you want to deploy an SDDC with an odd-numbered release version, contact your VMware TAM, sales, or customer success representative to make the request.”

Updated on: 5 April 2022

Essential Release: VMware Cloud on AWS (SDDC Version 1.18) | 5 April 2022

Optional Release: VMware Cloud on AWS (SDDC Version 1.17) | 19 November 2021

Basically, when you deploy onto the platform, you’ll usually get put on what VMware calls an “Essential” release. From time to time, customers may have requirements that mean that they qualify to be deployed on an “Optional” release. This might be because they have a software integration requirement that hasn’t been handled in 1.16, for example, but is available for 1.17. It’s also important to note that each major release will have a variety of minor releases as well, depending on issues that need to be resolved or features that need to be rolled out. So you’ll also see references to 1.16v5 in places, for example.
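The even/odd convention above can be sketched in a few lines. This is an illustrative helper, not VMware tooling – it just mirrors the version format shown above (e.g. “1.18”, “1.17”, “1.16v5”):

```python
import re

def classify_release(version: str) -> str:
    """Classify an SDDC release string as Essential (even-numbered minor
    release) or Optional (odd-numbered), per the release notes convention.
    Illustrative only -- not an official VMware tool."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:v(\d+))?", version)
    if not match:
        raise ValueError(f"Unrecognised SDDC version: {version}")
    minor = int(match.group(2))
    return "Essential" if minor % 2 == 0 else "Optional"

print(classify_release("1.18"))    # Essential
print(classify_release("1.17"))    # Optional
print(classify_release("1.16v5"))  # Essential (a minor release of 1.16)
```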

Upgrades and Maintenance

So what happens when your SDDC is going to be upgraded? Well, we let you know in advance, and it’s done in phases, as you’d imagine.

[image courtesy of VMware]

You can read more about the process here. VMware also does the rollout of releases in waves, so not every customer has the upgrade done at the same time. If you’re the type of customer that needs to be on the latest version of everything, or perhaps you have a real requirement to be near the front of the line, you should talk to your account team and they’ll liaise with the folks who can make it happen for you. When the upgrades are happening, you should be careful not to:

  • Perform hot or cold workload migrations. Migrations fail if they are started or in progress during maintenance.
  • Perform workload provisioning (New/Clone VM). Provisioning operations fail if they are started or in progress during maintenance.
  • Make changes to Storage-based Policy Management settings for workload VMs.

You should also ensure that there is enough storage capacity (> 30% slack space) in each cluster.
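As a rough illustration of that slack-space check, assuming you’ve pulled capacity figures from your cluster (the numbers below are made up for the example):

```python
def has_sufficient_slack(capacity_tib: float, used_tib: float,
                         required_slack: float = 0.30) -> bool:
    """Check whether a cluster has enough free (slack) space ahead of
    maintenance. Illustrative only: in practice you'd read these figures
    from the vSAN capacity views."""
    slack = (capacity_tib - used_tib) / capacity_tib
    return slack > required_slack

# A hypothetical cluster with 40 TiB of usable capacity:
print(has_sufficient_slack(40.0, 25.0))  # True  (37.5% slack)
print(has_sufficient_slack(40.0, 30.0))  # False (25% slack)
```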

How Long Will It Take?

As usual, it depends. But you can make some (very) rough estimates by following the guidance on this page.

Will My SDDC Expire?

Yes, your SDDC version will some day expire. But it will be upgraded before that happens. There’s a page where you can look up the expiration dates of the various SDDC releases. It’s all part of the lifecycle part of the SDDC lifecycle.

Correlating VMware Cloud on AWS with Component Releases

Ever found yourself wondering what component versions are being used in VMware Cloud on AWS? Wonder no more with this very handy reference.

 

Conclusion

There’s obviously a lot more that goes on behind the scenes to keep everything running in tip-top shape for our customers. All of this talk of phases, waves, and release notes can be a little confusing if you’re new to the platform. Having worked in a variety of (managed and unmanaged) service providers over the years, I do like that VMware has bundled up all of this information and put it out there for people to check out. As always, if you’ve got questions about how the various software integrations work, and you can’t find the information in the documentation, reach out to your local account team and they’ll be able to help.

Random Short Take #70

Welcome to Random Short Take #70. Let’s get random.

VMware Cloud on AWS – TMCHAM – Part 2 – VCDR Notes

In this episode of “Things My Customers Have Asked Me” (or TMCHAM for short), I’m going to dive into a few questions around VMware Cloud Disaster Recovery (VCDR), a service we offer as an add-on to VMware Cloud on AWS. If you’re unfamiliar with VCDR, you can read a bit more about it here.

VCDR Roles and Permissions

Can RBAC roles be customised? Not really, as these are cascaded down from the Cloud Services hub. As I understand it, you don’t have granular control over this – just the pre-defined, default roles as outlined here – so you need to be careful about what you hand out to folks in your organisation. To see what Service Roles have been assigned to your account, in the VMware Cloud Services console, go to My Account, then click on My Roles. Under Service Roles, you’ll see a list of services, such as VCDR, Skyline, and so on. You can then check what roles have been assigned.

VCDR Protection Groups

VCDR Protection Groups are the way that we logically group together workloads to be protected with the same RPO, schedule, and retention. There are two types of protection group: standard-frequency and high-frequency. Standard-frequency snapshots can be run as often as every 4 hours, while high-frequency snapshots can go as often as every 30 minutes. You can read more on protection groups here. It’s important to note that there are some caveats to be aware of with high-frequency snapshots. These are outlined here.

30-minute RPOs were introduced in late 2021, but there are some caveats that you need to be aware of. Some of these are straightforward, such as the minimum software levels for on-premises protection. But you also need to be mindful that VMs with existing vSphere snapshots will not be included, and, more importantly, high-frequency snapshots can’t be quiesced.

Can you have a VM instance in both a standard- and high-frequency snapshot protection group? Would this give you the best of both worlds – an RPO as low as 30 minutes, but with a guaranteed snapshot every 4 hours? Once you take a high-frequency snapshot of a VM, it keeps using that mechanism thereafter, even if it sits in a protection group using standard protection. Note also that you set a schedule per protection group, so you can have snapshots running every 30 minutes and retained for a period of time that you select. You could also run snapshots every 4 hours and keep those for a period of time too. While you can technically have a VM in multiple groups, you’re better off configuring a variety of schedules for your protection groups to meet those different RPOs.
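That schedule-based approach can be sketched as a simple model. The class and field names below are assumptions for illustration, not the VCDR API:

```python
from dataclasses import dataclass

@dataclass
class SnapshotSchedule:
    """One schedule within a VCDR protection group (illustrative model)."""
    frequency_minutes: int  # how often snapshots are taken
    retention_hours: int    # how long each snapshot is kept

# A single protection group can carry multiple schedules, which is usually
# a better fit than placing one VM in several groups:
high_frequency = SnapshotSchedule(frequency_minutes=30, retention_hours=24)
standard = SnapshotSchedule(frequency_minutes=4 * 60, retention_hours=7 * 24)
protection_group_schedules = [high_frequency, standard]

# The effective RPO is driven by the most frequent schedule.
rpo_minutes = min(s.frequency_minutes for s in protection_group_schedules)
print(rpo_minutes)  # 30
```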

Quiesced Snapshots

What happens to a VM during a quiesced snapshot – would we experience micro service outages? The best answer I can give is “it depends”. The process for the standard, quiesced snapshot is similar to the one described here. The VM will be stunned by the process, so depending on what kind of activity is happening on the VM, there may be a micro outage to the service.

Other Considerations

The documentation talks about not changing anything when a scheduled snapshot is being run – so how do we manage configuration of the SDDC if jobs are running 24/7? It seems odd that nothing can be changed while a scheduled snapshot is running. This guidance refers to the VM being snapped – don’t change its configuration or make changes to the environment that would impact that VM while the snapshot runs. It’s not a blanket rule for the whole environment.

Like most things, success with VCDR relies heavily on understanding the outcomes your organisation wants to achieve, and then working backwards from there. It’s also important to understand that this is a great way to do DR, but not necessarily a great way to do standard backup and recovery activities. Hopefully this article helps clarify some of the questions folks have around VCDR, and if it doesn’t, please don’t hesitate to get in contact.

VMware Cloud on AWS – TMCHAM – Part 1 – PCI DSS

I’m starting a new series on the blog. It’s called “Things My Customers Have Asked Me” (or TMCHAM for short). There are frequently occasions where the customer collateral I present on VMware Cloud on AWS doesn’t cover every single use case that my customers are interested in, or perhaps it doesn’t dive deeply enough into some of the material people would like to know more about. The idea behind these posts is that if I have one customer asking about this stuff, chances are another one might like to know about it too. I won’t be talking about internal-only stuff, or roadmap details in these posts (or anywhere publicly, for that matter), but hopefully these articles will be a useful point of information consolidation for folks who are into that sort of thing.

 

PCI DSS?

The Payment Card Industry Data Security Standard (PCI DSS) is the security standard adhered to by organisations handling credit card information from the major card vendors. You can find the official Attestation of Compliance (AoC) in the VMware Cloud Trust Center, and there’s also a comprehensive whitepaper here.

Getting Started on VMware Cloud on AWS

The capability was introduced in March 2021, and you can see some of the details in the VMware Cloud on AWS Release Notes. You can also read my learned colleague Greg Vinton’s take on it here, and there’s a YouTube video for people who prefer that sort of thing. To enable PCI compliance on your Organization, you need to request the capability via your VMware account team. It’s not just something that’s configured by default, as some of the requirements around PCI DSS might be considered an unnecessary overhead by some folks. The account team will get it enabled on your Organization, and you can then deploy your SDDC. It’s important to note that your Organization needs to be empty – PCI DSS can’t be enabled on an Organization with SDDCs that are already deployed.

Configuration Changes

There are a number of configuration changes needed to ensure that your SDDC is PCI-compliant too. This includes disabling add-on services like HCX and Site Recovery. To do this, go to Inventory – Settings, and scroll down to Compliance Hardening.

Note that you’ll only see the “Compliance Hardening” section if your Organization has been configured for PCI DSS compliance. You’ll need to finish your HCX migrations before your Organization is compliant. You’ll also need to change your NSX configuration (Network & Security Tab Access). There is some more info on that here and there’s a blog post that also runs through it step by step that you can read here. Note that you’ll need to use the API to change the local NSX Manager user password every 90 days. Information on that can be found here.
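As a sketch of what that 90-day password rotation might involve, here’s a helper that builds the request. The endpoint path follows the general NSX-T node-users pattern, but treat the URL, user ID, and payload shape as assumptions – confirm the exact call in the documentation linked above:

```python
def build_password_rotation_request(manager_url: str, user_id: int,
                                    current_user: dict, new_password: str):
    """Return the (url, payload) pair for a password-change PUT against the
    NSX node-users API. Assumption: the endpoint and payload follow the
    NSX-T pattern; verify against the official docs before use."""
    payload = dict(current_user)        # start from the GET response body
    payload["password"] = new_password  # only the password changes
    return f"{manager_url}/api/v1/node/users/{user_id}", payload

# Hypothetical manager address and local user record:
url, body = build_password_rotation_request(
    "https://nsx.example.com", 10000,
    {"userid": 10000, "username": "audit"}, "n3w-S3cret!")
print(url)
```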

Other Considerations

One final thing to note is that this process doesn’t automatically make your Virtual Machines PCI compliant. You’ll still need to ensure that you’ve done the work in that respect. And I can’t repeat this enough – your Organization will only pass a PCI audit if you’ve done these additional steps. Merely requesting that VMware enable this at an Organization level won’t be enough.

Random Short Take #69

Welcome to Random Short Take #69. Let’s get random.

VMware Cloud on AWS – A Few Notes

If you’ve been following along at home, you may have noticed that the blog has been a little quiet recently. There were a few reasons for that, but the main one was that I joined VMware this year as a Cloud Solutions Architect focussed on VMware Cloud on AWS. It’s an interesting role, and an interesting place to work. I’ve been busy onboarding and thought I’d share some brief notes on VMware Cloud on AWS. I still intend to talk about other things on this blog too, but figured this has been front of mind for me recently, and it might be useful to someone looking to find out more. If you have any questions, or want to know more about something, I’m happy to help where I can. And it doesn’t need to be a sales call.

 

Overview

In short, VMware Cloud on AWS is “an integrated cloud offering jointly developed by Amazon Web Services (AWS) and VMware.” The idea is that you run VMware’s SDDC stack on AWS bare metal hosts and enjoy the best of both worlds – VMware’s software and access to a broad range of AWS services. I won’t be covering too much of the basics here, but you can read more about it on the product website. I do recommend checking out the product walkthroughs, as these are a great way to get familiar with how the product behaves. Once you’ve done that, you should also check out the solutions index – it’s a great collection of information about various things that run on VMware Cloud on AWS, including things like SQL performance, DNS configuration, and stuff like that. Once you’ve got a handle on the platform and some of the things it can do, it’s also worth running through the Evaluation Guide. This will give you the opportunity to perform a self-guided evaluation of the platform’s features and functionality. There’s also a pretty comprehensive FAQ that you can find here.

 

Hardware

Node Types

There are two types of nodes available at this time: i3.metal and i3en.metal. The storage for nodes is provided by VMware vSAN.

  • i3.metal: Intel Xeon Broadwell @ 2.3GHz, 36 cores (Hyper-Threading disabled); 512 GiB RAM; 10 TiB NVMe (raw); high IOPS.
  • i3en.metal: Intel Xeon Cascade Lake @ 2.5GHz, 48 cores (Hyper-Threading enabled, providing 96 logical cores); 768 GiB RAM; 45 TiB NVMe (raw); high IOPS and high bandwidth.

Custom Core Counts

One of the neat things is support for custom core counts on a per-cluster basis. You still pay full price for the hosts, but the idea is that your core licensing for BigDBVendor, or whatever, is under control. Note that you can’t change this core count once your hosts are deployed.

Other Cool Features 

Elastic DRS lets you expand your SDDC as required, based on configured thresholds for CPU, RAM, and storage. You can read more about that here.
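A minimal sketch of that threshold logic, with made-up threshold values (check the linked documentation for the actual Elastic DRS policies and defaults):

```python
# Illustrative threshold values, not the EDRS defaults.
THRESHOLDS = {"cpu": 90, "ram": 80, "storage": 70}

def should_scale_out(cpu_pct: float, ram_pct: float,
                     storage_pct: float) -> bool:
    """Simplified sketch of an Elastic DRS style scale-out decision:
    add a host when any resource crosses its high-utilisation threshold."""
    return (cpu_pct > THRESHOLDS["cpu"]
            or ram_pct > THRESHOLDS["ram"]
            or storage_pct > THRESHOLDS["storage"])

print(should_scale_out(95, 60, 50))  # True: CPU over its threshold
print(should_scale_out(50, 60, 50))  # False: everything under threshold
```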

 

Configuration Backups

If you’re using HCX, you might want to back up your HCX Manager. You can read more on that here. There’s also a VMware Fling that provides a level of SDDC import / export capability. You can check that out here. (Hat tip to my colleague Michael for telling me about these).

 

Sizing It Up

If you’re curious about what your current on-premises estate might look like from a sizing perspective, you can run it through the online sizing tool. This has a variety of input options, including support for RVTools imports. It’s fairly easy to use, but for complex scenarios I’d always recommend you get VMware or a partner involved. Pricing for the platform is also publicly available, and you can check that out here. There are a few different ways to consume the platform, including 1-year, 3-year, and on-demand options, and the discounting levels vary according to the commitment.

Note that there are a number of other capabilities sold separately, including:

  • VMware Site Recovery
  • VMware Cloud Disaster Recovery
  • VMware NSX Advanced Firewall
  • VMware vRealize Automation Cloud
  • VMware vRealize Operations Cloud
  • VMware vRealize Log Insight Cloud
  • VMware vRealize Network Insight Cloud
  • VMware Tanzu Standard

 

Lifecycle

One of the things I like about VMware Cloud on AWS is that the release notes for the platform are publicly available, and provide a great summary of new features as they get rolled out to customers.

 

What Now?

I’ve barely scratched the surface of what I’d like to talk about with VMware Cloud on AWS, and I hope in the future to post articles on some of the stuff that gets me excited, like migration options with HCX, and using VMware Cloud Disaster Recovery. In the meantime, the team (it’s mainly Greg doing the hard work, if I’m being honest) is running a series of webinars next week. If you’re interested in VMware Cloud on AWS and want to know more, you could do worse than checking these out. Details below, and registration is here.

Design and Deploy a VMware Cloud on AWS SDDC
28 February 2022, Monday
9:30am IST | 12:00pm SGT | 1:00pm KST | 3:00pm AEDT
Join us as we walk through the process of architecting and deploying a VMware Cloud on AWS SDDC. We will cover: SDDC sizing for an application, sizing of the management CIDR block, connectivity design, VPN vs Direct Connect, and basic networking and dependencies.
Application Migration to VMC on AWS
1 March 2022, Tuesday
9:30am IST | 12:00pm SGT | 1:00pm KST | 3:00pm AEDT
In this session we will demonstrate the process of migrating a live application. Topics include: a walkthrough of the HCX architecture, the HCX deployment process, HCX configuration, extending an L2 network, mobility (location) aware networking, and a discussion of migration types.
Disaster Recovery – Protecting VMC on AWS or On-Prem Based Applications
2 March 2022, Wednesday
9:30am IST | 12:00pm SGT | 1:00pm KST | 3:00pm AEDT
Listen to experts demonstrate the process of architecting and deploying VMware Cloud Disaster Recovery (VCDR) with VMC on AWS to protect an application. We will cover: a walkthrough of the VCDR architecture, the VCDR deployment process, considerations around VCDR, building a protection group, building a DR plan, executing DR, and failback options.

Random Short Take #67

Welcome to Random Short Take #67. Let’s get random.

  • MinIO was in the news recently, and this article from Chin-Fah seems to summarise nicely what you need to know.
  • Whenever I read articles about home Internet connectivity, I generally chuckle in Australian and move on. But this article from Jeff Geerling on his experience with Starlink makes for interesting reading, if only for the somewhat salty comments people felt the need to leave after the article was published. He nonetheless brings up some great points about challenges with the service, and I think the endless fawning over Musk as some kind of tech saviour needs to stop.
  • In the “just because you can, doesn’t mean you should” category is this article from William Lam, outlining how to create a VMFS datastore on a USB device. It’s unsupported, but it strikes me that this is just the kind of crazy thing that might be useful to folks trying to move around VMs at the edge.
  • Karen Lopez is a really smart person, and this article over at Gestalt IT is more than just the “data is the new oil” schtick we’ve been hearing for the past few years.
  • Kyndryl and Pure Storage have announced a global alliance. You can read more on that here.
  • Mike Preston wrote a brief explainer on S3 Object Lock here. I really enjoy Mike’s articles, as I find he has a knack for breaking down complex topics into simple, easily digestible pieces.
  • Remember when the movies and TV shows you watched had consistent aspect ratios? This article from Tom Andry talks about how that’s changed quite a bit in the last few years.
  • I’m still pretty fresh in my role, but in the future I hope to be sharing more news and articles about VMware Cloud on AWS. In the meantime, check out this article from Greg Vinton, where he covers some of his favourite parts of what’s new in the platform.

In unrelated news, this is the last week to vote for the #ITBlogAwards. You can cast your vote here.

Random Short Take #66

Happy New Year. Let’s get random.

  • Excited about VMware Cloud Director releases? Me too. 10.3.2 GA was recently announced, and you can read more about that here.
  • Speaking of Cloud Director, Al Rasheed put together this great post on deploying VCD 10.3.x – you can check it out here.
  • Getting started with VMware Cloud on AWS but feeling a bit confused by some of the AWS terminology? Me too. Check out this extremely useful post on Amazon VPCs from a VMware perspective.
  • Still on VMware Cloud on AWS. So you need some help with HCX? My colleague Greg put together this excellent guide a little while ago – highly recommended. This margarita recipe is also highly recommended, if you’re into that kind of thing. 
  • Speaking of hyperscalers, Mellor put together a nice overview of Hyve Solutions here
  • Detecting audio problems in your home theatre? Are you though? Tom Andry breaks down what you should be looking for here.  
  • Working with NSX-T and needing to delete route advertisement filters via API? Say no more
  • Lost the password you set on that Raspbian install? Frederic has you covered

Datrium Announces CloudShift

I recently had the opportunity to speak to Datrium‘s Brian Biles and Craig Nunes about their CloudShift announcement and thought it was worth covering some of the highlights here.

 

DVX Now

Datrium have had a scalable protection tier and a focus on performance since their inception.

[image courtesy of Datrium]

The “mobility tier”, in the form of Cloud DVX, has been around for a little while now. It’s simple to consume (via SaaS), yields decent deduplication results, and the Datrium team tells me it also delivers fast RTO. There’s also solid support for moving data between DCs with the DVX platform. This all sounds like the foundation for something happening in the hybrid space, right?

 

And Into The Future

Datrium pointed out that disaster recovery has traditionally been a good way of finding out where a lot of the problems exist in your data centre. There’s nothing like failing a failover to understand where the integration points in your on-premises infrastructure are lacking. Disaster recovery needs to be a seamless, integrated process, but data centres are still built on various silos of technology. People use clouds for a variety of reasons, and some clouds do some things better than others, so it’s easy to pick and choose what you need to get things done. This has been one of the big advantages of public cloud and a large reason for its success. As a result, however, the silos are moving to the cloud, just as they remain entrenched in the DC.

As a result, Datrium are looking to develop a solution that delivers on the theme “Run. Protect. Any Cloud”. The idea is simple: an orchestrated DR offering that makes failover and failback a painless undertaking. Datrium tell me they’ve been big supporters of VMware’s SRM product, but have observed problems with an orchestration-only layer – adapters can have issues from time to time, and managing the solution can be complicated. With CloudShift, Datrium are taking a vertical stack approach, positioning CloudShift as an orchestrator for DR as a SaaS offering. Note that it only works with Datrium.

[image courtesy of Datrium]

The idea behind CloudShift is pretty neat. With Cloud DVX you can already back up VMs to AWS using S3 and EC2. The idea is that you can leverage data already in AWS to fire up VMs on AWS (using on-demand instances of VMware Cloud on AWS) to provide temporary disaster recovery capability. The good thing about this is that converting your VMware VMs to someone else’s cloud is no longer a problem you need to resolve. You’ll need to have a relationship with AWS in the first place – it won’t be as simple as entering your credit card details and firing up an instance. But it certainly seems a lot simpler than having an existing infrastructure in place, and dealing with the conversion problems inherent in going from vSphere to KVM and other virtualisation platforms.

[image courtesy of Datrium]

Failover and failback is a fairly straightforward process as well, with the following steps required for failover and failback of workloads:

  1. Backup to Cloud DVX / S3 – This is ongoing and happens in the background;
  2. Failover required – the CloudShift runbook is initiated;
  3. Restart VM groups on VMC – VMs are rehydrated from data in S3; and
  4. Failback to on-premises – CloudShift reverses the process with deltas using change block tracking.
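The four steps above can be expressed as a simple ordered runbook. This is a hypothetical sketch – the step names and function are illustrative stand-ins, not Datrium’s actual runbook format:

```python
def failover_steps(vm_groups):
    """Return the ordered actions for the failover/failback flow described
    above. Simplified sketch; names are illustrative, not the Datrium API."""
    steps = ["verify Cloud DVX / S3 backups are current"]  # step 1 (ongoing)
    steps.append("initiate CloudShift runbook")            # step 2
    # Step 3: restart each VM group on VMC, rehydrated from data in S3.
    steps += [f"restart VM group '{g}' on VMC (rehydrate from S3)"
              for g in vm_groups]
    # Step 4: failback reverses the process, sending only deltas
    # via changed block tracking.
    steps.append("failback to on-premises (deltas via change block tracking)")
    return steps

for step in failover_steps(["db-tier", "app-tier"]):
    print(step)
```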

It’s being pitched as a very simple way to run DR, something that has been notorious for being a stressful activity in the past.

 

Thoughts and Further Reading

CloudShift is targeted for release in the first half of 2019. The economic power of DRaaS in the cloud is very strong. People love the idea that they can access the facility on-demand, rather than having passive infrastructure doing nothing on the off chance that it will be required. There’s obviously some additional cost when you need to use on demand versus reserved resources, but this is still potentially cheaper than standing up and maintaining your own secondary DC presence.

Datrium are focused on keeping inherently complex activities like DR simple. I’ll be curious to see whether they’re successful with this approach. The great thing about a generic orchestration framework like VMware SRM is that you can use a number of different vendors in the data centre and not have a huge problem with interoperability. The downside to this approach is that this broader ecosystem can leave you exposed to problems with individual components in the solution. Datrium is taking a punt that their customers are going to see the advantages of having an integrated approach to leveraging on-demand services. I’m constantly astonished that people don’t get more excited about DRaaS offerings. It’s really cool that you can get this level of protection without having to invest a tonne in running your own passive infrastructure. If you’d like to read more about CloudShift, there’s a blog post that sheds some more light on the solution on Datrium’s site, and you can grab a white paper here too.