Welcome to Random Short Take #86. It’s been a while, and I’ve been travelling a bit for work. So let’s get random.
Let’s get started with three things / people I like: Gestalt IT, Justin Warren, and Pure Storage. This article by Justin digs into some of the innovation we’re seeing from Pure Storage. Speaking of Justin, if you don’t subscribe to his newsletter “The Crux”, you should. I do. Subscribe here.
And speaking of Pure Storage, a survey was conducted and results were had. You can read more on that here.
Switching gears slightly (but still with a storage focus), check out the latest Backblaze drive stats report here.
Oh you like that storage stuff? What about this article on file synchronisation and security from Chin-Fah?
More storage? What about this review of the vSAN Objects Viewer from Victor?
I’ve dabbled in product management previously, but this article from Frances does a much better job of describing what it’s really like.
Edge means different things to different people, and I found this article from Ben Young to be an excellent intro to the topic.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into the topic of cluster conversions on the VMware-managed VMware Cloud on AWS platform.
With the end of sale announcement of the I3.metal node type in VMware Cloud on AWS, I’ve had a few customers ask about how the cluster conversion process works. We’ve previously offered the ability to convert nodes from I3.metal to I3en.metal, and we’ve taken that process and made it possible for the I4i.metal node type as well. The process is outlined in some detail here. From a technical perspective, you’ll need to be on SDDC version 1.18v8 or 1.20v2 at a minimum. From a commercial perspective, to use your existing subscriptions, they’ll need to be flexible, or you can choose to add new subscriptions. Your account team can help with that.
Sounds Easy, What’s the Catch?
I’ve had a few customers run through this process now in my part of the world, and more and more folks are converting across to I4i.metal every week. One of the key considerations when planning the conversion, particularly with smaller environments, is sizing and storage policies. When the team converts your cluster, they will do some sizing estimates prior to the activity, and the results of this sizing might be higher than you’d expect. For example, we talk about the I4i.metal being something in the order of 1.6 – 2 times as powerful as the I3.metal node. But this really depends on a variety of factors, including the vSAN RAID policy in use, the types of workloads running on the cluster, and so forth. I’ve seen scenarios where a customer has wanted to convert a 6-node I3.metal cluster to 4 I4i.metal nodes. From a calculated capacity perspective, this should be a no-brainer. But what you’ll find, when working with the conversion team, is that they will likely come back to you saying that 6 nodes will be the target. The reason for this is that they’re assuming your cluster is running RAID 6.
How do you solve this problem? Think about the vSAN policy you want to run moving forward. If you’re happy to drop to RAID 5, for example, you have a way forward. Once the cluster conversion is complete, jump on and change the default policy to RAID 5 / FTT:1. This will cause vSAN to modify the policy for all of the VMs on the cluster. This is a background process, and won’t interfere with normal operations. Once you’ve done that, you can then remove the additional nodes. It’s a little fiddly, and will require some amount of coordination with the conversion team and your account team, but it’s a fairly simple task, and will get you running on new shiny boxes without having to muck about with setting up another cluster (or SDDC) and manually migrating workloads across.
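Whether dropping to RAID 5 is even an option depends on how small the target cluster is, since each vSAN policy has a minimum host count. A minimal sketch of that sanity check (using the standard vSAN minimums of 3 hosts for RAID 1 / FTT:1, 4 for RAID 5, and 6 for RAID 6; the function name is mine, and VMware Cloud on AWS may impose further per-cluster minimums):

```python
# Minimum host counts required by common vSAN storage policies.
# Standard vSAN minimums: RAID 1 FTT=1 needs 3 hosts, RAID 5
# erasure coding needs 4, RAID 6 erasure coding needs 6.
MIN_HOSTS = {
    "RAID 1 / FTT:1": 3,
    "RAID 5 / FTT:1": 4,
    "RAID 6 / FTT:2": 6,
}

def feasible_policies(target_hosts: int) -> list[str]:
    """Return the vSAN policies a cluster of target_hosts can support."""
    return [p for p, n in MIN_HOSTS.items() if target_hosts >= n]

print(feasible_policies(4))  # a 4-node target rules out RAID 6
```

So a 4-node I4i.metal target can run RAID 5 but not RAID 6, which is exactly why the policy change has to happen before you shrink the cluster.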
You’ll want to ensure that changing your RAID policy won’t have an impact on your available storage. Every workload is different, but at a high level, you can use the public sizer to work through some of these numbers. A 16-node I3.metal cluster with RAID 6 configured will give you roughly 165.89 TiB of usable capacity (ignoring management workload overheads and vSAN slack space), and a similar storage footprint can be had with an 8- or 9-node cluster of I4i.metal nodes. You’ll also want to be sure your organisation is comfortable with the vSAN policy you’re moving to. If you’re moving from 16 nodes to 8 or 9 nodes, for example, this isn’t really a problem, as you’ll likely be sticking with RAID 6 for clusters that large. But if you’re going from 6 nodes to 3 nodes, you’re going from RAID 6 to RAID 1.
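The back-of-the-envelope maths here can be sketched as follows. The efficiency factors are the standard vSAN erasure-coding ratios; the per-node raw figure is back-solved from the 16-node I3.metal / RAID 6 example above rather than taken from an official spec, an I4i.metal node's raw capacity would differ, and the function name is mine:

```python
# Rough vSAN capacity check, ignoring management overheads and
# slack space, as the sizing example above does.
EFFICIENCY = {
    "RAID 1 / FTT:1": 1 / 2,   # full mirror
    "RAID 5 / FTT:1": 3 / 4,   # 3+1 erasure coding
    "RAID 6 / FTT:2": 2 / 3,   # 4+2 erasure coding
}

# Back-solved from the article's example: 16 nodes, RAID 6,
# ~165.89 TiB usable => ~15.55 TiB raw per I3.metal node.
RAW_PER_I3_NODE = 165.89 / 16 / EFFICIENCY["RAID 6 / FTT:2"]

def usable_tib(nodes: int, raw_per_node: float, policy: str) -> float:
    """Usable capacity (TiB) for a cluster under a given vSAN policy."""
    return nodes * raw_per_node * EFFICIENCY[policy]

# Same raw footprint, different policies: 6 nodes on RAID 6
# versus 4 nodes on RAID 5.
print(round(usable_tib(6, RAW_PER_I3_NODE, "RAID 6 / FTT:2"), 2))
print(round(usable_tib(4, RAW_PER_I3_NODE, "RAID 5 / FTT:1"), 2))
```

The point of the exercise is simply that RAID overhead, not just node count, drives the usable number, so run the real figures through the public sizer before committing to a target cluster size.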
Thoughts and Further Reading
The neat thing about the VMware Cloud on AWS offering is that it’s a managed service from VMware, and we do a good job of managing boring stuff like this for you, reducing the impact of software and hardware changes by leveraging core VMware technologies that aren’t otherwise available on native cloud platforms. If you’d like to read more about the I4i.metal node, check out our FAQ here.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around recent(ish) changes to Elastic DRS policies and capacity on the VMware-managed VMware Cloud on AWS platform.
I’ve had a few customers ask about changes VMware has made to Elastic DRS policies on VMware Cloud on AWS. I’ve talked a little about eDRS previously, and the release notes cover the changes here (go to March 27th, 2023). In short, the changes are as follows:
Elastic DRS optimize for rapid scaling policy now supports rapid scaling-in to enable faster scaling use cases like VDI, disaster recovery or any other business needs.
The Elastic DRS Cost Policy improvement will allow automated scale-in of a cluster if the storage utilization falls below 40% instead of the current 20% limit.
What does it mean from a practical perspective? Not a lot for customers using the default baseline policy. But if you’re using “Optimize for Lower Cost” or “Rapid Scaling”, it might be worth looking into.
Optimize for Lowest Cost
The documentation does a great job of describing how this works: “When scaling in, this policy removes hosts quickly to maintain baseline performance while keeping host counts to a practical minimum. It removes hosts only if it anticipates that storage utilization would not result in a scale out in the near term after host removal”. The storage thresholds are the ones to note: the scale-out threshold moved from 70% to 80% a while ago, and the scale-in threshold has now been raised from 20% to 40%, so clusters can shed hosts sooner. Generally speaking, the algorithm is designed not to do silly things, but the higher scale-in threshold enables customers to scale in sooner, helping to reduce the cost of scaling events.
Rapid Scaling
From the documentation: “[t]his policy adds multiple hosts at a time when needed for memory or CPU, and adds hosts incrementally when needed for storage. By default, hosts are added four at a time. You can specify a larger scale-out increment (8 or 12) if you need faster scaling for disaster recovery, Virtual Desktop Infrastructure (VDI), and similar use cases. As with any EDRS policy, scale-out time increases with increment size. When the increment is large (12 hosts), it can take up to 40 minutes to complete in some configurations.
When scaling in, this policy removes hosts rapidly, maintaining baseline performance while keeping host count to a practical minimum. It does not remove hosts if it anticipates that doing so would degrade performance and force a near-term scale-out. Scale-in stops when the cluster reaches the minimum host count or the number of hosts in the scale-out increment has been removed”.
What does that mean? We’ve added in some guardrails for rapid scale-in to ensure that things don’t get too hectic too quickly. And on the flip side, it means that you’ll scale out your environment faster as well. Again, this is useful for bursty workloads such as VDI or, potentially, rapid DR.
Elastic DRS is one of the cooler features of VMware Cloud on AWS. You can do some really interesting things from a scaling perspective, particularly if you’re operating with some volatile / bursty workloads. That said, if you only use the default baseline policy you’ll also likely be in a good spot, as the thing that can really hurt in these kinds of environments is when your hosts run short of storage.
Speaking of old things, El Reg had some info on running (hobbyist) x86-64 editions of OpenVMS. I ran OpenVMS on a DEC Alpha AXP-150 at home for a brief moment, but that feels like it was a long time ago.
This article from JB on the Bowlo was excellent. I don’t understand why Australians are so keen on poker machines (or gambling in general), but it’s nice when businesses go against the grain a bit.
I recently had the opportunity to run through a VMware Cloud Disaster Recovery deployment with a customer and thought I’d run through the basics. It’s important to note that there are a variety of topologies supported with VCDR, and many things that need to be considered before you click deploy, and this is just one way of doing it. In any case, there’s a new document outlining the process on the articles page.
I’ve been a bit slack and neglected to post this sooner, but the Sydney and Melbourne VMUG UserCon events are coming up in less than a month. If you’re unfamiliar with UserCon, it’s an event put on by VMUG where you can:
Participate in technical deep dives led by a variety of industry experts;
Put your skills to the test during breakout sessions and hands-on labs;
Make meaningful connections with like-minded IT professionals;
Learn about the latest products and solutions from trusted VMUG partners;
Win Cool SWAG & prizes;
And a whole bunch more. The good news is that Chris McCain is doing the morning and closing keynote at both events, and there’s a heavy focus on security. In addition to that, there are some great speakers presenting from both VMware and the community. If you’re heading to the Sydney event, I recommend getting along to hear my friend Tony Williamson talking about real world service provider networking leveraging NSX.
The Melbourne event is being held on March 14, 2023 at Crown Melbourne (map), while the Sydney event is being held on March 16, 2023 at the Sofitel (map). You can view the full agenda and register for the Melbourne event here, and details of the Sydney event can be found here. If you have the time and are in the area, I heartily recommend registering and attending – these promise to be excellent events.
I’m very happy to have been listed as a vExpert for 2023. This is the eleventh time that they’ve forgotten to remove my name from the list (I’m like Rick Astley with that joke). You can read more about it here, and more news about this year’s programme is coming shortly. Thanks again to Corey Romero and the rest of the VMware Social Media & Community Team for making this kind of thing happen. And thanks also to the vExpert community for being such a great community to be part of. Congratulations to you (whether this is your first or thirteenth time). It looks like there are around 1400 folks from all parts of the globe. I think that’s pretty cool.
Now that the holiday season is over, Brisbane VMUG would like to say thank you to its Community and Sponsors, who supported them as they got back into in-person meetings last year. They’ve secured The Terrace at QUT from 2pm until 5pm on Friday 17th February, and would like to invite you to join them for some drinks, nibbles and networking.
There will be some prize giveaways and an opportunity to chill out and mingle with like-minded people from the vCommunity.