VMware Cloud on AWS – TMCHAM – Part 8 – TRIM/UNMAP

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around TRIM/UNMAP and capacity reclamation on the VMware-managed VMware Cloud on AWS platform.


Why TRIM/UNMAP?

TRIM/UNMAP, in short, is the capability for operating systems to reclaim space that's no longer in use on thin-provisioned filesystems. Why is this important? Imagine you have a thin-provisioned volume with 100GB of capacity allocated to it. It consumes maybe 1GB when it's first deployed. You then add 50GB of data to it, and later delete that 50GB of data from the volume. You'll still see 51GB of capacity being consumed on the filesystem. This is because older operating systems just mark the blocks as deleted, but don't zero them out. Modern operating systems do support TRIM/UNMAP, but the hypervisor needs to understand the commands being sent to it. You can read more on that here.
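
If you want to make that arithmetic concrete, here's a toy Python model of a thin-provisioned disk. It's entirely illustrative (no hypervisor tracks blocks like this), but it shows why deleted data keeps consuming capacity until an UNMAP is issued.

```python
# Toy model of a thin-provisioned disk. Blocks are 1GB each to keep
# the arithmetic from the example above easy to follow.

class ThinDisk:
    def __init__(self):
        self.backed = set()  # blocks the hypervisor has allocated real storage for
        self.live = set()    # blocks the guest filesystem considers in use

    def write(self, blocks):
        self.live |= blocks
        self.backed |= blocks  # a thin disk grows on first write

    def delete(self, blocks, unmap=False):
        self.live -= blocks        # the guest marks the blocks as free...
        if unmap:
            self.backed -= blocks  # ...and UNMAP lets the layer below reclaim them

disk = ThinDisk()
disk.write(set(range(1)))        # ~1GB consumed on first deployment
disk.write(set(range(1, 51)))    # add 50GB of data
disk.delete(set(range(1, 51)))   # delete it again, no UNMAP
print(f"{len(disk.backed)}GB consumed")  # 51GB: nothing was handed back

disk2 = ThinDisk()
disk2.write(set(range(1)))
disk2.write(set(range(1, 51)))
disk2.delete(set(range(1, 51)), unmap=True)  # same delete, with UNMAP
print(f"{len(disk2.backed)}GB consumed")     # 1GB: the space was reclaimed
```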

How Do I Do This For VMware Cloud on AWS?

You can contact your account team, and we'll raise a ticket to get the feature enabled. We had some minor issues recently that meant we weren't enabling the feature, but if you're running M16v12 or M18v5 (or above) on your SDDCs, you should be good to go. Note that this feature is enabled on a per-cluster basis, and you'll need to reboot the VMs in the cluster for it to take effect.
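
Once the feature is flipped on for a cluster, you'll want to track down which VMs still need that reboot. Here's a rough pyVmomi sketch that lists the powered-on VMs in a cluster; the vCenter hostname, credentials, and cluster name are all placeholders, so treat it as a starting point rather than a polished tool.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab shortcut; verify certificates properly in production
si = SmartConnect(host="vcenter.sddc.example.com",  # hypothetical vCenter
                  user="cloudadmin@vmc.local",      # hypothetical credentials
                  pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        if cluster.name != "Cluster-1":  # the cluster you've had the feature enabled on
            continue
        for host in cluster.host:
            for vm in host.vm:
                if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
                    print(f"{vm.name} is powered on and will need a reboot")
finally:
    Disconnect(si)
```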

What About Migrating With HCX?

Do the VMs come across thin? Do you need to reclaim space first? If you're using HCX to go from thick to thin, you should be fine. If you're migrating thin to thin, it's worth checking whether you've got any space reclamation in place on the source side. I've had customers report that some environments migrated across with higher-than-expected storage usage due to a lack of space reclamation on the source storage environment. You can use something like Live Optics to report on your capacity consumed versus allocated, and how much capacity can be reclaimed.

Why Isn’t This Enabled By Default?

I don't know for sure, but I imagine it has something to do with the fact that TRIM/UNMAP has the potential to impact performance from a latency perspective, depending on the workloads running in the environment and the amount of capacity being reclaimed at any given time. We recommend that you "schedule large space reclamation jobs during off-peak hours to reduce any potential impact". Given that VMware Cloud on AWS is a fully-managed service, I imagine we want to control as many of the performance variables as possible to ensure our customers enjoy a reliable and stable platform. That said, TRIM/UNMAP is a really useful feature, and you should look at getting it enabled if you're concerned about the potential for wasted capacity in your SDDC.

Random Short Take #80

Welcome to Random Short Take #80. Lots of press release news this week and some parochial book recommendations. Let’s get random.

Random Short Take #79

Welcome to Random Short Take #79. Where did October go? Let’s get random.

Random Short Take #78

Welcome to Random Short Take #78. We’re hurtling towards the silly season. Let’s get random.

VMware Cloud on AWS – I4i.metal – A Few Notes …

At VMware Explore 2022 in the US, VMware announced a number of new offerings for VMware Cloud on AWS, including a new bare-metal instance type: the I4i.metal. You can read the official blog post here. I thought it would be useful to provide some high-level details and cover some of the caveats that punters should be aware of.


By The Numbers

What do you get from a specifications perspective?
  • The CPU is 3rd generation Intel Xeon Ice Lake @ 2.4GHz / Turbo 3.5GHz
  • 64 physical cores, supporting 128 logical cores with Hyper-Threading (HT)
  • 1024 GiB memory
  • 30 TiB NVMe (Raw local capacity)
  • Up to 75 Gbps networking speed
So, how does the I4i.metal compare with the i3.metal? You get roughly 2x the compute, storage, and memory, along with improved network throughput.
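
If you like to see that comparison laid out, here's a quick Python snippet. The I4i numbers come from the list above; the i3.metal figures (36 cores, 512 GiB of memory, roughly 14 TiB of raw NVMe, 25 Gbps networking) are from memory, so double-check them against the official documentation.

```python
# Rough side-by-side of the two instance types. The i4i column is from the
# specs above; the i3 column is indicative only.
i3 = {"cores": 36, "memory_gib": 512, "raw_nvme_tib": 13.8, "network_gbps": 25}
i4i = {"cores": 64, "memory_gib": 1024, "raw_nvme_tib": 30.0, "network_gbps": 75}

for metric, new in i4i.items():
    old = i3[metric]
    print(f"{metric:>13}: i3.metal={old:>6} i4i.metal={new:>6} ({new / old:.1f}x)")
```
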
FAQ Highlights

Can I use custom core counts? Yep, the I4i will support physical custom core counts of 8, 16, 24, 30, 36, 48, 64.

Is there stretched cluster support? Yes, you can deploy these in stretched clusters (of the same host type).

Can I do in-cluster conversions? Yes, read more about that here.

Other Considerations

Why does the sizer say 20 TiB useable for the I4i? Around 7 TiB is consumed by the cache tier at the moment, so you'll see different numbers in the sizer. And your useable storage numbers will obviously be impacted by the usual constraints around failures to tolerate (FTT) and RAID settings.
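
To get a feel for what those FTT and RAID settings do to the numbers, here's a back-of-the-envelope calculator using the standard vSAN space overhead multipliers and the ~20 TiB usable figure mentioned above. It ignores slack space recommendations and other real-world overheads, so use the official sizer for anything that matters.

```python
# Standard vSAN storage policy space overhead multipliers.
RAID_OVERHEAD = {
    ("RAID-1", 1): 2.0,    # FTT=1 mirroring: two full copies
    ("RAID-5", 1): 1.33,   # FTT=1 erasure coding: 3+1 stripe
    ("RAID-1", 2): 3.0,    # FTT=2 mirroring: three full copies
    ("RAID-6", 2): 1.5,    # FTT=2 erasure coding: 4+2 stripe
}

usable_tib = 20.0  # per-host figure from the sizer, after the cache tier

for (raid, ftt), multiplier in RAID_OVERHEAD.items():
    effective = usable_tib / multiplier
    print(f"{raid}, FTT={ftt}: ~{effective:.1f} TiB of effective VM capacity")
```
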
Region Support?

The I4i.metal instances will be available in the following Regions (and Availability Zones):
  • US East (N. Virginia) – use1-az1, use1-az2, use1-az4, use1-az5, use1-az6
  • US West (Oregon) – usw2-az1, usw2-az2, usw2-az3, usw2-az4
  • US West (N. California) – usw1-az1, usw1-az3
  • US East (Ohio) – use2-az1, use2-az2, use2-az3
  • Canada (Central) – cac1-az1, cac1-az2
  • Europe (Ireland) – euw1-az1, euw1-az2, euw1-az3
  • Europe (London) – euw2-az1, euw2-az2, euw2-az3
  • Europe (Frankfurt) – euc1-az1, euc1-az2, euc1-az3
  • Europe (Paris) – euw3-az1, euw3-az2, euw3-az3
  • Asia Pacific (Singapore) – apse1-az1, apse1-az2, apse1-az3
  • Asia Pacific (Sydney) – apse2-az1, apse2-az2, apse2-az3
  • Asia Pacific (Tokyo) – apne1-az1, apne1-az2, apne1-az4

Other Regions will have availability over the coming months.


Thoughts

The i3.metal isn't going anywhere, but it's nice to have an option that supports more cores and a bit more storage and RAM. The I4i.metal is great for SQL workloads and VDI deployments, where core count can really make a difference. Coupled with the addition of supplemental storage via VMware Cloud Flex Storage and Amazon FSx for NetApp ONTAP, there are some great options available to deal with the variety of workloads customers are looking to deploy on VMware Cloud on AWS.

On another note, if you want to hear more about all the cloudy news from VMware Explore US, I’ll be presenting at the Brisbane VMUG meeting on October 12th, and my colleague Ray will be doing something in Sydney on October 19th. If you’re in the area, come along.

Random Short Take #77

Welcome to Random Short Take #77. Spring has sprung. Let’s get random.

Finally, the blog turned 15 years old recently (about a month ago). I’ve been so busy with the day job that I forgot to appropriately mark the occasion. But I thought we should do something. So if you’d like some stickers (I have some small ones for laptops, and some big ones because I can’t measure things properly), send me your address via this contact form and I’ll send you something as a thank you for reading along.

Random Short Take #76

Welcome to Random Short Take #76. Summer’s almost here. Let’s get random.


Random Short Take #75

Welcome to Random Short Take #75. Half the year has passed us by already. Let’s get random.

  • I talk about GiB all the time when sizing up VMware Cloud on AWS for customers, but I should take the time to check whether folks actually know what I'm blithering on about. If you don't, this explainer from my friend Vincent is easy to follow along with – A little bit about Gigabyte (GB) and Gibibyte (GiB) in computer storage. There's also a quick worked example at the end of this list.
  • MinIO has been in the news a bit recently, but this article from my friend Chin-Fah is much more interesting than all of that drama – Beyond the WORM with MinIO object storage.
  • Jeff Geerling seems to do a lot of projects that I either can’t afford to do, or don’t have the time to do. Either way, thanks Jeff. This latest one – Building a fast all-SSD NAS (on a budget) – looked like fun.
  • You like ransomware? What if I told you you can have it cross-platform? Excited yet? Read Melissa’s article on Multiplatform Ransomware for a more thorough view of what’s going on out there.
  • Speaking of storage and clouds, Chris M. Evans recently published a series of videos over at Architecting IT where he talks to NetApp’s Matt Watt about the company’s hybrid cloud strategy. You can see it here.
  • Speaking of traditional infrastructure companies doing things with hyperscalers, here’s the July 2022 edition of What’s New in VMware Cloud on AWS.
  • In press release news, Aparavi and Backblaze have joined forces. You can read more about that here.
  • I’ve spent a lot of money over the years trying to find the perfect media streaming device for home. I currently favour the Apple TV 4K, but only because my Boxee Box can’t keep up with more modern codecs. This article on the Best Device for Streaming for Any User – 2022 seems to line up well with my experiences to date, although I admit I haven’t tried the NVIDIA device yet. I do miss playing ISOs over the network with the HD Mediabox 100, but those were simpler times I guess.
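
And since I mentioned GB versus GiB above, here's the arithmetic in a few lines of Python: GB is decimal (10^9 bytes) and GiB is binary (2^30 bytes), which is why the marketing number is always the bigger one.

```python
bytes_per_gb = 10**9    # decimal gigabyte
bytes_per_gib = 2**30   # binary gibibyte

print(f"1 GiB = {bytes_per_gib / bytes_per_gb:.3f} GB")                   # 1.074 GB
print(f"A 512 GB drive = {512 * bytes_per_gb / bytes_per_gib:.1f} GiB")   # ~476.8 GiB
```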

VMware Cloud on AWS – TMCHAM – Part 7 – Elastic DRS and Host Failure Remediation

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around managing host additions and failures on the VMware-managed VMware Cloud on AWS platform.

Elastic DRS

One of the questions I frequently get asked by customers is: what happens when you reach a certain capacity in your VMware Cloud on AWS cluster? The good news is we have a feature called Elastic DRS that can take care of that for you. Elastic DRS is a little different to what you might know as the vSphere Distributed Resource Scheduler (DRS). Elastic DRS operates at the host level and addresses capacity constraints in your VMC environment. The idea is that, when your cluster reaches a certain resource threshold (be it storage, vCPU, or RAM), Elastic DRS takes care of adding in additional host resources as required.

The algorithm runs every 5 minutes and uses the following parameters:

  • Minimum and maximum number of hosts the algorithm should scale up or down to.
  • Thresholds for CPU, memory, and storage utilisation, such that host allocation is optimised for cost or performance.

Note also that your cluster may scale back in, assuming the resources stay consistently below the threshold for a number of iterations.

Settings

There are a few different options for Elastic DRS, with the default being the “Elastic DRS Baseline Policy”. With this policy, a host is automatically added when there’s less than 20% free vSAN storage. Note that this doesn’t apply to single-node SDDC configurations, and only the baseline policy is available with 2-node configurations. Beyond those limitations, though, there are a number of other policy configurations available, and these are outlined here. The neat thing is that there’s a fair amount of flexibility in how you have your SDDC automatically managed, with options for best performance, lowest cost, or rapid scale-out also available.
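
To make the behaviour a bit more concrete, here's an illustrative Python sketch of the kind of decision loop described above. To be clear, this isn't VMware's actual algorithm: the scale-in threshold, hysteresis window, and host bounds are made-up numbers, with only the 20% free storage figure coming from the baseline policy.

```python
# Illustrative only: mimics the behaviour described above, not the real
# Elastic DRS implementation. Each evaluation represents one five-minute check.

SCALE_OUT_FREE = 0.20   # baseline policy: add a host below 20% free vSAN storage
SCALE_IN_FREE = 0.45    # hypothetical: consider removing a host above 45% free
SCALE_IN_STREAK = 3     # hypothetical: require several consecutive low-demand checks
MIN_HOSTS, MAX_HOSTS = 3, 16  # hypothetical cluster bounds

def evaluate(hosts: int, free_storage: float, streak: int) -> tuple[int, int]:
    """One iteration of the loop: returns (new host count, new scale-in streak)."""
    if free_storage < SCALE_OUT_FREE and hosts < MAX_HOSTS:
        return hosts + 1, 0            # scale out straight away
    if free_storage > SCALE_IN_FREE and hosts > MIN_HOSTS:
        streak += 1
        if streak >= SCALE_IN_STREAK:
            return hosts - 1, 0        # scale in only after sustained headroom
        return hosts, streak
    return hosts, 0                    # inside the band: reset the streak

hosts, streak = 4, 0
for free in (0.25, 0.18, 0.50, 0.52, 0.55):  # simulated free-storage readings
    hosts, streak = evaluate(hosts, free, streak)
    print(f"free={free:.0%} -> hosts={hosts}, streak={streak}")
```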

Can I Turn It Off?

No, but you can fiddle with the settings from your VMC cloud console.

Other Questions

What happens if I’m adding a host manually? The Elastic DRS recommendations are ignored. The same goes for planned maintenance or SDDC maintenance, where the support team may be adding an additional host. But what if you’ve lost a host? The auto-remediation process kicks in, and the Elastic DRS recommendations are ignored while the failed host is being replaced. You can read more about that process here.


Thoughts

One of the things I like about the VMware Cloud on AWS approach is that VMware has looked into a number of common scenarios that occur in the wild (hosts running out of capacity, for example) and built some automation on top of an already streamlined SDDC stack. Elastic DRS and the Auto-Scaler features seem like minor things, but when you’re managing an SDDC of any significant scale, it’s nice to have the little things taken care of.

Random Short Take #74

Welcome to Random Short Take #74. Let’s get random.