In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around managing host additions and failures on the VMware-managed VMware Cloud on AWS platform.
Elastic DRS
One of the questions I frequently get asked by customers is what happens when you reach a certain capacity in your VMware Cloud on AWS cluster? The good news is we have a feature called Elastic DRS that can take care of that for you. Elastic DRS is a little different to what you might know as the vSphere Distributed Resource Scheduler (DRS). Elastic DRS operates at a host level and takes care of capacity constraints in your VMC environment. The idea is that, when your cluster reaches a certain resource threshold (be it storage, vCPU, or RAM), Elastic DRS takes care of adding in additional host resources as required.
The algorithm runs every 5 minutes and uses the following parameters:
- Minimum and maximum number of hosts the algorithm should scale up or down to.
- Thresholds for CPU, memory and storage utilisation such that host allocation is optimized for cost or performance.
Note also that your cluster may scale back in, assuming the resources stay consistently below the threshold for a number of iterations.
Settings
There are a few different options for Elastic DRS, with the default being the “Elastic DRS Baseline Policy”. With this policy, a host is automatically added when there’s less than 20% free vSAN storage. Note that this doesn’t apply to single-node SDDC configurations, and only the baseline policy is available with 2-node configurations. Beyond those limitations, though, there are a number of other configurations available and these are outlined here. The neat thing is that there’s some amount of flexibility in how you have your SDDC automatically managed, with options for best performance, lowest cost, or rapid scale-out also available.
Can I Turn It Off?
No, but you can fiddle with the settings from your VMC cloud console.
Other Questions
What happens if I’m adding a host manually? The Elastic DRS recommendations are ignored. Same goes with planned maintenance or SDDC maintenance, where the support team may be adding in an additional host. But what if you’ve lost a host? The auto-remediation process kicks in and the Elastic DRS recommendations are ignored while the failed host is being replaced. You can read more about that process here.
Thoughts
One of the things I like about the VMware Cloud on AWS approach is that VMware has looked into a number of common scenarios that occur in the wild (hosts running out of capacity, for example) and built some automation on top of an already streamlined SDDC stack. Elastic DRS and the Auto-Scaler features seem like minor things, but when you’re managing an SDDC of any significant scale, it’s nice to have the little things taken care of.