I miss Tru64, and Solaris for that matter. I don’t miss HP-UX. And I definitely won’t miss AIX. Read about the death of Unix over at El Reg – Unix is dead. Long live Unix!
The I3.metal is going away very soon. Remember, this is from a sales perspective: VMware is still supporting the I3.metal in the wild, and you’ll still have access to deploy on-demand if required (up to a point).
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around TRIM/UNMAP and capacity reclamation on the VMware-managed VMware Cloud on AWS platform.
TRIM/UNMAP, in short, is the capability for operating systems to reclaim no longer used space on thin-provisioned filesystems. Why is this important? Imagine you have a thin-provisioned volume that has 100GB of capacity allocated to it. It consumes maybe 1GB when it’s first deployed. You then add 50GB of data to it. You then delete 50GB of data from the volume. You’ll still see 51GB of capacity being consumed on the underlying storage. This is because older operating systems just mark the blocks as deleted, but don’t tell the underlying storage that the space can be reclaimed. Modern operating systems do support TRIM/UNMAP, but the hypervisor needs to understand the commands being sent to it. You can read more on that here.
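If it helps to see that arithmetic laid out, here’s a minimal Python sketch of the example above. It’s a toy model only, with a made-up class rather than anything resembling how the guest or the hypervisor actually track blocks, but it shows why the backend hangs on to that 51GB when TRIM/UNMAP isn’t in play.

```python
# Toy model of thin-provisioned capacity with and without TRIM/UNMAP.
# The numbers mirror the example above; the model is illustrative only.

class ThinVolume:
    def __init__(self, allocated_gb, initial_used_gb, supports_unmap):
        self.allocated_gb = allocated_gb        # capacity presented to the guest
        self.backend_used_gb = initial_used_gb  # blocks actually consumed on the backend
        self.guest_used_gb = initial_used_gb    # what the guest filesystem reports
        self.supports_unmap = supports_unmap

    def write(self, gb):
        self.guest_used_gb += gb
        self.backend_used_gb += gb              # thin volume grows as new blocks are written

    def delete(self, gb):
        self.guest_used_gb -= gb
        if self.supports_unmap:
            # Guest issues TRIM/UNMAP, the hypervisor passes it through, space is reclaimed.
            self.backend_used_gb -= gb
        # Otherwise the blocks are only marked free in the guest; the backend keeps them.

for unmap in (False, True):
    vol = ThinVolume(allocated_gb=100, initial_used_gb=1, supports_unmap=unmap)
    vol.write(50)
    vol.delete(50)
    print(f"UNMAP={unmap}: guest sees {vol.guest_used_gb}GB used, "
          f"backend consumes {vol.backend_used_gb}GB")
# UNMAP=False: guest sees 1GB used, backend consumes 51GB
# UNMAP=True:  guest sees 1GB used, backend consumes 1GB
```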
Do the VMs come across thin? Do you need to reclaim space first? If you’re using HCX to go from thick to thin, you should be fine. If you’re migrating thin to thin, it’s worth checking whether you’ve got any space reclamation in place on your source side. I’ve had customers report back that some environments have migrated across with higher than expected storage usage due to a lack of space reclamation happening on the source storage environment. You can use something like Live Optics to report on your capacity consumed vs allocated, and how much capacity can be reclaimed.
Why Isn’t This Enabled By Default?
I don’t know for sure, but I imagine it has something to do with the fact that TRIM/UNMAP has the potential to impact performance from a latency perspective, depending on the workloads running in the environment, and the amount of capacity being reclaimed at any given time. We recommend that you “schedule large space reclamation jobs during off-peak hours to reduce any potential impact”. Given that VMware Cloud on AWS is a fully-managed service, I imagine we want to control as many of the performance variables as possible to ensure our customers enjoy a reliable and stable platform. That said, TRIM/UNMAP is a really useful feature, and you should look at getting it enabled if you’re concerned about the potential for wasted capacity in your SDDC.
At VMware Explore 2022 in the US, VMware announced a number of new offerings for VMware Cloud on AWS, including a new bare-metal instance type: the I4i.metal. You can read the official blog post here. I thought it would be useful to provide some high-level details and cover some of the caveats that punters should be aware of.
By The Numbers
What do you get from a specifications perspective?
Can I use custom core counts? Yep, the I4i will support physical custom core counts of 8, 16, 24, 30, 36, 48, 64.
Is there stretched cluster support? Yes, you can deploy these in stretched clusters (of the same host type).
Can I do in-cluster conversions? Yes, read more about that here.
Why does the sizer say 20 TiB usable for the I4i? Around 7 TiB is consumed by the cache tier at the moment, so you’ll see different numbers in the sizer. And your usable storage numbers will obviously be impacted by the usual constraints around failures to tolerate (FTT) and RAID settings.
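If you want a rough sanity check of where that 20 TiB figure lands, here’s a quick back-of-the-envelope sketch in Python. The 8 x 3,750 GB NVMe figure is the published AWS i4i.metal spec rather than anything from this post, and the numbers are indicative only.

```python
# Rough reconciliation of the I4i "20 TiB usable" sizer figure. Indicative only.
# The 8 x 3,750 GB NVMe spec is from AWS's published i4i.metal instance details.

RAW_GB = 8 * 3750                  # NVMe capacity per i4i.metal host (decimal GB)
raw_tib = RAW_GB * 1e9 / 2**40     # convert decimal GB to binary TiB
cache_tier_tib = 7                 # approximate cache tier consumption, per the text above

usable_tib = raw_tib - cache_tier_tib
print(f"raw: {raw_tib:.1f} TiB, minus cache tier: ~{usable_tib:.1f} TiB usable")
# raw: 27.3 TiB, minus cache tier: ~20.3 TiB usable -> roughly the sizer's 20 TiB
```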
The I4i.metal instances will be available in the following Regions (and Availability Zones):
US East (N. Virginia) – use1-az1, use1-az2, use1-az4, use1-az5, use1-az6
US West (Oregon) – usw2-az1, usw2-az2, usw2-az3, usw2-az4
US West (N. California) – usw1-az1, usw1-az3
US East (Ohio) – use2-az1, use2-az2, use2-az3
Canada (Central) – cac1-az1, cac1-az2
Europe (Ireland) – euw1-az1, euw1-az2, euw1-az3
Europe (London) – euw2-az1, euw2-az2, euw2-az3
Europe (Frankfurt) – euc1-az1, euc1-az2, euc1-az3
Europe (Paris) – euw3-az1, euw3-az2, euw3-az3
Asia Pacific (Singapore) – apse1-az1, apse1-az2, apse1-az3
Asia Pacific (Sydney) – apse2-az1, apse2-az2, apse2-az3
Asia Pacific (Tokyo) – apne1-az1, apne1-az2, apne1-az4
Other Regions will have availability over the coming months.
The i3.metal isn’t going anywhere, but it’s nice to have an option that supports more cores and a bit more storage and RAM. The I4i.metal is great for SQL workloads and VDI deployments where core count can really make a difference. Coupled with the addition of supplemental storage via VMware Cloud Flex Storage and Amazon FSx for NetApp ONTAP, there are some great options available to deal with the variety of workloads customers are looking to deploy on VMware Cloud on AWS.
The October 2022 edition of the Brisbane VMUG meeting will be held on Wednesday 12th October at the Cube (QUT) from 5pm to 7pm. It’s sponsored by NetApp and promises to be a great evening.
Two’s Company, Three’s a Cloud – NetApp, VMware and AWS
NetApp has had a strategic relationship with VMware for over 20 years, and with AWS for over 10 years. Recently at VMware Explore we made a significant announcement about VMC support for NFS Datastores provided by the AWS FSx for NetApp ONTAP service.
Come and learn about this exciting announcement and more on the benefits of NetApp with VMware Cloud. We will discuss architecture concepts, use cases and cover topics such as migration, data protection and disaster recovery as well as Hybrid Cloud configurations.
There will be a lucky door prize as well as a prize for best question on the night. Looking forward to seeing you there!
Wade Juppenlatz – Specialist Systems Engineer – QLD/NT
Chris (Gonzo) Gondek – Partner Technical Lead QLD/NT
PIZZA AND NETWORKING BREAK!
This will be followed by:
All the News from VMware Explore – (without the jet lag)
We will cover a variety of cloudy announcements from VMware Explore, including:
VMware Cloud on AWS
VMware Cloud Flex Storage
GCVE, OCVS, AVS
VMware Ransomware Recovery for Cloud DR
Dan Frith – Staff Solutions Architect – VMware Cloud on AWS, VMware
And we will be finishing off with:
Preparing for VMware Certifications
With the increase in position requirements over the last few years, certifications help you demonstrate your skills and move a step closer to getting better jobs. In this Community Session we will help you understand how to prepare for a VMware certification exam, along with some useful tips you can use during the exam.
We will talk about:
Different types of exams
How to schedule an exam
Where to get material to study
Lessons learned from the field per type of exam
Francisco Fernandez Cardarelli – Senior Consultant (4 x VCIX)
Soft drinks and vBeers will be available throughout the evening! We look forward to seeing you there!
Doors open at 5pm. Please make your way to The Atrium, on Level 6.
You can find out more information and register for the event here. I hope to see you there. Also, if you’re interested in sponsoring one of these events, please get in touch with me and I can help make it happen.
VMware Cloud on AWS has been around for just over 5 years now, and in that time it’s proven to be a popular platform for a variety of workloads, industry verticals, and organisations of all different sizes. However, one of the challenges that a hyper-converged architecture presents is that resource growth is generally linear (depending on the types of nodes you have available). In the case of VMware Cloud on AWS, we (now) have three node types available for use: the I3, I3en, and I4i. Each of these instances provides a fixed amount of CPU, RAM, and vSAN storage for use within your VMC cluster. So when your storage grows past a certain threshold (80%), you need to add an additional node. This is a long-winded way of saying that, even if you don’t need the additional CPU and RAM, you need to add it anyway. To address this challenge, VMware now offers what’s called “Supplemental Storage” for VMware Cloud on AWS. This is essentially external datastores presented to the VMC hosts over NFS. It comes in two flavours: FSx for NetApp ONTAP and VMware Cloud Flex Storage. I’ll cover each in a little more detail below.
[image courtesy of VMware]
Amazon FSx for NetApp ONTAP
The first cab off the rank is Amazon FSx for NetApp ONTAP (or FSxN to its friends). This one is ONTAP-like storage made available to your VMC environment as a native service. The service itself is fully customer-managed, while the networking connectivity into your SDDC is VMware-managed.
[image courtesy of VMware]
There’s a 99.99% Availability SLA attached to the service. It’s based on NetApp ONTAP, and offers support for:
Note that it currently requires a VMware Managed Transit Gateway (vTGW) for Multi-AZ deployment (the only deployment architecture currently supported), and can connect to multiple clusters and SDDCs for scale. You’ll need to be on SDDC version 1.20 (or greater) to leverage this service in your SDDC, and there is currently no support for attachment to stretched clusters. While you can only connect datastores to VMC hosts using NFSv3, there is support for connecting directly to guests via other protocols. More information can be found in the FAQ here. There’s also a simulator you can access here that runs you through the onboarding process.
VMware Cloud Flex Storage
The other option for supplemental storage is VMware Cloud Flex Storage (sometimes referred to as VMC-FS). This is a datastore presented to your hosts over NFSv3.
VMware Cloud Flex Storage is:
A natively integrated cloud storage service for VMware Cloud on AWS that is fully managed by VMware;
A cost-effective, multi-cloud storage solution built on SCFS;
Delivered via a two-tier architecture for elasticity and performance (AWS S3 and local NVMe cache); and
Provides integrated data management.
In short, VMware has taken a lot of the technology used in VMware Cloud Disaster Recovery (the result of the Datrium acquisition in 2020) and used it to deliver up to 400 TiB of storage per SDDC.
[image courtesy of VMware]
The intent of the solution, at this stage at least, is that it is only offered as a datastore for hosts via NFSv3, rather than other protocols directly to guests. There are some limitations around the supported topologies too, with stretched clusters not currently supported. From a disaster recovery perspective, it’s important to note that VMware Cloud Flex Storage is currently only offered on a single-AZ basis (although the supporting components are spread across multiple Availability Zones), and there is currently no support for VMware Cloud Disaster Recovery co-existence with this solution.
I’ve only been at VMware for a short period of time, but I’ve had numerous conversations with existing and potential VMware Cloud on AWS customers looking to solve their storage problems without necessarily putting everything on vSAN. There are plenty of reasons why you wouldn’t want to use vSAN for high-capacity storage workloads, and I believe these two initial solutions go some way towards solving that issue. Many of the caveats wrapped around these two products at General Availability will be removed over time, and the traditional objection that VMware Cloud on AWS isn’t great at high-capacity, cost-effective storage will fall away too.
Finally, if you’re an existing NetApp ONTAP customer, and were thinking about what you were going to do with that Petabyte of unstructured data you had lying about when you moved to VMware Cloud on AWS, or wanting to take advantage of the sweat equity you’ve poured into managing your ONTAP environment over the years, I think we’ve got you covered as well.
Jeff Geerling seems to do a lot of projects that I either can’t afford to do, or don’t have the time to do. Either way, thanks Jeff. This latest one – Building a fast all-SSD NAS (on a budget) – looked like fun.
You like ransomware? What if I told you you can have it cross-platform? Excited yet? Read Melissa’s article on Multiplatform Ransomware for a more thorough view of what’s going on out there.
Speaking of storage and clouds, Chris M. Evans recently published a series of videos over at Architecting IT where he talks to NetApp’s Matt Watt about the company’s hybrid cloud strategy. You can see it here.
I’ve spent a lot of money over the years trying to find the perfect media streaming device for home. I currently favour the Apple TV 4K, but only because my Boxee Box can’t keep up with more modern codecs. This article on the Best Device for Streaming for Any User – 2022 seems to line up well with my experiences to date, although I admit I haven’t tried the NVIDIA device yet. I do miss playing ISOs over the network with the HD Mediabox 100, but those were simpler times I guess.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around managing host additions and failures on the VMware-managed VMware Cloud on AWS platform.
One of the questions I frequently get asked by customers is what happens when you reach a certain capacity in your VMware Cloud on AWS cluster? The good news is we have a feature called Elastic DRS that can take care of that for you. Elastic DRS is a little different to what you might know as the vSphere Distributed Resource Scheduler (DRS). Elastic DRS operates at a host level and takes care of capacity constraints in your VMC environment. The idea is that, when your cluster reaches a certain resource threshold (be it storage, vCPU, or RAM), Elastic DRS takes care of adding in additional host resources as required.
When you configure an Elastic DRS policy, you can specify the minimum and maximum number of hosts the algorithm should scale up or down to, along with thresholds for CPU, memory, and storage utilisation so that host allocation is optimised for cost or performance. Note also that your cluster may scale back in, assuming the resources stay consistently below the threshold for a number of iterations.
There are a few different options for Elastic DRS, with the default being the “Elastic DRS Baseline Policy”. With this policy, a host is automatically added when there’s less than 20% free vSAN storage. Note that this doesn’t apply to single-node SDDC configurations, and only the baseline policy is available with 2-node configurations. Beyond those limitations, though, there are a number of other configurations available and these are outlined here. The neat thing is that there’s some amount of flexibility in how you have your SDDC automatically managed, with options for best performance, lowest cost, or rapid scale-out also available.
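For the sake of illustration, here’s a rough Python sketch of the scale-out side of a baseline-style policy. The function and thresholds are my own simplification (the real service also weighs CPU and memory utilisation, scale-in behaviour, and repeated iterations before acting), but it captures the basic idea of adding a host when free vSAN storage drops below 20%.

```python
# Simplified sketch of an Elastic DRS-style scale-out decision. Illustrative only;
# the actual service evaluates CPU, memory, and storage over multiple iterations.

def recommend_scale_out(current_hosts, max_hosts, storage_used_pct,
                        storage_threshold_pct=80):
    """Return True if another host should be added to the cluster."""
    if current_hosts >= max_hosts:
        return False                 # never scale beyond the policy's maximum host count
    return storage_used_pct >= storage_threshold_pct   # i.e. less than 20% free vSAN storage

print(recommend_scale_out(current_hosts=4, max_hosts=8, storage_used_pct=83))  # True
print(recommend_scale_out(current_hosts=8, max_hosts=8, storage_used_pct=90))  # False
```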
Can I Turn It Off?
No, but you can fiddle with the settings from your VMC cloud console.
What happens if I’m adding a host manually? The Elastic DRS recommendations are ignored. Same goes with planned maintenance or SDDC maintenance, where the support team may be adding in an additional host. But what if you’ve lost a host? The auto-remediation process kicks in and the Elastic DRS recommendations are ignored while the failed host is being replaced. You can read more about that process here.
One of the things I like about the VMware Cloud on AWS approach is that VMware has looked into a number of common scenarios that occur in the wild (hosts running out of capacity, for example) and built some automation on top of an already streamlined SDDC stack. Elastic DRS and the Auto-Scaler features seem like minor things, but when you’re managing an SDDC of any significant scale, it’s nice to have the little things taken care of.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to touch briefly on some things you might come across when sizing workloads for the VMware Cloud on AWS platform using the VMware Cloud on AWS Sizer.
VMware Cloud on AWS Sizer
One of the neat things about VMware Cloud on AWS is that you can jump on the publicly available sizing tool and input some numbers (or import RVTools or LiveOptics files) and it will spit out the number of nodes that you’ll (likely) need to support your workloads. Of course, if that’s all there was to it, you wouldn’t need folks like me to help you with sizing. That said, VMware has worked hard to ensure that the sizing part of your VMware Cloud on AWS planning is fairly straightforward. There are a few things to look out for though.
Why Do I See A Weird Number Of Cores In The Sizer?
If you put a workload into the sizer, you might see some odd core counts in the output. For example, the below screenshot shows 4x i3en nodes with 240 cores, but clearly it should be 192 cores (4x 48).
Yet when the same workload is changed to the i3 instance type, the correct number of cores (5x 36 = 180) is displayed.
The reason for this is that the i3en instance types support Hyper-Threading, and the Sizer applies a weighting to calculations. This can be changed via the Global Settings in the Advanced section of the Sizer. If you’re not into HT, set it to 0%. If you’re a believer, set it to 100%. By default it’s set to 25%, hence the 240 cores number in the previous example (48 x 1.25 x 4 nodes).
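If you want to sanity-check the maths, here’s a tiny Python sketch of how that weighting plays out. The formula is inferred from the example above rather than lifted from the Sizer’s internals.

```python
# Effective core count with the Sizer's Hyper-Threading weighting applied.
# Formula inferred from the worked example above; not the Sizer's actual code.

def effective_cores(nodes, cores_per_node, ht_weighting=0.25):
    return nodes * cores_per_node * (1 + ht_weighting)

print(effective_cores(4, 48))                     # 240.0 -> the i3en number in the screenshot
print(effective_cores(4, 48, ht_weighting=0.0))   # 192.0 -> physical cores only
print(effective_cores(5, 36, ht_weighting=0.0))   # 180.0 -> the i3 example (no weighting applied)
```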
Why Do I Need This Many Nodes?
You might need to satisfy HA Admission Control requirements. The current logic of HA Admission Control (as it’s applied in the VMC Sizer) is as follows (there’s a quick sanity check in the sketch after the list):
A 2-host cluster should have 50.00 percent reserved CPU and memory capacity for HA Admission Control.
A 3-host cluster reserves 33.33 percent for HAC.
And so on, until you get to:
A 16-host cluster reserving 6.25 percent of resources for HAC.
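Those percentages follow a simple pattern: one host’s worth of CPU and memory (1/N of the cluster) is held in reserve. You can sanity-check it with a couple of lines of Python:

```python
# HA Admission Control reservation as applied in the Sizer: 1/N of cluster resources.

def ha_reserved_pct(hosts):
    return 100.0 / hosts

for n in (2, 3, 4, 8, 16):
    print(f"{n:>2}-host cluster: {ha_reserved_pct(n):.2f}% reserved for HA")
# 2 -> 50.00%, 3 -> 33.33%, 16 -> 6.25%, matching the list above
```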
It’s also important to note that a 2-host cluster can accommodate a maximum of 35 VMs. Anything above that will need an extra host. And if you’re planning to run a full HCX configuration on two nodes, you should review this Knowledge Base article. Speaking of running things at capacity, I’ll go into Elastic DRS in another post, but by default we add another host to your cluster when you hit 80% storage capacity.
What About My Storage Consumption?
By default there are some storage policies applied to your vSAN configurations too. A standard cluster with 5 hosts or fewer is set to 1 failure / RAID-1, whilst a standard cluster with 6 hosts or more is set to tolerate 2 failures / RAID-6. You can read more about that here.
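To illustrate what those defaults mean for effective capacity, here’s a short Python sketch. The overhead multipliers are the standard vSAN ones (RAID-1 mirroring writes two full copies, RAID-6 erasure coding writes roughly 1.5x), and the cluster capacity figures are illustrative only.

```python
# What the default vSAN policies mean for effective capacity. Illustrative only.

def default_policy(hosts):
    # Per the text above: 5 hosts or fewer -> FTT=1/RAID-1, 6 or more -> FTT=2/RAID-6.
    return ("RAID-1, FTT=1", 2.0) if hosts <= 5 else ("RAID-6, FTT=2", 1.5)

for hosts, usable_tib in ((4, 80), (6, 120)):     # example cluster totals, not real sizing
    policy, overhead = default_policy(hosts)
    print(f"{hosts} hosts ({policy}): ~{usable_tib / overhead:.0f} TiB of VM data "
          f"from {usable_tib} TiB usable")
# 4 hosts (RAID-1, FTT=1): ~40 TiB of VM data from 80 TiB usable
# 6 hosts (RAID-6, FTT=2): ~80 TiB of VM data from 120 TiB usable
```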
There’s a bunch of stuff I haven’t covered here, including the choices you have to make between using RVTools and LiveOptics, and whether you should size with a high vCPU-to-core ratio or keep it one-to-one like the old timers prefer. But hopefully this post has been of some use explaining some of the quirky things that pop up in the Sizer from time to time.