Been wondering if your media streaming experience is really all that it could be? Are discs really better than streaming? I don’t think they are, but I’m often too lazy to get up and put the disc in the player.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover Managed Storage Policy Profiles (MSPPs) on the VMware-managed VMware Cloud on AWS platform.
VMware Cloud on AWS has MSPPs deployed on clusters to ensure that customers have sufficient resilience built into the cluster to withstand disk or node failures. By default, clusters are configured with RAID 1, Failures to Tolerate (FTT):1 for clusters with 2 to 5 nodes, and RAID 6, FTT:2 for clusters with 6 or more nodes. Note that single-node clusters have no Service Level Agreement (SLA) attached to them, as you generally only run those on a trial basis, and if the node fails, there’s nowhere for the data to go. You can read more about vSAN Storage Policies and MSPPs here, and there’s a great Tech Zone article here. The point of these policies is that they are designed to ensure your cluster(s) remain in compliance with the SLAs for the platform. You can view the policies in your environment by going to Policies and Profiles in vCenter and selecting VM Storage Policies.
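If it helps to see the defaults as logic rather than prose, here is a minimal sketch of the node-count rules described above. The function name and the dictionary layout are my own invention for illustration, and the way the single-node case is represented (no RAID or FTT value at all) is an assumption rather than anything the platform reports.

```python
def default_mspp(node_count: int) -> dict:
    """Return the default managed policy settings for a cluster of this size.

    Hypothetical helper: it only encodes the defaults described in the text.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count == 1:
        # Single-node clusters carry no SLA; raid/ftt are left unset here (assumption).
        return {"raid": None, "ftt": None, "sla": False}
    if node_count <= 5:
        # 2 to 5 nodes: RAID 1 with FTT 1.
        return {"raid": "RAID 1", "ftt": 1, "sla": True}
    # 6 or more nodes: RAID 6 with FTT 2.
    return {"raid": "RAID 6", "ftt": 2, "sla": True}


for nodes in (1, 2, 5, 6, 12):
    print(nodes, default_mspp(nodes))
```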
Can I Change Them?
The MSPPs are maintained by VMware, and so it’s not a great idea to change the default policies on your cluster, as the system will change them back at some stage. And why would you want to change the policies on your cluster? Well, you might decide that 4 or 5 nodes could actually run better (from a capacity perspective) using RAID 5, rather than RAID 1. This is a reasonable thing to want to do, and as the SLA talks about FTT numbers, not RAID types, you can change the RAID type and remain in compliance. And the capacity difference can be material in some cases, particularly if you’re struggling to fit your workloads onto a smaller node count.
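To put a rough number on that capacity difference, here is a back-of-the-envelope sketch. The usable fractions are the commonly cited vSAN figures (RAID 1 with FTT 1 keeps two full copies, RAID 5 uses a 3+1 layout, RAID 6 uses a 4+2 layout), and I am assuming those apply to your cluster. The 40 TiB raw figure is made up, and the sketch ignores slack space, deduplication, compression, and any other overhead.

```python
# Fraction of raw capacity left for data under each policy (assumed, commonly cited values).
USABLE_FRACTION = {
    ("RAID 1", 1): 1 / 2,  # two full copies of every object
    ("RAID 5", 1): 3 / 4,  # 3 data + 1 parity
    ("RAID 6", 2): 4 / 6,  # 4 data + 2 parity
}


def usable_tib(raw_tib: float, raid: str, ftt: int) -> float:
    """Estimate usable capacity for a given raw capacity and policy."""
    return raw_tib * USABLE_FRACTION[(raid, ftt)]


raw = 40.0  # hypothetical raw vSAN capacity in TiB
mirrored = usable_tib(raw, "RAID 1", 1)
erasure_coded = usable_tib(raw, "RAID 5", 1)
print(f"RAID 1: {mirrored:.1f} TiB, RAID 5: {erasure_coded:.1f} TiB, "
      f"gain: {erasure_coded - mirrored:.1f} TiB")
```

Both policies in the example are FTT 1, which is why the swap doesn't change your standing against the SLA.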
So How Do I Do It Then?
Clone The Policy
There are a few ways to approach this, but the simplest is by cloning an existing policy. In this example, I’ll clone the vSAN Default Storage Policy. In VMware Cloud on AWS, there is an MSPP assigned to each cluster with the name “VMC Workload Storage Policy – ClusterName”. Select the policy you want to clone and then click on Clone.
The first step is to give the VM Storage Policy a name. Something cool with your initials should do the trick.
You can edit the policy structure at this point, or just click Next.
Here you can configure your Availability options. You can also do other things, like configure Tags and Advanced Policy Rules.
Once this is configured, the system will check that your vSAN datastores are compatible with your policy.
And then you’re ready to go. Click Finish, make yourself a beverage, bask in the glory of it all.
Apply The Policy
So you have a fresh new policy, now what? You can choose to apply it to your workload datastore, or apply it to specific Virtual Machines. To apply it to your datastore, select the datastore you want to modify, click on General, then click on Edit next to the Default Storage Policy option. The process to apply the policy to VMs is outlined here. Note that if you create a non-compliant policy and apply it to your datastore, you’ll get hassled about it and you should likely consider changing your approach.
The thing about managed platforms is that the service provider is on the hook for architecture decisions that reduce the resilience of the platform. And the provider is trying to keep the platform running within the parameters of the SLA. This is why you’ll come across configuration items in VMware Cloud on AWS that you either can’t change, or have some default options that seem conservative. Many of these decisions have been made with the SLAs and the various use cases in mind for the platform. That said, it doesn’t mean there’s no flexibility here. If you need a little more capacity, particularly in smaller environments, there are still options available that won’t reduce the platform’s resilience, while still providing additional capacity options.
Welcome to Random Short Take #86. It’s been a while, and I’ve been travelling a bit for work. So let’s get random.
Let’s get started with three things / people I like: Gestalt IT, Justin Warren, and Pure Storage. This article by Justin digs into some of the innovation we’re seeing from Pure Storage. Speaking of Justin, if you don’t subscribe to his newsletter “The Crux”, you should. I do. Subscribe here.
And speaking of Pure Storage, a survey was conducted and results were had. You can read more on that here.
Switching gears slightly (but still with a storage focus), check out the latest Backblaze drive stats report here.
Oh you like that storage stuff? What about this article on file synchronisation and security from Chin-Fah?
More storage? What about this review of the vSAN Objects Viewer from Victor?
I’ve dabbled in product management previously, but this article from Frances does a much better job of describing what it’s really like.
Edge means different things to different people, and I found this article from Ben Young to be an excellent intro to the topic.
Speaking of old things, El Reg had some info on running (hobbyist) x86-64 editions of OpenVMS. I ran OpenVMS on a DEC Alpha AXP-150 at home for a brief moment, but that feels like it was a long time ago.
This article from JB on the Bowlo was excellent. I don’t understand why Australians are so keen on poker machines (or gambling in general), but it’s nice when businesses go against the grain a bit.
I miss Tru64, and Solaris for that matter. I don’t miss HP-UX. And I definitely won’t miss AIX. Read about the death of Unix over at El Reg – Unix is dead. Long live Unix!
The I3.metal is going away very soon. Remember that this is from a sales perspective: VMware is still supporting the I3.metal in the wild, and you’ll still have access to deploy on-demand if required (up to a point).
Welcome to Random Short Take #81. Last one for the year, because who really wants to read this stuff over the holiday season? Let’s get random.
Curtis did a podcast on archive and retrieve as part of his “Backup to Basics” series. It’s something I feel pretty strongly about, so much so that I wrote a chapter in his book about it. You can listen to it here.
I love Backblaze. Not in the sense that I want to marry the company, but I really like what the folks there do. And I really like the transparency with which they operate. This article giving a behind the scenes look at its US East Data Center is a fantastic example of that.
And, to “celebrate” 81 Random Short Takes (remember when I used to list my favourite NBA players and the numbers they wore?), let’s take a stroll down memory lane with two of my all-time, top 5, favourite NBA players – Kobe Bryant and Jalen Rose. The background for this video is explained by Jalen here.
Take care of yourselves and each other, and I’ll hopefully see you all on the line or in person next year.
In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around TRIM/UNMAP and capacity reclamation on the VMware-managed VMware Cloud on AWS platform.
TRIM/UNMAP, in short, is the capability for operating systems to reclaim no longer used space on thin-provisioned filesystems. Why is this important? Imagine you have a thin-provisioned volume that has 100GB of capacity allocated to it. It consumes maybe 1GB when it’s first deployed. You then add 50GB of data to it. You then delete 50GB of data from the volume. You’ll still see 51GB of capacity being consumed on the filesystem. This is because older operating systems just mark the blocks as deleted, but don’t zero them out. Modern operating systems do support TRIM/UNMAP, but the hypervisor needs to understand the commands being sent to it. You can read more on that here.
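The 100GB example above can be sketched as a toy model. This is not how vSAN tracks blocks internally; the class and its attribute names are invented for illustration, and it makes the simplifying assumption that new writes reuse previously freed blocks before any new space is allocated.

```python
class ThinVolume:
    """Toy model of a thin-provisioned volume (hypothetical, for illustration only)."""

    def __init__(self, provisioned_gb: int, initial_gb: int = 0):
        self.provisioned_gb = provisioned_gb
        self.guest_used_gb = initial_gb    # what the guest believes is in use
        self.backing_used_gb = initial_gb  # what the underlying storage has allocated

    def write(self, gb: int) -> None:
        if self.guest_used_gb + gb > self.provisioned_gb:
            raise ValueError("volume is full")
        self.guest_used_gb += gb
        # Backing storage only grows once freed-but-unreclaimed blocks are used up.
        self.backing_used_gb = max(self.backing_used_gb, self.guest_used_gb)

    def delete(self, gb: int, unmap: bool = False) -> None:
        if gb > self.guest_used_gb:
            raise ValueError("cannot delete more than is in use")
        self.guest_used_gb -= gb
        if unmap:
            # With TRIM/UNMAP the freed blocks are handed back to the storage layer.
            self.backing_used_gb = self.guest_used_gb


without_unmap = ThinVolume(provisioned_gb=100, initial_gb=1)
without_unmap.write(50)
without_unmap.delete(50)

with_unmap = ThinVolume(provisioned_gb=100, initial_gb=1)
with_unmap.write(50)
with_unmap.delete(50, unmap=True)

print(without_unmap.backing_used_gb, with_unmap.backing_used_gb)
```

The first volume still holds on to 51GB after the delete, while the second drops back to 1GB.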
Do the VMs come across thin? Do you need to reclaim space first? If you’re using HCX to go from thick to thin, you should be fine. If you’re migrating thin to thin, it’s worth checking whether you’ve got any space reclamation in place on your source side. I’ve had customers report back that some environments have migrated across with higher than expected storage usage due to a lack of space reclamation happening on the source storage environment. You can use something like Live Optics to report on your capacity consumed vs allocated, and how much capacity can be reclaimed.
Why Isn’t This Enabled By Default?
I don’t know for sure, but I imagine it has something to do with the fact that TRIM/UNMAP has the potential to have a performance impact from a latency perspective, depending on the workloads running in the environment, and the amount of capacity being reclaimed at any given time. We recommend that you “schedule large space reclamation jobs during off-peak hours to reduce any potential impact”. Given that VMware Cloud on AWS is a fully-managed service, I imagine we want to control as many of the performance variables as possible to ensure our customers enjoy a reliable and stable platform. That said, TRIM/UNMAP is a really useful feature, and you should look at getting it enabled if you’re concerned about the potential for wasted capacity in your SDDC.
Verity ES recently announced its official company launch and the commercial availability of its Verity ES data eradication enterprise software solution. I had the opportunity to speak to Kevin Enders about the announcement and thought I’d briefly share some thoughts here.
From Revert to Re-birth?
Revert, a sister company of Verity ES, is an on-site data eradication service provider. It’s also a partner for a number of Storage OEMs.
The folks at Revert have had an awful lot of experience with data eradication in big enterprise environments. With that experience, they’d observed a few challenges, namely:
The software doing the data eradication was too slow;
Eradicating data in enterprise environments introduced particular requirements at high volumes; and
Larger capacity HDDs and SSDs were a real problem to deal with.
The Real Problem?
Okay, so the process to get rid of old data on storage and compute devices is a bit of a problem. But what’s the real problem? Organisations need to get rid of end of life data – particularly from a legal standpoint – in a more efficient way. Just as data growth continues to explode, so too does the requirement to delete the old data.
Verity ES was spawned to develop software to solve a number of the challenges Revert were coming across in the field. There are two ways to do it:
Eliminate the data destructively (via device shredding / degaussing); or
Eradicate the data non-destructively (via software-based erasure).
Why eradicate? It’s a sustainable approach, enables residual value recovery, and allows for asset re-use. But it nonetheless needs to be secure, economical, and operationally simple to do. How does Verity ES address these requirements? It has Product Assurance Certification from ADISA. It’s also developed software that’s more efficient, particularly when it comes to those troublesome high capacity drives.
[image courtesy of Verity ES]
Who’s this product aimed at? Primarily enterprise DC operators, hyperscalers, IT asset disposal companies, and 3rd-party hardware maintenance providers.
If you’ve spent any time on my blog you’ll know that I write a whole lot about data protection, and this is probably one of the first times that I’ve written about data destruction as a product. But it’s an interesting problem that many organisations are facing now. There is a tonne of data being generated every day, and some of that data needs to be gotten rid of, either because it’s sitting on equipment that’s old and needs to be retired, or because legislatively there’s a requirement to get rid of the data.
The way we tackle this problem has changed over time too. One of the most popular articles on this blog was about making an EMC CLARiiON CX700 useful again after EMC did a certified erasure on the array. There was no data to be found on the array, but it was able to be repurposed as lab equipment, and enjoyed a few more months of usefulness. In the current climate, we’re all looking at doing more sensible things with our old disk drives, rather than simply putting a bullet in them (except for the Feds – but they’re a bit odd). Doing this at scale can be challenging, so it’s interesting to see Verity ES step up to the plate with a solution that promises to help with some of these challenges. It takes time to wipe drives, particularly when you need to do it securely.
I should be clear that this software doesn’t go out and identify what data needs to be erased – you have to do that through some other tools. So it won’t tell you that a bunch of PII is buried in a home directory somewhere, or sitting in a spot it shouldn’t be. It also won’t go out and dig through your data protection data and tell you what needs to go. Hopefully, though, you’ve got tools that can handle that problem for you. What this solution does seem to do is provide organisations with options when it comes to cost-effective, efficient data eradication. And that’s something that’s going to become crucial as we continue to generate data, need to delete old data, and do so on larger and larger disk drives.
VMware Cloud on AWS has been around for just over 5 years now, and in that time it’s proven to be a popular platform for a variety of workloads, industry verticals, and organisations of all different sizes. However, one of the challenges that a hyper-converged architecture presents is that resource growth is generally linear (depending on the types of nodes you have available). In the case of VMware Cloud on AWS, we (now) have 3 node types available for use: the I3, I3en, and I4i. Each of these instances provides a fixed amount of CPU, RAM, and vSAN storage for use within your VMC cluster. So when your storage grows past a certain threshold (80%), you need to add an additional node. This is a longwinded way of saying that, even if you don’t need the additional CPU and RAM, you need to add it anyway. To address this challenge, VMware now offers what’s called “Supplemental Storage” for VMware Cloud on AWS. This is essentially external datastores presented to the VMC hosts over NFS. This comes in two flavours: FSx for NetApp ONTAP and VMware Cloud Flex Storage. I’ll cover this in a little more detail below.
[image courtesy of VMware]
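The linear growth problem is easier to see with some numbers. Here is a rough sketch: the per-host figures (10 TiB usable and 36 cores) and the workload sizes are entirely made up and do not describe any real instance type, and the only figure taken from the text is the 80% storage threshold.

```python
import math


def hosts_for_storage(required_tib: float, usable_tib_per_host: float,
                      threshold: float = 0.80) -> int:
    """Hosts needed to keep storage utilisation at or below the threshold."""
    return math.ceil(required_tib / (usable_tib_per_host * threshold))


def hosts_for_compute(required_cores: int, cores_per_host: int) -> int:
    """Hosts needed purely from a CPU perspective."""
    return math.ceil(required_cores / cores_per_host)


# Hypothetical workload and host sizes, for illustration only.
storage_hosts = hosts_for_storage(required_tib=100, usable_tib_per_host=10)
compute_hosts = hosts_for_compute(required_cores=100, cores_per_host=36)

cluster_size = max(storage_hosts, compute_hosts)
print(f"storage needs {storage_hosts} hosts, compute needs {compute_hosts}, "
      f"so you buy {cluster_size}")
```

With these made-up numbers, storage drives the cluster to 13 hosts when compute only needs 3, and that gap is what supplemental storage is meant to close.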
Amazon FSx for NetApp ONTAP
The first cab off the rank is Amazon FSx for NetApp ONTAP (or FSxN to its friends). This one is ONTAP-like storage made available to your VMC environment as a native service. It’s fully customer managed, and VMware managed from a networking perspective.
[image courtesy of VMware]
There’s a 99.99% Availability SLA attached to the service. It’s based on NetApp ONTAP, and offers support for:
Note that it currently requires VMware Managed Transit Gateway (vTGW) for Multi-AZ deployment (the only deployment architecture currently supported), and can connect to multiple clusters and SDDCs for scale. You’ll need to be on SDDC version 1.20 (or greater) to leverage this service in your SDDC, and there is currently no support for attachment to stretched clusters. While you can only connect datastores to VMC hosts using NFSv3, there is support for connecting directly to guests via other protocols. More information can be found in the FAQ here. There’s also a simulator you can access here that runs you through the onboarding process.
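If you wanted to script a quick pre-check against the two requirements above (SDDC version 1.20 or greater, and no stretched clusters), a sketch might look like the following. The function names and the version string format are assumptions on my part, and this doesn't call any VMware API. The version is compared numerically because a plain string comparison would rank "1.9" above "1.20".

```python
import re


def parse_sddc_version(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.20' (format assumed)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def fsxn_datastore_eligible(sddc_version: str, stretched_cluster: bool) -> bool:
    """Hypothetical pre-check based only on the two requirements in the text."""
    return parse_sddc_version(sddc_version) >= (1, 20) and not stretched_cluster


print(fsxn_datastore_eligible("1.20", stretched_cluster=False))
print(fsxn_datastore_eligible("1.9", stretched_cluster=False))
print(fsxn_datastore_eligible("1.22", stretched_cluster=True))
```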
VMware Cloud Flex Storage
The other option for supplemental storage is VMware Cloud Flex Storage (sometimes referred to as VMC-FS). This is a datastore presented to your hosts over NFSv3.
VMware Cloud Flex Storage is:
A natively integrated cloud storage service for VMware Cloud on AWS that is fully managed by VMware;
A cost-effective multi-cloud storage solution built on SCFS;
Delivered via a two-tier architecture for elasticity and performance (AWS S3 and local NVMe cache); and
Equipped with integrated data management.
In short, VMware has taken a lot of the technology used in VMware Cloud Disaster Recovery (the result of the Datrium acquisition in 2020) and used it to deliver up to 400 TiB of storage per SDDC.
[image courtesy of VMware]
The intent of the solution, at this stage at least, is that it is only offered as a datastore for hosts via NFSv3, rather than other protocols directly to guests. There are some limitations around the supported topologies too, with stretched clusters not currently supported. From a disaster recovery perspective, it’s important to note that VMware Cloud Flex Storage is currently only offered on a single-AZ basis (although the supporting components are spread across multiple Availability Zones), and there is currently no support for VMware Cloud Disaster Recovery co-existence with this solution.
I’ve only been at VMware for a short period of time, but I’ve had numerous conversations with existing and potential VMware Cloud on AWS customers looking to solve their storage problems without necessarily putting everything on vSAN. There are plenty of reasons why you wouldn’t want to use vSAN for high capacity storage workloads, and I believe these two initial solutions go some way to solving that issue. Many of the caveats that are wrapped around these two products at General Availability will be removed over time, and the traditional objections relating to VMware Cloud on AWS being not great at high-capacity, cost-effective storage will also have been removed.
Finally, if you’re an existing NetApp ONTAP customer, and were thinking about what you were going to do with that Petabyte of unstructured data you had lying about when you moved to VMware Cloud on AWS, or wanting to take advantage of the sweat equity you’ve poured into managing your ONTAP environment over the years, I think we’ve got you covered as well.