Random Short Take #88

Welcome to Random Short Take #88. This one’s been sitting in my drafts folder for a while. Let’s get random.

VMware Cloud on AWS – TMCHAM – Part 11 – Storage Policies

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover Managed Storage Policy Profiles (MSPPs) on the VMware-managed VMware Cloud on AWS platform.


Background

VMware Cloud on AWS has MSPPs deployed on clusters to ensure that customers have sufficient resilience built into the cluster to withstand disk or node failures. By default, clusters are configured with RAID 1, Failures to Tolerate (FTT):1 for 2 – 5 nodes, and RAID 6, FTT:2 for clusters with 6 or more nodes. Note that single-node clusters have no Service Level Agreement (SLA) attached to them, as you generally only run those on a trial basis, and if the node fails, there’s nowhere for the data to go. You can read more about vSAN Storage Policies and MSPPs here, and there’s a great Tech Zone article here. These policies are designed to ensure your cluster(s) remain in compliance with the SLAs for the platform. You can view the policies in your environment by going to Policies and Profiles in vCenter and selecting VM Storage Policies.


Can I Change Them?

The MSPPs are maintained by VMware, so it’s not a great idea to change the default policies on your cluster, as the system will change them back at some stage. So why would you want to change the policies at all? Well, you might decide that a 4- or 5-node cluster could actually run better (from a capacity perspective) using RAID 5 rather than RAID 1. This is a reasonable thing to want to do, and as the SLA talks about FTT numbers, not RAID types, you can change the RAID type and remain in compliance. The capacity difference can be material in some cases, particularly if you’re struggling to fit your workloads onto a smaller node count.
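To put some rough numbers on that, here’s a quick Python sketch comparing the raw capacity needed for the same usable footprint under the common vSAN placement schemes. The multipliers only cover data placement; slack space, checksum overheads, and any dedupe/compression savings are deliberately ignored, so treat it as a comparison rather than a sizing tool.

```python
# Raw capacity required for a given usable footprint under each placement scheme.
OVERHEADS = {
    "RAID 1 (FTT=1)": 2.0,    # two full copies of the data
    "RAID 5 (FTT=1)": 4 / 3,  # 3 data components + 1 parity
    "RAID 6 (FTT=2)": 1.5,    # 4 data components + 2 parity
}

usable_tib = 40  # example workload footprint

for layout, multiplier in OVERHEADS.items():
    print(f"{layout}: {usable_tib * multiplier:.1f} TiB raw for {usable_tib} TiB usable")
```

On a 40 TiB footprint that’s roughly 80 TiB of raw capacity for RAID 1 versus about 53 TiB for RAID 5, which is the sort of difference that can keep you off an extra node.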


So How Do I Do It Then?

Clone The Policy

There are a few ways to approach this, but the simplest is by cloning an existing policy. In this example, I’ll clone the vSAN Default Storage Policy. In VMware Cloud on AWS, there is an MSPP assigned to each cluster with the name “VMC Workload Storage Policy – ClusterName”. Select the policy you want to clone and then click on Clone.

The first step is to give the VM Storage Policy a name. Something cool with your initials should do the trick.

You can edit the policy structure at this point, or just click Next.

Here you can configure your Availability options. You can also do other things, like configure Tags and Advanced Policy Rules.

Once this is configured, the system will check that your vSAN datastores are compatible with your policy.

And then you’re ready to go. Click Finish, make yourself a beverage, bask in the glory of it all.
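If you’d rather check the result from a script than click around the UI, here’s a minimal Python sketch that lists the storage policies on the SDDC so you can confirm both the managed policy and your freshly cloned one are there. It assumes the vSphere Automation REST API session and storage policy endpoints are available to your cloudadmin user, and the vCenter FQDN and credentials are placeholders.

```python
#!/usr/bin/env python3
"""List the VM storage policies on an SDDC vCenter via the REST API.

A sketch only - the vCenter FQDN and cloudadmin credentials are placeholders,
and the endpoints assume the vSphere Automation REST API (vSphere 6.7+).
"""
import requests

VCENTER = "vcenter.sddc-xx-xx-xx-xx.vmwarevmc.com"  # placeholder
USERNAME = "cloudadmin@vmc.local"
PASSWORD = "changeme"

def main():
    session = requests.Session()
    # Authenticate and grab an API session token.
    resp = session.post(f"https://{VCENTER}/rest/com/vmware/cis/session",
                        auth=(USERNAME, PASSWORD))
    resp.raise_for_status()
    session.headers["vmware-api-session-id"] = resp.json()["value"]

    # Pull back every VM storage policy, managed or otherwise.
    resp = session.get(f"https://{VCENTER}/rest/vcenter/storage/policies")
    resp.raise_for_status()
    for policy in resp.json()["value"]:
        tag = "(managed)" if policy["name"].startswith("VMC Workload Storage Policy") else ""
        print(f"{policy['policy']}  {policy['name']} {tag}")

if __name__ == "__main__":
    main()
```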

Apply The Policy

So you have a fresh new policy. Now what? You can choose to apply it to your workload datastore, or apply it to specific Virtual Machines. To apply it to your datastore, select the datastore you want to modify, click on General, then click on Edit next to the Default Storage Policy option. The process to apply the policy to VMs is outlined here. Note that if you create a non-compliant policy and apply it to your datastore, you’ll get hassled about it, and you should probably reconsider your approach.
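If you want to script the VM-level assignment, the sketch below uses pyVmomi (not something the official process requires, just an illustration) to point a VM’s home namespace and its virtual disks at a policy. The vCenter details, VM name, and POLICY_ID (the SPBM profile’s GUID, which you can pull from the policy listing above or the SPBM API) are all placeholders.

```python
#!/usr/bin/env python3
"""Apply a VM storage policy to a VM's home and virtual disks with pyVmomi.

A sketch under assumptions: connection details, VM name, and POLICY_ID are
placeholders, and the documented UI/API process remains the supported path.
"""
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.sddc-xx-xx-xx-xx.vmwarevmc.com"
USERNAME = "cloudadmin@vmc.local"
PASSWORD = "changeme"
VM_NAME = "my-test-vm"
POLICY_ID = "00000000-0000-0000-0000-000000000000"  # placeholder SPBM profile GUID

def find_vm(content, name):
    """Return the first VM with a matching name, or None."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next((vm for vm in view.view if vm.name == name), None)
    finally:
        view.Destroy()

def main():
    context = ssl._create_unverified_context()  # lab convenience only
    si = SmartConnect(host=VCENTER, user=USERNAME, pwd=PASSWORD, sslContext=context)
    try:
        vm = find_vm(si.RetrieveContent(), VM_NAME)
        if vm is None:
            raise SystemExit(f"VM {VM_NAME} not found")
        profile = [vim.vm.DefinedProfileSpec(profileId=POLICY_ID)]
        spec = vim.vm.ConfigSpec(vmProfile=profile)
        # Point each existing virtual disk at the same policy.
        spec.deviceChange = [
            vim.vm.device.VirtualDeviceSpec(
                operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
                device=device,
                profile=profile,
            )
            for device in vm.config.hardware.device
            if isinstance(device, vim.vm.device.VirtualDisk)
        ]
        task = vm.ReconfigVM_Task(spec=spec)
        print(f"Reconfigure task submitted: {task.info.key}")
    finally:
        Disconnect(si)

if __name__ == "__main__":
    main()
```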


Thoughts

The thing about managed platforms is that the service provider is on the hook for architecture decisions that reduce the resilience of the platform, and the provider is trying to keep the platform running within the parameters of the SLA. This is why you’ll come across configuration items in VMware Cloud on AWS that you either can’t change, or that have default options that seem conservative. Many of these decisions have been made with the SLAs and the platform’s various use cases in mind. That said, it doesn’t mean there’s no flexibility here. If you need a little more capacity, particularly in smaller environments, there are options available that give you that capacity without reducing the platform’s resilience.

Random Short Take #86

Welcome to Random Short Take #86. It’s been a while, and I’ve been travelling a bit for work. So let’s get random.

  • Let’s get started with three things / people I like: Gestalt IT, Justin Warren, and Pure Storage. This article by Justin digs into some of the innovation we’re seeing from Pure Storage. Speaking of Justin, if you don’t subscribe to his newsletter “The Crux”, you should. I do. Subscribe here.
  • And speaking of Pure Storage, a survey was conducted and results were had. You can read more on that here.
  • Switching gears slightly (but still with a storage focus), check out the latest Backblaze drive stats report here.
  • Oh you like that storage stuff? What about this article on file synchronisation and security from Chin-Fah?
  • More storage? What about this review of the vSAN Objects Viewer from Victor?
  • I’ve dabbled in product management previously, but this article from Frances does a much better job of describing what it’s really like.
  • Edge means different things to different people, and I found this article from Ben Young to be an excellent intro to the topic.
  • You know I hate Netflix but love its tech blog. Check out this article on migrating critical traffic at scale.

Bonus round. I’m in the Bay Area briefly next week. If you’re around, let me know! Maybe we can watch one of the NBA Finals games.

Random Short Take #85

Welcome to Random Short Take #85. Let’s get random.

Random Short Take #83

Welcome to Random Short Take #83. Quite a few press releases in this one, so let’s get random.

Random Short Take #82

Happy New Year (to those who celebrate). Let’s get random.

Random Short Take #81

Welcome to Random Short Take #81. Last one for the year, because who really wants to read this stuff over the holiday season? Let’s get random.

Take care of yourselves and each other, and I’ll hopefully see you all on the line or in person next year.

VMware Cloud on AWS – TMCHAM – Part 8 – TRIM/UNMAP

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to delve into some questions around TRIM/UNMAP and capacity reclamation on the VMware-managed VMware Cloud on AWS platform.


Why TRIM/UNMAP?

TRIM/UNMAP, in short, is the mechanism that lets an operating system tell the underlying storage that blocks are no longer in use, so the space can be reclaimed on thin-provisioned volumes. Why is this important? Imagine you have a thin-provisioned volume that has 100GB of capacity allocated to it. It consumes maybe 1GB when it’s first deployed. You then add 50GB of data to it, and later delete that 50GB of data from the volume. You’ll still see around 51GB of capacity being consumed on the underlying storage. This is because deleting a file only updates the filesystem’s metadata; without TRIM/UNMAP, the storage layer is never told that those blocks are free. Modern operating systems do issue TRIM/UNMAP commands, but the hypervisor needs to understand the commands being sent to it and pass them on. You can read more on that here.
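On the guest side, Linux distributions ship fstrim for exactly this, and the sketch below (Python, assuming a Linux guest with util-linux installed) just walks the mounted filesystems and asks each one to release its unused blocks. Windows guests generally take care of this themselves with the built-in drive optimiser.

```python
#!/usr/bin/env python3
"""Ask mounted ext4/xfs/btrfs filesystems to release unused blocks via fstrim.

A sketch for a Linux guest. If TRIM/UNMAP hasn't been enabled end to end,
fstrim will either report that the discard operation isn't supported or
have no effect at the storage layer.
"""
import subprocess

TRIMMABLE = {"ext4", "xfs", "btrfs"}

def mounted_filesystems():
    """Yield (mountpoint, fstype) pairs for filesystems worth trimming."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _, mountpoint, fstype, *_ = line.split()
            if fstype in TRIMMABLE:
                yield mountpoint, fstype

def trim(mountpoint):
    """Run fstrim -v on a single mountpoint and return (returncode, output)."""
    result = subprocess.run(["fstrim", "-v", mountpoint],
                            capture_output=True, text=True, check=False)
    return result.returncode, (result.stdout or result.stderr).strip()

if __name__ == "__main__":
    for mountpoint, fstype in mounted_filesystems():
        code, output = trim(mountpoint)
        status = "ok" if code == 0 else "skipped"
        print(f"{mountpoint} ({fstype}): {status} - {output}")
```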

How Do I Do This For VMware Cloud on AWS?

You can contact your account team, and we’ll raise a ticket to get the feature enabled. We had some minor issues recently that meant we weren’t enabling the feature, but if you’re running M16v12 or M18v5 (or above) on your SDDCs, you should be good to go. Note that this feature is enabled on a per-cluster basis, and you need to reboot the VMs in the cluster for it to take effect.
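Once the feature is enabled and the VMs have been rebooted, a quick way to confirm a Linux guest is actually seeing UNMAP support is to check what the virtual disks are advertising in sysfs. A small sketch (assuming a Linux guest) is below; a discard_max_bytes value of 0 means the device won’t accept discards.

```python
#!/usr/bin/env python3
"""Report whether a Linux guest's disks advertise UNMAP/discard support."""
from pathlib import Path

block = Path("/sys/block")
for dev in sorted(list(block.glob("sd*")) + list(block.glob("nvme*n*"))):
    max_bytes = int((dev / "queue" / "discard_max_bytes").read_text().strip())
    state = "supported" if max_bytes > 0 else "not supported"
    print(f"/dev/{dev.name}: discard {state} (discard_max_bytes={max_bytes})")
```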

What About Migrating With HCX?

Do the VMs come across thin? Do you need to reclaim space first? If you’re using HCX to go from thick to thin, you should be fine. If you’re migrating thin to thin, it’s worth checking whether you’ve got any space reclamation in place on your source side. I’ve had customers report back that some environments have migrated across with higher than expected storage usage due to a lack of space reclamation happening on the source storage environment. You can use something like Live Optics to report on your capacity consumed vs allocated, and how much capacity can be reclaimed.
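If you just want a rough feel for allocated versus consumed at the vSphere layer before you migrate, something like the pyVmomi sketch below (connection details are placeholders) will report what vCenter thinks each VM has provisioned and committed. It won’t tell you how much space the guests could actually hand back, which is where a tool like Live Optics earns its keep.

```python
#!/usr/bin/env python3
"""Rough per-VM report of provisioned vs committed capacity via pyVmomi.

A sketch only - it reflects what vSphere sees, not what the guests could
reclaim with TRIM/UNMAP or a source-side space reclamation run.
"""
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

GIB = 1024 ** 3

def main():
    context = ssl._create_unverified_context()  # lab convenience only
    si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=context)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        print(f"{'VM':40} {'Provisioned GiB':>16} {'Committed GiB':>14}")
        for vm in view.view:
            storage = vm.summary.storage
            if storage is None:
                continue  # e.g. objects without a storage summary
            provisioned = (storage.committed + storage.uncommitted) / GIB
            print(f"{vm.name:40} {provisioned:16.1f} {storage.committed / GIB:14.1f}")
        view.Destroy()
    finally:
        Disconnect(si)

if __name__ == "__main__":
    main()
```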

Why Isn’t This Enabled By Default?

I don’t know for sure, but I imagine it has something to do with the fact that TRIM/UNMAP has the potential to have a performance impact from a latency perspective, depending on the workloads running in the environment, and the amount of capacity being reclaimed at any given time. We recommend that you “schedule large space reclamation jobs during off-peak hours to reduce any potential impact”. Given that VMware Cloud on AWS is a fully-managed service, I imagine we want to control as many of the performance variables as possible to ensure our customers enjoy a reliable and stable platform. That said, TRIM/UNMAP is a really useful feature, and you should look at getting it enabled if you’re concerned about the potential for wasted capacity in your SDDC.

Verity ES Springs Forth – Promises Swift Eradication of Data

Verity ES recently announced its official company launch and the commercial availability of its Verity ES data eradication enterprise software solution. I had the opportunity to speak to Kevin Enders about the announcement and thought I’d briefly share some thoughts here.


From Revert to Re-birth?

Revert, a sister company of Verity ES, is an on-site data eradication service provider. It’s also a partner for a number of Storage OEMs.

The Problem

The folks at Revert have had an awful lot of experience with data eradication in big enterprise environments. With that experience, they’d observed a few challenges, namely:

  • The software doing the data eradication was too slow;
  • Eradicating data in enterprise environments introduced particular requirements at high volumes; and
  • Larger capacity HDDs and SSDs were a real problem to deal with.

The Real Problem?

Okay, so the process to get rid of old data on storage and compute devices is a bit of a problem. But what’s the real problem? Organisations need to get rid of end of life data – particularly from a legal standpoint – in a more efficient way. Just as data growth continues to explode, so too does the requirement to delete the old data.


The Solution

Verity ES was spawned to develop software to solve a number of the challenges Revert were coming across in the field. There are two ways to get rid of the data:

  • Destructively, via device shredding or degaussing; or
  • Non-destructively, using software-based eradication.

Why Eradicate?

Why eradicate? It’s a sustainable approach, enables residual value recovery, and allows for asset re-use. But it nonetheless needs to be secure, economical, and operationally simple to do. How does Verity ES address these requirements? It has Product Assurance Certification from ADISA. It’s also developed software that’s more efficient, particularly when it comes to those troublesome high capacity drives.

[image courtesy of Verity ES]

Who’s Buying?

Who’s this product aimed at? Primarily enterprise DC operators, hyperscalers, IT asset disposal companies, and 3rd-party hardware maintenance providers.


Thoughts

If you’ve spent any time on my blog you’ll know that I write a whole lot about data protection, and this is probably one of the first times that I’ve written about data destruction as a product. But it’s an interesting problem that many organisations are facing now. There is a tonne of data being generated every day, and some of that data eventually needs to go, either because it’s sitting on equipment that’s old and needs to be retired, or because there’s a legislative requirement to get rid of it.

The way we tackle this problem has changed over time too. One of the most popular articles on this blog was about making an EMC CLARiiON CX700 useful again after EMC did a certified erasure on the array. There was no data to be found on the array, but it was able to be repurposed as lab equipment, and enjoyed a few more months of usefulness. In the current climate, we’re all looking at doing more sensible things with our old disk drives, rather than simply putting a bullet in them (except for the Feds – but they’re a bit odd). Doing this at scale can be challenging, so it’s interesting to see Verity ES step up to the plate with a solution that promises to help with some of these challenges. It takes time to wipe drives, particularly when you need to do it securely.

I should be clear that this product doesn’t go out and identify what data needs to be erased – you have to do that with other tools. So it won’t tell you that a bunch of PII is buried in a home directory somewhere, or sitting in a spot it shouldn’t be. It also won’t go out and dig through your data protection data and tell you what needs to go. Hopefully, though, you’ve got tools that can handle that problem for you. What this solution does seem to do is provide organisations with options when it comes to cost-effective, efficient data eradication. And that’s something that’s going to become crucial as we continue to generate data, need to delete old data, and do so on larger and larger disk drives.

VMware Cloud on AWS – Supplemental Storage – A Few Notes …

At VMware Explore 2022 in the US, VMware announced a number of new offerings for VMware Cloud on AWS, including something we’re calling “Supplemental Storage”. There are some great (official) posts that have already been published, so I won’t go through everything here. I thought it would be useful to provide some high-level details and cover some of the caveats that punters should be aware of.


The Problem

VMware Cloud on AWS has been around for just over 5 years now, and in that time it’s proven to be a popular platform for a variety of workloads, industry verticals, and organisations of all different sizes. However, one of the challenges that a hyper-converged architecture presents is that resource growth is generally linear (depending on the types of nodes you have available). In the case of VMware Cloud on AWS, we (now) have 3 node types available for use: the I3, I3en, and I4i. Each of these instances provides a fixed amount of CPU, RAM, and vSAN storage for use within your VMC cluster. So when your storage grows past a certain threshold (80%), you need to add an additional node. This is a longwinded way of saying that, even if you don’t need the additional CPU and RAM, you need to add it anyway. To address this challenge, VMware now offers what’s called “Supplemental Storage” for VMware Cloud on AWS. This is essentially external datastores presented to the VMC hosts over NFS. It comes in two flavours: Amazon FSx for NetApp ONTAP and VMware Cloud Flex Storage. I’ll cover both in a little more detail below.
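Just to make the “storage drives the host count” point concrete, here’s a rough Python sketch. The per-host capacity figure is a placeholder (it varies by instance type), but the arithmetic shows how crossing the 80% threshold drags CPU and RAM along with it.

```python
import math

USABLE_TIB_PER_HOST = 20.0   # placeholder - varies by instance type
STORAGE_THRESHOLD = 0.8      # add a host once utilisation crosses 80%

def hosts_needed_for_storage(workload_tib: float) -> int:
    """Smallest host count that keeps utilisation at or below the threshold."""
    return max(2, math.ceil(workload_tib / (USABLE_TIB_PER_HOST * STORAGE_THRESHOLD)))

for workload in (30, 60, 90, 120):
    print(f"{workload} TiB of data -> {hosts_needed_for_storage(workload)} hosts")
```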

[image courtesy of VMware]


Amazon FSx for NetApp ONTAP

The first cab off the rank is Amazon FSx for NetApp ONTAP (or FSxN to its friends). This is ONTAP storage made available to your VMC environment as a native AWS service. The FSx for NetApp ONTAP filesystem itself is customer-managed (it lives in your own AWS account), while the networking that connects it to the SDDC is managed by VMware.

[image courtesy of VMware]

There’s a 99.99% Availability SLA attached to the service. It’s based on NetApp ONTAP, and offers support for:

  • Multi-Tenancy
  • SnapMirror
  • FlexClone

Note that it currently requires a VMware Managed Transit Gateway (vTGW) for Multi-AZ deployment (the only deployment architecture currently supported), and it can connect to multiple clusters and SDDCs for scale. You’ll need to be on SDDC version 1.20 (or greater) to leverage this service in your SDDC, and there is currently no support for attachment to stretched clusters. While you can only connect datastores to VMC hosts using NFSv3, there is support for connecting directly to guests via other protocols. More information can be found in the FAQ here. There’s also a simulator you can access here that runs you through the onboarding process.


VMware Cloud Flex Storage

The other option for supplemental storage is VMware Cloud Flex Storage (sometimes referred to as VMC-FS). This is a datastore presented to your hosts over NFSv3.

Overview

VMware Cloud Flex Storage is:

  • A natively integrated cloud storage service for VMware Cloud on AWS that is fully managed by VMware;
  • A cost-effective multi-cloud storage solution built on SCFS;
  • Delivered via a two-tier architecture for elasticity and performance (AWS S3 and local NVMe cache); and
  • Provides integrated data management.

In short, VMware has taken a lot of the technology used in VMware Cloud Disaster Recovery (the result of the Datrium acquisition in 2020) and used it to deliver up to 400 TiB of storage per SDDC.

[image courtesy of VMware]

The intent of the solution, at this stage at least, is that it is only offered as a datastore for hosts via NFSv3, rather than other protocols directly to guests. There are some limitations around the supported topologies too, with stretched clusters not currently supported. From a disaster recovery perspective, it’s important to note that VMware Cloud Flex Storage is currently only offered on a single-AZ basis (although the supporting components are spread across multiple Availability Zones), and there is currently no support for VMware Cloud Disaster Recovery co-existence with this solution.


Thoughts

I’ve only been at VMware for a short period of time, but I’ve had numerous conversations with existing and potential VMware Cloud on AWS customers looking to solve their storage problems without necessarily putting everything on vSAN. There are plenty of reasons why you wouldn’t want to use vSAN for high-capacity storage workloads, and I believe these two initial solutions go some way to solving that issue. Many of the caveats that are wrapped around these two products at General Availability will be removed over time, and with them the traditional objection that VMware Cloud on AWS isn’t great for high-capacity, cost-effective storage.

Finally, if you’re an existing NetApp ONTAP customer, and were thinking about what you were going to do with that Petabyte of unstructured data you had lying about when you moved to VMware Cloud on AWS, or wanting to take advantage of the sweat equity you’ve poured into managing your ONTAP environment over the years, I think we’ve got you covered as well.