VMware Cloud on AWS – What’s New – February 2024

It’s been a little while since I posted an update on what’s new with VMware Cloud on AWS, so I thought I’d share some of the latest news.

 

M7i.metal-24xl Announced

It’s been a few months since it was announced at AWS re:Invent 2023, but the M7i.metal-24xl (one of the catchier host types I’ve seen) is going to change the way we approach storage-heavy VMC on AWS deployments.

What is it?

It’s a host without local storage. It has 48 physical cores (96 logical cores with Hyper-Threading enabled) and 384 GiB of memory. The key point is that there are flexible NFS storage options to choose from – VMware Cloud Flex Storage or Amazon FSx for NetApp ONTAP. There’s support for up to 37.5 Gbps networking, and it supports always-on memory encryption using Intel Total Memory Encryption (TME).

Why?

Some of the potential use cases for this kind of host type are as follows:

  • CPU Intensive workloads
    • Image processing
    • Video encoding
    • Gaming servers
  • AI/ML Workloads
    • Code Generation
    • Natural Language Processing
    • Classical Machine Learning
    • Workloads with limited resource requirements
  • Web and application servers
    • Microservices/Management services
    • Secondary data stores/database applications
  • Ransomware & Disaster Recovery
    • Modern Ransomware Recovery
    • Next-gen DR
    • Compliance and Risk Management

Other Notes

New (greenfield) customers can deploy the M7i.metal-24xl in the first cluster using 2-16 nodes. Existing (brownfield) customers can deploy the M7i.metal-24xl in secondary clusters in the same SDDC. In terms of connectivity, we recommend you take advantage of VPC peering for your external storage connectivity. Note that there is no support for multi-AZ deployments, nor is there support for single node deployments. If you’d like to know more about the M7i.metal-24xl, there’s an excellent technical overview here.

 

vSAN Express Storage Architecture on VMware Cloud on AWS

SDDC Version 1.24 was announced in November 2023, and with that came support for vSAN Express Storage Architecture (ESA) on VMC on AWS. There’s some great info on what’s included in the 1.24 release here, but I thought I’d focus on some of the key constraints you need to look at when considering ESA in your VMC on AWS environment.

Currently, the following restrictions apply to vSAN ESA in VMware Cloud on AWS:
  • vSAN ESA is available for clusters using i4i hosts only.
  • vSAN ESA is not supported with stretched clusters.
  • vSAN ESA is not supported with 2-host clusters.
  • After you have deployed a cluster, you cannot convert from vSAN ESA to vSAN OSA or vice versa.

So why do it? There are plenty of reasons, including better performance, enhanced resource efficiency, and several improvements in terms of speed and resiliency. You can read more about it here.
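
The restrictions above are easy to encode as a quick pre-flight check. Here’s a rough Python sketch – the function and parameter names are mine, not a VMware API, and it only checks the restrictions listed above:

```python
def esa_supported(host_type: str, hosts: int, stretched: bool) -> tuple[bool, str]:
    """Check a proposed VMC on AWS cluster against the current vSAN ESA
    restrictions. Returns (ok, reason)."""
    if host_type != "i4i":
        return False, "vSAN ESA is available for clusters using i4i hosts only"
    if stretched:
        return False, "vSAN ESA is not supported with stretched clusters"
    if hosts == 2:
        return False, "vSAN ESA is not supported with 2-host clusters"
    # Remember: no ESA <-> OSA conversion after the cluster is deployed.
    return True, "OK"

print(esa_supported("i4i", 3, stretched=False))
print(esa_supported("m7i.metal-24xl", 3, stretched=False))
```

Nothing fancy, but it captures the decision points you’d walk through before committing to ESA for a new cluster.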

VMware Cloud Disaster Recovery Updates

There have also been some significant changes to VCDR, with the recent announcement that we now support a 15-minute Recovery Point Objective (down from 30 minutes). There’s also been a number of enhancements to the ransomware recovery capability, including automatic Linux security sensor installation in the recovery workflow (trust me, once you’ve done it manually a few times you’ll appreciate this). With all the talk of supplemental storage above, it should be noted that “VMware Cloud DR does not support recovering VMs to VMware Cloud on AWS SDDC with NFS-mounted external datastores including Amazon FSx for NetApp datastores, Cloud Control Volumes or VMware Cloud Flex Storage”. Just in case you had an idea that this might be something you want to do.

 

Thoughts

Much of the news about VMware has been around the acquisition by Broadcom. It certainly was news. In the meantime, however, the VMware Cloud on AWS product and engineering teams have continued to work on releasing innovative features and incremental improvements. The encouraging thing about this is that they are listening to customers and continuing to adapt the solution architecture to satisfy those requirements. This is a good thing for both existing and potential customers. If you looked at VMware Cloud on AWS three years ago and ruled it out, I think it’s worth looking at again.

Nexsan Announces Unity NV6000

Nexsan recently announced the Nexsan Unity NV6000. I had the chance to speak to Andy Hill about it, and thought I’d share some thoughts here.

 

What Is It?

[image courtesy of Nexsan]

I’ve said it before, and I’ll say it again … in the immortal words of Silicon Valley: “It’s a box”. And a reasonably powerful one at that, coming loaded with the following specifications.

  • Supported Protocols: SAN (Fibre Channel, iSCSI), NAS (NFS, SMB 1.0 to 3.0, FTP), Object (S3)
  • Disk Bays | Rack Units: 60 | 4U
  • Maximum Drives with Expansion: 180
  • Maximum Raw Capacity (chassis | total): 1.12 PB | 3.36 PB
  • System Memory (DRAM) per Controller: up to 128 GB
  • FASTier 2.5″ SSD Drives (TB): 1.92 | 3.84 | 7.68 | 15.36
  • 3.5″ 7.2K SAS Drives (TB): 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20
  • 2.5″ NVMe 1DWPD SSDs (TB): N/A
  • Host Connectivity: 16/32Gb FC | 10/25/40/100 GbE
  • Max CIFS | NFS File Systems: 512
  • Data Protection: Immutable Snapshots, S3 Object Locking, and optional Unbreakable Backup

It’s a dual-controller platform, with each controller containing 2x Intel Xeon Silver CPUs and a 12Gb/s SAS backplane. Note that you get access to the following features included as part of the platform license:

  • Nexsan’s FASTier® Caching – Use solid-state to accelerate the performance of the underlying spinning disks
  • Nexsan Unity software version 7.0, with important enhancements to power, enterprise-class security, compliance, and ransomware protection
  • Enhanced Performance – Up to 100,000 IOPS
  • Third-Party Software Support – Windows VSS, VMware, VAAI, Commvault, Veeam Ready Repository, and more
  • Multi-Protocol Support – SAN (Fibre Channel, iSCSI), NAS (NFS, CIFS, SMB 1.0 to 3.0, FTP), Object (S3), 16/32Gb FC, 10/25/40/100 GbE
  • High Availability – No single point-of-failure architecture with dual redundant storage controllers, redundant power supplies and RAID

 

Other Features

Snapshot Immutability

The snapshot immutability claim caught my eye, as immutable means a lot of things to a lot of people. Hill mentioned that the snapshot IP used in Unity was developed in-house by Nexsan and isn’t the patched-together solution that some other vendors promote as immutable. There are some other smarts within Unity that should give users comfort that data can’t be easily gotten at. Once you’ve set retention periods for snapshots, for example, you can’t log in to the platform, set the date forward, and have those snapshots expire. The object storage component also supports S3 Object Lock, which is good news for punters looking to take advantage of this feature.

Unified Protocol Support

It’s in the name, and Nexsan has done a good job of incorporating a variety of storage protocols and physical access methods into the Unity platform. There’s File, Block, and Object, and support for both FC and speedy Ethernet as well. In other words, something for everyone.

Assureon Integration

One of the other features I like about the Unity is the integration with Assureon. If you’re unfamiliar with Assureon, you can check it out here. It takes storage security and compliance to another level, and is worth looking into if you have a requirement for things like regulatory compliant storage, the ability to maintain chain of custody, and fun things like that.

 

Thoughts and Further Reading

Who cares about storage arrays any more? A surprising number of people, and with good reason. Some folks still need them in the data centre, and those folks are looking for storage arrays that can do more with less. I think this is where the Nexsan offering excels. With multi-protocol and multi-transport support, some decent security chops, and an all-inclusive licensing model, it provides cost-effective storage (thanks to a mix of spinning rust and solid-state drives) that competes well with the solutions that have traditionally dominated the midrange market. Additionally, integration with solutions like Assureon makes this a solution that’s worth a second look, particularly if you’re in the market for object storage with a lower barrier to entry (from a cost and capacity perspective) and the ability to deal with backup data in a secure fashion.

Arcitecta Announces Mediaflux Universal Data System

I had the opportunity to speak to Jason Lohrey and Robert Murphy from Arcitecta a little while ago about the company’s Mediaflux announcement. It was a great conversation, and I’m surprised I hadn’t heard about the company before. In any case, I figured I’d share some thoughts on the announcement.

 

What Is It?

The folks at Arcitecta describe the Mediaflux Universal Data System as “a convergence of data management, data orchestration, multi-protocol access, and storage in one platform”. The idea is that the system manages your data across all of your storage platforms. It’s not just clustered or distributed storage. It’s not just a control plane that gives you multi-protocol access to your storage platforms. It’s not just an orchestration engine that can move your data around as required. It’s all of these things and a bit more too. Features include:

  • Converges data management, orchestration and storage within a single platform – that’s right, it’s all in the one box.
  • Manages every aspect of the data lifecycle: On-premises and cloud, with globally distributed access.
  • Offers multi-protocol access and support. The system supports NFS, SMB, S3, SFTP and DICOM, among many others.
  • Empowers immense scalability. Mediaflux licensing is decoupled from the volume of data stored so organisations can affordably scale storage needs to hundreds of petabytes, accommodating hundreds of billions of files without the financial strain typically associated with such vast capacities. Note that Mediaflux’s pricing is based on the number of concurrent users.
  • Provides the option to forego third-party software and clustered file systems.
  • Supports multi-vendor storage environments, allowing customers to choose best-of-breed hardware.

Seem ambitious? Maybe, but it also seems like something that would be super useful.

 

Global Storage At A Worldwide Scale

One of the cool features of Mediaflux is how it handles distributed file systems, not just across data centres, but across continents. A key feature of the platform is the ability to deliver the same file system to every site.

[image courtesy of Arcitecta]

It has support for centralised file locking, as well as replication between sites. You can also configure variable retention policies for different site copies, giving you flexibility when it comes to how long you store your data in various locales. According to the folks at Arcitecta, it’s also happy to make the most of your links, able to use up to 95% of the available bandwidth.

 

Thoughts And Further Reading

There have been a few data management / orchestration / unified control plane companies that have had a tilt at doing universal storage access well, across distances, and with support for multiple protocols. Sometimes the end result looks like an engineering project at best, and you have to hold your mouth right to have any hope of seeing your data again once you send it on its way. Putting these kinds of platforms together is no easy task, and that’s why this has been something of a journey for the team at Arcitecta. The company previously supported Mediaflux on top of third-party file and object systems, but customers needed a solution that was more flexible and affordable.

So why not just use the cloud? Because some people don’t like storing stuff in hyperscaler environments. And sometimes there’s a requirement for better performance than you can reasonably pay for in a cloud environment. And not every hyperscaler might have a presence where you want your data to be. All that said, if you do have data in the cloud, you can manage it with Mediaflux too.

I’m the first to admit that I haven’t had any recent experience with the type of storage systems that would benefit from something like Mediaflux, but on paper it solves a lot of the problems that enterprises come across when trying to make large datasets available across the globe, while managing the lifecycle of those datasets and keeping them readily available. Given some of the reference customers that are making use of the platform, it seems reasonable to assume that the company has been doing something right. As with all things storage, your mileage might vary, but if you’re running into roadblocks with the storage platforms you know and love, it might be time to talk to the nice people in Melbourne about what they can do for you. If you’d like to read more, you can download a Solution Brief as well.

Random Short Take #91

Squeezing this one in before the end of the year. It’s shorter than normal but we all have other things to do. Let’s get random.

  • Like the capacity and power consumption of tape but still want it on disk? Check out this coverage of the Disk Archive Corporation over at Blocks and Files.
  • This was a great series of posts on the RFC process. It doesn’t just happen by magic.
  • Jeff Geerling ran into some issues accessing media recently. It’s a stupid problem to have, and one of the reasons I’m still such a sucker for physical copies of things. I did giggle a bit when I first read the post though. These kinds of issues come up frequently for folks outside the US, thanks to content licensing challenges and studios just wanting us to keep paying for the same thing over and over again without any control over how we consume content.
  • My house was broken into recently. It’s a jarring experience at best. I never wanted to put cameras around my house, but now I have. If you do this in Queensland you can let the coppers know and they can ask for your help if there’s a crime in the area. I know it’s not very punk rock to surveil people but fuck those kids.
  • You didn’t think I’d get to 91 and not mention Dennis Rodman, did you? One of my top 5 favourite players of all time. Did everything on the court that I didn’t: played defence, grabbed rebounds, and gave many a high energy performance. So here’s some highlights on YouTube.

That’s it for this year. Stay safe, and see you in the future.

Random Short Take #89

Welcome to Random Short Take #89. I’ve been somewhat preoccupied with the day job and acquisitions. And the start of the NBA season. But summer is almost here in the Antipodes. Let’s get random.

  • Jon Waite put out this article on how to deploy an automated Cassandra metrics cluster for VCD.
  • Chris Wahl wrote a great article on his thoughts on platform engineering as product design at scale. I’ve always found Chris to be a switched on chap, and his recent articles diving deeper into this topic have done nothing to change my mind.
  • Curtis and I have spoken about this previously, and he talks some more about the truth behind SaaS data recovery over at Gestalt IT. The only criticism I have for Curtis is that he’s just as much Mr Recovery as he is Mr Backup and he should have trademarked that too.
  • Would it be a Random Short Take without something from Chin-Fah? Probably not one worth reading. In this article he’s renovated his lab and documented the process of attaching TrueNAS iSCSI volumes to his Proxmox environment. I’m fortunate enough to not have had to do Linux iSCSI in some time, but it looks mildly easier than it used to be.
  • Press releases? Here’s one for you: Zerto research report finds companies lack a comprehensive ransomware strategy. Unlike the threat of World War 3 via nuclear strike in the eighties, ransomware is not a case of if, but when.
  • Hungry for more press releases? Datadobi is accelerating its channel momentum with StorageMAP.
  • In other PR news, Nyriad has unveiled its storage-as-a-service offering. I had a chance to speak to them recently, and they are doing some very cool stuff – worth checking out.
  • I hate all kinds of gambling, and I really hate sports gambling, and ads about it. And it drives me nuts when I see sports gambling ads in apps like NBA League Pass. So this news over at El Reg about the SBS offering consumers the chance to opt out of those kinds of ads is fantastic news. It doesn’t fix the problem, but it’s a step in the right direction.

StorPool Announces Version 21

StorPool recently announced version 21 of its storage platform, offering improvements across data protection, efficiency, availability, and compatibility. I had the opportunity to speak to Boyan Krosnov and Alex Ivanov and wanted to share some thoughts.

 

“Magic” Scale-out Erasure Coding

One of the main features announced with Version 21 was “magic” scale-out erasure coding. Previously, StorPool offered triple replication protection of data across nodes. Now, with at least five all-NVMe storage servers, you can take advantage of this new erasure coding. Key capabilities include:

  • Near-zero performance impact even for Tier 0/Tier 1 workloads;
  • Data redundancy across nodes, as information is protected across servers with two parity objects so that any two servers can fail and data remains safe and accessible;
  • Great flexibility and operational efficiency. With per-volume policy management, volumes can be protected with triple replication or Erasure Coding, with per-volume live conversion between data protection schemes;
  • Always-on, non-disruptive operations – up to two storage nodes can be rebooted or brought down for maintenance while the entire storage system remains running with all data remaining available; and
  • Incremental mesh encoding and recovery.
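
To make the efficiency argument concrete, here’s a rough Python sketch comparing the usable-to-raw ratio of triple replication with a k data + 2 parity erasure coding layout. The 3+2 stripe width is my assumption for illustration; the announcement only tells us you need at least five all-NVMe servers and that any two can fail:

```python
def usable_fraction_replication(copies: int) -> float:
    """Usable capacity as a fraction of raw, for n-way replication."""
    return 1 / copies

def usable_fraction_ec(data_shards: int, parity_shards: int) -> float:
    """Usable capacity as a fraction of raw, for k data + m parity shards."""
    return data_shards / (data_shards + parity_shards)

# Triple replication: every byte is stored three times.
triple = usable_fraction_replication(3)

# Hypothetical 3+2 layout on a five-server cluster: two parity
# objects mean any two servers can fail and data stays available.
ec_3_2 = usable_fraction_ec(3, 2)

print(f"Triple replication: {triple:.0%} usable")   # ~33% usable
print(f"3+2 erasure coding: {ec_3_2:.0%} usable")   # 60% usable
```

The jump from roughly a third of raw capacity to 60% (with the same two-failure tolerance) is the headline win, and wider stripes push the ratio higher again.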

 

Other New Features

But that’s not all. There’s also been work done in the following areas:

  • Improved iSCSI scalability – support for exporting up to 1000 iSCSI targets per server
  • CloudStack plug-in improvements – support for CloudStack’s volume encryption and partial zone-wide storage, enabling easy live migration between compute hosts
  • OpenNebula add-on improvements – support for multi-cluster deployments where multiple StorPool sub-clusters behave as a single large-scale primary storage system with a unified global namespace
  • OpenStack Cinder driver improvements – easy deployment with Canonical Charmed OpenStack and OpenStack instances managed with kolla-ansible
  • Deep integration with Proxmox Virtual Environment – end-to-end automation of all storage operations in Proxmox VE deployments
  • Additional hardware and software compatibility – an increased number of validated hardware and operating systems, making it easier to deploy StorPool Storage in customers’ preferred environments

 

Thoughts and Further Reading

It’s been a little while since I’ve written about StorPool, and the team continues to add features to the platform and grow in terms of customer adoption and maturity in the market. Every time I speak to Alex and Boyan, I get a strong sense that they’re relentlessly focussed on making the platform more stable, more performant, and easier to operate. I’m a fan of many of the design principles the company has adopted for its platform, including the use of standard server hardware, fitting in with customer workflows, and addressing the needs of demanding applications. It’s great that it scales linearly, but it’s equally exciting, at least to me, that it “fades into the background”. Good infrastructure doesn’t need to be mentioned every day, it just needs to work (and work well). The folks at StorPool understand this, and seem to be working hard to ensure that the platform, and the service that supports it, meets this requirement. It’s not necessarily “magic”, but it can be done with good code. StorPool has been around since 2012, is self-funded, profitable, and growing. I’ve enjoyed watching the evolution of the product since I was first introduced to it, and am looking forward to seeing what’s next in future releases. For another perspective on the announcement, check out this article over at Gestalt IT.

Random Short Take #88

Welcome to Random Short Take #88. This one’s been sitting in my drafts folder for a while. Let’s get random.

VMware Cloud on AWS – TMCHAM – Part 11 – Storage Policies

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover Managed Storage Policy Profiles (MSPPs) on the VMware-managed VMware Cloud on AWS platform.

 

Background

VMware Cloud on AWS deploys MSPPs on clusters to ensure that customers have sufficient resilience built into the cluster to withstand disk or node failures. By default, clusters of 2-5 nodes are configured with RAID 1, Failures to Tolerate (FTT):1, and clusters of 6 or more nodes with RAID 6, FTT:2. Note that single-node clusters have no Service Level Agreement (SLA) attached to them, as you generally only run those on a trial basis, and if the node fails, there’s nowhere for the data to go. You can read more about vSAN Storage Policies and MSPPs here, and there’s a great Tech Zone article here. The point of these policies is to ensure your cluster(s) remain in compliance with the SLAs for the platform. You can view the policies in your environment by going to Policies and Profiles in vCenter and selecting VM Storage Policies.
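
As a quick sketch of those defaults, the following Python helper maps a cluster’s host count to its default managed policy. The function name is mine, and the thresholds are just the ones described above:

```python
def default_storage_policy(hosts: int) -> str:
    """Return the default MSPP for a VMware Cloud on AWS cluster,
    based on host count, per the defaults described in the post."""
    if hosts < 1:
        raise ValueError("a cluster needs at least one host")
    if hosts == 1:
        # Single-node clusters carry no SLA - trial use only.
        return "no SLA (single-node cluster)"
    if hosts <= 5:
        return "RAID 1, FTT:1"
    return "RAID 6, FTT:2"

print(default_storage_policy(3))   # RAID 1, FTT:1
print(default_storage_policy(8))   # RAID 6, FTT:2
```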

 

Can I Change Them?

The MSPPs are maintained by VMware, so it’s not a great idea to change the default policies on your cluster; the system will change them back at some stage. So why would you want to change them? Well, you might decide that 4 or 5 nodes could run better (from a capacity perspective) using RAID 5, rather than RAID 1. This is a reasonable thing to want to do, and as the SLA talks about FTT numbers, not RAID types, you can change the RAID type and remain in compliance. The capacity difference can be material in some cases, particularly if you’re struggling to fit your workloads onto a smaller node count.
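
To put rough numbers on that capacity difference, here’s a back-of-the-envelope sketch in Python using the standard vSAN placement overheads: RAID 1 FTT:1 writes two full mirror copies (2x overhead), while RAID 5 FTT:1 uses a 3+1 stripe (about 1.33x). It ignores slack space, dedupe, and other real-world factors, so treat it as illustrative only:

```python
def usable_tib(raw_tib: float, overhead: float) -> float:
    """Usable capacity given raw capacity and a data-placement overhead factor."""
    return raw_tib / overhead

RAID1_FTT1 = 2.0        # two full mirror copies
RAID5_FTT1 = 4 / 3      # 3 data + 1 parity stripe

raw = 100.0  # TiB of raw vSAN capacity, purely illustrative
print(f"RAID 1: {usable_tib(raw, RAID1_FTT1):.1f} TiB usable")
print(f"RAID 5: {usable_tib(raw, RAID5_FTT1):.1f} TiB usable")
```

On the same raw capacity, the move from RAID 1 to RAID 5 buys you roughly 50% more usable space, which is exactly why it’s tempting on those 4-5 node clusters.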

 

So How Do I Do It Then?

Clone The Policy

There are a few ways to approach this, but the simplest is to clone an existing policy. In this example, I’ll clone the vSAN Default Storage Policy. In VMware Cloud on AWS, there is an MSPP assigned to each cluster with the name “VMC Workload Storage Policy – ClusterName”. Select the policy you want to clone and then click Clone.

The first step is to give the VM Storage Policy a name. Something cool with your initials should do the trick.

You can edit the policy structure at this point, or just click Next.

Here you can configure your Availability options. You can also do other things, like configure Tags and Advanced Policy Rules.

Once this is configured, the system will check that your vSAN datastore is compatible with your policy.

And then you’re ready to go. Click Finish, make yourself a beverage, and bask in the glory of it all.

Apply The Policy

So you have a fresh new policy, now what? You can choose to apply it to your workload datastore, or apply it to specific Virtual Machines. To apply it to your datastore, select the datastore you want to modify, click on General, then click on Edit next to the Default Storage Policy option. The process to apply the policy to VMs is outlined here. Note that if you create a non-compliant policy and apply it to your datastore, you’ll get hassled about it and you should likely consider changing your approach.

 

Thoughts

The thing about managed platforms is that the service provider is on the hook for architecture decisions that reduce the resilience of the platform. And the provider is trying to keep the platform running within the parameters of the SLA. This is why you’ll come across configuration items in VMware Cloud on AWS that you either can’t change, or have some default options that seem conservative. Many of these decisions have been made with the SLAs and the various use cases in mind for the platform. That said, it doesn’t mean there’s no flexibility here. If you need a little more capacity, particularly in smaller environments, there are still options available that won’t reduce the platform’s resilience, while still providing additional capacity options.

Random Short Take #86

Welcome to Random Short Take #86. It’s been a while, and I’ve been travelling a bit for work. So let’s get random.

  • Let’s get started with three things / people I like: Gestalt IT, Justin Warren, and Pure Storage. This article by Justin digs into some of the innovation we’re seeing from Pure Storage. Speaking of Justin, if you don’t subscribe to his newsletter “The Crux”, you should. I do. Subscribe here.
  • And speaking of Pure Storage, a survey was conducted and results were had. You can read more on that here.
  • Switching gears slightly (but still with a storage focus), check out the latest Backblaze drive stats report here.
  • Oh you like that storage stuff? What about this article on file synchronisation and security from Chin-Fah?
  • More storage? What about this review of the vSAN Objects Viewer from Victor?
  • I’ve dabbled in product management previously, but this article from Frances does a much better job of describing what it’s really like.
  • Edge means different things to different people, and I found this article from Ben Young to be an excellent intro to the topic.
  • You know I hate Netflix but love its tech blog. Check out this article on migrating critical traffic at scale.

Bonus round. I’m in the Bay Area briefly next week. If you’re around, let me know! Maybe we can watch one of the NBA Finals games.

Random Short Take #85

Welcome to Random Short Take #85. Let’s get random.