VMware Cloud on AWS – TMCHAM – Part 11 – Storage Policies

In this edition of Things My Customers Have Asked Me (TMCHAM), I’m going to cover Managed Storage Policy Profiles (MSPPs) on the VMware-managed VMware Cloud on AWS platform.

 

Background

VMware Cloud on AWS has MSPPs deployed on clusters to ensure that customers have sufficient resilience built into the cluster to withstand disk or node failures. By default, clusters are configured with RAID 1, Failures to Tolerate (FTT):1 for 2 – 5 nodes, and RAID 6, FTT:2 for clusters with 6 or more nodes. Note that single-node clusters have no Service Level Agreement (SLA) attached to them, as you generally only run those on a trial basis, and if the node fails, there's nowhere for the data to go. You can read more about vSAN Storage Policies and MSPPs here, and there's a great Tech Zone article here. These policies are designed to ensure your cluster(s) remain compliant with the SLAs for the platform. You can view the policies in your environment by going to Policies and Profiles in vCenter and selecting VM Storage Policies.

 

Can I Change Them?

The MSPPs are maintained by VMware, so it's not a great idea to change the default policies on your cluster, as the system will change them back at some stage. And why would you want to change the policies on your cluster? Well, you might decide that a 4- or 5-node cluster could actually run more efficiently (from a capacity perspective) using RAID 5 rather than RAID 1. This is a reasonable thing to want to do, and as the SLA talks about FTT numbers, not RAID types, you can change the RAID type and remain in compliance. The capacity difference can be material in some cases, particularly if you're struggling to fit your workloads onto a smaller node count.
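To put some numbers on it, here's a rough sketch using the standard vSAN overhead multipliers (two full copies for RAID 1 FTT:1, 3+1 for RAID 5 FTT:1, 4+2 for RAID 6 FTT:2). It ignores slack space, checksums, and other overheads, so treat the output as indicative only.

```python
# Rough raw-capacity multipliers for common vSAN protection options.
# Indicative only - this ignores slack space and other overheads.
OVERHEAD = {
    "RAID 1, FTT:1": 2.0,    # two full copies
    "RAID 5, FTT:1": 1.33,   # 3 data + 1 parity (~1.33x)
    "RAID 6, FTT:2": 1.5,    # 4 data + 2 parity
    "RAID 1, FTT:2": 3.0,    # three full copies
}

def raw_required(usable_tb):
    """Raw vSAN capacity needed to store usable_tb of VM data under each policy."""
    return {policy: round(usable_tb * factor, 1) for policy, factor in OVERHEAD.items()}

# 50TB of workload data on a small cluster:
print(raw_required(50))
# {'RAID 1, FTT:1': 100.0, 'RAID 5, FTT:1': 66.5, 'RAID 6, FTT:2': 75.0, 'RAID 1, FTT:2': 150.0}
```

The gap between 100TB and 66.5TB of raw capacity is exactly the sort of difference that decides whether a workload fits on a smaller node count or needs another host.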

 

So How Do I Do It Then?

Clone The Policy

There are a few ways to approach this, but the simplest is to clone an existing policy. In this example, I'll clone the vSAN Default Storage Policy. In VMware Cloud on AWS, there is an MSPP assigned to each cluster named “VMC Workload Storage Policy – ClusterName”. Select the policy you want to clone and then click on Clone.

The first step is to give the VM Storage Policy a name. Something cool with your initials should do the trick.

You can edit the policy structure at this point, or just click Next.

Here you can configure your Availability options. You can also do other things, like configure Tags and Advanced Policy Rules.
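If it helps, here's a hypothetical example of the availability settings you might land on for a 4- or 5-node cluster. The field names mirror the wizard, and the policy name is just a placeholder:

```python
# Hypothetical availability settings for the cloned policy (placeholder name).
cloned_policy = {
    "name": "DF-R5-FTT1",  # something cool with your initials
    "site_disaster_tolerance": "None - standard cluster",
    "failures_to_tolerate": "1 failure - RAID-5 (Erasure Coding)",
}
```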

Once this is configured, the system will check that your vSAN datastore is compatible with the policy.

And then you're ready to go. Click Finish, make yourself a beverage, and bask in the glory of it all.

Apply The Policy

So you have a fresh new policy – now what? You can choose to apply it to your workload datastore, or apply it to specific Virtual Machines. To apply it to your datastore, select the datastore you want to modify, click on General, then click on Edit next to the Default Storage Policy option. The process for applying the policy to individual VMs is outlined here. Note that if you create a non-compliant policy and apply it to your datastore, you'll get hassled about it, and you should probably reconsider your approach.
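If you'd rather script the per-VM piece than click through the UI, something like the following pyVmomi sketch should do it. It's illustrative only – the vCenter address, credentials, VM name, and profile ID are placeholders, and you'd want to pull the real profile ID from the policy details (or via SPBM) first.

```python
# Minimal pyVmomi sketch: apply a storage policy to a VM's home namespace.
# All names and IDs below are placeholders. Disks need their own profile
# entries (via deviceChange) if you want them moved to the new policy too.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host="vcenter.example.com", user="user@vsphere.local",
                  pwd="VMware1!", sslContext=context)
content = si.RetrieveContent()

# Simple (if blunt) way to find the VM by name
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "my-test-vm")
view.DestroyView()

# Reconfigure the VM with the new policy's SPBM profile ID
profile = vim.vm.DefinedProfileSpec(profileId="aa000000-0000-0000-0000-000000000000")
task = vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(vmProfile=[profile]))

Disconnect(si)
```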

 

Thoughts

The thing about managed platforms is that the service provider is on the hook for architecture decisions that reduce the resilience of the platform, and the provider is trying to keep the platform running within the parameters of the SLA. This is why you'll come across configuration items in VMware Cloud on AWS that you either can't change, or that have default options that seem conservative. Many of these decisions have been made with the SLAs and the platform's various use cases in mind. That said, it doesn't mean there's no flexibility here. If you need a little more capacity, particularly in smaller environments, there are still options available that give you that extra headroom without reducing the platform's resilience.

Random Short Take #86

Welcome to Random Short Take #86. It’s been a while, and I’ve been travelling a bit for work. So let’s get random.

  • Let’s get started with three things / people I like: Gestalt IT, Justin Warren, and Pure Storage. This article by Justin digs into some of the innovation we’re seeing from Pure Storage. Speaking of Justin, if you don’t subscribe to his newsletter “The Crux”, you should. I do. Subscribe here.
  • And speaking of Pure Storage, a survey was conducted and results were had. You can read more on that here.
  • Switching gears slightly (but still with a storage focus), check out the latest Backblaze drive stats report here.
  • Oh you like that storage stuff? What about this article on file synchronisation and security from Chin-Fah?
  • More storage? What about this review of the vSAN Objects Viewer from Victor?
  • I’ve dabbled in product management previously, but this article from Frances does a much better job of describing what it’s really like.
  • Edge means different things to different people, and I found this article from Ben Young to be an excellent intro to the topic.
  • You know I hate Netflix but love its tech blog. Check out this article on migrating critical traffic at scale.

Bonus round. I’m in the Bay Area briefly next week. If you’re around, let me know! Maybe we can watch one of the NBA Finals games.

Random Short Take #72

This one is a little behind thanks to some work travel, but whatever. Let’s get random.

Random Short Take #61

Welcome to Random Short Take #61.

  • VMworld is on this week. I still find the virtual format (and timezones) challenging, and I miss the hallway track and the jet lag. There’s nonetheless some good news coming out of the event. One thing that was announced prior to the event was Tanzu Community Edition. William Lam talks more about that here.
  • Speaking of VMworld news, Viktor provided a great summary on the various “projects” being announced. You can read more here.
  • I’ve been a Mac user for a long time, and there’s stuff I’m learning every week via Howard Oakley’s blog. Check out this article covering the Recovery Partition. While I’m at it, this presentation he did on Time Machine is also pretty ace.
  • Facebook had a little problem this week, and the Cloudflare folks have provided a decent overview of what happened. As someone who works for a service provider, this kind of stuff makes me twitchy.
  • Fibre Channel? Cloud? Chalk and cheese? Maybe. Read Chin-Fah’s article for some more insights. Personally, I miss working with FC, but I don’t miss the arguing I had to do with systems and networks people when it came to the correct feeding and watering of FC environments.
  • Remote working has been a challenge for many organisations, with some managers not understanding that their workers weren't just watching streaming video all day, but were actually being more productive. Not everything needs to be a video call, however, and this post / presentation has a lot of great tips on what does and doesn't work with distributed teams.
  • I’ve had to ask this question before. And Jase has apparently had to answer it too, so he’s posted an article on vSAN and external storage here.
  • This is the best response to a trio of questions I’ve read in some time.

Random Short Take #57

Welcome to Random Short Take #57. Only one player has worn 57 in the NBA. So it looks like this particular bit is done. Let’s get random.

  • In the early part of my career I spent a lot of time tuning up old UNIX workstations. I remember that lifting those SGI CRTs from desk to desk was never a whole lot of fun. This article about a Sun Ultra 1 project brought back a hint of nostalgia for those days (but not enough to really get into it again). Hat tip to Scott Lowe for the link.
  • As you get older, you realise that people talk a whole lot of rubbish most of the time. This article calling out audiophiles for the practice was great.
  • This article on the Backblaze blog about one company’s approach to building its streaming media capability on B2 made for interesting reading.
  • DH2i recently announced the general availability of DxEnterprise (DxE) for Containers, enabling cloud-native Microsoft SQL Server container Availability Groups outside and inside Kubernetes.
  • Speaking of press releases, Zerto has made a few promotions recently. You can keep up with that news here.
  • I’m terrible when it comes to information security, but if you’re looking to get started in the field, this article provides some excellent guidance on what you should be focussing on.
  • We all generally acknowledge that NTP is important, and most of us likely assume that it’s working. But have you been checking? This article from Tony does a good job of outlining some of the reasons you should be paying some more attention to NTP.
  • This is likely the most succinct article from John you’ll ever read, and it’s right on the money too.

Random Short Take #56

Welcome to Random Short Take #56. Only three players have worn 56 in the NBA. I may need to come up with a new bit of trivia. Let’s get random.

  • Are we nearing the end of blade servers? I’d hoped the answer was yes, but it’s not that simple, sadly. It’s not that I hate them, exactly. I bought blade servers from Dell when they first sold them. But they can present challenges.
  • 22dot6 emerged from stealth mode recently. I had the opportunity to talk to them and I’ll post something soon about that. In the meantime, this post from Mellor covers it pretty well.
  • It may be a Northern Hemisphere reference that I don’t quite understand, but Retrospect is running a “Dads and Grads” promotion offering 90 days of free backup subscriptions. Worth checking out if you don’t have something in place to protect your desktop.
  • Running VMware Cloud Foundation and want to stretch your vSAN cluster across two sites? Tony has you covered.
  • The site name in VMware Cloud Director can look a bit ugly. Steve O gives you the skinny on how to change it.
  • Pure//Accelerate happened recently / is still happening, and there was a bit of news from the event, including the new and improved Pure1 Digital Experience. As a former Pure1 user I can say this was a big part of the reason why I liked using Pure Storage.
  • Speaking of press releases, this one from PDI and its investment intentions caught my eye. It’s always good to see companies willing to spend a bit of cash to make progress.
  • I stumbled across Oxide on Twitter and fell for the aesthetic and design principles. Then I read some of the articles on the blog and got even more interested. Worth checking out. And I’ll be keen to see just how it goes for the company.

*Bonus Round*

I was recently on the Restore it All podcast with W. Curtis Preston and Prasanna Malaiyandi. It was a lot of fun as always, despite the fact that we talked about something that’s a pretty scary subject (data (centre) loss). No, I’m not a DC manager in real life, but I do have responsibility for what goes into our DC so I sort of am. Don’t forget there’s a discount code for the book in the podcast too.

Random Short Take #54

Welcome to Random Short Take #54. A few players have worn 54 in the NBA, but my favourite was Horace Grant. Let’s get random.

  • This project looked like an enjoyable, and relatively accessible, home project – building your own NVMe-based storage server.
  • When I was younger I had nightmares based on horror movies and falling out of bed (sometimes with both happening at the same time). Now this is the kind of thing that keeps me awake at night.
  • Speaking of disastrous situations, the OVH problem was a real problem for a lot of people. I wish them all the best with the recovery.
  • Tony has been doing things with vSAN in his lab and in production – worth checking out.
  • The folks at StorageOS have been hard at work improving their Kubernetes storage platform. You can read more about that here.
  • DH2i has a webinar coming up on SQL Server resilience that’s worth checking out. Details here.
  • We’re talking more about burnout in the tech industry, but probably not enough still. This article from Tom was insightful.

VMware – VMworld 2017 – STO1179BU – Understanding the Availability Features of vSAN

Disclaimer: I recently attended VMworld 2017 – US.  My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “STO1179BU – Understanding the Availability Features of vSAN”, presented by GS Khalsa (@gurusimran) and Jeff Hunter (@jhuntervmware). You can grab a PDF of the notes from here. Note that these posts don’t provide much in the way of opinion, analysis, or opinionalysis. They’re really just a way of providing you with a snapshot of what I saw. Death by bullet point if you will.

 

Components and Failure

vSAN Objects Consist of Components

VM

  • VM Home – multiple components
  • Virtual Disk – multiple components
  • Swap File – multiple components

vSAN has a cache tier and a capacity tier (objects are stored on the capacity tier)

 

Quorum

Greater than 50% must be online to achieve quorum

  • Each component has one vote by default
  • Odd number of votes required to break tie – preserves data integrity
  • Greater than 50% of components (votes) must be online
  • Components can have more than one vote
  • Votes added by vSAN, if needed, to ensure odd number
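To make the voting rule concrete, here's a toy sketch (my own illustration, not vSAN code) using the common RAID-1, FTT=1 layout of two replicas plus a witness, one vote each:

```python
# Toy illustration of the quorum rule: an object stays accessible while more
# than 50% of its votes are online. Not actual vSAN code.
def has_quorum(votes_online, votes_total):
    return votes_online > votes_total / 2

# RAID-1, FTT=1: two replicas + one witness = 3 votes.
print(has_quorum(2, 3))  # True  - one replica lost, object still accessible
print(has_quorum(1, 3))  # False - a replica and the witness lost, object inaccessible
```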

 

Component Vote Counts Are Visible Using RVC CLI

/<vcenter>/datacenter/vms> vsan.vm_object_info <vm>

 

Storage Policy Determines Component Number and Placement

  • Primary level of failures to tolerate
  • Failure Tolerance Method

Primary level of failures to tolerate = 0 means only one copy of the data

  • Maximum component size is 255GB
  • vSAN splits larger objects into multiple smaller components (see the sketch below)
  • RAID-5/6 erasure coding uses stripes and parity (requires all-flash)
  • Consumes less raw capacity than mirroring
  • The number of stripes also affects component counts
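As a rough illustration of the component maths for mirrored objects (an approximation that ignores witnesses and striping, not the actual placement logic):

```python
import math

# Approximate data-component count for a mirrored (RAID-1) object: the object
# is split into chunks of at most 255GB, and each chunk gets one component
# per replica. Witnesses and stripe width are ignored for simplicity.
def raid1_data_components(vmdk_gb, ftt=1, max_component_gb=255):
    chunks = math.ceil(vmdk_gb / max_component_gb)
    replicas = ftt + 1
    return chunks * replicas

print(raid1_data_components(600, ftt=1))  # 6 (3 chunks x 2 replicas)
```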

 

Each Host is an Implicit Fault Domain

  • Multiple components can end up in the same rack
  • Configure Fault Domains in the UI
  • Add at least one more host or fault domain for rebuilds

 

Component States Change as a Result of a Failure

  • Active
  • Absent
  • Degraded

vSAN selects the most efficient recovery method

Which is most efficient – repair or rebuild? It depends. Partial repairs are performed if a full repair is not possible.

 

vSAN Maintenance Mode

Three vSAN Options for Host Maintenance Mode

  • Evacuate all data to other hosts
  • Ensure data accessibility from other hosts
  • No data evacuation
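If you're scripting host maintenance rather than clicking through the UI, these options map to the vSAN decommission mode in the vSphere API. A minimal pyVmomi sketch, assuming you already have a connected session and a HostSystem object called host:

```python
from pyVmomi import vim

# The three UI options map to vSAN decommission-mode object actions:
#   Evacuate all data         -> "evacuateAllData"
#   Ensure data accessibility -> "ensureObjectAccessibility"
#   No data evacuation        -> "noAction"
# `host` is assumed to be an existing vim.HostSystem (e.g. from a container view).
spec = vim.host.MaintenanceSpec(
    vsanMode=vim.vsan.host.DecommissionMode(objectAction="ensureObjectAccessibility")
)
task = host.EnterMaintenanceMode_Task(timeout=0, evacuatePoweredOffVms=False,
                                      maintenanceSpec=spec)
```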

 

Degraded Device Handling (DDH) in vSAN 6.6

  • vSAN 6.6 is more “intelligent”, builds on previous versions of DDH
  • When device is degraded, components are evaluated …
  • If component does not belong to last replica, mark as absent – “Lazy” evacuation since another replica of the object exists
  • If component belongs to last replica, start evacuation
  • Degraded devices will not be used for new component placement
  • Evacuation failures reported in UI

 

DDH and S.M.A.R.T.

The following items are logged in vmkernel.log when a drive is identified as unhealthy:

  • Sectors successfully reallocated 0x05
  • Reported uncorrectable sectors 0xBB
  • Disk command timeouts 0xBC
  • Sector reallocation events 0xC4
  • Pending sector reallocations 0xC5
  • Uncorrectable sectors 0xC6

Helps GSS determine what to do with drive after evacuation

 

Stretched Clusters

Stretched Cluster Failure Scenarios

  • Extend the idea of fault domains from racks to sites
  • Witness component (tertiary site) – witness host
  • 5ms RTT (around 60 miles)
  • VM will have a preferred and secondary site
  • When a component fails, rebuilding starts on the preferred site

 

Stretched Cluster Local Failure Protection – new in vSAN 6.6

  • Redundancy against host failure and site failure
  • If site fails, vSAN maintains local redundancy in surviving site
  • No change in stretched cluster configuration steps
  • Optimised logic to minimise I/O traffic across sites
    • Local read, local resync
    • Single inter-site write for multiple replicas
  • RAID-1 between the sites, and then RAID-5 in the local sites

What happens during a network partition or site failure?

  • Network partition or site failure – HA restart
  • Inter-site network disconnected (split brain) – HA power-off
  • Witness network disconnected – witness leaves the cluster; VMs continue to operate normally, and it's very simple to redeploy a new witness
  • Witness host offline – recover or redeploy the witness host (new in 6.6 – you can change the witness host)

Recommended host isolation response in a stretched cluster is power off.

 

vSAN Backup, Replication and DR

Data Protection

  • vSphere APIs – Data Protection
  • Same as other datastore (VMFS, etc)
  • Verify support with backup vendor
  • Production and backup data on vSAN
    • Pros: Simple, rapid restore
    • Cons: Both copies lost if vSAN datastore is lost, can consume considerable capacity

 

Solutions …

  • Store backup data on another datastore
    • SAN or NAS
    • Another vSAN cluster
    • Local drives
  • Dell EMC Avamar and NetWorker
  • Veeam Backup and Replication
  • Cohesity
  • Rubrik
  • Others …

vSphere Replication is included with Essentials Plus Kit and higher. With this you get per-VM RPOs as low as 5 minutes.

 

Automated DR with Site Recovery Manager

  • HA with Stretched Cluster, Automated DR with SRM
  • SRM at the tertiary site

Useful session. 4 stars.

EMC Announces VxRail

Yes, yes, I know it was a little while ago now. I’ve been occupied by other things and wanted to let the dust settle on the announcement before I covered it off here. And it was really a VCE announcement. But anyway. I’ve been doing work internally around all things hyperconverged and, as I work for a big EMC partner, people have been asking me about VxRail. So I thought I’d cover some of the more interesting bits.

So, let’s start with the reasonably useful summary links:

  • The VxRail datasheet (PDF) is here;
  • The VCE landing page for VxRail is here;
  • Chad’s take (worth the read!) can be found here; and
  • Simon from El Reg did a write-up here.

 

So what is it?

Well it’s a re-envisioning of VMware’s EVO:RAIL hyperconverged infrastructure in a way. But it’s a bit better than that, a bit more flexible, and potentially more cost effective. Here’s a box shot, because it’s what you want to see.

[Image: VxRail_002 – VxRail appliance box shot]

Basically it’s a 2RU appliance housing 4 nodes. You can scale these nodes out in increments as required. There’s a range of hybrid configurations available.

[Image: VxRail_006 – hybrid configuration options]

As well as some all flash versions.

[Image: VxRail_007 – all-flash configuration options]

By default the initial configuration must be fully populated with 4 nodes, with the ability to scale up to 64 nodes (with qualification from VCE). Here are a few other notes on clusters:

  • You can’t mix All Flash and Hybrid nodes in the same cluster (this messes up performance);
  • All nodes within the cluster must have the same license type (Full License or BYO/ELA); and
  • First generation VSPEX BLUE appliances can be used in the same cluster with second generation appliances but EVC must be set to align with the G1 appliances for the whole cluster.

 

On VMware Virtual SAN

I haven’t used VSAN/Virtual SAN enough in production to have really firm opinions on it, but I’ve always enjoyed tracking its progress in the marketplace. VMware claim that the use of Virtual SAN over other approaches has the following advantages:

  • No need to install Virtual Storage Appliances (VSA);
  • CPU utilization <10%;
  • No reserved memory required;
  • Provides the shortest path for I/O; and
  • Seamlessly handles VM migrations.

If that sounds a bit like some marketing stuff, it sort of is. But that doesn’t mean they’re necessarily wrong either. VMware state that the placement of Virtual SAN directly in the hypervisor kernel allows it to “be fast, highly efficient, and be able to scale with flash and modern CPU architectures”.

While I can't comment on this one way or another, I'd like to point out that this appliance is really a VMware play. The focus here is on the benefit of using an established hypervisor (vSphere), an established management solution (vCenter), and a (soon-to-be) established software-defined storage solution (Virtual SAN). If you're looking for the flexibility of multiple hypervisors, or to incorporate other storage solutions, this really isn't for you.

 

Further Reading and Final Thoughts

Enrico has a good write-up on El Reg about Virtual SAN 6.2 that I think is worth a look. You might also be keen to try something that’s NSX-ready. This is as close as you’ll get to that (although I can’t comment on the reality of one of those configurations). You’ve probably noticed there have been a tonne of pissing matches on the Twitters recently between VMware and Nutanix about their HCI offerings and the relative merits (or lack thereof) of their respective architectures. I’m not telling you to go one way or another. The HCI market is reasonably young, and I think there’s still plenty of change to come before the market has determined whether this really is the future of data centre infrastructure. In the meantime though, if you’re already slow-dancing with EMC or VCE and get all fluttery when people mention VMware, then the VxRail is worth a look if you’re HCI-curious but looking to stay with your current partner. It may not be for the adventurous amongst you, but you already know where to get your kicks. In any case, have a look at the datasheet and talk to your local EMC and VCE folk to see if this is the right choice for you.

Storage Field Day 7 – Day 2 – VMware

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the VMware presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the VMware website that covers some of what they presented.

 

Overview

I’d like to say a few things about the presentation. Firstly, it was held in the “Rubber Chicken” Room at VMware HQ.

Secondly, Rawlinson was there, but we ran out of time to hear him present. This seems to happen each time I see him in real life. Still, it’s not everyday you get to hear Christos Karamanolis (@XtosK) talk about this stuff, so I’ll put my somewhat weird @PunchingClouds fanboy thing to the side for the moment.

[Image: SFD7_Day2_VMware_XtosK_HA]

Thirdly, and I’ll be upfront about this, I was a bit disappointed that VMware didn’t go outside some fairly fixed parameters as far as what they could and couldn’t talk about with regards to Virtual SAN. I understand that mega software companies have to be a bit careful about what they can say publicly, but I had hoped for something fresher in this presentation. In any case, I’ve included my notes on Christos’s view on the VSAN architecture – I hope it’s useful.

 

Architecture

VMware adopted the following principles when designing VSAN.

Hyper-converged

  • Compute + storage scalability
  • Unobtrusive to existing data centre architecture
  • Distributed software running on every host
  • Pools local storage (flash + HDD) on hosts (virtual shared datastore)
  • Symmetric architecture – no single point of failure, no bottleneck

The hypervisor opens up new opportunities, with the virtualisation platform providing:

  • Visibility to individual VMs and application storage
  • Manages all applications’ resource requirements
  • Sits directly in the I/O path
  • A global view of underlying infrastructure
  • Supports an extensive hardware compatibility list (HCL)

Critical paths in ESX kernel

The cluster service allows for

  • Fast failure detection
  • High performance (especially for writes)

The data path provides

  • Low latency
  • Minimal CPU per IO
  • Minimal Mem consumption
  • Physical access to devices

This equals minimal impact on consolidation rates. This is a Good Thing™.

Optimized internet protocol

As ESXi is both the “consumer” and “producer” of data there is no need for a standard data access protocol.

Per-object coordinator = client

  • Distributed “metadata server”
  • Transactions span only object distribution

Efficient reliable data transport (RDT)

  • Protocol agnostic (now TCP/IP)
  • RDMA friendly

Standard protocol for external access?

Two tiers of storage: Hybrid

Optimise the cost of physical storage resources

  • HDDs: cheap capacity, expensive IOPS
  • Flash: expensive capacity, cheap IOPS

Combine best of both worlds

  • Performance from flash (read cache + write back)
  • Capacity from HDD (capacity tier)

Optimise workload per tier

  • Random IO to flash (high IOPS)
  • Sequential IO to HDD (high throughput)

Storage is organised in disk groups (a flash device plus magnetic disks):

  • Up to 5 disk groups per host, each with 1 SSD + 7 HDDs – the disk group is the fault domain
  • 70% of the flash device is read cache, 30% is write buffer
  • Writes are accumulated, then staged to the magnetic disks in a disk-friendly fashion
  • Proximal IO – writing blocks within a certain number of cylinders
  • The filesystem on the magnetic disks is slightly different to the one on the SSDs – it uses the back-end of the Virsto filesystem, but not the log-structured filesystem component
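The cache split is simple arithmetic, but for completeness, a quick sketch of what the 70/30 rule means for a given flash device:

```python
# Hybrid disk group cache split as described above: 70% of the flash device
# serves reads, 30% buffers writes.
def hybrid_cache_split(flash_gb):
    return {"read_cache_gb": flash_gb * 0.7, "write_buffer_gb": flash_gb * 0.3}

print(hybrid_cache_split(400))  # 280GB read cache, 120GB write buffer
```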

Distributed caching

Flash device: cache of disk group (70% read cache, 30% write-back buffer)

No caching on “local” flash where VM runs

  • Flash latencies 100x network latencies
  • No data transfers, no perf hit during VM migration
  • Better overall flash utilisation (most expensive resource)

Use local cache when it matters

  • In-memory CBRC (RAM << Network latency)
  • Lots of block sharing (VDI)
  • More options in the future …

Deduplicated RAM-based caching

Object-based storage

  • VM consists of a number of objects – each object individually distributed
  • VSAN doesn’t know about VMs and VMDKs
  • Up to 62TB useable
  • Single namespace, multiple mount points
  • VMFS created in sub-namespace

The VM Home directory object is formatted with VMFS to allow a VM’s config files to be stored on it. Mounted under the root dir vsanDatastore.

  • Availability policy reflected on number of replicas
  • Performance policy may include a stripe width per replica
  • Object “components” may reside in different disks and / or hosts

VSAN cluster = vSphere cluster

Ease of management

  • Piggyback on vSphere management workflow, e.g. EMM
  • Ensure coherent configuration of hosts in vSphere cluster

Adapt to the customer’s data centre architecture while working with network topology constraints.

Maintenance mode – planned downtime.

Three options:

  • Ensure accessibility;
  • Full data migration; and
  • No data migration.

HA Integration

VM-centric monitoring and troubleshooting

VMODL APIs

  • Configure, manage, monitor

Policy compliance reporting

Combination of tools for monitoring in 5.5

  • CLI commands
  • Ruby vSphere console
  • VSAN observer

More to come soon …

Real *software* defined storage

Software + hardware – component based (individual components), Virtual SAN ready node (40 OEM validated server configurations are ready for VSAN deployment)

VMware EVO:RAIL = Hyper-converged infrastructure

It’s a big task to get all of this working with everything (supporting the entire vSphere HCL).

 

Closing Thoughts and Further Reading

I like VSAN. And I like that VMware are working so hard at getting it right. I don’t like some of the bs that goes with their marketing of the product, but I think it has its place in the enterprise and is only going to go from strength to strength with the amount of resources VMware is throwing at it. In the meantime, check out Keith’s background post on VMware here. In my opinion, you can’t go past Cormac’s posts on VSAN if you want a technical deep dive. Also, buy his book.