Random Short Take #34

Welcome to Random Short Take #34. Some really good players have worn 34 in the NBA, including Ray Allen and Sir Charles. This one, though, goes out to my favourite enforcer, Charles Oakley. If it feels like it’s only been a week since the last post, that’s because it has.

  • I spoke to the folks at Rancher Labs a little while ago, and they’re doing some stuff around what they call “Edge Scalability” and have also announced Series D funding.
  • April Fool’s is always a bit of a trying time, what with a lot of the world being a few timezones removed from where I live. Invariably I stop checking news sites for a few days to be sure. Backblaze recognised that these are strange times, and decided to have some fun with their releases, rather than trying to fool people outright. I found the post on Catblaze Cloud Backup inspiring.
  • Hal Yaman announced the availability of version 2.6 of his Office 365 Backup sizing tool. Speaking of Veeam and handy utilities, the Veeam Extract utility is now available as a standalone tool. Cade talks about that here.
  • VMware vSphere 7 recently went GA. Here’s a handy article covering what it means for VMware cloud providers.
  • Speaking of VMware things, John Nicholson wrote a great article on SMB and vSAN (I can’t bring myself to write CIFS, even when I know why it’s being referred to that way).
  • Scale is infinite, until it isn’t. Azure had some minor issues recently, and Keith Townsend shared some thoughts on the situation.
  • StorMagic recently announced that it has acquired KeyNexus. It also announced the availability of SvKMS, a key management system for edge, DC, and cloud solutions.
  • Joey D’Antoni, in collaboration with DH2i, is delivering a webinar titled “Overcoming the HA/DR and Networking Challenges of SQL Server on Linux”. It’s being held on Wednesday 15th April at 11am Pacific Time. If that timezone works for you, you can find out more and register here.

VMware – VMworld 2017 – STO1179BU – Understanding the Availability Features of vSAN

Disclaimer: I recently attended VMworld 2017 – US.  My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “STO1179BU – Understanding the Availability Features of vSAN”, presented by GS Khalsa (@gurusimran) and Jeff Hunter (@jhuntervmware). You can grab a PDF of the notes from here. Note that these posts don’t provide much in the way of opinion, analysis, or opinionalysis. They’re really just a way of providing you with a snapshot of what I saw. Death by bullet point if you will.

 

Components and Failure

vSAN Objects Consist of Components

VM

  • VM Home – multiple components
  • Virtual Disk – multiple components
  • Swap File – multiple components

vSAN has a cache tier and capacity tier (objects are stored here)

 

Quorum

Greater than 50% must be online to achieve quorum

  • Each component has one vote by default
  • Odd number of votes required to break tie – preserves data integrity
  • Greater than 50% of components (votes) must be online
  • Components can have more than one vote
  • Votes added by vSAN, if needed, to ensure odd number
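The voting rules above can be sketched in a few lines of Python. This is a minimal illustration of the majority rule only, not how vSAN implements it; the function names and the example vote layout are hypothetical.

```python
def has_quorum(votes_online, votes_total):
    """An object stays accessible only if strictly more than 50% of votes are online."""
    return 2 * votes_online > votes_total


def pad_to_odd(votes):
    """Return a copy of the vote list with one extra vote added if the total is even.

    Which component receives the extra vote is an arbitrary choice here.
    """
    padded = list(votes)
    if sum(padded) % 2 == 0:
        padded[0] += 1
    return padded


# Hypothetical object: two replicas and one witness, one vote each.
votes = pad_to_odd([1, 1, 1])
total = sum(votes)

print(has_quorum(2, total))  # one component offline -> True
print(has_quorum(1, total))  # two components offline -> False
```

With an even total, a clean split would leave neither side above 50%, which is why the tie-breaking vote matters.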

 

Component Vote Counts Are Visible Using RVC CLI

/<vcenter>/datacenter/vms> vsan.vm_object_info <vm>

 

Storage Policy Determines Component Number and Placement

  • Primary level of failures to tolerate
  • Failure Tolerance Method

Primary level of failures to tolerate = 0 means only one copy

  • Maximum component size is 255GB
  • vSAN will split larger VMDKs into smaller components
  • RAID-5/6 Erasure Coding uses stripes and parity (requires all-flash)
  • Erasure coding consumes less raw capacity than mirroring
  • Number of stripes also affects component counts
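As a rough illustration of how policy and VMDK size drive component counts, the sketch below assumes RAID-1 mirroring, where tolerating n failures needs n + 1 replicas, and applies the 255GB maximum component size from the notes. It ignores witness components and any further splitting vSAN may perform, so treat the result as a lower bound. The function name is hypothetical.

```python
import math

MAX_COMPONENT_GB = 255  # maximum component size, per the session notes


def min_data_components(vmdk_gb, failures_to_tolerate, stripe_width=1):
    """Lower-bound count of data components for a RAID-1 (mirrored) object.

    Assumptions: each replica is cut into at least `stripe_width` pieces,
    and no piece may exceed MAX_COMPONENT_GB. Witnesses are not counted.
    """
    replicas = failures_to_tolerate + 1
    per_replica = max(stripe_width, math.ceil(vmdk_gb / MAX_COMPONENT_GB))
    return replicas * per_replica


print(min_data_components(100, 1))  # 2: two replicas, one component each
print(min_data_components(600, 1))  # 6: two replicas, three components each
print(min_data_components(100, 0))  # 1: a single copy
```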

 

Each Host is an Implicit Fault Domain

  • Multiple components can end up in the same rack
  • Configure Fault Domains in the UI
  • Add at least one more host or fault domain for rebuilds

 

Component States Change as a Result of a Failure

  • Active
  • Absent
  • Degraded

vSAN selects most efficient way

Which is most efficient? Repair or Rebuild? It depends. Partial repairs are performed if full repair is not possible

 

vSAN Maintenance Mode

Three vSAN Options for Host Maintenance Mode

  • Evacuate all data to other hosts
  • Ensure data accessibility from other hosts
  • No data evacuation

 

Degraded Device Handling (DDH) in vSAN 6.6

  • vSAN 6.6 is more “intelligent”, builds on previous versions of DDH
  • When device is degraded, components are evaluated …
  • If component does not belong to last replica, mark as absent – “Lazy” evacuation since another replica of the object exists
  • If component belongs to last replica, start evacuation
  • Degraded devices will not be used for new component placement
  • Evacuation failures reported in UI
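The per-component evaluation described above boils down to a single branch. The sketch below only restates the two rules from the notes; the names and the example components are hypothetical, and the real logic inside vSAN is certainly more involved.

```python
from enum import Enum


class Action(Enum):
    MARK_ABSENT = "mark absent (lazy evacuation)"
    EVACUATE = "start evacuation"


def ddh_action(is_last_replica):
    """Pick the action for one component that sits on a degraded device."""
    if is_last_replica:
        # No other copy exists, so the data has to be moved off right away.
        return Action.EVACUATE
    # Another replica of the object exists, so the component can be marked absent.
    return Action.MARK_ABSENT


# Hypothetical components found on a degraded device: name -> is it the last replica?
components = {"vmdk-replica-a": False, "vm-home-replica": True}
for name, last in components.items():
    print(name, "->", ddh_action(last).value)
```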

 

DDH and S.M.A.R.T.

Following items logged in vmkernel.log when drive is identified as unhealthy

  • Sectors successfully reallocated 0x05
  • Reported uncorrectable sectors 0xBB
  • Disk command timeouts 0xBC
  • Sector reallocation events 0xC4
  • Pending sector reallocations 0xC5
  • Uncorrectable sectors 0xC6

Helps GSS determine what to do with drive after evacuation
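The attribute IDs above are written in hexadecimal, while many S.M.A.R.T. tools list attributes by decimal ID. A small lookup makes the conversion explicit. The names are taken from the list above, and the dictionary itself is just for illustration.

```python
# S.M.A.R.T. attributes that DDH logs, keyed by attribute ID.
DDH_SMART_ATTRIBUTES = {
    0x05: "Sectors successfully reallocated",
    0xBB: "Reported uncorrectable sectors",
    0xBC: "Disk command timeouts",
    0xC4: "Sector reallocation events",
    0xC5: "Pending sector reallocations",
    0xC6: "Uncorrectable sectors",
}

# Print each attribute with its hex and decimal ID side by side.
for attr_id, name in DDH_SMART_ATTRIBUTES.items():
    print(f"0x{attr_id:02X} = {attr_id:>3}  {name}")
```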

 

Stretched Clusters

Stretched Cluster Failure Scenarios

  • Extend the idea of fault domains from racks to sites
  • Witness component (tertiary site) – witness host
  • 5ms RTT (around 60 miles)
  • VM will have a preferred and secondary site
  • When a component fails, rebuilding starts on the preferred site

 

Stretched Cluster Local Failure Protection – new in vSAN 6.6

  • Redundancy against host failure and site failure
  • If site fails, vSAN maintains local redundancy in surviving site
  • No change in stretched cluster configuration steps
  • Optimised logic to minimise I/O traffic across sites
    • Local read, local resync
    • Single inter-site write for multiple replicas
  • RAID-1 between the sites, and then RAID-5 in the local sites
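To get a feel for what RAID-1 across sites plus RAID-5 within each site costs in raw capacity, here is a back-of-the-envelope calculation. It assumes the RAID-5 layout is 3 data + 1 parity, which is not stated in the session notes, and it ignores witness and metadata overhead. The function names are hypothetical.

```python
from fractions import Fraction


def mirror_overhead(copies=2):
    """Raw-to-usable multiplier for RAID-1 with the given number of copies."""
    return Fraction(copies)


def raid5_overhead(data=3, parity=1):
    """Raw-to-usable multiplier for RAID-5 (assumed 3 data + 1 parity)."""
    return Fraction(data + parity, data)


# One copy per site, each copy protected locally with RAID-5.
total = mirror_overhead(2) * raid5_overhead()

print(total)                       # 8/3
print(round(float(total), 2))      # 2.67
print(round(float(100 * total)))   # raw GB needed for 100 GB usable -> 267
```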

What happens during network partition or site failure?

  • HA Restart

Inter-site network disconnected (split brain)

  • HA Power-off

Witness Network Disconnected

  • Witness leaves cluster

VMs continue to operate normally. Very simple to redeploy a new one. Recommended host isolation response in a stretched cluster is power off

Witness Host Offline

  • Recover or redeploy witness host

New in 6.6 – change witness host

 

vSAN Backup, Replication and DR

Data Protection

  • vSphere APIs – Data Protection
  • Same as other datastore (VMFS, etc)
  • Verify support with backup vendor
  • Production and backup data on vSAN
    • Pros: Simple, rapid restore
    • Cons: Both copies lost if vSAN datastore is lost, can consume considerable capacity

 

Solutions …

  • Store backup data on another datastore
    • SAN or NAS
    • Another vSAN cluster
    • Local drives
  • Dell EMC Avamar and NetWorker
  • Veeam Backup and Replication
  • Cohesity
  • Rubrik
  • Others …

vSphere Replication included with Essentials Plus Kit and higher. With this you get per-VM RPOs as low as 5 minutes

 

Automated DR with Site Recovery Manager

  • HA with Stretched Cluster, Automated DR with SRM
  • SRM at the tertiary site

Useful session. 4 stars.

Storage Field Day 7 – Day 2 – VMware

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the VMware presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the VMware website that covers some of what they presented.

 

Overview

I’d like to say a few things about the presentation. Firstly, it was held in the “Rubber Chicken” Room at VMware HQ.

Secondly, Rawlinson was there, but we ran out of time to hear him present. This seems to happen each time I see him in real life. Still, it’s not every day you get to hear Christos Karamanolis (@XtosK) talk about this stuff, so I’ll put my somewhat weird @PunchingClouds fanboy thing to the side for the moment.


Thirdly, and I’ll be upfront about this, I was a bit disappointed that VMware didn’t go outside some fairly fixed parameters as far as what they could and couldn’t talk about with regard to Virtual SAN. I understand that mega software companies have to be a bit careful about what they can say publicly, but I had hoped for something fresher in this presentation. In any case, I’ve included my notes on Christos’s view of the VSAN architecture. I hope they’re useful.

 

Architecture

VMware adopted the following principles when designing VSAN.

Hyper-converged

  • Compute + storage scalability
  • Unobtrusive to existing data centre architecture
  • Distributed software running on every host
  • Pools local storage (flash + HDD) on hosts (virtual shared datastore)
  • Symmetric architecture – no single point of failure, no bottleneck

The hypervisor opens up new opportunities, with the virtualisation platform providing:

  • Visibility to individual VMs and application storage
  • Manages all applications’ resource requirements
  • Sits directly in the I/O path
  • A global view of underlying infrastructure
  • Supports an extensive hardware compatibility list (HCL)

Critical paths in ESX kernel

The cluster service allows for

  • Fast failure detection
  • High performance (especially for writes)

The data path provides

  • Low latency
  • Minimal CPU per IO
  • Minimal memory consumption
  • Physical access to devices

This equals minimal impact on consolidation rates. This is a Good Thing™.

Optimised internal protocol

As ESXi is both the “consumer” and “producer” of data there is no need for a standard data access protocol.

Per-object coordinator = client

  • Distributed “metadata server”
  • Transactions span only object distribution

Efficient reliable data transport (RDT)

  • Protocol agnostic (now TCP/IP)
  • RDMA friendly

Standard protocol for external access?

Two tiers of storage: Hybrid

Optimise the cost of physical storage resources

  • HDDs: cheap capacity, expensive IOPS
  • Flash: expensive capacity, cheap IOPS

Combine best of both worlds

  • Performance from flash (read cache + write back)
  • Capacity from HDD (capacity tier)

Optimise workload per tier

  • Random IO to flash (high IOPS)
  • Sequential IO to HDD (high throughput)

Storage is organised in disk groups (a flash device plus magnetic disks). There can be up to 5 disk groups per host, each with 1 SSD and up to 7 HDDs, and the disk group is the fault domain. 70% of flash is read cache and 30% is write buffer. Writes are accumulated, then staged in a magnetic disk-friendly fashion. Proximal IO means writing blocks within a certain number of cylinders. The filesystem on the magnetic disks is slightly different to the one on the SSDs. It uses the back-end of the Virsto filesystem, but doesn’t use the log-structured filesystem component.
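The disk group numbers above lend themselves to a quick sizing sketch. The 70/30 split and the 5 × (1 SSD + 7 HDDs) limits come from the notes; the function name and the 400GB example device are hypothetical.

```python
READ_CACHE_PERCENT = 70
WRITE_BUFFER_PERCENT = 30
MAX_DISK_GROUPS_PER_HOST = 5
MAX_HDDS_PER_DISK_GROUP = 7


def flash_split(flash_gb):
    """Return (read_cache_gb, write_buffer_gb) for one hybrid disk group's flash device."""
    read_cache = flash_gb * READ_CACHE_PERCENT / 100
    write_buffer = flash_gb * WRITE_BUFFER_PERCENT / 100
    return read_cache, write_buffer


# Hypothetical 400GB SSD fronting a disk group.
print(flash_split(400))  # (280.0, 120.0)

# Upper bound on capacity-tier drives in a single host.
print(MAX_DISK_GROUPS_PER_HOST * MAX_HDDS_PER_DISK_GROUP)  # 35
```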

Distributed caching

Flash device: cache of disk group (70% read cache, 30% write-back buffer)

No caching on “local” flash where VM runs

  • Flash latencies 100x network latencies
  • No data transfers, no perf hit during VM migration
  • Better overall flash utilisation (most expensive resource)

Use local cache when it matters

  • In-memory CBRC (RAM << Network latency)
  • Lots of block sharing (VDI)
  • More options in the future …

Deduplicated RAM-based caching

Object-based storage

  • VM consists of a number of objects – each object individually distributed
  • VSAN doesn’t know about VMs and VMDKs
  • Up to 62TB useable
  • Single namespace, multiple mount points
  • VMFS created in sub-namespace

The VM Home directory object is formatted with VMFS to allow a VM’s config files to be stored on it. Mounted under the root dir vsanDatastore.

  • Availability policy reflected on number of replicas
  • Performance policy may include a stripe width per replica
  • Object “components” may reside in different disks and / or hosts

VSAN cluster = vSphere cluster

Ease of management

  • Piggyback on vSphere management workflow, e.g. EMM
  • Ensure coherent configuration of hosts in vSphere cluster

Adapt to the customer’s data centre architecture while working with network topology constraints.

Maintenance mode – planned downtime.

Three options:

  • Ensure accessibility;
  • Full data migration; and
  • No data migration.

HA Integration

VM-centric monitoring and troubleshooting

VMODL APIs

  • Configure, manage, monitor

Policy compliance reporting

Combination of tools for monitoring in 5.5

  • CLI commands
  • Ruby vSphere console
  • VSAN observer

More to come soon …

Real *software* defined storage

Software + hardware – component based (individual components), Virtual SAN ready node (40 OEM validated server configurations are ready for VSAN deployment)

VMware EVO:RAIL = Hyper-converged infrastructure

It’s a big task to get all of this working with everything (supporting the entire vSphere HCL).

 

Closing Thoughts and Further Reading

I like VSAN. And I like that VMware are working so hard at getting it right. I don’t like some of the bs that goes with their marketing of the product, but I think it has its place in the enterprise and is only going to go from strength to strength with the amount of resources VMware is throwing at it. In the meantime, check out Keith’s background post on VMware here. In my opinion, you can’t go past Cormac’s posts on VSAN if you want a technical deep dive. Also, buy his book.