VMware – VMworld 2019 – HCI2888BU – Site Recovery Manager 8.2: What’s New and Demo

Disclaimer: I recently attended VMworld 2019 – US.  My flights and accommodation were paid for by Digital Sense, and VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated by VMware for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “HCI2888BU – Site Recovery Manager 8.2: What’s New and Demo”, presented by Cato Grace and Velina Krasteva (Senior PM for SRM and vSphere Replication, VMware). You can grab a PDF copy of my notes from here.

 

SRM Product Overview

When you hear “disaster recovery”, what do you think of? Natural disasters? DR is not just about natural disasters. It can also be about power, networking, or people. Site Recovery Manager supports both hypervisor-based and array-based replication. SRM is about adding value to your replication.

Workflows

Non-disruptive testing

  • Automated testing in an isolated network
  • Ensures predictability of RTO

Automated Failback

  • Re-protect using original recovery plan
  • Streamlines bi-directional migrations

Automated Failover

  • Runbook automation
  • Single-click initiation
  • Emphasises fastest possible recovery after outage

Planned Migration

  • Ensures zero data loss and app consistency
  • Enables disaster avoidance and DC maintenance or migration

Demo

VMware Site Recovery (DRaaS) for VMware Cloud on AWS

DRaaS

  • Accelerate time to protection
  • Cloud economics with on-demand pricing
  • Integrated into VMware Cloud console
  • Post-failover cluster scaling with Elastic DRS
  • Inter-region protection

 

What’s New In 8.2?

Simplified deployment and operations with SRM as an appliance

  • Parity with Windows version
  • Simple OVF deployment
  • SRAs within the appliance are set up as Docker containers
  • Greatly simplifies SRM deployment, maintenance and upgrades

Built on Photon OS

Upgrading to the appliance – upgrade to 8.2 on Windows first, then migrate to the appliance. There’s a blog post on that here, and documentation here.

Improved ease of use with config import / export UI

  • Now entirely UI based
  • Export / backup and import / restore capabilities for entire SRM configuration
  • Includes entire SRM configuration (VMs, PGs, RPs, IP customisation, array managers, etc)
  • Enables simple DB migration

API and vRO workflow enhancements

  • Configure IP customisation
  • Add / remove datastores from array-based replication PGs
  • Remove post-power on tasks
  • Check status of VR replication
  • List replicated VMs
  • Get VR configuration
  • List replicated RDMs and Array Managers

New Workflows

New in SRM

  • Set IP settings
  • Update group datastore
  • Delete callouts

New in vSphere Replication

  • Check replication stats

Enhancements to SRM pack for vROps

  • Overcome DR monitoring challenges with global visibility into SRM environment
  • Mitigate risk associated with SRM component downtime
  • New views displaying
    • Recovery status
    • Count of VMs in recovery plans
    • Lots more
  • New alarms for VMs that are in Protection Groups but not part of recovery plans

vSphere Replication Pack for vROps

Ability to monitor

  • RPO violations
  • Per VM metrics
  • Incoming replications
  • Outgoing replications
  • Replication status
  • Transferred bytes
  • Alerts
  • Replication Settings

UI Enhancements

  • Adjust colour schemes for optimal viewing
  • Capacity information available in the Protection Groups Datastores tab
  • Ability to provide in-product feedback – the smiley face icon

Support for NSX-T

  • Integration with NSX-T lets you use network virtualisation to simplify the creation and execution of recovery plans and accelerate recovery

Encrypted VMs Support

  • Full support for replicating, protecting, and recovering encrypted VMs

Encryption of replication traffic available per VM

Improved Logging Options with Syslog Support

  • Increased awareness of potential issues
  • Easier to troubleshoot issues
  • More opportunity for analysis

 

Tech Preview

We also went through a tech preview of what might be on the horizon with SRM. Note that this is all futures, and VMware may or may not end up delivering this as part of a future product.

  • SRM support for vVols with array-based replication
    • Support protection and orchestrated recovery of VMs that are running on Virtual Volumes (vVols) datastores and are replicated by policy-based native array replication
  • Automatic protection
  • Disk resizing feature for vSphere Replication

 

Thoughts

I always enjoy these SRM sessions. Every time I make it along to VMworld US I try and get to Cato’s sessions. Even if you’re familiar with SRM, they’re a great summary of current, latest, and future product capability. SRM is a really cool solution for managing both migration and DR activities. And I don’t want to think about the number of times vSphere Replication has gotten us out of a spot doing cross-platform storage migrations. Cato and the team really know their stuff, so if you get a chance, do check out their other sessions this week.

VMware – VMworld 2017 – STO1179BU – Understanding the Availability Features of vSAN

Disclaimer: I recently attended VMworld 2017 – US.  My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “STO1179BU – Understanding the Availability Features of vSAN”, presented by GS Khalsa (@gurusimran) and Jeff Hunter (@jhuntervmware). You can grab a PDF of the notes from here. Note that these posts don’t provide much in the way of opinion, analysis, or opinionalysis. They’re really just a way of providing you with a snapshot of what I saw. Death by bullet point if you will.

 

Components and Failure

vSAN Objects Consist of Components

VM

  • VM Home – multiple components
  • Virtual Disk – multiple components
  • Swap File – multiple components

vSAN has a cache tier and a capacity tier (objects are stored in the capacity tier)

 

Quorum

Greater than 50% must be online to achieve quorum

  • Each component has one vote by default
  • Odd number of votes required to break tie – preserves data integrity
  • Greater than 50% of components (votes) must be online
  • Components can have more than one vote
  • Votes added by vSAN, if needed, to ensure odd number
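The voting rules above can be modelled in a few lines. This is a minimal sketch, assuming each component simply carries an integer vote weight and an object is accessible when strictly more than half of all votes are online; it is not vSAN's internal implementation. The `pad_to_odd` helper and its choice of which component gets the extra vote are hypothetical, since the session only said vSAN adds votes when needed.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    """One vSAN component and its vote weight (illustrative model only)."""
    name: str
    online: bool
    votes: int = 1  # each component has one vote by default


def total_votes(components: list[Component]) -> int:
    return sum(c.votes for c in components)


def has_quorum(components: list[Component]) -> bool:
    """An object stays accessible only if more than 50% of its votes are online."""
    online_votes = sum(c.votes for c in components if c.online)
    return 2 * online_votes > total_votes(components)


def pad_to_odd(components: list[Component]) -> list[Component]:
    """Hypothetical helper: if the vote total is even, give the first component
    one extra vote so a tie is impossible. Which component real vSAN picks was
    not covered in the session; this choice is an assumption."""
    if components and total_votes(components) % 2 == 0:
        first = replace(components[0], votes=components[0].votes + 1)
        return [first, *components[1:]]
    return components


# FTT=1 mirror: two replicas plus a witness component, one vote each.
mirror = [Component("replica-a", False), Component("replica-b", True), Component("witness", True)]
print(has_quorum(mirror))  # True: 2 of 3 votes online

# Two components with equal votes: losing one leaves exactly 50%, which is not a quorum.
pair = [Component("replica-a", False), Component("replica-b", True)]
print(has_quorum(pair))  # False: 1 of 2 votes online
print(total_votes(pad_to_odd(pair)))  # 3
```

The two-component case shows why an odd vote total matters: with an even total, a clean 50/50 split leaves neither side with a majority.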

 

Component Vote Counts Are Visible Using RVC CLI

/<vcenter>/datacenter/vms> vsan_vm_object_info <vm>

 

Storage Policy Determines Component Number and Placement

  • Primary level of failures to tolerate
  • Failure Tolerance Method

Primary level of failures to tolerate = 0 means only one copy

  • Maximum component size is 255GB
  • vSAN splits larger VMDKs into multiple components
  • RAID-5/6 erasure coding uses stripes and parity (requires all-flash)
  • Consumes less raw capacity
  • Number of stripes also affects component counts
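To make the capacity and component-count points concrete, here is a rough calculator. It assumes the usual overheads (RAID-1 keeps FTT + 1 full copies, RAID-5 is 3 data + 1 parity for FTT=1, RAID-6 is 4 data + 2 parity for FTT=2) and the 255GB maximum component size quoted above. The component estimate is a simplification: witness components are ignored, and stripe width is only modelled for RAID-1. Treat it as a back-of-the-envelope sketch, not vSAN's placement logic.

```python
import math

MAX_COMPONENT_GB = 255  # maximum component size quoted in the session


def raw_capacity_gb(vmdk_gb: float, ftt: int, method: str) -> float:
    """Raw capacity consumed by one VMDK under a storage policy."""
    if method == "RAID-1":
        return vmdk_gb * (ftt + 1)   # ftt + 1 full copies (ftt = 0 means one copy)
    if method == "RAID-5" and ftt == 1:
        return vmdk_gb * 4 / 3       # 3 data + 1 parity
    if method == "RAID-6" and ftt == 2:
        return vmdk_gb * 6 / 4       # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {method} with FTT={ftt}")


def data_components_estimate(vmdk_gb: float, ftt: int, method: str, stripe_width: int = 1) -> int:
    """Rough count of data components (witness components not included).

    Assumption: each RAID-1 replica is split into `stripe_width` stripes, and any
    piece larger than MAX_COMPONENT_GB is split again. For RAID-5/6 the data and
    parity are spread over 4 or 6 columns; stripe width is ignored there."""
    raw_capacity_gb(vmdk_gb, ftt, method)  # reuse its validation of method/FTT
    if method == "RAID-1":
        per_stripe = math.ceil(vmdk_gb / stripe_width / MAX_COMPONENT_GB)
        return (ftt + 1) * stripe_width * per_stripe
    columns, data_columns = {"RAID-5": (4, 3), "RAID-6": (6, 4)}[method]
    per_column = math.ceil(vmdk_gb / data_columns / MAX_COMPONENT_GB)
    return columns * per_column


# A 600GB VMDK under three common policies.
for method, ftt in [("RAID-1", 1), ("RAID-5", 1), ("RAID-6", 2)]:
    print(method, raw_capacity_gb(600, ftt, method), data_components_estimate(600, ftt, method))
# RAID-1 1200 6
# RAID-5 800.0 4
# RAID-6 900.0 6
```

The RAID-1 case shows the 255GB split at work: each 600GB replica needs three components, so mirroring it yields six, while erasure coding uses fewer, smaller components and less raw capacity.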

 

Each Host is an Implicit Fault Domain

  • Multiple components can end up in the same rack
  • Configure Fault Domains in the UI
  • Add at least one more host or fault domain for rebuilds

 

Component States Change as a Result of a Failure

  • Active
  • Absent
  • Degraded

vSAN selects the most efficient recovery method

Which is most efficient, repair or rebuild? It depends. Partial repairs are performed if a full repair is not possible.

 

vSAN Maintenance Mode

Three vSAN Options for Host Maintenance Mode

  • Evacuate all data to other hosts
  • Ensure data accessibility from other hosts
  • No data evacuation

 

Degraded Device Handling (DDH) in vSAN 6.6

  • vSAN 6.6 is more “intelligent”, builds on previous versions of DDH
  • When a device is degraded, its components are evaluated …
  • If a component does not belong to the last replica, it is marked as absent – “lazy” evacuation, since another replica of the object exists
  • If a component belongs to the last replica, evacuation starts
  • Degraded devices will not be used for new component placement
  • Evacuation failures reported in UI
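The DDH evaluation described above boils down to a small decision rule. The sketch below assumes a hypothetical input, the number of other healthy replicas of the same object, which vSAN tracks internally; the `Device` type and the action strings are illustrative names, not vSAN APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    name: str
    degraded: bool = False


def ddh_action(other_healthy_replicas: int) -> str:
    """What happens to a component on a degraded device, per the session notes.

    `other_healthy_replicas` is a hypothetical input: how many other healthy
    replicas of the same object exist elsewhere in the cluster."""
    if other_healthy_replicas > 0:
        # Another replica exists, so mark the component absent and let the
        # normal rebuild logic handle it later ("lazy" evacuation).
        return "mark_absent"
    # This component belongs to the last replica: copy it off the device now.
    return "evacuate"


def placement_candidates(devices: list[Device]) -> list[Device]:
    """Degraded devices are not used for new component placement."""
    return [d for d in devices if not d.degraded]


disks = [Device("disk-1"), Device("disk-2", degraded=True)]
print(ddh_action(1))  # mark_absent
print(ddh_action(0))  # evacuate
print([d.name for d in placement_candidates(disks)])  # ['disk-1']
```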

 

DDH and S.M.A.R.T.

The following items are logged in vmkernel.log when a drive is identified as unhealthy:

  • Sectors successfully reallocated 0x05
  • Reported uncorrectable sectors 0xBB
  • Disk command timeouts 0xBC
  • Sector reallocation events 0xC4
  • Pending sector reallocations 0xC5
  • Uncorrectable sectors 0xC6
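The attribute IDs above are hexadecimal S.M.A.R.T. IDs (0x05 is 5, 0xBB to 0xBC are 187 to 188, 0xC4 to 0xC6 are 196 to 198). If you are pulling raw S.M.A.R.T. values from your own tooling, a lookup like the one below can help match them to the items vSAN logs. This is a sketch under stated assumptions: the "any non-zero raw value gets flagged" rule is purely illustrative and is not the threshold DDH applies, and the sample readings are made up.

```python
from __future__ import annotations

# S.M.A.R.T. attribute IDs listed in the session, with the names used in these notes.
SMART_ATTRIBUTES = {
    0x05: "Sectors successfully reallocated",
    0xBB: "Reported uncorrectable sectors",
    0xBC: "Disk command timeouts",
    0xC4: "Sector reallocation events",
    0xC5: "Pending sector reallocations",
    0xC6: "Uncorrectable sectors",
}


def flag_attributes(raw_values: dict[int, int]) -> list[str]:
    """Return the tracked attributes whose raw value is non-zero.

    Assumption: any non-zero raw value is worth flagging. This is an
    illustrative rule only, not the threshold vSAN DDH actually uses."""
    return [
        f"0x{attr_id:02X} ({attr_id}): {SMART_ATTRIBUTES[attr_id]} = {value}"
        for attr_id, value in sorted(raw_values.items())
        if attr_id in SMART_ATTRIBUTES and value > 0
    ]


# Hypothetical readings; 0x09 (power-on hours) is not one of the tracked IDs.
sample = {0x05: 0, 0xC5: 3, 0xC6: 1, 0x09: 12000}
for line in flag_attributes(sample):
    print(line)
# 0xC5 (197): Pending sector reallocations = 3
# 0xC6 (198): Uncorrectable sectors = 1
```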

Helps GSS determine what to do with drive after evacuation

 

Stretched Clusters

Stretched Cluster Failure Scenarios

  • Extend the idea of fault domains from racks to sites
  • Witness component (tertiary site) – witness host
  • 5ms RTT (around 60 miles)
  • VM will have a preferred and secondary site
  • When a component fails, vSAN starts rebuilding it on the preferred site

 

Stretched Cluster Local Failure Protection – new in vSAN 6.6

  • Redundancy against host failure and site failure
  • If site fails, vSAN maintains local redundancy in surviving site
  • No change in stretched cluster configuration steps
  • Optimised logic to minimise I/O traffic across sites
    • Local read, local resync
    • Single inter-site write for multiple replicas
  • RAID-1 between the sites, and then RAID-5 within each local site

What happens during network partition or site failure?

  • HA Restart

Inter-site network disconnected (split brain)

  • HA Power-off

Witness Network Disconnected

  • Witness leaves cluster

VMs continue to operate normally, and it is very simple to redeploy a new witness. The recommended host isolation response in a stretched cluster is power off.

Witness Host Offline

  • Recover or redeploy witness host

New in 6.6 – change witness host

 

vSAN Backup, Replication and DR

Data Protection

  • vSphere APIs – Data Protection
  • Same as other datastores (VMFS, etc.)
  • Verify support with backup vendor
  • Production and backup data on vSAN
    • Pros: Simple, rapid restore
    • Cons: Both copies lost if vSAN datastore is lost, can consume considerable capacity

 

Solutions …

  • Store backup data on another datastore
    • SAN or NAS
    • Another vSAN cluster
    • Local drives
  • Dell EMC Avamar and NetWorker
  • Veeam Backup and Replication
  • Cohesity
  • Rubrik
  • Others …

vSphere Replication is included with the Essentials Plus Kit and higher, and provides per-VM RPOs as low as 5 minutes.

 

Automated DR with Site Recovery Manager

  • HA with Stretched Cluster, Automated DR with SRM
  • SRM at the tertiary site

Useful session. 4 stars.

New book on VMware SRM now available

Good news, “Disaster Recovery Using VMware vSphere Replication and vCenter Site Recovery Manager – Second Edition” has just been released via Packt Publishing. It was written by Abhilash G B and I had the pleasure of serving as the technical reviewer. While SRM 6.5 has just been announced, this is nonetheless a handy manual with some great guidance (and pictures!) on how to effectively use SRM with both array-based and vSphere Replication-based protection. There’s an ebook version available for purchase, with a print copy also available for order.
