VMware – VMworld 2017 – SER1166BU – Housekeeping Strategies for Platform Services Controller-Expert Talk

Disclaimer: I recently attended VMworld 2017 – US.  My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from “SER1166BU – Housekeeping Strategies for Platform Services Controller-Expert Talk”, presented by Jishnu Surendran Thankamani and Agnes James, both of whom are GSS employees with VMware. You can grab a PDF copy of them here.

 

Know more about PSC

Infrastructure Services offered by PSC

  • VMDir – internally developed LDAP service
  • Single Sign-on (IDMD, STS, SSOAdmin, LookupService)
  • VMware Certificate Authority
  • Licensing

 

Certificates

VMware Endpoint Certificate Manager

  • Each node has one Machine Endpoint Certificate

Solution User Certificates

  • machine
  • vsphere-webclient
  • vpxd
  • vpxd-extensions

 

Right Decisions at the Right Time

Topology Based Best Practices

Embedded PSC

  • Expected to be simple topology with easy maintenance
  • Availability management is a matter of protecting a single machine (vCenter HA)

External PSC

  • Expected to be used when multiple vCenters are involved
  • Availability management based on load balancer options
  • When more than one PSC is involved, replication becomes a point of interest
  • Keep all PSCs on the same build
  • Use sites to group PSCs in multiple HA groups – PSCs behind a load balancer
  • Latency between PSCs – as low as possible

 

Configuration Maximums

  • Maximum number of PSCs supported in replication – 8 (6.0), 10 (6.5)
  • Maximum number of PSCs behind load balancer – 4
  • Maximum vCenters in single SSO domain – 10 (6.0 and 6.5), 15 (6.5 U1)
  • Group membership per user for best performance – 1015
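
The limits above lend themselves to a quick sanity check of a planned design. The following is only a minimal sketch: the `DomainPlan` class and `check_plan` function are hypothetical names made up for illustration, and it assumes the 6.5 replication limit also applies to 6.5 U1, which the notes do not state explicitly.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the notes above, keyed by vSphere version.
# Assumption: the 6.5 replication limit also applies to 6.5 U1.
MAXIMUMS = {
    "6.0": {"replicating_pscs": 8, "vcenters": 10},
    "6.5": {"replicating_pscs": 10, "vcenters": 10},
    "6.5 U1": {"replicating_pscs": 10, "vcenters": 15},
}
MAX_PSCS_BEHIND_LB = 4  # the notes give a single value for this limit


@dataclass
class DomainPlan:
    """Hypothetical description of one planned SSO domain."""
    version: str
    replicating_pscs: int
    largest_lb_group: int  # most PSCs behind any one load balancer
    vcenters: int


def check_plan(plan: DomainPlan) -> List[str]:
    """Return one message per limit the plan exceeds (empty list if none)."""
    limits = MAXIMUMS[plan.version]
    problems = []
    if plan.replicating_pscs > limits["replicating_pscs"]:
        problems.append(
            f"{plan.replicating_pscs} replicating PSCs, "
            f"maximum is {limits['replicating_pscs']}"
        )
    if plan.largest_lb_group > MAX_PSCS_BEHIND_LB:
        problems.append(
            f"{plan.largest_lb_group} PSCs behind a load balancer, "
            f"maximum is {MAX_PSCS_BEHIND_LB}"
        )
    if plan.vcenters > limits["vcenters"]:
        problems.append(
            f"{plan.vcenters} vCenters in the SSO domain, "
            f"maximum is {limits['vcenters']}"
        )
    return problems


if __name__ == "__main__":
    for message in check_plan(DomainPlan("6.5", 10, 2, 12)):
        print("Over limit:", message)
```

A plan with 12 vCenters passes on 6.5 U1 but is flagged on 6.5, which is the sort of version-dependent difference that is easy to miss when reading the limits as a flat list.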

 

Factors for Design Decisions

Area | Choices | Justification | Implication
Deployment Topology | Embedded | Reduced resource utilisation for management, VCHA availability needed on PSC as well | VCs in Linked Mode is not a supported topology
Deployment Topology | External | Multi-VC and single management access | More VMs to manage
SSO Domain | One | Share authentication and license data across components and regions / “disposable” PSC |
SSO Domain | More than one | Embedded PSCs / Replication requirements are not met | Separate availability / management practice
Replication Topology | Linear | No manual intervention. Agreements made in deployment order | SPoF possible in more than two PSC case
Replication Topology | Ring | Each PSC with two replication partners | CLI must be used
PSC HA | Standby PSC without load balancer | Load balancer management overhead is a constraint / manual failover acceptable | Manual re-pointing on PSC failure
PSC HA | Two PSC behind a load balancer | High availability | Administrative overhead
PSC HA | vSphere HA | VM / Platform level failures |
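
The linear versus ring trade-off in the table can be illustrated with a small connectivity check. This is a sketch under stated assumptions, not a VMware tool: it models replication agreements as plain pairs of names, assumes a linear topology means each PSC was joined to the one deployed just before it, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def linear_agreements(pscs: List[str]) -> List[Link]:
    """Each PSC partners with the one deployed just before it."""
    return [(pscs[i], pscs[i + 1]) for i in range(len(pscs) - 1)]


def ring_agreements(pscs: List[str]) -> List[Link]:
    """Linear chain plus one closing agreement between last and first."""
    links = linear_agreements(pscs)
    if len(pscs) > 2:
        links.append((pscs[-1], pscs[0]))
    return links


def single_points_of_failure(pscs: List[str], links: List[Link]) -> List[str]:
    """Return PSCs whose loss leaves the survivors unable to all replicate."""
    spofs = []
    for failed in pscs:
        survivors = [p for p in pscs if p != failed]
        if len(survivors) < 2:
            continue
        # Build the replication graph without the failed node.
        neighbours: Dict[str, Set[str]] = {p: set() for p in survivors}
        for a, b in links:
            if failed not in (a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)
        # Depth-first walk from one survivor to see who is still reachable.
        seen: Set[str] = set()
        stack = [survivors[0]]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
        if len(seen) < len(survivors):
            spofs.append(failed)
    return spofs


if __name__ == "__main__":
    pscs = ["psc1", "psc2", "psc3"]
    print("linear:", single_points_of_failure(pscs, linear_agreements(pscs)))
    print("ring:  ", single_points_of_failure(pscs, ring_agreements(pscs)))
```

With three PSCs in a chain, losing the middle one splits replication, while the ring survives any single failure. That matches the table: the single point of failure only appears once more than two PSCs are involved, and the ring needs one extra agreement created by hand.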

 

More Options

  • SSH Access – Disable / Enable
  • Certificates – Custom / VMCA / VMCA as subordinate (Hybrid recommended)
  • TLS Configurator
  • Patching – Update using updaterepo.zip bundles / Full Product and VIMpatch ISO
  • NTP – sync from ESXi / NTP server

 

References for Architectural Decisions

 

Know What To Do, What Not To Do

Dos and Don’ts

Do

  • Review best practices and FAQs
  • Be aware of health monitoring options
  • Create backups and restore points before any change
  • Know the complexity of the implementation
  • Ensure at least one PSC is available for each vSphere domain and site

Don’t

  • Follow an unmanaged decommission procedure (deleting the appliances directly)
  • Revert a snapshot or restore a backup of a single PSC when replication is involved
  • Use the same name for the vSphere domain and the Active Directory domain
  • Create replication agreements between PSCs in different SSO domains
  • Change the PSC PNID

Note that changing the PNID after deployment is not supported.

 

Health Check Options and Maintenance – CLI

Service List

/usr/lib/vmware-vmafd/bin/dir-cli service list

Information About Nodes

/usr/lib/vmware-vmafd/bin/dir-cli nodes list

Replication quick status

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -u Administrator -h localhost

Replication detailed status

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showfederationstatus -u Administrator -h localhost

PSC used by vCenter

/usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost
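
For a routine check it can be handy to run these five commands in one pass. The sketch below simply wraps the command lines from these notes. The `HEALTH_CHECKS` table and `run_health_checks` function are hypothetical names. It assumes it runs in a shell on the appliance, and that a Python 3 interpreter is available there, which is not something the session covered. The commands inherit the terminal, so any credential prompts they raise still appear interactively.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# Command lines taken from the notes above. Note that the vmafd-cli
# option is written with two ASCII hyphens.
HEALTH_CHECKS: Dict[str, List[str]] = {
    "service list": [
        "/usr/lib/vmware-vmafd/bin/dir-cli", "service", "list"],
    "information about nodes": [
        "/usr/lib/vmware-vmafd/bin/dir-cli", "nodes", "list"],
    "replication quick status": [
        "/usr/lib/vmware-vmdir/bin/vdcrepadmin", "-f", "showpartnerstatus",
        "-u", "Administrator", "-h", "localhost"],
    "replication detailed status": [
        "/usr/lib/vmware-vmdir/bin/vdcrepadmin", "-f", "showfederationstatus",
        "-u", "Administrator", "-h", "localhost"],
    "PSC used by vCenter": [
        "/usr/lib/vmware-vmafd/bin/vmafd-cli", "get-dc-name",
        "--server-name", "localhost"],
}


def run_health_checks(
    runner: Callable = subprocess.run,
) -> Dict[str, Optional[int]]:
    """Run every check in turn and return its exit code (None if not runnable)."""
    exit_codes: Dict[str, Optional[int]] = {}
    for name, argv in HEALTH_CHECKS.items():
        print(f"== {name}: {' '.join(argv)}")
        try:
            exit_codes[name] = runner(argv).returncode
        except OSError:
            # Binary missing, for example when run on a non-PSC machine.
            exit_codes[name] = None
    return exit_codes


if __name__ == "__main__":
    for check, code in run_health_checks().items():
        print(f"{check}: {'ok' if code == 0 else 'needs attention'}")
```

The `runner` parameter exists only so the function can be exercised away from an appliance by passing in a stand-in for `subprocess.run`.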

 

Managing Complexity of Implementation 

  • Know the site topology
  • Service registration to Site Mapping
  • Know the Replication agreements
  • VC to PSC dependency
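
One way to keep these four things in view is a small inventory that records, per PSC, its site, its replication partners and the vCenters pointing at it. The sketch below is purely illustrative: the `Psc` class, the `decommission_impact` function and the sample names are all hypothetical, and the inventory is assumed to be maintained by hand from the output of the CLI checks rather than discovered automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Psc:
    """Hypothetical inventory record for one PSC."""
    name: str
    site: str
    partners: List[str] = field(default_factory=list)  # replication agreements
    vcenters: List[str] = field(default_factory=list)  # vCenters using this PSC


def decommission_impact(inventory: Dict[str, Psc], target: str) -> Dict[str, List[str]]:
    """Summarise what has to be dealt with before removing one PSC."""
    psc = inventory[target]
    # Other PSCs in the same site that a vCenter could be repointed to.
    same_site = sorted(
        p.name for p in inventory.values()
        if p.site == psc.site and p.name != target
    )
    # Every PSC that shares an agreement with the target, from either side.
    partners = set(psc.partners)
    partners.update(p.name for p in inventory.values() if target in p.partners)
    partners.discard(target)
    return {
        "vcenters_to_repoint": sorted(psc.vcenters),
        "repoint_candidates": same_site,
        "agreements_affected": sorted(partners),
    }


if __name__ == "__main__":
    inventory = {
        "psc1": Psc("psc1", "site-a", ["psc2"], ["vc1"]),
        "psc2": Psc("psc2", "site-a", ["psc1", "psc3"]),
        "psc3": Psc("psc3", "site-b", ["psc2"], ["vc2"]),
    }
    print(decommission_impact(inventory, "psc3"))
```

In the sample data, removing `psc3` leaves `vc2` with no repoint candidate in its own site, which is the kind of dependency worth spotting before a decommission rather than after.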

Demo – Decommission

 

Safe Recovery

Backup Plan

  • Image level backup and file level backup (vSphere 6.5)
  • Snapshots before changes – temporary restore points
  • Keep a copy of lstool.py list output for reference

Restores need special consideration when replication is involved: to revert changes, use snapshots of all the PSCs that were taken together while the PSCs were powered off.

 

Quick Recovery Options 

  • Repoint VC to available PSC at the same site
  • Quick temporary PSC deployment
  • Image based restore with two methods (6.0)
    • psc_restore
    • psc_restore with -ignore-sync
  • File based backup and image based backup (6.5)
    • /usr/bin/vcenter-restore

Very useful session. 4 stars.