Disclaimer: I recently attended VMworld 2017 – US. My flights were paid for by ActualTech Media, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Here are my rough notes from “SER1166BU – Housekeeping Strategies for Platform Services Controller-Expert Talk”, presented by Jishnu Surendran Thankamani and Agnes James, both of whom are GSS employees with VMware. You can grab a PDF copy of them here.
Know more about PSC
Infrastructure Services offered by PSC
- VMDir – internally developed LDAP service
- Single Sign-on (IDMD, STS, SSOAdmin, LookupService)
- VMware Certificate Authority
- Licensing
Certificates
VMware Endpoint Certificate Manager
- Each node has one Machine Endpoint Certificate
Solution User Certificates
- machine
- vsphere-webclient
- vpxd
- vpxd-extensions
Right Decisions at the Right Time
Topology Based Best Practices
Embedded PSC
- Expected to be simple topology with easy maintenance
- Availability management is a matter of protecting a single machine (vCenter HA)
External PSC
- Expected to be used with multiple vCenters involved
- Availability management based on load balancer options
- When more than one PSC is involved replication becomes a point of interest
- Maintain same build of PSCs
- Use sites to group PSCs in multiple HA groups – PSCs behind a load balancer
- Latency between PSCs – as low as possible
Configuration Maximums
- Maximum number of PSCs supported in replication – 8 (6.0), 10 (6.5)
- Maximum number of PSCs behind load balancer – 4
- Maximum vCenters in single SSO domain – 10 (6.0 and 6.5), 15 (6.5 U1)
- Group membership per user for best performance – 1015
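The limits above lend themselves to a quick sanity check of a planned design. Here's a minimal sketch using the numbers from the list; the function name, the dictionary layout and the version labels are my own, and I'm assuming the 6.5 replication limit also applies to 6.5 U1, which the session didn't state explicitly.

```python
# Limits as quoted in the notes above, keyed by vSphere version.
# "6.5U1" is a hypothetical label for 6.5 Update 1; its replication
# limit is ASSUMED to match 6.5 (not stated in the session).
MAXIMUMS = {
    "6.0": {"replicating_pscs": 8, "vcenters_per_domain": 10},
    "6.5": {"replicating_pscs": 10, "vcenters_per_domain": 10},
    "6.5U1": {"replicating_pscs": 10, "vcenters_per_domain": 15},
}
PSCS_BEHIND_LB_MAX = 4  # per load balancer, all versions listed


def check_design(version, replicating_pscs, pscs_behind_lb, vcenters):
    """Return a list of human-readable limit violations (empty if none)."""
    limits = MAXIMUMS[version]
    problems = []
    if replicating_pscs > limits["replicating_pscs"]:
        problems.append("{} replicating PSCs exceeds the maximum of {}".format(
            replicating_pscs, limits["replicating_pscs"]))
    if pscs_behind_lb > PSCS_BEHIND_LB_MAX:
        problems.append("{} PSCs behind a load balancer exceeds the maximum of {}".format(
            pscs_behind_lb, PSCS_BEHIND_LB_MAX))
    if vcenters > limits["vcenters_per_domain"]:
        problems.append("{} vCenters in one SSO domain exceeds the maximum of {}".format(
            vcenters, limits["vcenters_per_domain"]))
    return problems


if __name__ == "__main__":
    # 12 vCenters is over the limit on 6.5 but within it on 6.5 U1.
    for line in check_design("6.5", replicating_pscs=6, pscs_behind_lb=2, vcenters=12):
        print(line)
```

Always confirm the numbers against the current configuration maximums for your build before relying on a check like this.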
Factors for Design Decisions
Area | Choices | Justification | Implication |
Deployment Topology | Embedded | Reduced resource utilisation for management; VCHA availability needed on PSC as well | VCs in Linked Mode are not a supported topology |
Deployment Topology | External | Multi-VC and single management access | More VMs to manage |
SSO Domain | One | Share authentication and license data across components and regions / “disposable” PSC | |
SSO Domain | More than one | Embedded PSCs / replication requirements are not met | Separate availability / management practice |
Replication Topology | Linear | No manual intervention. Agreements made in deployment order | SPoF possible when more than two PSCs are involved |
Replication Topology | Ring | Each PSC with two replication partners | CLI must be used |
PSC HA | Standby PSC without load balancer | Load balancer management overhead is a constraint / manual failover acceptable | Manual re-pointing on PSC failure |
PSC HA | Two PSCs behind a load balancer | High availability | Administrative overhead |
PSC HA | vSphere HA | VM / Platform level failures | |
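To make the linear versus ring trade-off concrete, here's a rough sketch that models replication agreements as links between PSCs and reports which PSCs would split replication if they failed. This is purely an illustration of the reasoning, not a VMware tool; the function names and the psc1..psc4 hostnames are made up.

```python
def linear_agreements(pscs):
    """Agreements made in deployment order: each PSC pairs with the next one."""
    return [(pscs[i], pscs[i + 1]) for i in range(len(pscs) - 1)]


def ring_agreements(pscs):
    """Linear chain plus one extra agreement closing the loop."""
    agreements = linear_agreements(pscs)
    if len(pscs) > 2:
        agreements.append((pscs[-1], pscs[0]))
    return agreements


def single_points_of_failure(pscs, agreements):
    """Return PSCs whose loss leaves the survivors unable to all replicate."""
    spofs = []
    for failed in pscs:
        survivors = [p for p in pscs if p != failed]
        if len(survivors) < 2:
            continue  # nothing left to partition
        links = {p: set() for p in survivors}
        for a, b in agreements:
            if a != failed and b != failed:
                links[a].add(b)
                links[b].add(a)
        # Flood fill from one survivor and see whether we reach all of them.
        seen = {survivors[0]}
        stack = [survivors[0]]
        while stack:
            node = stack.pop()
            for peer in links[node] - seen:
                seen.add(peer)
                stack.append(peer)
        if len(seen) < len(survivors):
            spofs.append(failed)
    return spofs


if __name__ == "__main__":
    pscs = ["psc1", "psc2", "psc3", "psc4"]  # hypothetical hostnames
    print("linear:", single_points_of_failure(pscs, linear_agreements(pscs)))
    print("ring:  ", single_points_of_failure(pscs, ring_agreements(pscs)))
```

In the four-node linear case the two middle PSCs show up as single points of failure, while the ring has none, which is why the extra CLI work to close the ring can be worth it.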
More Options
- SSH Access – Disable / Enable
- Certificates – Custom / VMCA / VMCA as subordinate (Hybrid recommended)
- TLS Configurator
- Patching – Update using updaterepo.zip bundles / Full Product and VIMpatch ISO
- NTP – sync from ESXi / NTP server
References for Architectural Decisions
- VMware Validated Design
- vSphere Topology decision tree poster
- Topology upgrade planning tool
- VMware Digital Marketing white paper
Know What To Do, What Not To Do
Dos and Don’ts
Do
- Best practice and FAQ reviews
- Be aware of health monitoring options
- Backup and restore points before any change
- Know the complexity of the implementation
- Ensure at least one PSC remains available for each vSphere domain and site
Don’t
- Unmanaged decommission procedure – deleting the appliances directly
- Snapshot revert or backup restore of a single PSC when replication is involved
- Using the same name for the vSphere domain and the Active Directory domain
- Making a replication agreement between PSCs in different SSO domains
- PSC PNID change
Note that changing the PNID after deployment is not supported
Health Check Options and Maintenance – CLI
Service List
/usr/lib/vmware-vmafd/bin/dir-cli service list
Information About Nodes
/usr/lib/vmware-vmafd/bin/dir-cli nodes list
Replication quick status
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -u Administrator -h localhost
Replication detailed status
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showfederationstatus -u Administrator -h localhost
PSC used by vCenter
/usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost
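If you run these checks regularly, it can be handy to wrap them in a small script. The sketch below simply runs the five commands listed above in order and records their exit codes; the wrapper itself (names, structure, the injectable runner) is my own and wasn't part of the session. Output and any password prompts go straight to the terminal, since some of these tools may ask for the SSO administrator password.

```python
import subprocess

# Commands as listed in the notes above; run on the PSC / vCenter appliance.
HEALTH_CHECKS = [
    ("Service list",
     ["/usr/lib/vmware-vmafd/bin/dir-cli", "service", "list"]),
    ("Information about nodes",
     ["/usr/lib/vmware-vmafd/bin/dir-cli", "nodes", "list"]),
    ("Replication quick status",
     ["/usr/lib/vmware-vmdir/bin/vdcrepadmin", "-f", "showpartnerstatus",
      "-u", "Administrator", "-h", "localhost"]),
    ("Replication detailed status",
     ["/usr/lib/vmware-vmdir/bin/vdcrepadmin", "-f", "showfederationstatus",
      "-u", "Administrator", "-h", "localhost"]),
    ("PSC used by vCenter",
     ["/usr/lib/vmware-vmafd/bin/vmafd-cli", "get-dc-name",
      "--server-name", "localhost"]),
]


def run_health_checks(checks=HEALTH_CHECKS, runner=subprocess.call):
    """Run each check in turn.

    Returns {label: exit code}, with None when the binary could not be run
    (for example when the script is not executed on an appliance).
    """
    results = {}
    for label, command in checks:
        print("== {} ==".format(label))
        try:
            results[label] = runner(command)
        except OSError:
            results[label] = None
    return results


if __name__ == "__main__":
    outcome = run_health_checks()
    failed = [label for label, code in outcome.items() if code != 0]
    print("Checks needing attention:", ", ".join(failed) or "none")
```

The `runner` parameter is only there so the wrapper can be exercised without an appliance; in normal use you'd leave it at the default.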
Managing Complexity of Implementation
- Know the site topology
- Service registration to Site Mapping
- Know the Replication agreements
- VC to PSC dependency
Demo – Decommission
Safe Recovery
Backup Plan
- Image level backup and file level backup (vSphere 6.5)
- Snapshots before changes – temporary restore points
- Keep a copy of lstool.py list output for reference
Restores need special consideration when replication is involved – to revert changes, use snapshots of all the PSCs that were created together while the PSCs were powered off.
Quick Recovery Options
- Repoint VC to available PSC at the same site
- Quick temporary PSC deployment
- Image based restore with two methods (6.0)
- psc_restore
- psc_restore with -ignore-sync
- File based backup and image based backup (6.5)
- /usr/bin/vcenter-restore
Very useful session. 4 stars.