Some few weekends ago I did some failover testing for a client using 2 EMC CLARiiON CX4-120 arrays, MirrorView/Asynchronous over iSCSI and a 2-node ESX Cluster at each site. the primary goal of the exercise was to ensure that we could promote mirrors at the DR site if need be and run Virtual Machines off the replicas. Keep in mind that the client, at this stage isn’t running SRM, just MV and ESX. I’ve read many, many articles about how MirrorView could be an awesome addition to the DR story, and in the past this has rung true for my clients running Windows hosts. But VMware ESX isn’t Windows, funnily enough, and since the client hadn’t put any production workloads on the clusters yet, we decided to run it through its paces to see how it worked outside of a lab environment.
One thing to consider, when using layered applications like SnapView or MirrorView with the CLARiiON, is that the LUNs generated by these applications are treated, rightly so, as replicas by the ESX hosts. This makes sense, of course, as the secondary image in a MirrorView relationship is a block-copy replica of the source LUN. As a result of this, there are rules in play for VMFS LUNs regarding what volumes can be presented to what, and how they’ll be treated by the host. There are variations on the LVM settings that can be configured on the ESX node. These are outlined here. Duncan of Yellow Bricks fame also discusses them here. Both of these articles are well written and explain clearly why you would take the approach that you have and use the settings that you have with LVM. However, what neither article addresses, at least clearly enough for my dumb arse, is what to do when what you see and what you expect to see are different things.
In short, we wanted to set the hosts to “State 3 – EnableResignature=0, DisallowSnapshotLUN=0”, because the hosts at the DR site had never seen the original LUNs before, nor did we want to go through and resignature the datastores at the failover site and have to put up with datastore volume labels that looked unsightly. Here’s some pretty screenshots of what your Advanced – LVM settings might look like after you’ve done this.
But we wanted it to look like this:
However, when I set the LVM settings accordingly, admin-fractured the LUN, promoted the secondary and presented it to the failover ESX host, I was able to rescan and see the LUN, but was unable to see any data on the VMFS datastore. Cool. So we set the LVM settings to “State 2 – EnableResignature=1, (DisallowSnapshotLUN is not relevant)”, and were able to resignature the LUNs and see the data, register a virtual machine and boot okay. Okay, so why doesn’t State 3 give me the desired result? I still don’t know. But I do know that a call to friendly IE at the local EMC office tipped me off to using the VI Client connected directly to the failover ESX host, rather than VirtualCenter. Lo and behold, this worked fine, and we were able to present the promoted replica, see the data, and register and boot the VMs at the failover site. I’m speculating that it’s something very obvious that I’ve missed here, but I’m also of the opinion that this should be mentioned in some of those shiny whitepapers and knowledge books that EMC like to put out promoting their solution. If someone wants to correct me, feel free to wade in at any time.