VMware High Availability – How you can help …

In my job I often have the opportunity to work with some very smart people. And sometimes I don’t. I was helping recently with a VMware HA installation that wasn’t working well at all. Or rather, it was working, but only on one node. I’ve dealt with this site in the past and it’s been a nightmare to get straight answers on basic infrastructure configuration questions like “What’s your default gateway?”, “How would you like your VM network set up?” and “Do you know the security code to turn off the building alarm?” So, we’d added some HBAs to resolve another issue and then had a problem with the HA configuration on one of the nodes.

When I do HA setups the conversation goes something like

“Do you have DNS working?”

“Yep.”

“Great! Forward and reverse lookups work then?”

“Uh-huh.”

“Great, so if I do an nslookup on the short name, FQDN and IP address, it will work on each node?”

“Should do.”

But it didn’t. By this time I’m usually cranky that someone has lied to me. I don’t care that they don’t know any better. So I fix up their DNS (and the fact that I’m working on Windows-based DNS servers should give anyone who knows me cause for alarm) and HA still doesn’t work. But it should at this stage. So now I’m crankier, because what normally works still won’t. If I log in to AAM I can see both nodes, but I can’t log in to AAM from the broken node. Hmmm, sounds networky. So I poke around in their vSwitch config, which I notice with some discomfort is different between nodes, and find that their SC vSwitches are set to “Route based on IP hash”.

VMware have a most excellent document on Virtual Networking. It gives a reasonable overview of what’s going on and why you’d set up NIC Teaming on your vSwitches in different ways. There’s a fair bit of discussion on the forums regarding which settings work best. I’ve also found over the course of various engagements that different settings work well with different physical switches, depending on whether they’re running EtherChannel, LACP or whatever. I wish I could give you some more insight, but my switching knowledge is more fibre-channel than ethernet-based.
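The mismatch between nodes is the sort of thing that's easy to miss by eye, so here is a rough sketch of how you might flag it. It assumes you've already written down each host's teaming policy per vSwitch (by hand, from the VI Client); it doesn't talk to any VMware API. The host names, the `policies` layout and both function names are made up for illustration, and the default policy label is the one I'd expect to see in the client, so check it against your own.

```python
from collections import defaultdict

# Assumed label of the default load-balancing policy as shown in the client.
DEFAULT_POLICY = "Route based on the originating virtual port ID"


def find_teaming_drift(policies):
    """Return {vswitch: {policy: [hosts]}} for vSwitches whose policy differs between hosts.

    `policies` is a hand-built mapping of {host: {vswitch: policy}}.
    A vSwitch that exists on only one host is not reported.
    """
    by_vswitch = defaultdict(lambda: defaultdict(list))
    for host, vswitches in policies.items():
        for vswitch, policy in vswitches.items():
            by_vswitch[vswitch][policy].append(host)
    # Keep only the vSwitches where more than one distinct policy is in use.
    return {vs: dict(found) for vs, found in by_vswitch.items() if len(found) > 1}


def non_default_policies(policies):
    """Return sorted (host, vswitch, policy) tuples for every non-default setting."""
    return sorted(
        (host, vswitch, policy)
        for host, vswitches in policies.items()
        for vswitch, policy in vswitches.items()
        if policy != DEFAULT_POLICY
    )
```

Feed it something like `{"esx01": {"vSwitch0": DEFAULT_POLICY}, "esx02": {"vSwitch0": "Route based on IP hash"}}` and `find_teaming_drift` reports vSwitch0 with the two hosts listed under their respective policies. Anything `non_default_policies` returns is worth a conversation with whoever runs the physical switches.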

So, I disabled and re-enabled HA once I’d changed the vSwitch NIC Teaming settings, and we were good to go. For all the good you think that HA actually does. So here are three things to remember:

1. Don’t lie to me about your DNS.

2. Don’t make me fix your DNS (especially if it’s Windows-based).

3. Use non-default NIC Teaming settings _only_ if you need to.
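If you'd like to check point 1 before I turn up, the DNS questions from the conversation above can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for nslookup: it uses the operating system's resolver (so a hosts file entry can mask a broken DNS record), it only looks at IPv4, and it needs to be run from each node, or at least from a machine using the same DNS servers. The function names and the example host details are my own inventions.

```python
import socket


def forward_lookup(name):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def reverse_lookup(ip):
    """Return the primary hostname for ip, or None if the reverse lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def check_node(short_name, fqdn, ip, forward=forward_lookup, reverse=reverse_lookup):
    """Return a list of DNS problems for one node; an empty list means it looks fine.

    `forward` and `reverse` can be swapped out, which makes the check easy to test
    without touching a real DNS server.
    """
    problems = []
    # Both the short name and the FQDN should resolve to the node's address.
    for name in (short_name, fqdn):
        resolved = forward(name)
        if resolved != ip:
            problems.append(f"{name} resolves to {resolved}, expected {ip}")
    # The address should resolve back to the FQDN (ignoring case and a trailing dot).
    ptr = reverse(ip)
    if ptr is None or ptr.lower().rstrip(".") != fqdn.lower():
        problems.append(f"{ip} reverse-resolves to {ptr}, expected {fqdn}")
    return problems
```

Calling something like `check_node("esx01", "esx01.example.local", "192.0.2.11")` for each node (those values are placeholders) and getting an empty list back every time is a decent sign that “Yep” really meant yes.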
