ESXi 4.1 network weirdness – why local TSM is really handy

I haven’t had much time to work out what caused some weird behaviour in our lab recently, or whether what I saw was expected. Unfortunately I don’t have screenshots, so you’ll just have to believe me. I’m following the issue up with our local VMware team this week, so hopefully I can provide a KB or something.

In our lab we have some ESXi 4.1 hosts attached to Cisco 3120 switches. Each host has a single, ether-channelled vSwitch, with portgroups and vmkernel ports for the Management Network and vMotion. For whatever reason, the network nerds in our team had to do some IOS firmware updates on the switch stack that the blades were connected to. We didn’t shut anything down, because we wanted to see what would happen.

What we saw was some really weird behaviour. 4 of the 8 hosts (one test data centre) had no connectivity issues at all. In the other test data centre, 1 of the 4 hosts showed no signs of a problem, another 2 eventually “came good” after a few hours, and one simply wouldn’t play ball. Logging in to the DCUI on that host showed that the Management Network had picked up both the VLAN ID and the IP address of the vMotion network. Now why we have a routable vMotion network in the first place – I’m not so sure. But it _appears_ that the ESXi host had simply decided to go with it. We could connect to the host directly by pointing the vSphere Client at the vMotion IP address. No matter how many times I tried to change the IP via the DCUI, or how many reboots I threw at it, it wouldn’t change.
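To cross-check what the DCUI reports, the vmkernel NICs and the portgroup layout can be listed from the local TSM shell. This is a minimal sketch, not a record of what was run at the time: `show_net` is a hypothetical wrapper name, and the fallback message is only there so the snippet behaves sensibly on a machine that doesn't have the esxcfg tools.

```shell
# List vmkernel NICs and vSwitch/portgroup details from local Tech Support Mode.
# show_net is a hypothetical helper name used for illustration only.
show_net() {
    if command -v esxcfg-vmknic >/dev/null 2>&1; then
        esxcfg-vmknic -l     # vmkernel NICs with their portgroup and IP settings
        esxcfg-vswitch -l    # vSwitches, portgroups, VLAN IDs and uplinks
    else
        echo "esxcfg tools not found; run this from Tech Support Mode on the ESXi host"
    fi
}

show_net
```

If the Management Network vmknic shows the vMotion address here as well, the problem is in the host's configuration and not just in what the DCUI displays.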

Not good. To get the host sorted out, I had to drop into local Tech Support Mode (TSM), remove the vMotion vmkernel port and portgroup, re-assign the correct IP address and VLAN ID to the Management Network, and then re-create vMotion. Here’s how you do it:

esxcfg-vmknic -d vMotion
esxcfg-vswitch -D vMotion vSwitch0

esxcfg-vmknic -a "Management Network" -i 192.168.0.31 -n 255.255.255.0

esxcfg-vswitch -v 84 -p "Management Network" vSwitch0

esxcfg-vswitch -A vMotion vSwitch0

esxcfg-vmknic -a "vMotion" -i 192.168.1.31 -n 255.255.255.0

esxcfg-vswitch -v 86 -p "vMotion" vSwitch0
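If more than one host needs the same treatment, it can help to generate the commands per host and review them before pasting anything into TSM. The sketch below only prints the commands and never runs them. `gen_fix_cmds` and its argument order are hypothetical, the netmask and vSwitch defaults are taken from the example above, and the `esxcfg-vswitch -A` line is my assumption about how the portgroup gets re-created before the vmknic is added back.

```shell
# Dry run: print the TSM commands for one host without executing them.
# gen_fix_cmds is a hypothetical helper. Arguments:
#   mgmt_ip mgmt_vlan vmotion_ip vmotion_vlan [netmask] [vswitch]
gen_fix_cmds() {
    mgmt_ip=$1; mgmt_vlan=$2; vmo_ip=$3; vmo_vlan=$4
    mask=${5:-255.255.255.0}   # default netmask, as in the example above
    vsw=${6:-vSwitch0}         # default vSwitch name, as in the example above
    cat <<EOF
esxcfg-vmknic -d vMotion
esxcfg-vswitch -D vMotion $vsw
esxcfg-vmknic -a "Management Network" -i $mgmt_ip -n $mask
esxcfg-vswitch -v $mgmt_vlan -p "Management Network" $vsw
esxcfg-vswitch -A vMotion $vsw
esxcfg-vmknic -a "vMotion" -i $vmo_ip -n $mask
esxcfg-vswitch -v $vmo_vlan -p "vMotion" $vsw
EOF
}

# Values for the lab host used in this post.
gen_fix_cmds 192.168.0.31 84 192.168.1.31 86
```

Printing instead of executing is deliberate: these commands change the management interface, so a typo can cut you off from the host, and it is worth reading the output before running any of it.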

Then log in to vMA and run this command to re-enable vMotion on the new vmkernel port:

vicfg-vmknic -h labhost31.poc.com -E vMotion

And we’re back up and running. I hope to have a follow-up post when I’ve had a chance to talk it over with VMware.

vSphere 4.1 GA

VMware vSphere 4.1 is now available for download. Release notes can be found here. Enjoy!