While most of you were doing whatever it is you do to relax over the Easter long weekend, I was lucky enough to be cutting over a chunk of our environment with the help of SAN Copy. For the most part, things went well. The only major problem was the Solaris LDOM environment, but our very patient consultant sorted that out for us.
One issue I did have, however, was when I was cutting over RDM LUNs on a number of virtualised clusters. The problem was, basically, that after remapping the RDM on the first guest, I was unable to see the RDM files on the second guest. While some people in our environment believe it’s acceptable to run single-node clusters, I don’t.
It turns out that at some point, and I can’t remember exactly when, the behaviour of vCenter changed to mask RDMs that are already presented to a guest. For those of you playing at home, we’re running the latest vCenter 2.5 (build 227637). So, I needed to add the following setting to the Advanced Settings in vCenter’s configuration. The setting is vpxd.filter.rdmFilter and it should be set to false. Also worthy of note is that this doesn’t seem to survive restarts of the vCenter service. But that’s probably because I’ve done something boneheaded.
Here’s what you need to do.
Then click on Add Row to add the desired settings and you’ll be able to add the RDMs to multiple guests.
Yesterday a colleague of mine was having some issues performing sVMotions on guests sitting in a development ESX 3.5 cluster. He kept getting an error along the lines of:
“IP address change for 10.x.x.x to 10.x.x.y not handled, SSL certificate verification is not enabled.”
They had changed the Service Console IP address of the host manually to perform some “secure” guest migrations previously (don’t ask me why – there’s always my way or the hard way), and basically the IP address of the host hadn’t been updated in the vpxa.cfg file. VMware has a 2-3 step process to resolve the issue, which will ultimately require you to pull the host out of the cluster and re-add it to vCenter. It’s not a big deal, but it can be confusing when things seem to be working, but aren’t really. You can read more about it here.
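If you want to confirm the mismatch before pulling the host out of the cluster, something like the following might help. It is only a sketch: it assumes the agent config (vpxa.cfg) is XML and records the address in a hostIp element, and the sample content below is a made-up, trimmed-down stand-in rather than a real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for vpxa.cfg (assumption: the
# real file is XML and stores the host address in a <hostIp> element).
SAMPLE_VPXA_CFG = """<config>
  <vpxa>
    <hostIp>10.0.0.11</hostIp>
  </vpxa>
</config>"""


def recorded_host_ip(cfg_text):
    """Return the text of the first <hostIp> element, or None if absent."""
    root = ET.fromstring(cfg_text)
    node = root.find(".//hostIp")
    if node is None or not node.text:
        return None
    return node.text.strip()


def ip_is_stale(cfg_text, current_ip):
    """True when the config records a different IP to the one in use now."""
    recorded = recorded_host_ip(cfg_text)
    return recorded is not None and recorded != current_ip


if __name__ == "__main__":
    print("recorded:", recorded_host_ip(SAMPLE_VPXA_CFG))
    print("stale:", ip_is_stale(SAMPLE_VPXA_CFG, "10.0.0.12"))
```

This only reports the mismatch. Fixing it is still the process VMware describes.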
Sometimes you, or The Exchange Guy, find yourself in the situation where you need to run ESXi on a desktop machine for “testing”. Sometimes, however, things don’t go quite as planned and you find yourself having to do some mucking about to get it working. There’s some good coverage of the CPUIDlimit workaround here. But here’s a rough outline of the options available:
During installation: in the bootloader (booting from CD-ROM), press TAB
go to the “vmkernel.gz” and add nocheckCPUIDLimit
“mboot.c32 vmkernel.gz nocheckCPUIDLimit ---”
During the normal boot process: press SHIFT O and enter the option nocheckCPUIDLimit
Of course, you’re probably a busy person and don’t want to have to remember to do that each time you reboot. So, post-installation, go to Advanced – VMkernel – Boot and modify the setting accordingly.
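For the installer option above, the edit to the boot line is just a matter of dropping the option in straight after the vmkernel module. Here is a rough sketch of that as a string operation. The sample boot line is illustrative only: it assumes modules are separated by "---", and binmod.tgz stands in for whatever modules follow on your media.

```python
def add_vmkernel_option(boot_line, option="nocheckCPUIDLimit",
                        module="vmkernel.gz"):
    """Insert `option` immediately after the `module` token in a boot line.

    Leaves the line alone (apart from whitespace normalisation) if the
    option is already in place.
    """
    tokens = boot_line.split()
    if module not in tokens:
        raise ValueError(f"{module} not found in boot line")
    idx = tokens.index(module)
    already_set = idx + 1 < len(tokens) and tokens[idx + 1] == option
    if not already_set:
        tokens.insert(idx + 1, option)
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical boot line; the modules after vmkernel.gz will vary.
    print(add_vmkernel_option("mboot.c32 vmkernel.gz --- binmod.tgz"))
```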
I’ve created a new page, imaginatively titled “Articles”, that has a number of articles I’ve done recently covering various simple operational or implementation-focused tasks. You may or may not find them useful. I hope this doesn’t become my personal technical documentation graveyard, although I have a feeling that a number of the documents will probably stay at version 0.01 until such time as the underlying technology no longer exists. Enjoy!
This article was meant to be called “Sound on ESX and killing VMs (again)”, but I decided to go with something a little more catchy. Recently I was fortunate enough to get a message from a colleague describing his descent into a “shame spiral” as a result of attempting to add sound capabilities to a guest running on ESX 3.5 Update 3. I’ll go into some of the reasons you would think that’s a good idea later, and why it isn’t a good idea, but suffice to say that things were a bit of a mess by the time I got the call. I’ve covered this before, but the following link provides some useful hints on killing off a VM that just won’t die. The following lines, taken from here, are what caused the problem in the first place:
The end result of these guest shenanigans was a broken VM, and when my colleague tried to create a new VM from the existing disks, they were still in use. I ended up using the vm-support method to kill the process. This is outlined here. I also learnt that ps -auwww will give you the number of columns you’ll need to make sense of the ps output if you want to use the ps command from the Service Console.
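The wide output matters because the path to the .vmx file sits at the very end of a long command line. As a rough illustration, here is how you might pick the right PID out of that output. The sample text is invented for the example (real Service Console output will have more processes and longer command lines), and matching on the .vmx file name is my assumption about what identifies the VM.

```python
# Invented sample of wide ps output; not captured from a real host.
SAMPLE_PS = """\
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root      1873  0.0  0.2  4120 1004 ?        S    Apr10   0:02 /usr/sbin/sshd
root      4211  1.3  0.9 11532 4820 ?        S<   09:14   3:41 /usr/lib/vmware/bin/vmware-vmx /vmfs/volumes/devstore01/devbox01/devbox01.vmx
root      4377  0.8  0.9 11498 4790 ?        S<   09:20   2:05 /usr/lib/vmware/bin/vmware-vmx /vmfs/volumes/devstore01/devbox02/devbox02.vmx
"""


def find_vmx_pids(ps_output, vm_name):
    """Return PIDs whose command line references <vm_name>.vmx."""
    target = f"/{vm_name}.vmx"
    pids = []
    for line in ps_output.splitlines():
        # Ten fixed columns, then the command (which may contain spaces).
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed line
        if target in fields[10]:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(find_vmx_pids(SAMPLE_PS, "devbox01"))
```

Finding the PID is the easy bit. How you kill it is covered in the links above.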
The following link provides info on sound in ESX. Now, you might think that you need to output sound in your ESX environment, particularly if you’re doing stuff with VDI and XP guests, or perhaps running monitoring software and wanting to make it go “bing”. But you’ll need physical sound cards in your ESX boxen, and I’m not entirely convinced that it will either work, or is really supported. While I admire people on the forums who put workarounds out there, I think this is a good example of YMMV. Well, if you still think this is a good idea, but don’t have a sound card, like these guys, you could try something like Virtual Audio Cable. Hell, people have even got it working with TVersity encoding but you’ll notice that they haven’t got any, like, sound, coming out of the boxes yet. Woohoo!
You know when it says in the release notes, and pretty much every forum on the internet, that doing sVMotion migrations with snapshots attached to a vmdk is bad? Turns out they were right, and you might just end up munting your vmdk file in the process. So you might just need this link to recreate the vmdk. You may find yourself in need of this process to commit the snapshot as well. Or, if you’re really lucky, you’ll find yourself with a vmsn file that references a missing vmdk file. Wow, how rad! To work around this, I renamed the vmsn to .old, ran another snapshot, and then committed the snapshots. I reiterate that I think snapshots are good when you’re in a tight spot, in the same way that having a baseball bat can be good when you’re attacked in your home. But if you just go around swinging randomly, something’s going to get broken. Bad analogy? Maybe, but I think you get what I’m saying here.
To recap, when using svmotion.pl with VIMA, here’s the syntax:
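The original syntax listing isn't reproduced here, so treat the following as a sketch only. It assumes the non-interactive form from VMware's remote CLI documentation, where --vm takes the current datastore path of the .vmx followed by a colon and the target datastore. The URL, datacenter, and datastore names are placeholders, and it's worth checking the script's own help output before trusting any of the flags.

```python
import shlex


def build_svmotion_args(vc_url, datacenter, vmx_path, source_ds, target_ds,
                        script="svmotion.pl"):
    """Build an argument list for a non-interactive svmotion run.

    Assumed --vm format: '[source_ds] path/to/vm.vmx:target_ds'.
    Credentials are left out on purpose; supply them however your
    environment requires.
    """
    vm_spec = f"[{source_ds}] {vmx_path}:{target_ds}"
    return [
        script,
        f"--url={vc_url}",
        f"--datacenter={datacenter}",
        f"--vm={vm_spec}",
    ]


if __name__ == "__main__":
    args = build_svmotion_args(
        "https://vc.example.com/sdk",  # placeholder vCenter SDK URL
        "DevDC",                       # placeholder datacenter name
        "devbox01/devbox01.vmx",
        "old_ds",
        "new_ds",
    )
    # Quote each argument so the line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in args))
```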
While everyone is talking about new VMwares, I’d like to focus on the mundane stuff. Creating a VMFS datastore on an ESX host is a relatively trivial activity, and something that you’ve probably done a few times before. But I noticed, the other day, some behaviour that I can only describe as “really silly”.
I needed to create a datastore on a host that only had local SCSI disks attached in a single RAID-1 container. I wanted to do this post-installation for reasons that I’ll discuss at another time. Here’s a screenshot from the Add Storage Wizard.
Notice the problem with the first option? Yep, you can blow away your root filesystem. In Australia, we would describe this situation as “being rooted”, but probably not for the reasons you think.
What I haven’t had a chance to test yet, having had limited access to the lab lately, is whether the Wizard is actually “silly” enough to let you go through with it. I’ve seen running systems happily blow themselves away with a miscued “dd” command – so I’m going to assume yes. I hope to have a little time in the next few weeks to test this theory.
I’ve been nuts deep in a SAN migration project recently and promptly missed the announcement that VMware VirtualCenter 2.5 Update 4 is now available for download. I haven’t had time to put it through its paces yet, but noticed in the release notes that some plugins have been updated, some more useful things have been added to Virtual Machine monitoring, and this little nugget with esxcfg-mpath (a command dear to my heart) still isn’t fixed. But, hey, it’s still better than Sun’s CAM.
A few weekends ago I did some failover testing for a client using 2 EMC CLARiiON CX4-120 arrays, MirrorView/Asynchronous over iSCSI and a 2-node ESX Cluster at each site. The primary goal of the exercise was to ensure that we could promote mirrors at the DR site if need be and run Virtual Machines off the replicas. Keep in mind that the client, at this stage, isn’t running SRM, just MV and ESX. I’ve read many, many articles about how MirrorView could be an awesome addition to the DR story, and in the past this has rung true for my clients running Windows hosts. But VMware ESX isn’t Windows, funnily enough, and since the client hadn’t put any production workloads on the clusters yet, we decided to run it through its paces to see how it worked outside of a lab environment.
One thing to consider, when using layered applications like SnapView or MirrorView with the CLARiiON, is that the LUNs generated by these applications are treated, rightly so, as replicas by the ESX hosts. This makes sense, of course, as the secondary image in a MirrorView relationship is a block-copy replica of the source LUN. As a result of this, there are rules in play for VMFS LUNs regarding which volumes can be presented to which hosts, and how they’ll be treated by the host. There are variations on the LVM settings that can be configured on the ESX node. These are outlined here. Duncan of Yellow Bricks fame also discusses them here. Both of these articles are well written and explain clearly why you would take a given approach and use a particular combination of LVM settings. However, what neither article addresses, at least clearly enough for my dumb arse, is what to do when what you see and what you expect to see are different things.
In short, we wanted to set the hosts to “State 3 – EnableResignature=0, DisallowSnapshotLUN=0”, because the hosts at the DR site had never seen the original LUNs before, nor did we want to go through and resignature the datastores at the failover site and have to put up with datastore volume labels that looked unsightly. Here’s some pretty screenshots of what your Advanced – LVM settings might look like after you’ve done this.
But we wanted it to look like this:
However, when I set the LVM settings accordingly, admin-fractured the LUN, promoted the secondary and presented it to the failover ESX host, I was able to rescan and see the LUN, but was unable to see any data on the VMFS datastore. Cool. So we set the LVM settings to “State 2 – EnableResignature=1, (DisallowSnapshotLUN is not relevant)”, and were able to resignature the LUNs and see the data, register a virtual machine and boot okay. Okay, so why doesn’t State 3 give me the desired result? I still don’t know. But I do know that a call to friendly IE at the local EMC office tipped me off to using the VI Client connected directly to the failover ESX host, rather than VirtualCenter. Lo and behold, this worked fine, and we were able to present the promoted replica, see the data, and register and boot the VMs at the failover site. I’m speculating that it’s something very obvious that I’ve missed here, but I’m also of the opinion that this should be mentioned in some of those shiny whitepapers and knowledge books that EMC like to put out promoting their solution. If someone wants to correct me, feel free to wade in at any time.
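To keep the states straight, here’s a small sketch that maps the two LVM settings to the state numbers used above. States 2 and 3 come straight from the text. State 1 (EnableResignature=0, DisallowSnapshotLUN=1) is my assumption of the default. The esxcfg-advcfg lines are how I’d expect to set these from the Service Console, so verify the option paths on your own host before relying on them.

```python
def lvm_state(enable_resignature, disallow_snapshot_lun):
    """Map the two LVM advanced settings to a state number (1, 2 or 3)."""
    for value in (enable_resignature, disallow_snapshot_lun):
        if value not in (0, 1):
            raise ValueError("settings must be 0 or 1")
    if enable_resignature == 1:
        return 2  # resignature replicas; DisallowSnapshotLUN not relevant
    if disallow_snapshot_lun == 1:
        return 1  # assumed default: replicas are not presented
    return 3      # present replicas as-is, keeping signature and label


def esxcfg_commands(enable_resignature, disallow_snapshot_lun):
    """Service Console commands for the chosen values (assumed paths)."""
    return [
        f"esxcfg-advcfg -s {enable_resignature} /LVM/EnableResignature",
        f"esxcfg-advcfg -s {disallow_snapshot_lun} /LVM/DisallowSnapshotLun",
    ]


if __name__ == "__main__":
    print(lvm_state(0, 0))
    for cmd in esxcfg_commands(0, 0):
        print(cmd)
```

None of this explains why State 3 misbehaved through VirtualCenter. It just makes it harder to mix up which state you’ve actually set.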
As you’re no doubt aware, I’m usually behind the times. But I just noticed, within one day of their release, that VMware ESX Server 3.5, VMware VirtualCenter 2.5 and VMware Consolidated Backup 1.1 are now available for download. I also noticed that a version of VMware Converter 3.0.2 Update 1 has been available for a week. That’s great. And what I like most is that I spent the weekend in Melbourne using slightly older versions of all of these products … The release notes for the various products are here and here. Enjoy kids!