Jul 2

From agility to fragility in a few simple steps

Category: ESX, VMware

This article was meant to be called “Sound on ESX and killing VMs (again)”, but I decided to go with something a little more catchy. Recently I was fortunate enough to get a message from a colleague describing his descent into a “shame spiral” as a result of attempting to add sound capabilities to a guest running on ESX 3.5 Update 3. I’ll go into some of the reasons you would think that’s a good idea later, and why it isn’t a good idea, but suffice it to say that things were a bit of a mess by the time I got the call. I’ve covered this before, but the following link provides some useful hints on killing off a VM that just won’t die. The following lines, taken from here, are what caused the problem in the first place:

sound.present = "TRUE"
sound.virtualDev = "es1371"
sound.fileName = "-1"
sound.autodetect = "TRUE"
sound.startConnected = "TRUE"
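
If you inherit a guest with entries like these, it helps to see exactly which lines are involved before you start editing the .vmx by hand. Here's a minimal Python sketch, not a VMware tool: `split_sound_entries` is a hypothetical helper name, and it assumes the .vmx is plain `key = "value"` text. Whether stripping the entries is enough to revive a broken guest is an assumption on my part, so work on a copy of the file.

```python
def split_sound_entries(vmx_text):
    """Separate sound.* settings from the rest of a .vmx file's text.

    Returns (kept_lines, sound_lines). Hypothetical helper for illustration;
    it only inspects text and never touches a running VM.
    """
    kept, sound = [], []
    for line in vmx_text.splitlines():
        # The key is everything before the first "=", e.g. sound.present
        key = line.split("=", 1)[0].strip().lower()
        if key.startswith("sound."):
            sound.append(line)
        else:
            kept.append(line)
    return kept, sound
```

Feed it the contents of the .vmx, review the `sound` list, and only write the `kept` lines back out once you're happy that nothing else was caught.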

These guest shenanigans resulted in a broken VM, and when my colleague tried to create a new VM from the existing disks, they were still in use. I ended up using the vm-support method to kill the process. This is outlined here. I also learnt that ps -auwww will give you enough columns to make sense of the ps output if you want to use the ps command from the Service Console.

The following link provides info on sound in ESX. Now, you might think that you need to output sound in your ESX environment, particularly if you’re doing stuff with VDI and XP guests, or perhaps running monitoring software and wanting to make it go “bing”. But you’ll need physical sound cards in your ESX boxen, and I’m not entirely convinced that it will either work, or is really supported. While I admire people on the forums who put workarounds out there, I think this is a good example of YMMV. Well, if you still think this is a good idea, but don’t have a sound card, like these guys, you could try something like Virtual Audio Cable. Hell, people have even got it working with TVersity encoding but you’ll notice that they haven’t got any, like, sound, coming out of the boxes yet. Woohoo!

No comments

Jul 1

sVMotion with snapshot bad

Category: VMware

You know when it says in the release notes, and pretty much every forum on the internet, that doing sVMotion migrations with snapshots attached to a vmdk is bad? Turns out they were right, and you might just end up munting your vmdk file in the process. So you might just need this link to recreate the vmdk. You may find yourself in need of this process to commit the snapshot as well. Or, if you’re really lucky, you’ll find yourself with a vmsn file that references a missing vmdk file. Wow, how rad! To work around this, I renamed the vmsn to .old, ran another snapshot, and then committed the snapshots. I reiterate that I think snapshots are good when you’re in a tight spot, in the same way that having a baseball bat can be good when you’re attacked in your home. But if you just go around swinging randomly, something’s going to get broken. Bad analogy? Maybe, but I think you get what I’m saying here.

To recap, when using svmotion.pl with VIMA, here’s the syntax:

svmotion.pl --datacenter=network.internal --url=https://virtualcenter.network.internal/sdk --username=vcadmin --vm="[VMFS_02] host01/host01.vmx:VMFS_01"
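
The fiddly bit is the --vm value: the current datastore path of the .vmx, then a colon, then the name of the destination datastore. Here's a small Python sketch that assembles the same arguments from their parts. `build_svmotion_args` is a hypothetical helper, not part of VIMA or the Remote CLI, and it only builds the argument list rather than running anything.

```python
import shlex


def build_svmotion_args(datacenter, url, username, vmx_path, target_datastore):
    """Assemble an argument list for svmotion.pl (hypothetical helper).

    vmx_path is the current datastore path of the .vmx file,
    e.g. "[VMFS_02] host01/host01.vmx"; target_datastore is the
    destination datastore name, e.g. "VMFS_01".
    """
    return [
        "svmotion.pl",
        "--datacenter=" + datacenter,
        "--url=" + url,
        "--username=" + username,
        # Source .vmx path and destination datastore are joined with a colon
        "--vm=" + vmx_path + ":" + target_datastore,
    ]


args = build_svmotion_args(
    "network.internal",
    "https://virtualcenter.network.internal/sdk",
    "vcadmin",
    "[VMFS_02] host01/host01.vmx",
    "VMFS_01",
)
# Print a shell-safe version of the command for review before running it
print(" ".join(shlex.quote(arg) for arg in args))
```

Keeping the arguments as a list means the space inside the datastore path survives intact if you later hand the list to something like `subprocess.run`.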

Of course, my preferred method is here:

svmotion --interactive

Enjoy!

2 comments

Jun 3

OT: My absence

To my three loyal readers, I must apologise for the relative paucity of blog posts recently. I’ve been consulting in a mid-large government department lately and haven’t done a lot of work that lends itself to this blog. Instead, I’ve been doing lots of pictures and lots of typing, and developing a plan of attack for their storage and data protection environment. It’s been a challenge, in so far as they have a bad habit of throwing storage at problems before checking if they’re really problems. To wit, the CX3-40f I configured last year with 5 DAEs is now fully populated with all 16 DAEs. There’s also a CX700 still doing its thing (almost fully populated), and a few other CLARiiONs performing other duties. I’ve also seen a few bizarre things happen there too, the strangest of which was when one of the sys admins uninstalled the e-mail archive program from the servers. This, in turn, deleted some C-CLIPs from the Centera, as they had no fixed policy on retention in place. Who’d have thought the Centera Backup and Recovery Module (CBRM) for EMC NetWorker actually worked? I could go on, but I think it’s best if I don’t.

So, my three month stint is almost up, but it looks like it will be extended another three months, and possibly extended again after that. So the blog posts may still be few and far between for the next little while, although I am hoping to start work on vSphere 4 shortly, and will no doubt have some stuff to write about Brocade / Cisco interop and how to make and break Cisco 9513 Directors. So, er, thanks for reading …

No comments

May 22

VMware vSphere 4 GA

Category: Converter, ESX, VMware, vCenter

The VMware thing that all the blogging kids are talking about is now available for download from the usual place. Updated documentation is here. The release notes are here. When I have time to scratch, I’ll run it through its GA paces.

No comments

Apr 15

Sometimes you just say “Oh, wow” …

Category: Computers, Humour

This is one of those times: Wall-E case mod.

No comments

Apr 10

Creating a VMFS datastore

Category: ESX, VMware, vCenter

While everyone is talking about new VMwares, I’d like to focus on the mundane stuff. Creating a VMFS datastore on an ESX host is a relatively trivial activity, and something that you’ve probably done a few times before. But I noticed, the other day, some behaviour that I can only describe as “really silly”.

I needed to create a datastore on a host that only had local SCSI disks attached in a single RAID-1 container. I wanted to do this post-installation for reasons that I’ll discuss at another time. Here’s a screenshot from the Add Storage Wizard.

[Screenshot of the Add Storage Wizard: damn_vmwares]

Notice the problem with the first option? Yep, you can blow away your root filesystem. In Australia, we would describe this situation as “being rooted”, but probably not for the reasons you think.

What I haven’t had a chance to test yet, having had limited access to the lab lately, is whether the Wizard is actually “silly” enough to let you go through with it. I’ve seen running systems happily blow themselves away with a miscued “dd” command – so I’m going to assume yes. I hope to have a little time in the next few weeks to test this theory.

No comments

Apr 9

Make the LUN go away, Daddy!

So sometimes you create a Reserved LUN Pool (RLP) for an EMC MirrorView/Asynchronous deployment and then realise that it’s not really what you wanted to do. So what if you want to unbind the LUNs in the RLP, but the storage system gives you an error along the lines of “Used by another feature of the storage system” and the LUN cannot be unbound? Yep, that can be a problem. One way around this problem is to issue the following command:

naviseccli -h <SP_IP_address> mirror -async -setfeature -off -lun <LUN #>

You should then be able to unbind the LUN and get your stuff sorted. This is covered in emc114414 on powerlink.
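
An RLP usually holds more than one LUN, so you may end up typing that command a few times. Here's a minimal Python sketch that generates one command per LUN for review. `rlp_feature_off_commands` is a hypothetical helper, the SP address and LUN numbers are made up for illustration, and it assumes naviseccli credentials are handled however you normally handle them. It prints the commands and does not run them.

```python
def rlp_feature_off_commands(sp_address, lun_numbers):
    """Build one naviseccli argument list per LUN (hypothetical helper).

    Uses the same flags as the command above; nothing is executed here.
    """
    return [
        ["naviseccli", "-h", sp_address,
         "mirror", "-async", "-setfeature", "-off", "-lun", str(lun)]
        for lun in lun_numbers
    ]


# Made-up SP address and LUN numbers, purely for illustration
for command in rlp_feature_off_commands("192.0.2.10", [100, 101, 102]):
    print(" ".join(command))
```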

No comments

Apr 8

HP MSA array failover

Category: HP, Storage

I’ve blogged briefly about the MSA array before, thinking it was a reasonable piece of kit for the price, assuming your expectations were low. But I had a problem recently with a particular MSA2012fc and don’t know whether I’ve got it right or whether I’m missing something fundamental.

I had it set up in a DAS configuration. Interconnect was turned on, and loop was the default topology in place. This worked fine for the 2 RHEL boxes attached to the array. Later I connected the array and 2 hosts to 2 Brocade 300 switches with discrete fabrics. I changed the topology to point-to-point, and changed from interconnect to straight-through. This seemed like a reasonable thing to do based on my understanding of the admin, user and reference guides.

In a switched topology / straight-through / point-to-point connection, LUNs owned by a vdisk on controller A are only presented via paths from controller A. If controller A fails however, I don’t believe the vdisk fails over. If, however, a cable or switch fails, you’re covered, because each controller is cabled to each fabric. I believe this is why I saw two paths to everything – these being the fibre ports of the controller owning the vdisk that owns the LUN.
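
To make that reasoning concrete, here's a toy Python model of the switched case as I understand it. It is a sketch of the behaviour described above, not of the MSA firmware: the function name, the fabric names and the cabling table are all assumptions for illustration.

```python
def visible_paths(owner, failed=()):
    """Paths a host can still see to a LUN in the switched / straight-through case.

    Toy model only: each controller has one port cabled to each fabric,
    and a LUN is presented only through the ports of the controller
    that owns its vdisk.
    """
    cabling = {
        ("A", "fabric1"), ("A", "fabric2"),
        ("B", "fabric1"), ("B", "fabric2"),
    }
    return sorted(
        (controller, fabric)
        for controller, fabric in cabling
        if controller == owner
        and controller not in failed
        and fabric not in failed
    )


print(visible_paths("A"))                      # two paths, both via controller A
print(visible_paths("A", failed={"fabric1"}))  # one path survives a switch failure
print(visible_paths("A", failed={"A"}))        # nothing left if controller A dies
```

If the array really did fail the vdisk over to its peer, the last case would return controller B's ports instead of an empty list, which is exactly the behaviour I wasn't seeing.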

In a direct-attach / interconnect / loop setup, controllers mirror their peer’s LUNs via the higher ports, so Controller A presents paths to controller B’s LUNs via A1. In this setup, you could sustain a controller failure, as a vdisk would be presented via the peer. The problem with this, however, is that interconnect is never used in a switched environment. I don’t believe changing the ports to loop will help, nor would removing the switches.

Have I totally missed the point here? Has anyone else seen this? Was there a workaround? Or something fixed in later revs of the code? It seems strange that HP would advertise this as an active-active array, but only for DAS configs.

No comments

Mar 12

RAID Group defragmentation halted

It only happened last month, but it feels like decades ago. As part of a data migration project I’ve been working on, I needed to defragment a SATA RAID Group on a CX3-20. Knowing it was going to take a while, I went and did some other things. When I came back to it 24 hours later, however, it was still sitting on 0%. Hmmm, that seems like it’s taking a while. So I started digging around and noticed that the SP event logs mentioned something about “Expansion process halted” or some such. Okay, cool, so why has it halted? And why isn’t there a button marked “Resume”? The EMC KB article emc178703 – “LUN expansion halted after defragmentation operation started” gives some indication that one of the LUNs may have trespassed after the expansion process had started, and this was why the process had halted. The solution was to trespass the LUN back to its original owner. However, according to the article, this was a scary move, and you need to engage engineering, or marketing, or someone I can’t remember, to assist in identifying the correct LUN. Well, while that was an option, it wasn’t terribly appealing, so I decided to trespass the LUNs myself (I’m much more cautious when it comes to my own data).

So I attempted to trespass one of the LUNs in the RAID Group back to its owner. No dice. Some error along the lines of “You are not allowed to transfer LUN”. Sorry for the vagueness, but my notes were quite sketchy after working on this for a few days over the weekend and going to sleep dreaming of Incremental SAN Copy setups. But I digress. MirrorView/Synchronous was active on some of the LUNs as they were secondary images for some Remote Mirrors we had in play. Cool, so how do I fix this? I realised that the source LUN was owned by SP A, and the secondary image was owned by SP B. So MirrorView, rightly so, had trespassed the secondary to be on the same SP as the source. By admin-fracturing the mirror, changing the default source LUN owner, trespassing the source and re-synchronizing, I was able to get MirrorView to automatically trespass the destination LUN as well. Wonder of all wonders, the defragmentation continued after that, and finished a lot sooner than I’d expected. Although really, it would have been good to know that this could be a problem before it became a problem.

1 comment

Feb 26

VMware VirtualCenter 2.5 Update 4

Category: VMware

I’ve been nuts deep in a SAN migration project recently and promptly missed the announcement that VMware VirtualCenter 2.5 Update 4 is now available for download. I haven’t had time to put it through its paces yet, but noticed in the release notes that some plugins have been updated, some more useful things have been added to Virtual Machine monitoring, and this little nugget with esxcfg-mpath (a command dear to my heart) still isn’t fixed. But, hey, it’s still better than Sun’s CAM.

Comments are off for this post
