HP – Clearing a terminal session on the c7000 OA

I was recently upgrading some firmware on a Cisco 9124e in the back of a Hewlett-Packard c7000 blade chassis. At one stage I lost IP connectivity to the terminal session I was running via the OA.

chassis1-oa1> connect interconnect 4
NOTICE: This pass-thru connection to the integrated I/O console is provided for convenience and does not supply additional access control.  For security reasons, use the password features of the integrated switch.

Error: Connection to integrated switch 4 in use by another user (pid 12267). Press [Enter] or [Return] to continue:

I thought it wouldn’t be a problem to reconnect. It wasn’t quite that simple though, so here’s what I had to do. 

chassis1-oa1> clear interconnect session 4
Session terminated.
chassis1-oa1> connect interconnect 4
NOTICE: This pass-thru connection to the integrated I/O console is provided for convenience and does not supply additional access control.  For security reasons, use the password features of the integrated switch.

Connecting to integrated switch 4 at 9600,N81... Escape character is '<Ctrl>_' (Control + Shift + Underscore)

And then I’m back in my terminal session and ready to continue the firmware upgrade. Hooray.

Press [Enter] to display the switch console: Do you want to continue with the installation (y/n)?  [n] y

EMC – Boot from SAN MSCS Cluster configuration

Disclaimer: I haven’t done a Windows-based CLARiiON host-attach activity in about 4 or 5 years, and it’s been a bit longer than that since I did boot from SAN configurations. So make of this what you will. We’ve been building a Windows 2008 R2 Boot from SAN cluster lately, and we got to the point where we were ready to add the 60+ LUNs that the cluster would use. The initial configuration had 3 hosts in 3 Storage Groups with their respective boot LUNs. I had initially thought that I’d just create another Storage Group for the cluster’s volumes and add the 3 hosts to that. All the while I was trying to remember the rule about multiple hosts or multiple LUNs in a Storage Group, and of course I remembered incorrectly: a Storage Group can contain multiple hosts and multiple LUNs, but a host can only belong to one Storage Group at a time, so the nodes couldn’t sit in their boot Storage Groups and a shared cluster group simultaneously.

To get around this, I had to add each LUN (there are about 67 of them) to each cluster node’s Storage Group, and ensure that each LUN had a consistent host LUN ID (HLU) across the Storage Groups. This has worked fine, but it isn’t, as Unisphere points out, recommended. There’s also a limit on the number of LUNs I can put in a Consistency Group (32) – but that’s a story for another time.
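If you’re doing this for 67 LUNs, naviseccli beats clicking through Unisphere. A minimal sketch, with a hypothetical SP address, Storage Group names and LUN numbers – the point is that the -hlu value for a given -alu stays identical in every node’s group:

naviseccli -h 192.168.1.50 storagegroup -addhlu -gname NODE1_SG -hlu 10 -alu 200
naviseccli -h 192.168.1.50 storagegroup -addhlu -gname NODE2_SG -hlu 10 -alu 200
naviseccli -h 192.168.1.50 storagegroup -addhlu -gname NODE3_SG -hlu 10 -alu 200

Script a loop over the ALUs and the whole exercise takes minutes rather than hours.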

EMC PowerPath 5.4 SPx, ESXi 4.1 and HP CIM offline bundle

If you find yourself having problems registering EMC PowerPath 5.4.1 (unsupported) or 5.4.2 (supported) on your HP blades running ESXi 4.1, consider uninstalling the HP offline bundle hpq-esxi4.luX-bundle-1.0. We did, and PowerPath was magically able to talk to the ELM server and retrieve served licenses. I have no idea why CIM-based tools would have this effect, but there you go. Apparently a fix is on the way from HP, but I haven’t verified that yet. I’ll update as soon as I know more.
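If you want to check what’s installed and remove the bundle remotely, the vSphere CLI’s vihostupdate is the usual tool for ESXi 4.1. A rough sketch from memory, so treat the exact flags as an assumption, and take the bulletin ID from whatever the query reports for the HP bundle:

vihostupdate.pl --server esxhost01 --username root --query
vihostupdate.pl --server esxhost01 --username root --remove --bulletin <bulletin-ID-from-query>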

Cisco 9124(e) firmware downgrade

Sometimes, for any number of reasons, you’ll find yourself wanting to downgrade the firmware on your Cisco edge devices to match what you have running in the core. Fortunately, at least for the 9100-series switches, this is basically the same as upgrading the firmware. I’ve included the commands to run here, and also the full output of the process. For the director-class switches, there are a few more things to do, such as clearing out the space on the standby supervisor as well as the active sup card. I’ll try and post something 9500-series specific in the next few weeks.

In short, do this (assuming you’re loading version 3.3(4a) of the code):

copy running-config startup-config

copy startup-config tftp://192.168.101.9/startup-config_FOSLAB5A08_28072010

show module

copy tftp://192.168.101.9/m9100-s2ek9-mz.3.3.4a.bin bootflash:m9100-s2ek9-mz.3.3.4a.bin

copy tftp://192.168.101.9/m9100-s2ek9-kickstart-mz.3.3.4a.bin bootflash:m9100-s2ek9-kickstart-mz.3.3.4a.bin

dir bootflash:

show version image bootflash:m9100-s2ek9-mz.3.3.4a.bin

show incompatibility system m9100-s2ek9-mz.3.3.4a.bin

install all system bootflash:m9100-s2ek9-mz.3.3.4a.bin kickstart bootflash:m9100-s2ek9-kickstart-mz.3.3.4a.bin

y

show module

show version
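One extra check I’d suggest before pulling the trigger: I believe SAN-OS can also report the impact of the install without actually performing it, using the same image arguments as the install itself:

show install all impact system bootflash:m9100-s2ek9-mz.3.3.4a.bin kickstart bootflash:m9100-s2ek9-kickstart-mz.3.3.4a.bin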

You can also see the full output here. Note that this process works equally well on HP’s 9124e switches (the type you find in the back of a c7000 blade chassis, for instance), although you should download the firmware from HP’s site, not Cisco’s.

HP MSA array failover

I’ve blogged briefly about the MSA array before, thinking it was a reasonable piece of kit for the price, assuming your expectations were low. But I had a problem recently with a particular MSA2012fc, and I don’t know whether I’ve got it right or whether I’m missing something fundamental.

I had it set up in a DAS configuration. Interconnect was turned on, and loop was the default topology in place. This worked fine for the 2 RHEL boxes attached to the array. Later I connected the array and the 2 hosts to 2 Brocade 300 switches with discrete fabrics, changed the topology to point-to-point, and changed the interconnect setting to straight-through. This seemed like a reasonable thing to do based on my understanding of the admin, user and reference guides.

In a switched / straight-through / point-to-point configuration, LUNs on a vdisk owned by controller A are only presented via paths from controller A. If controller A fails, I don’t believe the vdisk fails over to controller B. If a cable or switch fails, though, you’re covered, because each controller is cabled to each fabric. I believe this is why I saw two paths to everything: the two fibre ports of the controller that owns the vdisk behind each LUN.

In a direct-attach / interconnect / loop setup, each controller mirrors its peer’s LUNs via the higher ports, so controller A presents paths to controller B’s LUNs via port A1. In this setup you could sustain a controller failure, as a vdisk would still be presented via the surviving peer. The problem, though, is that interconnect mode is never used in a switched environment, and I don’t believe changing the ports to loop will help, nor would removing the switches.
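For what it’s worth, this is easy enough to sanity-check from one of the attached RHEL hosts, assuming the device-mapper multipath kit is installed. multipath -ll lists each mapped volume with the paths behind it, and sysfs shows which FC targets each HBA has actually logged in to:

multipath -ll
ls /sys/class/fc_remote_ports/

If every volume only ever shows paths through its owning controller’s ports, you’re seeing the same behaviour I did.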

Have I totally missed the point here? Has anyone else seen this? Was there a workaround? Or something fixed in later revs of the code? It seems strange that HP would advertise this as an active-active array, but only for DAS configs.

HP MSA2012fc and Linux

Sometimes, I can be a real muppet. I had to install a new MSA2012fc array at a site yesterday, with a few extra MSA2000 expansion shelves. The FC switches hadn’t arrived yet, so we went with a direct-attach configuration. The hosts were x64 RHEL 5 U2 with dual-port QLogic HBAs. Reasonably simple stuff, my main concern being that the LUN design discussion with the DBAs would never end. So I initialized the array, set up some IP addresses, security and vdisks, mapped the host ports and created some aliases, and presented some test LUNs to the Linux hosts to confirm we could see the volumes. I was about to bail when the customer said “so what if I pull this cable?”. “Well, it should fail over”. But it didn’t. Some teeth grinding and fiddling yielded little result, and we decided to revisit the issue today.

Today went a lot better. For a start, I used the QLogic modprobe.conf settings specified in the installation document for “Device Mapper Multipath Enablement Kit for HP StorageWorks Disk Arrays v4.1.0”. This seemed like a reasonable thing to do, as I had installed version 4.1.0 on the hosts. But yesterday, like some kind of idiot, I’d been using the modprobe settings from “Installation and Reference Guide Device Mapper Multipath Enablement Kit for HP StorageWorks Disk Arrays Version 4.0.0”. That was the document I found buried somewhere on HP’s website, not the document included with the tarball. Notice the critical difference yet? I’ll elaborate:

For version 4.0.0, you need to set the following options in modprobe.conf for QLogic HBAs:

options qla2xxx qlport_down_retry=10 ql2xfailover=0

For version 4.1.0, the story changes slightly:

options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=10 ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=1 ql2xautorestore=0xa0 ConfigRequired=0

Look different? Uh-huh. Different enough that, even though I had enabled the MSA’s Host Port Interconnects as described in the Reference Guide, the hosts couldn’t see volumes presented via the other controller’s port. I should have been seeing 2 paths to each volume, but I was only seeing one. It’s the simple errors that lead to hours wasted.
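One follow-up worth noting: on RHEL 5 the qla2xxx driver is usually loaded from the initrd, so just editing /etc/modprobe.conf isn’t enough on its own. Rebuild the initrd and reboot (a sketch, assuming the stock RHEL 5 tools):

mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

Once the host is back up, multipath -ll should show the 2 paths to each volume.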

Incidentally, the CLI is useful if you want to change the default IP addresses from 10.0.0.2 and 10.0.0.3 to something more sensible. I recommend using the provided serial cable and issuing the following commands:

# show network-parameters

Network Parameters Controller A
-------------------------------
IP Address     : 10.0.0.2
Gateway        : 10.0.0.1
Subnet Mask    : 255.255.255.0
MAC Address    : 00:C0:FF:D5:FD:4E
Addressing Mode: DHCP

Network Parameters Controller B
-------------------------------
IP Address     : 10.0.0.3
Gateway        : 10.0.0.1
Subnet Mask    : 255.255.255.0
MAC Address    : 00:C0:FF:D7:02:52
Addressing Mode: DHCP

You can then use set network-parameters to set the IP addresses for each controller as follows:

# set network-parameters ip 192.168.2.50 netmask 255.255.255.0 gateway 192.168.2.254 controller a
Success: Network parameters have been changed
# set network-parameters ip 192.168.2.51 netmask 255.255.255.0 gateway 192.168.2.254 controller b
Success: Network parameters have been changed
#

You can then log in and use HP’s Storage Management Utility (SMU). This is a pretty intuitive interface and a lot easier to navigate than Sun’s CAM.