I’ve been a bit behind on my VNX OE updates, and have only recently read docu59127_VNX-Operating-Environment-for-Block-05.33.000.5.102-and-for-File-188.8.131.52,-EMC-Unisphere-184.108.40.206.0096-Release-Notes covering VNX OE 5.33…102. Checking out the fixed problems, I noticed the following item.
The problem, you see, came to light some time ago when a few of our (and no doubt other) VNX2 customers started having disk failures on reasonably busy arrays. EMC have a KB on the topic on the support site – VNX2 slow disk rebuild speeds with high host I/O (000187088). To quote EMC: “The code has been written so that the rebuild process is considered a lower priority than the Host IO. The rebuild of the new drive will take much longer if the workload from the hosts are high”. Which sort of makes sense, because host I/O is a pretty important thing. But, as a number of customers pointed out to EMC, there’s no point prioritising host I/O if you’re in jeopardy of a data unavailable or data loss event because your private RAID groups have taken so long to rebuild.
Previously, the solution was to “[r]educe the amount of host I/O if possible to increase the speed of the drive rebuild”. Now, however, updated code comes to the rescue. So, if you’re running a VNX2, upgrade to the latest OE if you haven’t already.
We’re upgrading our CX4-960s to FLARE 30 tonight and, after a slew (a slew being approximately equal to 4) of disk failures and replacements over the last few weeks, we’re still waiting for one of the SATA-II disks to rebuild. Fortunately, EMC has a handy knowledge base article entitled “What is a CLARiiON proactive hot spare?”, which talks about how to go about using proactive hot spares on the CLARiiON. You can find it on Powerlink as emc150779. The side benefit of this article is that it provides details on how to query the rebuild status of hot spare disks and the RAID groups they’re sparing for.
Using naviseccli, you can get the rebuild state of the disk thusly:
In this case, I wanted to query the status of the disk in Bus 3 / Enclosure 4 / Disk 3. As you can see from the above example – LUN 250 is at 32%. You can also see the status of the rebuild by looking at the properties of the LUN that is being equalized.
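If you'd rather poll that figure from a script than eyeball it, here's a minimal sketch in Python. It assumes the rebuild status shows up in the command output as a line of the form "Prct Rebuilt: <LUN>: <percent>". That format is my assumption, not something reproduced from the KB article, so check it against what your array actually prints. The function name parse_rebuild_progress and the sample text are hypothetical; the sample simply mirrors the LUN 250 at 32% figure mentioned above.

```python
import re
from typing import Dict

# ASSUMPTION: rebuild progress appears on a line such as
#   "Prct Rebuilt:  250: 32"
# i.e. a label followed by one or more "<LUN>: <percent>" pairs.
_PAIR = re.compile(r"(\d+):\s*(\d+)")


def parse_rebuild_progress(output: str) -> Dict[int, int]:
    """Return {lun_id: percent_rebuilt} for every pair on a 'Prct Rebuilt' line."""
    progress: Dict[int, int] = {}
    for line in output.splitlines():
        label, sep, rest = line.partition(":")
        # Skip lines with no colon or with a different label.
        if sep and label.strip().lower() == "prct rebuilt":
            for lun, pct in _PAIR.findall(rest):
                progress[int(lun)] = int(pct)
    return progress


# Hypothetical sample text, made up to match the figures quoted in this post.
sample = """Bus 3 Enclosure 4  Disk 3
Prct Rebuilt:  250: 32
"""

if __name__ == "__main__":
    print(parse_rebuild_progress(sample))
```

Feed it the captured output of whatever query you run against the disk, and loop with a sleep if you want to watch the percentage climb.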
So we should be done in time for the code upgrade. I’ll let you know how that works out for us.
One of the nice things I’ve seen lately on the CX4 (running FLARE R28) is the automatic creation of hot spares. Let me explain a little what I mean. In the olden days, to assign a global hot spare on a CLARiiON, you needed to create a RAID Group with one disk in it (usually with a high number like 1024 or 2048), then bind a LUN on the RG as type Hot Spare, and then it would be assigned as a Global Hot Spare for the array. Nowadays, there’s one less click required. As soon as I create a RAID Group of type “Hot Spare”, the array binds the LUN itself and tells me that LUN XXX has been assigned as a hot spare. Okay, so it might save me two minutes a year, but I still thought it was nice that someone’s been working away feverishly on the code to make this work.
To wit, I read the following in the latest Release Notes for FLARE Release 28: “Code has been added so that when a RAID group of type hot spare is created, it will then try to create a hot spare LUN on that RAID group (assuming the creation was successful). If the creation of the RAID group fails, then the dialog will not continue with binding the LUN. If binding the LUN fails, but the RAID group creation is successful, then you will be presented with a message stating that the RAID group has been created but the attempt to create a hot spare LUN has failed. You are instructed to try and use the Create LUN dialog to manually create it. The hot spare name format will be the following: ‘Hot Spare LUN XXXX’. XXXX will be the last (highest) available LUN ID from the pool of available LUN IDs.
Fixed in version: 220.127.116.11.19
Exists in versions: All”