EMC – VNX – Slow Disk Rebuild Times

I’ve been a bit behind on my VNX OE updates, and have only recently read docu59127_VNX-Operating-Environment-for-Block-05.33.000.5.102-and-for-File-8.1.6.101,-EMC-Unisphere-1.3.3.1.0096-Release-Notes covering VNX OE 5.33…102. Checking out the fixed problems, I noticed the following item.

[Screenshot VNX_OE_RN: the relevant fixed problems entry from the release notes]

The problem, you see, came to light some time ago when a few of our (and no doubt other) VNX2 customers started having disk failures on reasonably busy arrays. EMC have a KB on the topic on the support site – VNX2 slow disk rebuild speeds with high host I/O (000187088). To quote EMC: “The code has been written so that the rebuild process is considered a lower priority than the Host IO. The rebuild of the new drive will take much longer if the workload from the hosts are high”. Which sort of makes sense, because host I/O is a pretty important thing. But, as a number of customers pointed out to EMC, there’s no point prioritising host I/O if you’re in jeopardy of a data unavailable or data loss event because your private RAID group rebuilds have taken so long to complete.

Previously, the solution was to “[r]educe the amount of host I/O if possible to increase the speed of the drive rebuild”. Now, however, updated code comes to the rescue. So, if you’re running a VNX2, upgrade to the latest OE if you haven’t already.
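
If you’re not sure which OE revision the block side is actually running, getagent will tell you. Substitute your own SP address and credentials; as I recall it’s the Revision field in the output that you’re after.

I:\>naviseccli -h SP_IP_address -user username -scope 0 getagent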


QNAP – Add swap to your NAS for large volume fsck activities

That’s right, another heading from the Department of not terribly catchy blog article titles. I’ve been having a mighty terrible time with one of my QNAP arrays lately. After updating to 4.1.2, I’ve been getting some weird symptoms. For example, every time the NAS reboots, the filesystem is marked as unclean. Worse, it mounts as read-only from time to time. And it seems generally flaky. So I’ve spent the last week trying to evacuate the data with the thought that maybe I can re-initialize it and clear out some of the nasty stuff that’s built up over the last 5 years. Incidentally, while we all like to moan about how slow SATA disks are, try moving a few TB via a USB2 interface. The eSATA seems positively snappy after that.

Of course, QNAP released version 4.1.3 of their platform recently, and a lot of the symptoms I’ve been experiencing have stopped occurring. I’m going to continue down this path though, as I hadn’t experienced these problems on my other QNAP, and just don’t have a good feeling about the state of the filesystem. And you thought that I would be all analytical about it, didn’t you?

In any case, I’ve been running e2fsck on the filesystem fairly frequently, particularly when it goes read-only and I have to stop the services, unmount and remount the volume.

[/] # cd /share/MD0_DATA/
[/share/MD0_DATA] # cd Qmultimedia/    
[/share/MD0_DATA/Qmultimedia] # mkdir temp         
mkdir: Cannot create directory `temp': Read-only file system
[/share/MD0_DATA/Qmultimedia] # cd /
[/] # /etc/init.d/services.sh stop
Stop qpkg service: chmod: /share/MD0_DATA/.qpkg: Read-only file system
Shutting down Download Station: OK
Disable QUSBCam ... 
Shutting down SlimServer... 
Error: Cannot stop, SqueezeboxServer is not running.
WARNING: rc.ssods ERROR: script /opt/ssods4/etc/init.d/K20slimserver failed.
Stopping thttpd-ssods .. OK.
rm: cannot remove `/opt/ssods4/var/run/thttpd-ssods.pid': Read-only file system
WARNING: rc.ssods ERROR: script /opt/ssods4/etc/init.d/K21thttpd-ssods failed.
Shutting down QiTunesAir services: Done
Disable Optware/ipkg
.
Stop service: cloud3p.sh vpn_openvpn.sh vpn_pptp.sh ldap_server.sh antivirus.sh iso_mount.sh qbox.sh qsyncman.sh rsyslog.sh snmp lunportman.sh iscsitrgt.sh twonkymedia.sh init_iTune.sh ImRd.sh crond.sh nvrd.sh StartMediaService.sh bt_scheduler.sh btd.sh mysqld.sh recycled.sh Qthttpd.sh atalk.sh nfs ftp.sh smb.sh versiond.sh .
[/] # umount /dev/md0
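
To make sure the volume has actually let go, a quick grep of the mount table does the job (no output means it’s no longer mounted):

[/] # mount | grep md0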


So then I run e2fsck to check the filesystem. On a large volume (in this case a bit over 8TB), though, it uses a lot of RAM, and it invariably exhausts the available memory and swap space.

[/] # e2fsck /dev/md0
e2fsck 1.41.4 (27-Jan-2009)
/dev/md0: recovering journal
/dev/md0 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error allocating block bitmap (4): Memory allocation failed
e2fsck: aborted


So here’s what I did to enable some additional swap on a USB stick (courtesy of a QNAP forum post from RottUlf).

Insert a USB stick with more than 3GB of space. Create a swap file on it.
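
If you’re not sure which path the stick has come up under, df will show it. The sdi1 path in the commands below is just where mine happened to land, so adjust to suit.

[/] # df -h | grep external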

[/] # dd if=/dev/zero of=/share/external/sdi1/myswapfile bs=1M count=3072

Initialise it as a swap area.

[/] # mkswap /share/external/sdi1/myswapfile

Enable it as swap for the system.

[/] # swapon /share/external/sdi1/myswapfile

Check it.

[/] # cat /proc/swaps
Filename                          Type        Size      Used    Priority
/dev/md8                          partition   530040    8216    -1
/share/external/sdi1/myswapfile   file        3145720   12560   -2

You should then be able to run e2fsck successfully. Note that the example I linked to used e2fsck_64, but that binary isn’t available on the TS-639 Pro II. Once you’ve fixed your filesystem issues, you’ll want to disable the swap file on the stick, remount the volume and restart your services.

[/] # swapoff /share/external/sdi1/myswapfile
[/] # mount /dev/md0
mount: can't find /dev/md0 in /etc/fstab or /etc/mtab

Oh no …

[/] # mount /dev/md0 /share/MD0_DATA/

Yeah, I don’t know what’s going on there either. I’ll report back in a while when I’ve wiped it and started again.
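
For completeness, once the volume is mounted again the services get started the same way they were stopped. I’m assuming the start argument to the same init script does the right thing here.

[/] # /etc/init.d/services.sh start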


QNAP – Increase RAID rebuild speed with mdadm

I recently upgraded some disks in my TS-412 NAS and the rebuilds were taking quite some time. I vaguely recalled playing with the min and max speed settings on the TS-639. Here’s a link to the QNAP forums on how to do it. The key is the min setting and, as explained in the article, it really depends on how much you want to clobber the CPU. Keep in mind, also, that you can only do so much with a 3+1 RAID 5 configuration. I had my max set to 200000 and the min set to 1000. As a result I was getting about 20MB/s, and each disk was taking a little less than 24 hours to rebuild. I bumped the min setting up to 50000, and it’s now rebuilding at about 40MB/s. The CPU is hanging at around 100%, but the NAS isn’t used that frequently.

To check your settings, use the following commands:

cat /proc/sys/dev/raid/speed_limit_max
cat /proc/sys/dev/raid/speed_limit_min

To increase the min setting, issue the following command:

echo 50000 >/proc/sys/dev/raid/speed_limit_min

And you’ll notice that, depending on the combination of disks, CPU and RAID configuration, your rebuild will go a wee bit faster than before.
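
If you want to keep an eye on the effect, /proc/mdstat shows the current rebuild speed and finish estimate, and a crude poll loop from the shell is enough. Note that these values appear to reset to their defaults after a reboot, so you may need to set them again.

while true; do cat /proc/mdstat; sleep 30; done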

QNAP – How to repair RAID brokenness – Redux

I did a post a little while ago (you can see it here) that covered using mdadm to repair a munted RAID config on a QNAP NAS. So I popped another disk recently, and took the opportunity to get some proper output. Ideally you’ll want to use the web interface on the QNAP to do this type of thing but sometimes it no worky. So here you go.

Stop everything on the box.

[~] # /etc/init.d/services.sh stop
Stop service: recycled.sh mysqld.sh atalk.sh ftp.sh bt_scheduler.sh btd.sh ImRd.sh init_iTune.sh twonkymedia.sh Qthttpd.sh crond.sh nfs smb.sh lunportman.sh iscsitrgt.sh nvrd.sh snmp rsyslog.sh qsyncman.sh iso_mount.sh antivirus.sh .
Stop qpkg service: Disable Optware/ipkg
Shutting down SlimServer...
Stopping SqueezeboxServer 7.5.1-30836 (please wait) .... OK.
Stopping thttpd-ssods .. OK.
/etc/rcK.d/QK107Symform: line 48: /share/MD0_DATA/.qpkg/Symform/Symform.sh: No such file or directory

(By the way, it really annoys me when I’ve asked software to remove itself and it doesn’t cleanly uninstall – I’m looking at you, Symform plugin.)

Unmount the volume

[~] # umount /dev/md0

Stop the array

[~] # mdadm -S /dev/md0
mdadm: stopped /dev/md0
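
Before reassembling, it can be worth looking at the suspect member’s superblock with mdadm --examine, which shows whether its metadata (and event count) still lines up with the rest of the array. In my case the member that dropped out was /dev/sdb3.

[~] # mdadm --examine /dev/sdb3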

Reassemble the volume

[~] # mdadm --assemble /dev/md0 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
mdadm: /dev/md0 has been started with 5 drives (out of 6).

Wait, wha? What about that other disk that I think is okay?

[~] # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Fri May 22 21:05:28 2009
Raid Level : raid5
Array Size : 9759728000 (9307.60 GiB 9993.96 GB)
Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
Raid Devices : 6
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Dec 14 19:09:25 2011
State : clean, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 7c440c84:4b9110fe:dd7a3127:178f0e97
Events : 0.4311172
Number   Major   Minor   RaidDevice   State
   0       8       3         0        active sync   /dev/sda3
   1       0       0         1        removed
   2       8      35         2        active sync   /dev/sdc3
   3       8      51         3        active sync   /dev/sdd3
   4       8      67         4        active sync   /dev/sde3
   5       8      83         5        active sync   /dev/sdf3

Or in other words

[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid5 sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2]
9759728000 blocks level 5, 64k chunk, algorithm 2 [6/5] [U_UUUU]
md6 : active raid1 sdf2[2](S) sde2[3](S) sdd2[4](S) sdc2[1] sda2[0]
530048 blocks [2/2] [UU]
md13 : active raid1 sdb4[2] sdc4[0] sdf4[5] sde4[4] sdd4[3] sda4[1]
458880 blocks [6/6] [UUUUUU]
bitmap: 0/57 pages [0KB], 4KB chunk
md9 : active raid1 sdf1[1] sda1[0] sdc1[4] sdd1[3] sde1[2]
530048 blocks [6/5] [UUUUU_]
bitmap: 34/65 pages [136KB], 4KB chunk
unused devices: <none>

So, when you see [U_UUUU] you’ve got a disk missing, but you knew that already. You can add it back into the array thusly.

[~] # mdadm --add /dev/md0 /dev/sdb3
mdadm: re-added /dev/sdb3

So let’s check on the progress.

[~] # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid5 sdb3[6] sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2]
9759728000 blocks level 5, 64k chunk, algorithm 2 [6/5] [U_UUUU]
[>....................] recovery = 0.0% (355744/1951945600) finish=731.4min speed=44468K/sec
md6 : active raid1 sdf2[2](S) sde2[3](S) sdd2[4](S) sdc2[1] sda2[0]
530048 blocks [2/2] [UU]
md13 : active raid1 sdb4[2] sdc4[0] sdf4[5] sde4[4] sdd4[3] sda4[1]
458880 blocks [6/6] [UUUUUU]
bitmap: 0/57 pages [0KB], 4KB chunk
md9 : active raid1 sdf1[1] sda1[0] sdc1[4] sdd1[3] sde1[2]
530048 blocks [6/5] [UUUUU_]
bitmap: 34/65 pages [136KB], 4KB chunk
unused devices: <none>
[~] #

And it will rebuild. Hopefully. Unless the disk is really truly dead. You should probably order yourself a spare in any case.
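
Before trusting the re-added disk too much, it’s also worth a glance at its SMART data. Whether smartctl is available (and whether you need a -d option for your particular controller) varies by model and firmware, so treat this as a rough sketch:

[~] # smartctl -H /dev/sdb
[~] # smartctl -A /dev/sdb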

CLARiiON Hot Spare Rebuild Progress and naviseccli

We’re upgrading our CX4-960s to FLARE 30 tonight and, after a slew (a slew being approximately equal to 4) of disk failures and replacements over the last few weeks, we’re still waiting for one of the SATA-II disks to rebuild. Fortunately, EMC has a handy knowledge base article entitled “What is a CLARiiON proactive hot spare?”, which explains how to use proactive hot spares on the CLARiiON. You can find it on Powerlink as emc150779. The side benefit of this article is that it provides details on how to query the rebuild status of hot spare disks and the RAID groups they’re sparing for.

Using naviseccli, you can get the rebuild state of the disk thusly:

I:\>naviseccli -h SP_IP_address -user username -scope 0 getdisk 3_4_3 -state -rb

Enter password:

Bus 3 Enclosure 4 Disk 3

State: Equalizing

Prct Rebuilt: 241: 100 820: 100 7056: 100 275: 100 7460: 100 7462: 100 321: 100 341: 100 270: 100 250: 32

In this case, I wanted to query the status of the disk in Bus 3 / Enclosure 4 / Disk 3. As you can see from the above example, LUN 250 is at 32% rebuilt. You can also see the status of the rebuild by looking at the properties of the LUN that is being equalized.
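
If you’d rather come at it from the LUN side, getlun can report the percentage rebuilt directly. The -prb switch here is from memory, so check the CLI reference for your FLARE release, but it looks something like this:

I:\>naviseccli -h SP_IP_address -user username -scope 0 getlun 250 -prb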

So we should be done in time for the code upgrade. I’ll let you know how that works out for us.