QNAP – How to repair RAID brokenness

I use a QNAP 639 Pro NAS at home to store my movies on. It’s a good unit and overall I’ve found it to be relatively trouble-free. I was recently upgrading the disks in the RAID set from 1TB to 2TB drives and it was going swimmingly. But then I heard a beep while the RAID was rebuilding disk 5 of 6 in the set. And I started to get some concerned e-mails from the NAS.

Server Name: qnap639
 IP Address: 256.256.256.256
 Date/Time: 2011/06/09 16:27:33
 Level:  Error
 [RAID5 Disk Volume: Drive 1 2 3 4 5 6] Error occurred while accessing Drive 3.

Server Name: qnap639
 IP Address: 256.256.256.256
 Date/Time: 2011/06/09 16:27:40
 Level:  Error
 [RAID5 Disk Volume: Drive 1 2 3 4 5 6] Error occurred while accessing the devices of the volume in degraded mode.

Server Name: qnap639
 IP Address: 256.256.256.256
 Date/Time: 2011/06/09 16:29:32
 Level:  Warning
 [RAID5 Disk Volume: Drive 1 2 3 4 5 6] Mount the file system read-only.

Server Name: qnap639
 IP Address: 256.256.256.256
 Date/Time: 2011/06/09 16:31:41
 Level:  Warning
 [RAID5 Disk Volume: Drive 1 2 3 4 5 6] Rebuilding skipped.

Basically, it looks like the NAS thought one of the disks had popped. You can see this thing all over the QNAP forums – here‘s a good example – and it’s usually because of incompatibility between the QNAP firmware and various green hard disks. but I’d checked that my disks were on the Official QNAP HCL, and, well, that couldn’t be it. So I rebooted a bunch of times and ran S.M.A.R.T. scans on the allegedly failed disk. I pulled it out and erased it on an XP box and put it back in. The NAS wanted no part of it though. So it was time to get dirty with mdadm.

Firstly, make sure there’s nothing going on on the NAS, stop the running services and unmount the RAID device.

/etc/init.d/services.sh stop
umount /dev/md0

Once the volume’s unmounted you can stop the volume.

mdadm -S /dev/md0

Now for the bit where you hold your breath for a while – the reassembly of the volume with the components you want.

mdadm --assemble /dev/md0 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3

To see the progress, you can access a couple fo different commands.

mdadm --detail /dev/md0
cat /proc/mdstat

Once that’s complete, it’s best to run a filesystem check.

e2fsck -f /dev/md0

If there’re no errors, mount the volume and check that your stuff is still there.

mount /dev/md0 /share/MD0_DATA

I then rebooted and confirmed that everything started up correctly and my data was still there. But when I added the 6th drive, I got an error about a missing superblock and there didn’t seem to be any mdadm magic that would solve this problem. So like I good admin I rebooted, and the NAS started rebuilding the volume with the 6th disk. Now if I can only fix the problem where smbd kills the CPU and disconnects guests from the share we’ll be gold.