Happy 2019. I’ve been on holidays for three full weeks and it was amazing. I’ll get back to writing about boring stuff soon, but I thought I’d post a quick summary of some issues I’ve had with my home-built NAS recently and what I did to fix them.
Where Are The Disks Gone?
I got an email one evening with the following message.
I do enjoy the “Faithfully yours, etc” and the postscript is the most enlightening bit. See where it says [UU____UU]? Yeah, that’s not good. There are 8 disks that make up that device (/dev/md0), so it should look more like [UUUUUUUU]. But why would 4 out of 8 disks just up and disappear? I thought it was a little odd myself. I had a look at the ITX board everything was attached to and realised that those 4 drives were plugged into a PCI SATA-II card. It seems that either the slot on the board or the card is now failing intermittently. I say “seems” because that’s all I can think of, as the S.M.A.R.T. status of the drives is fine.
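If you'd rather not count underscores by eye, here's a minimal sketch of a shell helper that does it for you. The function name count_missing is my own invention for illustration; the only thing it relies on is the convention that each U in the status string is a healthy member and each _ is a missing or failed one.

```shell
# Count missing members in an md status string such as [UU____UU].
# 'U' marks a member that is up; '_' marks one that is missing or failed.
# count_missing is a hypothetical helper name, not part of mdadm.
count_missing() {
    # Delete everything except '_', count what is left, and strip the
    # padding that some wc implementations add.
    printf '%s' "$1" | tr -cd '_' | wc -c | tr -d ' '
}

count_missing '[UU____UU]'   # prints 4
```

Anything other than 0 means the array is running degraded, or not running at all.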
The short-term fix to get the filesystem back online and usable was the classic “assemble” switch with mdadm. Long-time readers of this blog may have witnessed me doing something similar with my QNAP devices from time to time. After panic-rebooting the box a number of times (a silly thing to do, really), it finally responded to pings. Checking out /proc/mdstat wasn’t good though.
dan@openmediavault:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
unused devices: <none>
Notice the lack of, erm, devices there? That’s non-optimal. The fix requires a forced assembly of the devices comprising /dev/md0.
dan@openmediavault:~$ sudo mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcdefhi]
[sudo] password for dan:
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdi is identified as a member of /dev/md0, slot 6.
mdadm: forcing event count in /dev/sdd(2) from 40639 upto 40647
mdadm: forcing event count in /dev/sdc(3) from 40639 upto 40647
mdadm: forcing event count in /dev/sdf(4) from 40639 upto 40647
mdadm: forcing event count in /dev/sde(5) from 40639 upto 40647
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdd
mdadm: clearing FAULTY flag for device 2 in /dev/md0 for /dev/sdc
mdadm: clearing FAULTY flag for device 5 in /dev/md0 for /dev/sdf
mdadm: clearing FAULTY flag for device 4 in /dev/md0 for /dev/sde
mdadm: Marking array /dev/md0 as 'clean'
mdadm: added /dev/sdb to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdc to /dev/md0 as 3
mdadm: added /dev/sdf to /dev/md0 as 4
mdadm: added /dev/sde to /dev/md0 as 5
mdadm: added /dev/sdi to /dev/md0 as 6
mdadm: added /dev/sdh to /dev/md0 as 7
mdadm: added /dev/sda to /dev/md0 as 0
mdadm: /dev/md0 has been started with 8 drives.
In this example you’ll see that /dev/sdg isn’t included in my command. That device is the SSD I use to boot the system. Sometimes Linux device conventions confuse me too. If you’re in this situation and you think this is just a one-off thing, then you should be okay to unmount the filesystem, run fsck over it, and re-mount it. In my case, this has happened twice already, so I’m in the process of moving data off the NAS onto some scratch space and have procured a cheap little QNAP box to fill its role.
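For what it's worth, the unmount, fsck and re-mount sequence can be sketched roughly like this. Treat it as a sketch under assumptions rather than exactly what I ran: the mount point /srv/nas is hypothetical (substitute wherever /dev/md0 lives on your box), and the run and check_array_fs helpers are names I've made up. By default it only prints the commands so you can eyeball them first.

```shell
# Hypothetical mount point; substitute wherever your array is mounted.
MOUNT_POINT="${MOUNT_POINT:-/srv/nas}"
ARRAY="${ARRAY:-/dev/md0}"
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unmount, check, and re-mount. Each step only runs if the previous one
# succeeded. fsck exits non-zero whenever it found or fixed something, so
# in that case the re-mount is skipped and left for you to decide.
check_array_fs() {
    run sudo umount "$MOUNT_POINT" &&
    run sudo fsck "$ARRAY" &&
    run sudo mount "$ARRAY" "$MOUNT_POINT"
}
```

Calling check_array_fs prints the three commands in order. Once you're happy with them, DRY_RUN=0 check_array_fs runs them for real.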
My rush to replace the homebrew device with a QNAP isn’t a knock on the OpenMediaVault project by any stretch. OMV itself has been very reliable and has done everything I needed it to do. Rather, my ability to build semi-resilient devices on a budget has simply proven quite poor. I’ve seen some nasty stuff happen with QNAP devices too, but at least any issues will be covered by some kind of manufacturer’s support team and warranty. My NAS is only covered by me, and I’m just not that interested in working out what could be going wrong here. If I’d built something decent I’d get some alerting back from the box telling me what’s happened to the card that keeps failing. But then I would have spent a lot more on this box than I would have wanted to.
I’ve been lucky thus far in that I haven’t lost any data of real import (the NAS devices are used to store media that I have on DVD or Blu-Ray – the important documents are backed up using Time Machine and Backblaze). It is nice, however, that a tool like mdadm can bring you back from the brink of disaster in a pretty efficient fashion.
Incidentally, if you’re a macOS user, you might have a bunch of .DS_Store files on your filesystem. Or stuff like .@Thumb or some such. These things are fine, but macOS doesn’t seem to like them when you’re trying to move folders around. This post provides some handy guidance on how to get rid of those files in a jiffy.
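One common way to do that kind of clean-up is with find. The following is a minimal sketch and not necessarily the method from the linked post; clean_ds_store is a made-up helper name and it only deals with .DS_Store files. It's worth running plain find with -print first if you're nervous about what it will match.

```shell
# Remove .DS_Store files under a directory tree (defaults to the current
# directory). clean_ds_store is a hypothetical helper name.
clean_ds_store() {
    dir="${1:-.}"
    # -type f limits the match to regular files; -print lists each match
    # and rm -f then removes them.
    find "$dir" -type f -name '.DS_Store' -print -exec rm -f {} +
}
```

Point it at the top of the share you're tidying up, for example clean_ds_store /path/to/share, and it lists each file it deletes.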
As always, if the data you’re storing on your NAS device (be it home-built or off the shelf) is important, please make sure you back it up. Preferably in a number of places. Don’t get yourself in a position where this blog post is your only hope of getting your one copy of your firstborn’s pictures from the first day of school back.