I’ve broken this post into three parts, not because I’m trying to be a tease, but so it’s a little easier to digest and you can head straight for the money shot if you so desire. Part 1 covers the initial failure and subsequent troubleshooting. Part 2 covers the eventual workaround to the problem. Part 3 covers the work that needed to be done once the problem was resolved.
It’s strange to think of EMC’s CX700 CLARiiON array as a “legacy” array. Yet it’s now two generations behind EMC’s flagship mid-range array, the CX4-960. Our project was given access to two CX700s to use as test arrays for a multi-site data centre project we’re working on. That’s cool, as the CX700 is still a reasonably well-specced array, with multiple back-end loops and a fair bit of usable cache (at least compared to the CX4-120). So after the data centre guys MacGyvered the kit into racks that were too big for the rails, we cabled the lot up and thought it would be a fairly trivial process to get everything up and running.
As usual, I was wrong. The department that had provided these hand-me-down arrays had bought a service from EMC whereby the data was securely erased. For those of you playing at home, this is known as the “Certified Data Erasure Service”, and you can grab the datasheet from here. So these arrays were saved from the scrapheap, but not before they were rendered basically unbootable.
When we powered them up, we got the following output via the terminal:
Disk 2 Read Error 0x00000187
Number Sectors: 1
LBA: 0x0002284B
Buffer: 0x1000A114
DDBS: Can’t read MDB from first disk.
DDBS: Can’t read MDB from second disk.
DDBS: Using first disk for boot – but inaccessible.
FLARE image (0x00400007) located at sector LBA 0x0002284C
Disk Set: 0
ErrorCode: 0x0000018D
ErrorDesc:
Device: BOOT PATH
FRU: STORAGE PROCESSOR
Description: Dual-Mode Fibre Driver Exchange Error!
DualMode Driver Exchange Status: 0x1000000C
Target ID: 0x00
EndError:
ErrorTime: 01/19/2010 05:07:11
(the full boot log can be found here)
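If you want to compare the failure records from several boot attempts, the block between ErrorCode and EndError is regular enough to pull apart with a few lines of Python. This is only a rough sketch: it assumes you’ve captured the console output as text, the function name parse_error_records is made up for illustration, and the field layout is inferred purely from the output above rather than from any EMC documentation.

```python
def parse_error_records(log_text):
    """Return one dict per ErrorCode..EndError block found in log_text.

    Assumption: every field sits on its own "Key: value" line, as in the
    console capture above. This is not an official EMC format description.
    """
    records = []
    current = None  # the record being built, or None when outside a block
    for raw_line in log_text.splitlines():
        key, sep, value = raw_line.strip().partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "ErrorCode":
            current = {key: value}  # a new error record starts here
        elif key == "EndError" and current is not None:
            records.append(current)
            current = None
        elif key == "ErrorTime" and current is None and records:
            # In the capture, ErrorTime is printed just after EndError.
            records[-1][key] = value
        elif current is not None:
            current[key] = value
    return records


# Sample text copied from the boot output above.
SAMPLE = """\
FLARE image (0x00400007) located at sector LBA 0x0002284C
Disk Set: 0
ErrorCode: 0x0000018D
ErrorDesc:
Device: BOOT PATH
FRU: STORAGE PROCESSOR
Description: Dual-Mode Fibre Driver Exchange Error!
DualMode Driver Exchange Status: 0x1000000C
Target ID: 0x00
EndError:
ErrorTime: 01/19/2010 05:07:11
"""

for record in parse_error_records(SAMPLE):
    print(record["ErrorCode"], record["Device"], record["Description"])
```

Lines outside an error block (the DDBS messages, for instance) are simply ignored, so you can feed it the whole boot log.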
So I tried booting to the utility partition via the DDBS submenu. You would be familiar with this submenu if you’ve ever performed an out-of-family array conversion. It’s where you go to run the conversion image over the new array and tell FLARE that its brain is, er, bigger. In any case, this can be accessed by pressing ESC during the initial POST and then typing in “DB_key”. Note that on newer CX3 and CX4 arrays, you don’t press ESC anymore; CTRL-C is used to break the boot instead. You’ll then be presented with menus that look something like this:
Copyright (c) EMC Corporation , 2007
Disk Array Subsystem Controller
Model: CX700: SAN GBFCC4
DiagName: Extended POST
DiagRev: Rev. 02.39
Build Date: Fri Jul 13 16:36:03 2007
StartTime: 01/19/2010 05:38:05
SaSerialNo: LKE00051202843
AabcdefgBCDEabFabcdGHabIabcJabKabLab
EndTime: 01/19/2010 05:38:20
…. Storage System Failure – Contact your Service Representative …
******
****** Aborting!!!! ******
Hit ESC to begin running diagnostic menu…
Diagnostic Menu
1) Reset Controller 3) DDBS Service Sub-Menu
2) Display Warnings/Errors 4) FCC Boot Sub-Menu
Enter Option : 3
So I selected option 3, then attempted the Utility Partition Boot and got the following:
1) Drive Slot ID Check 2) Utility Partition Boot
0) Exit
Enter Option : 2
DDBS: K10_REBOOT_DATA: Count = 0
DDBS: K10_REBOOT_DATA: State = 0
DDBS: K10_REBOOT_DATA: ForceDegradedMode = 0
DDBS: Read default MDDE off disk 1
DDBS: MDDE (Rev 2) on disk 1
DDBS: Read default DDE (0x40000F) off disk 1
DDBS: Read default MDDE off disk 3
DDBS: MDDE (Rev 2) on disk 3
DDBS: Read default DDE (0x400010) off disk 3
DDBS: Can’t read MDB from first disk.
DDBS: Can’t read MDB from second disk.
DDBS: Using first disk for boot – but inaccessible.
Utility Partition image (0x0040000F) located at sector LBA 0x00BE804C
Disk Set: 1
ErrorCode: 0x00000187
ErrorDesc:
Device: DIAG MENU
FRU: STORAGE PROCESSOR
Description: Disk not logged in Error!
Target ID: 0x01
Targets Found: 0xF000FF53
EndError:
ErrorTime: 01/19/2010 05:39:52
(the full boot log can be found here)
Okay, so that’s not cool. I had hoped that I would be able to boot from the Utility Partition, because the process to load the Recovery Image either from the repository or via FTP is fairly simple. At this point we started to think of a number of wacky alternatives, including, but not limited to, reconstructing the FLARE disks from another CX700’s hot spares, using the Vault disks from a CX300 and performing an in-place conversion to a CX700, and begging and pleading with our local EMC office for a Vault pack. None of these options really struck us as awesome ideas. Read Part 2 for the solution to the problem.