EMC – CX4 FAST Cache cosmetic issues and using /debug

I noticed that one of our CX4s was exhibiting some odd behaviour the other day. Looking at the System Information window, FAST Cache appeared to be broken. Here’s a picture of it.

Going to the FAST Cache tab on System Properties yielded the same result, as did the output of naviseccli (using naviseccli -h IPaddress cache -fast -info). Interestingly, though, it was still showing up with dirty pages.
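
For reference, the full command looks something like this (substitute your own SP address and credentials; the -disks and -status switches just add per-disk and state detail):

naviseccli -h <SP-IP-address> cache -fast -info -disks -status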

We tried recreating it, but the 8 * 100GB EFDs we were using for FAST Cache weren’t available. So we logged a call and, after a bit of back and forth with support, worked out how to fix it. A few things to note first, though. Firstly, if support tell you that FAST Cache can’t be used because you’re using EFDs, not SSDs, ask to have the call escalated. Secondly, the solution I’m showing here fixes the specific problem we had. If you frig around with the tool you may end up causing yourself more pain than it’s worth.

So, to fix the problem we had, we needed to log in to the /debug page on the CX4. To do this, go to http://<yourSPaddress>/debug.

You’ll need your Navisphere or LDAP credentials to gain access. Once you’ve logged in, the page should look something like the following (paying particular attention to the warning).

Now scroll down until you get to “Force A Full Poll”. Click on that and wait a little while.

Once this is done, you can log back into Unisphere and FAST Cache should look normal again.

Hooray!

EMC – DIY Heatmaps – Updated Version

Mat has updated the DIY Heatmaps for EMC CLARiiON and VNX arrays to version 3.021. You can get it from the Utilities page here. Any and all feedback welcome. Changes below:

Add command line options:

--min_color --mid_color --max_color

To allow the user to select different color schemes for their heatmap graphs. The available colors to choose from are red, green, blue, yellow, cyan, magenta, purple, orange, black, and white.

--steps

Changes the granularity of the heatmap steps. For example, on an attribute like % Utilization, if steps is set to 20 there will be separate color bands for 0-4%, 5-9%, 10-14%, and so on. The default is 10, so the color bands will be at 0-9%, 10-19%, 20-29%, etc.

--detail_data

This option allows you to display a detailed heat graph for an object over time when it has been selected. For example, selecting the SP-B heatmap object below produces a heat graph for that object over the duration of the NAR file. Thanks to Ian for the idea and code behind this.
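
To give you a feel for how the new switches hang together, here’s a rough example invocation. Note that the script name and the NAR file argument here are placeholders of mine rather than the script’s actual syntax – check the included documentation for the real usage:

heatmap.pl analyser_data.nar --min_color blue --mid_color yellow --max_color red --steps 20 --detail_data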

There have also been some other script improvements:

Exit code checking is now performed after running naviseccli.

Browser compatibility fixes – mainly with Chrome, but these should improve display consistency across different browser platforms.

EMC – Broken Vault drive munts FAST Cache

Mat sent me an e-mail this morning, asking “why would FAST Cache be degraded after losing B0 E0 D2 in one of the CX4-960s?”. For those of you playing at home, 0_0_2 is one of the Vault disks in the CX4 and VNX. Here’s a picture of the error:

Check out the 0x7576 that pops up shortly after the array says there’s a faulted disk. Here’s a closeup of the error:

Weird, huh?  So here’s the output of the naviseccli command that will give you the same information, but with a text-only feel.

"c:/Program Files/EMC/Navisphere CLI/NaviSECCli.exe"  -user Ebert -scope 0 -password xx -h 255.255.255.255  cache -fast -info -disks -status
Disks:
Bus 0 Enclosure 7 Disk 0
Bus 2 Enclosure 7 Disk 0
Bus 0 Enclosure 7 Disk 1
Bus 2 Enclosure 7 Disk 1
Bus 1 Enclosure 7 Disk 1
Bus 1 Enclosure 7 Disk 0
Bus 3 Enclosure 7 Disk 1
Bus 3 Enclosure 7 Disk 0
Mode:  Read/Write
Raid Type:  r_1
Size (GB):  366
State:  Enabled_Degraded
Current Operation:  N/A
Current Operation Status:  N/A
Current Operation Percent Completed:  N/A

So what’s with the degraded cache? The reason for this is that FAST Cache stores a small database on the first 3 drives (0_0_0, 0_0_1, 0_0_2). If any of these disks fails, FAST Cache flushes to disk and goes into a degraded state. It shouldn’t, though, because the database is triple-mirrored. And what does it mean exactly? It means your FAST Cache is not processing writes at the moment, which is considered “bad darts”.

This is a bug. Have a look on Powerlink for emc267579. Hopefully this will be fixed in R32 for the VNX; I couldn’t see details about the CX4 though. I strongly recommend that if you’re a CX4 user and you experience this issue, you raise a service request with your local EMC support mechanisms as soon as possible. The only way they get to know the severity of a problem is if people in the field feed back issues.

EMC – DIY Heatmaps – Updated Version

Mat has put together an updated version of the heatmaps script for CLARiiON, with LUN info and good things like that. You can download it here. Updated release notes can be found here. A sample of the output is here. Enjoy, and feel free to send requests for enhancements.

EMC – FAST Cache and LUN expansion or shrink operations

Someone on twitter asked me about a white paper they were reading on the EMC site recently that suggested that LUN expansion or shrink operations would require that FAST Cache be disabled. The white paper in question is located here. For those of you loitering on Powerlink the EMC Part Number is h8046.7. In any case, on page 8 it covers a number of requirements for using FAST Cache – most of which seem fairly reasonable. However, this one kind of got my attention (once my attention was drawn to it by @andrewhatfield) – “Once FAST Cache has been created, expansion or shrink operations require disabling the cache and re-creating the FAST Cache“. Wow. So if I want to do a LUN expansion I need to delete and re-create FAST Cache once it’s complete? Seriously? I informally confirmed this with my local Account TC as well.

It takes a while to create FAST Cache on a busy VNX. It takes even longer to disable it on a busy system. What a lot of piss-farting around to do something which used to be a fairly trivial operation (the expansion I mean). Now, I’ll be straight with you, I haven’t had the opportunity to test what happens if I don’t disable FAST Cache before I perform these operations. Knowing my luck the damn thing will catch on fire. But it’s worth checking this document out before you pull the trigger on FAST Cache.

[Edit: Or maybe they mean if you want to expand or shrink the FAST Cache? Because that makes sense. I hope that’s what they mean.]

[Edit #2: Joe (@virtualtacit) kindly clarified that this requirement relates to the shrinking or expansion of FAST Cache, not LUNs. My bad! Nothing to see here, move along :)]
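
For completeness, if you do need to resize FAST Cache, the process is a destroy followed by a create with the new disk list. Something like the following (the disk IDs are examples only, and the destroy has to flush any dirty pages first, so it can take a long time on a busy array):

naviseccli -h <SP-IP-address> cache -fast -destroy
naviseccli -h <SP-IP-address> cache -fast -create -disks 0_1_6 1_1_6 2_1_6 3_1_6 -mode rw -rtype r_1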

Updated Article – Storage Design Principles

I’ve updated the Storage Design Principles document with a brief discussion on how expanding FAST VP pools can be “teh suck”, and some brief information on IBM SDD. Tell your friends.

New Article – Storage Design Principles

I’ve added another new article to the articles section of the blog. This one is basically a high level storage design principles document aimed at giving those not so familiar with midrange storage design a bit of background on some of the things to consider when designing for VNX. It’s really just a collection of notes from information available elsewhere on the internet, so make of it what you will. As always, your feedback is welcome.

New Article – VNX5700 Configuration Guidelines

I’ve added a new article to the articles section of the blog. This one is basically a rehash of the recent posts I did on the VNX7500, but focussed on the VNX5700 instead. As always, your feedback is welcome.

EMC – Configure FAST Cache disks with naviseccli

I’m sorry I couldn’t think of a fancy title for this post, but did you know you can configure FAST Cache with naviseccli? I can’t remember whether I’ve talked about this before or not. So just go with it. This one’s quick and dirty, by the way. I won’t be talking about where you should be putting your EFDs in the array. That really depends on the model array you have and the number of EFDs at your disposal. But don’t just go and slap them in any old way. Please, think of the children.

To use FAST Cache, you’ll need:

  • The FAST Cache enabler installed;
  • EFD disks that are not in a RAID group or Storage Pool;
  • To have configured the FAST Cache (duh);
  • The correct number of disks for the model of CLARiiON or VNX you’re configuring; and
  • To have enabled FAST Cache for the RAID group LUNs and/or the pools with LUNs that will use FAST Cache.
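
On that last point, Unisphere is the easy way to do it, but it can also be done from the CLI. From memory the switches look something like the following – treat these as an assumption and check the CLI reference for your FLARE / VNX OE release before relying on them:

naviseccli -h <SP-IP-address> storagepool -modify -id 0 -fastcache on
naviseccli -h <SP-IP-address> chglun -l 42 -fastcache 1

The first enables FAST Cache for a Storage Pool (and therefore the LUNs in it), the second for an individual RAID group LUN.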

Basically, you can run the following switches after the standard naviseccli -h sp-ip-address:

cache -fast -create – this creates FAST Cache.

cache -fast -destroy – this destroys FAST Cache.

cache -fast -info – this displays FAST Cache information.

When you create FAST Cache, you have the following options:

cache -fast -create -disks disksList [-rtype raidtype] [-mode ro|rw] [-o]

Here is what the options mean:

-disks disksList – You need to specify what disks you’re adding, or it no worky. Also, pay close attention to the order in which you bind the disks.

-mode ro|rw – ro is read-only mode and rw is read/write mode.

-rtype raidtype – I don’t know why this is in here, but valid RAID types are disk and r_1.

-o – Just do it and stop asking questions!

naviseccli -h sp-ip-address cache -fast -create -disks 0_1_6 1_1_6 -mode rw -rtype r_1

In this example I’ve used disks on Bus 0, Enclosure 1, Disk 6 and Bus 1, Enclosure 1, Disk 6.

Need info about what’s going on? Use the following command:

cache -fast -info [-disks] [-status] [-perfData]

I think -perfData is one of the more interesting options here.
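
If you want to see the performance counters, the command looks something like this:

naviseccli -h <SP-IP-address> cache -fast -info -perfData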

EMC – Other VNX Configuration Guidelines that may be useful

Firstly, apologies for the recent lack of posts. I’ve been on holidays and then started a new job and it’s all been not very related to this blog. Secondly, while it was tempting to call this post part 5 in the VNX7500 series, these configuration guidelines work well for most of the VNX range of arrays, not just the 7500. Thirdly, forgive me if I’ve said some of this stuff before. And finally, yes, I know I promised I’d upload some sample designs and talk about them, and I promise I will. Soon. Or soonish. So, in no particular order, here’s a list of things that you should keep in mind when designing solutions around the VNX.

Note that Pool-based LUNs, EFD-based LUNs, FAST VP LUNs, and FAST Cached LUNs do not benefit from file system defragmentation in the way traditional LUNs do. This might require a bit of education on the part of the system administrators – because you know they loves them some defragmentation action.

When configuring FAST Cache on an array, it is important to locate the primary and secondary drives of each RAID 1 pair on different Back End ports. The order the drives are added into FAST Cache is the order in which they are bound, so pay attention when you do this. Disabling FAST Caching of Private LUNs is recommended (these include the WIL, Clone Private LUNs and Reserved LUN Pool LUNs). However, you shouldn’t disable FAST Cache for MetaLUN components.

If you’re using EFDs for “Tier 0”, you’ll get good performance with up to 12 EFDs per Back End port. But if you’re on the hunt for the highest throughput, it is recommended that this number be kept to about 5.

It is recommended that you use RAID 6 with NL-SAS drives of 1TB or greater. This has some interesting implications for FAST VP heterogeneous Pool configurations and the use of 15 vs 25-disk DAEs. I’m hoping to put together a brief article on ways around that in the next week or so.

When architecting for optimal response time, limit throughput to about 70% of the following values:

It is considered prudent to plan on using about 2/3 of the available IOPS during normal operation – this will give you some margin for burst and degraded mode operation.
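
As a rough worked example (using an illustrative rule-of-thumb figure rather than anything out of the white papers), if you size a 15K SAS drive at around 180 IOPS, planning at 2/3 leaves you roughly 120 IOPS per drive to design against, with the rest held in reserve for bursts and rebuilds.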

When it comes to fancy RAID Group configurations, EMC recommend that a single DAE should be the default method for RAID Group provisioning. If you use vertical provisioning, make sure that: for RAID 5, at least 2 drives per port are in the same DAE; for RAID 6, 3 drives are in the same DAE; and for RAID 1/0, both drives of a mirrored pair are on separate Back End ports. It should be noted that parity RAID Groups of 10 drives or more can benefit from binding across 2 Back End ports – this reduces rebuild times when you pop a disk.

Finally, it should be noted that you can’t use Vault drives in a FAST VP pool. I still prefer to not use them for anything.