Mat has updated the DIY Heatmaps script to support SAS-type Flash drives. Download it from here, take it for a spin and let us know what you think. And tell your friends.
Mat came across a weird problem with Unisphere the other day while he was trying to retrieve some NAR files for EMC to look at. Normally I like to post solutions up here, but in this case I don’t know what the solution is. Previously, it was my understanding that we could retrieve NAR files from multiple arrays in the same domain via the one Unisphere session. This doesn’t seem to be the case anymore. As background, we run 4 CX4-960s and a CX4-240 in a single Unisphere domain. These arrays were all upgraded to FLARE 30.525 recently.
Normally, I’d login to one of the arrays and go to Monitoring -> Analyzer and then Retrieve Archive. So far, so good.
But when I change the “Retrieve Archive From:” selection, I get the following.
Notice that I’m trying to retrieve files from the array with serial number 0260, but I’m still seeing 0679, even after hitting Refresh. Maybe it’s Unisphere on that array; let’s try another.
Ok, I’ve logged on to array 0260 and can now retrieve the files I want. But do I still get weird behaviour? Yes, yes I do.
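As a workaround, the same per-array retrieval can be scripted with naviseccli rather than relying on the domain-wide Unisphere dialog. A minimal sketch follows; the SP addresses are made up, and the exact analyzer -archive options should be checked against your FLARE release before running anything:

```python
# Sketch: build naviseccli commands to pull NAR archives directly from
# each array's SP, avoiding the cross-array "Retrieve Archive" dialog.
# The SP addresses are hypothetical, and the analyzer -archive option
# syntax should be verified against your FLARE release.

ARRAYS = {
    "0679": "192.168.1.10",   # example SP A address (made up)
    "0260": "192.168.1.20",   # example SP A address (made up)
}

def retrieve_cmd(serial, dest="C:\\nar"):
    """Return a naviseccli command list to retrieve archives from one array."""
    sp = ARRAYS[serial]
    return ["naviseccli", "-h", sp, "analyzer", "-archive",
            "-path", dest, "-all"]

if __name__ == "__main__":
    for serial in ARRAYS:
        print(" ".join(retrieve_cmd(serial)))
```

Run per array and you sidestep whichever array Unisphere has decided you really meant.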
If anyone has any ideas, I’m all ears. I think Mat is going to log a call at some stage.
Mat has updated the DIY Heatmaps for EMC CLARiiON and VNX arrays to version 4.01. You can get it from the Utilities page. Any and all feedback welcome.
Updates and Changes to the script
- Added database storage / retrieval for performance stats. The database size will be approximately 2.1 x the size of the NAR files, based on the default interval of 30 minutes. On my PC it took a bit over 9 hours to process 64 NAR files into a database; the NAR files totalled 1.95GB and the resulting database was 4.18GB. However, running the script over the database to produce a heatmap only takes seconds.
- Changed to use temporary tables for transitional data. This should slightly reduce the size of the database file, as the temporary data is not written to disk.
- Changed the way the script processes multiple NAR files. The script previously bunched all NAR files into a single naviseccli process, which was problematic when processing multiple large NAR files; the script now processes them one at a time.
- Added command line options:
--output_db Output the processed NAR file to the nominated database
--input_db Use the nominated database as the source of data for the heatmap
--s_date Specify a start date/time; must be in the format “mm/dd/yyyy hh:mm:ss” (with quotes if specifying both date and time)
--e_date Specify an end date/time in the same format
--retrieve_all_nar When retrieving NAR files from the array, retrieve all NAR files (it won’t overwrite files already downloaded)
--process_only_new If you are downloading NAR files, only process files that haven’t been downloaded previously
--max_nar_files Set the maximum number of files to download and process
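For reference, the new options map onto something like the argument definitions below. This is a hypothetical Python mirror of the interface, not the script’s actual code; only the flag names and the date format come from the list above, everything else is illustrative:

```python
import argparse
from datetime import datetime

def nar_datetime(value):
    """Validate the "mm/dd/yyyy hh:mm:ss" (or date-only "mm/dd/yyyy") format."""
    for fmt in ("%m/%d/%Y %H:%M:%S", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise argparse.ArgumentTypeError(f"bad date/time: {value!r}")

# Interface sketch only -- the real script's parsing will differ.
parser = argparse.ArgumentParser(description="DIY Heatmaps (interface sketch)")
parser.add_argument("--output_db", help="write processed NAR data to this database")
parser.add_argument("--input_db", help="read heatmap data from this database")
parser.add_argument("--s_date", type=nar_datetime,
                    help='start date/time, "mm/dd/yyyy hh:mm:ss"')
parser.add_argument("--e_date", type=nar_datetime,
                    help="end date/time, same format")
parser.add_argument("--retrieve_all_nar", action="store_true",
                    help="retrieve all NAR files (skips already-downloaded files)")
parser.add_argument("--process_only_new", action="store_true",
                    help="only process newly downloaded NAR files")
parser.add_argument("--max_nar_files", type=int,
                    help="cap the number of files downloaded and processed")

args = parser.parse_args(["--s_date", "01/31/2012 09:30:00", "--max_nar_files", "10"])
print(args.s_date, args.max_nar_files)
```

The quotes matter for --s_date and --e_date because the date and time are separated by a space, so the shell will otherwise split them into two arguments.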
Please let us know if you find any bugs or problems with the script, or if you have any further suggestions for changes and enhancements.
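Incidentally, the database sizes quoted above line up with the stated 2.1x ratio; a quick back-of-the-envelope check:

```python
# Sanity check of the ~2.1x database:NAR size ratio quoted in the changes.
nar_gb = 1.95          # total size of the 64 NAR files processed
ratio = 2.1            # claimed database:NAR size ratio
actual_gb = 4.18       # observed database size

estimate_gb = nar_gb * ratio
print(f"estimated DB size: {estimate_gb:.2f} GB")
print(f"observed ratio: {actual_gb / nar_gb:.2f}")
```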
Just a quick one to start the year off on the right note. I was installing updated Utility Partition software on our lab CX4s today and noticed that USM was a bit confused as to when it had started installing a bit of the code. Notice the Time started and Time elapsed section. Well, I thought it was amusing.
Mat has been trying to create a 42TB LUN to use temporarily for Centera backups. I don’t want to go into why we’re doing Centera backups, but let’s just say we need the space. He created a Storage Pool on one of the CX4-960s, using 28 2TB spindles and 6+1 private RAID Groups. However, when he tried to bind the LUN, he got the following error.
Weird. So what if we set the size to 44000GB?
No, that doesn’t work either. Turns out, I should really read some of the stuff that I post here, like my article entitled “EMC CLARiiON VNX7500 Configuration guidelines – Part 1”, where I mention that the maximum size of a Pool LUN is 16TB. I was wrong in any case, as it looks more like it’s 14TB. Seems like we’ll be using RAID Groups and MetaLUNs to get over the line on this one.
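The arithmetic behind falling back to RAID Groups and MetaLUNs looks roughly like this. A sketch only: the ~1.8TB usable figure for a “2TB” drive after formatting is my assumption, as is treating 14TB as the hard pool LUN cap:

```python
import math

# Rough sizing sketch for a 42TB MetaLUN built from 6+1 RAID 5 groups of
# 2TB drives. The usable-per-drive figure (~1.8TB after formatting) and
# the 14TB pool LUN cap are assumptions based on the post, not quoted specs.
TARGET_TB = 42
POOL_LUN_MAX_TB = 14            # apparent maximum pool LUN size
USABLE_PER_DRIVE_TB = 1.8       # approx. formatted capacity of a "2TB" drive
DATA_DRIVES_PER_RG = 6          # 6+1 RAID 5: six data drives per group

rg_usable_tb = DATA_DRIVES_PER_RG * USABLE_PER_DRIVE_TB
components = math.ceil(TARGET_TB / rg_usable_tb)

print(f"single pool LUN OK? {TARGET_TB <= POOL_LUN_MAX_TB}")
print(f"usable per 6+1 RG: {rg_usable_tb:.1f} TB")
print(f"RAID Group components needed: {components}")
```

So a single pool LUN is out by a wide margin, but four or so RAID-group component LUNs striped into a MetaLUN gets there.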
I sometimes get asked what the definition of these states is, and I frequently have trouble defining it clearly. Fortunately, EMC’s MirrorView Knowledgebook has been updated to incorporate Release 32, and Appendix A has some succinct definitions. If you can’t be bothered looking them up for yourself, here they are.
- Synchronized – The secondary image is identical to the primary. This state persists only until the next write to the primary image, at which time the image state becomes Consistent.
- Consistent – The secondary image is identical to either the current primary image or to some previous instance of the primary image. This means that the secondary image is available for recovery when you promote it.
- Synchronizing – The software is applying changes to the secondary image to mirror the primary image, but the current contents of the secondary are not known and are not usable for recovery.
- Out-of-Sync – The secondary image requires synchronization with the primary image. The image is unusable for recovery.
- Rolling Back (MV/A only) – A successful promotion occurred where there was an unfinished update to the secondary image. This state persists until the Rollback operation completes.
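Taken together, the useful thing to extract from these definitions is which states leave the secondary usable for recovery. Here’s one way to capture that; the state names come from the Knowledgebook, but the table itself is my summary, not an EMC API:

```python
# Which MirrorView image states leave the secondary usable for recovery.
# This restates the Knowledgebook definitions above; it is my summary,
# not anything EMC ships.
RECOVERABLE = {
    "Synchronized": True,    # identical to the primary
    "Consistent": True,      # matches primary or a previous instance of it
    "Synchronizing": False,  # contents in flux, not usable
    "Out-of-Sync": False,    # needs synchronization first
    "Rolling Back": False,   # MV/A only; wait for the rollback to complete
}

def can_promote(state):
    """Return True if a secondary in this state is usable for recovery."""
    return RECOVERABLE.get(state, False)
```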
I think one of the key things here is to pay attention to the various image states, particularly if you’re seeing a lot of out-of-sync states on your secondaries. You don’t want to have to explain to people why they can’t recover secondaries in the event of a serious failure. And, more importantly, get on Powerlink and check out the MirrorView Knowledgebook (H2417).
I’ve been commissioning some new CX4-960s recently (it’s a long story), and came across a few things that I’d forgotten about for some reason. If you’re running older disks, and they get replaced by EMC, there’s a good chance they’ll be a higher capacity. In our case I was creating a storage pool with 45 300GB FC disks and kept getting the following error.
This error was driving me nuts for a while, until I realised that one of the 300GB disks had, at some point, been replaced with a 450GB drive. Hence the error.
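A quick pre-flight check along these lines would have saved me some head-scratching. A sketch, with an invented disk list:

```python
from collections import Counter

# Pre-flight check: flag mixed drive capacities before trying to create
# a storage pool. The disk list below is invented for illustration.
def mixed_capacities(disks_gb):
    """Return the odd capacities out, if any (empty dict means uniform)."""
    counts = Counter(disks_gb)
    if len(counts) <= 1:
        return {}
    majority = counts.most_common(1)[0][0]
    return {cap: n for cap, n in counts.items() if cap != majority}

disks = [300] * 44 + [450]   # one 300GB disk quietly replaced with a 450GB drive
print(mixed_capacities(disks))   # → {450: 1}
```

Feed it the capacities reported for your candidate disks and the lone 450GB impostor shows up immediately.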
The other thing I came across was the restriction that Private LUNs (Write Intent Log, Reserved LUN Pool, MetaLUN Components) have to reside on traditional RAID Groups and can’t live in storage pools. Not a big issue, but I hadn’t really planned to use RAID Groups on these arrays. If you search for emc254739 you’ll find a handy KB article on WIL performance considerations, including this nugget “Virtual Provisioning LUNs are not supported for the WIL; RAID group-based LUNs or metaLUNs should be used”. Which clarifies why I was unable to allocate the 2 WIL LUNs I’d configured in the pool.
*Edit* I re-read the KB article and realised it doesn’t address the problem I saw. I had created thick LUNs on a storage pool, but these couldn’t be allocated as WIL LUNs. Even though the article states “[The WIL LUNs] can either be RAID-group based LUNs, metaLUNs or Thick Pool LUNs”. So I don’t really know. Maybe it’s a VNX vs CX4 thing. Maybe not.
Mat has updated the DIY Heatmaps for EMC CLARiiON and VNX arrays to version 3.0211. You can get it from the Utilities page here. Any and all feedback welcome. Changes below:
- Added --min_colour, --mid_colour, --max_colour options (just a change of spelling of colour)
- Removed case sensitivity for colours
- Added FC SSD drive type
I noticed that one of our CX4s was exhibiting some odd behaviour the other day. When looking at the System Information window, I noticed that FAST Cache seemed broken. Here’s a picture of it.
Going to the FAST Cache tab on System Properties yielded the same result, as did the output of naviseccli (using naviseccli -h IPaddress cache -fast -info). Interestingly, though, it was still showing up with dirty pages.
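If you poll naviseccli output regularly, this sort of contradiction is easy to flag programmatically. A sketch with invented field names; the real `cache -fast -info` output will differ:

```python
# Sketch: flag the inconsistent condition we saw -- FAST Cache reporting
# no usable state while still holding dirty pages. The field names here
# are invented; parse your actual "naviseccli cache -fast -info" output.
def fast_cache_inconsistent(status):
    """True if FAST Cache looks broken but still holds dirty pages."""
    state = status.get("state", "").lower()
    dirty = status.get("dirty_pages", 0)
    return state not in ("enabled", "enabling") and dirty > 0

broken = {"state": "", "dirty_pages": 1287}        # roughly what our array showed
healthy = {"state": "enabled", "dirty_pages": 1287}
print(fast_cache_inconsistent(broken), fast_cache_inconsistent(healthy))
```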
We tried recreating it, but the 8 * 100GB EFDs we were using for FAST Cache weren’t available. So we logged a call, and after a bit of back and forth with support, worked out how to fix it. A few things to note first though. If support tell you that FAST Cache can’t be used because you’re using EFDs, not SSDs, ask to have the call escalated. Secondly, the solution I’m showing here fixes the specific problem we had. If you frig around with the tool you may end up causing yourself more pain than it’s worth.
So, to fix the problem we had, we needed to log in to the /debug page on the CX4. To do this, go to http://<yourSPaddress>/debug.
We recently had to convert some old CX4-480s to 960s and rename them. For some reason we couldn’t change the names on the SPs or get them to join the Unisphere domain properly. EMC Support used a very useful tool called natest.exe to rename the SPs via a RemotelyAnywhere session. Here’s how to do it. Obviously you should use this tool with caution and only if instructed by support.
Log in to the affected SP with RemotelyAnywhere and fire up a command prompt.
You can then use natest to rename the SP, amongst other things.
Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\clariion>natest

== Main Menu ==
0: Exit
1: Physical Port ->
2: Virtual Port ->
3: Portal Group ->
4: Portal ->
5: Global Props ->
6: Test ->
selection: 5

== Global Props Menu ==
0: Exit
1: Display global properties
2: Set global properties
3: Do full refresh
selection: 1
----------------------------------------------------------------------------
supported: IPv6 VLAN
machine name: ERNIE
domain name:
reboot required: FALSE
----------------------------------------------------------------------------

== Global Props Menu ==
0: Exit
1: Display global properties
2: Set global properties
3: Do full refresh
selection: 2
Enter machine name: BERT

== Global Props Menu ==
0: Exit
1: Display global properties
2: Set global properties
3: Do full refresh
Once you’ve set the machine name, back out of the utility and use Unisphere to reboot the SP. Once it has rebooted you should see the new SP name in Unisphere and you should then be able to join it to the domain.