EMC – Configure the Reserved LUN Pool with naviseccli

I’ve been rebuilding our lab CLARiiONs recently, and wanted to configure the Reserved LUN Pool (RLP) for use with SnapView and MirrorView/Asynchronous. Having spent approximately 8 days per week in Unisphere recently doing storage provisioning, I’ve made it a goal of mine to never, ever have to log in to Unisphere to do anything again. While this may be unattainable, you can get an awful lot done with a combination of Microsoft Excel, Notepad and naviseccli.

So I needed to configure a Reserved LUN Pool for use with MV/A, SnapView Incremental SAN Copy, and so forth. I won’t go into the reasons for what I’ve created, but let’s just say I needed to create about 50 LUNs and give them each a label. Here’s what I did:

Firstly, I created a RAID Group with an ID of 1 using disks 5 – 9 in the first enclosure.

C:\>naviseccli -h 192.0.2.10 createrg 1 0_0_5 0_0_6 0_0_7 0_0_8 0_0_9

It was then necessary to bind a series of 20GB LUNs for the pool to use – 25 owned by each SP. If you’re handy with Excel, you can generate a variant of the following command for each LUN with little fuss.

C:\>naviseccli -h 192.0.2.10 bind r5 50 -rg 1 -aa 0 -cap 20 -sp a -sq gb

Here I’ve specified the RAID type (r5), the LUN ID (50), the RAID Group (-rg 1), -aa 0 (disabling auto-assign), -cap 20 (the capacity), -sp a (the owning SP, a or b), and -sq gb (the size qualifier, which can be mb|gb|tb|sc|bc). Note that if you don’t specify the LUN ID, the next available ID is used automatically.
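As a rough illustration of the Excel trick, here’s a short Python sketch that generates all 50 bind commands, alternating ownership between SP A and SP B. The array address is a placeholder (a documentation IP), and the ID range is assumed from the examples in this post:

```python
# Sketch: generate naviseccli bind commands for 50 x 20GB RAID 5 LUNs
# on RAID Group 1, IDs 50-99, alternating SP ownership so each SP
# ends up owning 25. The address 192.0.2.10 is a placeholder -
# substitute one of your own SP IPs.

def bind_commands(first_id=50, count=50, rg=1, cap_gb=20):
    cmds = []
    for i in range(count):
        lun_id = first_id + i
        sp = "a" if i % 2 == 0 else "b"  # alternate SP A / SP B
        cmds.append(
            f"naviseccli -h 192.0.2.10 bind r5 {lun_id} "
            f"-rg {rg} -aa 0 -cap {cap_gb} -sp {sp} -sq gb"
        )
    return cmds

for cmd in bind_commands():
    print(cmd)
```

Paste the output into a batch file (or straight into the prompt) and you’re done.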

So now that I’ve bound the LUNs, I can use another command to give each one a label that corresponds with our naming standard (using our old friend chglun):

C:\>naviseccli -h 192.0.2.10 chglun -l 50 -name TESTLAB1_RLP01_0050
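The labelling step can be generated the same way. A sketch, assuming the TESTLAB1_RLP&lt;nn&gt;_&lt;lunid&gt; naming pattern from the example above (adjust to your own standard, and again with a placeholder address):

```python
# Sketch: generate chglun commands that label LUNs 50-99 following a
# TESTLAB1_RLP<nn>_<lunid> style naming standard. The pattern and the
# address (192.0.2.10) are illustrative - adjust to taste.

def label_commands(first_id=50, count=50):
    cmds = []
    for n, lun_id in enumerate(range(first_id, first_id + count), start=1):
        name = f"TESTLAB1_RLP{n:02d}_{lun_id:04d}"
        cmds.append(f"naviseccli -h 192.0.2.10 chglun -l {lun_id} -name {name}")
    return cmds

for cmd in label_commands():
    print(cmd)
```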

Once you’ve created the LUNs you require, you can then add them to the Reserved LUN Pool with the reserved command.

C:\>naviseccli -h 192.0.2.10 reserved -lunpool -addlun 99
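With 50 LUNs to add, the same generate-and-paste approach works here too. A sketch (placeholder address again; I’ve generated one command per LUN rather than assuming -addlun accepts a list of LUN numbers):

```python
# Sketch: generate one 'reserved -lunpool -addlun' command per LUN for
# IDs 50-99. One command per line keeps the paste-into-cmd workflow
# simple. 192.0.2.10 is a placeholder address.

def addlun_commands(first_id=50, count=50):
    return [
        f"naviseccli -h 192.0.2.10 reserved -lunpool -addlun {lun_id}"
        for lun_id in range(first_id, first_id + count)
    ]

for cmd in addlun_commands():
    print(cmd)
```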

To check that everything’s in order, use the -list switch to get an output of the current RLP configuration.

C:\>naviseccli -h 192.0.2.10 reserved -lunpool -list
Name of the SP:  GLOBAL
Total Number of LUNs in Pool:  50
Number of Unallocated LUNs in Pool:  50
Unallocated LUNs:  53, 63, 98, 78, 71, 56, 88, 69, 92, 54, 99, 79, 72, 58, 81, 57, 85, 93, 61, 96, 67, 76, 86, 64, 50, 66, 52, 62, 68, 77, 89, 70, 55, 65, 91, 80, 73, 59, 82, 90, 94, 84, 97, 74, 60, 83, 95, 75, 87, 51
Total size in GB:  999.975586
Unallocated size in GB:  999.975586
Used LUN Pool in GB:  0
% Used of LUN Pool:  0
Chunk size in disk blocks:  128
No LUN in LUN Pool associated with target LUN.
C:\>

If, for some reason, you want to remove a LUN from the RLP, and it isn’t currently in use by one of the layered applications, you can use the -rmlun switch.

C:\>naviseccli -h 192.0.2.10 reserved -lunpool -rmlun 99 -o

If you omit the override [-o] option, the CLI prompts for confirmation before removing the LUN from the reserved LUN pool. You could argue that, with the ability to create multiple LUNs at once from Unisphere, it might be simpler not to bother with naviseccli, but I think it’s a very efficient way to get things done quickly, particularly if you’re working in a Unisphere domain with a large number of CLARiiONs, or on a workstation that has some internet browser “issues”.

VMware Lab Manager, ssmove.exe and why I don’t care

Sounds like a depressing topic, but really it’s not all bad. As I mentioned previously, I’ve spent a good chunk of the last 4 months commissioning a CLARiiON CX4-960 array and migrating data from our production CX3-40f and CX700. All told, there’s about 112TB in use, and I’ve moved about 90TB so far. I’ve had to use a number of different methods, including Incremental SAN Copy, sVMotion, vmkfstools and, finally, ssmove. For those of you who pay attention to more knowledgeable people’s blogs, Scott Lowe had a succinct but useful summary of how to use the ssmove utility here. So I had to move what amounted to about 3TB of SATA-II configs in a Lab Manager 3.0.1 environment. You can read the VMware KB article for the instructions, but ultimately it’s a very simple process. Except when it doesn’t work. And by “doesn’t work” I mean wait-25-hours-and-see-no-progress doesn’t work. So I got to spend about 6 hours on the phone with the Live queue, and the SR took a long time to resolve. The utility doesn’t provide a lot in terms of logging, nor does it provide much information when it’s not working but has quietly timed out. It’s always the last 400GB that we get stuck on with data migrations, isn’t it?

The solution involved manually migrating the vmdk files and then updating the database. There’s an internal-only KB article that refers to the process, but VMware don’t really want to tell you about it, because it’s a bit hairy. Hairier still was the fact that we only had a block replica of the environment, and rolling back would have meant losing all the changes that I’d made over the weekend. The fortunate thing is that this particular version of ssmove does a copy, not a move, so we were able to cancel the failed ssmove process and still use the original, problematic configuration. If you find yourself needing to migrate LM datastores and ssmove isn’t working for you, let me know and I can send you the KB reference for the process to do it manually.

So to celebrate the end of my involvement in the project, I thought I’d draw a graph. Preston is a lot better at graphs than I am, but I thought this one summed up quite nicely my feelings about this project.

vCenter 2.5 and RDMs and multiple guests

While most of you were doing whatever it is you do to relax over the Easter long weekend, I was lucky enough to be cutting over a chunk of our environment with the help of SAN Copy. For the most part, things went well. The only major problem was the Solaris LDOM environment, but our very patient consultant sorted that out for us.

One issue I did have, however, was when I was cutting over RDM LUNs on a number of virtualised clusters. The problem was, basically, that after remapping the RDM on the first guest, I was unable to see the RDM files on the second guest. While some people in our environment believe it’s acceptable to run single-node clusters, I don’t.

It turns out that at some point – I can’t remember exactly when – the behaviour of vCenter changed to mask RDMs that are already presented to a guest. For those of you playing at home, we’re running the latest vCenter 2.5 (build 227637). So, I needed to add the following setting to the Advanced Settings in vCenter’s configuration: the setting is vpxd.filter.rdmFilter and it should be set to false. Also worthy of note is that this doesn’t seem to survive restarts of the vCenter service. But that’s probably because I’ve done something boneheaded.

Here’s what you need to do: open the Advanced Settings dialog in vCenter’s configuration, click Add Row, and add vpxd.filter.rdmFilter with a value of false. You’ll then be able to add the RDMs to multiple guests.

By the numbers – Part 2

Here’s the latest list of silly numbers I’ve been working with:

  • Attached 78 hosts (previously 56) via dual paths to the new array.
  • Created 322 new zonesets (previously 234).
  • Created 28 Storage Groups (previously 23).
  • Created 156 RAID Groups (previously 131).
  • Added 30 hot spare disks (previously 26).
  • Designed and provisioned 726 LUNs (previously 620), including 64 4-component MetaLUNs (previously 52).
  • Established 164 Incremental SAN Copy Sessions (previously 33).

The good news is that I only have to work on this project this weekend and next weekend. Then, hopefully, someone from EMC will come in to finish it off, while I get to go back to my real work. The downside is that I’ll have moved about 90+TB by that point …

By the numbers – Part 1

As I mentioned in the previous post, I’ve been working on a large data migration project. After a brief hiatus, I’m back at work, and thought I’d take a moment to share what I’ve done so far.

  • Attached 56 hosts via dual paths to the new array.
  • Created 234 new zonesets.
  • Created 23 Storage groups.
  • Created 131 RAID Groups.
  • Added 26 hot spare disks.
  • Designed and provisioned 620 LUNs. This includes 52 4-component MetaLUNs.
  • Established 33 Incremental SAN Copy Sessions.

I don’t know how many sVMotions I’ve done so far, but it feels like a lot. I can’t exactly say how many TB I’ve moved yet either, but by the end we’ll have moved over 112TB of configured storage. Once I’ve finished this project – by end of June this year – I’ll tally up the final numbers and make a chart or something.

Broken sVMotion

It’s been a very long few weeks, and I’m looking forward to taking a few weeks off. I’ve been having a lot of fun doing a data migration from a CLARiiON CX3-40f and CX700 to a shiny, new CX4-960. To give you some background, instead of doing an EMC-supported data-in-place upgrade in August last year, we decided to buy another CX4-960 (on top of the already purchased CX4-960 upgrade kit) and migrate the data manually to the new array. The only minor problem with this is that there’s about 100TB of data in various forms that I need to get on to the new array. Sucks to be me, but I am paid by the hour.

I started off by moving VMs from one cluster to the new array using sVMotion, as there was a requirement to be a bit non-disruptive where possible. Unfortunately, on the larger volumes attached to fileservers, I had a few problems. I’ll list them, just for giggles:

There were three 800GB volumes, five 500GB volumes and one 300GB volume that had 0% free space on the VMFS. And I mean 0% – not 5MB or 50MB – 0%. That’s not cool for a few reasons. ESX likes to update journaling data on VMFS, because it’s a journaling filesystem. If you don’t give it space to do this, it can’t, and you’ll find volumes start to get remounted with very limited writability. If you try to Storage VMotion these volumes, you’ll again be out of luck, as it wants to keep a dmotion file on the filesystem to track any changes to the vmdk file while the migration is happening. I found my old colleague Leo’s post helpful when a few migrations failed, but unfortunately the symptoms he described were not the same as mine – in my case the VMs fell over entirely. More info from VMware can be had here.

If you want to move just a single volume, you can try your luck with this method, which I’ve used successfully before. But I was tired, and wanted to use vmkfstools, since I’d already had an unexpected outage and had to get something sorted.

The problem with vmkfstools is that there’s no restartable copy option – as far as I know. So when you get 80% of the way through a 500GB file and it fails, that’s 400GB of pointless copying and time you’ll never get back. Multiply that out over three 800GB volumes and a few ornery 500GB vmdks and you’ll start to get a picture of the kind of week I had.
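For what it’s worth, the restartable-copy behaviour I was wishing for is simple enough to sketch. This is a generic resume-from-offset file copy in Python, not anything vmkfstools actually offers – just an illustration of what would have saved my week:

```python
import os

# Generic resumable copy sketch: if a partial destination already
# exists, seek past the bytes already copied and append from there,
# so a failure at 80% of a 500GB file doesn't cost you the first
# 400GB on the retry. NOT a vmkfstools feature - just the behaviour
# I wanted it to have.

def resumable_copy(src, dst, chunk=1024 * 1024):
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)  # skip what's already been copied
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
    return os.path.getsize(dst)
```

Run it again after a failure and it picks up where it left off, at the cost of trusting that the partial destination is intact.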

After suffering through a number of failures, I ended up taking one node out of the 16-node cluster and placing it and its associated datastores (the ones I needed to migrate) in their own Navisphere storage group. That way, there’d be no “busy-ness” affecting the migration process (we had provisioned about 160 LUNs to the cluster at this stage and we were, obviously, getting a few “SCSI reservation conflicts” and “resource temporarily unavailable” issues). This did the trick, and I was able to get some more stuff done. Now there’s only about 80TB to go before the end of April. Fun times.

And before you ask why I didn’t use SAN Copy: I don’t know. I suppose I’ve never had the opportunity to test it with ESX, and while I know the underlying technology is the same as MirrorView, I just didn’t feel I was in a position to make that call. I probably should have just done it, but I didn’t expect to have as much trouble as I did with sVMotion and / or vmkfstools. So there you go.

2009 and penguinpunk.net

It was a busy year, and I don’t normally do this type of post, but I thought I’d try a year-in-review so I can look back at the end of 2010 and see what kind of promises I’ve broken. Also, the Exchange Guy will no doubt enjoy the size comparison. You can see what I mean by that here.

In any case, here are some broad stats on the site. In 2008 the site had 14966 unique visitors according to Advanced Web Statistics 6.5 (build 1.857). In 2009, it had 15856 unique visitors – according to the same Advanced Web Statistics 6.5 (build 1.857). That’s an increase of some 890 unique visitors, or year-on-year growth of approximately 5.9%. My maths are pretty bad at the best of times, but I normally work with storage arrays, not web statistics. In any case, most of the traffic is no doubt down to me spending time editing posts and uploading articles, but it’s nice to think that it’s been relatively consistent, if a little lower than I’d hoped. This year (2010 for those of you playing at home) will be the site’s first full year using Google Analytics, so assuming I don’t stuff things up too badly, I’ll have some prettier graphs to present this time next year. That said, MYOB / smartyhost are updating the web backend shortly, so I can’t make any promises that I’ll have solid stats for this year, or even a website :)
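For the record, the arithmetic is simple enough to check:

```python
# Year-on-year growth in unique visitors, per the AWStats numbers above.
visitors_2008 = 14966
visitors_2009 = 15856

increase = visitors_2009 - visitors_2008      # 890 more visitors
growth_pct = increase / visitors_2008 * 100   # roughly 5.9% growth

print(f"{increase} more visitors, {growth_pct:.1f}% growth")
```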

What were the top posts? Couldn’t tell you. I do, however, have some blogging-type goals for the year:

1. Blog with more focus and frequency – although this doesn’t mean I won’t throw in random youtube clips at times.

2. Work more on the promotion of the site. Not that there’s a lot of point promoting something if it lacks content.

3. Revisit the articles section and revise where necessary. Add more articles to the articles page.

On the work front, I’m architecting the move of my current employer from a single data centre to a 2+1 active / active architecture (from a storage and virtualisation perspective). There’s more blades, more CLARiiON, more MV/S, some vSphere and SRM stuff, and that blasted Cisco MDS fabric stuff is involved too. Plus a bunch of stuff I’ve probably forgotten. So I think it will be a lot of fun, and a great achievement if we actually get anything done by June this year. I expect there’ll be some moments of sheer boredom as I work my way through 100s of incremental SAN Copies and sVMotions. But I also expect there will be moments of great excitement when we flick the switch on various things and watch a bunch of visio illustrations turn into something meaningful.

Or I might just pursue my dream of blogging about the various media streaming devices on the market. Not sure yet. In any case, thanks for reading, keep on reading, tell your friends, and click on the damn Google ads.

CLARiiON Virtual LUN Technology and Layered Applications

I recently had one of those head-slapping moments where a ridiculously simple thing had me scratching my head and sifting through release notes to find out how to do something that I’d done many times before, but couldn’t get working this time. I have a client with some EMC CLARiiON AX4-5 arrays running full Navisphere and MirrorView/Asynchronous. He was running out of space on a NetWare fileserver LUN and needed to urgently expand the volume. As there was some space on another RAID Group, and he was limited by the size of the secondary image he could create on the DR array, we came up with a slightly larger size and I suggested using the LUN Migration tool to perform the migration. EMC calls this “Virtual LUN Technology”, and, well, it’s a pretty neat thing to have access to. I think it came in with FLARE Release 16 or 19, but I’m not entirely sure.

In any case, we went through the usual steps of creating a larger LUN on the RAID Group and removing the MirrorView relationship, but, for some reason, we couldn’t see the newer, larger LUN. I did some testing and found that we could migrate to a LUN that was the same size, but not a larger LUN. This was strange, as I thought we’d removed the MirrorView relationship and freed the LUN from any obligations it may have felt to the CLARiiON’s Layered Applications. Indeed, the latest FLARE release notes refer to this limitation, which also applies to the CX4 – “If a layered application is using the source LUN, the source LUN can be migrated only to a LUN of the same size”. What I didn’t realise, until I’d spent a few hours on this, was that the SAN Copy sessions I’d created originally to migrate the LUNs from the CX200 to the AX4-5 were still there. Even though they weren’t active (the CX200 is now long gone), Navisphere wasn’t too happy about the idea that the LUN in question would be bigger than it was originally. Removing the stale SAN Copy sessions allowed me to migrate the LUN to the larger destination, and from a NetWare perspective things went smoothly. Of course, recreating the secondary image on the DR array required a defrag of the RAID Group to make enough space for the larger image, but that’s a story for another time.

What have I been doing? – Part 2

Dell memory configuration

The Dell PowerEdge 2950 has some reasonably specific rules for memory installation. You can read them here.

“FBDs must be installed in pairs of matched memory size, speed, and technology, and the total number of FBDs in the configuration must total two, four, or eight. For best system performance, all four, or eight FBDs should be identical memory size, speed, and technology.”

Dear Sales Guy, that means you can’t sell 6, or some other weird number, and expect it to just work.

SAN Copy Pull from CX200 to AX4-5

SAN Copy is a neat bit of software that EMC (or DG, I forget which) developed to rip data off competitors’ arrays when it came time to trade in your HDS system for something a little more EMC-like. It’s also very useful for doing other things like once-off backups, etc., but I won’t bore you with that right now. However, if the two CLARiiONs you’re migrating between are not in the same management domain, you need to enter the WWN of the LUN to be copied across. In this instance I was doing a SAN Copy push from a CX200 to an AX4-5 in order to avoid using host-based copy techniques on the NCS cluster (my ncopy skills are not so good).

So when you are selecting the destination storage and can’t see the AX4-5, you’ll need the WWN:

[screenshot: sancopy1]

Click on “Enter WWN…”

[screenshot: sancopy2]

And you should then be able to see the destination LUN.

You’ll also need an enabler loaded on the AX4-5 to use SAN Copy – this enabler is not the same enabler you would load on a CX, CX3 or CX4. Trust me on that one …

Setting Up MV/A

I set up MV/A between two AX4-5 arrays recently (woot! baby SANs) and thought I’d shed some light on the configuration requirements:

MirrorView/A Setup
– MirrorView/A software must be loaded on both Primary and Secondary storage systems (ie the enabler file via ndu)
– Secondary LUN must be exactly the same size as Primary LUN (use the block count when creating the mirrors)
– Secondary LUN does not need to be the same RAID type as Primary (although it’s a nice-to-have)
– Secondary LUN is not directly accessible to host(s) (Mirror must be removed or Secondary promoted to Primary for host to have access)
– Bi-directional mirroring fully supported (hooray for confusing DR setups and active/active sites!)
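On the “exactly the same size” point, matching by block count is the safe way to do it, and the conversion is trivial. A sketch, assuming the standard 512-byte block that CLARiiON capacities are expressed in (bind the secondary with -sq bc and this count, rather than -sq gb):

```python
# Convert a LUN size in GB to a 512-byte block count, so a MirrorView
# secondary can be bound to exactly the same size as the primary.
# Assumes the standard 512-byte block size.

BLOCK_SIZE = 512

def gb_to_blocks(gb):
    return gb * 1024**3 // BLOCK_SIZE

print(gb_to_blocks(20))  # a 20GB LUN is 41943040 blocks
```

In practice, read the primary’s block count straight out of its LUN properties and use that number – it avoids any rounding surprises.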

Management via Navisphere Manager and Secure CLI
– Provides ease of management
– Returns detailed status information

So what resources are used by MirrorView/A? I’m glad you asked.

SnapView
– Makes a golden copy of remote image before update starts

Incremental SAN Copy
– Transfers data to secondary image. Uses SnapView as part of ISC

MirrorView/A license (enabler – the bit you paid for)
– MirrorView/A licensed for user
– SnapView, SAN Copy licensed for system use (meaning you can’t use it to make your own snapshots)

Adequate Reserved LUN Pool space
– Local and remote system (this is critical to getting it working and often overlooked during the pre-sales and design phases)

Provisioning of adequate space in the Reserved LUN Pool on the primary and secondary storage systems is vital to the successful operation of MirrorView/A. The exact amount of space needed may be determined in the same manner as the required space for SnapView is calculated.

So how do we do that?

Determine average Source LUN size
– Total size of Source LUNs / # of Source LUNs

Reserved LUN size = 10% average Source LUN size
– COFW factor

Create 2x as many Reserved LUNs as Source LUNs
– Overflow LUN factor

Example
– LUNs to be snapped: 10 GB, 20 GB, 30 GB, 100 GB
– Average LUN size = 160 GB/4 = 40 GB
– Make each Reserved LUN 4 GB in size
– Make 8 Reserved LUNs
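The worked example above can be expressed as a quick sketch (same 10 GB / 20 GB / 30 GB / 100 GB source LUNs):

```python
# Reserved LUN Pool sizing rule of thumb, as above: each reserved LUN
# is 10% of the average source LUN size (the COFW factor), and you
# create twice as many reserved LUNs as source LUNs (the overflow
# LUN factor).

def rlp_sizing(source_luns_gb):
    avg = sum(source_luns_gb) / len(source_luns_gb)
    reserved_size_gb = avg * 0.10             # COFW factor
    reserved_count = 2 * len(source_luns_gb)  # overflow LUN factor
    return reserved_size_gb, reserved_count

size, count = rlp_sizing([10, 20, 30, 100])
print(size, count)  # 4.0 GB each, 8 reserved LUNs
```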

Due to the dynamic nature of Reserved LUN assignment per Source LUN, it may be better to have many smaller LUNs that can be used as a pool of individual resources. A limiting factor here is that the total number of Reserved LUNs allowed varies by storage system model. Each Reserved LUN can be a different size, and allocation to Source LUNs is based on which is the next available Reserved LUN, without regard to size. This means that there is no mechanism to ensure that a specified Reserved LUN will be allocated to a specified Source LUN. Because of the dynamic nature of the SnapView environment, assignment may be regarded as a random event (though, in fact, there are rules governing the assignment of Reserved LUNs).

Frequency of synchronization is the other big question. What are you using MirrorView/A for? If you promote the secondary LUN as a DR target, what are you hoping to see? Setting up MV/A is normally a question of understanding the Recovery Point Objective (RPO) of the business data you’re trying to protect, as well as the Recovery Time Objective (RTO). With MV/A (and MV/S), the RTO can be quite quick – normally as long as it takes to promote the secondary image and mount the data on a DR host, assuming you have a warm standby DR site in operation.

Of course, what data you’ve replicated will determine how quickly you get back up and running. If it’s a simple fileserver, then presenting the clone to another host is fairly trivial. But if you’ve replicated a bunch of VMs sitting on a VMFS, you need to do some other things (discussed later) to get back up and running.

So how do you decide on an acceptable RPO? Well, start with how much money you can spend on a link (kidding). It’s probably a good idea to look at what the business can tolerate in terms of data loss before you go off and configure mirrors with a 5-day replication cycle. Once you’ve worked that out, and tested it, you should be good to go. Oh, and keep in mind that, if you’re going to fail an entire site over to DR, you need to consider the storage subsystem you’ve put in place at the DR site. Imagine trying to run a full-blown ESX environment on 11 1TB SATA-II spindles in RAID 5. Uh-huh.