2009 and penguinpunk.net

It was a busy year, and I don't normally do these types of posts, but I thought I'd try to do a year in review type thing so I can look back at the end of 2010 and see what kind of promises I've broken. Also, the Exchange Guy will no doubt enjoy the size comparison. You can see what I mean by that here.

In any case, here are some broad stats on the site. In 2008 the site had 14966 unique visitors according to Advanced Web Statistics 6.5 (build 1.857). But in 2009, it had 15856 unique visitors – according to Advanced Web Statistics 6.5 (build 1.857). That's an increase of some 890 unique visitors, also known as year-on-year growth of approximately 5.9%. I think. My maths are pretty bad at the best of times, but I normally work with storage arrays, not web statistics. In any case, most of the traffic is no doubt down to me spending time editing posts and uploading articles, but it's nice to think that it's been relatively consistent, if not a little lower than I'd hoped. This year (2010 for those of you playing at home) will be the site's first full year using Google Analytics, so assuming I don't stuff things up too badly, I'll have some prettier graphs to present this time next year. That said, MYOB / smartyhost are updating the web backend shortly, so I can't make any promises that I'll have solid stats for this year, or even a website :)

What were the top posts? Couldn’t tell you. I do, however, have some blogging-type goals for the year:

1. Blog with more focus and frequency – although this doesn’t mean I won’t throw in random youtube clips at times.

2. Work more on the promotion of the site. Not that there’s a lot of point promoting something if it lacks content.

3. Revisit the articles section and revise where necessary. Add more articles to the articles page.

On the work front, I'm architecting the move of my current employer from a single data centre to a 2+1 active / active architecture (from a storage and virtualisation perspective). There are more blades, more CLARiiONs, more MV/S, some vSphere and SRM stuff, and that blasted Cisco MDS fabric stuff is involved too. Plus a bunch of stuff I've probably forgotten. So I think it will be a lot of fun, and a great achievement if we actually get anything done by June this year. I expect there'll be some moments of sheer boredom as I work my way through hundreds of Incremental SAN Copies and sVMotions. But I also expect there will be moments of great excitement when we flick the switch on various things and watch a bunch of Visio illustrations turn into something meaningful.

Or I might just pursue my dream of blogging about the various media streaming devices on the market. Not sure yet. In any case, thanks for reading, keep on reading, tell your friends, and click on the damn Google ads.

MV/S consistency groups and multiple secondary images

I recently completed a migration for a client from a CX3-20 to a CX4-240. I'd done similar work for them in the past, moving their primary site from a CX500 to a CX4-240. This time things were simpler, as the secondary site I was working on contained primarily replicas of LUNs from the primary site. For those LUNs that weren't mirrors, I used either Incremental SAN Copy or sVMotion to migrate the data. The cool thing about MirrorView/Synchronous on the CLARiiON is that you can send replicas from one source to multiple (well, two) targets. As my client was understandably nervous about the exposure of not having current replicas if there was a problem, we decided that adding a secondary image on the CX4 and waiting for it to synchronize before removing the CX3 image would be the safest approach. The minor issue was that most of these LUNs were in MirrorView Consistency Groups. If you've played with VMware's Site Recovery Manager, you'll know what these are. But for those of you that don't, let me drop a little knowledge.

Consistency Groups are, in essence, a group of mirrors on a CLARiiON that are treated as a single entity. As a result of this treatment, the remote images are consistent with each other, but may contain information that is ever so slightly older than the information on the primary images. The thinking behind this is fairly obvious – you don't want your SQL database LUN to be out of sync with the SQL logs LUN, in the same way that you wouldn't want the Exchange mailbox LUN to be out of sync with the transaction logs. That would be silly. And it would make recovery in the event of a site failure really difficult. And thus Consistency Groups were born. Hooray.

But there are a few rules that you need to follow with Consistency Groups:

  • Up to 16 groups per CLARiiON;
  • Up to 16 mirrors per group;
  • Primary image LUNs must all be on the same CLARiiON;
  • Secondary image LUNs must all be on the same CLARiiON;
  • Images must be of the same type (you can’t mix sync and async in the same group);
  • Operations happen on all mirrors at the same time.

As I mentioned previously, you can have a 1:2 LUN:replica ratio when using MV/S. Unfortunately, when Consistency Groups are in use, the ratio goes back to 1:1. So our precautionary strategy for replica migration was already under pressure. As well as this, we couldn't have multiple CLARiiONs providing replica images in the same Consistency Group. The only option that my tired brain could really think of was to remove the mirrors from the Consistency Groups, synchronize the new replicas with the primary copies, and then add the mirrors back into the Consistency Groups. By using this methodology, we risked having inconsistent volumes on a host, but we didn't have the risk of not having replicas at all. If anyone has a better approach, I'd love to hear about it.
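For what it's worth, the CLI side of that shuffle looks something like the sketch below. Treat it as a rough outline only – the group name, mirror name and LUN numbers are invented for illustration, and you should check the MirrorView/Synchronous CLI reference for your FLARE release before trusting my memory of the switches.

# take the mirror out of the consistency group so the 1:2 image limit applies again
naviseccli -h <SP_IP_address> mirror -sync -removefromgroup -name cg_example -mirrorname mir_example

# add the new secondary image on the CX4 and let it synchronize
naviseccli -h <SP_IP_address> mirror -sync -addimage -name mir_example -arrayhost <CX4_SP_IP_address> -lun <LUN #>

# once the new image is in sync, remove the old CX3 image and put the mirror back in the group
naviseccli -h <SP_IP_address> mirror -sync -removeimage -name mir_example -imageuid <CX3_image_UID>
naviseccli -h <SP_IP_address> mirror -sync -addtogroup -name cg_example -mirrorname mir_example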

CLARiiON Virtual LUN Technology and Layered Applications

I recently had one of those head-slapping moments where a ridiculously simple thing had me scratching my head and sifting through release notes to find out how to do something that I'd done many times before, but couldn't get working this time. I have a client with some EMC CLARiiON AX4-5 arrays running full Navisphere and MirrorView/Asynchronous. He was running out of space on a NetWare fileserver LUN and needed to urgently expand the volume. As there was some space on another RAID Group, and he was limited by the size of the secondary image he could create on the DR array, we came up with a slightly larger size and I suggested using the LUN Migration tool to perform the migration. EMC calls this "Virtual LUN Technology", and, well, it's a pretty neat thing to have access to. I think it came in with FLARE Release 16 or 19, but I'm not entirely sure.

In any case, we went through the usual steps of creating a larger LUN on the RAID Group and removing the MirrorView relationship, but, for some reason, we couldn't see the newer, larger LUN as a migration destination. I did some testing and found that we could migrate to a LUN that was the same size, but not a larger LUN. This was strange, as I thought we'd removed the MirrorView relationship and freed the LUN from any obligations it may have felt to the CLARiiON's Layered Applications. To wit, the latest FLARE release notes refer to this limitation, which also applies to the CX4 – "If a layered application is using the source LUN, the source LUN can be migrated only to a LUN of the same size". What I didn't realise, until I'd spent a few hours on this, was that the SAN Copy sessions I'd created originally to migrate the LUNs from the CX200 to the AX4-5 were still there. Even though they weren't active (the CX200 is now long gone), Navisphere wasn't too happy about the idea that the LUN in question would be bigger than it was originally. Removing the stale SAN Copy sessions allowed me to migrate the LUN to the larger destination, and from a NetWare perspective things went smoothly. Of course, recreating the secondary image on the DR array required a defrag of the RAID Group to make enough space for the larger image, but that's a story for another time.
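If you hit the same thing, the CLI is a quick way to spot and clear the leftovers before kicking off the migration. The commands below are a sketch only – the session name and LUN numbers are placeholders, and the exact sancopy switches are worth confirming against the CLI reference for your FLARE release.

# list any SAN Copy sessions still hanging around on the array
naviseccli -h <SP_IP_address> sancopy -info -all

# remove the stale session by name
naviseccli -h <SP_IP_address> sancopy -remove -name <stale_session_name>

# the migration to the larger LUN should then be allowed
naviseccli -h <SP_IP_address> migrate -start -source <source LUN #> -dest <larger LUN #> -rate medium
naviseccli -h <SP_IP_address> migrate -list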

Make the LUN go away, Daddy!

So sometimes you create a Reserved LUN Pool (RLP) for an EMC MirrorView/Asynchronous deployment and then realise that it's not really what you wanted to do. So what if you want to unbind the LUNs in the RLP, but the storage system gives you an error along the lines of "Used by another feature of the storage system" and the LUN cannot be unbound? Yep, that can be a problem. One way around this problem is to issue the following command:

naviseccli -h <SP_IP_address> mirror -async -setfeature -off -lun <LUN #>

You should then be able to unbind the LUN and get your stuff sorted. This is covered in emc114414 on Powerlink.
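For completeness, the unbind itself is the usual command (the -o switch just suppresses the confirmation prompt):

naviseccli -h <SP_IP_address> unbind <LUN #> -o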

RAID Group defragmentation halted

It only happened last month, but it feels like decades ago. As part of a data migration project I've been working on, I needed to defragment a SATA RAID Group on a CX3-20. Knowing it was going to take a while, I went and did some other things. When I came back to it 24 hours later, however, it was still sitting on 0%. Hmmm, that seems like it's taking a while. So I started digging around and noticed that the SP event logs mentioned something about "Expansion process halted" or some such. Okay, cool, so why has it halted? And why isn't there a button marked "Resume"? The EMC KB article emc178703 – "LUN expansion halted after defragmentation operation started" – gives some indication that one of the LUNs may have trespassed after the expansion process had started, and this was why the process had halted. The solution was to trespass the LUN back to its original owner. However, according to the article, this was a scary move, and you needed to engage engineering, or marketing, or someone I can't remember, to assist in identifying the correct LUN. Well, while that was an option, it wasn't terribly appealing, so I decided to trespass the LUNs myself (I'm much more cautious when it comes to my own data).

So I attempted to trespass one of the LUNs in the RAID Group back to its owner. No dice. Some error along the lines of "You are not allowed to transfer LUN". Sorry for the vagueness, but my notes were quite sketchy after working on this for a few days over the weekend and going to sleep dreaming of Incremental SAN Copy setups. But I digress. MirrorView/Synchronous was active on some of the LUNs, as they were secondary images for some Remote Mirrors we had in play. Cool, so how do I fix this? I realised that the source LUN was owned by SP A, and the secondary image was owned by SP B. So MirrorView, rightly so, had trespassed the secondary to be on the same SP as the source. By admin-fracturing the mirror, changing the default source LUN owner, trespassing the source and re-synchronizing, I was able to get MirrorView to automatically trespass the destination LUN as well. Wonder of all wonders, the defragmentation continued after that, and finished a lot sooner than I'd expected. Although really, it would have been good to know that this could be a problem before it became a problem.
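If my sketchy notes are to be believed, the CLI version of that dance looks roughly like this. It's a sketch only – the mirror name, image UID and LUN numbers are invented, and the default-owner change itself is a chglun / Navisphere Manager operation whose exact switch I won't quote from memory.

# check the current and default owners of the LUN in question
naviseccli -h <SP_IP_address> getlun <LUN #> -owner -default

# admin fracture the mirror so MirrorView briefly lets go of the secondary image
naviseccli -h <SP_IP_address> mirror -sync -fractureimage -name <mirror_name> -imageuid <secondary_image_UID>

# trespass the source LUN back to the SP you want owning it
naviseccli -h <SP_IP_address> trespass lun <LUN #>

# start the re-sync; MirrorView should trespass the secondary to match the source
naviseccli -h <SP_IP_address> mirror -sync -syncimage -name <mirror_name> -imageuid <secondary_image_UID>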

LVM settings with EMC MirrorView and VMware ESX

A few weekends ago I did some failover testing for a client using 2 EMC CLARiiON CX4-120 arrays, MirrorView/Asynchronous over iSCSI and a 2-node ESX cluster at each site. The primary goal of the exercise was to ensure that we could promote mirrors at the DR site if need be and run Virtual Machines off the replicas. Keep in mind that the client, at this stage, isn't running SRM, just MV and ESX. I've read many, many articles about how MirrorView could be an awesome addition to the DR story, and in the past this has rung true for my clients running Windows hosts. But VMware ESX isn't Windows, funnily enough, and since the client hadn't put any production workloads on the clusters yet, we decided to run it through its paces to see how it worked outside of a lab environment.

One thing to consider, when using layered applications like SnapView or MirrorView with the CLARiiON, is that the LUNs generated by these applications are treated, rightly so, as replicas by the ESX hosts. This makes sense, of course, as the secondary image in a MirrorView relationship is a block-copy replica of the source LUN. As a result of this, there are rules in play for VMFS LUNs regarding what volumes can be presented to what, and how they'll be treated by the host. There are variations on the LVM settings that can be configured on the ESX node. These are outlined here. Duncan of Yellow Bricks fame also discusses them here. Both of these articles are well written and clearly explain why you would take a particular approach and use particular LVM settings. However, what neither article addresses, at least clearly enough for my dumb arse, is what to do when what you see and what you expect to see are different things.

In short, we wanted to set the hosts to "State 3 – EnableResignature=0, DisallowSnapshotLUN=0", because the hosts at the DR site had never seen the original LUNs before, nor did we want to go through and resignature the datastores at the failover site and have to put up with datastore volume labels that looked unsightly. Here are some pretty screenshots of what your Advanced – LVM settings might look like after you've done this.

[Screenshot: LVM Settings]

But we wanted it to look like this:

[Screenshot: LVM Settings]

However, when I set the LVM settings accordingly, admin-fractured the LUN, promoted the secondary and presented it to the failover ESX host, I was able to rescan and see the LUN, but was unable to see any data on the VMFS datastore. Cool. So we set the LVM settings to "State 2 – EnableResignature=1, (DisallowSnapshotLUN is not relevant)", and were able to resignature the LUNs, see the data, register a virtual machine and boot okay. Okay, so why doesn't State 3 give me the desired result? I still don't know. But I do know that a call to a friendly IE at the local EMC office tipped me off to using the VI Client connected directly to the failover ESX host, rather than VirtualCenter. Lo and behold, this worked fine, and we were able to present the promoted replica, see the data, and register and boot the VMs at the failover site. I'm speculating that it's something very obvious that I've missed here, but I'm also of the opinion that this should be mentioned in some of those shiny whitepapers and knowledge books that EMC like to put out promoting their solution. If someone wants to correct me, feel free to wade in at any time.
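As an aside, you can also poke the same settings from the ESX 3.x service console rather than clicking through the VI Client, which is handy when you're repeating this across a cluster. A sketch only – run it on each DR host, and the vmhba number will obviously depend on your hardware:

# State 3 - present the replica without resignaturing
esxcfg-advcfg -s 0 /LVM/EnableResignature
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN

# State 2 - resignature the replica on the next rescan
esxcfg-advcfg -s 1 /LVM/EnableResignature

# check the current values, then rescan for the promoted LUN
esxcfg-advcfg -g /LVM/EnableResignature
esxcfg-advcfg -g /LVM/DisallowSnapshotLUN
esxcfg-rescan vmhba1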

What have I been doing? – Part 2

Dell memory configuration

The Dell PowerEdge 2950 has some reasonably specific rules for memory installation. You can read them here.

“FBDs must be installed in pairs of matched memory size, speed, and technology, and the total number of FBDs in the configuration must total two, four, or eight. For best system performance, all four, or eight FBDs should be identical memory size, speed, and technology.”

Dear Sales Guy, that means you can’t sell 6, or some other weird number, and expect it to just work.

SAN Copy Pull from CX200 to AX4-5

SAN Copy is a neat bit of software that EMC (or DG, I forget which) developed to rip data off competitors' arrays when it came time to trade in your HDS system for something a little more EMC-like. It's also very useful for doing other things like once-off backups, etc, but I won't bore you with that right now. However, if the two CLARiiONs you're migrating between are not in the same management domain, you need to enter the WWN of the LUN to be copied across. In this instance I was doing a SAN Copy push from a CX200 to an AX4-5 in order to avoid using host-based copy techniques on the NCS cluster (my ncopy skills are not so good).

So when you are selecting the destination storage and can’t see the AX4-5, you’ll need the WWN:

[Screenshot: sancopy1]

Click on “Enter WWN…”

[Screenshot: sancopy2]

And you should then be able to see the destination LUN.
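If you'd rather drive it from the CLI than click through Navisphere, the session creation can take a destination WWN as well. The following is a sketch only – the session name and LUN number are placeholders, I'm quoting the -destwwn switch from memory, and you should check the SAN Copy CLI reference for your release before relying on it:

# create an incremental session from a local source LUN to a remote LUN identified by WWN
naviseccli -h <SP_IP_address> sancopy -create -incremental -name <session_name> -srclun <source LUN #> -destwwn <destination LUN WWN>

# start the session and keep an eye on it
naviseccli -h <SP_IP_address> sancopy -start -name <session_name>
naviseccli -h <SP_IP_address> sancopy -info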

You’ll also need an enabler loaded on the AX4-5 to use SAN Copy – this enabler is not the same enabler you would load on a CX, CX3 or CX4. Trust me on that one …

Setting Up MV/A

I set up MV/A between two AX4-5 arrays recently (woot! baby SANs) and thought I'd shed some light on the configuration requirements:

MirrorView/A Setup
– MirrorView/A software must be loaded on both Primary and Secondary storage systems (ie the enabler file via ndu)
– Secondary LUN must be exactly the same size as Primary LUN (use the block count when creating the mirrors)
– Secondary LUN does not need to be the same RAID type as Primary (although it’s a nice-to-have)
– Secondary LUN is not directly accessible to host(s) (Mirror must be removed or Secondary promoted to Primary for host to have access)
– Bi-directional mirroring fully supported (hooray for confusing DR setups and active/active sites!)

Management via Navisphere Manager and Secure CLI
– Provides ease of management
– Returns detailed status information

So what resources are used by MirrorView/A? I’m glad you asked.

SnapView
– Makes a golden copy of remote image before update starts

Incremental SAN Copy
– Transfers data to secondary image. Uses SnapView as part of ISC

MirrorView/A license (enabler – the bit you paid for)
– MirrorView/A licensed for user
– SnapView, SAN Copy licensed for system use (meaning you can’t use it to make your own snapshots)

Adequate Reserved LUN Pool space
– Local and remote system (this is critical to getting it working and often overlooked during the pre-sales and design phases)

Provisioning of adequate space in the Reserved LUN Pool on the primary and secondary storage systems is vital to the successful operation of MirrorView/A. The exact amount of space needed may be determined in the same manner as the required space for SnapView is calculated.

So how do we do that?

Determine average Source LUN size
– Total size of Source LUNs / # of Source LUNs

Reserved LUN size = 10% average Source LUN size
– COFW factor

Create 2x as many Reserved LUNs as Source LUNs
– Overflow LUN factor

Example
– LUNs to be snapped: 10 GB, 20 GB, 30 GB, 100 GB
– Average LUN size = 160 GB/4 = 40 GB
– Make each Reserved LUN 4 GB in size
– Make 8 Reserved LUNs

Due to the dynamic nature of Reserved LUN assignment per Source LUN, it may be better to have many smaller LUNs that can be used as a pool of individual resources. A limiting factor here is that the total number of Reserved LUNs allowed varies by storage system model. Each Reserved LUN can be a different size, and allocation to Source LUNs is based on which is the next available Reserved LUN, without regard to size. This means that there is no mechanism to ensure that a specified Reserved LUN will be allocated to a specified Source LUN. Because of the dynamic nature of the SnapView environment, assignment may be regarded as a random event (though, in fact, there are rules governing the assignment of Reserved LUNs).
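Translating that example into actual LUNs means binding eight small reserved LUNs and adding them to the pool. The sketch below uses made-up RAID Group and LUN numbers, and the 4 GB capacity obviously comes from the sizing above – adjust to suit your own environment:

# bind eight 4 GB reserved LUNs (200 through 207) on RAID Group 10, alternating SPs
naviseccli -h <SP_IP_address> bind r5 200 -rg 10 -cap 4 -sq gb -sp a
naviseccli -h <SP_IP_address> bind r5 201 -rg 10 -cap 4 -sq gb -sp b
# ... and so on for 202 through 207 ...

# add them to the Reserved LUN Pool and check the result
naviseccli -h <SP_IP_address> reserved -lunpool -addlun 200 201 202 203 204 205 206 207
naviseccli -h <SP_IP_address> reserved -lunpool -list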

Frequency of synchronization. What are you using MirrorView/A for? If you promote the secondary LUN to a DR target, what are you hoping to see? Setting up MV/A is normally a question of understanding the Recovery Point Objective (RPO) of the business data you’re trying to protect, as well as the Recovery Time Objective (RTO). With MV/A (and MV/S), the RTO can be quite quick – normally as long as it takes to promote the secondary image and mount the data on a DR host, assuming you have a warm standby DR site in operation. Of course, what data you’ve replicated will decide how fast you get back up and running. If it’s a simple fileserver, then presenting the clone to another host is fairly trivial. But if you’ve presented a bunch of VMs sitting on a VMFS, you need to do some other things (discussed later), to get back up and running. So how do you decide on an acceptable RPO? Well, start with how much money you can spend on a link (kidding). It’s probably a good idea to look at what the business can tolerate in terms of data loss before you go off and configure mirrors with a 5-day replication cycle. Once you’ve worked that out, and tested it, you should be good to go. Oh, and keep in mind that, if you’re going to failover an entire site to DR, you need to consider the storage subsystem you’ve put in place at the DR site. Imagine trying to run a full-blown ESX environment on 11 1TB SATA-II spindles in RAID 5. Uh-huh.

What have I been doing? – Part 1

I recently had the "pleasure" of working on a project before Christmas that had a number of, er, interesting elements involved. During the initial scoping, the only things mentioned were two new arrays (with MirrorView/Asynchronous), a VMware ESX upgrade, and a few new ESX hosts. But here's what there really was:

– 4 NetWare 6.5 hosts in 2 NCS clusters;
– An EMC CLARiiON CX200 (remember them?) hosting a large amount (around 5TB) of NetWare and VMware data;
– A single McData switch running version 7 firmware;
– 2 new Dell hosts with CPUs incompatible with the existing 2950 hosts;
– A memory upgrade to the two existing nodes that meant one host had 20GB and the other had 28GB;
– A MirrorView target full of 1TB SATA-II spindles;
– A DR target with only one switch;
– Singly-attached (ie one HBA) hosts everywhere;
– An esXpress installation that needed to be upgraded / re-installed;
– A broken VUM implementation.

Hmmm, sound like fun? It kind of was, just because some of the things I had to do to get it to work were things I wouldn’t normally expect to do.  I don’t know whether this is such a good thing.  There’re a number of things that popped up during the project, each of which would benefit from dedicated blog posts.  But given that I’m fairly lazy, I think I’ll try and cram it all into one post.

Single switches and single HBAs are generally a bad idea

<rant> When I first started working on SANs about 10 minutes ago, I was taught that redundancy in a mid-range system is a good thing. The components that go into your average mid-range system, while being a bit more reliable than your average gamedude's gear, are still prone to failure. So you build a level of redundancy into the system such that when, for whatever reason, a component fails (such as a disk, fibre cable, switch or HBA), the system stays up and running. On good systems, the only people who know there's a failure are the service personnel called out to replace the broken component in question. On a cheapy system, like the one you keep the Marketing Department's critical morning tea photos on, a few more people might know about it. Mid-range disk arrays can run into the tens and hundreds of thousands of dollars, so sometimes people think that they can save a bit of cash by cutting a few corners – for example, leaving the nodes with single HBAs, or having only one switch at the DR site, or using SATA as a replication target. But I would argue that, given you're spending all of this cash on a decent mid-range array, why wouldn't you do all you can to ensure it's available all the time? Saying "My cluster provides the resiliency / We're not that mission critical / I needed to shave $10K off the price" strikes me as counter-intuitive to the goal of providing reliable, available and sustainable infrastructure solutions. </rant>

All that said, I do understand that sometimes the people making the purchasing decisions aren’t necessarily the best-equipped people to understand the distinction between single- and dual-attached hosts, and what good performance is all about. All I can suggest is that you start with a solid design, and do the best you can to keep that design through to deployment. So what should you be doing? For a simple FC deployment (let’s assume two switches, one array, two HBAs per host), how about something like this?

[Diagram: Zoning]

Notice that there's no connection between the two FC switches here. That's right kids, you don't want to merge these fabrics. The idea is that if you munt the config on one switch, it won't automatically pass that muntedness on to the peer switch. This is a good thing if you, like me, like to do zoning from the CLI but occasionally forget to check the syntax and spelling before you make changes. And for the IBM guys playing at home, the "double redundant loops" excuse doesn't apply to the CLARiiON. So do yourself a favour, and give yourself 4 paths to the box, dammit!
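And since I mentioned zoning from the CLI, this is the sort of thing I have in mind – single-initiator zones, with each HBA zoned to one port on SP A and one port on SP B of its fabric. The example below is Brocade-flavoured, with invented alias names and placeholder WWPNs, so treat it as a sketch rather than gospel (and repeat the equivalent on the second fabric for the other HBA):

alicreate "esx01_hba0", "<HBA0 WWPN>"
alicreate "cx4_spa_p0", "<SP A port WWPN>"
alicreate "cx4_spb_p0", "<SP B port WWPN>"
zonecreate "esx01_hba0_spa_p0", "esx01_hba0; cx4_spa_p0"
zonecreate "esx01_hba0_spb_p0", "esx01_hba0; cx4_spb_p0"
cfgcreate "fabric_a_cfg", "esx01_hba0_spa_p0; esx01_hba0_spb_p0"
cfgsave
cfgenable "fabric_a_cfg"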

And don’t listen to Apple and get all excited about just using one switch either – not that they’re necessarily saying that, of course … Or that they’re necessarily saying anything much at all about storage any more, unless Time Capsules count as storage. But I digress …