EMC – Silly things you can do with stress testing – Part 2

I’ve got a bunch of graphs that indicate you can do some bad things to EFDs when you run certain SQLIO stress tests against them and compare the results to FC disks. But EMC is pushing back on the results I’ve gotten for a number of reasons. So in the interests of keeping things civil I’m not going to publish them – because I’m not convinced the results are necessarily valid and I’ve run out of time and patience to continue testing. Which might be what EMC hoped for – or I might just be feeling a tad cynical.

What I have learnt, though, is that it’s very easy to generate QFULL errors on a CX4 if you follow the EMC best practice configs for Qlogic HBAs and set the execution throttle to 256. In fact, you might even be better off leaving it at 16, unless you have a real requirement to set it higher. I’m happy for someone to tell me why EMC suggests it be set to 256, because I’ve not found a good reason for it yet. Of course, this is dependent on a number of environmental factors, but the 256 figure still has me scratching my head.
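
To put some rough numbers on that, here’s a back-of-the-envelope sketch. The 1600 figure is the outstanding-I/O limit commonly quoted for CLARiiON FC front-end ports, and the host count is made up – treat it as an illustration of the maths, not a statement about any particular array.

    # Back-of-the-envelope sketch: when can a group of hosts overrun an array
    # front-end port's queue? Assumptions: the port can hold roughly 1600
    # outstanding I/Os (the figure commonly quoted for CLARiiON FC ports), and
    # every host can, worst case, push its full execution throttle at one port.
    PORT_QUEUE_LIMIT = 1600   # assumed per-port limit
    HOSTS = 10                # made-up number of hosts sharing the port

    for throttle in (16, 256):
        worst_case = HOSTS * throttle
        verdict = "QFULL territory" if worst_case > PORT_QUEUE_LIMIT else "fits"
        print(f"execution throttle {throttle}: worst case {worst_case} "
              f"outstanding I/Os vs {PORT_QUEUE_LIMIT} -> {verdict}")

At a throttle of 16, ten hosts top out at 160 outstanding I/Os; at 256, the same ten hosts can theoretically stack up 2,560 – well past the point where the port starts handing back QFULLs.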

Another thing we uncovered during stress testing relates to the queue depth of LUNs. For our initial testing, we had a Storage Pool created with 30 * 200GB EFDs, 70 * 450GB FC spindles and 15 * 1TB SATA-II spindles with FAST-VP enabled. The LUNs on the EFDs were set to no data movement – so everything sat on the EFDs. We were getting fairly underwhelming performance numbers out of this config, and the main culprit seems to have been the LUN queue depth. In a traditional RAID Group setup, the queue depth of the LUN is (14 * (the number of data drives in the LUN) + 32). So for a RAID 5 (4+1) LUN, the queue depth is 88. If, for some reason, you want to drive a LUN harder, you can increase this by using MetaLUNs, with the sum of the components providing the LUN’s queue depth. What we observed on the Pool LUN, however, was that the queue depth seemed to stay fixed at 88, regardless of the number of internal RAID Groups servicing the Pool LUN. That’s not great if you’re trying to push a lot of I/O at a single Pool LUN, and it’s probably why EMC quietly say that you should stick to traditional MetaLUNs and RAID Groups if you need particular performance characteristics.
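
For what it’s worth, the arithmetic works out as follows – a sketch only, using the formula above and assuming MetaLUN component queue depths simply add up:

    # LUN queue depth on a traditional RAID Group, as described above:
    #   queue depth = 14 * (number of data drives in the LUN) + 32
    def lun_queue_depth(data_drives):
        return 14 * data_drives + 32

    # RAID 5 (4+1) has four data drives
    print(lun_queue_depth(4))          # 88

    # MetaLUN: the component LUNs' queue depths are summed, so a MetaLUN
    # striped across four RAID 5 (4+1) components gets 4 * 88
    print(4 * lun_queue_depth(4))      # 352

    # The Pool LUN, in our testing, appeared to stay pinned at 88 no matter
    # how many internal RAID Groups sat underneath it.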

So what’s the point I’m trying to get at? Storage Pools and FAST-VP are awesome for the majority of workloads, but sometimes you need to use more traditional methods to get what you want. Which is why I spent last weekend using the LUN Migration tool to move 100TB of blocks around the array to get back to the traditional RAID Group / MetaLUN model. Feel free to tell me if you think I’ve gotten this arse-backwards too, because I really want to believe that I have.

6 Comments

  1. Hi Dan,

    We are also moving TBs of data around, out of our pools and back to traditional RAID Groups.
    What we and EMC learned from that lesson: before implementing FAST II and activating Auto Tiering, you need to analyze your data streams to identify hotspots and performance characteristics. Not all data is best placed in a storage pool with Auto Tiering enabled.

    Do you have any experience with VMware SIOC and/or EMC NQM in cooperation with FAST (II)?

    Regards,

    daniel

  2. Hi Daniel,

    We turned off SIOC initially because it was causing cosmetic issues with mirrored volumes. We have NQM but haven’t tested it in conjunction with FAST tiering. We’re a few weeks away from getting some EFDs in our lab to test FAST and FAST cache with the latest FLARE, so I’ll add those things to our list of things to look at.

    Cheers
    Dan

  3. Which FLARE rev you at? .517 fixes some important FAST cache bugs with pools.

    Dave

  4. Hi Dave,
    We did this testing on .509. We’re moving to .517 next week. The thing we want to test though is the queue depth performance between pools / non-pools – because that’s what made us turn our backs on pools initially. We’ve also been getting conflicting stories about FAST Cache / Pool LUNs / MV/S. Will let you know how it goes.

  5. I think an interesting test would be a traditional RG R1+0 LUN with, say, 24 disks. Place 12 disks on a different bus than the 2nd 12 disks. Also make sure that the data/mirror disks are split correctly across the buses. Meaning 3_1_0 2_1_0 3_1_1 2_1_1 …. etc. Then… test the same theory with pools. My initial experience is that when you create the pool from the CLI, even though you split the data/mirror disks across the buses, they don’t seem to be in that order when you display the pool from Naviseccli. Hence, does this affect performance? What does the heat map show?

    naviseccli -address storagepool -list -name

  6. Hi Dave, I agree that would be an interesting test. Unfortunately we’re only using CX4-120s in our lab at the moment – although I may have some spare spindles on one of the 4-960s that we could test this on. We didn’t have access to heat maps when we tested – we were just giving SQLIO numbers to EMC and they were working back from there. But I will be doing some testing in the next few weeks and publishing the results from our labs to either prove or disprove my theory that Pools aren’t as peachy, at least on the CX4, as people would like to believe.
