Dell EqualLogic MasterClass 2012 – Part 3

Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch; however, there was no requirement for me to blog about any of the content presented, and I was not compensated in any way for my time at the event.

This is the third of three posts covering the Dell EqualLogic MasterClass that I recently attended. The first post is here, and the second post is here.

EqualLogic MasterClass 301: Advanced Features II

Dell described this session as follows: “Dell EqualLogic SAN Headquarters (SAN HQ) gives you the power to do more by providing in-depth reporting and analysis tools. With the information you get from SAN HQ, you can better customise, protect, and optimise your storage. In this session we will cover:

  • Introduction of Storage Blade – networking considerations/DCB
  • Synchronous Replication Implementation
  • Live Demonstration of replication
  • Async Failover/Failback
  • SyncRep Failover/Switch
  • NEW Virtual Storage Manager 3.5
  • Monitor EqualLogic with SAN HQ”

So here’re my notes from the third session.

Steven started off with a clarification around VMware snapshots. They are hypervisor-consistent, but they don’t talk to application VSS writers. So if you want to protect Microsoft SQL running on a Windows guest running on ESX, you need to leverage two methods to protect both the VM and the application. To do this, install ASM/ME on the guest and go direct via iSCSI. This combined approach gives you more granularity. Steven then explained how to set up iSCSI initiators on both the guest and the ESX host (VM portgroups vs VMkernel portgroups). This design is more advanced than what’s commonly deployed, but it does give you more flexibility for recovery options. Audience feedback (from a bloke I’ve known for years – clearly a troublemaker) is that this configuration caused nothing but trouble when it came to vMotion. A few other people shared their poor experiences at this point as well. Steven’s on the back foot a little at this point, but moves on swiftly. Note that this solution also doesn’t work with VMware SRM. Steven said that VMware were happy to present this solution as recommended at the Dell Storage Forum recently. I’m not entirely convinced that’s a glowing endorsement. Best to say that YMMV, depending on the environment. An audience member then makes the point that you might be unnecessarily over-complicating things to achieve a fairly simple goal – backing up SQL and Exchange – and that there are plenty of other tools you can leverage. One case where you might want this approach is when your filesystem is larger than 2TB.

Demo Time – SAN HQ

Steven ran through a demo of SAN HQ 2.5 (early production release, about to be Generally Available). SAN HQ is a monitoring and reporting tool – not a management tool. EQL management is done primarily through Group Manager, via the vCenter plug-in, or using Microsoft Simple SAN. He went through some of the things it shows you: IOPS, firmware level, Groups, capacity and a bunch of other summary information that you can drill down into if required. SAN HQ is a free install on Windows Server 32- or 64-bit. You can get info on latency, iSCSI connections, etc. It also comes with the RAID evaluator, which examines current members, read/write ratios, and current RAID policy. You can then perform some scenario-based stuff like “What happens if I move from RAID 5 to RAID 6 on this member? Will I still be able to meet the current I/O requirements?” It also provides some experimental analysis that can help with capacity planning and with predicting when you’ll hit performance ceilings based on current usage. If you’re looking at the network part and are seeing in excess of 1% TCP re-transmits, there’s a problem at the network layer (flow control is not enabled, for example). It also does Dell Integrated Support (DIS), which sends diagnostics to Dell ProSupport weekly. Future releases will support more proactive diagnostics. There’s also a bunch of pre-canned reports that can be run, or information can be exported to CSV files for further manipulation. Audience question – can you configure e-mail alerting to e-mail when a threshold has been exceeded (e.g. when latency is above 20ms)? E-mail alerting is configured via Group Manager. There’re only pre-defined alerts available at the moment. Another audience member pointed out that you can configure some specific alerting in SAN HQ, but it’s not quite as specific as hoped.
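As a rough illustration of that 1% rule of thumb, here’s a minimal Python sketch that flags likely network-layer trouble when re-transmits exceed 1% of segments sent. The function name and counter inputs are my own, not anything exposed by SAN HQ:

```python
def check_retransmit_ratio(segments_sent: int, segments_retransmitted: int,
                           threshold_pct: float = 1.0) -> str:
    """Flag likely network-layer trouble (e.g. flow control disabled)
    when TCP re-transmits exceed ~1% of segments sent."""
    if segments_sent == 0:
        return "No traffic observed"
    pct = 100.0 * segments_retransmitted / segments_sent
    if pct > threshold_pct:
        return f"WARNING: {pct:.2f}% TCP re-transmits - investigate the network layer"
    return f"OK: {pct:.2f}% TCP re-transmits"

# Example with made-up counters:
print(check_retransmit_ratio(segments_sent=2_500_000, segments_retransmitted=40_000))
```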


Demo Time – What’s new in FW6?

Snapshot borrowing. This is a checkbox that can be ticked on a per-volume basis. Snapshot borrowing adheres to the keep count (the number of snapshot reference points that we want to keep online). The point of borrowing is that, instead of deleting old snapshots when you run out of snapshot reserve space, it respects the keep count and borrows reserve space from elsewhere to ensure you retain a sufficient number of snapshots. You could theoretically dial everything down to 5% snapshot reserve and just borrow everything, but that’s not what this is designed for. It’s designed to help keep the desired number of snapshots when you have unexpected spikes in your rate of change. Audience question – where are you borrowing the space from? The first place is the pool’s snapshot reserve. Once this space is consumed, free pool space is used.
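To make the borrowing behaviour a bit more concrete, here’s a toy Python sketch of the logic as I understood it from the session: keep the last keep-count snapshots, and when the volume’s own reserve is exhausted, borrow from the pool’s snapshot reserve first and then from free pool space. The structure and names are mine, not EqualLogic’s:

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    snapshot_reserve_gb: float                 # the volume's own snapshot reserve
    keep_count: int                            # snapshots we want to keep online
    snapshots_gb: list = field(default_factory=list)

def take_snapshot(vol, size_gb, pool_snap_reserve_gb, free_pool_gb):
    """Keep `keep_count` snapshots; when the volume's reserve runs out, borrow
    first from the pool's snapshot reserve, then from free pool space, rather
    than deleting snapshots we still want to keep."""
    vol.snapshots_gb.append(size_gb)
    while len(vol.snapshots_gb) > vol.keep_count:   # only prune beyond the keep count
        vol.snapshots_gb.pop(0)

    shortfall = max(0.0, sum(vol.snapshots_gb) - vol.snapshot_reserve_gb)
    borrowed_from_pool_reserve = min(shortfall, pool_snap_reserve_gb)
    borrowed_from_free_space = shortfall - borrowed_from_pool_reserve
    if borrowed_from_free_space > free_pool_gb:
        raise RuntimeError("nowhere left to borrow from; oldest snapshots would be deleted")
    return borrowed_from_pool_reserve, borrowed_from_free_space

vol = Volume(snapshot_reserve_gb=50, keep_count=5)
for change_gb in (10, 12, 9, 20, 25):              # an unexpected spike in rate of change
    print(take_snapshot(vol, change_gb, pool_snap_reserve_gb=30, free_pool_gb=100))
```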

Re-thinning. SCSI UNMAP commands are now supported with FW6. UNMAP is supported natively in Windows Server 2012; with Windows Server 2008 R2, use the CLI or ASM/ME to re-thin the volume. VMware shipped SCSI UNMAP support with vSphere 5.0, which was then pulled; it was made available again as of 5.0 Update 1 as a manual option with vmkfstools. Linux supports reclaiming thin-provisioned space as well, although Steven couldn’t recall precisely from which version it was supported. Steven then covered off what re-thinning is and why it’s an important capability to have available. Note that re-thinning is an I/O-intensive activity. Audience question – can you make the volume smaller? Yes, you always could, but you need to use the CLI to do this.
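For my own reference, here’s a rough Python cheat sheet of the reclaim options mentioned above. Treat the specific flags, drive letters, and paths as illustrative assumptions to check against your own versions rather than gospel:

```python
# Space-reclamation (re-thinning) options by platform, as discussed in the session.
# The specific flags/paths below are examples, not verified EqualLogic guidance.
RECLAIM_HINTS = {
    "Windows Server 2012": "Native UNMAP; e.g. PowerShell: Optimize-Volume -DriveLetter D -ReTrim",
    "Windows Server 2008 R2": "No native UNMAP; use ASM/ME or the EQL CLI to re-thin the volume",
    "vSphere 5.0 U1": "Manual reclaim from the datastore directory, e.g. vmkfstools -y 60",
    "Linux": "e.g. fstrim -v /mnt/data (needs a discard-capable kernel and filesystem)",
}

def reclaim_hint(platform: str) -> str:
    """Return the space-reclamation approach for a given platform, if known."""
    return RECLAIM_HINTS.get(platform, "Check vendor documentation")

if __name__ == "__main__":
    for platform in RECLAIM_HINTS:
        print(f"{platform:>22}: {reclaim_hint(platform)}")
```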

SyncRep. Synchronous Replication works within a Group and across Pools, whereas Asynchronous Replication works between Groups. To configure SyncRep, you need a minimum of two separate arrays in two separate Pools. It leverages a single iSCSI target portal, and therefore the iSCSI redirection capability of the iSCSI specification. It is designed for low-latency, short-distance (LAN/MAN) deployments: sub-5ms latency and sub-1km distances. You could go beyond 1km, assuming your latency stays under 5ms. There’s no concept of local and remote reserve or delegated space with SyncRep. Only one iSCSI target device is accessible at any one time (this isn’t an active-active volume mirroring solution). Write performance will depend on the bandwidth between Pools, and reads only come from the primary storage device. The first iteration of SyncRep does not have automatic failover. The primary and secondary Pools are known as SyncActive and SyncAlternate. The future view is to deliver automatic failover on a per-volume basis. A volume cannot be replicated by synchronous and asynchronous methods at the same time. More info can be found here.
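Here’s a toy Python model of why that sub-5ms figure matters: with SyncRep the host only gets its write acknowledgement after both the SyncActive and SyncAlternate pools have the data, so the inter-pool latency lands directly on every write. This is purely illustrative; the class and function names are mine:

```python
import time

class Pool:
    """Toy stand-in for an EqualLogic pool; latency_s models the inter-pool link."""
    def __init__(self, name: str, latency_s: float = 0.0):
        self.name, self.latency_s, self.blocks = name, latency_s, {}

    def write(self, lba: int, data: bytes) -> None:
        time.sleep(self.latency_s)      # simulated link + commit time
        self.blocks[lba] = data

def sync_write(sync_active: Pool, sync_alternate: Pool, lba: int, data: bytes) -> None:
    """SyncRep-style write: only acknowledged once BOTH pools hold the block,
    so host write latency includes the inter-pool round trip."""
    sync_active.write(lba, data)
    sync_alternate.write(lba, data)     # must complete before the host gets an ack

def async_write(primary: Pool, replica_queue: list, lba: int, data: bytes) -> None:
    """Async-style write: acknowledged as soon as the primary commits; the block
    is shipped to the partner Group later, on the replication schedule."""
    primary.write(lba, data)
    replica_queue.append((lba, data))

active = Pool("SyncActive")
alternate = Pool("SyncAlternate", latency_s=0.005)   # ~5ms link, the stated design limit

start = time.perf_counter()
sync_write(active, alternate, lba=0, data=b"payload")
print(f"sync write acked after ~{(time.perf_counter() - start) * 1000:.1f} ms")

start = time.perf_counter()
async_write(active, [], lba=1, data=b"payload")
print(f"async write acked after ~{(time.perf_counter() - start) * 1000:.1f} ms")
```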


M4110 Storage Blade

Steven provided a little background on the Dell M1000e blade chassis. The M4110 is a double-wide, half-height blade. As noted previously, you can have two in a Group, and the M1000e chassis can support up to four of these. You could have 4 M4110 blades and 16 quarter-height blades in one 10RU enclosure. It runs 10Gb Ethernet. Fabric design on the M1000e is important: with the storage blade, you can use Fabric A or B (one or the other, never both at the same time). Check the version of the chassis that you have: with version 1.0 of the chassis, iSCSI only works on Fabric B, while version 1.1 supports either Fabric A or B. You can use PowerConnect, Force10, or pass-through modules for your chassis networking.
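The slot arithmetic behind the “4 storage blades plus 16 quarter-height blades” combination is easy enough to sanity-check. Here’s a quick Python sketch, assuming the M1000e’s 16 half-height slots and the M4110’s double-wide, half-height form factor:

```python
# M1000e slot maths: 16 half-height slots per 10RU enclosure.
HALF_HEIGHT_SLOTS = 16
M4110_SLOTS_EACH = 2            # double-wide, half-height = two half-height slots
QUARTER_PER_HALF = 2            # two quarter-height blades fit in one half-height slot

storage_blades = 4              # chassis maximum for the M4110
slots_used = storage_blades * M4110_SLOTS_EACH
slots_left = HALF_HEIGHT_SLOTS - slots_used
compute_blades = slots_left * QUARTER_PER_HALF

print(f"{storage_blades} x M4110 uses {slots_used} half-height slots, "
      f"leaving room for {compute_blades} quarter-height compute blades")
# -> 4 x M4110 uses 8 half-height slots, leaving room for 16 quarter-height compute blades
```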


Data Centre Bridging

Also known as Data Centre Ethernet (DCE) or Convergence. EQL arrays support DCB. DCB allows you to carve up a 10Gb link, for example, into a few different traffic types. The Converged Network Adapter (CNA) needs to be supported to give you end-to-end DCB. Benefits? You can do iSCSI over DCB and leverage its lossless capabilities. Steven reiterated that you need a supported CNA for the initiator, a Force10 switch if you’re using blades (PowerConnect switches work with rackmount servers), and EQL arrays running firmware 5.1 or later.
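As a simple picture of what “carving up” a 10Gb link means in practice, here’s a toy Python sketch of ETS-style bandwidth shares across traffic classes. The class names and percentages are examples only, not a recommended EqualLogic configuration:

```python
# Toy illustration of carving a 10Gb converged link into DCB traffic classes
# (ETS-style bandwidth shares). Shares only matter under contention; idle
# bandwidth can still be used by other classes.
LINK_GBPS = 10
traffic_classes = {
    "iSCSI (lossless, PFC enabled)": 50,
    "LAN": 30,
    "Management / other": 20,
}

assert sum(traffic_classes.values()) == 100, "ETS shares should total 100%"

for name, pct in traffic_classes.items():
    print(f"{name:<32} {pct:>3}% -> guaranteed ~{LINK_GBPS * pct / 100:.1f} Gb/s under contention")
```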


Fluid FS

Steven introduced the scale-out NAS appliance running Dell’s Fluid FS. It offers SMB or NFS. You can have two appliances (four processing nodes) presenting a single namespace of up to 509TB. Replication between NAS appliances is asynchronous, at the Fluid FS layer (it replicates from one NAS volume to another NAS volume). If you’re running a mix of file and block, the recommendation is to stick with asynchronous replication all the way through. It still uses the iSCSI network to replicate. Audience question – does it support SMB v3? Not yet; it supports NFS v3 and SMB v1. Audience question – will the maximum size of the namespace be increased any time in the future? It’s not a top priority right now. The Compellent and PowerVault implementations of Fluid FS both scale to one PB at the moment. @DanMoz pointed out that the filesystem can go bigger; the problem is that the current implementation can only use storage from one Pool. The most connections you can have is 1024, and the biggest volume is 15TB, so when you have four nodes clustered together, by the time the connection count is added up with those 15TB volumes, the biggest you can get is 506TB. This will increase in the future when they increase the maximum connection count on the EqualLogic side of things.


Conclusion

Steven finished off with a demonstration of the new multiple-Group support in VSM for replication and failover. It’s not a replacement for VMware SRM, but it’s still pretty handy. Steven covered a lot of ground in the three 1.5-hour sessions, and I’d like to thank Dell for putting on this event.