DanMoz posted on his blog recently that Dell is running another series of EqualLogic Masterclass sessions in the near future. I attended these last year and found the day to be very useful (you can read my posts here, here and here). Register here.
Dell EqualLogic MasterClass 2012 – Part 3
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch; however, there was no requirement for me to blog about any of the content presented, and I was not compensated in any way for my time at the event.
This is the third of three posts covering the Dell EqualLogic MasterClass that I recently attended. The first post is here, and the second post is here.
EqualLogic MasterClass 301: Advanced Features II
Dell described this session as follows: “Dell EqualLogic SAN Headquarters (SAN HQ) gives you the power to do more by providing in-depth reporting and analysis tools. With the information you get from SAN HQ, you can better customise, protect, and optimise your storage. In this session we will cover:
- Introduction of Storage Blade – networking considerations/DCB
- Synchronous Replication Implementation
- Live Demonstration of replication
- Async Failover/Failback
- SyncRep Failover/Switch
- NEW Virtual Storage Manager 3.5
- Monitor EqualLogic with SAN HQ”
So here’re my notes from the third session.
Steven started off with a clarification around VMware snapshots. They are hypervisor-consistent, but they don’t talk to application VSS. So if you want to protect Microsoft SQL Server on a Windows guest running on ESX, you need to use two methods to protect both the VM and the application. To do this, install ASM/ME in the guest and go direct via iSCSI. This combined approach provides more granularity. Steven then explained how to set up iSCSI initiators on both the guest and the ESX host (VM portgroups vs VMkernel portgroups). This design is more advanced than what’s commonly deployed, but it does give you more flexibility for recovery options. Audience feedback (from a bloke I’ve known for years – clearly a troublemaker) was that this configuration caused nothing but trouble when it came to vMotion. A few other people shared their poor experiences at this point as well. Steven was on the back foot a little here, but moved on swiftly. Note that this solution also doesn’t work with VMware SRM. Steven said that VMware were happy to present this solution as recommended at the Dell Storage Forum recently. I’m not entirely convinced that’s a glowing endorsement. Best to say that YMMV, depending on the environment. An audience member then made the point that you might be unnecessarily over-complicating things to achieve a fairly simple goal – backing up SQL and Exchange – and that there are plenty of other tools you can leverage. You might want to use this approach if your filesystem is larger than 2TB.
Demo Time – SAN HQ
Steven ran through a demo of SAN HQ 2.5 (an early production release, about to be Generally Available). SAN HQ is a monitoring and reporting tool – not a management tool. EQL management is done primarily through Group Manager, via the vCenter plug-in, or using Microsoft Simple SAN. He went through some of the things it shows you: IOPS, firmware level, Groups, capacity and a bunch of other summary information that you can drill down into if required. SAN HQ is a free install on Windows Server 32- or 64-bit. You can get info on latency, iSCSI connections, etc. It also comes with the RAID evaluator, which examines current members, read / write ratios, and the current RAID policy. You can then perform some scenario-based stuff like “What happens if I move from RAID 5 to RAID 6 on this member? Will I still be able to meet the current I/O requirements?” It also provides some experimental analysis that can help with capacity planning and with predicting when you’ll hit performance ceilings based on current usage. If you’re looking at the network part and seeing in excess of 1% TCP re-transmits, there’s a problem at the network layer (flow control is not enabled, for example). It also does Dell Integrated Support (DIS), which sends diagnostics to Dell Pro Support weekly; future releases will support more proactive diagnostics. There’s also a bunch of pre-canned reports that can be run, or information can be exported to CSV files for further manipulation. Audience question – can you configure e-mail alerting to notify you when a threshold has been exceeded (i.e. when latency is above 20ms)? E-mail alerting is configured via Group Manager, and only pre-defined alerts are available at the moment. Another audience member pointed out that you can configure some specific alerting in SAN HQ, but it’s not quite as specific as hoped.
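Steven’s 1% rule of thumb is easy to apply to the counters SAN HQ exposes. Here’s a minimal Python sketch of the check – the counter names and values are made up for illustration, not SAN HQ’s actual field names:

```python
def retransmit_ratio(tcp_segments_sent, tcp_segments_retransmitted):
    """Return the TCP retransmit percentage for a sampling interval."""
    if tcp_segments_sent == 0:
        return 0.0
    return 100.0 * tcp_segments_retransmitted / tcp_segments_sent

# Example: hypothetical counters read from SAN HQ's network view for one interval
ratio = retransmit_ratio(tcp_segments_sent=2_400_000, tcp_segments_retransmitted=36_000)
if ratio > 1.0:
    print(f"{ratio:.2f}% retransmits - investigate flow control / switch configuration")
else:
    print(f"{ratio:.2f}% retransmits - network layer looks healthy")
```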
Demo Time – What’s new in FW6?
Snapshot borrowing. This is a checkbox that can be ticked on a per-volume basis. Snapshot borrowing adheres to the rule of keep count (the number of snapshot reference points that we want to keep online). The point of borrowing is that, instead of deleting old snapshots when you run out of snapshot reserve space, it respects the keep count and uses reserve space from elsewhere to ensure you have a sufficient number of snapshots. You can theoretically dial everything down to 5% snapshot reserve and just borrow everything, but that’s not what this is designed to do. It’s designed to help keep the desired number of snapshots where you have unexpected spikes in your rate of change. Audience question – Where are you borrowing the space from? The first place is the pool’s snapshot reserve. Once this space is consumed, free pool space is used.
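As a rough illustration of the borrowing order described above (the volume’s own snapshot reserve first, then the pool’s snapshot reserve, then free pool space), here’s a small Python sketch – the function and the figures are mine, not anything from the EQL firmware:

```python
def place_snapshot(size_gb, volume_reserve_free, pool_snap_reserve_free, pool_free_space):
    """Illustrative decision for where a new snapshot's space comes from,
    following the borrowing order described in the session."""
    if size_gb <= volume_reserve_free:
        return "volume snapshot reserve"
    if size_gb <= pool_snap_reserve_free:
        return "borrowed from the pool's snapshot reserve"
    if size_gb <= pool_free_space:
        return "borrowed from free pool space"
    return "no space to borrow - oldest snapshots would be deleted"

# Hypothetical example: a 50GB snapshot with only 20GB of volume reserve left
print(place_snapshot(50, volume_reserve_free=20, pool_snap_reserve_free=100, pool_free_space=500))
```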
Re-thinning. SCSI unmap commands are now supported with FW6. Unmap is supported natively in Windows Server 2012; with Windows 2008 R2, use the CLI or ASM/ME to re-thin the volume. VMware had SCSI unmap support in vSphere 5.0, which was then pulled; it has been made available again as of Update 1 as an option with vmkfstools. Linux supports reclaiming thin-provisioned space as well, although Steven couldn’t recall precisely which version introduced support. Steven then covered off what re-thinning is and why it’s an important capability to have available. Note that re-thinning is an I/O-intensive activity. Audience question – can you make the volume smaller? Yes, you always could, but you need to use the CLI to do this.
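To illustrate why re-thinning matters, here’s a trivial Python sketch (with invented numbers) of the space an unmap pass can hand back – pages the array has allocated to a thin volume for blocks the filesystem has since freed:

```python
def reclaimable_gb(allocated_on_array_gb, used_by_filesystem_gb):
    """Space the array still holds for blocks the filesystem has already freed.
    This is roughly what a SCSI unmap pass (re-thinning) can return to the pool."""
    return max(0, allocated_on_array_gb - used_by_filesystem_gb)

# Hypothetical thin volume: 800 GB of pages allocated, but only 350 GB still in use
print(f"~{reclaimable_gb(800, 350)} GB could be returned to free pool space")
```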
SyncRep. Synchronous Replication works within a Group and across Pools, whereas Asynchronous Replication works between Groups. To configure SyncRep, you need a minimum of two separate arrays in two separate Pools. It leverages a single iSCSI target portal, and therefore the iSCSI redirection capability of the iSCSI specification. It is designed for low-latency, short-distance use (LAN / MAN) – sub-5ms and sub-1km – although you could go beyond 1km, assuming your latency stays sub-5ms. There’s no concept of local and remote reserve or delegated space with SyncRep. Only one iSCSI target device is accessible at any one time (this isn’t an active-active volume mirroring solution). Write performance will be dependent on the bandwidth between Pools, and reads only come from the primary storage device. The first iteration of SyncRep does not have automatic failover; the primary and secondary Pools are known as SyncActive and SyncAlternate, and the future view is to deliver automatic failover defined on a per-volume basis. A volume cannot be replicated by synchronous and asynchronous methods at the same time. More info can be found here.
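A quick way to think about the SyncRep prerequisites quoted above is as a simple checklist. The sketch below is just my own summary in Python, not a Dell qualification tool:

```python
def syncrep_feasible(latency_ms, pools_in_group, arrays):
    """Rough pre-check against the constraints quoted in the session:
    at least two arrays in two separate pools within one group, and
    sub-5ms latency between them (latency matters more than distance)."""
    checks = {
        "two or more arrays": arrays >= 2,
        "two or more pools in the group": pools_in_group >= 2,
        "round-trip latency under 5 ms": latency_ms < 5,
    }
    for name, ok in checks.items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
    return all(checks.values())

# Hypothetical metro link between two sites
syncrep_feasible(latency_ms=3.2, pools_in_group=2, arrays=2)
```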
M4110 Storage Blade
Steven provided a little background on the Dell M1000E blade chassis. The M4110 is a dual-width, half-height blade. As noted previously, you can have two in a Group, and the M1000E chassis can support up to four of these. You could have 4 M4110 blades and 16 “quarter-height” blades in one 10RU enclosure. It runs 10Gb. Fabric design on the M1000E is important: with the storage blade, you can use Fabric A or B (one or the other, never both at the same time). Check the version of the chassis that you have – with version 1.0 of the chassis, iSCSI only works on Fabric B, while version 1.1 supports either Fabric A or B. You can use PowerConnect, Force10, or pass-through modules for your chassis networking.
Data Centre Bridging
Also known as Data Centre Ethernet (DCE) or Convergence. EQL arrays support DCB. DCB allows you to carve up a 10Gb link, for example, into a few different traffic types. The Converged Network Adapter (CNA) needs to be supported to give you end-to-end DCB. Benefits? You can do iSCSI over DCB and leverage its lossless capabilities. Steven reiterated that you need a CNA for the initiator, a Force10 switch if you’re using blades (PowerConnect switches work with rackmount servers), and EQL running 5.1+ firmware.
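As a simple illustration of carving up a converged 10Gb link, here’s a hedged Python sketch of an ETS-style bandwidth allocation – the traffic classes and percentages are invented examples, not Dell recommendations:

```python
# Hypothetical ETS-style bandwidth split for a converged 10GbE link
traffic_classes = {"iSCSI (lossless)": 50, "LAN": 30, "Other": 20}  # percent of the link

link_gbps = 10
assert sum(traffic_classes.values()) == 100, "allocations must total 100%"
for name, pct in traffic_classes.items():
    print(f"{name}: {pct}% -> {link_gbps * pct / 100:.1f} Gb/s guaranteed minimum")
```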
Fluid FS
Steven introduced the scale-out NAS appliance running Dell’s Fluid FS. It offers SMB or NFS. You can have 2 appliances (4 processing nodes) presenting a single namespace of up to 509TB. Replication between NAS appliances is asynchronous, at the Fluid FS layer (it replicates from one NAS volume to another NAS volume). If you’re running a mix of file and block, the recommendation is to stick with asynchronous replication all the way through. It still uses the iSCSI network to replicate. Audience question – does it support SMB v3? Not yet; it supports NFS v3 and SMB v1. Audience question – will the maximum size of the namespace be increased any time in the future? It’s not a top priority right now. The Compellent and PowerVault implementations of Fluid FS both scale to one PB at the moment. @DanMoz pointed out that the filesystem can go bigger; the problem is that the current implementation can only use storage out of one Pool. The most connections you can have is 1024, and the biggest volume is 15TB, so when you have 4 nodes clustered together, by the time the connection count is added up against those 15TB volumes, the biggest you can get is 506TB. This will increase in the future when the maximum connection count is raised on the EqualLogic side of things.
Conclusion
Steven finished off with a demonstration of the new multiple Groups support in VSM in terms of replication and failover. It’s not a replacement for VMware SRM, but it’s still pretty handy. Steven covered a lot of ground in the three 1.5 hour sessions, and I’d like to thank Dell for putting on this event.
Dell EqualLogic MasterClass 2012 – Update
I just noticed that they’ve added a few dates to the MasterClass schedule – so go here and register if you’re into that kind of thing.
Dell EqualLogic MasterClass 2012 – Part 2
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch; however, there was no requirement for me to blog about any of the content presented, and I was not compensated in any way for my time at the event.
This is the second of three posts covering the Dell EqualLogic MasterClass that I recently attended. The first one is here.
EqualLogic MasterClass 201: Advanced Features
Dell described this as follows: “The EqualLogic MasterClass 201 course builds upon our 101 Core technology course and provides the opportunity for students to explore further the advanced features and software included in PS Series Storage Array, namely replication and SAN Headquarters Monitoring Tool. By understanding and utilising these advanced features you will be able to maximise your use of the storage array to fulfill business requirements today and well into the future. In this session we will cover:
- Disaster Recovery using replication
- Auto-replication
- Auto Snapshot Manager (Microsoft ed)
- Auto Snapshot Manager (VMware ed)
- Off-host backup techniques with EqualLogic
- Visibility to your EqualLogic Storage by leveraging SAN HQ monitoring”
So here’re the rough notes from Session 2.
[Note: I missed the first 15 minutes of this session and had a coffee with @DanMoz instead.]
Snapshot Architecture
There are three different types of snapshots that you can do:
- Crash-consistent (using Group Manager);
- Application-aware; and
- Hypervisor-aware.
Steven then went on to provide an in-depth look at how snapshots work using redirect-on-write. Audience question – What if you’re referencing data from both A and A1? EQL will move data in the background to optimise for sequential reads. Reads may be proxied initially (data movement is done when the arrays are not under load). Replicas use the same engine as snapshots, but with smaller segments (talking in KB, not MB). The default snapshot space percentage when you create a volume is dictated by the Group configuration. You can set this to 0% if you want, and then set snapshot space on each volume as required. You can set a volume to have no snapshot space if you want, but this will cause VSS-based apps to fail (because there’s no snap space). The default Group configuration is 100% (like-for-like) snapshot space; with this you’re guaranteed to always have at least one point-in-time copy of the data. The key in setting up snapshots is to know what the rate of change is. Start with a conservative figure, then look at RTO / RPO and how many on-line copies you want. Think about the keep count – how many snaps do you want to keep before the oldest ones get deleted? Audience question – Are there any tools to help identify the rate of change? Steven suggested that you could use SAN HQ to measure this by doing a daily snap and running a report every day to see how much snapshot reserve is being used. Audience question – Is there a high watermark for snaps? The policy is to delete older snaps once 100% of reserved space is reached (or take the snapshots off-line). FW6 introduces the concept of snapshot borrowing. If you’re using thin-provisioned volumes, snapshots are thin-provisioned as well. Audience question – Are snapshots distributed across members in the same way as volumes are? Yes.
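Putting the rate-of-change guidance into rough numbers, a starting point for snapshot reserve could be sketched like this – the formula and the headroom factor are my own illustration, not a Dell sizing rule:

```python
def snapshot_reserve_pct(daily_change_pct, keep_count, headroom=1.2):
    """Rough starting point for snapshot reserve: enough space to hold
    'keep_count' snapshots at the measured daily rate of change,
    plus ~20% headroom. Purely illustrative."""
    return daily_change_pct * keep_count * headroom

# e.g. SAN HQ reports ~3% daily change and we want 7 daily snapshots online
print(f"Suggested snapshot reserve: {snapshot_reserve_pct(3, 7):.0f}% of volume size")
```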
Manual Tiering
Steven then moved on to a discussion of manual tiering / RAID preferencing. When you create a volume the RAID preference is set to auto – you can’t change that until after it’s created. The RAID preference will either be honoured or not honoured. RAID preferencing can be used to override the capacity load balancer. What happens if you have more volumes asking for a certain RAID type than you have space of that RAID type? At that point the array will randomly choose which volume gets the preference – this can’t be configured. You can’t do a tier within a tier (i.e. these volumes definitely on RAID 1/0, those ones on RAID 1/0 if possible). If you have 80% RAID 1/0 capacity, the capacity load balancer will take over and stripe it 50/50 instead, not 80/20. There is one other method, via the CLI, using the bind command, which binds a particular volume to a particular member. This is generally not recommended. RAID preferencing vs binding? If you had two R1/0 members in your pool, RAID preferencing would give you wide-striping across both R1/0 members; binding would not. Audience question – what happens when the member you’ve bound a volume to fails? Volumes affected by a member failure are unavailable until they’re brought back on-line. You don’t lose the data, but they’re unavailable. Audience question – what if you’ve bound a volume to a member and then evacuate that member? It should give you a warning about unbinding first. The last method of tiering is to use different pools (the old way of doing things) – a pool of fast disk and a pool of slow disk, for example.
Brief discussion about HVS – a production-ready, EqualLogic-supported storage VM running on ESX – which is due for release late next year. Some of the suggested use cases for this include remote office deployments, or for cloud providers combining an on-premise hardware and off-premise virtual solution.
Snapshots for VMware
A CentOS appliance delivered in OVF format. It talks to the EQL Group and to vCenter. It tells vCenter to place a VM in snapshot mode (*snapshot.vmdk), tells EQL to take a volume (hardware) snap, tells vCenter to put the VM back in production mode, then merges the snapshot vmdk with the original. Steven noted that the process leverages vCenter capabilities and is not a proprietary process. It uses snapshot reserve space on the array, as opposed to native VMware snapshots, which only use space on the datastore.
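The coordination sequence described above can be summarised in a few lines of pseudo-code. The object and method names below are placeholders I’ve invented for illustration – they are not the actual VSM, vCenter or EQL APIs:

```python
def hardware_assisted_vm_snapshot(vcenter, eql_group, vm, datastore_volume):
    """Sequence as described in the session - the calls here are placeholders,
    not real API methods."""
    vc_snap = vcenter.create_vm_snapshot(vm, quiesce=True)   # VM now writes to *snapshot.vmdk
    eql_group.create_volume_snapshot(datastore_volume)       # hardware snap taken in snapshot reserve
    vcenter.remove_vm_snapshot(vm, vc_snap)                  # merge the delta vmdk back into the base disk
```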
Demo time
VMware Virtual Storage Manager 3.5 (this was an early production release – General Availability is in the next few weeks). This can now address multiple EQL groups in the back-end. An audience member noted that you still need 2 VSMs if you’re running vCenter in linked mode; apparently linked mode is not recommended by EQL support for the VSM. Previously you needed (for replication) a vCenter and VSM appliance at each site to talk to each site’s EQL environment. Now you only need one vCenter and one appliance to talk to multiple groups (i.e. Prod and DR). It now supports NFS datastore provisioning, and thin-provision stun (with FW6 and ESX 5). When a thin-provisioned datastore runs out of space, it puts the VMs on that datastore into a paused state (cached in memory) and notifies the admin, who can remediate and then bring the VMs back on-line. This is a free download from Dell – so get on it. The Remote Setup Wizard is really now only for array initialisation tasks. Previously you could initialise the array, set up MPIO and configure PS Group access – the last two tasks have been moved to ASM/ME. If your VSS snapshots have been failing, maybe the host isn’t able to properly access the VSS volume presented by EQL.
Asynchronous Replication
Steven talked briefly about the difference between sync and async (async’s smallest window is every 5 minutes). With synchronous replication the array needs an acknowledgement from the secondary array (as well as the primary) before telling the host the I/O has been received. Async replication works between two groups, and bi-directional replication is supported. A single volume can only have one partner. IP replication is over iSCSI. What about the MTU? EQL autosenses replication traffic and brings it down to 1500. The local reserve space is defined at the volume level (%) – it’s normally temporary – with a 5% minimum, and up to 200% can be reserved. If the minimum is exceeded, space can be “borrowed”. The remote reserve space is also defined at the volume level (%) – e.g. for a 100GB volume, a minimum of 105GB (one full copy plus 5% of changes) at the remote site. Delegated space on the target is a Group parameter; it holds replicas from other partners and is defined in GB/TB. For “fast failback”, increase the local reserve space and incorporate fast failback snapshot space. This keeps a last-good copy of a volume (pages, not just the page table) on the source site. For example, if you’ve got a volume in sync, you fail it over to DR, do something at Prod, change some stuff on the volume at DR, then fail back – the pages are still resident at the original source site (Prod), so only the deltas are sent back. To do this you’ll need 100% or more of local reserve space configured. If you don’t use a fast failback snapshot, you need to send the whole lot back when doing a failback. Consider how fast you want to fail back, as this feature will have an impact on the space you consume to achieve it. A manual transfer utility exists (this enables you to use a portable device to seed the data on the DR array before it’s shipped to DR, for example).
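To make the reserve percentages above concrete, here’s a small Python sketch that sizes local and remote reserve for a volume using the figures quoted in the session (the helper itself is my own):

```python
def replication_space_gb(volume_gb, local_reserve_pct=5, remote_reserve_pct=105, fast_failback=False):
    """Reserve sizing as quoted in the session: local reserve is 5%..200% of the
    volume, remote reserve is at least 105% (a full copy plus 5% of changes),
    and fast failback needs 100%+ local reserve to keep last-good pages at the source."""
    if fast_failback:
        local_reserve_pct = max(local_reserve_pct, 100)
    return {
        "local reserve (source)": volume_gb * local_reserve_pct / 100,
        "remote reserve (in delegated space)": volume_gb * remote_reserve_pct / 100,
    }

print(replication_space_gb(100))                      # 100 GB volume -> 5 GB local, 105 GB remote
print(replication_space_gb(100, fast_failback=True))  # -> 100 GB local, 105 GB remote
```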
More info on EqualLogic Array Software can be found here. For more information on EqualLogic Synchronous Replication (SyncRep), have a look at this Tech Report.
Dell EqualLogic MasterClass 2012 – Part 1
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch; however, there was no requirement for me to blog about any of the content presented, and I was not compensated in any way for my time at the event.
This is the first of three posts covering the Dell EqualLogic MasterClass that I recently attended. This was a free training event and, while I haven’t deployed or used EQL gear since 2009, I thought it would be worthwhile going along to refresh my memory and see what’s new.
EqualLogic MasterClass 101: Core Technology
Dell described this as “an entry-level storage course designed for users relatively new to the Dell EqualLogic series of arrays. You will learn what hardware and software features are available with the PS Series Storage Array. In this session we will cover:
- EqualLogic page-based architecture
- EqualLogic load-balancing & host MPIO
- Discuss manual tiering techniques
- Automatic tiering with firmware 5.1
- Group Manager snapshots
- Auto-replication.”
The presenter for all three sessions in Brisbane was Steven Lee, an Enterprise Technologist who’s been with Dell Australia for some time. Here’re my notes in a fairly, er, rough format.
iSCSI Basics
It’s a client / server model, using traditional TCP/IP to send SCSI commands. The host is the initiator, the volume is the target, and we use access controls to govern what can and can’t see the storage. The iSCSI target portal (in EQL terms this is referred to as the group IP address) masks the physical Ethernet and MAC addresses behind a “virtual” SAN. iSCSI works best when the MTU is set to 9000 (aka jumbo frames), and you need that end-to-end or bad things happen.
EQL Basics
The EQL Group uses a Virtual IP Address (or VIP). The EQL Resources are Pools and Members. A Pool contains 1 or more members (EQL iSCSI arrays). You can have up to 4 pools of storage and 8 members per pool. The maximum configuration is 16 members per Group. When you have 1 member in a pool, adding another member to that pool causes the load to be spread across the pool (auto load balanced across the pool – wide-striped, not copied). You can move LUNs from one pool to another, and load balancing does not happen across pools, only within.
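Those group limits are easy to sanity-check in a few lines. The helper below is just an illustration of the quoted maximums (4 pools per Group, 8 members per pool, 16 members per Group):

```python
# Group limits as quoted in the session
MAX_POOLS_PER_GROUP = 4
MAX_MEMBERS_PER_POOL = 8
MAX_MEMBERS_PER_GROUP = 16

def group_layout_valid(pools):
    """pools is a list of member counts, e.g. [3, 2] for two pools."""
    return (len(pools) <= MAX_POOLS_PER_GROUP
            and all(members <= MAX_MEMBERS_PER_POOL for members in pools)
            and sum(pools) <= MAX_MEMBERS_PER_GROUP)

print(group_layout_valid([8, 8]))     # True - 16 members across 2 pools
print(group_layout_valid([8, 8, 1]))  # False - 17 members exceeds the Group limit
```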
EQL is known as scale-out, not scale-up. Every EQL array has drives, controllers, cache and software, so you aggregate performance as you add to the environment. Midrange arrays are traditionally constrained by disk performance (e.g. a 2+2 RAID 1/0 disk set). When you add another member to an EQL Group you get all of those additional resources for added performance. Arguably, with some vendors you can still do this by using MetaLUNs, etc. The point he was trying to make is that controller performance is also increased. I think maybe a better argument could also be made for improved front-end performance. When pressed he seemed to be comparing EQL scale-out with traditional mid-range scale-up via concatenation – I think this message could be improved. But hey, we’re sitting in a Dell course.
Audience question: how does wide-striping work with different capacities? EQL wide-stripes over different RAID types and different disk types. This makes pool design critical for optimal performance.
Portfolio overview
Steven covered the PS61x0, PS65x0 and PS4000 (now available in 1Gb or 10Gb options). The PS4000 series can only have 2 members per Group (this is a software-defined limitation). You can, however, add more PS6000-class arrays to fully populate the Group, and once you’ve added a PS6000-series array into the Group, the limitations on replicas, connections, etc. are removed. As for the maximum number of Group iSCSI connections – the tested limits are 4096 (1024 per Pool). The EQL Storage Blade or PS4110 (put four in an M1000E chassis) has 10Gb connectivity but still has the aforementioned PS4000-series software limitations. It introduces hybrid for this class (a mix of SAS and SSD drives). Hybrid arrays have a single fixed RAID type.
RAID and Firmware
The EQL arrays, generally, support RAID 1/0, 5, 5/0, 6, and 6acc (accelerated). Note also that there is a single RAID type per member. R5/0 is recommended for a balance of capacity and write performance (as it outperforms R6). R6 is recommended for large-capacity drives. The number of hot spares configured is dependent on the member type, RAID type, etc. FW 5.25 and 6.01 are the recommended firmware levels at the moment. A hard drive firmware upgrade (KD0A) is now available to address some issues with Seagate drive reliability observed in the field over the last 12 months. The FW upgrade should be done during a planned maintenance window as it can have a performance impact; the upgrade is, however, non-disruptive. This firmware is EQL-specific, and not for use with the PowerVault range of disk enclosures (someone did ask). As of FW6, R5 is not an option in the Remote Setup Wizard. While it remains available through the CLI, it is no longer recommended by EQL support, and there is currently no on-line way to move from R5 to R6. R6acc is only available on hybrid arrays: R6 runs across the array, with data moving back and forth between the two tiers based on access frequency. It uses SSDs as cache, with up to 20GB reserved from the SSDs as “spillover” cache. Hybrid is the only configuration that does “smart swapping” of data within a member; load balancing is normally done across multiple members.
Host <-> EQL connectivity
Access control can be performed three ways: using an IP / IP range (wildcards), CHAP, or a specific IQN. Host connectivity via iSCSI doesn’t always use the shortest path, so use stacked switches with decent ISLs (on a single, flat subnet). Windows, Linux and VMware are able to perform multipathing on a single iSCSI target portal; XenServer doesn’t like this – it wants to see 2 subnets. The Host Integration Toolkit (available for Microsoft, Linux, and as a “broader suite” for VMware) provides the host with access to the group ToC – an understanding of the data that sits in the back-end. This influences whether the host will make more connections to get to the member it wants to reach. Audience question – Is there a limit to the number of SANs a volume will stripe across? No fundamental limit, however the algorithm is optimised for 3 members, and this can’t be tuned. Audience question – what happens when you hit the connection limit on a pool? Will it arbitrarily drop host connections? No, but it will start scaling down existing connections to one per host. If that’s still a problem, you may need to rethink what you’re trying to achieve and how you’re trying to achieve it. You can reduce the iSCSI connection count used by going from 1Gb to 10Gb. The default multipathing policy for all hosts is 2 sessions per slice – this is a configurable parameter; change it via the HIT – Remote Setup Wizard – Multi-pathing Component. The default works out to 6 sessions per volume (assuming 3 members in the pool). Dell are looking to improve the connection numbers, but this is not on a roadmap yet. With the HIT installed, you can take advantage of not only the array ToC, but also a multipathing policy that uses least queue depth. Multipathing routes traffic to the member with the slice you need, reducing the overall latency on the connections.
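My reading of the default HIT MPIO numbers quoted above (2 sessions per slice, 6 per volume for a 3-member pool) is sketched below – the capping behaviour is my assumption, and both values are configurable via the Remote Setup Wizard:

```python
def sessions_per_volume(members_hosting_volume, sessions_per_slice=2, max_sessions_per_volume=6):
    """Estimated iSCSI session count per volume under the defaults quoted in
    the session: 2 sessions per slice, up to 6 per volume (i.e. tuned for a
    3-member pool). Illustrative interpretation only."""
    return min(members_hosting_volume * sessions_per_slice, max_sessions_per_volume)

for members in (1, 2, 3, 4):
    print(f"{members} member(s) hosting the volume -> {sessions_per_volume(members)} iSCSI sessions")
```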
Load Balancers
For a good primer on the load balancers used with EQL, check out TR1070 – EqualLogic PS Series Architecture: Load Balancers. There are three load balancers – Capacity, Network, and Performance. The Performance LB was introduced in FW5.1. The Capacity LB allocates slices across volumes and ensures that the free-space headroom across all members in a pool is roughly the same percentage. Example using 2TB SSD and 8TB SAS members: for a 5TB volume (50% of the pool), the LB puts 1TB on the SSD member and 4TB on the SAS member. What if you have 2TB of SSD and 48TB of NL-SAS? For the same 5TB volume (10% of the pool), 200GB goes to SSD and 4.8TB goes to NL-SAS. The Network LB works by ensuring traffic in the back-end network is evenly distributed. It looks at port “busyness” and tells the host to connect to another port if a port is too busy. This can be a problem when accessing single-port 10Gb PS6110 arrays; while it’s a trade-off (losing the extra port), the 10Gb ports are rated at “line speed”. The NLB runs every 6 minutes in the background and just checks on port busyness, executing changes (if required) after the port check is done. The Automatic Performance LB (APLB) used to focus on RAID (pre FW5.1); now it looks at the overall latency of all members in a pool as well, and at the sub-volume level, not just the whole volume. As a result, it takes minutes to move stuff around rather than days and weeks. In the “olden days” (pre 5.1), for example, a SATA array running RAID 1/0 would be treated as faster than a SAS array running RAID 5/0. The goal now, however, is to ensure that the latency across the pool is balanced. This is done with hot-cold page swapping. Pages are 15MB (?) in size. An assessment and moves are made every 2 minutes. This only happens across members that are hosting a portion (pages) of the volume; members that don’t host the volume do not participate in swapping for that volume. Empty, pre-allocated pages can be swapped as well. Back-end page swapping can impact bandwidth on switches as well. For every active iSCSI port, you should have a 1-to-1 relationship with ISL bandwidth. For example, if you have 3 PS6000s with 12Gb of back-end port bandwidth, you should have a 12Gb ISL between your switches. You can get away with 75%. This is critical on 1Gb infrastructure, not as much on 10Gb.
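The capacity load balancer examples above are just proportional allocation, which is easy to reproduce. This Python sketch (my own simplification, assuming the members start empty) gives the same 1TB/4TB and 200GB/4.8TB splits:

```python
def capacity_lb_split(volume_tb, member_capacities_tb):
    """Capacity load balancer sketch: keeping the free-space percentage roughly
    equal works out to splitting the volume in proportion to member capacity
    (assuming the members start empty). Reproduces the session's examples."""
    total = sum(member_capacities_tb.values())
    return {member: volume_tb * cap / total for member, cap in member_capacities_tb.items()}

print(capacity_lb_split(5, {"2TB SSD": 2, "8TB SAS": 8}))       # {'2TB SSD': 1.0, '8TB SAS': 4.0}
print(capacity_lb_split(5, {"2TB SSD": 2, "48TB NL-SAS": 48}))  # {'2TB SSD': 0.2, '48TB NL-SAS': 4.8}
```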
Host Integration Tools
Steven then briefly covered the HIT and talked about the VMware plugin for vCenter, MEM (the Multipathing Extension Module) and the SRM SRA. He also covered the various snapshot tools: ASM/ME, ASM/LE and VSM/VE.
A member of the audience asked a question about migrating EQL storage from physical hosts to virtual guests, and then we took a 15 minute break.