DanMoz posted on his blog recently that Dell is running another series of EqualLogic Masterclass sessions in the near future. I attended these last year and found the day to be very useful (you can read my posts here, here and here). Register here.
Dell EqualLogic MasterClass 2012 – Part 3
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch, however there was no requirement for me to blog about any of the content presented and I was not compensated in any way for my time at the event.
This is the third of three posts covering the Dell EqualLogic MasterClass that I recently attended. The first post is here, and the second post is here.
EqualLogic MasterClass 301: Advanced Features II
Dell described this session as follows: “Dell EqualLogic SAN Headquarters (SAN HQ) gives you the power to do more by providing in-depth reporting and analysis tools. With the information you get from SAN HQ, you can better customise, protect, and optimise your storage. In this session we will cover:
- Introduction of Storage Blade – networking considerations/DCB
- Synchronous Replication Implementation
- Live Demonstration of replication
- Async Failover/Failback
- SyncRep Failover/Switch
- NEW Virtual Storage Manager 3.5
- Monitor EqualLogic with SAN HQ”
So here’re my notes from the third session.
Steven started off with a clarification around VMware snapshots. They are hypervisor-consistent, but don’t talk to application VSS. So if you want to protect Microsoft SQL running on a Windows guest on ESX, you need to leverage two methods to protect both the VM and the application. To do this, install ASM/ME on the guest and go direct via iSCSI. This combined approach provides more granularity. Steven then explained how to set up iSCSI initiators on both the guest and the ESX host (VM portgroups vs VMkernel portgroups). This design is more advanced than what’s commonly deployed, but it does give you more flexibility for recovery options. Audience feedback (from a bloke I’ve known for years – clearly a troublemaker) was that this configuration caused nothing but trouble when it came to vMotion, and a few other people shared their poor experiences at this point as well. Steven was on the back foot a little here, but moved on swiftly. Note that this solution also doesn’t work with VMware SRM. Steven said that VMware were happy to present this solution as recommended at the Dell Storage Forum recently. I’m not entirely convinced that’s a glowing endorsement; best to say that YMMV, depending on the environment. An audience member then made the point that you might be unnecessarily over-complicating things to achieve a fairly simple goal – backing up SQL and Exchange – and that there are plenty of other tools you can leverage. One case where you might still want this approach is when your filesystem is larger than 2TB.
Demo Time – SAN HQ
Steven ran through a demo of SAN HQ 2.5 (early production release, about to be Generally Available). SAN HQ is a monitoring and reporting tool – not a management tool. EQL management is done primarily through Group Manager, or via the vCenter plug-in, or using Microsoft Simple SAN. He went through some of the things it shows you: IOPS, firmware level, Groups, capacity and a bunch of other summary information that you can drill down into if required. SAN HQ is a free install on Windows Server 32- or 64-bit. You can get info on latency, iSCSI connections, etc. It also comes with the RAID evaluator, which examines current members, read / write ratios, and current RAID policy. You can then perform some scenario-based stuff like “What happens if I move from RAID 5 to RAID 6 on this member? Will I still be able to meet the current I/O requirements?” It also provides some experimental analysis that can help with capacity planning and with predicting when you’ll hit performance ceilings based on current usage. If you’re looking at the network part and are seeing in excess of 1% TCP re-transmits, there’s a problem at the network layer (flow control not being enabled, for example). It also does Dell Integrated Support (DIS), which sends diagnostics to Dell ProSupport weekly. Future releases will support more proactive diagnostics. There’s also a bunch of pre-canned reports that can be run, or information can be exported to csv files for further manipulation. Audience question – can you configure e-mail alerting so that an e-mail is sent when a threshold has been exceeded (ie when latency is above 20ms)? E-mail alerting is configured via Group Manager, and there’re only pre-defined alerts available at the moment. Another audience member pointed out that you can configure some specific alerting in SAN HQ, but it’s not quite as specific as hoped.
Demo Time – What’s new in FW6?
Snapshot borrowing. This is a checkbox that can be ticked on a per-volume basis. Snapshot borrowing adheres to the rule of keep count (the number of snapshot reference points that we want to keep online). The point of borrowing is that, instead of deleting old snapshots when you run out of snapshot reserve space, it respects the keep count and uses reserve space from elsewhere to ensure you have a sufficient number of snapshots. You can theoretically dial everything down to 5% snapshot reserve and just borrow everything, but that’s not what this is designed to do. It’s designed to help keep the desired number of snapshots where you have unexpected spikes in your rate of change. Audience question – Where are you borrowing the space from? The first place is the pool’s snapshot reserve. Once this space is consumed, free pool space is used.
Re-thinning. SCSI unmap commands are now supported with FW6. It is supported natively in Windows Server 2012. With Windows 2008 R2, use the CLI or ASM/ME to re-thin the volume. VMware had SCSI unmap support with vSphere 5.0, which was then pulled. It has been made available as of Update 1 as an option with vmkfstools. Linux supports re-claiming thin-provisioned space as well, although Steven couldn’t recall precisely from what version it was supported. Steven then covered off what re-thinning is and why it’s an important capability to have available. Note that re-thinning is an I/O intensive activity. Audience question – can you make the volume smaller? Yes, you always could, but you need to use the CLI to do this.
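As an example of what the VMware side looks like (my own sketch, not something shown in the session): on an ESXi 5.0 Update 1 host you can manually reclaim dead space on a thin-provisioned EQL volume with vmkfstools. The datastore name here is made up.
# Run from within the datastore you want to re-thin (ESXi 5.0 U1 or later)
cd /vmfs/volumes/Datastore01
# Reclaim up to 60% of the free space; as noted above this is I/O intensive, so run it out of hours
vmkfstools -y 60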
SyncRep. Synchronous Replication works within a Group and across Pools, whereas Asynchronous Replication works between Groups. To configure SyncRep, you need a minimum of two separate arrays in two separate Pools. It leverages a single iSCSI target portal, and therefore the iSCSI redirection capability of the iSCSI specification. It is designed for low-latency, short-distance (LAN / MAN), sub-5ms, sub-1km. You could go beyond 1km, assuming your latency is sub-5ms. There’s no concept of local and remote reserve or delegated space with SyncRep. Only one iSCSI target device is accessible at any one time (this isn’t an active-active volume mirroring solution). Write performance will be dependent on the bandwidth between Pools. Reads are only coming from the primary storage device. The first iteration of SyncRep does not have automatic failover. Primary and secondary Pools are known as SyncActive and SyncAlternate. The future view is to deliver automatic failover on a per-volume defined basis. A volume cannot be replicated by Synchronous and Asynchronous methods at the same time. More info can be found here.
M4110 Storage Blade
Steven provides a little background on the Dell M1000E blade chassis. M4110 is a dual-width, half-height blade. As noted previously, you can have two in a Group, and the M1000E chassis can support up to four of these. You could have 4 M4110 blades and 16 “quarter-height” blades in one 10RU enclosure. Runs 10Gb. Fabric design on the M1000E is important. With the storage blade, you can use Fabric A or B (one or the other, never both at the same time). Check the version of the chassis that you have. With version 1.0 of the chassis, iSCSI only works in Fabric B. Version 1.1 supports either Fabric A or B. You can use PowerConnect, Force10, or pass-through modules for your chassis networking.
Data Centre Bridging
Also known as Data Centre Ethernet (DCE) or Convergence. EQL arrays support DCB. DCB allows you to carve up a 10Gb link, for example, into a few different traffic types. The Converged Network Adapter (CNA) needs to be supported to give you end-to-end DCB. Benefits? You can do iSCSI over DCB and leverage its lossless capabilities. Steven reiterates that you need a CNA for the initiator, a Force10 switch (if you’re using blades, PowerConnect switches work with rackmount servers), and EQL running 5.1+ firmware.
Fluid FS
Introduced scale-out NAS appliance running Dell’s Fluid FS. Offers SMB or NFS. You can have 2 appliances (4 processing nodes) that can present a single namespace up to 509TB. Replication between NAS appliances is asynchronous, at the Fluid FS layer (replicates from one NAS volume to another NAS volume). If you’re running a mix of file and block, it would be recommended that you stick with asynchronous replication all the way through. Still uses iSCSI network to replicate. Audience question – does it support SMB v3? Not yet. Supports NFS v3 and SMB v1. Audience question – will the maximum size of the namespace be increased any time in the future? Not a top priority right now. Compellent and PowerVault implementations of Fluid FS both scale to one PB at the moment. @DanMoz points out that the filesystem can go bigger, the problem is that the current implementation can only use storage out of one Pool. The most connections you can have is 1024, and the biggest volume is 15TB. So when you have 4 nodes clustered together, by the time the connection count is added up with those 15TB volumes, the biggest you can get is 506TB. This will increase in the future when they increase the maximum connection count on the EqualLogic side of things.
Conclusion
Steven finished off with a demonstration of the new multiple Groups support in VSM in terms of replication and failover. It’s not a replacement for VMware SRM, but it’s still pretty handy. Steven covered a lot of ground in the three 1.5 hour sessions, and I’d like to thank Dell for putting on this event.
Dell EqualLogic MasterClass 2012 – Update
I just noticed that they’ve added a few dates to the MasterClass schedule – so go here and register if you’re into that kind of thing.
Dell EqualLogic MasterClass 2012 – Part 2
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch, however there was no requirement for me to blog about any of the content presented and I was not compensated in any way for my time at the event.
This is the second of three posts covering the Dell EqualLogic MasterClass that I recently attended. The first one is here.
EqualLogic MasterClass 201: Advanced Features
Dell described this as follows: “The EqualLogic MasterClass 201 course builds upon our 101 Core technology course and provides the opportunity for students to explore further the advanced features and software included in PS Series Storage Array, namely replication and SAN Headquarters Monitoring Tool. By understanding and utilising these advanced features you will be able to maximise your use of the storage array to fulfill business requirements today and well into the future. In this session we will cover:
Disaster Recovery using replication
- Auto-replication
- Auto Snapshot Manager (Microsoft ed)
- Auto Snapshot Manager (VMware ed)
- Off-host backup techniques with EqualLogic
- Visibility to your EqualLogic Storage by leveraging SAN HQ monitoring”
So here’re the rough notes from Session 2.
[Note: I missed the first 15 minutes of this session and had a coffee with @DanMoz instead.]
Snapshot Architecture
There are three different types of snapshots that you can do:
- Crash-consistent (using Group Manager);
- Application-aware; and
- Hypervisor-aware.
Steven then went on to provide an in-depth look at how snapshots work using redirect on write. Audience question – What if you’re referencing data from both A and A1? EQL will move data in the background to optimise for sequential reads. Reads may be proxied initially (data movement is done when the arrays are not under load). Replicas use the same engine as snapshots, but with smaller segments (talking in KB, not MB). The default snapshot space % when you create a volume is dictated by the Group configuration. You can set this to 0% if you want, and then set snapshot space on each volume as required. You can set it to no snapshots if you want, but this will cause VSS-based apps to fail (because there’s no snap space). The default Group configuration is 100% (like for like) snapshot space. With this you can guarantee 100% of the time that you will have at least one PiT copy of the data. The key in setting up snapshots is to know what the rate of change is. Start with a conservative figure, then look at RTO / RPO and how many on-line copies you want. Think about the keep count – how many snaps do you want to keep before the oldest ones get deleted? Audience question – Are there any tools to help identify the rate of change? Steven suggested that you could use SAN HQ to measure this by doing a daily snap and running a report every day to see how much snapshot reserve is being used. Audience question – is there a high watermark for snaps? Policy is to delete older snaps once 100% of reserved space is reached (or take the snapshot off-line). FW6 introduces the concept of snapshot borrowing. If you’re using thin-provisioned volumes, snapshots are thin-provisioned as well. Audience question – Are snapshots distributed across members in the same way as volumes are? Yes.
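To make the rate-of-change arithmetic concrete, here’s a back-of-envelope example of my own (the 500GB volume, 5% daily change and keep count of 7 are assumed figures, not anything from the session):
# Rough snapshot reserve sizing: daily change x keep count
# (assumed figures: 500GB volume, ~5% daily rate of change, keep count of 7)
echo "$(( 500 * 5 / 100 * 7 )) GB"    # => 175 GB, i.e. roughly a 35% snapshot reserve
You’d then watch actual reserve consumption in SAN HQ and adjust from there.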
Manual Tiering
Steven then moved on to a discussion of manual tiering / RAID preferencing. When you create a volume the RAID preference is set to auto – you can’t change that until after it’s created. The RAID preference will either be honoured or not honoured. RAID preferencing can be used to override the capacity load balancer. What happens if you have more volumes asking for a certain RAID type than you have space of that RAID type? At that point the array will randomly choose which volume gets the preference – this can’t be configured. You can’t do a tier within a tier (ie these volumes definitely RAID 1/0, those ones would be nice as RAID 1/0). If you have 80% RAID 1/0 capacity, the capacity load balancer will take over and stripe it 50/50 instead, not 80/20. There is one other method, via the CLI, using the bind command. This is generally not recommended. It binds a particular volume to a particular member. RAID preferencing vs binding? If you had two R1/0 members in your pool, RAID preferencing would give you wide-striping across both R1/0 members; binding would not. Audience question – what happens when the member you’ve bound a volume to fails? Volumes affected by a member failure are unavailable until they’re brought back on-line. You don’t lose the data, but they’re unavailable. Audience question – what if you’ve bound a volume to a member and then evacuate that member? It should give you a warning about unbinding first. The last method of tiering is to use different pools (the old way of doing things) – a pool of fast disk and a pool of slow disk, for example.
Brief discussion about HVS – a production-ready, EqualLogic-supported storage VM running on ESX – which is due for release late next year. Some of the suggested use cases for this include remote office deployments, or for cloud providers combining an on-premise hardware and off-premise virtual solution.
Snapshots for VMware
CentOS appliance in ovf format. Talks to the EQL Group and talks to vCenter. Tells vCenter to place a VM in snapshot-mode (*snapshot.vmdk), tells EQL to take a volume (hardware) snap, tells vCenter to put the VM back in production mode, then merges the snap vmdk with the original. Steven noted that the process leverages vCenter capabilities and is not a proprietary process. Uses snapshot reserve space on the array, as opposed to native VMware snapshots which only use space on the datastore.
Demo time
VMware Virtual Storage Manager 3.5 (this was an early production release – General Availability is in the next few weeks). This can now address multiple EQL Groups in the back-end. An audience member noted that you still need 2 VSMs if you’re running vCenter in linked mode. Apparently linked mode is not recommended by EQL support for the VSM. Previously you needed (for replication) a vCenter and VSM appliance at each site to talk to each site’s EQL environment. Now you only need one vCenter and one appliance to talk to multiple Groups (ie Prod and DR). It now supports NFS datastore provisioning, and thin-provision stun (with FW6 and ESX 5). When a thin-provisioned datastore runs out of space, it puts the VMs on the datastore in a paused state, caches in memory, and notifies the admin, who can remediate and then bring the VMs back on-line. This is a free download from Dell – so get on it. The Remote Setup Wizard is really now only for array initialisation tasks. Previously you could initialise the array, set up MPIO and configure PS Group access – the last two tasks have been moved to ASM/ME. If your VSS snapshots have been failing, maybe the host isn’t able to properly access the VSS volume presented by EQL?
Asynchronous Replication
Talked briefly about the difference between sync and async (async’s smallest window is every 5 minutes). With synchronous replication the array needs an acknowledgement from the secondary array (as well as the primary) before the host is told the I/O has been received. Async replication works between two Groups – bi-directional replication is supported. A single volume can only have one partner. IP replication is over iSCSI. What about the MTU? EQL autosenses replication traffic and brings it down to 1500. The local reserve space is defined at the volume level (%) – normally temporary – 5% minimum – up to 200% can be reserved. If the minimum is exceeded, space can be “borrowed”. The remote reserve space is defined at the volume level (%) – eg for a 100GB volume, a minimum of 105GB (one full copy plus 5% of changes) at the remote site. Delegated space on the target is a Group parameter; it holds replicas from other partners and is defined in GB/TB. For “fast failback”, increase the local reserve space to incorporate fast failback snapshot space. This keeps a last good copy of a volume (pages, not just the page table) on the source site. For example, if you’ve got a volume in sync, you fail it over to DR, do something at Prod, change some stuff on the volume at DR, then fail back – pages are still resident at the original source site (Prod), so only the deltas are sent back. To do this you’ll need 100% or more of local reserve space configured. If you don’t use a fast failback snapshot, you need to send the whole lot back when doing a failback. Consider how fast you want to fail back, as this feature will have an impact on the space you consume to achieve it. A manual transfer utility exists (this enables you to use a portable device to seed the data on the DR array before it’s shipped to DR, for example).
More info on EqualLogic Array Software can be found here. For more information on EqualLogic Synchronous Replication (SyncRep), have a look at this Tech Report.
Dell EqualLogic MasterClass 2012 – Part 1
Disclaimer: I recently attended the Dell EqualLogic MasterClass 2012. This was a free event delivered by Dell and I was provided with breakfast and lunch, however there was no requirement for me to blog about any of the content presented and I was not compensated in any way for my time at the event.
This is the first of three posts covering the Dell EqualLogic MasterClass that I recently attended. This was a free training event and, while I haven’t deployed or used EQL gear since 2009, I thought it would be worthwhile going along to refresh my memory and see what’s new.
EqualLogic MasterClass 101: Core Technology
Dell described this as “an entry-level storage course designed for users relatively new to the Dell EqualLogic series of arrays. You will learn what hardware and software features are available with the PS Series Storage Array. In this session we will cover:
- EqualLogic page-based architecture
- EqualLogic load-balancing & host MPIO
- Discuss manual tiering techniques
- Automatic tiering with firmware 5.1
- Group Manager snapshots
- Auto-replication.”
The presenter for all three sessions in Brisbane was Steven Lee, an Enterprise Technologist who’s been with Dell Australia for some time. Here’re my notes in a fairly, er, rough format.
iSCSI Basics
It’s a client / server model, using traditional TCP/IP to send SCSI commands. The host is the initiator, the volume is the target, and we use access controls to control what can and can’t see the storage. The iSCSI target portal (in EQL terms this is referred to as the Group IP address) masks the physical Ethernet and MAC addresses behind a “virtual” SAN. iSCSI works best when the MTU is set to 9000 (AKA jumbo frames). You need it end-to-end or bad things happen.
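To put the initiator / target / portal terminology in context (my own illustration, not from the session), discovery and login against the Group IP from a Linux host with open-iscsi look roughly like this – the Group IP is a placeholder and the target IQN comes back from discovery:
# Discover volumes (targets) behind the Group IP, which is the iSCSI target portal
iscsiadm -m discovery -t sendtargets -p 10.0.0.10:3260
# Log in to one of the discovered targets; the array's access controls decide whether this succeeds
iscsiadm -m node -T <target-iqn-from-discovery> -p 10.0.0.10:3260 --login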
EQL Basics
The EQL Group uses a Virtual IP Address (or VIP). The EQL Resources are Pools and Members. A Pool contains 1 or more members (EQL iSCSI arrays). You can have up to 4 pools of storage and 8 members per pool. The maximum configuration is 16 members per Group. When you have 1 member in a pool, adding another member to that pool causes the load to be spread across the pool (auto load balanced across the pool – wide-striped, not copied). You can move LUNs from one pool to another, and load balancing does not happen across pools, only within.
EQL is known as scale-out, not scale-up. Every EQL array has drives, controllers, cache and software. You can aggregate performance as you add to the environment. Midrange arrays are traditionally constrained by disk performance (eg a 2+2 R1/0 disk set). When you add another member to an EQL Group you get all of that additional hardware for added performance. Arguably, with some vendors you can still do this by using MetaLUNs, etc. The point he was trying to make is that controller performance is also increased. I think maybe a better argument could also be made for improved front-end performance. When pressed he seemed to be comparing EQL scale-out with traditional mid-range scale-up via concatenation – I think this message could be improved. But hey, we’re sitting in a Dell course.
Audience question: how does wide-striping work with different capacities? EQL wide-stripes over different RAID types and different disk types. This makes pool design critical for optimal performance.
Portfolio overview
Covered the PS61x0, PS65x0 and PS4000 (now available in 1Gb or 10Gb options). The PS4000 series can only have 2 per Group (this is a software-defined limitation). You can, however, add more PS6000-class arrays to fully populate the Group. Once you’ve added a PS6000 series array into the Group, the limitations on replicas, connections, etc are taken away. As for the maximum number of Group iSCSI connections – the tested limits are 4096 (1024 per Pool). The EQL Storage Blade or PS-M4110 (you can put four in an M1000E chassis) has 10Gb connectivity but still has the aforementioned PS4000 series software limitations. It introduces hybrid for this class (a mix of SAS and SSD drives). Hybrid arrays have a single fixed RAID type.
RAID and Firmware
The EQL arrays, generally, support RAID 1/0, 5, 5/0, 6, and 6acc (accelerated). Note also that there is a single RAID type per member. R5/0 is recommended for a balance of capacity and write performance (as it outperforms R6). R6 is recommended for large capacity drives. The number of hot spares configured is dependent on the member type, RAID type, etc. FW 5.25 and 6.01 are the recommended firmware levels at the moment. Hard drive firmware upgrade KD0A is now available to address some issues with Seagate drive reliability observed in the field over the last 12 months. The FW upgrade should be done during a planned maintenance window as it can have a performance impact; the upgrade is, however, non-disruptive. This firmware is EQL-specific, not for use with the PowerVault range of disk enclosures (someone did ask). As of FW6, R5 is not an option in the Remote Setup Wizard. While it remains available through the CLI, it is no longer recommended by EQL support. There is currently no on-line way to move from R5 to R6. R6acc is only available on hybrid arrays: R6 across the array, with data moving back and forth between the two tiers based on access frequency. It uses the SSDs as cache, with up to 20GB reserved from the SSDs as “spillover” cache. Hybrid is the only configuration that does “smart swapping” of data within a member; load balancing is normally done across multiple members.
Host <-> EQL connectivity
Access control can be performed three ways: using an IP / IP range (wildcards), CHAP, or a specific IQN. Host connectivity via iSCSI doesn’t always use the shortest path, so use stacked switches with decent ISLs (on a single, flat subnet). Windows, Linux and VMware are able to perform multipathing on a single iSCSI target portal. XenServer no likey this – it wants to see 2 subnets. The Host Integration Toolkit (available for Microsoft, Linux, and a “broader suite” for VMware) provides the host with access to the Group ToC – an understanding of the data that sits in the back-end. This influences whether the host will make more connections to get to the member it wants to get to. Audience question – Is there a limit to the number of SANs a volume will stripe across? No fundamental limit, however the algorithm is optimised for 3 members. This can’t be tuned. Audience question – what happens when you hit the connection limit on a pool? Will it arbitrarily drop host connections? No. But it will start scaling down existing connections to one per host. If that’s still a problem – you may need to rethink what you’re trying to achieve and how you’re trying to achieve it. You can reduce the iSCSI connection count by going from 1Gb to 10Gb. The default multipathing policy for all hosts is 2 sessions – this is a configurable parameter. Change this via the HIT – Remote Setup Wizard – Multipathing Component. The default is 6 sessions per volume, 2 sessions per slice (assuming 3 members in the pool). Dell are looking to improve the connection numbers, but this is not on a roadmap yet. With the HIT installed, you can take advantage of not only the array ToC, but also a multipathing policy that uses least queue depth. Multipathing routes traffic to the member with the slice you need, reducing the overall latency on the connections.
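As an aside (my sketch, not something covered in the session): on ESXi 5.x the stock software initiator gets its multiple sessions to the single Group IP via iSCSI port binding, which the MEM automates for you. Done by hand it looks something like the following – the vmhba and vmk names are assumptions, so check yours with esxcli iscsi adapter list first.
# Bind two iSCSI VMkernel ports to the software iSCSI adapter for multiple sessions/paths
esxcli iscsi networkportal add -A vmhba33 -n vmk1
esxcli iscsi networkportal add -A vmhba33 -n vmk2
# Rescan so the extra paths to the Group IP show up
esxcli storage core adapter rescan -A vmhba33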
Load Balancers
For a good primer on the load balancers used with EQL, check out TR1070 – EqualLogic PS Series Architecture: Load Balancers. There are three load balancers – Capacity, Network and Performance. The Performance LB was introduced in FW5.1. The Capacity LB allocates slices across volumes and ensures that the free space headroom across all members in a pool is roughly the same percentage. An example using a 2TB SSD member and an 8TB SAS member: with 5TB in use (50%), the LB puts 1TB on the SSD member and 4TB on the SAS member. What if you have 2TB of SSD and 48TB of NL-SAS? With the same 5TB in use (10%), 200GB goes to SSD and 4.8TB goes to NL-SAS. The Network LB works by ensuring traffic in the back-end network is evenly distributed. It looks at port “busyness” and tells the host to connect to another port if the port is too busy. This can be a problem when accessing single-port 10Gb PS6110 arrays. While it’s a trade-off (losing the extra port), the 10Gb ports are rated at “line-speed”. The Network LB runs every 6 minutes in the background and just checks on port busyness, executing changes (if required) after the port check is done. The Automatic Performance LB (APLB) used to focus on RAID (pre FW5.1); now it looks at the overall latency of all members in a pool as well, and at the sub-volume level, not just the whole volume. As a result, it now takes minutes to move stuff around rather than days and weeks. In the “olden days” (pre 5.1), for example, a SATA array running RAID 1/0 would be treated as faster than a SAS array running RAID 5/0. The goal now, however, is to ensure that the latency across the pool is balanced. This is done with hot-cold page swapping. Pages are 15MB (?) in size. An assessment and moves are made every 2 minutes. This only happens across members that are hosting a portion (pages) of the volume; members that don’t host the volume do not participate in swapping for that volume. Empty, pre-allocated pages can be swapped as well. Back-end page swapping can impact bandwidth on switches as well. For every active iSCSI port, you should have a 1-to-1 relationship with ISL bandwidth. For example, if you have 3 PS6000s with 12Gb of back-end port bandwidth, you should have a 12Gb ISL between your switches. You can get away with 75%. This is critical on 1Gb infrastructure, not so much on 10Gb.
Host Integration Tools
Steven then briefly covered the HIT and talked about the VMware plugin for vCenter, MEM (Multipathing Extension Module) and the SRM SRA. He also covered the various snapshot tools: ASM/ME, ASM/LE and VSM/VE.
A member of the audience asked a question about migrating EQL storage from physical hosts to virtual guests, and then we took a 15 minute break.
What the Dell just happened? – Dell Storage Forum Sydney 2012 – Part 2
Disclaimer: I recently attended the Dell Storage Forum Sydney 2012. My flights and accommodation were covered by Dell, however there is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Part 2
In this post I’d like to touch briefly on some of the sessions I went to and point you in the direction of some further reading. I’m working on some more content for the near future.
Dell AppAssure Physical, Virtual and Cloud Recovery
If you’re unfamiliar with AppAssure, head over to their website for a fairly comprehensive look at what they can do. Version 5 was recently released. Dan Moz has been banging on about this product to me for a while, and it actually looks pretty good. Andrew Diamond presented a large chunk of the content while battling some time constraints (thanks to the keynote running over time), and Dan was demo boy. Here’s a picture with words (a diagram, if you will) that gives an idea of what AppAssure can do.
(Image source – http://www.appassure.com/downloads/Transform_Data_Protection_with_Dell_AppAssure.pdf)
Live Recovery is one of my favourite features. With this it’s “not even necessary to wait for a complete restore to be able to access and use the data”. This is really handy when you’re trying to recover 100s of GB of file data but don’t know exactly what the users will want to access first.
Recovery Assure “detects the presence of Microsoft Exchange and SQL and its respective databases and log files and automatically groups the volumes with dependency for comprehensive protection and rapid recovery”. The cool thing here is that you’re going to be told if there’s going to be a SNAFU when you recover before you recover. It’s not going to save your bacon every time, but it will help you avoid awkward conversations with the GM.
In the next few weeks I’m hoping to put together a more detailed brief on what AppAssure can and can’t do.
A Day in the Life of a Dell Compellent Page: How Dynamic Capacity, Data Instant Replay and Data Progression Work Together
Compellent brought serious tiering tech to Dell upon acquisition, and has really driven the Fluid Data play that’s going on at the moment. This session was all about “closely following a page from first write to demotion to low-cost disk”. Sound dry? I must admit it was a little. It was also, however, a great introduction to how pages move about the Compellent and what that means for storage workloads and efficiency. You can read some more about the Compellent architecture here.
The second half of the session comprised a customer testimonial (an Australian on-line betting company) and brief Q & A with the customer. It was good to see that the customer was happy to tell the truth when pushed about some of the features of the Compellent stack and how it had helped and hurt in his environment. Kudos to my Dell AE for bringing up the question of how FastTrack has helped only to watch the customer reluctantly admit it was one of the few problems he’d had since deploying the solution.
Media Lunch ‘Fluid Data and the Storage Evolution’
When I was first approached about attending this event, the idea was that there’d be a blogger roundtable. For a number of reasons, including the availability of key people, that had to be canned and I was invited to attend the media lunch instead. Topics covered during the lunch were basically the same as the keynote, but in a “lite” format. There were also two customers providing testimonials about Dell and how happy they were with their Compellent environments. It wasn’t quite the event that Dell had intended, at least from a blogger perspective, but I think they’re very keen to get more of this stuff happening in the future, with some more focus on the tech rather than the financials. At least, I hope that’s the case.
On the Floor
In the exhibition hall I got to look at some bright shinies and talk to some bright folks about new products that have been released. FluidFS (registration required) is available across the Equallogic, Compellent and PowerVault range now. “With FluidFS, our unified storage systems can manage up to 1PB of file data in a single namespace”. Some people were quite excited about this. I had to check out the FS8600, which is the new Compellent Unified offering.
I also had a quick look at the Dell EqualLogic PS-M4110 Blade Array which is basically a PS4000 running in a blade chassis. You can have up to 4 of these things in a single M1000e chassis, and they support 14 2.5″ drives in a variety of combinations. Interestingly you can only have 2 of these in a single group, so you would need 2 groups per chassis if you fully populated it.
Finally I took a brief gander at a PS6500 Series machine. These are 4RU EQL boxes that take up to 48 spindles and basically can give you a bunch of tiering in a big box with a fairly small footprint.
Swag
As an attendee at the event I was given a backpack, water bottle, some pens, a SNIA Dictionary and a CommVault yo-yo. I’ll let you know if I won a laptop.
I may or may not have had some problems filling out my registration properly though.
Thanks, etc
For an inaugural event, I thought the Dell Storage Forum was great, and I’m stoked that vendors are starting to see the value in getting like-minded folk in the same place to get into useful tech stuff, rather than marketing fluff. Thanks to @DanMoz for getting me down there as a blogger in the first place and for making sure I had everything I needed while I was there. Thanks also to the Dell PR and Events people and the other Dell folks who took the time to say hi and check that everything was cool. It was also nice to meet Simon Sharwood in real life, after reading his articles on The Register and stalking him on twitter.
What the Dell just happened? – Dell Storage Forum Sydney 2012 – Part 1
Disclaimer: I recently attended the Dell Storage Forum Sydney 2012. My flights and accommodation were covered by Dell, however there is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Rather than give you an edited transcript of the sessions I attended, I thought it would be easier if I pointed out some of the highlights. In the next few weeks I’m going to do some more detailed posts, particularly on AppAssure and some of the new Compellent stuff. This is the first time I’ve paid attention to what was going on on stage in terms of things to blog about, so it might be a bit rough around the edges. If it comes across as a bit of propaganda from Dell, well, it was their show. There was a metric shedload of good information presented on the day and I don’t think I could do it justice in one post. And if I hear one more person mention “fluid architecture” I’ll probably lose it.
Part 1
Keynote
Dell is big on the Dell Fluid Data Architecture and they’re starting to execute on that strategy. Introducing the keynote speakers was Jamie Humphrey, Director of Storage and Data Management for Australia & New Zealand. The first speaker introduced was Joe Kremer, Vice President and Managing Director, Dell Australia & New Zealand. He spent some time on the global Dell transformation which involved intellectual property (acquisition and development), progressing Dell’s strategy, and offering solution completeness to customers. He’s also keen to see increased efficiency in the enterprise through standards adoption rather than the use of proprietary systems. Dell are big on simplicity and automation.
Dell is now all about shifting its orientation towards solutions with outcomes rather than the short-term wins they’d previously been focussed on. There have been 24 acquisitions since 2008 (18 since 2010). Perot Systems has apparently contributed significantly in terms of services and reference architectures. There have been 6 storage acquisitions in the last 3 years. Joe also went on to talk about why they went for Equallogic, Compellent, Ocarina, Insite One (a public medical cloud), RNA Networks, AppAssure, Wyse, Force10, and Quest. The mantra seems to be “What do you need? We’ll make it or buy it”. Services people make up the biggest part of the team in Australia now, which is a refreshing change from a few years ago. Dell have also been doing some “on-shoring” of various support teams in Australia, presumably so we’ll feel warm and fuzzy about being that little bit closer to a throat we can choke when we need to.
When Joe was finished, it was time for the expert panel. First up was Brett Roscoe, General Manager and Executive Director, PowerVault and Data Management. He discussed Dell’s opportunity to sell a better “together” story through servers and storage. Nowadays you can buy a closed stack, build it yourself, or do it Dell’s way. Dell wants to put together open storage, server and network to keep costs down and drive automation, ease of use and integration across the product line. The fluid thing is all about everything finding its own level, fitting into whatever container you put it in. Brett also raised the point that enterprise features from a few years ago are now available in today’s midrange arrays, with midrange prices to match. Dell is keen to keep up the strategy using the following steps: Acquire, Integrate and Innovate. They’re also seeing themselves as the biggest storage start-up in the world, which is a novel concept but makes some sense when you consider the nature of their acquisitions. Dedupe and compression in the filesystem is “coming”. Integration will be the key to Dell successfully executing its strategy. Brett also made some product availability announcements (see On The Floor in Part 2). Brett also had one of the funnier lines of the day – “Before I bring up the smart architect guys, I want to bring up one of our local guys” – when introducing Phil Davis, Vice President, Enterprise Solutions Group, Dell Asia Pacific & Japan to the stage.
They then launched into a series of video-linked whiteboard sessions with a number of “Enterprise Technologists”, with a whiteboard set up in front of them being filmed and projected onto the screens in the auditorium so we could see it clearly from the audience. It was a nice way to do the presentation, and a little more engaging than the standard videos and slide deck we normally see with keynotes.
The first discussion was on flash, with a focus on the RNA Networks acquisition. Tim Plaud, Principal Storage Architect at Dell, talked about the move of SSD from the array into the server to avoid the latency. The problem with this? It’s not shared. So why not use it as cache (Fluid Cache)? Devices can communicate with each other over a low-latency network using Remote DMA to create a cache pool. Take a 15,000 IOPS device in the array, remove the latency (network, controller, SAS), put it out on the PCI bus and you can get 250,000 IOPS per device. Now put 4 per server (for Dell 12G servers). How do you protect the write cache? Use cache partners in a physically different server, de-staging in the background in “near real-time”. You can also pick your interface for the cache network, and I’m assuming that Force10 and 40Gb would help here. Servers without the devices can still participate in the cache pool through the use of the software. Cache is de-staged before Replays (snapshots) happen, so the Replays are application- or crash-consistent. Tim also talked about how replication would work – “Asynchronously, semi-synchronously or truly synchronously”. I’m not sure I want to guess what semi-synchronous is. Tiering upward (to the host) and tiering down / out (to the cloud) is another strategy that they’re working on.
The second discussion was around how data protection is changing – with RPOs and RTOs getting more insane – driving the adoption of snapshots and replication as protection mechanisms. Mike Davis, Director of Marketing, Storage, was called up on stage to talk about AppAssure. He talked about how quickly the application can be back on-line after a failure being the primary driver in a number of businesses. AppAssure promises to protect not only the data, but the application state as well, while providing flexible recovery options. It also promises efficiency through the use of incremental-forever backups, dedupe and compression. AppAssure uses a “Core” server as the primary component – just set one up wherever you might want to recover to, be that a Disaster Recovery site, the cloud, or another environment within the same data centre. You can also use AppAssure to replicate from CMP to EQL to Cloud, etc.
The final topic – a software architecture to run in a cloud environment on EqualLogic – was delivered by Mark Keating, Director of Storage QA at Dell. He talked about how an array traditionally comprises a management layer, a virtualisation (abstraction) layer and a platform layer (controllers, drives, RAID, fans). Dell want to de-couple these layers in the future. With Host Virtualized Storage (HVS) they’ll be able to do this, and it’s expected sometime next year. Take the management and virtualisation layers and put them in the cloud as a virtual workload. Use any hardware you want but keep the application integration and scalability of EqualLogic (because they love the software on the EqualLogic; the rest is just tin). Use cases? Tie it to a virtual application – make a SAN for Exchange, make one for SQL. Temporary expansion of EQL capacity in the cloud is possible. Use it as a replication target. Run multiple “SANs” on the same infrastructure as a means of providing simple multi-tenancy. It’s an interesting concept, and something I’d like to explore further. It also raises a lot of questions about the underlying hardware platform, and just how much you can do with software before being limited by, presumably, the cheap, commodity hardware that it sits on.
File system Alignment redux
So I wrote a post a little while ago about filesystem alignment, and why I think it’s important. You can read it here. Obviously, the issue of what to do with guest OS file systems comes up from time to time too. When I asked a colleague to build some VMs for me in our lab environment with the system disks aligned he dismissed the request out of hand and called it an unnecessary overhead. I’m kind of at that point in my life where the only people who dismiss my ideas so quickly are my kids, so I called him on it. He promptly reached for a tattered copy of EMC’s Techbook entitled “Using EMC CLARiiON Storage with VMware vSphere and VMware Infrastructure” (EMC P/N h2197.5 – get it on Powerlink). He then pointed me to this nugget from the book.
I couldn’t let it go, so I reached for my copy (version 4 versus his version 3.1), and found this:
We both thought this wasn’t terribly convincing one way or another, so we decided to test it out. The testing wasn’t super scientific, nor was it particularly rigorous, but I think we got the results that we needed to move forward. We used Passmark‘s PerformanceTest 7.0 to perform some basic disk benchmarks on 2 VMs – one aligned and one not. These are the settings we used for Passmark:
As you can see it’s a fairly simple setup that we’re running with. Now here are the results of the unaligned VM benchmark.
And here are the results of the aligned VM.
We ran the tests a few more times and got similar results. So, yeah, there’s a marginal difference in performance. And you may not find it worthwhile pursuing. But I would think, in a large environment like ours where we have 800+ VMs in Production, surely any opportunity to reduce the workload on the array should be taken? Of course, this all changes with Windows 2008. So maybe you should just sit tight until then?
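For reference, if you do decide to align guest partitions on pre-2008 Windows, the usual approach is diskpart inside the guest. Here’s a rough sketch – the disk number and the 64KB offset are assumptions, so check the offset your array vendor recommends. System disks are trickier, as the partition has to be created before the OS is installed. Windows 2008 and later align new partitions at 1MB by default, which is why it all changes then.
rem Align a data partition at a 64KB offset (pre-Windows 2008 guests only)
select disk 1
create partition primary align=64
assign letter=E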
Dell PowerConnect and Jumbo Frames
A friend of mine had a problem recently attaching some EqualLogic storage to some vSphere hosts using Dell PowerConnect switches. You’ll notice that it wasn’t me doing the work, so I’ve had to resort to reporting on other people doing interesting or not so interesting things. In any case, he was seeing a lot of flakiness whenever he tried to do anything with the new volumes on the ESX hosts. We went through the usual troubleshooting routine and discussed whether it was a problem with the ESX hosts (running the latest update of ESX 4) or something to do with the network.
He had enabled jumbo frames all the way through (host -> switch -> array). In vSphere, you set the packet size to 9000. On the EqualLogic PS Series you set the MTU to 9000. Apparently, on the Dell PowerConnect switches, you don’t. You set it to 9216. For those of you familiar with maths, 9216 is 9 * 1024. Amazing huh? Yes, that’s right, it follows that 9000 is 9 * 1000. Okay stop now. It’s amazing that 216 bytes could make such a difference, but, er, I guess computers need a level of accuracy to do their thing.
console# configure
console(config)# interface range ethernet all
console(config-if)# mtu 9216
console(config-if)# exit
console(config)# exit
console# copy running-config startup-config
console# exit
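For completeness, the host end needs the same treatment. On classic ESX 4.x that means bumping the vSwitch MTU and (re)creating the iSCSI VMkernel port with MTU 9000, and a vmkping with the don’t-fragment bit set is a quick end-to-end check. The vSwitch, portgroup and IP addresses below are made up for the example.
# Set the vSwitch MTU, then add the iSCSI VMkernel port with a 9000-byte MTU
esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vmknic -a -i 10.0.0.21 -n 255.255.255.0 -m 9000 iSCSI1
# Test jumbo frames end-to-end: 8972 bytes of payload plus 28 bytes of IP/ICMP headers = 9000
vmkping -d -s 8972 10.0.0.10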
New article added to articles page
I’ve added a new article to the articles page. While I agree that a bunch of screenshots do not a great technical document make, I think this is a useful visual guide for the first timer. Oh yeah, it covers the basic initialisation process used to deploy Dell | EqualLogic PS5xx0 Series arrays using 3.x firmware. Sure, it might be a little dated. Sure, I started writing it last year some time and then left it languishing for a while. Sure, I probably could have left it in my drafts queue forever. But I thought it would be nice to have something to refer back to that didn’t require logging in to the Dell website. You might find some of it useful too.