Dell Technologies World 2018 – storage.38 – Dell EMC Unity: Performance Best Practices Notes

Disclaimer: I recently attended Dell Technologies World 2018.  My flights, accommodation and conference pass were paid for by Dell Technologies via the Press, Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from the storage.38 session. This was presented by Stephen Wright and covered Dell EMC Unity: Performance Best Practices. Firstly, though, you should read Dell EMC’s Unity Best Practices Guide.

Dell EMC Midrange Family

SC Series and Dell EMC Unity

Common Tools

  • PowerPath
  • ViPR
  • VPLEX
  • RecoverPoint
  • Avamar and NetWorker
  • Data Domain

Industry’s #1 Midrange Portfolio

SC Series

  • Intelligent Efficiency
    • Post-process data reduction
  • Federated
    • Data mobility across multiple systems
  • Best economics
    • Low entry price
    • Lowest $/GB

Dell EMC Unity

  • Inline efficiency
    • Inline data reduction
  • Unified
    • Unified file and block data
  • Integrated hybrid cloud
    • Unified cloud tiering

Agenda

Background

  • What is Performance? What are “Best Practices”?
  • Evolution of storage best practices

Hardware

  • Unity All-Flash considerations
  • Unity Hybrid considerations

Features

  • Data reduction, snapshots, replication

 

Background

What is Performance?

“The ability to do the requested work in the required period of time”

  • IOPS (small transactions), MB/s (bulk data)
  • Latency and Response Time (Individual transactions)
  • Window, Job (Batch transactions)

What are Best Practices?

Configuration Guidance

  • Recommendations for options
  • Advice based on experience
  • Responsive to your application
  • Best behaviour for your needs

Evolution of Storage Best Practices

Unity Simplicity

  • Removes the need for detailed tweaking
  • Let the system do the right thing for you

Unified File and Block – same recommendations apply

  • One set of common guidance

Flash – changes the game for storage performance

  • Stress on other components

Quantum Leap of Flash

Recommended maximum IOPS per drive – don’t use these for sizing – these numbers are speed limits and are generally based on small-block random workloads.

  • NL-SAS – 150 IOPS
  • SAS 10K RPM – 250 IOPS
  • SAS 15K RPM – 350 IOPS
  • Flash – 20000 IOPS

The Flash Effect, and CPU utilisation

  • Flash is fast, and Dell EMC Unity can support hundreds of drives
  • Driving a lot of Flash can take a lot of CPU power
  • Provide best practices around CPU utilisation

Average CPU Utilisation | Below 50% | 50% to 70% | 70% to 90% | Above 90%
Latency                 | Yes       | Yes        | Yes        | Caution
High Availability       | Yes       | Yes        | Caution    | No
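
The table above can be read as a simple lookup. Here is a minimal Python sketch of that lookup; the function name and the handling of the exact boundary values (50%, 70%, 90%) are assumptions for illustration, since the session slide did not say which band owns the edges.

```python
# Guidance from the session table, keyed by average CPU utilisation band.
# Assumption: a value sitting exactly on a boundary (50, 70 or 90) is
# treated as belonging to the higher band. The slide did not specify this.
BANDS = [
    (50.0, "Below 50%", "Yes", "Yes"),
    (70.0, "50% to 70%", "Yes", "Yes"),
    (90.0, "70% to 90%", "Yes", "Caution"),
]
TOP_BAND = ("Above 90%", "Caution", "No")


def cpu_guidance(avg_cpu_percent):
    """Return (band, latency, high_availability) for an average CPU figure."""
    if not 0.0 <= avg_cpu_percent <= 100.0:
        raise ValueError("average CPU utilisation must be between 0 and 100")
    for upper, band, latency, high_availability in BANDS:
        if avg_cpu_percent < upper:
            return band, latency, high_availability
    return TOP_BAND
```

For example, cpu_guidance(75) lands in the 70% to 90% band, where latency is still fine but High Availability is marked Caution.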

Approaching Best Practices: AFA or Hybrid?

Hardware Considerations

  • All-Flash
    • Drives are most likely not a bottleneck
    • Focus on maximising other hardware resources
  • Hybrid
    • HDD performance can be determining factor
    • A little Flash can add a lot of capability

Features

  • Data reduction
  • Snapshots
  • Replication
  • Both block and file

CPU Power and Flash Considerations

With All-Flash, CPU becomes the driving factor

  • CPU power has largest impact on achievable performance
  • Memory has largest impact on scalability

As of OE 4.3, online data-in-place conversions are now available

Balanced Access – Back-end SAS

At a minimum, use both onboard SAS buses

  • Maybe you also want the SAS expansion? (Up to 6 buses)

Largest impact is on bandwidth. Dell EMC advertise 5GB/s of bandwidth through the SAS bus.

Flash drives per bus recommendation? Take however many you have, and spread them across the buses you have.

Balanced access – FC Ports

  • For HA, zone 1 initiator to 1 port from SPA, 1 port from SPB
  • For HA + load balancing, zone 2 ports per SP
  • Cable and use as many front-end ports as possible
  • We recommend at least 4 ports per SP in U3x0 and U4x0
  • At least 6 ports per SP in U5x0 and U6x0

Balanced Access – Unity File

  • Balance resource utilisation with file
    • Means multiple NAS Servers
    • Using multiple Ethernet ports (can leverage LACP)
    • Failsafe Networking (FSN)

Front-end port considerations

  • Speed is good – use faster ports when available
  • Understand port limits – consult best practices guide
  • Use more ports – better distribution across cores

Hybrid Considerations

  1. All previous considerations
  2. Size for HDD constraints
  3. Leverage Flash Tier, FAST VP
  4. Configure FAST Cache

FAST VP is at the pool level, FAST Cache is a global resource

 

Feature Considerations

Features Overview

In Dell EMC Unity, all system resources are always available

  • Architectural philosophy
  • CPUs are not reserved for any particular process

Features require resources

  • Use additional CPU and may add drive IOPS
  • CPU cycles can shift as defined workload changes

e.g. RAID 6 may take a little more CPU than RAID 1/0. Same goes for Snapshots, data reduction and replication.

Decision Tree for Enabling Features

  • Understand that enabling a feature may increase CPU utilisation
  • This chart represents average CPU before implementing feature

Average CPU Utilisation | Below 50% | 50% to 70% | 70% to 90% | Above 90%
Snapshots / Replication | Yes       | Yes        | Caution    | No
Data Reduction          | Yes       | Caution    | Caution    | No
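
The decision tree above boils down to a lookup on average CPU utilisation measured before the feature is enabled. Here is a rough Python sketch of it; the dictionary keys, the function name and the boundary handling are all assumptions for illustration, not anything Dell EMC publishes.

```python
import bisect

# Upper edges of the first three bands; anything at or past 90 is the top band.
# Assumption: a value exactly on an edge counts as the higher band.
BAND_EDGES = [50.0, 70.0, 90.0]

# Guidance per band, in the order: below 50%, 50-70%, 70-90%, above 90%.
FEATURE_GUIDANCE = {
    "snapshots_replication": ["Yes", "Yes", "Caution", "No"],
    "data_reduction": ["Yes", "Caution", "Caution", "No"],
}


def feature_guidance(feature, avg_cpu_percent):
    """Return Yes / Caution / No for enabling a feature at this CPU level."""
    if feature not in FEATURE_GUIDANCE:
        raise KeyError(f"unknown feature: {feature}")
    if not 0.0 <= avg_cpu_percent <= 100.0:
        raise ValueError("average CPU utilisation must be between 0 and 100")
    band_index = bisect.bisect_right(BAND_EDGES, avg_cpu_percent)
    return FEATURE_GUIDANCE[feature][band_index]
```

At 60% average CPU, for instance, this returns "Yes" for snapshots and replication but "Caution" for data reduction, which matches the chart.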

 

Data Reduction

Prior to OE 4.3, Unity offered compression

As of OE v4.3, deduplication has been added, and together these provide data reduction

Data reduction

  • Block and file objects
  • All-Flash pools
  • Enabled together
  • Automatically licensed

How does it work?

  • Data acknowledged in write cache
  • Check for patterns
  • Compress data if needed

Improved Performance

  • Reduced overhead when pattern is found
  • Code optimisations

Considerations

  • Latency impacts
  • CPU resource consumption
  • Refer to decision tree

Snapshots

  • Use less aggressive snapshot schedules (as the number of objects increases, decrease the snapshot schedule frequency)
  • Stagger snapshot schedules
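
To illustrate what staggering might look like, here is a small Python sketch that spreads snapshot start times evenly across one schedule interval. It is plain arithmetic and does not talk to a Unity system; the function name and the object names in the usage note are hypothetical.

```python
def stagger_offsets(object_names, interval_minutes):
    """Assign each object a start offset (in minutes) within the interval.

    Offsets are spaced evenly so that snapshots do not all fire at once.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    count = len(object_names)
    if count == 0:
        return {}
    # Integer arithmetic keeps the offsets as whole minutes.
    return {
        name: (index * interval_minutes) // count
        for index, name in enumerate(object_names)
    }
```

With four objects (say lun_a, lun_b, fs_c and fs_d) on an hourly schedule, the offsets come out as 0, 15, 30 and 45 minutes past the hour.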

Asynchronous replication

  • Leverages snapshots
    • Similar considerations
    • RPO = snapshot schedule
    • Longer RPOs with lots of replicated objects
  • Consider port capabilities
    • Multiple links per SP
    • Higher speed ports

Synchronous Replication

  • Real-time replication over FC link
  • Latency is key
  • Zone so that clients are not on the replication link

 

Summary

Appropriate Model

  • Choose the right model, based on CPU power
  • Online Data-in-place conversions to move to more powerful model
  • Consider differences between All-Flash and Hybrid

Enough Hardware

  • Have enough drives / enclosures / ports?
  • Utilise all ports

Flash Acceleration

  • Start with Flash
  • All-Flash
  • Flash tier (Hybrid)

Plan for features

  • Consider CPU consumption
  • Snapshot schedules
  • Replication RPOs

 

Useful session. 4 stars.

EMC – Why FAST VP Best Practice is Best Practice

Those of you fortunate enough to have worked with me in a professional capacity will know that I’m highly opinionated. I generally try not to be opinionated on this blog, preferring instead to provide guidance on tangible technical things. On this occasion, however, I’d like to offer my opinion. I overheard someone in the office recently saying that best practices are just best practices, you don’t have to follow them. Generally speaking, they’re right. You don’t have to do what the vendor tells you, particularly if it doesn’t suit your environment, circumstances, whatever. What annoys me, though, is the idea that’s been adopted by a few in my industry that they can just ignore documents that cover best practices because there’s no way the vendor would know what’s appropriate for their environment. At this point I call BS. These types of documents are put out there because the vendor wants you to use their product in the way it was meant to be used. And – get this – they want you to get value from using their product. The idea being that you’ll be happy with the product, and buy from the vendor again.

BP Guides aren’t just for overpaid consultants to wave at know-nothing customers. They’re actually really useful guidelines around which you can base your designs. Crazy notion, right?

So, to my point. EMC recommend, when you’re using FAST VP on the CLARiiON / VNX, to leave 10% free space in your tiers. The reason they recommend this is that they want FAST VP to have sufficient space to move slices between tiers. Otherwise you’ll get errors like this “712d841a Could not complete operation Relocate 0xB00031ED4 allocate slice failed because 0xe12d8709”. And you’ll get lots of them. Which means that FAST is unable to move slices around the pool. In which case, why did you buy FAST in the first place? For more information on these errors, check out emc274840 and emc286486 on Powerlink.

If you want an easy way to query a pool’s capacity, use the following naviseccli command:

naviseccli -h ipaddress storagepool -list -tiers
Pool Name: SP_DATA_1
Pool ID: 3

Tier Name: FC
Raid Type: r_5
User Capacity (GBs): 33812.06
Consumed Capacity (GBs): 15861.97
Available Capacity (GBs): 17950.10
Percent Subscribed: 46.91%
Data Targeted for Higher Tier (GBs): 0.00
Data Targeted for Lower Tier (GBs): 0.00
Disks (Type):

Bus 6 Enclosure 7 Disk 14 (Fibre Channel)
Bus 6 Enclosure 7 Disk 12 (Fibre Channel)
Bus 6 Enclosure 7 Disk 10 (Fibre Channel)
Bus 3 Enclosure 5 Disk 3 (Fibre Channel)
Bus 3 Enclosure 5 Disk 1 (Fibre Channel)
Bus 4 Enclosure 5 Disk 2 (Fibre Channel)
Bus 4 Enclosure 5 Disk 0 (Fibre Channel)
[snip]
Bus 2 Enclosure 6 Disk 14 (Fibre Channel)
Bus 2 Enclosure 6 Disk 12 (Fibre Channel)
Bus 2 Enclosure 6 Disk 10 (Fibre Channel)
Bus 0 Enclosure 2 Disk 0 (Fibre Channel)
Bus 5 Enclosure 6 Disk 8 (Fibre Channel)
Bus 3 Enclosure 2 Disk 4 (Fibre Channel)
Bus 7 Enclosure 5 Disk 6 (Fibre Channel)

Pool Name: SP_TEST_10
Pool ID: 2
Tier Name: FC
Raid Type: r_10
User Capacity (GBs): 1600.10
Consumed Capacity (GBs): 312.02
Available Capacity (GBs): 1288.08
Percent Subscribed: 19.50%
Data Targeted for Higher Tier (GBs): 0.00
Data Targeted for Lower Tier (GBs): 0.00
Disks (Type):
Bus 1 Enclosure 7 Disk 3 (Fibre Channel)
Bus 1 Enclosure 7 Disk 5 (Fibre Channel)
Bus 1 Enclosure 7 Disk 7 (Fibre Channel)
Bus 1 Enclosure 7 Disk 2 (Fibre Channel)
Bus 1 Enclosure 7 Disk 4 (Fibre Channel)
Bus 1 Enclosure 7 Disk 6 (Fibre Channel)
Bus 1 Enclosure 7 Disk 9 (Fibre Channel)
Bus 1 Enclosure 7 Disk 8 (Fibre Channel)
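
If you'd rather not eyeball the numbers, here is a rough Python sketch that reads output in the format shown above and works out the free space per tier as a percentage, so it can be compared against the 10% recommendation. The function name is an assumption, the sample is a trimmed copy of the output above, and the parser assumes the field labels appear exactly as they do there.

```python
# Trimmed copy of the "storagepool -list -tiers" output shown in this post.
TIERS_SAMPLE = """\
Pool Name: SP_DATA_1
Pool ID: 3

Tier Name: FC
Raid Type: r_5
User Capacity (GBs): 33812.06
Consumed Capacity (GBs): 15861.97
Available Capacity (GBs): 17950.10

Pool Name: SP_TEST_10
Pool ID: 2
Tier Name: FC
Raid Type: r_10
User Capacity (GBs): 1600.10
Consumed Capacity (GBs): 312.02
Available Capacity (GBs): 1288.08
"""


def tier_free_space(cli_output):
    """Return a list of (pool, tier, percent_free) tuples."""
    results = []
    pool = tier = user_gb = None
    for line in cli_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # disk lines and blank lines carry no "label: value" pair
        key, value = key.strip(), value.strip()
        if key == "Pool Name":
            pool = value
        elif key == "Tier Name":
            tier = value
        elif key == "User Capacity (GBs)":
            user_gb = float(value)
        elif key == "Available Capacity (GBs)" and user_gb:
            results.append((pool, tier, 100.0 * float(value) / user_gb))
            user_gb = None
    return results


def tiers_below(cli_output, minimum_percent_free=10.0):
    """List the tiers whose free space is under the given percentage."""
    return [row for row in tier_free_space(cli_output)
            if row[2] < minimum_percent_free]
```

On the sample above, both FC tiers have well over 10% free (roughly 53% and 80%), so tiers_below(TIERS_SAMPLE) returns an empty list.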

And if you want to get the status of FAST VP operations on your pools, use the following command:

naviseccli -h ipaddress autotiering -info -opstatus
Storage Pool Name: SP_DATA_1
Storage Pool ID: 3
Relocation Start Time: N/A
Relocation Stop Time: N/A
Relocation Status: Inactive
Relocation Type: N/A
Relocation Rate: N/A
Data to Move Up (GBs): 0.00
Data to Move Down (GBs): 0.00
Data Movement Completed (GBs): N/A
Estimated Time to Complete: N/A
Schedule Duration Remaining: N/A

Storage Pool Name: SP_TEST_10
Storage Pool ID: 2
Relocation Start Time: N/A
Relocation Stop Time: N/A
Relocation Status: Inactive
Relocation Type: N/A
Relocation Rate: N/A
Data to Move Up (GBs): 0.00
Data to Move Down (GBs): 0.00
Data Movement Completed (GBs): N/A
Estimated Time to Complete: N/A
Schedule Duration Remaining: N/A
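
The same approach works for the relocation status. This is a minimal Python sketch, assuming the field labels appear exactly as in the output above; the function name is hypothetical and the sample is a trimmed copy of that output.

```python
# Trimmed copy of the "autotiering -info -opstatus" output shown in this post.
OPSTATUS_SAMPLE = """\
Storage Pool Name: SP_DATA_1
Storage Pool ID: 3
Relocation Start Time: N/A
Relocation Status: Inactive

Storage Pool Name: SP_TEST_10
Storage Pool ID: 2
Relocation Start Time: N/A
Relocation Status: Inactive
"""


def relocation_status(cli_output):
    """Map each storage pool name to its reported relocation status."""
    statuses = {}
    pool = None
    for line in cli_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Storage Pool Name":
            pool = value
        elif key == "Relocation Status" and pool is not None:
            statuses[pool] = value
    return statuses
```

For the sample above, both pools report a status of Inactive.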

And next time you’re looking at a pool with tiers that are full, think about what you can do to alleviate the issue, and think about why you’ve automatically ignored the best practices guide.