EMC – Next-Generation VNX – Deduplication

In my previous post on the Next-Generation VNX, I spoke about some of the highlights of the new platform. In this post I’d like to dive a little deeper into the deduplication feature, because I think this stuff is pretty neat. In the interests of transparency, I’m taking a lot of this information from briefings I’ve received from EMC. I haven’t yet had the chance to test this for myself, so, as always, your mileage might vary.

One of the key benefits of deduplication is reduced footprint. Here’s a marketing picture that expresses, via a simple graph, how deduplication can help you do more, with less.


There are 3 basic steps to deduplication:

  1. Discover / Digest;
  2. Sort / Identify; and
  3. Map / Eliminate.

The Discover phase generates a digest (a hash) for each 8KB block. The digests are then sorted to identify chunks that are candidates for deduplication. The duplicates are then mapped to a single shared copy and the space they occupied is freed up. Digests are used as pointers to the unique data chunks.
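To make the three steps a bit more concrete, here's a minimal Python sketch of fixed-block deduplication over an in-memory buffer. This is my own illustration, not EMC's implementation: the choice of SHA-256 as the digest, and the names `deduplicate`, `read_back`, `store` and `block_map`, are all assumptions. Only the 8KB block size comes from the briefing.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8KB fixed-size blocks, as described in the briefing


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Toy fixed-block dedupe: returns (store, block_map)."""
    # 1. Discover / Digest: hash every fixed-size block.
    #    SHA-256 is an assumption; the digest the VNX uses isn't stated.
    digests = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digests.append((hashlib.sha256(block).digest(), offset))

    # 2. Sort / Identify: sorting puts identical digests next to each other.
    digests.sort()

    # 3. Map / Eliminate: keep one copy per digest and point every
    #    logical block at that copy via its digest.
    store = {}      # digest -> the single stored copy of the block
    block_map = {}  # logical block index -> digest (the "pointer")
    for digest, offset in digests:
        if digest not in store:
            store[digest] = data[offset:offset + block_size]
        block_map[offset // block_size] = digest
    return store, block_map


def read_back(store, block_map):
    """Rebuild the original data by following the digest pointers."""
    return b"".join(store[block_map[i]] for i in range(len(block_map)))
```

With three 8KB blocks where the first and third are identical, `store` ends up holding two blocks while `block_map` still has three entries. A real array would presumably also guard against hash collisions before freeing a block; this sketch doesn't.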

Deduplication can be turned on or off at the LUN level. Pools can contain both deduplicated and “normal” LUNs. Also, note that every LUN on a system can be deduplicated, up to the system's total LUN count; there is no separate limit applied to the deduplication technology.

The deduplication properties of a LUN are as follows:

  • Feature State – On or Paused
  • State – Off, Enabling, On, Disabling
  • Status – indicates any problems during enabling or disabling

The deduplication properties of a Pool are as follows:

  • State – Idle (no deduplicated LUNs), Pending (between passes), Running (currently running a dedupe pass) and Paused (pool is paused)
  • Deduplication Rate – High, Medium, Low (Medium is current default)
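As a rough way to think about the pool states listed above, here's a small Python sketch that derives a pool's state from a few inputs. The names (`PoolDedupState`, `pool_state`) and the inputs are hypothetical, and the precedence I've chosen (Paused wins over everything else) is purely my assumption; the briefing only gave the meaning of each state.

```python
from enum import Enum


class PoolDedupState(Enum):
    IDLE = "Idle"        # no deduplicated LUNs in the pool
    PENDING = "Pending"  # between dedupe passes
    RUNNING = "Running"  # a dedupe pass is currently running
    PAUSED = "Paused"    # the pool is paused


def pool_state(dedup_lun_count: int, paused: bool, pass_running: bool) -> PoolDedupState:
    """Derive the pool state. The ordering of checks is an assumption."""
    if paused:
        return PoolDedupState.PAUSED
    if dedup_lun_count == 0:
        return PoolDedupState.IDLE
    if pass_running:
        return PoolDedupState.RUNNING
    return PoolDedupState.PENDING
```

For example, a pool with four deduplicated LUNs and no pass in flight would come back as Pending under these assumptions.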

Note that deduplication can be paused at a system level for all pools.

When deduplication is turned off on a LUN, the LUN is migrated out of the deduplication container within the pool. Up to 8 of these migrations can run simultaneously per system, and each one reduces the consumed space in the deduplication container.
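The limit of 8 simultaneous migrations can be pictured as a simple scheduler. In this Python sketch the class name `MigrationScheduler` and its methods are hypothetical, and the idea that extra requests wait in a first-in, first-out queue is my assumption; the briefing only stated the limit itself.

```python
from collections import deque

MAX_CONCURRENT_MIGRATIONS = 8  # per-system limit from the briefing


class MigrationScheduler:
    """Toy model: run at most `limit` migrations, queue the rest (assumed FIFO)."""

    def __init__(self, limit: int = MAX_CONCURRENT_MIGRATIONS):
        self.limit = limit
        self.active = set()     # LUNs currently migrating out of the container
        self.waiting = deque()  # LUNs waiting for a free slot

    def request(self, lun: str) -> None:
        """Called when deduplication is turned off on a LUN."""
        if len(self.active) < self.limit:
            self.active.add(lun)
        else:
            self.waiting.append(lun)

    def complete(self, lun: str) -> None:
        """Called when a migration finishes; starts the next waiting LUN, if any."""
        self.active.remove(lun)
        if self.waiting:
            self.active.add(self.waiting.popleft())
```

Turning deduplication off on ten LUNs at once would, in this model, start eight migrations and leave two waiting until a slot frees up.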

At a high level, deduplication interoperates with most of the other VNX features:

  • Works with FAST VP (dedupe LUNs behave as a single entity; when dedupe is turned off, a LUN goes back to its per-LUN FAST VP settings)
  • Supports Snaps and Clones (VNX Snapshots are lost when enabling or disabling deduplication; Reserved LUNs for SnapView Snaps cannot be deduplicated)
  • Supports RecoverPoint (RP), MirrorView (MV) and SAN Copy
  • LUN migration works, although moving between pools means deduplication is lost as it’s pool-based
  • Compression is not supported.

And that’s fixed-block deduplication for the next-generation VNX in a nutshell. When I get my hands on one of these I’ll be running it through some more realistic testing and scenarios.