Random Short Take #23

Want some news? In a shorter format? And a little bit random? This listicle might be for you.

  • Remember Retrospect? They were acquired by StorCentric recently. I hadn’t thought about them in some time, but they’re still around, and celebrating their 30th anniversary. Read a little more about the history of the brand here.
  • Sometimes size does matter. This article around deduplication and block / segment size from Preston was particularly enlightening.
  • This article from Russ had some great insights into why it’s not wise to entirely rule out doing things the way service providers do just because you’re working in enterprise. I’ve had experience in both SPs and enterprise and I agree that there are things that can be learnt on both sides.
  • This is a great article from Chris Evans about the difficulties associated with managing legacy backup infrastructure.
  • The Pure Storage VM Analytics Collector is now available as an OVA.
  • If you’re thinking of updating your Mac’s operating environment, this is a fairly comprehensive review of what macOS Catalina has to offer, along with some caveats.
  • Anthony has been doing a bunch of cool stuff with Terraform recently, including using variable maps to deploy vSphere VMs. You can read more about that here.
  • Speaking of people who work at Veeam, Hal has put together a great article on orchestrating Veeam recovery activities to Azure.
  • Finally, the Brisbane VMUG meeting originally planned for Tuesday 8th has been moved to the 15th. Details here.

EMC – Next-Generation VNX – Block Deduplication Caveats

Ever since the VNX2 was announced, customers have asked me about using deduplication with their configs. I did an article on it around the time of the product announcement but have been meaning to talk a bit more about it for some time. But before I do, check out Joel Cason’s great post on this. Anyway, here’s a brief article listing some of the caveats and things to look out for with block deduplication. A few of my clients have used this feature in the field, and have learnt the hard way that if you don’t follow EMC’s guidelines, you may have a sub-optimal experience. Most of the information here has been taken from the “EMC VNX2 Deduplication and Compression” which can be downloaded here.


  • If you’re running a workload with more than 30% writes, compression and deduplication may be a problem. EMC state that, “[f]or applications requiring consistent and predictable performance, EMC recommends using Thick pool LUNs or Classic LUNs. If Thin LUN performance is not acceptable, then do not use Block Deduplication”. I can’t stress this enough – know your workload!
  • Block deduplication is done on a per pool LUN basis. EMC recommended that deduplication be enabled at the time of LUN creation. If you enable it on an existing LUN, the LUN is migrated into the deduplication container using a background process. The data must reside in the container before the deduplication process can run on the dataset.
  • There is only one deduplication container per storage pool. This is where your deduplicated data is stored. When a deduplication container is created, the SP owning the container needs to be determined. The container owner is matched to the Allocation Owner of the first deduplicated LUN within the pool. As a result of this process, EMC recommends that all LUNs with Block Deduplication enabled within the same pool should be owned by the same SP. This can be a big problem in smaller environments where you’ve only deployed one pool.


There’s a bit more to consider, particularly if you’re looking at leveraging compression as well. But if you can’t get past these first few considerations, it’s likely that the VNX2’s version of deduplication on primary storage is probably not for you. Read the whitepaper – it’s readily accessible and fairly clear about what can and can’t be achieved within the constraints of the product.

EMC – Next-Generation VNX – Deduplication

In my previous post on the Next-Generation VNX, I spoke about some of the highlights of the new platform. In this post I’d like to dive a little deeper into the deduplication feature, because I think this stuff is pretty neat. In the interests of transparency, I’m taking a lot of this information from briefings I’ve received from EMC. I haven’t yet had the chance to test this for myself, so, as always, your mileage might vary.

One of the key benefits of deduplication is reduced footprint. Here’s a marketing picture that expresses, via a simple graph, how deduplication can help you do more, with less.


There are 3 basic steps to deduplication:

  1. Discover / Digest;
  2. Sort / Identify; and
  3. Map / Eliminate.

The Discovery phase is basically generating hashes of 8KB blocks using unique digests. It’s then sorted to identify chunk candidates for deduplication. The duplicates are then mapped and the space is freed up. Digests are used as pointers for the unique data chunks.

Deduplication can be turned on or off at the LUN level. Pools can contain both deduplicated and “normal” LUNs. Also, note that the total number of LUNs on a system can be deduplicated – there is no separate limit applied to the deduplication technology.

The deduplication properties of a LUN are as follows:

  • Feature State – On or Paused
  • State – Off, Enabling, On, Disabling
  • Status – indicates any problems during enabling or disabling

The deduplication properties of a Pool are as follows:

  • State – Idle (no deduplicated LUNs), Pending (between passes), Running (currently running a dedupe pass) and Paused (pool is paused)
  • Deduplication Rate – High, Medium, Low (Medium is current default)

Note that dedplication can be paused at a system level for all pools.

When deduplication is turned off on a LUN it is migrated out of the deduplication container within the pool. 8 simultaneous migrations per system can occur, and it obviously reduces the consumed space in the deduplication container.

At a high level, deduplication interoperability is there:

  • Works with FAST VP (dedupe LUNs behave as a single entity, when dedupe is turned off it goes back to per-LUN FAST VP settings)
  • Supports Snaps and Clones (VNX Snapshots are lost when enabling or disabling, Reserved LUNs for SnapView Snaps cannot be deduplicated)
  • Support for RP, MV and SAN Copy
  • LUN migration works, although moving between pools means deduplication is lost as it’s pool-based
  • Compression is not supported.

And that’s fixed-block deduplication for the next-generation VNX in a nutshell. When I get my hands on one of these I’ll be running it through some more realistic testing and scenarios.