Ever since the VNX2 was announced, customers have asked me about using deduplication with their configs. I did an article on it around the time of the product announcement but have been meaning to talk a bit more about it for some time. But before I do, check out Joel Cason’s great post on this. Anyway, here’s a brief article listing some of the caveats and things to look out for with block deduplication. A few of my clients have used this feature in the field, and have learnt the hard way that if you don’t follow EMC’s guidelines, you may have a sub-optimal experience. Most of the information here has been taken from the “EMC VNX2 Deduplication and Compression” whitepaper, which can be downloaded here.
- If you’re running a workload with more than 30% writes, compression and deduplication may be a problem. EMC state that, “[f]or applications requiring consistent and predictable performance, EMC recommends using Thick pool LUNs or Classic LUNs. If Thin LUN performance is not acceptable, then do not use Block Deduplication”. I can’t stress this enough – know your workload!
- Block deduplication is done on a per pool LUN basis. EMC recommends that deduplication be enabled at the time of LUN creation. If you enable it on an existing LUN, the LUN is migrated into the deduplication container using a background process. The data must reside in the container before the deduplication process can run on the dataset.
- There is only one deduplication container per storage pool. This is where your deduplicated data is stored. When a deduplication container is created, the SP owning the container needs to be determined. The container owner is matched to the Allocation Owner of the first deduplicated LUN within the pool. As a result of this process, EMC recommends that all LUNs with Block Deduplication enabled within the same pool should be owned by the same SP. This can be a big problem in smaller environments where you’ve only deployed one pool.
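If you want to sanity-check a config against the points above before you enable anything, the logic is simple enough to sketch in a few lines of Python. This is only a rough illustration: the `Lun` class, its field names, and the `dedup_warnings` function are all made up for this example, and nothing here talks to the array or parses real Unisphere or CLI output. It also assumes the list order stands in for "first deduplicated LUN in the pool", which you would need to confirm for your own environment.

```python
from dataclasses import dataclass

# Rule of thumb from the list above: more than 30% writes may be a problem.
WRITE_RATIO_LIMIT = 0.30


@dataclass
class Lun:
    """Hypothetical record for one pool LUN (not an EMC data structure)."""
    name: str
    pool: str
    allocation_owner: str  # "SPA" or "SPB"
    write_ratio: float     # fraction of I/O that is writes, 0.0 to 1.0
    dedup_enabled: bool


def dedup_warnings(luns):
    """Return warnings for deduplicated LUNs that break the guidelines."""
    warnings = []
    # pool name -> allocation owner of the first deduplicated LUN seen.
    # Assumption: the input order reflects the order dedup was enabled.
    container_owner = {}
    for lun in luns:
        if not lun.dedup_enabled:
            continue
        if lun.write_ratio > WRITE_RATIO_LIMIT:
            warnings.append(
                f"{lun.name}: {lun.write_ratio:.0%} writes is above the 30% guideline"
            )
        owner = container_owner.setdefault(lun.pool, lun.allocation_owner)
        if lun.allocation_owner != owner:
            warnings.append(
                f"{lun.name}: owned by {lun.allocation_owner}, but the dedup "
                f"container in {lun.pool} is owned by {owner}"
            )
    return warnings


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    example = [
        Lun("db01", "Pool0", "SPA", 0.10, True),
        Lun("db02", "Pool0", "SPB", 0.45, True),
        Lun("fs01", "Pool0", "SPB", 0.60, False),
    ]
    for warning in dedup_warnings(example):
        print(warning)
```

In a single-pool environment the second check is the one that tends to bite: every deduplicated LUN ends up needing the same SP owner, so you lose the option of balancing those LUNs across both SPs.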
There’s a bit more to consider, particularly if you’re looking at leveraging compression as well. But if you can’t get past these first few considerations, the VNX2’s version of deduplication on primary storage is probably not for you. Read the whitepaper: it’s readily accessible and fairly clear about what can and can’t be achieved within the constraints of the product.