EMC have also breathlessly announced the introduction of “Multi-Core Everything” or MCx as an Operating Environment replacement for FLARE. I thought I’d spend a little time going through what that means, based on information I’ve been provided by EMC.
MCx is a redesign of a number of components, providing functionality and performance improvements for:
- Multi-Core Cache (MCC);
- Multi-Core RAID (MCR);
- Multi-Core FAST Cache (MCF); and
- Active / Active data access.
As I mentioned in my introductory post, the OE has been redesigned to spread the operations across each core – providing linear scaling across cores. Given that FLARE pre-dated multi-core x86 CPUs, it seems like this has been a reasonable thing to implement.
EMC are also suggesting that this has enabled them to scale out within the dual-node architecture, providing increased performance, at scale. This is a response to a number of competitors who have suggested that the way to scale out in the mid-range is to increase the number of nodes. How well this scales in real-world scenarios remains to be seen.
With MCC, the cache engine has been modularized to take advantage of all the cores available in the system. There is now also no requirement for manually separate space for Read and Write Cache, meaning no management overhead in
ensuring the cache is working in the most effective way regardless of the IO mix. Interestingly, cache is dynamically assigned, allocating space on-the-fly for reads, writes, thin metadata, etc depending on the needs of the system at the time.
The larger overall working cache provides mutual benefit between reads and writes. Note also that data is not discarded from cache after a write destage (this greatly improves cache hits). The new caching model employs intelligent monitoring of pace of disk writes to avoid forced flushing and works on a standardized 8KB physical page size.
MCR introduces some much-needed changes to the historically clumsy way in which disks were managed in the CLARiiON / VNX. Of particular note is the handling of hot spares. Instead of having to define disks as permanent hot spares, the sparing function is now more flexible and any unassigned drive in the system can operate as a hot spare.
There are three policies for hot spares:
- No hot spares; and
The “Recommended” policy implements the same sparing model as today (1 spare per 30 drives). Note, however, that if there is an unused drive that can be used as a spare (even if you have the “No Hot Spare” policy set) and there is a fault in a RAID group, the system will use that drive. Permanent sparing also means that when a drive has been used in a sparing function, that drive is then a permanent part of the RAID Group or Pool so there is no need to re-balance and copy back the spare drive to a new drive. The cool thing about this is that it reduces the exposure and performance overhead of sparing operations. If the concept of storage pools that spread all over the place didn’t freak out the RAID Group huggers in your storage team, then the idea of spares becoming permanent might just do it. There will be a CLI command available to copy the spare back to the replaced drive, so don’t freak out too much.
What I like the look of, having been stuck “optimising” CLARiiONs previously, is the idea of “Portable drives”, where drives can be physically removed from a RAID group and relocated to another disk shelf. Please note that you can’t use this feature to migrate RAID Groups to other VNX systems, you’ll need to use more traditional migration methods to achieve this.
Multi-Core FAST Cache
According to EMC, MCF is all about increased efficiency and performance. They’ve done this by moving MCC above the FAST Cache driver – all DRAM cache hits are processed without the need to check whether a block resides in FAST Cache, saving this CPU cycle overhead on all IOs. There’s also a faster initial warm-up for FAST Cache, resulting in better initial performance. Finally, instead of requiring 3 hits, if the FAST Cache is not full, it will perform more like a standard extension of cache and load data based on a single hit. Once the Cache is 80% full it reverts to the default 3-hit promotion policy.