Disclaimer: I recently attended Storage Field Day 7. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the VMware presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the VMware website that covers some of what they presented.
I’d like to say a few things about the presentation. Firstly, it was held in the “Rubber Chicken” Room at VMware HQ.
Secondly, Rawlinson was there, but we ran out of time to hear him present. This seems to happen each time I see him in real life. Still, it’s not every day you get to hear Christos Karamanolis (@XtosK) talk about this stuff, so I’ll put my somewhat weird @PunchingClouds fanboy thing to the side for the moment.
Thirdly, and I’ll be upfront about this, I was a bit disappointed that VMware didn’t go outside some fairly fixed parameters on what they could and couldn’t talk about with regard to Virtual SAN. I understand that mega software companies have to be a bit careful about what they can say publicly, but I had hoped for something fresher in this presentation. In any case, I’ve included my notes on Christos’s view of the VSAN architecture – I hope they’re useful.
VMware adopted the following principles when designing VSAN.
- Compute + storage scalability
- Unobtrusive to existing data centre architecture
- Distributed software running on every host
- Pools local storage (flash + HDD) on hosts (virtual shared datastore)
- Symmetric architecture – no single point of failure, no bottleneck
The hypervisor opens up new opportunities, with the virtualisation platform providing:
- Visibility to individual VMs and application storage
- Manages all applications’ resource requirements
- Sits directly in the I/O path
- A global view of underlying infrastructure
- Supports an extensive hardware compatibility list (HCL)
Critical paths in ESX kernel
The cluster service allows for
- Fast failure detection
- High performance (especially for writes)
The data path provides
- Low latency
- Minimal CPU per IO
- Minimal memory consumption
- Physical access to devices
This equals minimal impact on consolidation rates. This is a Good Thing™.
Optimised internal protocol
As ESXi is both the “consumer” and “producer” of data there is no need for a standard data access protocol.
Per-object coordinator = client
- Distributed “metadata server”
- Transactions span only object distribution
Efficient reliable data transport (RDT)
- Protocol agnostic (now TCP/IP)
- RDMA friendly
Standard protocol for external access?
Two tiers of storage: Hybrid
Optimise the cost of physical storage resources
- HDDs: cheap capacity, expensive IOPS
- Flash: expensive capacity, cheap IOPS
Combine best of both worlds
- Performance from flash (read cache + write back)
- Capacity from HDD (capacity tier)
Optimise workload per tier
- Random IO to flash (high IOPS)
- Sequential IO to HDD (high throughput)
Storage is organised in disk groups, each made up of one flash device and up to seven magnetic disks (1 SSD + 7 HDDs), with up to 5 disk groups per host. The disk group is the fault domain. 70% of the flash is read cache and 30% is write buffer. Writes are accumulated, then staged to the magnetic disks in a disk-friendly fashion using proximal IO, that is, writing blocks that sit within a certain number of cylinders of each other. The filesystem on the magnetic disks is slightly different to the one on the SSDs. VSAN uses the back-end of the Virsto filesystem, but not its log-structured filesystem component.
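To make the "accumulate, then stage" idea a bit more concrete, here's a minimal sketch in Python. It is not how VSAN implements destaging: the `WriteBuffer` class, its `capacity_blocks` parameter and the use of a plain sort by block address are all my own assumptions, standing in for whatever proximal IO really does with cylinders.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBuffer:
    """Toy write-back buffer: absorbs random writes, destages them sorted by address."""

    capacity_blocks: int  # hypothetical buffer size, in blocks
    pending: dict = field(default_factory=dict)  # block address -> data

    def write(self, lba: int, data: bytes) -> list:
        """Buffer a write; return a destage batch once the buffer is full."""
        self.pending[lba] = data  # a later write to the same block replaces the earlier one
        if len(self.pending) >= self.capacity_blocks:
            return self.destage()
        return []

    def destage(self) -> list:
        """Drain the buffer in ascending block order so the disk sees near-sequential IO."""
        batch = sorted(self.pending.items())
        self.pending.clear()
        return batch


buf = WriteBuffer(capacity_blocks=4)
flushed = []
for lba in (907, 12, 455, 12, 13):  # random writes, one of them an overwrite
    flushed = buf.write(lba, b"x") or flushed
print([lba for lba, _ in flushed])  # → [12, 13, 455, 907]
```

The overwrite of block 12 is absorbed in the buffer, so the magnetic disk only ever sees four writes, in address order.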
Flash device: cache of disk group (70% read cache, 30% write-back buffer)
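As a quick worked example of the 70 / 30 split, the sketch below sizes one disk group. The function names, the 400 GB flash device and the 1 TB disks are made up for illustration, and treating seven as an upper limit on magnetic disks per group is my assumption, based on the "1 SSD + 7 HDDs" figure.

```python
def split_flash(flash_gb: float, read_fraction: float = 0.70) -> tuple:
    """Return (read_cache_gb, write_buffer_gb) for one disk group's flash device."""
    read_cache = flash_gb * read_fraction
    return read_cache, flash_gb - read_cache


def disk_group_capacity(hdd_sizes_gb: list, max_hdds: int = 7) -> float:
    """Raw capacity of one disk group; flash acts as cache and adds no capacity."""
    if not 1 <= len(hdd_sizes_gb) <= max_hdds:
        raise ValueError(f"a disk group holds 1 to {max_hdds} magnetic disks")
    return sum(hdd_sizes_gb)


read_gb, write_gb = split_flash(400)  # hypothetical 400 GB SSD
print(f"read cache: {read_gb:.0f} GB, write buffer: {write_gb:.0f} GB")
print(f"raw capacity: {disk_group_capacity([1000] * 7):.0f} GB")
```

The point to take away is that the flash device contributes performance only: all of the usable capacity in a hybrid disk group comes from the magnetic disks.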
No caching on “local” flash where VM runs
- Flash latencies 100x network latencies
- No data transfers, no perf hit during VM migration
- Better overall flash utilisation (most expensive resource)
Use local cache when it matters
- In-memory CBRC (RAM << Network latency)
- Lots of block sharing (VDI)
- More options in the future …
Deduplicated RAM-based caching
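Why does deduplicated RAM caching pay off with lots of block sharing? A rough sketch of a content-addressed cache shows the idea. This is not CBRC's actual design: the `DedupReadCache` class, the SHA-1 digest, the LRU eviction and the VM names are all assumptions I've made for illustration.

```python
import hashlib
from collections import OrderedDict


class DedupReadCache:
    """Toy content-addressed cache: identical blocks are held in RAM once."""

    def __init__(self, max_unique_blocks: int):
        self.max_unique_blocks = max_unique_blocks
        self.digest_of = {}          # (disk_id, lba) -> content digest
        self.blocks = OrderedDict()  # digest -> block data, kept in LRU order

    def insert(self, disk_id: str, lba: int, data: bytes) -> None:
        digest = hashlib.sha1(data).hexdigest()
        self.digest_of[(disk_id, lba)] = digest
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)
        while len(self.blocks) > self.max_unique_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used content

    def read(self, disk_id: str, lba: int):
        digest = self.digest_of.get((disk_id, lba))
        data = self.blocks.get(digest)
        if data is not None:
            self.blocks.move_to_end(digest)
        return data  # None means a miss, so the caller goes over the network


cache = DedupReadCache(max_unique_blocks=2)
os_block = b"\x00" * 4096  # stand-in for a block shared by every desktop image
for vm in ("desktop-1", "desktop-2", "desktop-3"):
    cache.insert(vm, 0, os_block)
print(len(cache.digest_of), len(cache.blocks))  # → 3 1
```

Three desktops reference the block, but RAM holds it once, which is why a small in-memory cache can go a long way in VDI.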
- VM consists of a number of objects – each object individually distributed
- VSAN doesn’t know about VMs and VMDKs
- Up to 62TB useable
- Single namespace, multiple mount points
- VMFS created in sub-namespace
The VM Home directory object is formatted with VMFS to allow a VM’s config files to be stored on it. Mounted under the root dir vsanDatastore.
- Availability policy is reflected in the number of replicas
- Performance policy may include a stripe width per replica
- Object “components” may reside in different disks and / or hosts
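Here's a small sketch of how the two policies above could translate into components. It's a toy placement under my own assumptions: the `Component` and `layout_object` names and the host names are hypothetical, each replica simply lands on its own host, and nothing else a real placement engine would weigh up is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    replica: int  # which copy of the object this component belongs to
    stripe: int   # which stripe of that copy
    host: str     # host holding the component


def layout_object(hosts: list, replicas: int, stripe_width: int) -> list:
    """Place each replica on a distinct host, split into stripe_width components."""
    if replicas > len(hosts):
        raise ValueError("need at least one host per replica")
    return [
        Component(replica=r, stripe=s, host=hosts[r])
        for r in range(replicas)
        for s in range(stripe_width)
    ]


for c in layout_object(["esx-01", "esx-02", "esx-03"], replicas=2, stripe_width=2):
    print(c)
```

With two replicas and a stripe width of two, one object turns into four components, which is why component counts climb quickly as policies get more demanding.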
VSAN cluster = vSphere cluster
Ease of management
- Piggyback on vSphere management workflow, e.g. EMM
- Ensure coherent configuration of hosts in vSphere cluster
Adapt to the customer’s data centre architecture while working with network topology constraints.
Maintenance mode – planned downtime.
- Ensure accessibility;
- Full data migration; and
- No data migration.
VM-centric monitoring and troubleshooting
- Configure, manage, monitor
Policy compliance reporting
Combination of tools for monitoring in 5.5
- CLI commands
- Ruby vSphere Console (RVC)
- VSAN Observer
More to come soon …
Real *software* defined storage
Software + hardware – either component based (choose individual components) or a Virtual SAN Ready Node (40 OEM-validated server configurations are ready for VSAN deployment)
VMware EVO:RAIL = Hyper-converged infrastructure
It’s a big task to get all of this working with everything (supporting the entire vSphere HCL).
Closing Thoughts and Further Reading
I like VSAN. And I like that VMware are working so hard at getting it right. I don’t like some of the bs that goes with their marketing of the product, but I think it has its place in the enterprise and is only going to go from strength to strength with the amount of resources VMware is throwing at it. In the meantime, check out Keith’s background post on VMware here. In my opinion, you can’t go past Cormac’s posts on VSAN if you want a technical deep dive. Also, buy his book.