Disclaimer: I recently attended Pure//Accelerate 2019. My flights, accommodation, and conference pass were paid for by Pure Storage. There is no requirement for me to blog about any of the content presented and I am not compensated by Pure Storage for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Cloud Block Store for AWS from Pure Storage has been around for a little while now. I had the opportunity to hear about it in more depth at the Storage Field Day Exclusive event at Pure//Accelerate 2019 and thought I’d share some thoughts here. You can grab a copy of my rough notes from the session here, and video from the session is available here.
Cloud Vision
Pure Storage has been focused on making everything related to their products effortless from day one. An example of this approach is the FlashArray setup process – it’s really easy to get up and running and serving up storage to workloads. They wanted to do the same thing with anything they deliver via cloud services as well. There is, however, something of a “cloud divide” in operation in the industry. If you’re familiar with the various cloud deployment options, you’ll likely be aware that on-premises and hosted clouds are a bit different to public cloud. They:
- Deliver different application architectures;
- Deliver different management and consumption experiences; and
- Use different storage.
So what if Pure could build application portability and deliver common shared data services?
Pure has architected its cloud service to leverage what it calls “Three Pillars”:
- Build Your Cloud
- Run anywhere
- Protect everywhere
What Is It?
So what exactly is Cloud Block Store for AWS then? Well, imagine if you will, that you’re watching an episode of Pimp My Ride, and Xzibit is talking to an enterprise punter about how he or she likes cloud, and how he or she likes the way Pure Storage’s FlashArray works. And then X says, “Hey, we heard you liked these two things so we put this thing in the other thing”. Look, I don’t know the exact situation where this would happen. But anyway …
- 100% software – deploys instantly as a virtual appliance in the cloud, runs only as long as you need it;
- Efficient – deduplication, compression, and thin provisioning deliver capacity and performance economically;
- Hybrid – easily migrate data bidirectionally, delivering data portability and protection across your hybrid cloud;
- Consistent APIs – developers connect to storage the same way on-premises and in the cloud. Automated deployment with Cloud Formation templates;
- Reliable, secure – delivers industrial-strength performance, reliability & protection with Multi-AZ HA, NDU, instant snaps, and data-at-rest encryption; and
- Flexible – pay as you go consumption model to best match your needs for production and development.
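The “consistent APIs” point is worth dwelling on: because deployment is driven by CloudFormation, standing up a CBS instance can be scripted like any other AWS resource. Here’s a rough sketch of what that might look like with boto3 – note that the template URL and parameter names below are my own placeholders, not Pure’s actual template.

```python
# Sketch: assembling a CloudFormation create_stack() request that would
# deploy a CBS instance. The template URL and parameter names are
# hypothetical placeholders - check Pure's actual Marketplace template.

def build_stack_request(stack_name, template_url, key_name, subnet_ids):
    """Assemble the kwargs for boto3's cloudformation create_stack()."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": "KeyName", "ParameterValue": key_name},
            {"ParameterKey": "SubnetIds", "ParameterValue": ",".join(subnet_ids)},
        ],
        "Capabilities": ["CAPABILITY_IAM"],
    }

request = build_stack_request(
    "cbs-demo",
    "https://example-bucket.s3.amazonaws.com/cbs-template.yaml",  # placeholder
    "my-keypair",
    ["subnet-aaaa", "subnet-bbbb"],
)
# boto3.client("cloudformation").create_stack(**request) would kick it off
```

The same request structure works on-premises tooling too, which is the whole point of the consistent-API pitch.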
[image courtesy of Pure Storage]
Architecture
At the heart of it, the architecture for CBS is not dissimilar to the FlashArray architecture. There are controllers, drives, NVRAM, and a virtual shelf.
- EC2: CBS Controllers
- EC2: Virtual Drives
- Virtual Shelf: 7 virtual drives in a Spread Placement Group
- EBS IO1: NVRAM, Write Buffer (7 total)
- S3: Durable persistent storage
- Instance Store: Non-Persistent Read Mirror
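To make that data flow a bit more concrete, here’s a toy model of how I understand the write path – this is my own gross simplification, not Pure’s implementation. Writes are acknowledged once they land in the EBS-backed NVRAM write buffer, then deduplicated and destaged to S3 for durability, with a non-persistent instance-store mirror serving reads.

```python
import hashlib

class ToyCBS:
    """Grossly simplified model of the CBS write path - illustration only."""

    def __init__(self):
        self.nvram = []        # EBS io1 write buffer (acknowledged writes)
        self.s3 = {}           # durable persistent store, keyed by content hash
        self.read_mirror = {}  # instance store: non-persistent read cache

    def write(self, volume, offset, data):
        """Acknowledge the write once it's in the buffer."""
        self.nvram.append((volume, offset, data))

    def destage(self):
        """Flush the buffer to S3: identical blocks share one object (dedup)."""
        for volume, offset, data in self.nvram:
            key = hashlib.sha256(data).hexdigest()
            self.s3[key] = data                    # same content, one object
            self.read_mirror[(volume, offset)] = key
        self.nvram.clear()

array = ToyCBS()
array.write("vol1", 0, b"hello" * 1024)
array.write("vol2", 0, b"hello" * 1024)  # duplicate content on another volume
array.destage()
print(len(array.s3))  # 1 - both writes reduced to a single S3 object
```

The real system obviously does far more (compression, metadata, failure handling across the spread placement group), but the toy version shows why S3 as the durable tier plus content-addressed dedup is an economical combination.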
[image courtesy of Pure Storage]
What’s interesting, to me at least, is how they use S3 for persistent storage.
Procurement
How do you procure CBS for AWS? I’m glad you asked. There are two procurement options.
A – Pure as-a-Service
- Offered via SLED / CLED process
- Minimum 100 TiB effective used capacity
- Unified hybrid contracts (on-premises and CBS, or CBS only)
- 1-year to 3-year contracts
B – AWS Marketplace
- Direct to customer
- Minimum 10 TiB effective used capacity
- CBS only
- Month-to-month or 1-year contracts
Use Cases
There are a raft of different use cases for CBS. Some of them made sense to me straight away, some of them took a little time to bounce around in my head.
Disaster Recovery
- Production instance on-premises
- Replicate data to public cloud
- Fail over in DR event
- Fail back and recover
Lift and shift
- Production instance on-premises
- Replicate data to public cloud
- Run the same architecture as before
- Run production on CBS
Dev / Test
- Replicate data to public cloud
- Instantiate test / dev instances in public cloud
- Refresh test / dev periodically
- Bring changes back on-premises
- Snapshots are more costly and slower to restore in native AWS
ActiveCluster
- HA within an availability zone and / or across availability zones in an AWS region (ActiveCluster needs <11ms latency)
- No downtime when a Cloud Block Store Instance goes away or there is a zone outage
- Pure1 Cloud Mediator Witness (simple to manage and deploy)
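That sub-11ms constraint is easy to turn into a pre-flight check before you commit to stretching ActiveCluster across zones. A trivial sketch – the threshold comes from the session, while the measurement itself is left to whatever tooling you use (ping, or your monitoring stack):

```python
# ActiveCluster needs sub-11ms round-trip latency between the two sides.
ACTIVECLUSTER_MAX_LATENCY_MS = 11.0

def activecluster_viable(measured_latencies_ms):
    """True if every measured inter-zone round-trip is under the limit."""
    return all(rtt < ACTIVECLUSTER_MAX_LATENCY_MS for rtt in measured_latencies_ms)

# Typical intra-region AZ-to-AZ latencies comfortably qualify...
print(activecluster_viable([0.8, 1.2, 0.9]))  # True
# ...while a cross-region link generally won't.
print(activecluster_viable([42.0]))           # False
```

In practice you’d want percentile latencies over time rather than a single sample, but the point stands: within a region is fine, across regions is asynchronous-replication territory.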
Migrating VMware Environments
VMware Challenges
- AWS does not recognise VMFS
- Replicating volumes formatted with VMFS will not do any good, as EC2 instances can’t read them
Workaround
- Convert the VMFS datastore into vVols
- Now each volume has the Guest VM’s file system (NTFS, ext3, etc.)
- Replicate the VMDK vVols to CBS
- Now the volumes can be mounted to EC2 instances with a matching OS
Note: This is for the VM’s data volumes. The VM boot volume will not be usable in AWS. The VM’s application will need to be redeployed in native AWS EC2.
VMware Cloud
VMware Challenges
- VMware Cloud does not support external storage; it only supports vSAN
Workaround
- Connect Guest VMs directly to CBS via iSCSI
Note: I haven’t verified this myself, and I suspect there may be other ways to do this. But in the context of Pure’s offering, it makes sense.
Thoughts and Further Reading
There’s been a feeling in some parts of the industry for the last 5-10 years that the rise of the public cloud providers would spell the death of the traditional storage vendor. That’s clearly not been the case, but it has been interesting to see the major storage slingers evolving their product strategies to both accommodate and leverage the cloud providers in a more effective manner. Some have used the opportunity to get themselves as close as possible to the cloud providers, without actually being in the cloud. Others have deployed virtualised versions of their offerings inside public cloud and offered users the comfort of their traditional stack, but off-premises. There’s value in these approaches, for sure. But I like the way that Pure has taken it a step further and optimised its architecture to leverage some of the features of what AWS can offer from a cloud hardware perspective.
In my opinion, the main reason you’d look to leverage something like CBS on AWS is if you have an existing investment in Pure and want to keep doing things a certain way. You’re also likely using a lot of traditional VMs in AWS and want something that can improve the performance and resilience of those workloads. CBS is certainly a great way to do this. If you’re already running a raft of cloud-native applications, it’s likely that you don’t necessarily need the features on offer from CBS, as you’re already (hopefully) using them natively. I think Pure understands this though, and isn’t pushing CBS for AWS as the silver bullet for every cloud workload.
I’m looking forward to seeing what the market uptake on this product is like. I’m also keen to crunch the numbers on running this type of solution versus the cost associated with doing something on-premises or via other means. In any case, I’m interested to see how this capability evolves over time, and I think CBS on AWS is definitely worthy of further consideration.