Disclaimer: I recently attended Storage Field Day 7. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Cloudian presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Cloudian website that covers some of what they presented.
Michael Tso, CEO and co-founder of Cloudian, provided us with a brief overview of the company. It was founded about four years ago, and many of the staff have backgrounds in building hyper-scale messaging systems for big telcos. They now have about 65 staff. The offering comes in three flavours:
- Entry Level;
- Capacity Optimised; and
- Performance Optimised.
The software is supported on Red Hat and CentOS.
Paul Turner, Chief Marketing and Product Officer, gave us an introduction to the architecture behind Cloudian. Their focus is on using commodity servers that provide scale-out capability, durability, and simplicity. “If you don’t make it dead easy to add nodes or remove nodes on the fly you don’t have a good platform”.
The platform uses:
- Erasure Coding; and
- Replication.
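Cloudian didn’t go into the mechanics of their erasure coding during the presentation, but the core idea can be sketched with simple XOR parity (a toy example, not Cloudian’s actual scheme): store k data chunks plus a parity chunk, and any single lost chunk can be rebuilt from the survivors.

```python
# Toy XOR-parity sketch of the erasure-coding idea: k data chunks plus
# one parity chunk can survive the loss of any one chunk. Real systems
# use schemes like Reed-Solomon to tolerate multiple losses.

def make_parity(chunks):
    """XOR all data chunks together to produce a parity chunk."""
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, c))
    return parity

def recover(surviving, parity):
    """Rebuild the single missing chunk from the survivors and parity."""
    missing = parity
    for c in surviving:
        missing = bytes(a ^ b for a, b in zip(missing, c))
    return missing

data = [b"abcd", b"efgh", b"ijkl"]  # k = 3 data chunks
p = make_parity(data)
# Lose chunk 1, then rebuild it from the other chunks and the parity.
rebuilt = recover([data[0], data[2]], p)
assert rebuilt == b"efgh"
```

The appeal over straight replication is overhead: three replicas cost 3x the raw data, while k data chunks plus m parity chunks cost only (k+m)/k.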
Here’s a picture of what’s inside:
- Natively S3;
- Hybrid Storage Cloud;
- Extreme durability;
- Scale out;
- Intelligence in Software;
- Smart Support;
- Data Protection;
- Programmable; and
- Billing and Reporting.
They also make use of an Adaptive Policy Engine (a multi-tenant, continuous, adaptive policy engine), which offers:
- Policy-controlled virtual storage pools (buckets, as in Amazon S3);
- Scale / reduce storage on demand;
- Multi-tenanted with many application tenants on same infrastructure;
- Dynamically adjust protection policies;
- Optimise for small objects by policy; and
- Cloud archiving by virtual pool.
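To make the virtual pool idea concrete, here’s a hypothetical sketch of per-bucket protection policies in Python; the pool names and fields are illustrative, not Cloudian’s actual configuration schema.

```python
# Hypothetical per-bucket ("virtual storage pool") policies, loosely
# modelled on what was described; names and fields are illustrative,
# not Cloudian's actual schema.

policies = {
    "archive-pool": {"protection": "erasure-coding", "cloud_archive": True},
    "hot-pool": {"protection": "replication", "replicas": 3},
    "small-object-pool": {"protection": "replication", "replicas": 2},
}

DEFAULT = {"protection": "replication", "replicas": 3}

def protection_for(bucket):
    """Policies are set per virtual pool and can be adjusted dynamically."""
    return policies.get(bucket, DEFAULT)

assert protection_for("archive-pool")["protection"] == "erasure-coding"
assert protection_for("brand-new-pool") == DEFAULT
```

The point of the design is that tenants sharing one physical cluster can each get different protection, tiering, and small-object handling, set and changed per pool rather than per cluster.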
Here’s a diagram of the logical architecture.
They use Cassandra as the core metadata and distribution mechanism. Why Cassandra? Because it:
- Supports 1000s of nodes;
- Adds capacity by adding nodes to a running system;
- Has a distributed, shared-nothing P2P architecture with no single point of failure;
- Provides data durability, with data synced to disk;
- Is resilient to network and hardware failures;
- Supports multi-DC replication; and
- Offers a tuneable data consistency level.
It also provides features such as:
- Vnodes, TTL, secondary indexes, compression, and encryption; and
- An especially fast write path.
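The “tuneable data consistency level” point is worth unpacking: in Cassandra, reading R replicas and writing W replicas gives strong consistency whenever R + W exceeds the replication factor, because every read quorum then overlaps every write quorum. A minimal sketch of that rule:

```python
# Sketch of Cassandra's tuneable consistency rule: with replication
# factor RF, reads of R replicas and writes to W replicas are strongly
# consistent whenever R + W > RF (read and write quorums must overlap).

def is_strongly_consistent(rf, r, w):
    return r + w > rf

# QUORUM reads and QUORUM writes on RF=3 (2 + 2 > 3): strongly consistent.
assert is_strongly_consistent(3, 2, 2)
# ONE/ONE on RF=3 (1 + 1 <= 3): eventually consistent only, but faster.
assert not is_strongly_consistent(3, 1, 1)
```

That trade-off (dial consistency up for correctness-critical data, down for speed) is exactly what makes Cassandra attractive as the metadata layer of a tiered object store.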
HyperStore supports multiple data protection policies, including:
- NoSQL DB, replicas, and erasure coding; and
- ACLs, QoS, tiering, versioning, etc.
The use of vnodes brings a number of benefits:
- Vnodes are mapped to physical disks, so a single disk failure affects only the vnodes on that disk;
- A maximum of 256 vnodes per physical node, with no token management (tokens are randomly assigned);
- Parallel I/O across nodes;
- Increased repair speed in the case of a disk or node failure; and
- Support for heterogeneous machines in a cluster.
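To illustrate how randomly assigned tokens (vnodes) spread ownership across a cluster, here’s a small consistent-hashing sketch; the node names, ring size, and sampling are illustrative, not Cloudian’s implementation.

```python
import bisect
import random
from collections import Counter

# Illustrative vnode-style consistent hashing: each physical node owns
# many randomly assigned tokens (vnodes), so losing a node or disk only
# affects the small ring slices its vnodes covered, and repair work is
# spread across the whole cluster rather than one neighbour.

RING = 2**32
VNODES_PER_NODE = 256  # matches the "256 vnodes per physical node" above

def build_ring(nodes, seed=42):
    """Assign VNODES_PER_NODE random tokens to each node."""
    rng = random.Random(seed)
    return sorted((rng.randrange(RING), node)
                  for node in nodes
                  for _ in range(VNODES_PER_NODE))

def owner(ring, key_hash):
    """The node owning the first token at or after key_hash (wrapping)."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect(tokens, key_hash) % len(ring)
    return ring[i][1]

ring = build_ring(["node-a", "node-b", "node-c"])
samples = [owner(ring, h) for h in range(0, RING, RING // 1000)]
counts = Counter(samples)
# With 256 vnodes each, ownership is spread across all nodes.
assert set(counts) == {"node-a", "node-b", "node-c"}
```

Because tokens are random rather than manually assigned, adding or removing a node shifts many tiny ranges instead of one large contiguous one, which is where the repair-speed and heterogeneous-hardware benefits come from.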
Further Reading and Final Thoughts
If you’re doing a bit with cloud storage, I think these guys are worth checking out. I particularly like the use case for Cloudian deployed as an on-premises S3 cloud behind the firewall. There’s also a Community Edition available for download. You can use HyperStore Community Edition software for:
- Product evaluation;
- Testing HyperStore software features in a single or multi-node install; and
- Building 10TB object storage systems free of charge.
I think that’s pretty neat. I also recommend checking out Keith’s preview of Cloudian.