SwiftStack Announces 7

SwiftStack recently announced version 7 of their solution. I had the opportunity to speak to Joe Arnold and Erik Pounds from SwiftStack about the announcement and thought I’d share some thoughts here.

 

Insane Data Requirements

We spoke briefly about just how insane modern data requirements are becoming, in terms of both volume and performance requirements. The example offered up was that of an Advanced Driver-Assistance System (ADAS). These things need a lot of capacity to work, with training data starting at 15PB and performance requirements approaching 100GB/s.

  • Autonomy – Level 2+
  • 10 Deep neural networks needed
  • Survey car – 2MP cameras
  • 2PB per year per car
  • 100 NVIDIA DGX-1 servers per car

When your hot data is 15 – 30PB and growing – it’s a problem.
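To put those numbers in perspective, here’s a rough back-of-the-envelope sketch of how long a single pass over the hot data takes at the quoted throughput. The function name and the use of decimal units (1PB = 1,000,000GB) are my own assumptions, not anything SwiftStack presented.

```python
def hours_to_read(dataset_pb: float, throughput_gb_s: float) -> float:
    """Hours needed to stream a dataset once at a sustained throughput.

    Assumes decimal units (1 PB = 1,000,000 GB) and no protocol overhead.
    """
    seconds = dataset_pb * 1_000_000 / throughput_gb_s
    return seconds / 3600


for size_pb in (15, 30):
    hours = hours_to_read(size_pb, 100)
    print(f"{size_pb}PB at 100GB/s: {hours:.1f} hours per pass")
```

Even at 100GB/s, one full read of a 15PB data set is well over a day of sustained streaming, which is why throughput matters as much as capacity here.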

 

What’s New In 7?

SwiftStack has been working to address those kinds of challenges with version 7.

Ultra-scale Performance Architecture

They’ve managed to get some pretty decent numbers under their belt, delivering over 100GB/s at scale with a platform that’s designed to scale linearly to higher levels. The numbers stack up well against some of their competitors, and have been validated through:

  • Independent testing;
  • Comparing similar hardware and workloads; and
  • Results being posted publicly (with solutions based on Cisco Validated Designs).

 

ProxyFS Edge

ProxyFS Edge takes advantage of SwiftStack’s file services to deliver distributed file services between edge, core, and cloud. The idea is that you can use it for “high-throughput, data-intensive use cases”.

[image courtesy of SwiftStack]

Enabling functionality:

  • Containerised deployment of ProxyFS agent for orchestrated elasticity
  • Clustered filesystem enables scale-out capabilities
  • Caching at the edge, minimising latency for improved application performance
  • Load-balanced, high-throughput API-based communication to the core

 

1space File Connector

But what if you have a bunch of unstructured data sitting in file environments that you want to use with your more modern apps? 1space File Connector brings enterprise file data into the cloud namespace, and “[g]ives modern, cloud-native applications access to existing data without migration”. The thinking is that you can modernise your workflows at an incremental rate, rather than having to deal with the app and the storage all in one go.

[image courtesy of SwiftStack]

Enabling functionality:

  • Containerised deployment of 1space File Connector for orchestrated elasticity
  • File data is accessible using S3 or Swift object APIs
  • Scales out and is load balanced for high-throughput
  • 1space policies can be applied to file data when migration is desired

The SwiftStack AI Architecture

SwiftStack have also developed a comprehensive AI Architecture model, describing it as “the customer-proven stack that enables deep learning at ultra-scale”. You can read more on that here.

Ultra-Scale Performance

  • Shared-nothing distributed architecture
  • Keep GPU compute complexes busy

Elasticity from Edge-to-Core-to-Cloud

  • With 1space, ingest and access data anywhere
  • Eliminate data silos and move beyond one cloud

Data Immutability

  • Data can be retained and referenced indefinitely as it was originally written
  • Enabling traceability, accountability, confidence, and safety throughout the life of a DNN

Optimal TCO

  • Compelling savings compared to public cloud or all-flash arrays

Real-World Confidence

  • Notable AI deployments for autonomous vehicle development

SwiftStack PRO

The final piece is the SwiftStack PRO offering, a support service delivering:

  • 24×7 remote management and monitoring of your SwiftStack production cluster(s);
  • Incorporating operational best-practices learned from 100s of large-scale production clusters;
  • Including advanced monitoring software suite for log aggregation, indexing, and analysis; and
  • Operations integration with your internal team to ensure end-to-end management of your environment.

 

Thoughts And Further Reading

The sheer scale of data enterprises are working with every day is pretty amazing. And data is coming from previously unexpected places as well. The traditional enterprise workloads hosted on NAS or in structured applications are insignificant in size when compared to the PB-scale stuff going on in some environments. So how on earth do we start to derive value from these enormous data sets? I think the key is to understand that data is sometimes going to be in places that we don’t expect, and that we sometimes have to work around that constraint. In this case, SwiftStack have recognised that not all data is going to be sitting in the core, or the cloud, and they’re using some interesting technology to get that data where you need it to get the most value from it.

Getting the data from the edge to somewhere useable (or making it useable at the edge) is one thing, but the ability to use unstructured data sitting in file with modern applications is also pretty cool. There’s often reticence associated with making wholesale changes to data sources, and this solution helps to make that transition a little easier. And it gives the punters an opportunity to address data challenges in places that may have been inaccessible in the past.

SwiftStack have good pedigree in delivering modern scale-out storage solutions, and they’ve done a lot of work to ensure that their platform adds value. Worth checking out.

Backblaze Has A (Pod) Birthday, Does Some Cool Stuff With B2

Backblaze has been on my mind a lot lately. And not just because of their recent expansion into Europe. The Storage Pod recently turned ten years old, and I was lucky enough to have the chance to chat with Yev Pusin and Andy Klein about that news and some of the stuff they’re doing with B2, Tiger Technology, and Veeam.

 

10 Years Is A Long Time

The Backblaze Storage Pod (currently version 6) recently turned 10 years old. That’s a long time for something to be around (and successful) in a market like cloud storage. I asked Yev and Andy about where they saw the pod heading, and whether they thought there was room for Flash in the picture. Andy pointed out that, with around 900PB under management, Flash still didn’t look like the most economical medium for this kind of storage task. That said, they have seen the main HDD manufacturers starting to hit a wall in terms of the capacity per drive that they can deliver. Nonetheless, the challenge isn’t just performance, it’s also the fact that people are needing more and more capacity to store their stuff. And it doesn’t look like they can produce enough Flash to cope with that increase in requirements at this stage.

Version 7.0

We spoke briefly about what Pod 7.0 would look like, and it’s going to be a “little bit faster”, with the following enhancements planned:

  • Updating the motherboard
  • Upgrading the CPU and considering an AMD CPU
  • Updating the power supply units, perhaps moving to one unit
  • Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
  • Upgrading the SATA cards
  • Modifying the tool-less lid design

They’re looking to roll this out in 2020 some time.

 

Tiger Style?

So what’s all this about Veeam, Tiger Bridge, and Backblaze B2? Historically, if you’ve been using Veeam from the cheap seats, it’s been difficult to effectively leverage object storage to use as a repository for longer term data storage. Backblaze and Tiger Technology have gotten together to develop an integration that allows you to use B2 storage to copy your Veeam protection data to the Backblaze cloud. There’s a nice overview of the solution that you can read here, and you can read some more comprehensive instructions here.

 

Thoughts and Further Reading

I keep banging on about it, but ten years feels like a long time to be hanging around in tech. I haven’t managed to stay with one employer longer than 7 years (maybe I’m flighty?). Along with the durability of the solution, the fact that Backblaze made the design open source, and inspired a bunch of companies to do something similar, is a great story. It’s stuff like this that I find inspiring. It’s not always about selling black boxes to people. Sometimes it’s good to be a little transparent about what you’re doing, and rely on a great product, competitive pricing, and strong support to keep customers happy. Backblaze have certainly done that on the consumer side of things, and the team assures me that they’re experiencing success with the B2 offering and their business-oriented data protection solution as well.

The Veeam integration is an interesting one. While B2 is an object storage play, it’s not S3-compliant, so they can’t easily leverage a lot of the built-in options delivered by the bigger data protection vendors. What you will see, though, is that they’re super responsive when it comes to making integrations available across things like NAS devices, and stuff like this. If I get some time in the next month, I’ll look at setting this up in the lab and running through the process.

I’m not going to wax lyrical about how Backblaze is democratising data access for everyone, as they’re in business to make money. But they’re certainly delivering a range of products that is enabling a variety of customers to make good use of technology that has potentially been unavailable (in a simple to consume format) previously. And that’s a great thing. I glossed over the news when it was announced last year, but the “Rebel Alliance” formed between Backblaze, Packet and ServerCentral is pretty interesting, particularly if you’re looking for a more cost-effective solution for compute and object storage that isn’t reliant on hyperscalers. I’m looking forward to hearing about what Backblaze come up with in the future, and I recommend checking them out if you haven’t previously. You can read Ken’s take over at Gestalt IT here.

Western Digital – The A Is For Active, The S Is For Scale

Disclaimer: I recently attended Storage Field Day 15.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

   

Western Digital recently presented at Storage Field Day 15. You might recall there are a few different brands under the WD umbrella, including Tegile and HGST, and folks from both Tegile and HGST presented during Storage Field Day 15. I’d like to talk about the ActiveScale session however, mainly because I’m interested in object solutions. I’ve written about Tegile previously, although obviously a fair bit has changed for them too. You can see their videos from Storage Field Day 15 here, and download a PDF copy of my rough notes from here.

 

ActiveScale, Probably Not What You Thought It Was

ActiveScale isn’t some kind of weight measurement tool for exercise fanatics, but rather the brand of scalable object system that HGST sells. It comes in two flavours: the P100 and X100. Apparently the letters in product names sometimes do mean things, with the “P” standing for Petabyte, and the “X” for Exabyte (possibly in the same way that X stands for Excellent). From a speeds and feeds perspective, the typical specs are as follows:

  • P100 – starts as low as 720TB, goes to 18PB. 17x 9s data durability, 4.6KVA typical power consumption; and
  • X100 – 5.4PB in a rack, 840TB – 52PB, 17x 9s data durability, 6.5KVA typical power consumption.
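It’s hard to get a feel for what seventeen nines actually means, so here’s a rough sketch. It assumes the figure is a per-object chance of surviving each year, which is my reading of the number rather than a definition HGST gave, and the object count is made up for illustration.

```python
def expected_annual_losses(object_count: int, nines: int) -> float:
    """Expected objects lost per year, assuming each object independently
    survives the year with probability 1 - 10**-nines."""
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


# One trillion objects is an illustrative figure, not a vendor number.
losses = expected_annual_losses(1_000_000_000_000, 17)
print(f"Expected objects lost per year: {losses:.0e}")
```

Under that assumption, even a trillion stored objects works out to a tiny fraction of one object lost per year.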

You can scale out to 9 expansion racks, with 52PB of scale out object storage goodness per namespace. Some of the key capabilities of the ActiveScale platform include:

  • Archive and Backup;
  • Active Data for Analytics;
  • Data Forever Architecture;
  • Versioning;
  • Encryption;
  • Replication;
  • Single Pane Management;
  • S3 Compatible APIs;
  • Multi-Geo Availability Zones; and
  • Scale Up and Scale Out.

They use “BitSpread” for dynamic data placement and you can read a little about their erasure coding mechanism here. “BitDynamics” assures continuous data integrity, offering the following features:

  • Background – verification process always running
  • Performance – not impacted by verification or repair
  • Automatic – all repairs happen with no intervention

There’s also a feature called “GeoSpread” for geographical availability.

  • Single – Distributed erasure coded copy;
  • Available – Can sustain the loss of an entire site; and
  • Efficient – Better than 2 or 3 copy replication.
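The “better than 2 or 3 copy replication” claim is easier to see with some arithmetic. This is a rough sketch assuming a hypothetical 12+6 erasure code (12 data fragments, 6 parity fragments); I don’t know the parameters ActiveScale actually uses, so treat the numbers as illustrative only.

```python
def raw_per_usable(data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity consumed per unit of usable data for a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


# 12+6 is an illustrative layout, not ActiveScale's documented policy.
DATA, PARITY = 12, 6
ec = raw_per_usable(DATA, PARITY)
print(f"{DATA}+{PARITY} erasure coding: {ec:.2f}x raw, survives {PARITY} lost fragments")

for copies in (2, 3):
    print(f"{copies}-copy replication: {copies:.2f}x raw, survives {copies - 1} lost copies")
```

With 18 fragments spread evenly across three sites, losing a whole site removes 6 fragments, which is exactly what this hypothetical layout can tolerate, and it does so at 1.5x raw capacity rather than 2x or 3x.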

 

What Do I Use It For Again?

Like a number of other object storage systems in the market, ActiveScale is being positioned as a very suitable platform for:

  • Media & Entertainment
    • Media Archive
    • Tape replacement and augmentation
    • Transcoding
    • Playout
  • Life Sciences
    • Bio imaging
    • Genomic Sequencing
  • Analytics

 

Thoughts And Further Reading

Unlike a lot of people, I find technical sessions discussing object storage at extremely large scale to be really interesting. It’s weird, I know, but there’s something that I really like about the idea of petabytes of storage servicing media and entertainment workloads. Maybe it’s because I don’t frequently come across these types of platforms in my day job. If I’m lucky I get to talk to folks about using object as a scalable archive platform. Occasionally I’ll bump into someone doing life sciences work in a higher education setting, but they’ve invariably built something that’s a little more home-brew than HGST’s offering. Every now and then I’m lucky enough to spend some time with media types who regale me with tales of things that go terribly wrong when the wrong bit of storage infrastructure is put in the path of a particular editing workflow or transcode process. Oh how we laugh. I can certainly see these types of scalable platforms being a good fit for archive and tape replacement. I’m not entirely convinced they make for a great transcode or playout platform, but I’m relatively naive when it comes to those kinds of workloads. If there are folks reading this who are familiar with that kind of stuff, I’d love to have a chat.

But enough with my fascination with the media and entertainment industry’s infrastructure requirements. From what I’ve seen of ActiveScale, it looks to be a solid platform with a lot of very useful features. Coupled with the cloud management feature it seems like they’re worth a look. Western Digital aren’t just making hard drives for your NAS (and other devices), they’re doing a whole lot more, and a lot of it is really cool. You can read El Reg’s article on the X100 here.

SwiftStack 6.0 – Universal Access And More

I haven’t covered SwiftStack in a little while, and they’ve been doing some pretty interesting stuff. They made some announcements recently but a number of scheduling “challenges” and some hectic day job commitments prevented me from speaking to them until just recently. In the end I was lucky enough to snaffle 30 minutes with Mario Blandini and he kindly took me through the latest news.

 

6.0 Then, So What?

Universal Access

Universal Access is really very cool. Think of it as a way to write data in either file or object format, and then read it back in file or object format, depending on how you need to consume it.

[image courtesy of SwiftStack]

Key features include:

  • Gateway free – the data is stored in cloud-native format in a single namespace;
  • Accessible via file (SMB3 / NFS4) and / or object API (S3 / Swift). Note that this is not a replacement for NAS, but it will give you the ability to work with some of those applications that expect to see file in places; and
  • Applications can write data one way, access the data another way, and vice versa.
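The write-one-way, read-another idea can be sketched with a toy example. This is not how SwiftStack implements Universal Access; the class and method names are hypothetical, and the only point is that both views resolve to the same stored copy of the data.

```python
class UnifiedNamespace:
    """Toy single namespace with a file-style view and an object-style view."""

    def __init__(self):
        # One copy of each blob, keyed by "container/key".
        self._blobs = {}

    # File-style view: paths such as /container/dir/name
    def write_file(self, path, data):
        self._blobs[path.lstrip("/")] = data

    def read_file(self, path):
        return self._blobs[path.lstrip("/")]

    # Object-style view: container plus key
    def put_object(self, container, key, data):
        self._blobs[f"{container}/{key}"] = data

    def get_object(self, container, key):
        return self._blobs[f"{container}/{key}"]


ns = UnifiedNamespace()

# Written as a file, read back as an object.
ns.write_file("/media/raw/clip001.mov", b"frames")
print(ns.get_object("media", "raw/clip001.mov"))

# Written as an object, read back as a file.
ns.put_object("media", "proxies/clip001.mp4", b"proxy")
print(ns.read_file("/media/proxies/clip001.mp4"))
```

Because there’s no gateway copying data between a file store and an object store, neither view can drift out of sync with the other.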

The great thing is that, according to SwiftStack, “Universal Access enables applications to take advantage of all data under management, no matter how it was written or where it is stored, without the need to refactor applications”.

 

Universal Access Multi-Cloud

So what if you take two really neat features like, say, Cloud Sync and Universal Access, and combine them? You get access to a single, multi-cloud, storage namespace.

[image courtesy of SwiftStack]

 

Thoughts

As Mario took me through the announcements he mentioned that SwiftStack are “not just an object storage thing based on Swift” and I thought that was spot on. Universal Access (particularly with multi-cloud) is just the type of solution that enterprises looking to add mobility to workloads are looking for. The problem for some time has been that data gets tied up in silos based on the protocol that a controller speaks, rather than the value of the data to the business. Products like this go a long way towards relieving some of the pressure on enterprises by enabling simpler access to more data. Being able to spread it across on-premises and public cloud locations also makes for simpler consumption models and can help businesses leverage the data in a more useful way than was previously possible. Add in the usefulness of something like Cloud Sync in terms of archiving data to public cloud buckets and you’ll start to see that these guys are onto something. I recommend you head over to the SwiftStack site and request a demo. You can read the press release here.

Cloudian Announces HyperFile, Makes Object Better

Cloudian recently announced an addition to their HyperStore appliance. I had the opportunity to be briefed by Jon Toor and thought I’d share the highlights of the announcement here. I’ve had the pleasure of talking to Cloudian at a few Storage Field Day events. If you’re unfamiliar with the HyperStore 4000, you can read my coverage of it here. In short, it’s 840TB of object storage in 4RU with really, really, comprehensive S3 compliance, amongst other things.

 

HyperFile You Say?

HyperFile is the new file front-end controller for the HyperStore appliance. It supports the following features:

  • SMB3 and NFS3;
  • High Availability with active / passive controllers;
  • Non-disruptive failover;
  • POSIX compliance;
  • Active Directory / LDAP authentication;
  • Write Once Read Many (WORM); and
  • Snapshots.

It wouldn’t be a product announcement without a bezel shot. I can’t say whether this is actually what it looks like, but if it does, it’s kind of cool.

[image courtesy of Cloudian]

The appliance itself is 2RU with dual controllers and a shared backplane. The cool thing is that it can be deployed as VMs, making it appealing for service providers looking to set up multiple environments for customers. Supported hypervisors include vSphere 5.1 (or later) and KVM. Replication is handled at the HyperStore level.

Multi-tenancy is supported with dedicated controllers.

[image courtesy of Cloudian]

There’s a global namespace between file and object and it also supports a shared namespace across multiple NAS controllers, meaning you can up your number of controllers to increase bandwidth or replication performance. From a scalability perspective, it supports up to 64 namespaces per controller. One of my favourite features is what Cloudian call “converged access” between file and object, meaning you could use S3 for storing files. It also supports Microsoft Azure, Google Cloud Platform and Amazon S3 formats, opening up some interesting possibilities for file consumption on-premises and in the cloud.

There are two editions available. The Basic HyperFile NAS Controller includes

  • Full protocol support;
  • High-availability;
  • Converged data access; and
  • Data migration.

The Enterprise HyperFile NAS Controller adds

  • Snapshot;
  • WORM; and
  • Geo-distribution with file versioning/locking.

 

Thoughts

I’ve been a fan of Cloudian’s products for some time, and this addition to the HyperStore platform makes them a compelling option for file and object storage in the data centre. With this approach they’re looking to push further into Media Asset Management (MAM) and video surveillance solutions. The title of the post is misleading. Object is already pretty cool, and a very suitable solution for a number of workloads. So why would an object vendor need to add file to work in these industries? Isn’t object ideally suited to these kinds of workloads? Yes, but sometimes the leading software vendors and people in charge of workflows are focused on other things, like only supporting file. So Cloudian have adapted to take a bigger piece of the pie. In much the same way that some data protection solutions are still file oriented, the HyperFile allows Cloudian to play in areas where it’s traditionally been excluded.

I’m also a fan of the appliance as VM approach and I like the breadth of protocol support and cloud integration available. If you’re going to put cloud in the name of your company the expectation will be there that you know what you’re doing. Cloudian haven’t disappointed thus far. If you’re in the market for a solid object (and now file) solution, you could do worse than talking to the folks at Cloudian.

So NooBaa, eh?

Disclaimer: I recently attended VMworld 2016 – US.  My flights were paid for by myself, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

I had the opportunity to speak with NooBaa about six months ago. At the time they were still developing their product, but I thought it looked pretty cool. At Tech Field Day Extra, they demoed their cloud services engine. The company was founded by Yuval Dimnik (Co-founder and CEO) and Guy Margalit (Co-founder and CTO). If you’re familiar with Exanet or Dell FluidFS, you’ll be familiar with some of their capabilities. NooBaa was founded in 2014, with a product launch in September 2016, and a current headcount of 14 (they tell us they have a strong security/storage DNA).

“Customers don’t care how you do your tech, they care how it fixes their problems”

 

So NooBaa, eh?

They have thought about the name. A lot. It’s a pure software product enabling folks to create and provision cloud services

  • Storage (like AWS S3) – First!
  • Serverless compute (like AWS Lambda) – Future

The key is that the customer owns the service, with

  • Full control of who accesses what, and what stays on-premises
  • No cloud vendor lock-in

The services use

  • Heterogeneous resources – cloud resources and servers
  • In the cloud, on-premises, and spanned

So, take all the spare storage you have lying about on Windows and Linux VMs, bang it all in a single namespace and present it back to your object-friendly apps. Replicate it to the cloud if you like. Or use all your spare clouds. Sounds like a cool idea.

Design Considerations (once bitten, twice shy)

They wanted to design a product that behaves like the cloud, but gives you the choice to consume from on-premises or cloud.

But can you predict the unpredictable?

  • Cloud strategy? Everyone has one of those, they’re just not sure what it really means.
  • Growth rate? Oh, it grows a lot.
  • Hardware technologies? Yep, software still needs hardware.
  • Vendors? Who can really work out what they do?
  • Organisational changes?
  • Security issues and lurking “heart bleeds”?

Stuff is hard. Along with this, NooBaa were looking to add the following capabilities

  • On-premises, multi-cloud, and supporting cloud migration
  • P2P scalable capacity
  • Monitor hardware and adapt
  • Agnostic to the machine
  • Allowed to grow, allowed to shrink
  • User space as a religion – when you need to fix that you can do it right away

Architecture

NooBaa is all about a hybrid approach to resources, supporting multiple cloud providers and on-premises resources. It also has support for multiple sites.

tfdx-noobaa-architecture1

The key to NooBaa’s storage performance in what might seem to be non-performant environments is the way it stores data, as you can see in the below diagram.

tfdx-noobaa-architecture2

 

Note that they’re not targeting low-latency workloads. At this stage they’re cloud agnostic and hoping to keep things that way. Heterogeneous resources are key for NooBaa. You can also sign up for the Community Edition – limited to 20TB aggregate object size.

Final Thoughts and Reading

 

The name doesn’t roll off the tongue, and the colour-scheme is very pretty. But I think this belies the thought that’s gone into this product. Yuval and his team have a strong background in scalable object storage, and I’m excited to see them finally come out of stealth. The concept of treating storage nodes as second class citizens is interesting, and I’m looking forward to taking the Community Edition for a spin when I get my act together in the near future. In the meantime, head over to Alastair’s blog for a more succinct write-up on what we saw. John White also did a great post here. You can grab a copy of my raw notes here, and watch NooBaa’s TFDx presentations here.

 

Caringo Announces SwarmNFS

Caringo recently announced SwarmNFS, and I recently had the opportunity to be briefed by Caringo’s Adrian J Herrera (VP Marketing). If you’re not familiar with Caringo, their main platform is Swarm, which “provides a platform for data protection, management, organization and search at massive scale”. You can read an overview of Swarm here, and there’s also a technical overview here.

 

So what is it?

SwarmNFS is a “stateless Linux process that integrates directly with Caringo Swarm. It delivers a global namespace across NFSv4, HTTP, SCSP (Caringo’s protocol), S3, and HDFS, delivering data distribution and data management at scale”.

SwarmNFS is basically an NFS server modified with proprietary code. It is:

  • Stateless and lightweight;
  • Has no caching or spooling;
  • Supports parallel data streaming; and
  • Has no single point of failure, with built-in high availability.

Caringo tell me this makes it a whole lot easier to centralise, distribute and manage data, while using a bunch fewer resources than a traditional file gateway. You can run it as either a Linux process, an appliance or via a VM. Caringo also tell me that, since they connect directly into Swarm, there are fewer bottlenecks than the traditional approach using gateways, FUSE and proxies.

Caringo_001

Everything in the UI can be done via the API as well, and it has support for multi-tenancy. As I mentioned before, there’s a global namespace with “Universal Access”, meaning that files can be written, read and edited through any interface (NFSv4, SCSP/HTTP, S3, HDFS). Having been a protocol prisoner in previous roles it’s nice to think that there’s a different way to do things.

 

What do I use it for?

You can use this for all kinds of stuff. Adrian ran me through some use cases, including:

  • Media and entertainment (think media streaming / content delivery); and
  • Street view type image storage.

One of the key things here is that, because the platform uses NFS, a lot of application re-work doesn’t necessarily need to occur to take advantage of the object storage platform. In my opinion this is a pretty cool feature of the platform, and one that should definitely see people look at SwarmNFS fairly seriously when evaluating their object storage options.

 

Conclusion

Caringo are doing some really cool stuff. If you haven’t checked out FileFly before, it’s also worth a look. The capabilities of the Swarm platform are growing at a rapid pace. And the storage world is becoming more object and less block and file as each day passes. Enrico’s been telling me that for ages now, and everything I’m seeing supports that. Caringo’s approach to metadata – storing metadata with the object itself – also means you can do a bunch of cool stuff with it fairly easily, like replicating it, applying erasure coding to it, and so forth. The upshot is that now the data’s truly portable. So, if you’re object-curious but still hang out with file types, maybe SwarmNFS might be a nice compromise for everyone.

Caringo_002

Exablox Isn’t Just Pretty Hardware

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Before I get started, you can find a link to my raw notes on Exablox’s presentation here. You can also see videos of the presentation here. You can find a preview post from Chris M. Evans here.

 

It’s Not Just the Hardware

I waxed lyrical about the Exablox hardware platform after seeing it at Storage Field Day 7. But while the OneBlox hardware is indeed pretty cool (you can see the specifications here), the cloud-based monitoring platform, OneSystem, is really the interesting bit.

According to Exablox, the “OneSystem application is used to combine OneBlox appliances into Rings as well as configuring shares, user access, and remote replication”. It’s the mechanism used for configuration, as well as monitoring, alerting and reporting.

OneSystem is built on a cloud-based, multi-tenant architecture. There’s nothing to install for organisations, VARs, and MSPs. Although if you feel a bit special about how your data is treated, there is an optional, private OneSystem deployment available for on-premises management. Exablox pride themselves on the “world-class” support they provide to customers, with a customer-first culture being one of the dominant themes when talking to them about support capability. Some of the other benefits of the OneSystem approach are:

  • The ability to globally manage OneBlox anywhere; and
  • Deliver seamless OneBlox software upgrades.

Exablox also provide 24×7 proactive monitoring, providing insight into, amongst other things:

  • Storage utilisation and analysis;
  • Storage health and alerts; and
  • OneBlox drive health.

The cool thing about this platform is that it offers the ability to configure custom storage policies and simple scaling for individual applications. In this manner you can configure the following data services on a “per application” basis:

  • Variable or fixed-length deduplication;
  • Compression on/off;
  • Continuous data protection on/off and retention; and
  • Remote replication on/off.
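If you haven’t played with deduplication before, the fixed-length flavour is simple enough to sketch in a few lines. This is a minimal illustration of the general technique, not Exablox’s implementation; the 4096-byte chunk size and the function names are assumptions of mine.

```python
import hashlib


def dedupe_fixed(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-length chunks and keep each unique chunk once.

    Returns (store, recipe): store maps a SHA-256 digest to chunk bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe_fixed(data)
print(f"{len(recipe)} chunks referenced, {len(store)} chunks stored")
```

Variable-length chunking picks chunk boundaries from the content rather than from fixed offsets, so it copes better when bytes are inserted mid-file, at the cost of more work per write. That trade-off is presumably why it’s handy to be able to choose per application.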

 

I Want My Data Everywhere

While the OneBlox ring is currently limited to 7 systems per cluster, you can have two or more (up to 10) clusters operating in a mesh for replication. You can then conceivably have a whole bunch of different data protection schemes in place depending on what you need to protect and where you need it protected. The great thing is that, with the latest version of OneSystem, you can have a one-to-many replication relationship between directories as well. This kind of flexibility is really neat in my opinion. Note that replication is asynchronous.

SFD10_Exablox_Mutli-siteReplication

 

Further Reading and Final Thoughts

If you’ve read any of my recent posts on the likes of Pure, Nimble and Tintri, it would feel like everyone and their dog is into cloud-based monitoring and analytics systems for storage platforms. This is in no way a bad thing, and something that I’m glad we’re seeing become a prevalent feature with these “modern” storage architectures. We store a whole bunch of data on these things. And sometimes it’s even data that is vital to the success of the various business endeavours we undertake on a daily basis. So it’s great to see vendors are taking this requirement seriously. It also helps somewhat that people are a little more comfortable with the concept of keeping information in “the cloud”. This certainly helps the vendors control the end user experience from a support viewpoint, rather than relying on arcane systems deployed across multiple VMs that invariably fail at the time you need to dig into the data to find out what’s really going on in the environment.

Exablox have come up with a fairly unique approach to scale-out NAS, and I’m keen to see where they take it from here. Features such as remote replication and the continuing maturity of the OneSystem platform make me think that they’re gearing up to push things a little beyond the BYO drives SMB space. I’ll be interested to see just how that plays out.

Ray Lucchesi did a thorough write-up on Exablox that you can read here, while Francesco Bonetti did a great write-up here. Exablox has also published a technical overview of OneBlox and OneSystem that is worth checking out.

 

SwiftStack Announces Object Storage Version 4.0

If you’ve not heard of SwiftStack before, they do “object storage for the enterprise”, with the core product built on OpenStack Swift. I recently had the opportunity to be briefed by Mario Blandini on their 4.0 announcement. Mario describes them as “Like Amazon cloud but inside your DC and behind your firewall”.

New SwiftStack 4.0 innovations introduced today (and available now or in the next 90 days) include:

  • Integrated load balancing reducing the need for expensive dedicated network hardware and minimizing latency and bandwidth costs while scaling to larger numbers of storage nodes
  • Metadata search increases business value with integrated third-party indexing and search services to make stored object data analytics-ready
  • SwiftStack Drive is an optional desktop client that enables access to objects directly from desktops or laptops
  • Enhanced management with new IPv6 support, capacity planning and advanced data migration tools

Swift00

One of the key points in this announcement is the metadata search capability. Object storage is not just about “cheap and deep”, and the way we use metadata can have a big impact on the value of the data, often to applications that didn’t necessarily generate the data in the first place.
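As a rough illustration of why metadata matters, here's a small sketch. OpenStack Swift carries custom object metadata as `X-Object-Meta-<name>` headers; everything else below is an assumption on my part. The in-memory index is a toy stand-in for whatever third-party indexing service you plug in, and the object names and metadata keys are made up.

```python
SWIFT_META_PREFIX = "X-Object-Meta-"


def to_swift_headers(metadata):
    """Turn a plain dict into Swift custom-metadata headers."""
    return {SWIFT_META_PREFIX + key: str(value) for key, value in metadata.items()}


def search(index, **criteria):
    """Return the names of objects whose metadata matches every criterion."""
    return sorted(
        name
        for name, meta in index.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


# Toy index: object name -> metadata (hypothetical objects and keys).
index = {
    "scans/0001.tif": {"project": "alpha", "type": "scan"},
    "scans/0002.tif": {"project": "beta", "type": "scan"},
    "logs/app.log": {"project": "alpha", "type": "log"},
}

print(to_swift_headers(index["scans/0001.tif"]))
print(search(index, project="alpha"))
```

An application that never wrote these objects can still find everything tagged with a given project, which is the "value to other applications" point in a nutshell.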

Like all good scale out solutions, you don’t need to buy everything up front, just what you need to get started. SwiftStack aren’t in the hardware business though, so you’ll be rolling your own. The hardware requirements for SwiftStack are here, and there’s also a reference architecture for Cisco.

 

Futures

SwiftStack have plans to introduce “Swift File Access” in 2016.

Swift00_File

Some of the benefits of this include:

  • Scale-out file services; SMB and NFS – minimizes the need for gateways
  • Fully bimodal > files can come in over SMB and be accessed through object APIs, and vice versa
  • Integrated into the proxy role > performance scales independently of capacity

SwiftStack also have plans to introduce “Object Synchronization” in 2016.

Swift00_ObjectSync

This will provide S3 Synchronization capability, including:

  • Replication of objects to S3 buckets
  • Policy-driven > protecting and accessing files using centralized policies
  • Supporting any cloud compatible with the S3 API

This is pretty cool as there’s a lot of momentum within enterprises to consume data in places where it’s needed, not necessarily where it’s created.

 

Final Thoughts

Object storage is hot, because folks love cloud, and object is a big part of that. I like what object can do for storage, particularly as it relates to metadata and scale out performance. I’m happy to see SwiftStack making a decent play inside the enterprise, rather than aiming to be just another public cloud storage provider. I think they’re worth checking out, particularly if you have data that could benefit from object storage without necessarily having to live in the public cloud.

Storage Field Day 7 – Day 3 – Cloudian

Disclaimer: I recently attended Storage Field Day 7.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

For each of the presentations I attended at SFD7, there are a few things I want to include in the post. Firstly, you can see video footage of the Cloudian presentation here. You can also download my raw notes from the presentation here. Finally, here’s a link to the Cloudian website that covers some of what they presented.

 

Overview

Michael Tso, CEO and co-founder of Cloudian, provided us with a brief overview of the company. It was founded about 4 years ago, and many of the staff come from a background in hyper-scale messaging systems for big telcos. They now have about 65 staff.

Cloudian offers a software version as well as a hardware appliance that runs their HyperStore software. The hardware appliance comes in 3 different flavours:

  • Entry Level;
  • Capacity Optimised; and
  • Performance Optimised.

The software is supported on RedHat and CentOS.

 

Architecture

Paul Turner, Chief Marketing and Product Officer, gave us an introduction to the architecture behind Cloudian. Their focus is on using commodity servers to deliver a platform that scales out, is durable, and is simple to use. “If you don’t make it dead easy to add nodes or remove nodes on the fly you don’t have a good platform”.

The platform uses:

  • Erasure Coding;
  • Replication; and
  • Compression
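The trade-off between replication and erasure coding is mostly arithmetic, so here's a quick sketch of the raw capacity each one needs. The specific schemes (3 copies, and 4 data plus 2 parity fragments) are my own illustrative assumptions, not figures from the presentation.

```python
def replication_raw_tb(usable_tb, copies):
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_tb * copies


def erasure_coding_raw_tb(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed for a k+m scheme: overhead factor is (k + m) / k."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


usable = 100  # TB of usable data (assumed)

# 3 copies tolerates the loss of 2 copies.
print("3x replication:", replication_raw_tb(usable, 3), "TB raw")

# 4+2 tolerates the loss of any 2 fragments.
print("4+2 erasure coding:", erasure_coding_raw_tb(usable, 4, 2), "TB raw")
```

Under these assumptions both schemes survive two failures, but erasure coding needs half the raw capacity. The usual price is more CPU and network work on reads, writes and rebuilds, which is why it's handy to be able to choose per policy.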

Here’s a picture of what’s inside:

SFD7_Day3_Cloudian_What's_Inside

Features include:

  • Natively S3;
  • Hybrid Storage Cloud;
  • Extreme durability;
  • Multi-tenant;
  • Geo-distribution;
  • Scale out;
  • Intelligence in Software;
  • Smart Support;
  • Data Protection;
  • QoS;
  • Programmable; and
  • Billing and Reporting.

They also make use of an Adaptive Policy Engine (multi-tenant, continuous, adaptive, policy engine), which offers:

  • Policy controlled virtual storage pools (buckets like Amazon);
  • Scale / reduce storage on demand;
  • Multi-tenanted with many application tenants on same infrastructure;
  • Dynamically adjust protection policies;
  • Optimise for small objects by policy; and
  • Cloud archiving by virtual pool.

 

Here’s a diagram of the logical architecture.

SFD7_Day3_Cloudian_Architecture

They use Cassandra as the core metadata and distribution mechanism. Why Cassandra? Well it’s

Scalable

  • Supports 1000s of nodes
  • Adds capacity by adding nodes to running system
  • Distributed shared-nothing P2P architecture, with no single point of failure

Reliable

  • Data durability, synced to disk
  • Resilient to network or hardware failures
  • Multi-DC replication
  • Tuneable data consistency level

Provides Features such as

  • Vnodes, TTL, secondary indexes, compression, encryption

Performant

  • Write path especially fast
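The tuneable consistency level mentioned above comes down to a simple rule in Cassandra-style systems: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written is greater than the replication factor. The sketch below only shows that arithmetic; the function names are mine, and it says nothing about how Cloudian actually configures consistency.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor, read_replicas, write_replicas):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print("quorum for RF=3:", q)
print("QUORUM reads + QUORUM writes:", is_strongly_consistent(rf, q, q))
print("ONE read + ONE write:", is_strongly_consistent(rf, 1, 1))
```

Reading and writing at quorum gives overlapping replica sets, while reading and writing a single replica trades that guarantee for lower latency.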

Multiple data protection policies, including:

  • NoSQL DB, Replicas, Erasure Coding

Policy features

  • ACL, QoS, Tiering, versioning, etc.

vnodes

  • Virtual nodes (vnodes) are mapped to physical disks, so a single disk failure only affects the vnodes on that disk;
  • Maximum of 256 vnodes per physical node, with no manual token management as tokens are randomly assigned;
  • Parallel I/O across nodes;
  • Increased repair speed in case of disk or node failure; and
  • Allows heterogeneous machines in a cluster.
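If vnodes are new to you, here's a minimal sketch of the general technique: consistent hashing where each physical node takes many randomly assigned tokens on a ring. This is a generic illustration, not Cloudian's or Cassandra's implementation. The node names, the 64-bit token space, the fixed seed and the use of SHA-256 as the key hash are all assumptions I've made to keep it self-contained.

```python
import bisect
import hashlib
import random

TOKEN_SPACE = 2 ** 64   # assumed size of the token ring
VNODES_PER_NODE = 256   # the per-node maximum quoted above


def build_ring(nodes, vnodes_per_node=VNODES_PER_NODE, seed=0):
    """Give every physical node a set of random tokens and sort them into a ring."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(TOKEN_SPACE), node)
        for node in nodes
        for _ in range(vnodes_per_node)
    )
    tokens = [token for token, _ in ring]
    owners = [node for _, node in ring]
    return tokens, owners


def owner_of(tokens, owners, key):
    """Hash the key onto the ring and walk clockwise to the next token."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    idx = bisect.bisect_left(tokens, token)
    if idx == len(tokens):
        idx = 0  # wrap around the ring
    return owners[idx]


nodes = ["node-a", "node-b", "node-c"]
tokens, owners = build_ring(nodes)

counts = {node: 0 for node in nodes}
for i in range(3000):
    counts[owner_of(tokens, owners, f"object-{i}")] += 1
print(counts)
```

Because each physical node owns lots of small, scattered ranges, losing a disk or a node spreads the repair work across the rest of the cluster rather than hammering one neighbour. You could also give a bigger machine more vnodes than a smaller one, which is how heterogeneous hardware gets a proportionate share.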

 

Further Reading and Final Thoughts

If you’re doing a bit with cloud storage, I think these guys are worth checking out. I particularly like the use case for Cloudian deployed as an on-premises S3 cloud behind the firewall. There’s also a Community Edition available for download. You can use HyperStore Community Edition software for:

  • Product evaluation;
  • Testing HyperStore software features in a single or multi-node install; and
  • Building 10TB object storage systems free of charge.

I think that’s pretty neat. I also recommend checking out Keith’s preview of Cloudian.