Intel Optane And The DAOS Storage Engine

Disclaimer: I recently attended Storage Field Day 20.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Intel recently presented at Storage Field Day 20. You can see videos of the presentation here, and download my rough notes from here.

 

Intel Optane Persistent Memory

If you’re a diskslinger, you’ve very likely heard of Intel Optane. You may have even heard of Intel Optane Persistent Memory. It’s a little different to Optane SSD, and Intel describes it as “memory technology that delivers a unique combination of affordable large capacity and support for data persistence”. It looks a lot like DRAM, but the capacity is greater, and there’s data persistence across power losses. This all sounds pretty cool, but isn’t it just another form factor for fast storage? Sort of, but the application of the engineering behind the product is where I think it starts to get really interesting.

 

Enter DAOS

Distributed Asynchronous Object Storage (DAOS) is described by Intel as “an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications”. It's essentially a software stack built from the ground up to take advantage of the crazy speeds you can achieve with Optane, and at scale. There's a handy overview of the architecture available on Intel's website. Traditional object (and other) storage systems haven't really been built to take advantage of Optane in quite the same way DAOS has.

[image courtesy of Intel]

There are some cool features built into DAOS, including:

  • Ultra-fine grained, low-latency, and true zero-copy I/O
  • Advanced data placement to account for fault domains
  • Software-managed redundancy supporting both replication and erasure code with online rebuild
  • End-to-end (E2E) data integrity
  • Scalable distributed transactions with guaranteed data consistency and automated recovery
  • Dataset snapshot capability
  • Security framework to manage access control to storage pools
  • Software-defined storage management to provision, configure, modify, and monitor storage pools

Exciting? Sure is. There's also integration with Lustre. The best thing about this is that you can grab it from GitHub under the Apache 2.0 license.

 

Thoughts And Further Reading

Object storage is in its relative infancy when compared to some of the storage architectures out there. It was designed to be highly scalable and generally does a good job of cheap and deep storage at “web scale”. It’s my opinion that object storage becomes even more interesting as a storage solution when you put a whole bunch of really fast storage media behind it. I’ve seen some media companies do this with great success, and there are a few of the bigger vendors out there starting to push the All-Flash object story. Even then, though, many of the more popular object storage systems aren’t necessarily optimised for products like Intel Optane PMEM. This is what makes DAOS so interesting – the ability for the storage to fundamentally do what it needs to do at massive scale, and have it go as fast as the media will let it go. You don’t need to worry as much about the storage architecture being optimised for the storage it will sit on, because the folks developing it have access to the team that developed the hardware.

The other thing I really like about this project is that it’s open source. This tells me that Intel are both focused on Optane being successful, and also focused on the industry making the most of the hardware it’s putting out there. It’s a smart move – come up with some super fast media, and then give the market as much help as possible to squeeze the most out of it.

You can grab the admin guide from here, and check out the roadmap here. Intel has plans to release a new version every 6 months, and I’m really looking forward to seeing this thing gain traction. For another perspective on DAOS and Intel Optane, check out David Chapa’s article here.

 

 

Datadobi Announces S3 Migration Capability

Datadobi recently announced S3 migration capabilities as part of DobiMigrate 5.9. I had the opportunity to speak to Carl D’Halluin and Michael Jack about the announcement and thought I’d share some thoughts on it here.

 

What Is It?

In short, you can now use DobiMigrate to perform S3 to S3 object storage migrations. It's flexible too, offering the ability to migrate data from a variety of on-premises object systems up to public cloud object storage, between on-premises systems, or back to on-premises from public cloud storage. There's support for a variety of S3 systems.

In the future Datadobi is looking to add support for AWS Glacier, object locks, object tags, and non-current object versions.

 

Why Would You?

There are quite a few reasons why you might want to move S3 data around. You could be seeing high egress charges from AWS because you’re accessing more data in S3 than you’d initially anticipated. You might be looking to move to the cloud and have a significant on-premises footprint that needs to go. Or you might be looking to replace your on-premises solution with a solution from another vendor.

 

How Would You?

The process used to migrate objects is fairly straightforward, and follows a pattern that will be familiar if you've done anything with any kind of storage migration tool before. In short, you set up a migration pair (source and destination), run a scan and first copy, then do some incremental copies. Once you've got a maintenance window, there's a cutover where the final scan and copy is done. And then you're good to go. Basically.

[image courtesy of Datadobi]
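To make the pattern above a little more concrete, here's a very rough sketch of a scan / copy / incremental loop using boto3 between two S3 endpoints. To be clear, this is not how DobiMigrate works internally; the endpoints, credentials, and bucket names are placeholders, and a real migration tool layers integrity checking, retries, reporting, and cutover orchestration on top of this.

```python
# Illustrative only: the scan / copy / incremental pattern, not DobiMigrate itself.
import boto3
from botocore.exceptions import ClientError

src = boto3.client("s3", endpoint_url="https://onprem-object.example.com")  # hypothetical source system
dst = boto3.client("s3")                                                    # e.g. AWS S3 as the destination

SRC_BUCKET, DST_BUCKET = "prod-data", "prod-data-migrated"

def copy_pass():
    """One scan-and-copy pass: copy objects that are missing or stale on the destination."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            try:
                # ETag comparison is a simplification; multipart uploads complicate this in practice.
                if dst.head_object(Bucket=DST_BUCKET, Key=key)["ETag"] == obj["ETag"]:
                    continue
            except ClientError:
                pass  # Object not on the destination yet.
            body = src.get_object(Bucket=SRC_BUCKET, Key=key)["Body"].read()
            dst.put_object(Bucket=DST_BUCKET, Key=key, Body=body)

copy_pass()  # initial scan and first copy
copy_pass()  # incremental copy; the final pass happens during the cutover window
```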

 

Final Thoughts

Why am I so interested in these types of offerings? Part of it is that it reminds me of all of the time I burnt through earlier in my career migrating data from various storage platforms to other storage platforms. One of the funny things about storage is that there's rarely enough to service demand, and it rarely delivers the performance you need after it's been in use for a few years. As such, there's always some requirement to move data from one spot to another, and to keep that data intact in terms of its permissions and metadata.

Amazon’s S3 offering has been amazing in terms of bringing object storage to the front of mind of many storage consumers who had previously only used block or file storage. Some of those users are now discovering that, while S3 is great, it can be expensive if you haven’t accounted for egress costs, or you’ve started using a whole lot more of it than initially anticipated. Some companies simply have to take their lumps, as everything is done in public cloud. But for those organisations with some on-premises footprint, the idea of being able to do performance oriented object storage in their own data centre holds a great deal of appeal. But how do you get it back on-premises in a reliable fashion? I believe that’s where Datadobi’s solution really shines.

I’m a fan of software that makes life easier for storage folk. Platform migrations can be a real pain to deal with, and are often riddled with risky propositions and daunting timeframes. Datadobi can’t necessarily change the laws of physics in a way that will keep your project manager happy, but it can do some stuff that means you won’t be quite as broken after a storage migration as you might have been previously. They already had a good story when it came to file storage migration, and the object to object story enhances it. Worth checking out.

MinIO – Not Your Father’s Object Storage Platform

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

MinIO recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.

 

MinIO – A Sharp Looking 3-Piece Suite

AB Periasamy spoke to the delegates first, describing MinIO as “a high performance, software-defined, distributed object storage server, designed for peta-scale data infrastructure”. It was built from scratch with the private cloud as its target and comprises three components:

  • MinIO Server
  • MinIO Client
  • MinIO SDK

He noted that “the private cloud is a very different beast to the public cloud”.
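Of those three pieces, the server is the thing you deploy, and the client and SDK are what developers and operators touch day to day. As a rough illustration (and nothing more), here's what talking to a MinIO server looks like with the Python SDK; the endpoint, credentials, and bucket below are placeholders rather than anything from the presentation.

```python
# A minimal sketch using the MinIO Python SDK ("pip install minio").
# Endpoint, credentials, and bucket names are hypothetical.
from minio import Minio

client = Minio(
    "minio.example.internal:9000",  # placeholder on-premises MinIO server
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=False,
)

if not client.bucket_exists("telemetry"):
    client.make_bucket("telemetry")

# Upload a local file as an object, then list what landed in the bucket.
client.fput_object("telemetry", "2020/01/run-001.json", "run-001.json")
for obj in client.list_objects("telemetry", prefix="2020/", recursive=True):
    print(obj.object_name, obj.size)
```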

Why Object?

The MinIO founders felt strongly that data would continue to grow, S3 would overtake POSIX, and that the bulk of data would exist outside of AWS. It's Periasamy's opinion that the private cloud finally started emerging as a real platform last year.

 

Architecture

A number of guiding principles were adopted by MinIO when designing the platform. MinIO is:

  • Focused on performance. They believe it is the fastest object store in existence;
  • Cloud native. It is the most K8s-friendly solution available for the private cloud;
  • 100% open source. This enables an increasingly dominant position in the enterprise;
  • Built for scale using the same philosophy as web scalers; and
  • Designed for simplicity. Simplicity scales – across clients, clouds, and machines.

 

[image courtesy of MinIO]

 

Other Features

Some of the other key features MinIO is known for include:

  • Scalability;
  • Support for erasure coding;
  • Identity and Access Management capability;
  • Encryption; and
  • Lifecycle Management.

MinIO is written in Go and is 100% open source. “The idea of holding customers hostage with a license key – those days are over”.

 

Deployment Use Cases

MinIO delivers usable object storage capability in all of the places you would expect it to.

  • Big Data / Machine Learning environments
  • HDFS replacements
  • High performance data lake / warehouse infrastructure
  • Cloud native applications (replacing file and block)
  • Multi-cloud environments (portability)
  • Endpoint for streaming workloads

 

Thoughts and Further Reading

If you watch the MinIO presentation, or check my notes, you’ll see a lot of slides with some impressive numbers in terms of both performance and market penetration. MinIO is not your standard object storage stack. A number of really quite big customers use it internally to service their object storage requirements. And, because it’s open source, a whole lot of people are really curious about how it all works, and have taken it for a spin at some stage or another. The story here isn’t that MinIO is architecturally a bit different from some other vendors’ storage offerings. Rather, it’s the fact that it’s open source and accessible to every punter who wants to grab it. This is exactly the reason why neckbeards get excited about open source products. Because you can take a core infrastructure function, and build a product that does something extremely useful from it. And you can contribute back to the community.

The big question, though, is how to make money from this kind of operating model. A well-known software company made a pretty decent stab at leveraging open source products as a business model, delivering enhanced support services as a way to keep the cash coming in. This is very much what MinIO is doing as well. It has a number of very big customers willing to pay for an enhanced support experience via a subscription. It's an interesting idea. Come up with a product that does what it says it will quite well. Make it easy to get hold of. Help big companies adopt it at scale. Then keep them up and running when said open source code becomes a mission critical piece of their business workflow. I want this model to work, I really do. And I have no evidence to say that it won't. The folks at MinIO were pretty confident about what they could deliver with SUBNET in terms of the return on investment. I'm optimistic that MinIO will be around for a while longer, as the product looks the goods, and the people behind the product have spent some time thinking through what this will look like in the future. I also recommend checking out Chin-Fah's recent article for another perspective.

SwiftStack Announces 7

SwiftStack recently announced version 7 of its solution. I had the opportunity to speak to Joe Arnold and Erik Pounds from SwiftStack about the announcement and thought I’d share some thoughts here.

 

Insane Data Requirements

We spoke briefly about just how insane modern data requirements are becoming, in terms of both volume and performance. The example offered up was that of an Advanced Driver-Assistance System (ADAS). These things need a lot of capacity to work, with training data starting at 15PB and performance requirements approaching 100GB/s.

  • Autonomy – Level 2+
  • 10 Deep neural networks needed
  • Survey car – 2MP cameras
  • 2PB per year per car
  • 100 NVIDIA DGX-1 servers per car

When your hot data is 15-30PB and growing, it's a problem.

 

What’s New In 7?

SwiftStack has been working to address those kinds of challenges with version 7.

Ultra-scale Performance Architecture

SwiftStack has managed to get some pretty decent numbers under its belt, delivering over 100GB/s at scale with a platform that’s designed to scale linearly to higher levels. The numbers stack up well against some of their competitors, and have been validated through:

  • Independent testing;
  • Comparing similar hardware and workloads; and
  • Results being posted publicly (with solutions based on Cisco Validated Designs).

 

ProxyFS Edge

ProxyFS Edge takes advantage of SwiftStack’s file services to deliver distributed file services between edge, core, and cloud. The idea is that you can use it for “high-throughput, data-intensive use cases”.

[image courtesy of SwiftStack]

Enabling functionality:

  • Containerised deployment of ProxyFS agent for orchestrated elasticity
  • Clustered filesystem enables scale-out capabilities
  • Caching at the edge, minimising latency for improved application performance
  • Load-balanced, high-throughput API-based communication to the core

 

1space File Connector

But what if you have a bunch of unstructured data sitting in file environments that you want to use with your more modern apps? 1space File Connector brings enterprise file data into the cloud namespace, and “[g]ives modern, cloud-native applications access to existing data without migration”. The thinking is that you can modernise your workflows incrementally, rather than having to deal with the app and the storage all in one go.

[image courtesy of SwiftStack]

Enabling functionality:

  • Containerised deployment of the 1space File Connector for orchestrated elasticity
  • File data is accessible using S3 or Swift object APIs
  • Scales out and is load balanced for high-throughput
  • 1space policies can be applied to file data when migration is desired
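To give a feel for what that looks like from the application side, here's a hypothetical sketch of a cloud-native app listing and reading existing file data through the S3 API once it's been exposed via the File Connector. The endpoint, credentials, and bucket/share names are invented for the example; they're not SwiftStack's actual configuration.

```python
# Hypothetical: existing file data surfaced through the S3 API, no migration required.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swiftstack.example.internal",  # placeholder cluster endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Files that used to live on an SMB/NFS share show up as objects.
resp = s3.list_objects_v2(Bucket="legacy-fileshare", Prefix="projects/2019/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Read one of them in place, without any copy having taken place.
report = s3.get_object(Bucket="legacy-fileshare", Key="projects/2019/summary.csv")
print(report["Body"].read().decode("utf-8"))
```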

The SwiftStack AI Architecture

SwiftStack has also developed a comprehensive AI Architecture model, describing it as “the customer-proven stack that enables deep learning at ultra-scale”. You can read more on that here.

Ultra-Scale Performance

  • Shared-nothing distributed architecture
  • Keep GPU compute complexes busy

Elasticity from Edge-to-Core-to-Cloud

  • With 1space, ingest and access data anywhere
  • Eliminate data silos and move beyond one cloud

Data Immutability

  • Data can be retained and referenced indefinitely as it was originally written
  • Enabling traceability, accountability, confidence, and safety throughout the life of a DNN

Optimal TCO

  • Compelling savings compared to public cloud or all-flash arrays

Real-World Confidence

  • Notable AI deployments for autonomous vehicle development

SwiftStack PRO

The final piece is the SwiftStack PRO offering, a support service delivering:

  • 24×7 remote management and monitoring of your SwiftStack production cluster(s);
  • Incorporating operational best-practices learned from 100s of large-scale production clusters;
  • Including advanced monitoring software suite for log aggregation, indexing, and analysis; and
  • Operations integration with your internal team to ensure end-to-end management of your environment.

 

Thoughts And Further Reading

The sheer scale of data enterprises are working with every day is pretty amazing. And data is coming from previously unexpected places as well. The traditional enterprise workloads hosted on NAS or in structured applications are insignificant in size when compared to the PB-scale stuff going on in some environments. So how on earth do we start to derive value from these enormous data sets? I think the key is to understand that data is sometimes going to be in places that we don’t expect, and that we sometimes have to work around that constraint. In this case, SwiftStack has recognised that not all data is going to be sitting in the core, or the cloud, and it’s using some interesting technology to get that data where you need it to get the most value from it.

Getting the data from the edge to somewhere useable (or making it useable at the edge) is one thing, but the ability to use unstructured data sitting in file with modern applications is also pretty cool. There’s often reticence associated with making wholesale changes to data sources, and this solution helps to make that transition a little easier. And it gives the punters an opportunity to address data challenges in places that may have been inaccessible in the past.

SwiftStack has good pedigree in delivering modern scale-out storage solutions, and it's done a lot of work to ensure that its platform adds value. Worth checking out.

Backblaze Has A (Pod) Birthday, Does Some Cool Stuff With B2

Backblaze has been on my mind a lot lately. And not just because of their recent expansion into Europe. The Storage Pod recently turned ten years old, and I was lucky enough to have the chance to chat with Yev Pusin and Andy Klein about that news and some of the stuff they’re doing with B2, Tiger Technology, and Veeam.

 

10 Years Is A Long Time

The Backblaze Storage Pod (currently version 6) recently turned 10 years old. That's a long time for something to be around (and successful) in a market like cloud storage. I asked Yev and Andy about where they saw the pod heading, and whether they thought there was room for Flash in the picture. Andy pointed out that, with around 900PB under management, Flash still didn't look like the most economical medium for this kind of storage task. That said, they have seen the main HDD manufacturers starting to hit a wall in terms of the capacity per drive that they can deliver. Nonetheless, the challenge isn't just performance, it's also the fact that people are needing more and more capacity to store their stuff. And it doesn't look like they can produce enough Flash to cope with that increase in requirements at this stage.

Version 7.0

We spoke briefly about what Pod 7.0 would look like, and it’s going to be a “little bit faster”, with the following enhancements planned:

  • Updating the motherboard
  • Upgrade the CPU and consider using an AMD CPU
  • Updating the power supply units, perhaps moving to one unit
  • Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
  • Upgrading the SATA cards
  • Modifying the tool-less lid design

They’re looking to roll this out in 2020 some time.

 

Tiger Style?

So what’s all this about Veeam, Tiger Bridge, and Backblaze B2? Historically, if you’ve been using Veeam from the cheap seats, it’s been difficult to effectively leverage object storage to use as a repository for longer term data storage. Backblaze and Tiger Technology have gotten together to develop an integration that allows you to use B2 storage to copy your Veeam protection data to the Backblaze cloud. There’s a nice overview of the solution that you can read here, and you can read some more comprehensive instructions here.

 

Thoughts and Further Reading

I keep banging on about it, but ten years feels like a long time to be hanging around in tech. I haven’t managed to stay with one employer longer than 7 years (maybe I’m flighty?). Along with the durability of the solution, the fact that Backblaze made the design open source, and inspired a bunch of companies to do something similar, is a great story. It’s stuff like this that I find inspiring. It’s not always about selling black boxes to people. Sometimes it’s good to be a little transparent about what you’re doing, and relying on a great product, competitive pricing, and strong support to keep customers happy. Backblaze have certainly done that on the consumer side of things, and the team assures me that they’re experiencing success with the B2 offering and their business-oriented data protection solution as well.

The Veeam integration is an interesting one. While B2 is an object storage play, it’s not S3-compliant, so they can’t easily leverage a lot of the built-in options delivered by the bigger data protection vendors. What you will see, though, is that they’re super responsive when it comes to making integrations available across things like NAS devices, and stuff like this. If I get some time in the next month, I’ll look at setting this up in the lab and running through the process.

I’m not going to wax lyrical about how Backblaze is democratising data access for everyone, as they’re in business to make money. But they’re certainly delivering a range of products that is enabling a variety of customers to make good use of technology that has potentially been unavailable (in a simple to consume format) previously. And that’s a great thing. I glossed over the news when it was announced last year, but the “Rebel Alliance” formed between Backblaze, Packet and ServerCentral is pretty interesting, particularly if you’re looking for a more cost-effective solution for compute and object storage that isn’t reliant on hyperscalers. I’m looking forward to hearing about what Backblaze come up with in the future, and I recommend checking them out if you haven’t previously. You can read Ken‘s take over at Gestalt IT here.

Western Digital – The A Is For Active, The S Is For Scale

Disclaimer: I recently attended Storage Field Day 15.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

   

Western Digital recently presented at Storage Field Day 15. You might recall there are a few different brands under the WD umbrella, including Tegile and HGST, and folks from both presented during Storage Field Day 15. I'd like to talk about the ActiveScale session, however, mainly because I'm interested in object solutions. I've written about Tegile previously, although obviously a fair bit has changed for them too. You can see their videos from Storage Field Day 15 here, and download a PDF copy of my rough notes from here.

 

ActiveScale, Probably Not What You Thought It Was

ActiveScale isn’t some kind of weight measurement tool for exercise fanatics, but rather the brand of scalable object system that HGST sells. It comes in two flavours: the P100 and X100. Apparently the letters in product names sometimes do mean things, with the “P” standing for Petabyte, and the “X” for Exabyte (possibly in the same way that X stands for Excellent). From a speeds and feeds perspective, the typical specs are as follows:

  • P100 – starts as low as 720TB, goes to 18PB. 17x 9s data durability, 4.6KVA typical power consumption; and
  • X100 – 5.4PB in a rack, 840TB – 52PB, 17x 9s data durability, 6.5KVA typical power consumption.

You can scale out to 9 expansion racks, with 52PB of scale out object storage goodness per namespace. Some of the key capabilities of the ActiveScale platform include:

  • Archive and Backup;
  • Active Data for Analytics;
  • Data Forever Architecture;
  • Versioning;
  • Encryption;
  • Replication;
  • Single Pane Management;
  • S3 Compatible APIs;
  • Multi-Geo Availability Zones; and
  • Scale Up and Scale Out.

They use “BitSpread” for dynamic data placement and you can read a little about their erasure coding mechanism here. “BitDynamics” assures continuous data integrity, offering the following features:

  • Background – verification process always running
  • Performance – not impacted by verification or repair
  • Automatic – all repairs happen with no intervention

There’s also a feature called “GeoSpread” for geographical availability.

  • Single – Distributed erasure coded copy;
  • Available – Can sustain the loss of an entire site; and
  • Efficient – Better than 2 or 3 copy replication (a rough comparison follows below).
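To put a number on that efficiency claim, here's a quick back-of-the-envelope comparison of the raw capacity needed for N-copy replication versus a k+m erasure code. The 10+6 scheme is an arbitrary example for illustration, not ActiveScale's actual layout.

```python
# Back-of-the-envelope: raw storage needed to protect 1PB of user data.
def replication_overhead(copies: int) -> float:
    return float(copies)

def erasure_overhead(data_chunks: int, parity_chunks: int) -> float:
    return (data_chunks + parity_chunks) / data_chunks

usable_pb = 1.0

print(f"3-copy replication : {usable_pb * replication_overhead(3):.2f}PB raw, survives 2 lost copies")
print(f"10+6 erasure coding: {usable_pb * erasure_overhead(10, 6):.2f}PB raw, survives 6 lost chunks")
# -> 3.00PB versus 1.60PB raw for the same protected capacity.
```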

 

What Do I Use It For Again?

Like a number of other object storage systems in the market, ActiveScale is being positioned as a very suitable platform for:

  • Media & Entertainment
    • Media Archive
    • Tape replacement and augmentation
    • Transcoding
    • Playout
  • Life Sciences
    • Bio imaging
    • Genomic Sequencing
  • Analytics

 

Thoughts And Further Reading

Unlike a lot of people, I find technical sessions discussing object storage at extremely large scale to be really interesting. It's weird, I know, but there's something that I really like about the idea of petabytes of storage servicing media and entertainment workloads. Maybe it's because I don't frequently come across these types of platforms in my day job. If I'm lucky I get to talk to folks about using object as a scalable archive platform. Occasionally I'll bump into someone doing life sciences work in a higher education setting, but they've invariably built something that's a little more home-brew than HGST's offering. Every now and then I'm lucky enough to spend some time with media types who regale me with tales of things that go terribly wrong when the wrong bit of storage infrastructure is put in the path of a particular editing workflow or transcode process. Oh how we laugh. I can certainly see these types of scalable platforms being a good fit for archive and tape replacement. I'm not entirely convinced they make for a great transcode or playout platform, but I'm relatively naive when it comes to those kinds of workloads. If there are folks reading this who are familiar with that kind of stuff, I'd love to have a chat.

But enough with my fascination with the media and entertainment industry's infrastructure requirements. From what I've seen of ActiveScale, it looks to be a solid platform with a lot of very useful features. Coupled with the cloud management feature, it seems like they're worth a look. Western Digital aren't just making hard drives for your NAS (and other devices); they're doing a whole lot more, and a lot of it is really cool. You can read El Reg's article on the X100 here.

SwiftStack 6.0 – Universal Access And More

I haven’t covered SwiftStack in a little while, and they’ve been doing some pretty interesting stuff. They made some announcements recently but a number of scheduling “challenges” and some hectic day job commitments prevented me from speaking to them until just recently. In the end I was lucky enough to snaffle 30 minutes with Mario Blandini and he kindly took me through the latest news.

 

6.0 Then, So What?

Universal Access

Universal Access is really very cool. Think of it as a way to write data in either file or object format, and then read it back in file or object format, depending on how you need to consume it.

[image courtesy of SwiftStack]

Key features include:

  • Gateway free – the data is stored in cloud-native format in a single namespace;
  • Accessible via file (SMB3 / NFS4) and / or object API (S3 / Swift). Note that this is not a replacement for NAS, but it will give you the ability to work with some of those applications that expect to see file in places; and
  • Applications can write data one way, access the data another way, and vice versa.

The great thing is that, according to SwiftStack, “Universal Access enables applications to take advantage of all data under management, no matter how it was written or where it is stored, without the need to refactor applications”.
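As a purely hypothetical illustration of what that looks like in practice, the sketch below writes data through the object API and then reads the same bytes back through a file mount of the namespace. The endpoint, credentials, bucket, and mount point are all made up for the example.

```python
# Hypothetical: write via the object API, read the same data back via a file mount.
from pathlib import Path
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swiftstack.example.internal",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# An application writes an object into the namespace...
s3.put_object(Bucket="shared-ns", Key="ingest/frame-0001.raw", Body=b"...pixels...")

# ...and a file-based tool reads the same data from an NFS/SMB mount of that
# namespace, with no gateway or copy in between.
mounted = Path("/mnt/shared-ns/ingest/frame-0001.raw")
print(mounted.read_bytes()[:16])
```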

 

Universal Access Multi-Cloud

So what if you take two really neat features like, say, Cloud Sync and Universal Access, and combine them? You get access to a single, multi-cloud storage namespace.

[image courtesy of SwiftStack]

 

Thoughts

As Mario took me through the announcements, he mentioned that SwiftStack are "not just an object storage thing based on Swift" and I thought that was spot on. Universal Access (particularly with multi-cloud) is just the type of solution that enterprises wanting to add mobility to workloads are looking for. The problem for some time has been that data gets tied up in silos based on the protocol that a controller speaks, rather than the value of the data to the business. Products like this go a long way towards relieving some of the pressure on enterprises by enabling simpler access to more data. Being able to spread it across on-premises and public cloud locations also makes for simpler consumption models and can help businesses leverage the data in a more useful way than was previously possible. Add in the usefulness of something like Cloud Sync in terms of archiving data to public cloud buckets and you'll start to see that these guys are onto something. I recommend you head over to the SwiftStack site and request a demo. You can read the press release here.

Cloudian Announces HyperFile, Makes Object Better

Cloudian recently announced an addition to their HyperStore appliance. I had the opportunity to be briefed by Jon Toor and thought I'd share the highlights of the announcement here. I've had the pleasure of talking to Cloudian at a few Storage Field Day events. If you're unfamiliar with the HyperStore 4000, you can read my coverage of it here. In short, it's 840TB of object storage in 4RU with really, really comprehensive S3 compliance, amongst other things.

 

HyperFile You Say?

HyperFile is the new file front-end controller for the HyperStore appliance. It supports the following features:

  • SMB3 and NFS3;
  • High Availability with active / passive controllers;
  • Non-disruptive failover;
  • POSIX compliance;
  • Active Directory / LDAP authentication;
  • Write Once Read Many (WORM); and
  • Snapshots.

It wouldn’t be a product announcement without a bezel shot. I can’t say whether this is actually what it looks like, but if it does, it’s kind of cool.

[image courtesy of Cloudian]

The appliance itself is 2RU with dual controllers and a shared backplane. The cool thing is that it can be deployed as VMs, making it appealing for service providers looking to set up multiple environments for customers. Supported hypervisors include vSphere 5.1 (or later) and KVM. Replication is handled at the HyperStore level.

Multi-tenancy is supported with dedicated controllers.

[image courtesy of Cloudian]

There’s a global namespace between file and object and it also supports a shared namespace across multiple NAS controllers, meaning you can up your number of controllers to increase bandwidth or replication performance. From a scalability perspective, it supports up to 64 namespaces per controller. One of my favourite features is what Cloudian call “converged access” between file and object, meaning you could use S3 for storing files. It also supports Microsoft Azure, Google Cloud Platform and Amazon S3 formats, opening up some interesting possibilities for file consumption on-premises and in the cloud.

There are two editions available. The Basic HyperFile NAS Controller includes

  • Full protocol support;
  • High-availability;
  • Converged data access; and
  • Data migration.

The Enterprise HyperFile NAS Controller adds

  • Snapshot;
  • WORM; and
  • Geo-distribution with file versioning/locking.

 

Thoughts

I’ve been a fan of Cloudian’s products for some time, and this addition to the HyperStore platform makes them a compelling option for file and object storage in the data centre. With this approach they’re looking to push further into Media Asset Management (MAM) and video surveillance solutions. The title of the post is misleading. Object is already pretty cool, and a very suitable solution for a number of workloads. So why would an object vendor need to add file to work in these industries? Isn’t object ideally suited to these kinds of workloads? Yes, but sometimes the leading software vendors and people in charge of workflows are focused on other things, like only supporting file. So Cloudian have adapted to take a bigger piece of the pie. In much the same way that some data protection solutions are still file oriented, the HyperFile allows Cloudian to play in areas where it’s traditionally been excluded.

I’m also a fan of the appliance as VM approach and I like the breadth of protocol support and cloud integration available. If you’re going to put cloud in the name of your company the expectation will be there that you know what you’re doing. Cloudian haven’t disappointed thus far. If you’re in the market for a solid object (and now file) solution, you could do worse than talking to the folks at Cloudian.

So NooBaa, eh?

Disclaimer: I recently attended VMworld 2016 – US.  My flights were paid for by myself, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


I had the opportunity to speak with NooBaa about six months ago. At the time they were still developing their product, but I thought it looked pretty cool. At Tech Field Day Extra, they demoed their cloud services engine. The company was founded by Yuval Dimnik (Co-founder and CEO) and Guy Margalit (Co-founder and CTO). If you're familiar with Exanet or Dell FluidFS, you'll be familiar with some of their capabilities. NooBaa was founded in 2014, with a product launch in September 2016, and a current headcount of 14 (they tell us they have a strong security/storage DNA).

“Customers don’t care how you do your tech, they care how it fixes their problems”

 

So NooBaa, eh?

They have thought about the name. A lot. It’s a pure software product enabling folks to create and provision cloud services

  • Storage (like AWS S3) – First!
  • Serverless compute (like AWS Lambda) – Future

The key is that the customer owns the service, with

  • Full control of who accesses what, and what stays on-premises
  • No cloud vendor lock-in

The services use

  • Heterogeneous resources – cloud resources and servers
  • In the cloud, on-premises, and spanned

So, take all the spare storage you have lying about on Windows and Linux VMs, bang it all in a single namespace and present it back to your object-friendly apps. Replicate it to the cloud if you like. Or use all your spare clouds. Sounds like a cool idea.
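If you're wondering what that means for an existing S3-friendly app, the sketch below is one (entirely hypothetical) way to picture it: the same code path can point at AWS or at an on-premises NooBaa endpoint just by swapping the endpoint, which is the "no cloud vendor lock-in" part of the pitch. Endpoint names, credentials, and the bucket are placeholders.

```python
# Hypothetical: the same S3 client code targeting AWS or an on-premises NooBaa endpoint.
import boto3

def make_client(target: str):
    endpoints = {
        "aws": None,                                      # default AWS S3 endpoint
        "noobaa": "https://noobaa.example.internal:443",  # placeholder on-premises service
    }
    return boto3.client("s3", endpoint_url=endpoints[target])

for target in ("noobaa", "aws"):
    s3 = make_client(target)
    s3.put_object(Bucket="spanned-namespace", Key="hello.txt", Body=b"same app, different backend")
```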
Design Considerations (once bitten, twice shy)

They wanted to design a product that behaves like the cloud, but gives you the choice to consume from on-premises or cloud.

But can you predict the unpredictable?

  • Cloud strategy? Everyone has one of those, they’re just not sure what it really means.
  • Growth rate? Oh, it grows a lot.
  • Hardware technologies? Yep, software still needs hardware.
  • Vendors? Who can really work out what they do?
  • Organisational changes?
  • Security issues and lurking “heart bleeds”?

Stuff is hard. Along with this, NooBaa were looking to add the following capabilities

  • On-premises, multi-cloud, and supporting cloud migration
  • P2P scalable capacity
  • Monitor hardware and adapt
  • Agnostic to the machine
  • Allowed to grow, allowed to shrink
  • User space as a religion – when you need to fix that you can do it right away

Architecture

NooBaa is all about a hybrid approach to resources, supporting multiple cloud providers and on-premises resources. It also has support for multiple sites.

[image courtesy of NooBaa]

The key to NooBaa’s storage performance in what might seem to be non-performant environments is the way it stores data, as you can see in the below diagram.

[image courtesy of NooBaa]

 

Note that they’re not targeting low-latency workloads. At this stage they’re cloud agnostic and hoping to keep things that way. Heterogeneous resources are key for NooBaa. You can also sign up for the Community Edition – limited to 20TB aggregate object size.
Final Thoughts and Reading

 

The name doesn’t roll off the tongue, and the colour-scheme is very pretty. But I think this belies the thought that’s gone into this product. Yuval and his team have a strong background in scalable object storage, and I’m excited to see them finally come out of stealth. The concept of treating storage nodes as second class citizens is interesting, and I’m looking forward to taking the Community Edition for a spin when I get my act together in the near future. In the meantime, head over to Alastair’s blog for a more succinct write-up on what we saw. John White also did a great post here. You can grab a copy of my raw notes here, and watch NooBaa’s TFDx presentations here.

 

Caringo Announces SwarmNFS

Caringo recently announced SwarmNFS, and I recently had the opportunity to be briefed by Caringo’s Adrian J Herrera (VP Marketing). If you’re not familiar with Caringo, their main platform is Swarm, which “provides a platform for data protection, management, organization and search at massive scale”. You can read an overview of Swarm here, and there’s also a technical overview here.

 

So what is it?

SwarmNFS is a "stateless Linux process that integrates directly with Caringo Swarm. It delivers a global namespace across NFSv4, HTTP, SCSP (Caringo's protocol), S3, and HDFS, delivering data distribution and data management at scale".

SwarmNFS is basically an NFS server modified with proprietary code. It:

  • Is stateless and lightweight;
  • Has no caching or spooling;
  • Supports parallel data streaming; and
  • Has no single point of failure, with built-in high availability.

Caringo tell me this makes it a whole lot easier to centralise, distribute and manage data, while using a bunch less resources than a traditional file gateway. You can run it as a Linux process, an appliance, or in a VM. Caringo also tell me that, since they connect directly into Swarm, there are fewer bottlenecks than the traditional approach using gateways, FUSE and proxies.

[image courtesy of Caringo]

Everything in the UI can be done via the API as well, and it has support for multi-tenancy. As I mentioned before, there's a global namespace with "Universal Access", meaning that files can be written, read and edited through any interface (NFSv4, SCSP/HTTP, S3, HDFS). Having been a protocol prisoner in previous roles, it's nice to think that there's a different way to do things.

 

What do I use it for?

You can use this for all kinds of stuff. Adrian ran me through some use cases, including:

  • Media and entertainment (think media streaming / content delivery); and
  • Street view type image storage.

One of the key things here is that, because the platform uses NFS, a lot of application re-work doesn’t necessarily need to occur to take advantage of the object storage platform. In my opinion this is a pretty cool feature of the platform, and one that should definitely see people look at SwarmNFS fairly seriously when evaluating their object storage options.

 

Conclusion

Caringo are doing some really cool stuff. If you haven't checked out FileFly before, it's also worth a look. The capabilities of the Swarm platform are growing at a rapid pace. And the storage world is becoming more object and less block and file as each day passes. Enrico's been telling me that for ages now, and everything I'm seeing supports that. Caringo's approach to metadata – storing metadata with the object itself – also means you can do a bunch of cool stuff with it fairly easily, like replicating it, applying erasure coding to it, and so forth. The upshot is that now the data's truly portable. So, if you're object-curious but still hang out with file types, SwarmNFS might be a nice compromise for everyone.

[image courtesy of Caringo]