Fujifilm Object Archive – Not Your Father’s Tape Library

Disclaimer: I recently attended Storage Field Day 22.  Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Fujifilm recently presented at Storage Field Day 22. You can see videos of the presentation here, and download my rough notes from here.

 

Fujifilm Overview

You’ve heard of Fujifilm before, right? They do a whole bunch of interesting stuff – batteries, cameras, copiers. Nami Matsumoto, Director of DMS Marketing and Operations, took us through some of Fujifilm’s portfolio. Fujifilm’s slogan is “Value From Innovation”, and it certainly seems to be looking to extract maximum value from its $1.4B annual spend on research and development. The Recording Media Products Division is focussed on helping “companies future proof their data”.

[image courtesy of Fujifilm]

 

The Problem

The challenge, as always (it seems), is that data growth continues apace while budgets remain flat. As a result, both security and scalability are frequently sacrificed when solutions are deployed in enterprises.

  • Rapid data creation: “More than 59 Zettabytes (ZB) of data will be created, captured, copied, and consumed in the world this year” (IDC 2020)
  • Shift from File to Object Storage
  • Archive market – archive data is estimated to make up 60 – 80% of all stored data
  • Flat IT budgets
  • Cybersecurity concerns
  • Scalability

 

Enter The Archive

FUJIFILM Object Archive

Chris Kehoe, Director of DMS Sales and Engineering, spent time explaining what exactly FUJIFILM Object Archive was. “Object Archive is an S3 based archival tier designed to reduce cost, increase scale and provide the highest level of security for long-term data retention”. In short, it:

  • Works like Amazon S3 Glacier in your DC
  • Simply integrates with other object storage
  • Scales on tape technology
  • Secure with air gap and full chain of custody
  • Predictable costs and TCO with no API or egress fees

Workloads?

It’s optimised to handle the long-term retention of data, which is useful if you’re doing any of these things:

  • Digital preservation
  • Scientific research
  • Multi-tenant managed services
  • Storage optimisation
  • Active archiving

What Does It Look Like?

There are a few components that go into the solution, including:

  • Storage Server
  • Smart cache
  • Tape Server

[image courtesy of Fujifilm]

Tape?

That’s right, tape. The tape library supports LTO7, LTO8, and TS1160 drives. The data is written using the “OTFormat” specification (you can read about that here). The idea is that it packs a bunch of objects together so they get written to tape efficiently.
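
The OTFormat specification covers the detail, but the basic packing idea is simple enough to sketch. Here’s a rough conceptual example in Python – the container size and field names are mine for illustration, not anything from the spec – showing how a pile of small objects might be aggregated into big sequential writes:

```python
# Conceptual sketch only: pack small objects into fixed-size containers
# so they can be written to tape as large sequential blocks.
# The 64MiB container size and field names are illustrative, not OTFormat.

CONTAINER_SIZE = 64 * 1024 * 1024  # flush once we've buffered roughly 64MiB

def pack_objects(objects):
    """objects: iterable of (key, bytes). Yields (manifest, payload) containers."""
    manifest, payload = [], bytearray()
    for key, data in objects:
        # record where each object sits inside the container
        manifest.append({"key": key, "offset": len(payload), "length": len(data)})
        payload.extend(data)
        if len(payload) >= CONTAINER_SIZE:
            yield manifest, bytes(payload)
            manifest, payload = [], bytearray()
    if payload:
        yield manifest, bytes(payload)  # flush the final, partial container

# a tape write then becomes one big sequential I/O per container,
# rather than thousands of tiny object-sized writes
```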

[image courtesy of Fujifilm]

Object Storage Too

It uses an “S3-compatible” API – the S3 server inside is built on Scality’s Zenko. From an object storage perspective, it works with Cloudian HyperStore, Caringo Swarm, NetApp StorageGRID, and Scality RING. It also has Starfish and Tiger Bridge support.
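
Because the API is S3-compatible, applications should be able to talk to it much like any other S3 endpoint. Here’s a minimal boto3 sketch – the endpoint URL, bucket name, and credentials are placeholders I’ve made up, not anything Fujifilm published:

```python
import boto3

# All values below are placeholders for illustration only.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object-archive.example.internal",  # hypothetical on-premises endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Archive an object, then list what's in the bucket.
s3.put_object(Bucket="archive-bucket", Key="projects/2021/dataset.tar", Body=b"archived payload")
for obj in s3.list_objects_v2(Bucket="archive-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```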

Other Notes

The product starts at 1PB of licensing. You can read the Solution Brief here. There’s an informative White Paper here. And there’s one of those nice Infographic things here.

Deployment Example

So what does this look like from a deployment perspective? One example was a typical primary storage deployment, with data archived to an on-premises object storage platform (in this case NetApp StorageGRID). When your archive got really “cold”, it would be moved to the Object Archive.

[image courtesy of Fujifilm]

[image courtesy of Fujifilm]

 

Thoughts

Years ago, when a certain deduplication storage appliance company was acquired by a big storage slinger, stickers with “Tape is dead, get over it” were given out to customers. I think I still have one or two in my office somewhere. And I think the sentiment is spot on, at least in terms of the standard tape library deployments I used to see in small, medium, and large enterprises. The problem that tape was solving for those organisations at the time has largely been dealt with by various disk-based storage solutions. There are nonetheless plenty of use cases where tape is still considered useful. I’m not going to go into every single reason, but the cost per GB of tape, at a particular scale, is hard to beat. And when you want to safely store files for a long period of time, even offline? Tape, again, is hard to beat. This podcast from Curtis got me thinking about the demise of tape, and this presentation from Fujifilm reinforced my view that tape is far from being on life support – at least in very specific circumstances.

Data keeps growing, and we need to keep it somewhere, apparently. We also need to think about keeping it in a way that means we’re not continuing to negatively impact the environment. It doesn’t necessarily make sense to keep really old data permanently online, despite the fact that it has some appeal in terms of instant access to everything ever. Tape is pretty good when it comes to relatively low energy consumption, particularly given the fact that we can’t yet afford to put all this data on All-Flash storage. And you can keep it available in systems that can be relied upon to get the data back, just not straight away. As I said previously, this doesn’t necessarily make sense for the home punter, or even for the small to midsize enterprise (although I’m tempted now to resurrect some of my older tape drives and see what I can store on them). It really works better at large scale (dare I say hyperscale?). Given that we seem determined to store a whole bunch of data with the hyperscalers, and for a ridiculously long time, it makes sense that solutions like this will continue to exist, and evolve. Sure, Fujifilm has sold something like 170 million tapes worldwide. But this isn’t simply a tape library solution. This is a wee bit smarter than that. I’m keen to see how this goes over the next few years.

Aparavi Announces Enhancements, Makes A Good Thing Better

I recently had the opportunity to speak to Victoria Grey (CMO) and Jonathan Calmes (VP Business Development) from Aparavi regarding some updates to their Active Archive solution. If you’re a regular reader, you may remember I’m quite a fan of Aparavi’s approach. I thought I’d share some of my thoughts on the announcement here.

 

Aparavi?

According to Aparavi, Active Archive delivers “SaaS-based Intelligent, Multi-Cloud Data Management”. The idea is that:

  • Data is archived to cloud or on-premises based on policies for long-term lifecycle management;
  • Data is organised for easy access and retrieval; and
  • Data is accessible via Contextual Search.

Sounds pretty neat. So what’s new?

 

What’s New?

Direct-to-cloud

Direct-to-cloud provides the ability to archive data directly from source systems to the cloud destination of choice, with minimal local storage requirements. Instead of having to store archive data locally, you can now send bits of it straight to cloud, minimising your on-premises footprint.

  • Now supporting AWS, Backblaze B2, Caringo, Google, IBM Cloud, Microsoft Azure, Oracle Cloud, Scality, and Wasabi;
  • Trickle or bulk data migration – Adding bulk migration of data from one storage destination to another; and
  • Dynamic translation from cloud to cloud.

[image courtesy of Aparavi]

Data Classification

The Active Archive solution can now index, classify, and tag archived data. This makes it simple to classify data based on individual words, phrases, dates, file types, and patterns. Users can easily identify and tag data for future retrieval purposes such as compliance, reference, or analysis. There’s a rough sketch of the idea after the list below.

  • Customisable taxonomy using specific words, phrases, patterns, or meta-data
  • Pre-set classifications of “legal”, “confidential”, and PII
  • Easy to add new ad-hoc classifications at any time
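
Here’s that rough sketch of the classification idea – the taxonomy, patterns, and category names are mine for illustration, not Aparavi’s actual engine:

```python
import re

# Illustrative taxonomy: each classification is a set of regex patterns.
# These rules are made up for the example; a real taxonomy would be customer-defined.
TAXONOMY = {
    "confidential": [r"\bconfidential\b", r"\binternal only\b"],
    "legal": [r"\bnon-disclosure\b", r"\bstatement of work\b"],
    "pii": [r"\b\d{3}-\d{2}-\d{4}\b",        # US SSN-style pattern
            r"[\w.+-]+@[\w-]+\.[\w.]+"],      # email address
}

def classify(text):
    """Return the set of classification tags whose patterns match the document text."""
    tags = set()
    for tag, patterns in TAXONOMY.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            tags.add(tag)
    return tags

print(classify("Please treat this as internal only. Contact: jane@example.com"))
# e.g. {'confidential', 'pii'} (set ordering varies) – tags get stored in the index with the file's metadata
```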

Advanced Archive Search

The advanced search provides an intuitive query interface (there’s a rough sketch of the metadata-only search idea after the list):

  • Search by metadata including classifications, tags, dates, file name, and file type, optionally with wildcards
  • Search within document content using words, phrases, patterns, and complex queries
  • Searches across all locations
  • Contextual Search: produces results of the match within context
  • No retrieval until file is selected; no egress fees until retrieved
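
And here’s the matching sketch of a metadata-only search over that kind of index – nothing gets retrieved (and no egress fees are incurred) until a specific hit is selected. Again, this is purely illustrative rather than Aparavi’s implementation:

```python
import fnmatch

def search(index, classification=None, name_glob=None, file_type=None):
    """index: list of dicts with 'name', 'type', 'location', and 'tags'.
    Only metadata is inspected here; the archived bytes stay put until
    a specific hit is actually selected for retrieval."""
    hits = []
    for entry in index:
        if classification and classification not in entry["tags"]:
            continue
        if name_glob and not fnmatch.fnmatch(entry["name"], name_glob):
            continue
        if file_type and entry["type"] != file_type:
            continue
        hits.append(entry)
    return hits

index = [
    {"name": "contract-2019.pdf", "type": "pdf", "location": "wasabi", "tags": {"legal"}},
    {"name": "payroll.xlsx", "type": "xlsx", "location": "azure", "tags": {"pii", "confidential"}},
]
print(search(index, classification="legal", name_glob="contract-*"))
```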

 

Conclusion

I was pretty enthusiastic about Aparavi when they came out of stealth, and I’m excited about some of the new features they’ve added to the solution. Data management is a hard nut to crack. Primarily because a lot of different organisations have a lot of different requirements for storing data long term. And there are a lot of different types of data that need to be stored. Aparavi isn’t a silver bullet for data management by any stretch, but it certainly seems to meet a lot of the foundational requirements for a solid archive strategy. There are some excellent options in terms of storage by location, search, and organisation.

The cool thing isn’t just that they’ve developed a solid multi-cloud story. Rather, it’s that there are options when it comes to the type of data mobility the user might require. They can choose to do bulk migrations, or take it slower by trickling data to the destination. This provides for some neat flexibility in terms of infrastructure requirements and windows of opportunity. It strikes me that it’s the sort of solution that can be tailored to work with a business’s requirements, rather than pushing it in a certain direction.

I’m also a big fan of Aparavi’s “Open Data” access approach, with an open API that “enables access to archived data for use outside of Aparavi”, along with a published data format for independent data access. It’s a nice change from platforms that feel the need to lock data into proprietary formats in order to store it long term. There’s a good chance the type of data you want to archive in the long term will be around longer than some of these archive solutions, so it’s nice to know you’ve got a chance of getting the data back if something doesn’t work out for the archive software vendor. I think it’s worth keeping an eye on Aparavi; they seem to be taking a fresh approach to what has become a vexing problem for many.

Cohesity Is (Data)Locked In

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Cohesity recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.

 

The Cohesity Difference?

Cohesity covered a number of different topics in its presentation, and I thought I’d outline some of the Cohesity features before I jump into the meat and potatoes of my article. Some of the key things you get with Cohesity are:

  • Global space efficiency;
  • Data mobility;
  • Data resiliency & compliance;
  • Instant mass restore; and
  • Apps integration.

I’m going to cover 3 of the 5 here, and you can check the videos for details of the Cohesity MarketPlace and the Instant Mass Restore demonstration.

Global Space Efficiency

One of the big selling points for the Cohesity data platform is the ability to deliver data reduction and small file optimisation. There’s a rough sketch of the dedupe and compression idea after the list below.

  • Global deduplication
    • Modes: inline, post-process
  • Archive to cloud is also deduplicated
  • Compression
    • Zstandard algorithm (read more about that here)
  • Small file optimisation
    • Better performance for reads and writes
    • Benefits from deduplication and compression
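
Here’s that rough sketch – a toy example of chunk-level deduplication followed by Zstandard compression. The chunk size and data structures are made up for illustration and bear no resemblance to Cohesity’s actual internals:

```python
import hashlib
import zstandard as zstd  # pip install zstandard

CHUNK_SIZE = 128 * 1024           # fixed 128KiB chunks, purely illustrative
store = {}                        # fingerprint -> compressed chunk
compressor = zstd.ZstdCompressor(level=3)

def ingest(data: bytes):
    """Split data into chunks and keep only the chunks we haven't seen before."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:                         # dedupe: store each unique chunk once
            store[fp] = compressor.compress(chunk)  # then compress what's left
        refs.append(fp)
    return refs                                     # the "file" is now a list of chunk references

data = b"hello world" * 100_000
refs = ingest(data)
physical = sum(len(c) for c in store.values())
print(f"logical {len(data)} bytes -> physical {physical} bytes across {len(refs)} chunk refs")
```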

Data Mobility

There’s also an excellent story when it comes to data mobility, with the platform delivering the following data mobility features:

  • Data portability across clouds
  • Multi-cloud replication and archival (1:many)
  • Integrated indexing and search across locations

You also get simultaneous, multi-protocol access and a comprehensive set of file permissions to work with.

 

But What About Archives And Stuff?

Okay, so all of that stuff is really cool, and I could stop there and you’d probably be happy enough that Cohesity delivers the goods when it comes to a secondary storage platform that delivers a variety of features. In my opinion, though, it gets a lot more interesting when you have a look at some of the archival features that are built into the platform.

Flexible Archive Solutions

  • Archive either on-premises or to cloud;
  • Policy-driven archival schedules for long-term data retention;
  • Data can be retrieved to the same or a different Cohesity cluster; and
  • Archived data is subject to further deduplication.

Data Resiliency and Compliance – ensures data integrity

  • Erasure coding;
  • Highly available; and
  • DataLock and legal hold.

Achieving Compliance with File-level DataLock

In my opinion, DataLock is where it gets interesting in terms of archive compliance. There’s a rough conceptual sketch of the inactivity-lock idea after the list.

  • DataLock enables WORM functionality at a file level;
  • DataLock is designed to help meet regulatory retention requirements;
  • Can automatically lock a file after a period of inactivity;
  • Files can be locked manually by setting file attributes;
  • Minimum and maximum retention times can be set; and
  • Cohesity provides a unique RBAC role for Data Security administration.
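
Here’s that conceptual sketch of the inactivity-lock idea – the retention values and structures are invented, and this is not how Cohesity implements DataLock under the covers:

```python
import os
import time

INACTIVITY_SECONDS = 30 * 24 * 3600   # lock after ~30 days without modification (made-up value)
MIN_RETENTION = 365 * 24 * 3600       # illustrative minimum retention of one year
locks = {}                            # path -> earliest time the lock may expire

def maybe_lock(path):
    """Mark a file WORM-locked once it has been inactive for long enough."""
    age = time.time() - os.path.getmtime(path)
    if age >= INACTIVITY_SECONDS and path not in locks:
        locks[path] = time.time() + MIN_RETENTION

def can_modify(path):
    """A locked file cannot be changed or deleted until its retention expires."""
    expiry = locks.get(path)
    return expiry is None or time.time() >= expiry
```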

DataLock on Backups

  • DataLock enables WORM functionality;
  • Prevent changes by locking Snapshots;
  • Applied via backup policy; and
  • Operations performed by Data Security administrators.

 

Ransomware Detection

Cohesity also recently announced the ability to look within Helios for ransomware. The approach taken is as follows: Prevent. Detect. Respond.

Prevent

There’s some good stuff built into the platform to help prevent ransomware in the first place, including:

  • Immutable file system
  • DataLock (WORM)
  • Multi-factor authentication

Detect

  • Machine-driven anomaly detection (backup data, unstructured data) – see the rough sketch after this list
  • Automated alert
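
Here’s that rough sketch of the anomaly detection idea – flag a backup whose change rate sits well outside its recent history. It’s a deliberately simple stand-in for illustration, not Cohesity’s actual model:

```python
from statistics import mean, stdev

def is_anomalous(change_rates, latest, threshold=3.0):
    """change_rates: recent daily change rates (e.g. % of data changed per backup run).
    Flag the latest run if it deviates from the recent mean by more than
    `threshold` standard deviations."""
    if len(change_rates) < 5:
        return False                      # not enough history to judge
    mu, sigma = mean(change_rates), stdev(change_rates)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > threshold * sigma

history = [1.8, 2.1, 1.9, 2.3, 2.0, 2.2]   # normal daily change rates (%)
print(is_anomalous(history, latest=41.7))   # True -> raise an alert; possible encryption event
```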

Respond

  • Scalable file system to store years’ worth of backup copies
  • Google-like global actionable search
  • Instant mass restore

 

Thoughts and Further Reading

The conversation with Cohesity got a little spirited in places at Storage Field Day 18. This isn’t unusual, as Cohesity has had some problems in the past with various folks not getting what they’re on about. Is it data protection? Is it scale-out NAS? Is it an analytics platform? There’s a lot going on here, and plenty of people (both inside and outside Cohesity) have had a chop at articulating the real value of the solution. I’m not here to tell you what it is or isn’t. I do know that a lot of the cool stuff with Cohesity wasn’t readily apparent to me until I actually had some stick time with the platform and had a chance to see some of its key features in action.

The DataLock / Security and Compliance piece is interesting to me though. I’m continually asking vendors what they’re doing in terms of archive platforms. A lot of them look at me like I’m high. Why wouldn’t you just use software to dump your old files up to the cloud or onto some cheap and deep storage in your data centre? After all, aren’t we all using software-defined data centres now? That’s certainly an option, but what happens when that data gets zapped? What if the storage platform you’re using, or the software you’re using to store the archive data, goes bad and deletes the data you’re managing with it? Features such as DataLock can help with protecting you from some really bad things happening.

I don’t believe that data protection data should be treated as an “archive” as such, although I think that data protection platform vendors such as Cohesity are well placed to deliver “archive-like” solutions for enterprises that need to retain protection data for long periods of time. I still think that pushing archive data to another, dedicated, tier is a better option than simply calling old protection data “archival”. Given Cohesity’s NAS capabilities, it makes sense that they’d be an attractive storage target for dedicated archive software solutions.

I like what Cohesity have delivered to date in terms of a platform that can be used to deliver data insights to derive value for the business. I think sometimes the message is a little muddled, but in my opinion some of that is because everyone’s looking for something different from these kinds of platforms. And these kinds of platforms can do an awful lot of things nowadays, thanks in part to some pretty smart software and some grunty hardware. You can read some more about Cohesity’s Security and Compliance story here,  and there’s a fascinating (if a little dated) report from Cohasset Associates on Cohesity’s compliance capabilities that you can access here. My good friend Keith Townsend also provided some thoughts on Cohesity that you can read here.

Rubrik Basics – Archival Locations

I’ve been doing some work with Rubrik in our lab and thought it worth covering some of the basic features that I think are pretty neat. In this edition of Rubrik Basics, I thought I’d quickly cover off how to get started with the Archival Locations feature. You can read the datasheet here.

 

Rubrik and Archiving Policies

So what can you do with Archival Locations? Well, the idea is that you can copy data to another location for safe-keeping. Normally this data will live in that location for a longer period than it will in the on-premises Brik you’re using. You might, for example, keep data on your appliance for 30 days, and have archive data living in a cloud location for another 2 years.
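
As a rough sketch of that sort of policy decision (the numbers mirror the 30 days / 2 years example above, and the structure is mine rather than Rubrik’s actual SLA Domain format):

```python
from datetime import datetime, timedelta, timezone

# Illustrative policy: keep snapshots on the local Brik for 30 days,
# then rely on the archival location for a further 2 years.
LOCAL_RETENTION = timedelta(days=30)
ARCHIVE_RETENTION = timedelta(days=365 * 2)

def placement(snapshot_time: datetime, now=None):
    """Return where a snapshot of a given age should still exist."""
    now = now or datetime.now(timezone.utc)
    age = now - snapshot_time
    if age <= LOCAL_RETENTION:
        return "local and archive"     # recent copies live in both places
    if age <= LOCAL_RETENTION + ARCHIVE_RETENTION:
        return "archive only"          # expired locally, still recoverable from the archive
    return "expired"                   # outside both retention windows

print(placement(datetime.now(timezone.utc) - timedelta(days=200)))  # archive only
```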

 

Archival Location Support

Rubrik supports a variety of Archival Locations, including:

  • Public Cloud: Amazon Web Services S3, S3-IA, S3-RRS and Glacier; Microsoft Azure Blob Storage LRS, ZRS and GRS; Google Cloud Platform Nearline, Coldline, Multi-Regional and Regional; (also includes support for Government Cloud Options in AWS and Azure);
  • Private Cloud (S3 Object Store): Basho Riak, Cleversafe, Cloudian, EMC ECS, Hitachi Content Platform, IIJ GIO, Red Hat Ceph, Scality;
  • NFS: Any NFS v3 Compliant Target; and
  • Tape: All Major Tape Vendors via QStar.

What’s cool is that multiple, active archival locations can be configured for a Rubrik cluster. You can then select an archival location when an SLA policy is created or edited. This is particularly useful when you have a number of different tenants hosted on the same Brik.

 

Setup

To set up an Archival Location, click on the “Gear” icon in the Rubrik interface (in this example I’m using Rubrik CDM 4.1) and select “Archival Locations”.

Click on the + sign.

You can then choose the archival type, selecting from Amazon S3 (or Glacier), Azure, Google Cloud Platform, NFS or Tape (via QStar). In this example I’m setting up an Amazon S3 bucket.

You then need to select the Region and Storage Class, and provide your AWS Access Key, Secret Key and S3 Bucket.

You also need to choose the encryption type. I’m not using an external KMS in our lab, so I’ve used OpenSSL to generate a key using the following command.
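
(The command itself was a screenshot in the original post. Assuming all you need is an RSA key in PEM format, something along the lines of `openssl genrsa -out rubrik_archive_key.pem 2048` will do the job – the file name and the 2048-bit key length are my assumptions rather than anything Rubrik mandates.)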

Once you run that command, paste the contents of the PEM file.

Once you’ve added the location, you’ll see it listed, along with some high level statistics.

Once you have an Archival Location configured, you can add it to existing SLA Domains, or use it when you create a new SLA Domain.

Instant Archive

The Instant Archive feature can also be used to immediately queue a task to copy a new snapshot to a specified archival location. Note that the Instant Archive feature does not change the amount of time that a snapshot is retained locally on the Rubrik cluster. The Retention On Brik setting determines how long a snapshot is kept on the Rubrik cluster.

 

Thoughts

Rubrik’s Data Archival is flexible as well as simple to use. It’s easy to set up and works as promised. There’s a bunch of stuff happening within the Rubrik environment that means you can access protection data across multiple locations as well, so you might find that a combination of a Rubrik Brik and some cheap and deep NFS storage is a good option to store backup data for an extended period of time. You might also think about using this feature as a way to do data mobility or disaster recovery, depending on the type of disaster you’re trying to recover from.

Rubrik – Cloud Data What?

I’ve done a few posts on Cohesity in the past, and I have some friends who work at Rubrik. So it seemed like a good idea to put up a short article on what Rubrik do. Thanks to Andrew Miller at Rubrik for helping out with the background info.

 

The Platform

It’s converged hardware and software (called “Briks”) – there are different models, but the 2RU, 4-node configuration is the most common.

[image via Rubrik’s website]

The Rubrik solution:

  • Is fundamentally built on a scale out architecture;
  • Provides a built-in backup application/catalogue with deduplication and compression;
  • Uses a custom file system, distributed task scheduler, distributed metadata, etc;
  • Delivers cloud-native archiving that is policy driven at its core (declarative rather than imperative);
  • Can leverage cloud native archive (with native hooks into AWS/Azure/etc.);
  • Has a custom VSS provider to help reduce VM stun (super VMware friendly);
  • Has provided a native REST-based API since day one, and along with vSphere support (VADP, CBT, NBDSSL) handles SQL and Linux natively (there’s apparently more to come on that front); and
  • There’s an edge appliance for ROBO, amongst other things.

 

Cloud Data Management

Rubrik position their solution as “Cloud Data Management”.

In a similar fashion to Cohesity, Rubrik are focused on a bunch of stuff, not just backup and recovery or copy data management. There’s a bunch of stuff you can do around archive and compliance, and Rubrik tell me the search capabilities are pretty good too.

It also works well with technologies such as VMware vSAN. Chris Wahl and Cormac Hogan wrote a whitepaper on the integration that you can get here (registration required).

 

Thoughts

As you can see from this post there’s a lot to look into with Rubrik (and Cohesity for that matter) and I’ve really only barely scratched the surface. The rising popularity of smarter secondary storage solutions such as these points to a desire in the marketplace to get sprawling data under control via policy rather than simple tiers of disk. This is a good thing. Add in the heavy focus on API-based control and I think we’re in for exciting times (or as exciting as this kind of stuff gets in any case). If you’re interested in some of what you can do with Rubrik there’s a playlist on YouTube with some demos that give a reasonable view of what you can do. I’m hoping to dig a little deeper into the Rubrik solution in the next little while, and I’m particularly interested to see what it can do from an archiving perspective, so stay tuned.