Datadobi Announces StorageMAP

Datadobi recently announced StorageMAP – a “solution that provides a single pane of glass for organizations to manage unstructured data across their complete data storage estate”. I recently had the opportunity to speak with Carl D’Halluin about the announcement, and thought I’d share some thoughts here.

 

The Problem

So what’s the problem enterprises are trying to solve? They have data all over the place, and it’s no longer a simple activity to work out what’s useful and what isn’t. Consider the data on a typical file / object server inside BigCompanyX.

[image courtesy of Datadobi]

As you can see, there’re all kinds of data lurking about the place, including data you don’t want to have on your server (e.g. Barry’s slightly shonky home videos), and data you don’t need any more (the stuff you can move down to a cheaper tier, or even archive for good).

What’s The Fix?

So how do you fix this problem? Traditionally, you’ll try and scan the data to understand things like capacity, categories of data, age, and so forth. You’ll then make some decisions about the data based on that information and take actions such as relocating, deleting, or migrating it. Sounds great, but it’s frequently tough to make decisions about business data without understanding the business drivers behind it.

[image courtesy of Datadobi]

What’s The Real Fix?

The real fix, according to Datadobi, is to add a bit more automation and smarts to the process, and this relies heavily on accurate tagging of the data you’re storing. D’Halluin pointed out to me that they don’t suggest you create complex tags for individual files, as you could be there for years trying to sort that out. Rather, you add tags to shares or directories, and let the StorageMAP engine make recommendations and move stuff around for you.

[image courtesy of Datadobi]

Tags can represent business ownership, the role of the data, any action to be taken, or other designations, and they’re user definable.
[image courtesy of Datadobi]
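
Datadobi hasn’t published a specific tag schema as part of this announcement, so treat the following as nothing more than a rough sketch of the share-level tagging idea in Python (all of the share names, tag fields, and actions are made up):

```python
# Hypothetical sketch only -- not StorageMAP's actual tag schema or API.
# The point is that tags live at the share / directory level, not per file.
share_tags = {
    "/shares/finance/archive": {"owner": "Finance", "role": "archive", "action": "move-to-cold"},
    "/shares/marketing/video": {"owner": "Marketing", "role": "active", "action": "retain"},
    "/shares/homes/barry":     {"owner": "Unassigned", "role": "unknown", "action": "review"},
}

def recommend(path: str) -> str:
    """Return the recommended action for a share based on its tags."""
    return share_tags.get(path, {}).get("action", "review")

for share in share_tags:
    print(f"{share}: {recommend(share)}")
```

The engine does the heavy lifting from there; the human effort is in deciding what the tags should be in the first place.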

How Does This Fix It?

You’ll notice that the process above looks awfully similar to the one before – so how does this fix anything? The key, in my opinion at least, is that StorageMAP takes away the requirement for intervention from the end user. Instead of going through some process every quarter to “clean up the server”, you’ve got a process in place to do the work for you. As a result, you should see improved cost control, better storage efficiency across your estate, and (hopefully) a little bit more value from your data.

 

Thoughts

Tools that take care of everything for you have always had massive appeal in the market, particularly as organisations continue to struggle with data storage at any kind of scale. Gone are the days when your admins had an idea where everything on a 9GB volume was stored, or why it was stored there. We now have data stored all over the place (both officially and unofficially), and it’s becoming impossible to keep track of it all.

The key thing to consider with these kinds of solutions is that you need to put in the work to tag your data correctly in the first place. So there needs to be some thought put into what your data looks like in terms of business value. Remember that mp4 video files might not be warranted in the Accounting department, but your friends in Marketing will be underwhelmed if you create some kind of rule to automatically zap mp4s. The other thing to consider is that you need to put some faith in the system. This kind of solution will be useless if folks insist on not deleting anything, or not “believing” the output of the analytics and reporting. I used to work with customers who didn’t want to trust a vendor’s automated block storage tiering because “what does it know about my workloads?”. Indeed. The success of these kinds of intelligence and automation tools relies to a certain extent on folks moving away from faith-based computing as an operating model.

But enough ranting from me. I’ve covered Datadobi a bit over the last few years, and it makes sense that all of these announcements have finally led to the StorageMAP product. These guys know data, and how to move it.


Random Short Take #67

Welcome to Random Short Take #67. Let’s get random.

  • MinIO was in the news recently, and this article from Chin-Fah seems to summarise nicely what you need to know.
  • Whenever I read articles about home Internet connectivity, I generally chuckle in Australian and move on. But this article from Jeff Geerling on his experience with Starlink makes for interesting reading, if only for the somewhat salty comments people felt the need to leave after the article was published. He nonetheless brings up some great points about challenges with the service, and I think the endless fawning over Musk as some kind of tech saviour needs to stop.
  • In the “just because you can, doesn’t mean you should” category is this article from William Lam, outlining how to create a VMFS datastore on a USB device. It’s unsupported, but it strikes me that this is just the kind of crazy thing that might be useful to folks trying to move around VMs at the edge.
  • Karen Lopez is a really smart person, and this article over at Gestalt IT is more than just the “data is the new oil” schtick we’ve been hearing for the past few years.
  • Kyndryl and Pure Storage have announced a global alliance. You can read more on that here.
  • Mike Preston wrote a brief explainer on S3 Object Lock here. I really enjoy Mike’s articles, as I find he has a knack for breaking down complex topics into pieces that are simple to digest and consume.
  • Remember when the movies and TV shows you watched had consistent aspect ratios? This article from Tom Andry talks about how that’s changed quite a bit in the last few years.
  • I’m still pretty fresh in my role, but in the future I hope to be sharing more news and articles about VMware Cloud on AWS. In the meantime, check out this article from Greg Vinton, where he covers some of his favourite parts of what’s new in the platform.

In unrelated news, this is the last week to vote for the #ITBlogAwards. You can cast your vote here.

Random Short Take #51

Welcome to Random Short Take #51. A few players have worn 51 in the NBA including Lawrence Funderburke (I remember the Ohio State team wearing grey Nikes on TV and thinking that was a really cool sneaker colour – something I haven’t been able to shake over 25 years later). My pick is Boban Marjanović though. Let’s get random.

  • Folks don’t seem to spend much time making sure the fundamentals are sound, particularly when it comes to security. This article from Jess provides a handy list of things you should be thinking about, and doing, when it comes to securing your information systems. As she points out, it’s just a starting point, but I think it should be seen as a bare minimum / entry level set of requirements that you could wrap around most environments out in the wild.
  • Could there be a new version of AIX on the horizon? Do I care? Not really. But I do sometimes yearn for the “simpler” times I spent working on a myriad of proprietary open systems, particularly when it came to storage array support.
  • StorCentric recently announced Nexsan Assureon Cloud Edition. You can read the press release here.
  • Speaking of press releases, Zerto continues to grow its portfolio of cloud protection technology. You can read more on that here.
  • Spectro Cloud has been busy recently, and announced support for management of existing Kubernetes deployments. The news on that can be found here.
  • Are you a data hoarder? I am. This article won’t help you quit data, but it will help you understand some of the things you can do to protect your data.
  • So you’ve found yourself with a publicly facing vCenter? Check out this VMware security advisory, and get patching ASAP. vCenter isn’t the only thing you need to be patching either, but hopefully you knew that already.
  • John Birmingham is one of my favourite writers. Not just for his novels with lots of things going bang, but also for his blog posts about food. And things of that nature.

Hammerspace, Storageless Data, And One Tough Problem

Disclaimer: I recently attended Storage Field Day 21.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Hammerspace recently presented at Storage Field Day 21. You can see videos of the presentation here, and download my rough notes from here.

 

Storageless Data You Say?

David Flynn kicked off the presentation from Hammerspace talking about storageless data. Storageless data? What on earth is that, then? Ultimately your data has to live on storage. But this is all about consumption-side abstraction. Hammerspace doesn’t want you to care about how your application maps to servers, or how it maps to storage. It’s more of a data-focussed approach to storage than we’re used to, perhaps. Some of the key requirements of the solution are as follows:

  • The agent needs to run on everything – virtual, physical, containers – it can’t be bound to specific hardware
  • Needs to be multi-vendor and support multi-protocol
  • Presumes metadata
  • Make data into a routed resource
  • Deliver objective-based orchestration

The trick is that you have to be able to do all of this without killing the benefits of the infrastructure (performance, reliability, cost, and management). Simple, huh?

Stitching It Together

A key part of the Hammerspace story is the decoupling of the control plane and the data plane. This allows it to focus on getting the data where it needs to be, from edge to cloud, over whatever protocol is required.

[image courtesy of Hammerspace]

Other Notes

Hammerspace officially supports 8 sites at the moment, and the team have tested the solution with 32 sites. It uses an eventually consistent model, and the Global Namespace is global per share, providing flexible deployment options. Metadata replication can be set up to be periodic – and customised at each site. You always rehydrate the data and serve it locally over NAS via SMB or NFS.

Licensing Notes

Hammerspace is priced on capacity (data under management). You can also purchase it via the AWS Marketplace. Note that you can access up to 10TB free on the public cloud vendors (AWS, GCP, Azure) from a Hammerspace perspective.

 

Thoughts and Further Reading

I was fortunate to have a followup session with Douglas Fallstrom and Brendan Wolfe to revisit the Hammerspace story, ask a few more questions, and check out some more demos. I asked Fallstrom about the kind of use cases they were seeing in the field for Hammerspace. One popular use case was for disaster recovery. Obviously, there’s a lot more to doing DR than just dumping data in multiple locations, but it seems that there’s appetite for this very thing. At a high level, Hammerspace is a great choice for getting data into multiple locations, regardless of the underlying platform. Sure, there’s a lot more that needs to be done once it’s in another location, or when something goes bang. But from the perspective of keeping things simple, this one is up there.

Fallstrom was also pretty clear with me that this isn’t Primary Data 2.0, regardless of the number of folks that work at Hammerspace with that heritage. I think it’s a reasonable call, given that Hammerspace is doubling down on the data story, and really pushing the concept of a universal file system, regardless of location or protocol.

So are we finally there in terms of data abstraction? It’s been a problem since computers became common in the enterprise. As technologists we frequently get caught up in the how, and not as much in the why of storage. It’s one thing to say that I can scale this to this many Petabytes, or move these blocks from this point to that one. It’s an interesting conversation for sure, and has proven to be a difficult problem to solve at times. But I think as a result of this, we’ve moved away from understanding the value of data, and data management, and focused too much on the storage and services supporting the data. Hammerspace has the noble goal of moving us beyond that conversation to talking about data and the value that it can bring to the enterprise. Is it there yet in terms of that goal? I’m not sure. It’s a tough thing to be able to move data all over the place in a reliable fashion and still have it do what it needs to do with regards to performance and availability requirements. Nevertheless I think that the solution does a heck of a lot to remove some of the existing roadblocks when it comes to simplified data management. Is serverless compute really a thing? No, but it makes you think more about the applications rather than what they run on. Storageless data is aiming to do the same thing. It’s a bold move, and time will tell whether it pays off or not. Regardless of the success or otherwise of the marketing team, I’m thinking that we’ll be seeing a lot more innovation coming out of Hammerspace in the near future. After all, all that data isn’t going anywhere any time soon. And someone needs to take care of it.

StorCentric Announces Data Mobility Suite

StorCentric recently announced its Data Mobility Suite (DMS). I had the opportunity to talk to Surya Varanasi (StorCentric CTO) about the news, and thought I’d share some of my notes here.

 

What Is It?

DMS is being positioned as a suite of “data cloud services” by StorCentric, with a focus on:

  • Data migration;
  • Data consistency; and
  • Data operation.

It has the ability to operate across heterogeneous storage, clouds, and protocols. It’s a software solution based on subscription licensing and uses a policy-driven engine to manage data in the enterprise. It can run on bare-metal or as a VM appliance. Object storage platform / cloud support is fairly robust, with AWS, Backblaze B2, and Wasabi, amongst others, all being supported.

[image courtesy of StorCentric]

Use Cases

There are a number of scenarios where a solution like DMS makes sense. You might have a bunch of NFS storage on-premises, for example, and want to move it to a cloud storage target using S3. Another use case cited involved collaboration across multiple sites, with the example being a media company creating content in three places, working in different time zones, and wanting to move the data back to a centralised location.
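
DMS drives this sort of move through its policy engine and GUI rather than scripts, but purely to illustrate the shape of that first use case, here’s a minimal sketch of pushing files from a locally mounted NFS export up to an S3 bucket with boto3 (the mount path and bucket name are invented, and there’s no error handling, throttling, or checksum verification here):

```python
import os
import boto3

# Invented paths / names for illustration only.
NFS_MOUNT = "/mnt/nfs_export"
BUCKET = "example-archive-bucket"

s3 = boto3.client("s3")  # credentials and endpoint come from your environment

for root, _dirs, files in os.walk(NFS_MOUNT):
    for name in files:
        local_path = os.path.join(root, name)
        # Preserve the directory structure as the object key.
        key = os.path.relpath(local_path, NFS_MOUNT)
        s3.upload_file(local_path, BUCKET, key)
        print(f"uploaded {local_path} -> s3://{BUCKET}/{key}")
```

The value of a product like DMS is everything this sketch leaves out: scheduling, retries, verification, and doing it at scale across heterogeneous platforms.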

Big Ideas

Speaking to StorCentric about the announcement, it was clear that there’s a lot more on the DMS roadmap. Block storage is something the team wants to tackle, and they’re also looking to deliver analytics and ransomware alerting. There’s also a strong desire to provide governance as well. For example, if I want to copy some data somewhere and keep it for 10 years, I’ll configure DMS to take care of that for me.

 

Thoughts and Further Reading

Data management means a lot of things to a lot of people. Storage companies often focus on moving blocks and files from one spot to another, but don’t always do a solid job of capturing why data needs to be stored where it does. Or how, for that matter. There’s a lot more to data management than keeping ones and zeroes in a safe place. But it’s not just about being able to move data from one spot to another. It’s about understanding the value of your data, and understanding where it needs to be to deliver the most value to your organisation. Whilst it seems like DMS is focused primarily on moving data from one spot to another, there’s plenty of potential here to develop a broader story in terms of data governance and mobility. There’s built-in security, and the ability to apply levels of data governance to data in various locations. The greater appeal here is also the ability to automate the movement of data to different places based on policy. This policy-driven approach becomes really interesting when you start to look at complicated collaboration scenarios, or need to do something smart with replication or data migration.

Ultimately, there are a bunch of different ways to get data from one point to another, and a bunch of different reasons why you might need to do that. The value in something like DMS is the support for heterogeneous storage platforms, as well as the simple to use GUI support. Plenty of data migration tools come with extremely versatile command line interfaces and API support, but the trick is delivering an interface that is both intuitive and simple to navigate. It’s also nice to have a few different use cases met with one tool, rather than having to reach into the bag a few different times to solve very similar problems. StorCentric has a lot of plans for DMS moving forward, and if those plans come to fruition it’s going to form a very compelling part of the typical enterprise’s data management toolkit. You can read the press release here.

Spectra Logic – BlackPearl Overview

I recently had the opportunity to take a briefing with Jeff Braunstein and Susan Merriman from Spectra Logic (one of those rare occasions where getting your badge scanned at a conference proves valuable), and thought I’d share some of my notes here.

 

BlackPearl Family

Spectra Logic sell a variety of products, but this briefing was focused primarily on the BlackPearl series. Braunstein described it as a “gateway” device, with both NAS and object front end interfaces, and backend capability that can move data to multiple types of archives.

[image courtesy of Spectra Logic]

It’s a hardware box, but at its core the value is in the software product. The idea is that the BlackPearl acts as a disk cache, and you configure policies to send the data to one or more storage targets. The cool thing is that it supports multiple retention policies, and these can be permanent too. By that I mean you could spool one copy to tape for long term storage, and have another copy of your data sit on disk for 90 days (or however long you wanted).
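
Spectra Logic has its own way of configuring this, so the following is just an illustration of the concept described above (one copy spooled to tape permanently, another kept on disk for 90 days); the structure and names are invented, not BlackPearl’s actual policy format:

```python
from datetime import datetime, timedelta, timezone

# Invented structure -- illustrative only, not BlackPearl's policy format.
policy = {
    "name": "finance-archive",
    "copies": [
        {"target": "tape-library-1", "retention_days": None},  # None = keep permanently
        {"target": "disk-pool-1", "retention_days": 90},
    ],
}

def copy_expired(copy: dict, written_at: datetime) -> bool:
    """A copy with no retention period never expires."""
    if copy["retention_days"] is None:
        return False
    return datetime.now(timezone.utc) > written_at + timedelta(days=copy["retention_days"])

written = datetime.now(timezone.utc) - timedelta(days=120)
for c in policy["copies"]:
    print(c["target"], "expired" if copy_expired(c, written) else "retained")
```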

 

Local vs Remote Storage

Local

There are a few different options for local storage, including BlackPearl Object Storage Disk, functioning as “near line archive”. This is configured with 107 enterprise-quality SATA drives (and they’re looking at introducing 16TB drives next month), providing roughly 1.8PB raw capacity. The drives function as power-down archive drives (using the drive spin-down settings), and the system delivers a level of resilience and reliability by using ZFS as the file system. There are also customer-configurable parity settings. Alternatively, you can pump data to Spectra Tape Libraries, for those of you who still want to use tape as a storage format.

 

Remote Storage Targets

In terms of remote storage targets, BlackPearl can leverage either public cloud, or other BlackPearl devices as replication targets. Replication to BlackPearl can be one way or bi-directional. Public Cloud support is available via Amazon S3 (and S3-like products such as Cloudian and Wasabi), and MS Azure. There is a concept of data immutability in the product, and you can turn on versioning to prevent your data management applications (or users) from accidentally clobbering your data.
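
BlackPearl’s own interface is the Spectra S3 API, so the snippet below isn’t how you’d configure it specifically; it’s just a generic illustration of turning on versioning against an S3-compatible endpoint (the endpoint and bucket name are placeholders):

```python
import boto3

# Placeholder endpoint and bucket -- adjust for your own S3-compatible target.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

s3.put_bucket_versioning(
    Bucket="example-archive-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# With versioning enabled, overwrites and deletes create new versions
# rather than clobbering the original object.
```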

Braunstein also pointed out that tape generations evolve, and BlackPearl has auto-migration capabilities. You can potentially have data migrate transparently from tape to tape (think LTO-6 to LTO-7), tape to disk, and tape to cloud.

 

[image courtesy of Spectra Logic]

In terms of how you leverage BlackPearl, some of that is dependent on the workflows you have in place to move your data. This could be manual, semi-automated, or automated (or potentially purpose built into existing applications). There’s a Spectra S3 RESTful API, and there’s heaps of information on developer.spectralogic.com on how to integrate BlackPearl into your existing applications and media workflows.

 

Thoughts

If you’re listening to the next-generation data protection vendors and big box storage folks, you’d wonder why companies such as Spectra Logic still focus on tape. It’s not because they have a rich heritage and deep experience in the tape market (although they do). There are plenty of use cases where tape still makes sense in terms of its ability to economically store large amounts of data in a relatively secure (off-line if required) fashion. Walk into any reasonably sized film production house and you’ll still see tape in play. From a density perspective (and durability), there’s a lot to like about tape. But BlackPearl is also pretty adept at getting data from workflows that were traditionally file-based and putting them on public cloud environments (the kind of environments that heavily leverage object storage interfaces). Sure, you can pump the data up to AWS yourself if you’re so inclined, but the real benefit of the BlackPearl approach, in my opinion, is that it’s policy-driven and fully automated. There’s less chance that you’ll fat finger the transfer of critical data to another location. This gives you the ability to focus on your core business, and not have to worry about data management.

I’ve barely scratched the surface of what BlackPearl can do, and I recommend checking out their product site for more information.

Aparavi Announces Enhancements, Makes A Good Thing Better

I recently had the opportunity to speak to Victoria Grey (CMO) and Jonathan Calmes (VP Business Development) from Aparavi regarding some updates to their Active Archive solution. If you’re a regular reader, you may remember I’m quite a fan of Aparavi’s approach. I thought I’d share some of my thoughts on the announcement here.

 

Aparavi?

According to Aparavi, Active Archive delivers “SaaS-based Intelligent, Multi-Cloud Data Management”. The idea is that:

  • Data is archived to cloud or on-premises based on policies for long-term lifecycle management;
  • Data is organised for easy access and retrieval; and
  • Data is accessible via Contextual Search.

Sounds pretty neat. So what’s new?

 

What’s New?

Direct-to-cloud

Direct-to-cloud provides the ability to archive data directly from source systems to the cloud destination of choice, with minimal local storage requirements. Instead of having to store archive data locally, you can now send bits of it straight to cloud, minimising your on-premises footprint.

  • Now supporting AWS, Backblaze B2, Caringo, Google, IBM Cloud, Microsoft Azure, Oracle Cloud, Scality, and Wasabi;
  • Trickle or bulk data migration – Adding bulk migration of data from one storage destination to another; and
  • Dynamic translation from cloud to cloud.

[image courtesy of Aparavi]

Data Classification

The Active Archive solution can now index, classify, and tag archived data. This makes it simple to classify data based on individual words, phrases, dates, file types, and patterns. Users can easily identify and tag data for future retrieval purposes such as compliance, reference, or analysis.

  • Customisable taxonomy using specific words, phrases, patterns, or meta-data
  • Pre-set classifications of “legal”, “confidential”, and PII
  • Easy to add new ad hoc classifications at any time
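
Aparavi’s classification engine is obviously more sophisticated than anything shown here, but as a rough sketch of what pattern-based tagging looks like in principle (the patterns are simplistic and purely illustrative; real PII detection is much harder):

```python
import re

# Simplistic, illustrative patterns only -- not Aparavi's classifiers.
CLASSIFIERS = {
    "PII": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # US SSN-style pattern
    "confidential": re.compile(r"\bconfidential\b", re.I),
    "legal": re.compile(r"\b(contract|agreement|litigation)\b", re.I),
}

def classify(text: str) -> list[str]:
    """Return the classification tags whose pattern matches the text."""
    return [tag for tag, pattern in CLASSIFIERS.items() if pattern.search(text)]

print(classify("This CONFIDENTIAL agreement covers employee 123-45-6789."))
# -> ['PII', 'confidential', 'legal']
```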

Advanced Archive Search

Intuitive query interface

  • Search by metadata including classifications, tag, dates, file name, file type, optionally with wildcards
  • Search within document content using words, phrases, patterns, and complex queries
  • Searches across all locations
  • Contextual Search: produces results of the match within context
  • No retrieval until file is selected; no egress fees until retrieved

 

Conclusion

I was pretty enthusiastic about Aparavi when they came out of stealth, and I’m excited about some of the new features they’ve added to the solution. Data management is a hard nut to crack. Primarily because a lot of different organisations have a lot of different requirements for storing data long term. And there are a lot of different types of data that need to be stored. Aparavi isn’t a silver bullet for data management by any stretch, but it certainly seems to meet a lot of the foundational requirements for a solid archive strategy. There are some excellent options in terms of storage by location, search, and organisation.

The cool thing isn’t just that they’ve developed a solid multi-cloud story. Rather, it’s that there are options when it comes to the type of data mobility the user might require. They can choose to do bulk migrations, or take it slower by trickling data to the destination. This provides for some neat flexibility in terms of infrastructure requirements and windows of opportunity. It strikes me that it’s the sort of solution that can be tailored to work with a business’s requirements, rather than pushing it in a certain direction.

I’m also a big fan of Aparavi’s “Open Data” access approach, with an open API that “enables access to archived data for use outside of Aparavi”, along with a published data format for independent data access. It’s a nice change from platforms that feel the need to lock data into proprietary formats in order to store it long term. There’s a good chance the type of data you want to archive in the long term will be around longer than some of these archive solutions, so it’s nice to know you’ve got a chance of getting the data back if something doesn’t work out for the archive software vendor. I think it’s worth keeping an eye on Aparavi, as they seem to be taking a fresh approach to what has become a vexing problem for many.

Cloudtenna Announces DirectSearch

 

I had the opportunity to speak to Aaron Ganek about Cloudtenna and their DirectSearch product recently and thought I’d share some thoughts here. Cloudtenna recently announced $4M in seed funding, have Citrix as a key strategic partner, and are shipping a beta product today. Their goal is “[b]ringing order to file chaos!”.

 

The Problem

Ganek told me that there are three major issues with file management and the plethora of collaboration tools used in the modern enterprise:

  • Search is too much effort
  • Security tends to fall through the cracks
  • Enterprise IT is dangerously non-compliant

Search

Most of these collaboration tools are geared up for search, because people don’t tend to remember where they put files, or what they’ve called them. So you might have some files in your corporate Box account, and some in Dropbox, and then some sitting in Confluence. The problem with trying to find something is that you need to search each application individually. According to Cloudtenna, this:

  • Wastes time;
  • Leads to frustration; and
  • Often yields poor results.

Security

Security also becomes a problem when you have multiple storage repositories for corporate files.

  • There are too many apps to manage
  • It’s difficult to track users across applications
  • There’s no consolidated audit trail

Exposure

As a result of this, enterprises find themselves facing exposure to litigation, primarily because they can’t answer these questions:

  • Who accessed what?
  • When and from where?
  • What changed?

As some of my friends like to say “people die from exposure”.

 

Cloudtenna – The DirectSearch Solution

Enter DirectSearch. At its core it’s a SaaS offering that:

  • Catalogues file activity across disparate data silos; and
  • Delivers machine learning services to mitigate the “chaos”.

Basically you point it at all of your data repositories and you can then search across all of those from one screen. The cool thing about the catalogue is that it not only tracks metadata and leverages full-text indexing, it also tracks user activity. It supports a variety of on-premises, cloud and SaaS applications (6 at the moment, 16 by September). You only need to log in once and there’s full ACL support – so users can only see what they’re meant to see.
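
Cloudtenna hasn’t shared implementation details with me, so this is nothing more than a toy illustration of the general idea: a single catalogue of file metadata drawn from multiple silos, filtered by what the searching user is allowed to see (all of the data is invented):

```python
# Toy illustration only -- invented data, not Cloudtenna's implementation.
catalogue = [
    {"name": "q3-budget.xlsx", "source": "Box", "allowed": {"alice", "bob"}},
    {"name": "launch-plan.docx", "source": "Dropbox", "allowed": {"alice"}},
    {"name": "runbook.md", "source": "Confluence", "allowed": {"bob", "carol"}},
]

def search(term: str, user: str) -> list[dict]:
    """Search every silo from one place, returning only what the user may see."""
    return [
        entry for entry in catalogue
        if term.lower() in entry["name"].lower() and user in entry["allowed"]
    ]

print(search("plan", "alice"))  # hits the Dropbox file
print(search("plan", "carol"))  # same query, no results -- ACLs respected
```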

According to Ganek, it also delivers some pretty fast search results, in the order of 400 – 600ms.

[image courtesy of Cloudtenna]

I was interested to know a little more about how the machine learning could identify files that were being worked on by people in the same workgroup. Ganek said they didn’t rely on Active Directory group membership, as these were often outdated. Instead, they tracked file activity to create a “Shadow IT organisational chart” that could be used to identify who was collaborating on what, and tailor the search results accordingly.
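
Ganek didn’t go into the mechanics of how that chart gets built, so the following is just a sketch of the general idea: infer who collaborates with whom from shared file activity rather than from directory group membership (the activity events are made up):

```python
from collections import defaultdict
from itertools import combinations

# Made-up activity events: (user, file) pairs observed in an audit trail.
events = [
    ("alice", "roadmap.pptx"), ("bob", "roadmap.pptx"),
    ("alice", "budget.xlsx"), ("carol", "budget.xlsx"),
    ("bob", "runbook.md"),
]

# Group users by the files they touched, then count co-occurrences.
users_per_file = defaultdict(set)
for user, filename in events:
    users_per_file[filename].add(user)

collaboration = defaultdict(int)
for users in users_per_file.values():
    for pair in combinations(sorted(users), 2):
        collaboration[pair] += 1

print(dict(collaboration))
# e.g. {('alice', 'bob'): 1, ('alice', 'carol'): 1} -- an inferred "shadow org chart"
```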

 

Thoughts and Further Reading

I’ve spent a good part of my career in the data centre providing storage solutions for enterprises to host their critical data on. I talk a lot about data and how important it is to the business. I’ve worked at some established companies where thousands of files are created every day and terabytes of data are moved around. Almost without fail, file management has been a pain in the rear. Whether I’ve been using Box to collaborate, or sending links to files with Dropbox, or been stuck using Microsoft Teams (great for collaboration but hopeless from a management perspective), invariably files get misplaced or I find myself firing up a search window to try and track down this file or that one. It’s a mess because we don’t just work from a single desktop and carefully curated filesystem any more. We’re creating files on mobile devices, emailing them about, and gathering data from systems that don’t necessarily play well on some platforms. It’s a mess, but we need access to the data to get our jobs done. That’s why something like Cloudtenna has my attention. I’m looking forward to seeing them progress with the beta of DirectSearch, and I have a feeling they’re on to something pretty cool with their product. You can also read Rich’s thoughts on Cloudtenna over at the Gestalt IT website.