Random Short Take #13

Here are a few links to some random news items and other content that I found interesting. You might find them interesting too. Let’s dive into lucky number 13.

Cohesity Marketplace – A Few Notes


Cohesity first announced their Marketplace offering in late February. I have access to a Cohesity environment (physical and virtual) in my lab, and I’ve recently had the opportunity to get up and running on some of the Marketplace-ready code, so I thought I’d share my experiences here.


Prerequisites

I’m currently running version 6.2 of Cohesity’s DataPlatform. I’m not sure whether this is widely available yet or still only available for early adopter testing. My understanding is that the Marketplace feature will be made generally available to Cohesity customers when 6.3 ships. The Cohesity team did install a minor patch (6.2a) on my cluster as it contained some small but necessary fixes. In this version of the code, a gflag is set to show the Apps menu. The “Enable Apps Management” option in the UI (under Admin – Cluster Settings) was also enabled. You’ll also need to nominate an unused private subnet for the apps to use.
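
On the subnet side, anything private and unused in your environment will do. Purely as a generic illustration (a Python sketch using the standard library, not a Cohesity tool), you can sanity-check a candidate range against the networks you already have in use before you nominate it:

import ipaddress

# Networks already allocated in the environment (example values only)
existing = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("192.168.10.0/24"),
]

# Candidate range to nominate for the apps infrastructure (hypothetical)
candidate = ipaddress.ip_network("172.16.50.0/24")

# A sensible candidate is RFC 1918 private and doesn't overlap anything in use
if candidate.is_private and not any(candidate.overlaps(net) for net in existing):
    print(f"{candidate} looks safe to nominate for apps")
else:
    print(f"{candidate} is either not private or clashes with an existing network")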


Current Application Availability

The Cohesity Marketplace has a number of Cohesity-developed and third-party apps available to install, including:

  • Splunk – Turn machine data into answers
  • SentinelOne – AI-powered threat prevention purpose built for Cohesity
  • Imanis Data – NoSQL backup, recovery, and replication
  • Cohesity Spotlight – Analyse file audit logs and find anomalous file-access patterns
  • Cohesity Insight – Search inside unstructured data
  • Cohesity EasyScript – Create, upload, and execute customised scripts
  • ClamAV – Anti-virus scans for file data

Note that none of the apps need more than Read permissions on the nominated View(s).


Process

App Installation

To install the app you want to run on your cluster, click on “Get App”, then enter your Helios credentials.

Review the EULA and click on “Accept & Get” to proceed. You’ll then be prompted to select the cluster(s) you want to deploy the app on. In this example, I have 5 clusters in my Helios environment. I want to install the app on C1, as it’s the physical cluster.

Using An App

Once your app is installed, it’s fairly straightforward to run it. Click on More, then Apps to access your installed apps.


Then you just need to click on “Run App” to get started.

You’ll be prompted to set the Read Permissions for the App, along with QoS. It’s my understanding that the QoS settings are relative to other apps running on the cluster, not data protection activities and the like. The Read Permissions are applied to one or more Views, and can be changed after the initial configuration. Once the app is running, you can click on “Open App”. In this example I’m using the Cohesity Insight app to look through some unstructured data stored on a View.


Thoughts

I’ve barely scratched the surface of what you can achieve with the Marketplace on Cohesity’s DataPlatform. The availability of the Marketplace (and the ability to run apps on the platform) is another step closer to Cohesity’s vision of extracting additional value from secondary storage. Coupled with Cohesity’s C4000 series hardware (or perhaps whatever flavour you want to run from Cisco or HPE or the like), I can imagine you’re going to be able to do a heck of a lot with this capability, particularly as more apps are validated with the platform.

I hope to do a lot more testing of this capability over the next little while, and I’ll endeavour to report back with my findings. If you’re a current Cohesity customer and haven’t talked to your account team about this capability, it’s worth getting in touch to see what you can do in terms of an evaluation. Of course, it’s also worth noting that, as with most things technology related, just because you can, doesn’t always mean you should. But if you have the use case, this is a cool capability on top of an already interesting platform.

Aparavi Announces Enhancements, Makes A Good Thing Better

I recently had the opportunity to speak to Victoria Grey (CMO) and Jonathan Calmes (VP Business Development) from Aparavi regarding some updates to their Active Archive solution. If you’re a regular reader, you may remember I’m quite a fan of Aparavi’s approach. I thought I’d share some of my thoughts on the announcement here.


Aparavi?

According to Aparavi, Active Archive delivers “SaaS-based Intelligent, Multi-Cloud Data Management”. The idea is that:

  • Data is archived to cloud or on-premises based on policies for long-term lifecycle management;
  • Data is organised for easy access and retrieval; and
  • Data is accessible via Contextual Search.

Sounds pretty neat. So what’s new?


What’s New?

Direct-to-cloud

Direct-to-cloud provides the ability to archive data directly from source systems to the cloud destination of choice, with minimal local storage requirements. Instead of having to store archive data locally, you can now send bits of it straight to cloud, minimising your on-premises footprint.

  • Now supporting AWS, Backblaze B2, Caringo, Google, IBM Cloud, Microsoft Azure, Oracle Cloud, Scality, and Wasabi;
  • Trickle or bulk data migration – Adding bulk migration of data from one storage destination to another; and
  • Dynamic translation from cloud to cloud.

[image courtesy of Aparavi]
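
The trickle versus bulk distinction is really just about how aggressively data is pushed to the destination. Purely as a generic illustration of the concept (my own Python sketch, nothing to do with Aparavi’s actual implementation), the same copy loop can behave either way depending on whether a bandwidth cap is applied:

import time

def migrate(objects, send, max_bytes_per_sec=None):
    """Copy (name, data) objects to a destination via send().

    With no cap this behaves like a bulk migration; with a cap set it
    trickles data out at roughly that rate instead.
    """
    for name, data in objects:
        send(name, data)
        if max_bytes_per_sec:
            # Pause long enough to keep the average rate under the cap
            time.sleep(len(data) / max_bytes_per_sec)

# "send" here is just a stand-in for an upload to the chosen cloud target
uploaded = []
objects = [("file1.bin", b"x" * 500_000), ("file2.bin", b"y" * 500_000)]

migrate(objects, send=lambda name, data: uploaded.append(name))                             # bulk
migrate(objects, send=lambda name, data: uploaded.append(name), max_bytes_per_sec=250_000)  # trickle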

Data Classification

The Active Archive solution can now index, classify, and tag archived data. This makes it simple to classify data based on individual words, phrases, dates, file types, and patterns. Users can easily identify and tag data for future retrieval purposes such as compliance, reference, or analysis.

  • Customisable taxonomy using specific words, phrases, patterns, or meta-data
  • Pre-set classifications of “legal”, “confidential”, and PII
  • Easy to add new ad-hoc classifications at any time
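
To make the classification idea a little more concrete, here’s a minimal, generic sketch of pattern-based tagging (my own Python illustration; Aparavi’s taxonomy engine is obviously more sophisticated than a handful of regular expressions):

import re

# Hypothetical taxonomy: classification tag -> patterns that trigger it
TAXONOMY = {
    "legal": [re.compile(r"\bnon-disclosure agreement\b", re.I)],
    "confidential": [re.compile(r"\bconfidential\b", re.I)],
    "pii": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-style number
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email address
    ],
}

def classify(text):
    """Return the set of tags whose patterns match the supplied text."""
    return {tag for tag, patterns in TAXONOMY.items()
            if any(p.search(text) for p in patterns)}

print(classify("Please treat this as Confidential. Contact jo@example.com."))
# Prints something like {'confidential', 'pii'} (set order varies)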

Advanced Archive Search

Intuitive query interface

  • Search by metadata including classifications, tags, dates, file name, and file type, optionally with wildcards
  • Search within document content using words, phrases, patterns, and complex queries
  • Searches across all locations
  • Contextual Search: produces results of the match within context
  • No retrieval until file is selected; no egress fees until retrieved
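
Again purely as an illustration of the metadata-plus-wildcards idea (a generic Python sketch, not Aparavi’s query language), a search across an index of archived objects might look something like this, with nothing actually retrieved from the archive until a result is selected:

from fnmatch import fnmatch

# A toy metadata index spanning multiple locations (example entries only)
index = [
    {"name": "contract_2017.docx", "type": "docx", "tags": ["legal"], "location": "aws"},
    {"name": "board_minutes.pdf", "type": "pdf", "tags": ["confidential"], "location": "on-premises"},
    {"name": "holiday_photo.jpg", "type": "jpg", "tags": [], "location": "wasabi"},
]

def search(index, name_pattern="*", file_type=None, tag=None):
    """Filter the index by wildcard file name, file type, and classification tag."""
    for item in index:
        if not fnmatch(item["name"], name_pattern):
            continue
        if file_type and item["type"] != file_type:
            continue
        if tag and tag not in item["tags"]:
            continue
        yield item

# Only matching index entries come back; the files themselves stay in the archive
for hit in search(index, name_pattern="*2017*", tag="legal"):
    print(hit["name"], "->", hit["location"])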


Conclusion

I was pretty enthusiastic about Aparavi when they came out of stealth, and I’m excited about some of the new features they’ve added to the solution. Data management is a hard nut to crack, primarily because a lot of different organisations have a lot of different requirements for storing data long term, and there are a lot of different types of data that need to be stored. Aparavi isn’t a silver bullet for data management by any stretch, but it certainly seems to meet a lot of the foundational requirements for a solid archive strategy. There are some excellent options in terms of storage by location, search, and organisation.

The cool thing isn’t just that they’ve developed a solid multi-cloud story. Rather, it’s that there are options when it comes to the type of data mobility the user might require. They can choose to do bulk migrations, or take it slower by trickling data to the destination. This provides for some neat flexibility in terms of infrastructure requirements and windows of opportunity. It strikes me that it’s the sort of solution that can be tailored to work with a business’s requirements, rather than pushing it in a certain direction.

I’m also a big fan of Aparavi’s “Open Data” access approach, with an open API that “enables access to archived data for use outside of Aparavi”, along with a published data format for independent data access. It’s a nice change from platforms that feel the need to lock data into proprietary formats in order to store it long term. There’s a good chance the type of data you want to archive in the long term will be around longer than some of these archive solutions, so it’s nice to know you’ve got a chance of getting the data back if something doesn’t work out for the archive software vendor. I think it’s worth keeping an eye on Aparavi, as they seem to be taking a fresh approach to what has become a vexing problem for many.

IBM Spectrum Protect Plus – More Than Meets The Eye

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

IBM recently presented at Storage Field Day 18. You can see videos of their presentation here, and download my rough notes from here.


We Want A Lot From Data Protection

Data protection isn’t just about periodic protection of applications or files any more. Or, at the very least, we seem to want more than that from our data protection solutions. We want:

  • Application / data recovery – providing data availability;
  • Disaster Recovery – recovering from a minor to major data loss;
  • BCP – reducing risk to the business, employees, and market perception;
  • Application / data reuse – utilising data for new routes to market; and
  • Cyber resiliency – recovering the business after a compromise or attack.

There’s a lot to cover there. And it could be argued that you’d need five different solutions to meet those requirements successfully. With IBM Spectrum Protect Plus (SPP) though, you’re able to meet a number of those requirements.


There’s Much That Can Be Done

IBM are positioning SPP as a tool that can help you extend your protection options beyond the traditional periodic data protection solution. You can use it for:

  • Data management / operational recovery – modernised and expanded use cases with instant data access and instant recovery leveraging snapshots;
  • Backup – traditional backup / recovery using streaming backups; and
  • Archive – long-term data retention / compliance, corporate governance.


Key Design Principles

Easy Setup

  • Deploy Anywhere: virtual appliance, cloud, bare metal;
  • Zero touch application agents;
  • Automated deployment for IBM Cloud for VMware; and
  • IBM SPP Blueprints.

The benefits of this include:

  • Easy to get started;
  • Reduced deployment costs; and
  • Hybrid and multi-cloud configurations.

Protect

  • Protect databases and applications hosted on-premises or in cloud;
  • Incremental forever using native hypervisor, database, and OS APIs; and
  • Efficient data reduction using deduplication and compression.

The benefits of this include:

  • Efficiency through reduced storage and network usage;
  • Stringent RPO compliance with a reduced backup window; and
  • Application backup with multi-cloud portability.

Manage

  • Centralised, SLA-driven management;
  • Simple, secure, RBAC-based user self-service; and
  • Lifecycle management of space efficient point-in-time snapshots.

The benefits of this include:

  • Lower TCO by reducing operational costs;
  • Consistent management / governance of multi-cloud environments; and
  • Secure by design with RBAC.
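
The point of SLA-driven management is that you describe the outcome you want (frequency, retention, target) rather than scripting individual jobs. As a generic sketch of the concept (my own Python illustration, not IBM’s actual policy schema), an SLA policy might capture something like this:

from dataclasses import dataclass

@dataclass
class SLAPolicy:
    """A simplified, illustrative SLA-style protection policy."""
    name: str
    frequency_hours: int   # how often a recovery point is created
    retention_days: int    # how long recovery points are kept
    target: str            # e.g. "on-premises disk", "object storage", "cloud"

    def recovery_points_retained(self) -> int:
        # Rough count of point-in-time copies kept under this policy
        return (self.retention_days * 24) // self.frequency_hours

gold = SLAPolicy(name="Gold", frequency_hours=4, retention_days=30, target="cloud")
print(gold.name, "keeps roughly", gold.recovery_points_retained(), "recovery points")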

Recover, Reuse

  • Instant access / sandbox for DevOps and test environments;
  • Recover applications in cloud or data centre; and
  • Global file search and recovery.

The benefits of this include:

  • Improved RTO via instant access;
  • Eliminate time spent finding the right copy (file search across all snapshots with a globally indexed namespace);
  • Data reuse (versus backup as just an insurance policy); and
  • Improved agility – efficiently capture and use a copy of production data for testing.


One Workflow, Multiple Use Cases

There’s a lot you can do with SPP, and the following diagram shows the breadth of the solution.

[image courtesy of IBM]


Thoughts and Further Reading

When I first encountered IBM SPP at Storage Field Day 15, I was impressed with their approach to policy-driven protection. It’s my opinion that we’re asking more and more of modern data protection solutions. We don’t just want to use them as insurance for our data and applications any more. We want to extract value from the data. We want to use the data as part of test and development workflows. And we want to manipulate the data we’re protecting in ways that have proven difficult in years gone by. It’s not just about having a secondary copy of an important file sitting somewhere safe. Nor is it just about using that data to refresh an application so we can test it with current business problems. It’s all of those things and more. This adds complexity to the solution, as many people who’ve administered data protection solutions have found out over the years. To this end, IBM have worked hard with SPP to ensure that it’s a relatively simple process to get up and running, and that you can do what you need out of the box with minimal fuss.

If you’re already operating in the IBM ecosystem, a solution like SPP can make a lot of sense, as there are some excellent integration points available with other parts of the IBM portfolio. That said, there’s no reason you can’t benefit from SPP as a standalone offering. All of the normal features you’d expect in a modern data protection platform are present, and there’s good support for enhanced protection use cases, such as analytics.

Enrico had some interesting thoughts on IBM’s data protection lineup here, and Chin-Fah had a bit to say here.

Random Short Take #12

Here are a few links to some random news items and other content that I found interesting. You might find it interesting too. Maybe.

  • I’ve been a fan of Backblaze for some time now, and I find their blog posts useful. This one, entitled “A Workflow Playbook for Migrating Your Media Assets to a MAM”, was of particular interest to me.
  • Speaking of Backblaze, this article on SSDs and reliability should prove useful, particularly if you’re new to the technology. And the salty comments from various readers are great too.
  • Zerto just announced the myZerto Labs Program as a way for “IT professionals to test, understand and experiment with the IT Resilience Platform using virtual infrastructure”. You can sign up here.
  • If you’re in the area, I’m speaking at the Sydney VMUG UserCon on Tuesday 19th March. I’ll be covering how to “Build Your Personal Brand by Starting and Maintaining a Blog”. It’s more about blogging than branding, but I’m hoping there’s enough to keep the punters engaged. Details here. If you can’t get along to the event, I’ll likely publish the deck on this site in the near future.
  • The nice people at Axellio had some success at the US Air Force Pitch Day recently. You can read more about that here.
  • UltraViolet is going away. This kind of thing is disheartening (and a big reason why I persist in buying physical copies of things still).
  • I’m heading to Dell Technologies World this year. Michael was on the TV recently, talking about the journey and looking ahead. You can see more here.

Cohesity Is (Data)Locked In

Disclaimer: I recently attended Storage Field Day 18.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Cohesity recently presented at Storage Field Day 18. You can see their videos from Storage Field Day 18 here, and download a PDF copy of my rough notes from here.


The Cohesity Difference?

Cohesity covered a number of different topics in its presentation, and I thought I’d outline some of the Cohesity features before I jump into the meat and potatoes of my article. Some of the key things you get with Cohesity are:

  • Global space efficiency;
  • Data mobility;
  • Data resiliency & compliance;
  • Instant mass restore; and
  • Apps integration.

I’m going to cover three of the five here, and you can check the videos for details of the Cohesity Marketplace and the Instant Mass Restore demonstration.

Global Space Efficiency

One of the big selling points for the Cohesity data platform is the ability to deliver data reduction and small file optimisation.

  • Global deduplication
    • Modes: inline, post-process
  • Archive to cloud is also deduplicated
  • Compression
    • Zstandard algorithm (read more about that here)
  • Small file optimisation
    • Better performance for reads and writes
    • Benefits from deduplication and compression
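
To give a feel for what deduplication plus compression is doing under the covers, here’s a heavily simplified sketch (my own Python illustration, not Cohesity’s implementation; it assumes the third-party zstandard package is installed):

import hashlib
import zstandard as zstd

CHUNK_SIZE = 8192
store = {}                                  # chunk hash -> compressed chunk (the shared pool)
compressor = zstd.ZstdCompressor(level=3)   # Zstandard, the algorithm mentioned above

def ingest(data):
    """Chunk the data, dedupe on chunk hash, and compress only new chunks."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:             # each unique chunk is stored only once
            store[digest] = compressor.compress(chunk)
        recipe.append(digest)
    return recipe                           # enough information to rebuild the object later

first = ingest(b"hello world" * 10_000)
second = ingest(b"hello world" * 10_000)    # a second copy adds no new chunks
logical = 2 * 11 * 10_000
physical = sum(len(c) for c in store.values())
print(f"logical {logical} bytes, physical {physical} bytes stored")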

Data Mobility

There’s also an excellent story when it comes to data mobility, with the platform delivering the following data mobility features:

  • Data portability across clouds
  • Multi-cloud replication and archival (1:many)
  • Integrated indexing and search across locations

You also get simultaneous, multi-protocol access and a comprehensive set of file permissions to work with.


But What About Archives And Stuff?

Okay, so all of that stuff is really cool, and I could stop there and you’d probably be happy enough that Cohesity delivers the goods when it comes to a secondary storage platform with a broad set of features. In my opinion, though, it gets a lot more interesting when you have a look at some of the archival features that are built into the platform.

Flexible Archive Solutions

  • Archive either on-premises or to cloud;
  • Policy-driven archival schedules for long-term data retention;
  • Data can be retrieved to the same or a different Cohesity cluster; and
  • Archived data is subject to further deduplication.

Data Resiliency and Compliance – ensures data integrity

  • Erasure coding;
  • Highly available; and
  • DataLock and legal hold.

Achieving Compliance with File-level DataLock

In my opinion, DataLock is where it gets interesting in terms of archive compliance.

  • DataLock enables WORM functionality at a file level;
  • DataLock adheres to regulatory acts;
  • Can automatically lock a file after a period of inactivity;
  • Files can be locked manually by setting file attributes;
  • Minimum and maximum retention times can be set; and
  • Cohesity provides a unique RBAC role for Data Security administration.
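
The mechanics of file-level WORM are conceptually simple, even if the compliance detail isn’t. As a rough, generic sketch of the idea (my own Python illustration, not Cohesity’s DataLock implementation):

from datetime import datetime, timedelta

class WormFile:
    """A toy model of a WORM-locked file with a minimum retention period."""

    def __init__(self, name, min_retention_days):
        self.name = name
        self.locked_at = None
        self.min_retention = timedelta(days=min_retention_days)

    def lock(self):
        # A real platform might also apply this automatically after a
        # configurable period of inactivity
        self.locked_at = datetime.now()

    def can_modify(self, now=None):
        """A locked file can't be changed until its minimum retention expires."""
        if self.locked_at is None:
            return True
        now = now or datetime.now()
        return now >= self.locked_at + self.min_retention

audit_log = WormFile("audit.log", min_retention_days=365)
audit_log.lock()
print(audit_log.can_modify())   # False until a year after the lock was applied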

DataLock on Backups

  • DataLock enables WORM functionality;
  • Prevent changes by locking Snapshots;
  • Applied via backup policy; and
  • Operations performed by Data Security administrators.


Ransomware Detection

Cohesity also recently announced the ability to look for ransomware from within Helios. The approach taken is as follows: Prevent. Detect. Respond.

Prevent

There’s some good stuff built into the platform to help prevent ransomware in the first place, including:

  • Immutable file system
  • DataLock (WORM)
  • Multi-factor authentication

Detect

  • Machine-driven anomaly detection (backup data, unstructured data)
  • Automated alert
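
The anomaly detection piece essentially boils down to watching how much data changes between backup runs and flagging the outliers. A deliberately simple sketch of the concept (my own Python illustration, not Cohesity’s actual model):

from statistics import mean, stdev

# Fraction of data changed in each of the last several backup runs
history = [0.02, 0.03, 0.025, 0.02, 0.03, 0.028, 0.022]

def looks_anomalous(change_rate, history, threshold=3.0):
    """Flag a change rate that sits several standard deviations above normal."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (change_rate - mu) / sigma > threshold

# Ransomware encrypting data tends to show up as a sudden spike in change rate
print(looks_anomalous(0.45, history))   # True -> raise an alert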

Respond

  • Scalable file system to store years’ worth of backup copies
  • Google-like global actionable search
  • Instant mass restore


Thoughts and Further Reading

The conversation with Cohesity got a little spirited in places at Storage Field Day 18. This isn’t unusual, as Cohesity has had some problems in the past with various folks not getting what they’re on about. Is it data protection? Is it scale-out NAS? Is it an analytics platform? There’s a lot going on here, and plenty of people (both inside and outside Cohesity) have had a chop at articulating the real value of the solution. I’m not here to tell you what it is or isn’t. I do know that a lot of the cool stuff with Cohesity wasn’t readily apparent to me until I actually had some stick time with the platform and had a chance to see some of its key features in action.

The DataLock / Security and Compliance piece is interesting to me though. I’m continually asking vendors what they’re doing in terms of archive platforms. A lot of them look at me like I’m high. Why wouldn’t you just use software to dump your old files up to the cloud or onto some cheap and deep storage in your data centre? After all, aren’t we all using software-defined data centres now? That’s certainly an option, but what happens when that data gets zapped? What if the storage platform you’re using, or the software you’re using to store the archive data, goes bad and deletes the data you’re managing with it? Features such as DataLock can help protect you from some really bad things happening.

I don’t believe that data protection data should be treated as an “archive” as such, although I think that data protection platform vendors such as Cohesity are well placed to deliver “archive-like” solutions for enterprises that need to retain protection data for long periods of time. I still think that pushing archive data to another, dedicated, tier is a better option than simply calling old protection data “archival”. Given Cohesity’s NAS capabilities, it makes sense that they’d be an attractive storage target for dedicated archive software solutions.

I like what Cohesity have delivered to date in terms of a platform that can be used to deliver data insights to derive value for the business. I think sometimes the message is a little muddled, but in my opinion some of that is because everyone’s looking for something different from these kinds of platforms. And these kinds of platforms can do an awful lot of things nowadays, thanks in part to some pretty smart software and some grunty hardware. You can read some more about Cohesity’s Security and Compliance story here,  and there’s a fascinating (if a little dated) report from Cohasset Associates on Cohesity’s compliance capabilities that you can access here. My good friend Keith Townsend also provided some thoughts on Cohesity that you can read here.

Veeam Vanguard 2019

I was very pleased to get an email from Rick Vanover yesterday letting me know I was accepted as part of the Veeam Vanguard Program for 2019. This is my first time as part of this program, but I’m really looking forward to participating in it. Big shout out to Dilupa Ranatunga and Anthony Spiteri for nominating me in the first place, and to Rick and the team for having me as part of the program. Also (and I’m getting a bit parochial here), a special mention for the three other Queenslanders in the program (Rhys Hammond, Nathan Oldfield, and Chris Gecks). There’s going to be a lot of cool stuff happening with Veeam and in data protection generally this year and I can’t wait to get started. More soon.

Imanis Data and MDL autoMation Case Study

Background

I’ve covered Imanis Data in the past, but am the first to admit that their focus area is not something I’m involved with on a daily basis. They recently posted a press release covering a customer success story with MDL autoMation. I had the opportunity to speak with both Peter Smails from Imanis Data, as well as Eric Gutmann from MDL autoMation. Whilst I enjoy speaking to vendors about their successes in the market, I’m even more intrigued by customer champions and what they have to say about their experience with a vendor’s offering. It’s one thing to talk about what you’ve come up with as a product, and how you think it might work well in the real world. It’s entirely another thing to have a customer take the time to speak to people on your behalf and talk about how your product works for them. Ultimately, these are usually interesting conversations, and it’s always useful for me to hear about how various technologies are applied in the real world. Note that I spoke to them separately, so Gutmann wasn’t being pushed in a certain direction by Imanis Data – he’s just really enthusiastic about the solution.


The Case Study

The Customer

Founded in 2006, MDL autoMation (MDL) is “one of the automotive industry’s leaders in the application of IoT and SaaS-based technologies for process improvement, automated customer recognition, vehicle tracking and monitoring, personalised customer service and sales, and inventory management”. Gutmann explained to me that for them, “every single customer is a VIP”. There’s a lot of stuff happening on the back-end to make sure that the customer’s experience is an extremely smooth one. MongoDB provides the foundation for the solution. When they first deployed the environment, they used MongoDB Cloud Manager to protect the environment, but struggled to get it to deliver the results they required.


Key Challenges

MDL moved to another provider, and spent approximately six months getting it running. It worked well at the time, and met their requirements, saving them money and delivering quick on-premises backups and quick restores. There were a few issues though, including the:

  • Cost and complexity of backup and recovery for 15-node, sharded, MongoDB deployment across three data centres;
  • Time and complexity associated with daily refresh to non-sharded QA test cluster (it would take 2 days to refresh QA); and
  • Inability to use Active Directory for user access control.


Why Imanis Data?

So what got Gutmann and MDL excited about Imanis Data? There were a few reasons that Eric outlined for me, including:

  • 10x backup storage efficiency;
  • 26x faster QA refresh time – incremental restore;
  • 95% reduction in the number of policies to manage – thanks to the enterprise policy engine, the number of policies to manage was reduced from 40 to 2; and
  • Native integration with Active Directory.

It was cheaper again than the previous provider, and, as Gutmann puts it, “[i]t took literally hours to implement the Imanis product”. MDL are currently protecting 1.6TB of data, and it takes 7 minutes every hour to back up any changes.


Conclusion and Further Reading

Data protection is a problem that everyone needs to deal with at some level. Whether you have “traditional” infrastructure delivering your applications, or one of those fancy new NoSQL environments, you still need to protect your stuff. There are a lot of built-in features with MongoDB to ensure it’s resilient, but keeping the data safe is another matter. Coupled with that is the fact that developers have relied on data recovery activities to get data into quality assurance environments for years now. Add all that together and you start to see why customers like MDL are so excited when they come across a solution that does what they need it to do.

Working in IT infrastructure (particularly operations) can be a grind at times. Something always seems to be broken or about to break. Something always seems to be going a little bit wrong. The best you can hope for at times is that you can buy products that do what you need them to do to ensure that you can produce value for the business. I think Imanis Data have a good story to tell in terms of the features they offer to protect these kinds of environments. It’s also refreshing to see a customer that is as enthusiastic as MDL is about the functionality and performance of the product, and the engagement as a whole. And as Gutmann pointed out to me, his CEO is always excited about the opportunity to save money. There’s no shame in being honest about that requirement – it’s something we all have to deal with one way or another.

Note that neither of us wanted to focus on the previous / displaced solution, as it serves no real purpose to talk about another vendor in a negative light. Just because that product didn’t do what MDL wanted it to do, doesn’t mean that that product wouldn’t suit other customers and their particular use cases. Like everything in life, you need to understand what your needs and wants are, prioritise them, and then look to find solutions that can fulfil those requirements.

Random Short Take #11

Here are a few links to some random news items and other content that I found interesting. You might find it interesting too. Maybe. Happy New Year too. I hope everyone’s feeling fresh and ready to tackle 2019.

  • I’m catching up with the good folks from Scale Computing in the next little while, but in the meantime, here’s what they got up to last year.
  • I’m a fan of the fruit company nowadays, but if I had to build a PC, this would be it (hat tip to Stephen Foskett for the link).
  • QNAP announced the TR-004 over the weekend and I had one delivered on Tuesday. It’s unusual that I have cutting edge consumer hardware in my house, so I’ll be interested to see how it goes.
  • It’s not too late to register for Cohesity’s upcoming Helios webinar. I’m looking forward to running through some demos with Jon Hildebrand and talking about how Helios helps me manage my Cohesity environment on a daily basis.
  • Chris Evans has published NVMe in the Data Centre 2.0 and I recommend checking it out.
  • I went through a basketball card phase in my teens. This article sums up my somewhat confused feelings about the card market (or lack thereof).
  • Elastifile Cloud File System is now available on the AWS Marketplace – you can read more about that here.
  • WekaIO have posted some impressive numbers over at spec.org if you’re into that kind of thing.
  • Applications are still open for vExpert 2019. If you haven’t already applied, I recommend it. The program is invaluable in terms of vendor and community engagement.



Cohesity – Helios Article and Upcoming Webinar

I’ve written about Cohesity’s Helios offering previously, and also wrote a short article on upgrading multiple clusters using Helios. I think it’s a pretty neat offering, so to that end I’ve written an article on Cohesity’s blog about some of the cool stuff you can do with Helios. I’m also privileged to be participating in a webinar in late January with Cohesity’s Jon Hildebrand. We’ll be running through some of these features from a more real-world perspective, including doing silly things like live demos. You can get further details on the webinar here.