This short, sharp piece from JB is the best. Too often I’ve found myself grinding through a TV show because I had high hopes for it, or so many people told me it was great. What I should have realised is that amateur TV critics (i.e. your friends and colleagues) are often like home theatre enthusiasts who have bought their first subwoofer. Whether it’s good or bad, that’s the choice they made, and they need you to endorse that choice so they can feel better about it as well.
Finally, the blog turned 15 years old recently (about a month ago). I’ve been so busy with the day job that I forgot to appropriately mark the occasion. But I thought we should do something. So if you’d like some stickers (I have some small ones for laptops, and some big ones because I can’t measure things properly), send me your address via this contact form and I’ll send you something as a thank you for reading along.
Jeff Geerling seems to do a lot of projects that I either can’t afford to do, or don’t have the time to do. Either way, thanks Jeff. This latest one – Building a fast all-SSD NAS (on a budget) – looked like fun.
You like ransomware? What if I told you you can have it cross-platform? Excited yet? Read Melissa’s article on Multiplatform Ransomware for a more thorough view of what’s going on out there.
Speaking of storage and clouds, Chris M. Evans recently published a series of videos over at Architecting IT where he talks to NetApp’s Matt Watt about the company’s hybrid cloud strategy. You can see it here.
I’ve spent a lot of money over the years trying to find the perfect media streaming device for home. I currently favour the Apple TV 4K, but only because my Boxee Box can’t keep up with more modern codecs. This article on the Best Device for Streaming for Any User – 2022 seems to line up well with my experiences to date, although I admit I haven’t tried the NVIDIA device yet. I do miss playing ISOs over the network with the HD Mediabox 100, but those were simpler times I guess.
In home theatre news, this article on XLR vs RCA – which cable is better? makes for good reading, particularly if you’re starting to convince yourself that you need to take things in a certain direction.
USB-C? Thunderbolt? Whatever it’s called, getting stuff to connect properly to your shiny computers with very few useful ports built-in can be a problem. This article had me giggling and crying at the same time.
Backblaze has come through with the goods again, with this article titled “How to Talk to Your Family About Backups”. I talk to my family all the time about backups (and recovery), and it drives them nuts.
I loved this article from Preston on death in the digital age. It’s a thorough exploration not only of what happens to your data when you shuffle off, but also some of the challenges associated with keeping your digital footprint around after you go.
Finally, if you’re looking at all-flash as an option for your backup infrastructure, it’s worth checking out Chris’s article here. The performance benefits (particularly with recovery) are hard to argue with, but at scale the economics may still be problematic.
The title is a little messy, but think of your unstructured data in the same way you might look at the third drawer down in your kitchen. There’s a bunch of stuff in there and no-one knows what it all does, but you know it has some value. Aparavi describes File Protect and Insight (FPI) as “[f]ile by file data protection and archive for servers, endpoints and storage devices featuring data classification, content level search, and hybrid cloud retention and versioning”. It takes the data you’re not necessarily sure about, and makes it useful. Potentially.
It comes with a range of features out of the box, including:
Policy driven workflows
Encryption (in-flight and at rest)
Data Search and Access
Anywhere / anytime file access
Seamless cloud integration
How Does It Work?
The solution is fairly simple to deploy. There’s a software appliance installed on-premises (this is known as the aggregator). There’s a web-accessible management console, and you configure your sources to be protected via network access.
[image courtesy of Aparavi]
You get the ability to mount backup data from any point in time, and you can share that mount point over the network so users can access the data directly. Regardless of where you end up storing the data, the index stays on-premises, and searches run against the index, not the source. This keeps searches fast and avoids unnecessary trips to the storage target. There’s also a good story to be had in terms of cloud provider compatibility. And if you’re looking to work with an on-premises / generic S3 provider, chances are high that the solution won’t have too many issues with that either.
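The local-index idea can be sketched in a few lines. This is purely illustrative (the schema, fields, and object locations are made up, not Aparavi’s actual data model): file metadata lives in a local index, searches are answered from that index, and remote storage is only touched when a file is actually retrieved.

```python
import sqlite3

# Illustrative sketch of an on-premises backup index. Searches hit this
# local database; the backup objects themselves (possibly in cloud storage)
# are only fetched when a user selects a file to restore.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE backup_index (path TEXT, snapshot TEXT, size INTEGER, location TEXT)"
)
conn.executemany(
    "INSERT INTO backup_index VALUES (?, ?, ?, ?)",
    [
        ("/finance/q1.xlsx", "2022-01-01", 20480, "s3://bucket/obj1"),
        ("/finance/q1.xlsx", "2022-02-01", 20992, "s3://bucket/obj2"),
        ("/hr/policy.docx",  "2022-02-01", 10240, "local://vol1/obj3"),
    ],
)

# The search runs entirely against the local index: no cloud egress yet.
rows = conn.execute(
    "SELECT snapshot, location FROM backup_index WHERE path LIKE ? ORDER BY snapshot",
    ("%q1.xlsx",),
).fetchall()
print(rows)
```

The point of the design is that the (cheap, fast) index answers “where is every version of this file?” without ever reading from the (slow, potentially metered) storage target.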
Data protection is hard to do well at the best of times, and data management is even harder to get right. Enterprises are busy generating terabytes of data and are struggling to a) protect it successfully, and b) make use of that protected data in an intelligent fashion. It seems that it’s no longer enough to have a good story around periodic data protection – most of the vendors have proven themselves capable in this regard. What differentiates companies is the ability to make use of that protected data in new and innovative ways that can increase the value of that data to the business that’s generating it.
Companies like Aparavi are doing a pretty good job of taking the madness that is your third drawer down and providing you with some semblance of order in the chaos. This can be a real advantage in the enterprise, not only for day to day data protection activities, but also for extended retention and compliance challenges, as well as any storage optimisation challenges you may face. You still need to understand what the data is, but something like FPI can help you declutter that data, making it easier to understand.
I also like some of the ransomware detection capabilities being built into the product. It’s relatively rudimentary for the moment, but keeping a close eye on the percentage of changed data is a good indicator of whether or not something is going badly wrong with the data sources you’re trying to protect. And if you find yourself the victim of a ransomware attack, the theory is that Aparavi has been storing a secondary, immutable copy of your data that you can recover from.
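The change-rate idea is simple enough to sketch. This is not Aparavi’s actual detection logic (the function names and the 50% threshold are my own invention): compare two snapshots of file hashes, and raise a flag if an unusually large fraction of files changed between them.

```python
# Illustrative only: flag a backup source when the share of changed files
# between two snapshots exceeds a threshold. Ransomware typically rewrites
# most of a file set in one pass, so a sudden spike in change rate is a
# useful (if crude) warning sign.

def change_ratio(previous: dict, current: dict) -> float:
    """Fraction of files whose content hash changed between two snapshots.

    Each snapshot maps file path -> content hash.
    """
    if not previous:
        return 0.0
    changed = sum(
        1 for path, digest in previous.items()
        if current.get(path) != digest
    )
    return changed / len(previous)

def looks_like_ransomware(previous: dict, current: dict, threshold: float = 0.5) -> bool:
    """A daily change rate above ~50% is unusual for most file servers."""
    return change_ratio(previous, current) >= threshold

snap_monday = {"a.doc": "h1", "b.xls": "h2", "c.txt": "h3", "d.pdf": "h4"}
snap_tuesday = {"a.doc": "x9", "b.xls": "x8", "c.txt": "x7", "d.pdf": "h4"}

print(looks_like_ransomware(snap_monday, snap_tuesday))  # True: 3 of 4 files changed
```

In practice you’d tune the threshold per workload, since some sources (databases, scratch volumes) legitimately churn most of their data every day.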
People want a lot of different things from their data protection solutions, and sometimes it’s easy to expect more than is reasonable from these products without really considering some of the complexity that can arise from that increased level of expectation. That said, it’s not unreasonable that your data protection vendors should be talking to you about data management challenges and deriving extra value from your secondary data. A number of people have a number of ways to do this, and not every way will be right for you. But if you’ve started noticing a data sprawl problem, or you’re looking to get a bit more from your data protection solution, particularly for unstructured data, Aparavi might be of some interest. You can read the announcement here.
According to Aparavi, Active Archive delivers “SaaS-based Intelligent, Multi-Cloud Data Management”. The idea is that:
Data is archived to cloud or on-premises based on policies for long-term lifecycle management;
Data is organised for easy access and retrieval; and
Data is accessible via Contextual Search.
Sounds pretty neat. So what’s new?
Direct-to-cloud provides the ability to archive data directly from source systems to the cloud destination of choice, with minimal local storage requirements. Instead of having to store archive data locally, you can now send bits of it straight to cloud, minimising your on-premises footprint.
Now supporting AWS, Backblaze B2, Caringo, Google, IBM Cloud, Microsoft Azure, Oracle Cloud, Scality, and Wasabi;
Trickle or bulk data migration – Adding bulk migration of data from one storage destination to another; and
Dynamic translation from cloud to cloud.
[image courtesy of Aparavi]
The Active Archive solution can now index, classify, and tag archived data. This makes it simple to classify data based on individual words, phrases, dates, file types, and patterns. Users can easily identify and tag data for future retrieval purposes such as compliance, reference, or analysis.
Customisable taxonomy using specific words, phrases, patterns, or meta-data
Pre-set classifications of “legal”, “confidential”, and PII
Easy to add new ad-hoc classifications at any time
Advanced Archive Search
Intuitive query interface
Search by metadata including classifications, tags, dates, file name, and file type, optionally with wildcards
Search within document content using words, phrases, patterns, and complex queries
Searches across all locations
Contextual Search: produces results of the match within context
No retrieval until file is selected; no egress fees until retrieved
I was pretty enthusiastic about Aparavi when they came out of stealth, and I’m excited about some of the new features they’ve added to the solution. Data management is a hard nut to crack, primarily because different organisations have very different requirements for storing data long term, and there are many different types of data that need to be stored. Aparavi isn’t a silver bullet for data management by any stretch, but it certainly seems to meet a lot of the foundational requirements for a solid archive strategy. There are some excellent options in terms of storage by location, search, and organisation.
The cool thing isn’t just that they’ve developed a solid multi-cloud story. Rather, it’s that there are options when it comes to the type of data mobility the user might require. They can choose to do bulk migrations, or take it slower by trickling data to the destination. This provides for some neat flexibility in terms of infrastructure requirements and windows of opportunity. It strikes me that it’s the sort of solution that can be tailored to work with a business’s requirements, rather than pushing it in a certain direction.
I’m also a big fan of Aparavi’s “Open Data” access approach, with an open API that “enables access to archived data for use outside of Aparavi”, along with a published data format for independent data access. It’s a nice change from platforms that feel the need to lock data into proprietary formats in order to store them long term. There’s a good chance the type of data you want to archive in the long term will be around longer than some of these archive solutions, so it’s nice to know you’ve got a chance of getting the data back if something doesn’t work out for the archive software vendor. I think it’s worth keeping an eye on Aparavi, they seem to be taking a fresh approach to what has become a vexing problem for many.
Santa Monica-based (I love that place) SaaS data protection outfit, Aparavi, recently came out of stealth, and I thought it was worthwhile covering their initial offering.
So Latin Then?
What’s an Aparavi? It’s apparently Latin and means “[t]o prepare, make ready, and equip”. The way we consume infrastructure has changed, but a lot of data protection products haven’t changed to accommodate this. Aparavi are keen to change that, and tell me that their product is “designed to work seamlessly alongside your business continuity plan to ease the burden of compliance and data protection for mid market companies”. Sounds pretty neat, so how does it work?
Aparavi uses a three tiered architecture written in Node.js and C++. It consists of:
The Aparavi hosted platform;
An on-premises software appliance; and
A source client.
[image courtesy of Aparavi]
The platform is available as a separate module if required, otherwise it’s hosted on Aparavi’s infrastructure. The software appliance is the relationship manager in the solution. It performs in-line deduplication and compression. The source client can be used as a temporary recovery location if required. AES-256 encryption is done at the source, and the metadata is also encrypted. Key storage is all handled via keyring-style encryption mechanisms. There is communication between the web platform and the appliance, but the appliance can operate when the platform is off-line if required.
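The in-line deduplication the appliance performs can be illustrated with a toy example. This is a minimal sketch of the general technique, not Aparavi’s implementation (real appliances use variable-size chunking and layer compression and encryption on top): split a stream into chunks, hash each chunk, and store a chunk only the first time its hash is seen.

```python
import hashlib

# Toy fixed-size chunk deduplication. CHUNK_SIZE is tiny for demonstration;
# real systems use kilobyte-sized (often variable-sized) chunks.
CHUNK_SIZE = 4

def dedupe(data: bytes, store: dict) -> list:
    """Store unique chunks keyed by SHA-256; return the recipe (hash list)."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # written once, referenced many times
        recipe.append(digest)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original stream from its recipe."""
    return b"".join(store[d] for d in recipe)

store = {}
recipe = dedupe(b"abcdabcdabcdxyz!", store)
print(len(recipe), len(store))  # 4 chunks referenced, only 2 stored
assert restore(recipe, store) == b"abcdabcdabcdxyz!"
```

The repeated `abcd` chunks are stored once and referenced three times, which is where the space savings come from when the same data shows up across many backups.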
There are a number of cool features of the Aparavi solution, including:
Patented point-in-time recovery – you can recover data from any combination of local and cloud storage (you don’t need the backup set to live in one place);
Cloud active data pruning – automatically removes files, and portions of files, that are no longer needed from cloud locations;
Multi-cloud agile retention (this is my favourite) – you can use multiple cloud locations without the need to move data from one to the other;
Open data format – open source published, with Aparavi providing a reader so data can be read by any tool; and
Multi-tier, multi-tenancy – Aparavi are very focused on delivering a multi-tier and multi-tenant environment for service providers and folks who like to scale.
Policy Engine – uses file exclusion and inclusion lists
Comprehensive Search – search by user name and appliance name as well as file name
Storage Analytics – how much you’re saving by pruning, data growth / shrinkage over time, % change monitor
Auditing and Reporting Tools
RESTful API – anything in the UI can be automated
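To give a feel for what “anything in the UI can be automated” looks like in practice, here’s a purely hypothetical sketch of driving a REST API from Python. The host, endpoint path, and payload fields are all invented for illustration; consult Aparavi’s actual API documentation for real calls.

```python
import json
import urllib.request

def build_policy_request(host: str, token: str, policy: dict) -> urllib.request.Request:
    """Assemble (but don't send) an authenticated JSON POST request.

    The /api/policies endpoint is hypothetical, used only to illustrate
    the general shape of automating a UI action over REST.
    """
    return urllib.request.Request(
        url=f"https://{host}/api/policies",
        data=json.dumps(policy).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_policy_request(
    "appliance.example.com",
    "example-token",
    {"name": "finance-archive", "include": ["/finance/**"], "retain_days": 2555},
)
print(req.full_url, req.get_method())
```

The value of a full-coverage API is exactly this: anything you can click through in the console, you can script, schedule, and version-control instead.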
What Does It Run On?
Aparavi runs on all Microsoft-supported Windows platforms as well as most major Linux distributions (including Ubuntu and Red Hat). They use the Amazon S3 API, and support GCP and are working on OpenStack and Azure. They’ve also got some good working relationships with Cloudian and Scality, amongst others.
[image courtesy of Aparavi]
Aparavi are having a “soft launch” on October 25th. The product is licensed on the amount of source data protected. From a pricing perspective, the first TB is always free. Expect to pay US $999/year for 3TB.
Aparavi are looking to focus on the mid-market to begin with, and stressed to me that it isn’t really a tool that will replace your day to day business continuity tool. That said, they recognize that customers may end up using the tool in ways that they hadn’t anticipated.
Aparavi’s founding team of Adrian Knapp, Rod Christensen, Jonathan Calmes and Jay Hill have a whole lot of experience with data protection engines and a bunch of use cases. Speaking to Jonathan, it feels like they’ve certainly thought about a lot of the issues facing folks leveraging cloud for data protection. I like the open approach to storing the data, and the multi-cloud friendliness takes the story well beyond the hybrid slideware I’m accustomed to seeing from some companies.
Cloud has opened up a lot of possibilities for companies that were traditionally constrained by their own ability to deliver functional, scalable and efficient infrastructure internally. It’s since come to people’s attention that, much like the days of internal-only deployments, a whole lot of people who should know better still don’t understand what they’re doing with data protection, and there’s crap scattered everywhere. Products like Aparavi are a positive step towards taking control of data protection in fluid environments, potentially helping companies to get it together in an effective manner. I’m looking forward to diving further into the solution, and am interested to see how the industry reacts to Aparavi over the coming months.