Disaster Recovery – Do It Yourself or As-a-Service?

I don’t normally do articles off the back of someone else’s podcast episode, but I was listening to W. Curtis Preston and Prasanna Malaiyandi discuss “To DRaaS or Not to DRaaS” on The Backup Wrap-up a little while ago, and thought it was worth diving into a bit on the old weblog. In my day job I talk to many organisations about their Disaster Recovery (DR) requirements. I’ve been doing this a while, and I’ve seen the conversation change somewhat over the last few years. This article will barely skim the surface of what you need to think about, so I recommend you listen to Curtis and Prasanna’s podcast, and while you’re at it, buy Curtis’s book. And, you know, get to know your environment. And what your environment needs.


The Basics

As Curtis says in the podcast, “[b]ackup is hard, recovery is harder, and disaster recovery is an advanced form of recovery”. DR is rarely just a matter of having another copy of your data safely stored away from wherever your primary workloads are hosted. Sure, that’s a good start, but it’s not just about getting that data, or the servers, back; it’s about the connectivity as well. So you have a bunch of servers running somewhere, and you’ve managed to get some / most / all of your data back. Now what? How do your users connect to that data? How do they authenticate? How long will it take to reconfigure your routing? What if your gateway doesn’t exist any more? A lot of customers think about DR in much the same way they think about backup and recovery. Sure, it’s super important that you understand how you’ll meet your recovery point objectives (RPOs) and recovery time objectives (RTOs), but there are some other things you’ll need to worry about too.
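To illustrate why connectivity belongs in your RTO, here’s a minimal, hypothetical Python sketch that checks a plan against RPO and RTO targets. It assumes worst-case data loss is the interval between recovery points plus the replication lag to the DR site, and that restore, network reconfiguration, and authentication happen one after the other. The field names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPlan:
    backup_interval_min: float  # time between recovery points
    replication_lag_min: float  # delay before a copy lands at the DR site
    restore_min: float          # restoring servers and data
    network_min: float          # routing, DNS, gateway reconfiguration
    auth_min: float             # bringing identity/authentication services up

    def worst_case_data_loss_min(self) -> float:
        # Worst case: failure just before the newest recovery point finishes
        # landing at the DR site, leaving only the previous one.
        return self.backup_interval_min + self.replication_lag_min

    def estimated_recovery_min(self) -> float:
        # Assumes the steps run one after the other, not in parallel.
        return self.restore_min + self.network_min + self.auth_min


def meets_objectives(plan: RecoveryPlan, rpo_min: float, rto_min: float) -> dict:
    """Compare the plan's worst-case data loss and recovery time to targets."""
    return {
        "rpo_met": plan.worst_case_data_loss_min() <= rpo_min,
        "rto_met": plan.estimated_recovery_min() <= rto_min,
    }


if __name__ == "__main__":
    plan = RecoveryPlan(60, 15, 120, 90, 30)
    print(meets_objectives(plan, rpo_min=120, rto_min=180))
```

With those sample numbers the RPO is met (75 minutes of potential data loss) but the RTO is missed (240 minutes), even though the data restore on its own would have fitted comfortably inside the 180-minute target.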


What’s A Disaster?

Natural

What kind of disaster are you looking to recover from? In the olden days (let’s say about 15 years ago), I was talking to clients about natural disasters in the main. What happens when the Brisbane River has a one-in-a-hundred-year flood for the second time in 5 years? Are your data centres above river level? What about your generators? What if there’s a fire? What if you live somewhere near the ocean and there’s a big old tsunami heading your way? My friends across the ditch know all about how not to build data centres on fault lines.
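A “one in a hundred year” flood is usually defined as one with roughly a 1% chance of happening in any given year, not one that turns up on schedule once a century. Here’s a small Python sketch of what that means over a planning horizon. It assumes each year is independent with a fixed 1% probability, which is a simplification, since catchments and climate change over time.

```python
from math import comb


def prob_at_least(k: int, years: int, annual_p: float) -> float:
    """Probability of k or more events in `years` years, assuming each year
    independently has probability `annual_p` of an event (binomial model)."""
    if not 0.0 <= annual_p <= 1.0:
        raise ValueError("annual_p must be between 0 and 1")
    return sum(
        comb(years, i) * annual_p**i * (1 - annual_p) ** (years - i)
        for i in range(k, years + 1)
    )


if __name__ == "__main__":
    # A "one in a hundred year" flood: roughly 1% chance in any given year.
    print(f"At least one in 5 years: {prob_at_least(1, 5, 0.01):.4f}")
    print(f"At least two in 5 years: {prob_at_least(2, 5, 0.01):.5f}")
```

Under this model there’s about a 4.9% chance of at least one such flood in five years, and about 0.098% (roughly one in a thousand) of two, which is a good reason not to treat the label as a planning guarantee.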

Accidental

Things have evolved to cover operational considerations too. I like to joke with my customers about the backhoe operator cutting through the data centre’s fibre connection, but that scenario is usually the reason data centres have multiple providers. And there’s always human error to contend with. As data gets more concentrated, and organisations look to do things more efficiently (i.e. some of them are going cheap on infrastructure investments), messing up a single component can have a bigger impact than it would have previously. You can architect around some of this, for sure, but a lot of businesses aren’t investing in those areas and are just leaving it up to luck.

Bad Hacker, Stop It

More recently, however, the conversation I’ve been having with folks across a variety of industries has been more about some kind of human malfeasance. Whether it’s bad actors (in hoodies and darkened rooms no doubt) looking to infect your environment with malware, or someone not paying attention and unleashing ransomware into your environment, we’re seeing more and more of this kind of activity having a real impact on organisations of all shapes and sizes. So not only do you need to be able to get your data back, you need to know that it’s clean and not going to re-infect your production environment.
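One common way to approach the “is it clean?” question is to restore from the newest recovery point that was taken before the earliest known sign of compromise and that has also passed whatever scanning you run. This Python sketch is a hypothetical model of that selection logic; `RecoveryPoint`, `scan_clean`, and `pick_clean_point` are illustrative names, not any product’s API. Attackers can sit in an environment for a while before they’re detected, so the compromise time you feed in is an estimate, and erring earlier is safer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    scan_clean: bool  # hypothetical: result of whatever malware scanning you run


def pick_clean_point(
    points: Iterable[RecoveryPoint], first_compromise: datetime
) -> Optional[RecoveryPoint]:
    """Return the newest recovery point taken strictly before the earliest
    known sign of compromise that also passed scanning, or None if there
    isn't one."""
    candidates = [
        p for p in points if p.taken_at < first_compromise and p.scan_clean
    ]
    return max(candidates, key=lambda p: p.taken_at, default=None)


if __name__ == "__main__":
    points = [
        RecoveryPoint(datetime(2023, 11, 1), True),
        RecoveryPoint(datetime(2023, 11, 2), True),
        RecoveryPoint(datetime(2023, 11, 3), False),
        RecoveryPoint(datetime(2023, 11, 4), True),
    ]
    # Earliest indicator of compromise, as best you can tell.
    print(pick_clean_point(points, datetime(2023, 11, 2, 12, 0)))
```

Returning `None` rather than falling back to the newest point is deliberate: if nothing predates the compromise, restoring anyway risks re-infecting production.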

A Real Disaster

And how pragmatic are you going to be when considering your recovery scenarios? When I first started in the industry, the company I worked for talked about their data centres being outside the blast radius (assuming someone wanted to drop a bomb in the middle of the Brisbane CBD). That’s great as far as it goes, but are all of your operational staff going to be around to start the recovery activities? Are any of them going to be interested in trying to recover your SQL environment if they have family and friends who’ve been impacted by some kind of earth-shattering disaster? Probably not so much. In Australia many organisations have started looking at having some workloads in Sydney, and recovery in Melbourne, or Brisbane for that matter. All cities on the Eastern seaboard. Is that enough? What if Oz gets wiped out? Should you have a copy of the data stored in Singapore? Is data sovereignty a problem for you? Will anyone care if things are going that badly for the country?


Can You Do It Better?

It’s not just about being able to recover your workloads either, it’s about how you can reduce the complexity and cost of that recovery. This is key to both the success of the recovery and the likelihood of it not getting hit with relentless budget cuts. How fast do you want to be able to recover? How much data do you want to recover? To what level? I often talk to people about shutting down their test and development environments during a DR event. Unless your whole business is built on software development, it doesn’t always make sense to have your user acceptance testing environment up and running in your DR environment.

The cool thing about public cloud is that there’s a lot more flexibility when it comes to making decisions about how much you want to run and where you want to run it. The cloud isn’t everything though. As Prasanna mentions, if the cloud can’t run all of your workloads (think mainframe and non-x86, for example), it’s pointless to try and use it as a recovery platform. But to misquote Jason Statham’s character Turkish in Snatch – “What do I know about DR?” And if you don’t really know about DR, you should think about getting someone who does involved.
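To make the test/dev and non-x86 points concrete, here’s a hypothetical Python sketch that takes a workload inventory, keeps only the environments you’ve decided must come back during a DR event, and splits those into what an x86-based cloud target could host and what needs a different plan. The `Workload` fields, the `DR_ENVIRONMENTS` and `CLOUD_ARCHES` sets, and the sample inventory are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Assumptions for illustration: only production comes back during a DR event,
# and the cloud target can only host x86_64 workloads.
DR_ENVIRONMENTS = {"prod"}
CLOUD_ARCHES = {"x86_64"}


@dataclass(frozen=True)
class Workload:
    name: str
    environment: str  # e.g. "prod", "uat", "test", "dev"
    arch: str         # e.g. "x86_64", "mainframe", "power"
    vcpus: int
    memory_gb: int


def plan_dr(workloads: List[Workload]) -> Dict[str, Union[int, List[str]]]:
    """Size the cloud DR footprint for in-scope workloads and list the
    in-scope workloads that the cloud target can't run."""
    in_scope = [w for w in workloads if w.environment in DR_ENVIRONMENTS]
    cloud_ok = [w for w in in_scope if w.arch in CLOUD_ARCHES]
    return {
        "cloud_vcpus": sum(w.vcpus for w in cloud_ok),
        "cloud_memory_gb": sum(w.memory_gb for w in cloud_ok),
        "not_cloud_recoverable": [
            w.name for w in in_scope if w.arch not in CLOUD_ARCHES
        ],
    }


if __name__ == "__main__":
    inventory = [
        Workload("erp", "prod", "x86_64", 16, 64),
        Workload("billing", "prod", "mainframe", 8, 32),
        Workload("erp-uat", "uat", "x86_64", 16, 64),
        Workload("web", "prod", "x86_64", 4, 8),
    ]
    print(plan_dr(inventory))
```

In the sample inventory the UAT copy of the ERP system drops out of the DR footprint entirely, and the mainframe billing workload gets flagged as needing something other than the cloud.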


What If You Can’t Do It?

And what if you want to DRaaS your SaaS? Chances are you can’t do much more than rely on your existing backup and recovery processes for your SaaS platforms. There’s not much chance that you can take that Salesforce data and put it somewhere else when Salesforce has a bad day. What you can do, however, is make sure you back that Salesforce data up, so that when someone accidentally lets the bad guys in, you’ve got data you can recover from.


Thoughts

The optimal way to approach DR is an interesting problem to try and solve, and like most things in technology, there’s no real one-size-fits-all approach. I haven’t even really touched on the pros and cons of as-a-Service offerings versus rolling your own. Despite my love of DIY (thanks mainly to punk rock, not home improvement), I’m generally more of a fan of getting the experts to do it. They’re generally going to have more experience, and the scale, to do what needs to be done efficiently (if not always economically). That said, it’s never as simple as just offloading the responsibility to a third party. You might have a service provider delivering a service for you, but ultimately you’ll always be accountable for your DR outcomes, even if you’re not directly responsible for the mechanics of it.

Nexsan Announces Unity NV6000

Nexsan recently announced the Nexsan Unity NV6000. I had the chance to speak to Andy Hill about it, and thought I’d share some thoughts here.


What Is It?

[image courtesy of Nexsan]

I’ve said it before, and I’ll say it again … in the immortal words of Silicon Valley: “It’s a box”. And a reasonably powerful one at that, coming loaded with the following specifications.

Supported Protocols: SAN (Fibre Channel, iSCSI), NAS (NFS, SMB 1.0 to 3.0, FTP), Object (S3)
Disk Bays | Rack U: 60 | 4U
Maximum Drives with Expansion: 180
Maximum Raw Capacity (chassis | total): 1.12 PB Raw | 3.36 PB Raw
System Memory (DRAM) per controller: up to 128GB
FASTier 2.5″ SSD Drives (TB): 1.92 | 3.84 | 7.68 | 15.36
3.5″ 7.2K SAS Drives (TB): 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20
2.5″ NVMe 1DWPD SSDs (TB): N/A
Host Connectivity: 16/32Gb FC | 10/25/40/100 GbE
Max CIFS | NFS File Systems: 512
Data Protection: Immutable Snapshots, S3 Object Locking, and optional Unbreakable Backup.

It’s a dual-controller platform, with each controller containing 2x Intel Xeon Silver CPUs, and a 12Gb/s SAS backplane. The following features are included as part of the platform license:

  • Nexsan’s FASTier® Caching – Use solid-state to accelerate the performance of the underlying spinning disks
  • Nexsan Unity software version 7.0, with important enhancements to power, enterprise-class security, compliance, and ransomware protection
  • Enhanced Performance – Up to 100,000 IOPS
  • Third-Party Software Support – Windows VSS, VMware, VAAI, Commvault, Veeam Ready Repository, and more
  • Multi-Protocol Support – SAN (Fibre Channel, iSCSI), NAS (NFS, CIFS, SMB1 to SMB3, FTP), Object (S3), 16/32Gb FC, 10/25/40/100 GbE
  • High Availability – No-single-point-of-failure architecture with dual redundant storage controllers, redundant power supplies and RAID


Other Features

Snapshot Immutability

The snapshot immutability claim caught my eye, as immutable means a lot of things to a lot of people. Hill mentioned that the snapshot IP used on Unity was developed in-house by Nexsan, and isn’t the patched-together solution that some other vendors promote as immutable. There are some other smarts within Unity that should give users comfort that data can’t be easily gotten at. Once you’ve set retention periods for snapshots, for example, you can’t log in to the platform, set the date forward, and have those snapshots expire. The object storage component also supports S3 Object Lock, which is good news for punters looking to take advantage of that feature.
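For a feel of what S3 Object Lock looks like from the client side, here’s a minimal Python sketch using boto3’s `put_object` with its `ObjectLockMode` and `ObjectLockRetainUntilDate` parameters. It assumes the bucket was created with Object Lock enabled, that the Unity S3 endpoint handles these parameters the way AWS S3 does (not something confirmed here), and that the endpoint URL, bucket, and key you pass in are your own. On AWS, COMPLIANCE mode means no user can shorten the retention or delete the object version before the retain-until date, while GOVERNANCE mode lets specially permitted users override it. AWS also requires an integrity checksum (Content-MD5 or an SDK checksum header) on uploads that set Object Lock retention; recent SDK versions generally add one for you, but check yours.

```python
from datetime import datetime, timedelta, timezone

VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def object_lock_params(retain_days: int, mode: str = "COMPLIANCE") -> dict:
    """Build the Object Lock keyword arguments for an S3 PutObject call."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    return {"ObjectLockMode": mode, "ObjectLockRetainUntilDate": retain_until}


def upload_locked(endpoint_url: str, bucket: str, key: str, body: bytes,
                  retain_days: int) -> dict:
    """Write an object version that Object Lock should protect from deletion
    until the retain-until date. Assumes the bucket already has Object Lock
    (and therefore versioning) enabled."""
    # Third-party dependency, imported here so object_lock_params() can be
    # used and tested without boto3 installed.
    import boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    return s3.put_object(Bucket=bucket, Key=key, Body=body,
                         **object_lock_params(retain_days))
```

Usage would look something like `upload_locked("https://unity.example.internal", "backups", "db/nightly.dump", data, retain_days=30)`, where the endpoint, bucket, and key are hypothetical and credentials come from your usual boto3 configuration. Keeping the parameter building separate from the upload makes the retention logic easy to check without touching storage.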

Unified Protocol Support

It’s in the name, and Nexsan has done a good job of incorporating a variety of storage protocols and physical access methods into the Unity platform. There’s File, Block, and Object, and support for both FC and speedy Ethernet as well. In other words, something for everyone.

Assureon Integration

One of the other features I like about the Unity is the integration with Assureon. If you’re unfamiliar with Assureon, you can check it out here. It takes storage security and compliance to another level, and is worth looking into if you have a requirement for things like regulatory compliant storage, the ability to maintain chain of custody, and fun things like that.


Thoughts and Further Reading

Who cares about storage arrays any more? A surprising number of people, and with good reason. Some folks still need them in the data centre, and they’re looking for storage arrays that can do more with less. I think this is where the Nexsan offering excels. With multi-protocol and multi-transport support, some decent security chops, and an all-inclusive licensing model, it provides cost-effective storage (thanks to a mix of spinning rust and solid-state drives) that competes well with the solutions that have traditionally dominated the midrange market. Additionally, integration with solutions like Assureon makes it worth a second look, particularly if you’re in the market for object storage with a lower barrier to entry (from a cost and capacity perspective) and the ability to deal with backup data in a secure fashion.

Random Short Take #91

Squeezing this one in before the end of the year. It’s shorter than normal but we all have other things to do. Let’s get random.

  • Like the capacity and power consumption of tape but still want it on disk? Check out this coverage of the Disk Archive Corporation over at Blocks and Files.
  • This was a great series of posts on the RFC process. It doesn’t just happen by magic.
  • Jeff Geerling ran into some issues accessing media recently. It’s a stupid problem to have, and one of the reasons I’m still such a sucker for physical copies of things. I did giggle a bit when I first read the post though. These kinds of issues come up frequently for folks outside the US thanks to content licensing challenges and studios just wanting us to keep paying for the same thing over and over again and not have any control over how we consume content.
  • My house was broken into recently. It’s a jarring experience at best. I never wanted to put cameras around my house, but now I have. If you do this in Queensland you can let the coppers know and they can ask for your help if there’s a crime in the area. I know it’s not very punk rock to surveil people but fuck those kids.
  • You didn’t think I’d get to 91 and not mention Dennis Rodman, did you? One of my top 5 favourite players of all time. Did everything on the court that I didn’t: played defence, grabbed rebounds, and gave many a high energy performance. So here are some highlights on YouTube.

That’s it for this year. Stay safe, and see you in the future.

Random Short Take #90

Welcome to Random Short Take #90. I remain somewhat preoccupied with the day job and acquisitions. It’s definitely Summer here now. Let’s get random.

  • You do something for long enough, and invariably you assume that everyone else knows how to do that thing too. That’s why this article from Danny on data protection basics is so useful.
  • Speaking of data protection, Preston has a book on recovery for busy people coming soon. Read more about it here.
  • Still using a PDP-11 at home? Here’s a simple stack buffer overflow attack you can try.
  • I hate it when the machines shout at me, and so do a lot of other people it seems. JB has a nice write-up on the failure of self-service in the modern retail environment. The sooner we throw those things in the sea, the better.
  • In press release news, Hammerspace picked up an award at SC2023. One to keep an eye on.
  • In news from the day job, VMware Cloud on AWS SDDC Version 1.24 was just made generally available. You can read more about some of the new features (like Express Storage Architecture support – yay!) here. I hope to cover off some of that in more detail soon.
  • You like newsletters? Sign up for Justin’s weekly newsletter here. He does thinky stuff, and funny stuff too. It’s Justin, why would you not?
  • Speaking of newsletters, Anthony’s looking to get more subscribers to his daily newsletter, The Sizzle. To that end, he’s running a “Sizzlethon”. I know, it’s a pretty cool name. If you sign up using this link you also get a 90-day free trial. And the price of an annual subscription is very reasonable. There are only a few days left, so get amongst it and let’s help content creators to keep creating content.

Random Short Take #89

Welcome to Random Short Take #89. I’ve been somewhat preoccupied with the day job and acquisitions. And the start of the NBA season. But Summer is almost here in the Antipodes. Let’s get random.

  • Jon Waite put out this article on how to deploy an automated Cassandra metrics cluster for VCD.
  • Chris Wahl wrote a great article sharing his thoughts on platform engineering as product design at scale. I’ve always found Chris to be a switched on chap, and his recent articles diving deeper into this topic have done nothing to change my mind.
  • Curtis and I have spoken about this previously, and he talks some more about the truth behind SaaS data recovery over at Gestalt IT. The only criticism I have for Curtis is that he’s just as much Mr Recovery as he is Mr Backup and he should have trademarked that too.
  • Would it be a Random Short Take without something from Chin-Fah? Probably not one worth reading. In this article he’s renovated his lab and documented the process of attaching TrueNAS iSCSI volumes to his Proxmox environment. I’m fortunate enough to not have had to do Linux iSCSI in some time, but it looks mildly easier than it used to be.
  • Press releases? Here’s one for you: Zerto research report finds companies lack a comprehensive ransomware strategy. Unlike the threat of World War 3 via nuclear strike in the eighties, ransomware is not a case of if, but when.
  • Hungry for more press releases? Datadobi is accelerating its channel momentum with StorageMAP.
  • In other PR news, Nyriad has unveiled its storage-as-a-service offering. I had a chance to speak to them recently, and they are doing some very cool stuff – worth checking out.
  • I hate all kinds of gambling, and I really hate sports gambling, and ads about it. And it drives me nuts when I see sports gambling ads in apps like NBA League Pass. So this news over at El Reg, about SBS offering consumers the chance to opt out of those kinds of ads, is fantastic. It doesn’t fix the problem, but it’s a step in the right direction.

Random Short Take #88

Welcome to Random Short Take #88. This one’s been sitting in my drafts folder for a while. Let’s get random.

Random Short Take #85

Welcome to Random Short Take #85. Let’s get random.

Random Short Take #82

Happy New Year (to those who celebrate). Let’s get random.

Random Short Take #81

Welcome to Random Short Take #81. Last one for the year, because who really wants to read this stuff over the holiday season? Let’s get random.

Take care of yourselves and each other, and I’ll hopefully see you all on the line or in person next year.

Random Short Take #79

Welcome to Random Short Take #79. Where did October go? Let’s get random.