Disaster Recovery vs Disaster Avoidance vs Data Protection

This is another one of those rambling posts that I like to write when I’m sitting in an airport lounge somewhere and I’ve got a bit of time to kill. The versus in the title is a bit misleading too, because DR and DA are both forms of data protection. And periodic data protection (PDP) is important too. But what I wanted to write about was some of the differences between DR and DA, in particular.

TL;DR – DR is not DA, and neither of them is PDP. But you need to think about all of them at some point.

 

Terminology

I want to be clear about what I mean when I say these terms, because it seems like they can mean a lot of things to different folks.

  • Recovery Point Objective – The Recovery Point Objective (RPO) is the maximum amount of time for which data may be permanently lost during an incident. You want this to be in minutes and hours, not days or weeks (ideally). RPO 0 is the idea that no data is lost when there’s a failure. A lot of vendors will talk about “Near Zero” RPOs. (There’s a quick sketch of how protection frequency maps to worst-case RPO just after this list.)
  • Recovery Time Objective – The Recovery Time Objective (RTO) is the amount of time the business can be without the service, without incurring significant risks or significant losses. This is, ostensibly, how long it takes you to get back up and running after an event. You don’t really want this to be in days and weeks either.
  • Disaster Recovery – Disaster Recovery is the ability to recover applications after a major event (think flood, fire, DC is now a hole in the ground). This normally involves a failover of workloads from one DC to another in an orchestrated fashion.
  • Disaster Avoidance – Disaster avoidance “is an anticipatory strategy that is in place in order to prevent any such instance of data breach or losses. It is a defensive, proactive approach to keeping data safe” (I’m quoting this from a great blog post on the topic here).
  • Periodic Data Protection – This is the kind of data protection activity we normally associate with “backups”. It is usually a daily activity (or perhaps as frequent as hourly) and the data is normally used for ad-hoc data file recovery requests. Some people use their backup data as an archive. They’re bad people and shouldn’t be trusted. PDP is normally separate to DA or DR solutions.
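
To make the RPO idea above a little more concrete, here’s a rough sketch (in Python) of how the frequency of whatever protection mechanism you’re running maps to a worst-case RPO. The options and intervals below are illustrative assumptions, not anything a particular vendor promises.

```python
# Rough worst-case RPO estimate: if your last good copy completed at most
# `interval` minutes before the failure, that's roughly how much data you
# stand to lose. All figures are illustrative assumptions.

protection_options = {
    "synchronous replication": 0,          # RPO 0 - every write lands at both sites
    "async replication (5 min cycle)": 5,
    "hourly snapshots": 60,
    "nightly backups (PDP)": 24 * 60,
}

def worst_case_rpo_minutes(interval_minutes, lag_minutes=0):
    """Worst case is a failure just before the next copy completes (plus any replication lag)."""
    return interval_minutes + lag_minutes

for name, interval in protection_options.items():
    print(f"{name:35s} -> worst-case RPO ~ {worst_case_rpo_minutes(interval)} minutes")
```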

 

DR Isn’t The Full Answer

I’ve had some great conversations with customers recently about adding resilience to their on-premises infrastructure. It seems like an old-fashioned concept, but a number of organisations are only now seeing the benefits of adding infrastructure-level resilience to their platforms. The first conversation usually goes something like this:

Me: So what’s your key application, and what’s your resiliency requirement?

Customer: Oh, it’s definitely Application X (usually built on Oracle or using SAP or similar). It absolutely can’t go down. Ever. We need to have RPO 0 and RTO 0 for this one. Our whole business depends on it.

Me: Okay, it sounds like it’s pretty important. So what about your file server and email?

Customer: Oh, that’s not so important. We can recover those from overnight backups.

Me: But aren’t they used to store data for Application X? Don’t you have workflows that rely on email?

Customer: Oh, yeah, I guess so. But it will be too expensive to protect all of this. Can we change the RPO a bit? I don’t think the CFO will support us doing RPO 0 everywhere.

These requirements tend to change whenever we move from technical discussions to commercial discussions. In an ideal world, Martha in Accounting will have her home directory protected in a highly available fashion such that it can withstand the failure of one or more storage arrays (or data centres). The problem with this is that, if there are 1000 Marthas in the organisation, the cost of protecting that kind of data at scale becomes prohibitive, relative to the perceived value of the data. This is one of the ways I’ve seen “DR” capability added to an environment in the past. Take some older servers and put them in a site removed from the primary site, set up some scripts to copy critical data to that site, and hope nothing ever goes too wrong with the primary site.
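
As a back-of-the-envelope illustration of why the Marthas of the world usually land on a cheaper protection tier, here’s a quick sketch. Every number in it (capacity per user, per-GB pricing) is an assumption invented for the example, not real pricing from anyone.

```python
# Hypothetical numbers only - swap in your own capacities and pricing.
users = 1000
gb_per_user = 20                   # assumed home directory size
total_gb = users * gb_per_user

cost_replicated = 3.00             # assumed $/GB/year for a synchronously replicated tier
cost_backup_only = 0.50            # assumed $/GB/year for a nightly-backup tier

print(f"Replicate everything: ${total_gb * cost_replicated:,.0f} per year")
print(f"Backup-only tier:     ${total_gb * cost_backup_only:,.0f} per year")
# The delta is what the CFO sees; the perceived value of Martha's home
# directory is what it gets compared against.
```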

There are obviously better ways of doing this, and common solutions may or may not involve block-level storage replication, orchestrated failover tools, and like for like compute at the secondary site (or perhaps you’ve decided to shut down test and development while you’re fixing the problem at the production site).
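
For what it’s worth, the orchestrated failover mentioned above usually boils down to a runbook that a tool (or, if you’re unlucky, a pile of scripts) walks through in order. Here’s a minimal sketch of the shape of it; every step is a stub that just prints what a real tool would do, and none of the names correspond to any actual product API.

```python
# Skeleton of an orchestrated DR failover. The point is the ordering,
# not the plumbing - each step is a placeholder for real tooling.

def step(msg):
    print(f"[failover] {msg}")

def confirm_primary_down():  step("confirm the primary site is really down (avoid split-brain)")
def pause_replication():     step("pause replication so the replica is in a consistent state")
def promote_replicas():      step("promote replica volumes to read/write at the DR site")
def power_on(vm):            step(f"power on {vm} at the DR site and wait for its health check")
def repoint_networking():    step("repoint DNS and load balancers at the DR site")
def notify_people():         step("notify operations and kick off business continuity processes")

def failover(workloads_in_dependency_order):
    confirm_primary_down()
    pause_replication()
    promote_replicas()
    for vm in workloads_in_dependency_order:   # databases before app tiers before web
        power_on(vm)
    repoint_networking()
    notify_people()

failover(["oracle-db-01", "app-server-01", "web-frontend-01"])
```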

But what are you trying to protect against? The failure of some compute? Some storage? The network layer? A key application? All of these answers will determine the path you’ll need to go down. Keep in mind also that DR isn’t the only answer. You also need to have business continuity processes in place. A failover of workloads to a secondary site is pointless if operations staff don’t have access to a building to continue doing their work, or if people can’t work when the swipe card access machine is off-line, or if your Internet feed only terminates in one DC, etc.

 

I’m Avoiding The Problem

Disaster Avoidance is what I like to call the really sexy resilience solution. You can have things go terribly wrong with your production workload and potentially still have it functioning like there was no problem. This is where hardware solutions like Pure Storage ActiveCluster or Dell EMC VPLEX can really shine, assuming you’ve partnered them with applications that have the smarts built in to leverage what they have to offer. Because that’s the real key to a successful disaster avoidance design. It’s great to have synchronous replication and cache-consistency across DCs, but if your applications don’t know what to do when a leg goes missing, they’ll fall over. And if you don’t have other protection mechanisms in place, such as periodic data protection, then your synchronous block replication solution will merrily synchronise malware or corrupted data from one site to another in the blink of an eye.

It’s important to understand the failure scenarios you’re protecting against too. If you’ve deployed vSphere Metro Storage Cluster, you’ll be able to run VMs even when your whole array has gone off-line (assuming you’ve set it up properly). But this won’t necessarily prevent an outage if you lose your vSphere cluster, or the whole DC. Your data will still be protected, and you’ll be in good shape in terms of recovering quickly, but there will be an outage. This is where application-level resilience can help with availability. Remember that, even if you’ve got ultra-resilient workload protection across DCs, if your staff only have one connection into the environment, they may be left twiddling their thumbs in the event of a problem.

There’s a level of resiliency associated with this approach, and your infrastructure will certainly be able to survive the failure of a compute node, or even a bunch of disk and some compute (everything will reboot in another location). But you need to be careful not to let people think that this is something it’s not.

 

PDP, Yeah You Know Me

I mentioned problems with malware and data corruption earlier on. This is where periodic data protection solutions (such as those sold by Dell EMC, CommVault, Rubrik, Cohesity, Veeam, etc) can really get you out of a spot of bother. And if you don’t need to recover the whole VM when there’s a problem, these solutions can be a lot quicker at getting data back. The good news is that you can integrate a lot of these products with storage protection solutions and orchestration tools for a belt and braces solution to protection, and it’s not the shitshow of scripts and kludges that it was ten years ago. Hooray!

 

Final Thoughts

There’s a lot more to data protection than I’ve covered here. People like Preston have written books about the topic. And a lot of the decision making is potentially going to be out of your hands in terms of what your organisation can afford to spend (until they lose a lot of data, money (or both), then they’ll maybe change their focus). But if you do have the opportunity to work on some of these types of solutions, at least try to make sure that everyone understands exactly what they can achieve with the technologies at hand. There’s nothing worse than being hauled over the coals because some director thought they could do something amazing with infrastructure-level availability and resiliency only to have the whole thing fall over due to lack of budget. It can be a difficult conversation to have, particularly if your executives are the types of people who like to trust the folks with the fancy logos on their documents. All you can do in that case is try and be clear about what’s possible, and clear about what it will cost in time and money.

In the near future I’ll try to put together a post on various infrastructure failure scenarios and what works and what doesn’t. RPO 0 seems to be what everyone is asking for, but it may not necessarily be what everyone needs.

2018 AKA The Year After 2017

I said last year that I don’t do future prediction type posts, and then I did one anyway. This year I said the same thing and then I did one around some Primary Data commentary. Clearly I don’t know what I’m doing, so here we are again. This time around, my good buddy Jason Collier (Founder at Scale Computing) had some stuff to say about hybrid cloud, and I thought I’d wade in and, ostensibly, nod my head in vigorous agreement for the most part. Firstly, though, here’s Jason’s quote:

“Throughout 2017 we have seen many organizations focus on implementing a 100% cloud focused model and there has been a push for complete adoption of the cloud. There has been a debate around on-premises and cloud, especially when it comes to security, performance and availability, with arguments both for and against. But the reality is that the pendulum stops somewhere in the middle. In 2018 and beyond, the future is all about simplifying hybrid IT. The reality is it’s not on-premises versus the cloud. It’s on-premises and the cloud. Using hyperconverged solutions to support remote and branch locations and making the edge more intelligent, in conjunction with a hybrid cloud model, organizations will be able to support highly changing application environments”.

 

The Cloud

I talk to people every day in my day job about what their cloud strategy is, and most people in enterprise environments are telling me that there are plans afoot to go all in on public cloud. No one wants to run their own data centres anymore. No one wants to own and operate their own infrastructure. I’ve been hearing this for the last five years too, and have possibly penned a few strategy documents in my time that said something similar. Whether it’s with AWS, Azure, Google or one of the smaller players, public cloud as a consumption model has a lot going for it. Unfortunately, it can be hard to get stuff working up there reliably. Why? Because no one wants to spend time “re-factoring” their applications. As a result of this, a lot of people want to lift and shift their workloads to public cloud. This is fine in theory, but a lot of those applications are running crusty versions of Microsoft’s flagship RDBMS, or they’re using applications that are designed for low-latency, on-premises data centres, rather than being addressable over the Internet. And why is this? Because we all spent a lot of the business’s money in the late nineties and early noughties building these systems to a level of performance and resilience that we thought people wanted. Except we didn’t explain ourselves terribly well, and now the business is tired of spending all of this money on IT. And they’re tired of having to go through extensive testing cycles every time they need to do a minor upgrade. So they stop doing those upgrades, and after some time passes, you find that a bunch of key business applications are suddenly approaching end of life and in need of some serious TLC. As a result of this, those same enterprises looking to go cloud first also find themselves struggling mightily to get there. This doesn’t mean public cloud isn’t the answer; it just means that people need to think things through a bit.

 

The Edge

Another reason enterprises aren’t necessarily lifting and shifting every single workload to the cloud is the concept of data gravity. Sometimes, your applications and your data need to be close to each other. And sometimes that closeness needs to occur closest to the place you generate the data (or run the applications). Whilst I think we’re seeing a shift in the deployment of corporate workloads to off-premises data centres, there are still some applications that need everything close by. I generally see this with enterprises working with extremely large datasets (think geo-spatial stuff or perhaps media and entertainment companies) that struggle to move large amounts of the data around in a fashion that is cost effective and efficient from a time and resource perspective. There are some neat solutions to some of these requirements, such as Scale Computing’s single node deployment option for edge workloads, and X-IO Technologies’ neat approach to moving data from the edge to the core. But physics is still physics.

 

The Bit In Between

So back to Jason’s comment on hybrid cloud being the way it’s really all going. I agree that it’s very much a question of public cloud and on-premises, rather than one or the other. I think the missing piece for a lot of organisations, however, doesn’t necessarily lie in any one technology or application architecture. Rather, I think the key to a successful hybrid strategy sits squarely with the capability of the organisation to provide consistent governance throughout the stack. In my opinion, it’s more about people understanding the value of what their company does, and the best way to help it achieve that value, than it is about whether HCI is a better fit than traditional rackmount servers connected to fibre channel fabrics. Those considerations are important, of course, but I don’t think they have the same impact on a company’s potential success as the people and politics do. You can have some super awesome bits of technology powering your company, but if you don’t understand how you’re helping the company do business, you’ll find the technology is not as useful as you hoped it would be. You can talk all you want about hybrid (and you should, it’s a solid strategy) but if you don’t understand why you’re doing what you do, it’s not going to be as effective.

Primary Data – Seeing the Future

It’s that time of year when public relations companies send out a heap of “What’s going to happen in 2018” type press releases for us blogger types to take advantage of. I’m normally reluctant to do these “futures” based posts, as I’m notoriously bad at seeing the future (as are most people). These types of articles also invariably push the narrative in a certain direction based on whatever the vendor being represented is selling. That said I have a bit of a soft spot for Lance Smith and the team at Primary Data, so I thought I’d entertain the suggestion that I at least look at what’s on his mind. Unfortunately, scheduling difficulties meant that we couldn’t talk in person about what he’d sent through, so this article is based entirely on the paragraphs I was sent, and Lance hasn’t had the opportunity to explain himself :)

 

SDS, What Else?

Here’s what Lance had to say about software-defined storage (SDS). “Few IT professionals admit to a love of buzzwords, and one of the biggest offenders in the last few years is the term, “software-defined storage.” With marketers borrowing from the successes of “software-defined-networking”, the use of “SDS” attempts all kinds of claims. Yet the term does little to help most of us to understand what a specific SDS product can do. Despite the well-earned dislike of the phrase, true software-defined storage solutions will continue to gain traction because they try to bridge the gap between legacy infrastructure and modern storage needs. In fact, even as hardware sales declines, IDC forecasts that the SDS market will grow at a rate of 13.5% from 2017 – 2021, growing to a $16.2B market by the end of the forecast period.”

I think Lance raises an interesting point here. There’re a lot of companies claiming to deliver software-defined storage solutions in the marketplace. Some of these, however, are still heavily tied to particular hardware solutions. This isn’t always because they need the hardware to deliver functionality, but rather because the company selling the solution also sells hardware. This is fine as far as it goes, but I find myself increasingly wary of SDS solutions that are tied to a particular vendor’s interpretation of what off the shelf hardware is.

The killer feature of SDS is the idea that you can do policy-based provisioning and management of data storage in a programmatic fashion, and do this independently of the underlying hardware. Arguably, with everything offering some kind of RESTful API capability, this is the case. But I think it’s the vendors who are thinking beyond simply dishing up NFS mount points or S3-compliant buckets that will ultimately come out on top. People want to be able to run this stuff anywhere – on crappy whitebox servers and in the public cloud – and feel comfortable knowing that they’ll be able to manage their storage based on a set of business-focused rules, not a series of constraints set out by a hardware vendor. I think we’re close to seeing that with a number of solutions, but I think there’s still some way to go.
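
To put the “policy-based, programmatic” part into practice, here’s roughly what provisioning against an SDS control plane looks like when it exposes a REST API. The endpoint, payload fields and policy names are all invented for illustration; any real product will have its own schema and authentication.

```python
import requests

# Hypothetical SDS control-plane endpoint and policy names - purely illustrative.
SDS_API = "https://sds.example.internal/api/v1"

volume_request = {
    "name": "app-x-data-01",
    "size_gb": 500,
    "policy": {
        "performance_tier": "gold",      # business intent, not "LUN 47 on array 3"
        "protection": "replicate-async",
        "rpo_minutes": 15,
        "placement": "any",              # on-premises or public cloud, the policy decides
    },
}

resp = requests.post(f"{SDS_API}/volumes", json=volume_request, timeout=30)
resp.raise_for_status()
print("Provisioned volume:", resp.json().get("id"))
```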

 

HCI As Silo. Discuss.

His thoughts on HCI were, in my opinion, a little more controversial. “Hyperconverged infrastructure (HCI) aims to meet data’s changing needs through automatic tiering and centralized management. HCI systems have plenty of appeal as a fast fix to pay as you grow, but in the long run, these systems represent just another larger silo for enterprises to manage. In addition, since hyperconverged systems frequently require proprietary or dedicated hardware, customer choice is limited when more compute or storage is needed. Most environments don’t require both compute and storage in equal measure, so their budget is wasted when only more CPU or more capacity is really what applications need. Most HCI architecture rely on layers of caches to ensure good storage performance.  Unfortunately, performance is not guaranteed when a set of applications running in a compute node overruns a caches capacity.  As IT begins to custom-tailor storage capabilities to real data needs with metadata management software, enterprises will begin to move away from bulk deployments of hyperconverged infrastructure and instead embrace a more strategic data management role that leverages precise storage capabilities on premises and into the cloud.”

There are a few nuggets in this one that I’d like to look at further. Firstly, the idea that HCI becomes just another silo to manage is an interesting one. It’s true that HCI as a technology is a bit different to the traditional compute / storage / network paradigm that we’ve been managing for the last few decades. I’m not convinced, however, that it introduces another silo of management. Or maybe, what I’m thinking is that you don’t need to let it become another silo to manage. Rather, I’ve been encouraging enterprises to look at their platform management at a higher level, focusing on the layer above the compute / storage / network to deliver automation, orchestration and management. If you build that capability into your environment, then whether you consume compute via rackmount servers, blade or HCI becomes less and less relevant. It’s easier said than done, of course, as it takes a lot of time and effort to get that layer working well. But the sweat investment is worth it.

Secondly, the notion that “[m]ost environments don’t require both compute and storage in equal measure, so their budget is wasted when only more CPU or more capacity is really what applications need” is accurate, but most HCI vendors are offering a way to expand storage or compute now without necessarily growing the other components (think Nutanix with their storage-only nodes and NetApp’s approach to HCI). I’d posit that architectures have changed enough with the HCI market leaders to the point that this is no longer a real issue.

Finally, I’m not convinced that “performance is not guaranteed when a set of applications running in a compute node overruns a caches capacity” is as much of a problem as it was a few years ago. Modern hypervisors have a lot of smarts built into them in terms of service quality and the modelling for capacity and performance sizing has improved significantly.

 

Conclusion

I like Lance, and I like what Primary Data bring to the table with their policy-based SDS solution. I don’t necessarily agree with him on some of these points (particularly as I think HCI solutions have matured a bunch in the last few years) but I do enjoy the opportunity to think about some of these ideas when I otherwise wouldn’t. So what will 2018 bring in my opinion? No idea, but it’s going to be interesting, that’s for sure.

How Soon Is Now?

This is one of those posts that is really just a loose collection of thoughts that have been bouncing around my head recently regarding software lifecycles. I’m not a software developer, merely a consumer. It’s not supported by any research and should be treated as nothing more than opinion. It should also not be used to justify certain types of behaviour. If this kind of hippy-dippy stuff isn’t for you, I’ll not be offended if you ignore this article. I also apologise for the lack of pictures in this one.

 

Picture This

I’ve been doing a lot of work recently with various enterprises using unsupported versions of software. In this particular case, the software is Windows 2003. The fact that Windows 2003 reached its end of extended support this time two years ago is neither here nor there. At least, for the purposes of this article it doesn’t matter. The problem for me isn’t the lack of support of this operating system as I don’t spend a lot of time directly involved in OS support nowadays. Rather, the problem is that vendors tell me that any software running on that platform is not supported either, as the OS may be the cause of issues I encounter and Microsoft won’t help them anymore. This is a perfectly valid position from a support point of view, as software companies are invariably very careful about ensuring the platform they run on is supported by the platform vendor. Commercially, it’s not a great look in the marketplace to be selling old stuff – it’s just not as sexy.

 

So What’s Your Point?

There’re a few things at play here that I want to explore a bit. I’ll reiterate that these are likely poorly expressed opinions at best, so please bear with me.

Stop Telling Me About Your App Store

Technology software vendors love to talk about their “app store capabilities”, particularly when it comes to cool new things like cloud. We’re all relatively happy to accept a rapid development and update cycle for our phones, and we want to pick out the services we need from a web page and deploy them quickly. Why can’t we do that with enterprise software? Well, you can up to a point. But there’s a metric shit tonne of work that needs to be done organisationally before most shops are really ready to leverage that capability. There, I’ve said it. I don’t think you can write the words Agile and DevOps in your proposals and magically be at that point. I’m not saying that there’s no value in these movements – I think they’re definitely valuable – but I still maintain there’s work to be done. As an aside, go and read The Phoenix Project. I don’t care if you’re in ops or not, just read it. It’s very cheap on Kindle. No I don’t get a cut.

What If It Breaks?

Enterprises don’t like to update their software platforms because they are inordinately afraid that something will break. To the average neckbeard, this is no big deal. We’ll reach for the backups (hehe), roll back the change and try to work out what happened so that it doesn’t happen again. But in enterprises, they aren’t the ones making the decisions. Their neck isn’t on the block if something goes wrong. It’s some middle manager you’ve never heard of in charge of a particular division within the company whose sole purpose is to support whatever business function this particular bit of software services. And the last guy who really understood anything about this critical software left the company seven years ago. And it was a bit of off the shelf software that was heavily customised and lightly documented. And so people have been clinging to this working version of the software on a particularly crusty platform for a very long time. And they are so very scared that your upgraded platform, besides causing them a lot of testing work, will break things in the environment that no one understands (and fewer still will be able to fix). I’ve worked in these environments a lot during the past 15 – 20 years. At times I’ve considered finding new employment rather than be the bunny pushing the buttons on the upgrade of Widget X to Widget X v2 for fear that something spectacularly bad happens. You think I’m exaggerating? There’s a whole consulting industry built around this crap.

But You Said This Was The Best Ever Version

When I lived in customer land, I had any number of vendors tell me about the latest versions of their products, explaining, somewhat breathlessly, just how good this particular version was. And how much better than the old version it was. And how I should upgrade before my current support runs out. I have this conversation frequently with customers:

Me: “Version 7 of Super Software is coming to end of support life, you’ll need to upgrade to Version 7.8”

Customer: “But what’s changed that Version 7 won’t do what I need it to do anymore?”

Me: “Nothing. But we won’t support it because the platform is no longer supported”

Customer: “…”

I know there are reasons, like end of support for operating systems, that mean that it just doesn’t make sense, fiscally speaking, to keep supporting old versions of products. I also understand that customers are usually given plenty of notice that their favourite version of something is coming up to end of support. I still feel that we’re a little too focused on fast development of software (and improvements, of course), without always considering just how clunky some organisations are (and how difficult it can be to get the right resources in place to upgrade line of business applications). Granted, there are plenty of places who deal just fine with rapid release cycles, but large enterprises do not. And what is it that one day suddenly stops a bit of software from working? If my version goes end of support tomorrow, what changes from a technical perspective? Nothing, right? Yes and no. Nothing has changed with the version you’re running, but chances are you’re two major revisions behind the current one. I bet there’ve been a bunch of new features (some of which might be useful to you) introduced since that version came out. You can also guarantee that you’ll be in something of a bad way when new security flaws are discovered either in your old software or the old platform, because the vendors won’t be rushing to help you. It will be “best effort” if you’re lucky.

 

But You Don’t Understand My Business

It may be startling for some in the tech community to discover, but 99% of companies in the world are not focused (primarily) on matters of an IT nature. It doesn’t matter that major vendors get up on stage at their conferences and talk about how every company is an IT company. The simple fact is that most companies still treat IT as an expense, not an enabler. When vendors come along and decide that the software they told you was awesome two years ago is now terrible and you should really burn it with fire, you’re generally not going to be impressed. Because it’s possible that you’re going to have to pay to upgrade that software. And it’s very likely it’s going to cost you in terms of effort to get the software upgraded. But if your business is focused on putting beer in bottles and the current version of software is doing that for you, why should you change? On the flip side of this, software companies have demonstrated over time that it’s very hard to generate consistent revenue from net new customers. You need to keep the current ones upgrading (and paying) regularly as well. It has also been explained to me (as both a customer and integrator) that software companies are not charities. So there you go.

 

What’s The Answer Then, Smarty?

No idea. Enterprise IT is hard. It always has been. It may not be in the future. But it is right now. And software companies are still doing what software companies have always done, for good and bad reasons. I really just wanted to put some thoughts down on paper that reflected my feeling that enterprise IT is hard. And we shouldn’t always criticise people just because they’re not running the latest iteration of whatever we’re selling them.

Okay, fine. The answer is to try and keep within support where you can. And minimise your exposure in the places where you can’t. Is that what you wanted to hear? I thought so.

Enterprise IT is hard.

Faith-based Computing – Just Don’t

I’d like to be very clear up front that this post isn’t intended as a swipe at people with faith. I have faith. Really, it’s a swipe at people who can’t use the tools available to them.

 

The Problem

I get cranky when IT decisions are based on feelings rather than data. As an example, I’ve been talking to someone recently who has outsourced support of their IT to a third party. However, they’re struggling immensely with their inability to trust someone else looking after their infrastructure. I asked them why it was a problem. They told me they didn’t think the other party could do it as well as they did. I asked for evidence of this assertion. There was none forthcoming. Rather, they just didn’t feel that the other party could do the job.

 

The Data

In IT organisations / operations there’s a lot of data available. You can get uptime statistics, performance statistics, measurements of performance against time allocated for case resolution, all kinds of stuff. And you can get it not only from your internal IT department, but also from your vendors, and across most technology in the stack from the backend to the client-facing endpoint. Everyone’s into data nowadays, and everyone wants to show you theirs. So what I don’t understand is why some people insist on ignoring the data at hand, making decisions based solely on “feelings” rather than the empirical evidence laid out in front of them.
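
If you want a trivial example of the kind of data that’s sitting there waiting to be used, availability is the classic one: a few recorded outage minutes turn “I don’t feel they can do the job” into a number you can actually argue about. The outage figures below are made up purely for illustration.

```python
# Turn recorded outage minutes into an availability percentage.
# The downtime numbers are invented for the example.

minutes_per_month = 30 * 24 * 60

outages_minutes = {
    "internal IT, March": 42,
    "outsourced provider, March": 18,
}

for who, downtime in outages_minutes.items():
    availability = 100 * (1 - downtime / minutes_per_month)
    print(f"{who:30s} {availability:.3f}% available ({downtime} minutes down)")
```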

 

What’s Happening?

I call this focus on instinct “faith-based computing”. It’s similar to faith-based medicine. While I’m a believer, I’m also a great advocate of going to my doctor when I’m suffering from an ailment. Pray for my speedy recovery by all means, but don’t stop me from talking to people of science. Faith-based computing is the idea that you can make significant decisions regarding IT based on instinct rather than the data in front of you. I’m not suggesting that in life there aren’t opportunities for instinct to play a bigger part in how you do things rather than scientific data, but IT has technology in the name. Technology is a science, not a pseudo-science like numerology. Sure, I agree there’re a bunch of factors that influence our decision-making, including education, cultural background, shiny things, all kinds of stuff.

 

Conclusion

I come across organisations on a daily basis operating without making good use of the data in front of them. This arguably keeps me in business as a consultant, but doesn’t necessarily make it fun for you. Use the metrics at hand. If you must make a decision based on instinct or non-technical data, at least be sure that you’ve evaluated the available data. Don’t just dismiss things out of hand because you don’t feel like it’s right.

2017 – The New What Next

I’m not terribly good at predicting the future, particularly when it comes to technology trends. I generally prefer to leave that kind of punditry to journalists who don’t mind putting it out there and are happy to be proven wrong on the internet time and again. So why do a post referencing a great Hot Water Music album? Well, one of the PR companies I deal with regularly sent me a few quotes through from companies that I’m generally interested in talking about. And let’s face it, I haven’t had a lot to say in the last little while due to day job commitments and the general malaise I seem to suffer from during the onset of summer in Brisbane (no, I really don’t understand the concept of Christmas sweaters in the same way my friends in the Northern Hemisphere do).

Long intro for a short post? Yes. So I’ll get to the point. Here’s one of the quotes I was sent. “As concerns of downtime grow more acute in companies around the globe – and the funds for secondary data centers shrink – companies will be turning to DRaaS. While it’s been readily available for years, the true apex of adoption will hit in 2017-2018, as prices continue to drop and organizations become more risk-averse. There are exceptional technologies out there that can solve the business continuity problem for very little money in a very short time.” This was from Justin Giardina, CTO of iland. I was fortunate enough to meet Justin at the Nimble Storage Predictive Flash launch event in February this year. Justin is a switched on guy and while I don’t want to give his company too much air time (they compete in places with my employer), I think he’s bang on the money with his assessment of the state of play with DR and market appetite for DR as a Service.

I think there are a few things at play here, and it’s not all about technology (because it rarely is). The CxO’s fascination with cloud has been (rightly or wrongly) fiscally focused, with a lot of my customers thinking that public cloud could really help reduce their operating costs. I don’t want to go too much into the accuracy of that idea, but I know that cost has been front and centre for a number of customers for some time now. Five years ago I was working in a conservative environment where we had two production DCs and a third site dedicated to data protection infrastructure. They’ve since reduced that to one production site and are leveraging outsourced providers for both DR and data protection capabilities. The workload hasn’t changed significantly, nor has the requirement to have the data protected and recoverable.

Rightly or wrongly the argument for appropriate disaster recovery infrastructure seems to be a difficult one to make in organisations, even those that have been exposed to disaster and have (through sheer dumb luck) survived the ordeal. I don’t know why it is so difficult for people to understand that good DR and data protection is worth it. I suppose it is the same as me taking a calculated risk on my insurance every year and paying a lower annual rate and gambling on the fact that I won’t have to make a claim and be exposed to higher premiums.

It’s not just about cost though. I’ve spoken to plenty of people who just don’t know what they’re doing when it comes to DR and data protection. And some of these people have been put in the tough position of having lost some data, or had a heck of a time recovering after a significant equipment failure. In the same way that I have someone come and look at my pool pump when water is coming out of the wrong bit, these companies are keen to get people in who know what they’re doing. If you think about it, it’s a smart move. While it can be hard to admit, sometimes knowing your limitations is actually a good thing.

It’s not that we don’t have the technology, or the facilities (even in BrisVegas) to do DR and data protection pretty well nowadays. In most cases it’s easier and more reliable than it ever was. But, like on-premises email services, it seems to be a service that people are happy to make someone else’s problem. I don’t have an issue with that as a concept, as long as you understand that you’re only outsourcing some technology and processes; you’re not magically doing away with the risk and the consequences when something goes pear-shaped. If you’re a small business without a dedicated team of people to look after your stuff, it makes a lot of sense. Even the bigger players can benefit from making it someone else’s thing to worry about. Just make sure you know what you’re getting into.

Getting back to the original premise of this post, I agree with Justin that we’re at a tipping point regarding DRaaS adoption, and I think 2017 is going to be really interesting in terms of how companies make use of this technology to protect their assets and keep costs under control.

Pure Storage – “Architecture Matters”

I received my XtremIO Upgrade Survival Kit from Pure Storage last week and wanted to provide a little commentary on it. I know it’s “old news” now, but it’s been on my mind for a while and the gift pack prompted me to burst into print.


Firstly, it was interesting to see the blogosphere light up when news broke that the upgrade from 2.4 to 3 was destructive. You can read a few of the posts from Nigel here, Chris here and Enrico here. Chad weighed in with a typically insightful (and when you’re a VP with the vendor you hope it’s insightful) post that defended a number of the decisions that got them to that point and was basically a mea culpa combined with a broader discussion around architecture. The competing vendors didn’t miss their chance either, with Vaughn having his say here and an interesting post by Calvin that you can read here.

But the post that I think put everything in perspective was Stephen’s. Yes, it’s all technically a bit of a mess. But we’ve been conditioned for so long to read between the lines of vendor glossies and not believe that anything is ever really non-disruptive. Every NDU carries a risk that something will go pear-shaped, and we prepare for it. Most people have had an upgrade go wrong before, particularly if their job has been enterprise storage field upgrades for the last 5 – 10 years. It’s never pretty, it’s never fun, but nowadays we’re generally prepared for it.

While I enjoy the generally ballsy marketing from Pure Storage for calling out EMC on this problem, I think that ultimately we (partners, customers) are probably all not that fussed about it really. Not that I think it’s good that we’re still having these problems. Architecture does matter. But sometimes things get stuffed up.

As an aside though, how good would it be if you worked in an environment where all you needed to do was fill out a paper slip to do a change?