Disaster Recovery vs Disaster Avoidance vs Data Protection

This is another one of those rambling posts that I like to write when I’m sitting in an airport lounge somewhere and I’ve got a bit of time to kill. The versus in the title is a bit misleading too, because DR and DA are both forms of data protection. And periodic data protection (PDP) is important too. But what I wanted to write about was some of the differences between DR and DA, in particular.

TL;DR – DR is not DA, and neither of them is PDP. But you need to think about all of them at some point.

 

Terminology

I want to be clear about what I mean when I say these terms, because it seems like they can mean a lot of things to different folks.

  • Recovery Point Objective – The Recovery Point Objective (RPO) is the maximum window of data, measured in time, that may be permanently lost during an incident. You want this to be minutes or hours, not days or weeks (ideally). RPO 0 is the idea that no data is lost when there’s a failure. A lot of vendors will talk about “Near Zero” RPOs. (There’s a short worked example of both figures just after this list.)
  • Recovery Time Objective – The Recovery Time Objective (RTO) is the amount of time the business can be without the service, without incurring significant risks or significant losses. This is, ostensibly, how long it takes you to get back up and running after an event. You don’t really want this to be in days and weeks either.
  • Disaster Recovery – Disaster Recovery is the ability to recover applications after a major event (think flood, fire, DC is now a hole in the ground). This normally involves a failover of workloads from one DC to another in an orchestrated fashion.
  • Disaster Avoidance – Disaster avoidance “is an anticipatory strategy that is in place in order to prevent any such instance of data breach or losses. It is a defensive, proactive approach to keeping data safe” (I’m quoting this from a great blog post on the topic here)
  • Periodic Data Protection – This is the kind of data protection activity we normally associate with “backups”. It is usually a daily activity (or perhaps as frequent as hourly) and the data is normally used for ad-hoc data file recovery requests. Some people use their backup data as an archive. They’re bad people and shouldn’t be trusted. PDP is normally separate to DA or DR solutions.
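To put some numbers on the RPO and RTO definitions, here’s a trivial Python sketch with a made-up incident timeline. The achieved RPO is the age of the last good copy of the data when the failure hits; the achieved RTO is how long the service was actually down.

```python
from datetime import datetime

# Hypothetical incident timeline - all timestamps are illustrative assumptions.
last_good_copy = datetime(2018, 3, 1, 2, 0)      # last successful replication / backup
failure_time = datetime(2018, 3, 1, 9, 30)       # the array falls over
service_restored = datetime(2018, 3, 1, 13, 0)   # workloads running again at the DR site

# Achieved RPO: the window of data that is gone for good.
rpo = failure_time - last_good_copy
# Achieved RTO: how long the business was without the service.
rto = service_restored - failure_time

print(f"Data lost (achieved RPO): {rpo}")        # 7:30:00
print(f"Outage length (achieved RTO): {rto}")    # 3:30:00
```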

 

DR Isn’t The Full Answer

I’ve had some great conversations with customers recently about adding resilience to their on-premises infrastructure. It seems like an old-fashioned concept, but a number of organisations are only now seeing the benefits of adding infrastructure-level resilience to their platforms. The first conversation usually goes something like this:

Me: So what’s your key application, and what’s your resiliency requirement?

Customer: Oh, it’s definitely Application X (usually built on Oracle or using SAP or similar). It absolutely can’t go down. Ever. We need to have RPO 0 and RTO 0 for this one. Our whole business depends on it.

Me: Okay, it sounds like it’s pretty important. So what about your file server and email?

Customer: Oh, that’s not so important. We can recover those from overnight backups.

Me: But aren’t they used to store data for Application X? Don’t you have workflows that rely on email?

Customer: Oh, yeah, I guess so. But it will be too expensive to protect all of this. Can we change the RPO a bit? I don’t think the CFO will support us doing RPO 0 everywhere.

These requirements tend to change whenever we move from technical discussions to commercial discussions. In an ideal world, Martha in Accounting will have her home directory protected in a highly available fashion such that it can withstand the failure of one or more storage arrays (or data centres). The problem with this is that, if there are 1000 Marthas in the organisation, the cost of protecting that kind of data at scale becomes prohibitive, relative to the perceived value of the data. This is one of the ways I’ve seen “DR” capability added to an environment in the past. Take some older servers and put them in a site removed from the primary site, set up some scripts to copy critical data to that site, and hope nothing ever goes too wrong with the primary site.

There are obviously better ways of doing this, and common solutions may or may not involve block-level storage replication, orchestrated failover tools, and like for like compute at the secondary site (or perhaps you’ve decided to shut down test and development while you’re fixing the problem at the production site).

But what are you trying to protect against? The failure of some compute? Some storage? The network layer? A key application? All of these answers will determine the path you’ll need to go down. Keep in mind also that DR isn’t the only answer. You also need to have business continuity processes in place. A failover of workloads to a secondary site is pointless if operations staff don’t have access to a building to continue doing their work, or if people can’t work when the swipe card access machine is off-line, or if your Internet feed only terminates in one DC, etc.
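To illustrate the “orchestrated fashion” bit, here’s a toy failover runbook in Python. It isn’t modelled on any particular orchestration product, and the steps are just placeholders; the point is simply that failover should be an ordered, checked sequence of steps rather than a pile of ad-hoc copy jobs.

```python
# A toy failover runbook - purely illustrative, not any particular vendor's tool.
RUNBOOK = [
    "confirm the primary site is actually unreachable",
    "halt replication and promote storage at the secondary site",
    "bring up infrastructure services (AD / DNS / etc.)",
    "power on application tiers in dependency order (DB, app, web)",
    "repoint user access (DNS / load balancer / VPN)",
    "verify the key application end to end",
]

def fail_over(execute_step):
    """Run each step in order and stop the moment one fails."""
    for number, step in enumerate(RUNBOOK, start=1):
        print(f"Step {number}: {step}")
        if not execute_step(step):
            raise RuntimeError(f"Failover halted at step {number}: {step}")
    print("Failover complete - now worry about the people and the building access.")

if __name__ == "__main__":
    # Stub executor so the sketch runs; a real one would call your tooling / APIs.
    fail_over(lambda step: True)
```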

 

I’m Avoiding The Problem

Disaster Avoidance is what I like to call the really sexy resilience solution. You can have things go terribly wrong with your production workload and potentially still have it functioning like there was no problem. This is where hardware solutions like Pure Storage ActiveCluster or Dell EMC VPLEX can really shine, assuming you’ve partnered them with applications that have the smarts built in to leverage what they have to offer. Because that’s the real key to a successful disaster avoidance design. It’s great to have synchronous replication and cache-consistency across DCs, but if your applications don’t know what to do when a leg goes missing, they’ll fall over. And if you don’t have other protection mechanisms in place, such as periodic data protection, then your synchronous block replication solution will merrily synchronise malware or corrupted data from one site to another in the blink of an eye.
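Here’s a deliberately silly sketch of that last point, with Python dicts standing in for arrays: synchronous replication mirrors every write, good or bad, so only a separate point-in-time copy gives you something clean to go back to.

```python
import copy

primary, secondary = {}, {}
snapshots = []  # periodic point-in-time copies (the PDP bit)

def synchronous_write(key, value):
    """Mirror every write to both 'arrays' - good data and bad data alike."""
    primary[key] = value
    secondary[key] = value

def take_snapshot():
    snapshots.append(copy.deepcopy(primary))

synchronous_write("invoice-001", "legitimate data")
take_snapshot()

# Malware or corruption arrives; synchronous replication dutifully mirrors it.
synchronous_write("invoice-001", "encrypted garbage")

assert secondary["invoice-001"] == "encrypted garbage"    # both legs are now bad
assert snapshots[-1]["invoice-001"] == "legitimate data"  # only the snapshot can save you
```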

It’s important to understand the failure scenarios you’re protecting against too. If you’ve deployed vSphere Metro Storage Cluster, you’ll be able to run VMs even when your whole array has gone off-line (assuming you’ve set it up properly). But this won’t necessarily prevent an outage if you lose your vSphere cluster, or the whole DC. Your data will still be protected, and you’ll be in good shape in terms of recovering quickly, but there will be an outage. This is where application-level resilience can help with availability. Remember that, even if you’ve got ultra-resilient workload protection across DCs, if your staff only have one connection into the environment, they may be left twiddling their thumbs in the event of a problem.

There’s a level of resiliency associated with this approach, and your infrastructure will certainly be able to survive the failure of a compute node, or even a bunch of disk and some compute (everything will reboot in another location). But you need to be careful not to let people think that this is something it’s not.

 

PDP, Yeah You Know Me

I mentioned problems with malware and data corruption earlier on. This is where periodic data protection solutions (such as those sold by Dell EMC, CommVault, Rubrik, Cohesity, Veeam, etc) can really get you out of a spot of bother. And if you don’t need to recover the whole VM when there’s a problem, these solutions can be a lot quicker at getting data back. The good news is that you can integrate a lot of these products with storage protection solutions and orchestration tools for a belt and braces solution to protection, and it’s not the shitshow of scripts and kludges that it was ten years ago. Hooray!

 

Final Thoughts

There’s a lot more to data protection than I’ve covered here. People like Preston have written books about the topic. And a lot of the decision making is potentially going to be out of your hands in terms of what your organisation can afford to spend (until they lose a lot of data, money (or both), then they’ll maybe change their focus). But if you do have the opportunity to work on some of these types of solutions, at least try to make sure that everyone understands exactly what they can achieve with the technologies at hand. There’s nothing worse than being hauled over the coals because some director thought they could do something amazing with infrastructure-level availability and resiliency only to have the whole thing fall over due to lack of budget. It can be a difficult conversation to have, particularly if your executives are the types of people who like to trust the folks with the fancy logos on their documents. All you can do in that case is try and be clear about what’s possible, and clear about what it will cost in time and money.

In the near future I’ll try to put together a post on various infrastructure failure scenarios and what works and what doesn’t. RPO 0 seems to be what everyone is asking for, but it may not necessarily be what everyone needs. Now please enjoy this Unfinished Business stock image.

Cloudistics, Choice and Private Cloud

I’ve had my eye on Cloudistics for a little while now. They published an interesting post recently on virtualisation and private cloud. It makes for an interesting read, and I thought I’d comment briefly and post this article, if for no other reason than to point you towards their post so you can check it out.

TL;DR – I’m rambling a bit, but it’s not about X versus Y, it’s more about getting your people and processes right.

 

Cloud, Schmoud

There are a bunch of different reasons why you’d want to adopt a cloud operating model, be it public, private or hybrid. These include the ability to take advantage of:

  • On-demand service;
  • Broad network access;
  • Resource pooling;
  • Rapid elasticity; and
  • Measured service, or pay-per-use.

Some of these aspects of cloud can be more useful to enterprises than others, depending in large part on where they are in their journey (I hate calling it that). The thing to keep in mind is that cloud is really just a way of doing things slightly differently to address deficiencies in areas that are normally not tied to one particular piece of technology. What I mean by that is that cloud is a way of dealing with some of the issues that you’ve probably seen in your IT organisation. These include:

  • Poor planning;
  • Complicated network security models;
  • Lack of communication between IT and the business;
  • Applications that don’t scale; and
  • Lack of capacity planning.

Operating Expenditure

These are all difficult problems to solve, primarily because people running IT organisations need to be thinking not just about technology problems, but also people and business problems. And solving those problems takes resources, something that’s often in short supply. Couple that with the fact that many businesses feel like they’ve been handing out too much money to their IT organisations for years, and you start to understand why many enterprises are struggling to adapt to new ways of doing things. One thing that public cloud does give you is a way to consume resources via OpEx rather than CapEx. The benefit here is that you’re only consuming what you need, and not paying for the whole thing to be built out on the off chance you’ll use it all over the five year life of the infrastructure. Private cloud can still provide this kind of benefit to the business via “showback” mechanisms that can really highlight the cost of infrastructure being consumed by internal business units. Everyone has complained at one time or another about the Finance group having 27 test environments; now they can let the executives know just what that actually costs.
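A showback report doesn’t have to be sophisticated to make the point. The sketch below uses completely invented rates and usage figures; swap in whatever your own costing model says.

```python
# A back-of-the-envelope showback report. All rates and usage figures are
# placeholders for illustration only.
MONTHLY_RATES = {"vcpu": 25.0, "ram_gb": 5.0, "storage_gb": 0.30}  # dollars, assumed

usage_by_business_unit = {
    "Finance (27 test environments)": {"vcpu": 216, "ram_gb": 864, "storage_gb": 20000},
    "Sales": {"vcpu": 32, "ram_gb": 128, "storage_gb": 2000},
}

for unit, usage in usage_by_business_unit.items():
    cost = sum(usage[item] * MONTHLY_RATES[item] for item in usage)
    print(f"{unit}: ${cost:,.2f} per month")
```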

Are You Really Cloud Native?

Another issue with moving to cloud is that a lot of enterprises are still looking to leverage Infrastructure-as-a-Service (IaaS) as an extension of on-premises capabilities rather than using cloud-native technologies. If you’ve gone with lift and shift (or “move and improve”) you’ve potentially just jammed a bunch of the same problems you had on-premises in someone else’s data centre. The good thing about moving to a cloud operating model (even if it’s private) is that you’ll get people (hopefully) used to consuming services from a catalogue, and taking responsibility for the size of the footprint they occupy. But if your idea of transformation is running SQL 2005 on Windows Server 2003 deployed from VMware vRA then I think you’ve got a bit of work to do.

 

Conclusion

As Cloudistics point out in their article, it isn’t really a conversation about virtualisation versus private cloud, as virtualisation (in my mind at least) is the platform that makes a lot of what we do nowadays with private cloud possible. What is more interesting is the private versus public debate. But even that one is no longer as clear cut as vendors would like you to believe. If a number of influential analysts are right, most of the world has started to realise that it’s all about a hybrid approach to cloud. The key benefits of adopting a new way of doing things are more about fixing up the boring stuff, like process. If you think you can get your house in order simply by replacing the technology that underpins it, then you’re in for a tough time.

2018 AKA The Year After 2017

I said last year that I don’t do future prediction type posts, and then I did one anyway. This year I said the same thing and then I did one around some Primary Data commentary. Clearly I don’t know what I’m doing, so here we are again. This time around, my good buddy Jason Collier (Founder at Scale Computing) had some stuff to say about hybrid cloud, and I thought I’d wade in and, ostensibly, nod my head in vigorous agreement for the most part. Firstly, though, here’s Jason’s quote:

“Throughout 2017 we have seen many organizations focus on implementing a 100% cloud focused model and there has been a push for complete adoption of the cloud. There has been a debate around on-premises and cloud, especially when it comes to security, performance and availability, with arguments both for and against. But the reality is that the pendulum stops somewhere in the middle. In 2018 and beyond, the future is all about simplifying hybrid IT. The reality is it’s not on-premises versus the cloud. It’s on-premises and the cloud. Using hyperconverged solutions to support remote and branch locations and making the edge more intelligent, in conjunction with a hybrid cloud model, organizations will be able to support highly changing application environments”.

 

The Cloud

I talk to people every day in my day job about what their cloud strategy is, and most people in enterprise environments are telling me that there are plans afoot to go all in on public cloud. No one wants to run their own data centres anymore. No one wants to own and operate their own infrastructure. I’ve been hearing this for the last five years too, and have possibly penned a few strategy documents in my time that said something similar. Whether it’s with AWS, Azure, Google or one of the smaller players, public cloud as a consumption model has a lot going for it. Unfortunately, it can be hard to get stuff working up there reliably. Why? Because no-one wants to spend time “re-factoring” their applications. As a result of this, a lot of people want to lift and shift their workloads to public cloud. This is fine in theory, but a lot of those applications are running crusty versions of Microsoft’s flagship RDBMS, or they’re using applications that are designed for low-latency, on-premises data centres, rather than being addressable over the Internet. And why is this? Because we all spent a lot of the business’s money in the late nineties and early noughties building these systems to a level of performance and resilience that we thought people wanted. Except we didn’t explain ourselves terribly well, and now the business is tired of spending all of this money on IT. And they’re tired of having to go through extensive testing cycles every time they need to do a minor upgrade. So they stop doing those upgrades, and after some time passes, you find that a bunch of key business applications are suddenly approaching end of life and in need of some serious TLC. As a result of this, those same enterprises looking to go cloud first also find themselves struggling mightily to get there. This doesn’t mean public cloud isn’t the answer; it just means that people need to think things through a bit.

 

The Edge

Another reason enterprises aren’t necessarily lifting and shifting every single workload to the cloud is the concept of data gravity. Sometimes, your applications and your data need to be close to each other. And sometimes that closeness needs to be at the place you generate the data (or run the applications). Whilst I think we’re seeing a shift in the deployment of corporate workloads to off-premises data centres, there are still some applications that need everything close by. I generally see this with enterprises working with extremely large datasets (think geo-spatial stuff or perhaps media and entertainment companies) that struggle to move large amounts of the data around in a fashion that is cost effective and efficient from a time and resource perspective. There are some neat solutions to some of these requirements, such as Scale Computing’s single node deployment option for edge workloads, and X-IO Technologies’ neat approach to moving data from the edge to the core. But physics is still physics.
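A quick back-of-the-envelope on the physics. The dataset size, link speed and efficiency figure below are all assumptions, picked only to show why bulk data movement is hard:

```python
# How long does it take to move a big dataset? Numbers are illustrative only.
dataset_tb = 500        # say, a geospatial archive
link_gbps = 1           # a 1Gb/s WAN link
efficiency = 0.7        # protocol overhead, contention, reality

dataset_bits = dataset_tb * 8 * 10**12
seconds = dataset_bits / (link_gbps * 10**9 * efficiency)
print(f"{seconds / 86400:.1f} days")   # roughly 66 days - physics is still physics
```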

 

The Bit In Between

So back to Jason’s comment on hybrid cloud being the way it’s really all going. I agree that it’s very much a question of public cloud and on-premises, rather than one or the other. I think the missing piece for a lot of organisations, however, doesn’t necessarily lie in any one technology or application architecture. Rather, I think the key to a successful hybrid strategy sits squarely with the capability of the organisation to provide consistent governance throughout the stack. In my opinion, it’s more about people understanding the value of what their company does, and the best way to help it achieve that value, than it is about whether HCI is a better fit than traditional rackmount servers connected to fibre channel fabrics. Those considerations are important, of course, but I don’t think they have the same impact on a company’s potential success as the people and politics do. You can have some super awesome bits of technology powering your company, but if you don’t understand how you’re helping the company do business, you’ll find the technology is not as useful as you hoped it would be. You can talk all you want about hybrid (and you should, it’s a solid strategy) but if you don’t understand why you’re doing what you do, it’s not going to be as effective.

Primary Data – Seeing the Future

It’s that time of year when public relations companies send out a heap of “What’s going to happen in 2018” type press releases for us blogger types to take advantage of. I’m normally reluctant to do these “futures” based posts, as I’m notoriously bad at seeing the future (as are most people). These types of articles also invariably push the narrative in a certain direction based on whatever the vendor being represented is selling. That said I have a bit of a soft spot for Lance Smith and the team at Primary Data, so I thought I’d entertain the suggestion that I at least look at what’s on his mind. Unfortunately, scheduling difficulties meant that we couldn’t talk in person about what he’d sent through, so this article is based entirely on the paragraphs I was sent, and Lance hasn’t had the opportunity to explain himself :)

 

SDS, What Else?

Here’s what Lance had to say about software-defined storage (SDS). “Few IT professionals admit to a love of buzzwords, and one of the biggest offenders in the last few years is the term, “software-defined storage.” With marketers borrowing from the successes of “software-defined-networking”, the use of “SDS” attempts all kinds of claims. Yet the term does little to help most of us to understand what a specific SDS product can do. Despite the well-earned dislike of the phrase, true software-defined storage solutions will continue to gain traction because they try to bridge the gap between legacy infrastructure and modern storage needs. In fact, even as hardware sales declines, IDC forecasts that the SDS market will grow at a rate of 13.5% from 2017 – 2021, growing to a $16.2B market by the end of the forecast period.”

I think Lance raises an interesting point here. There’re a lot of companies claiming to deliver software-defined storage solutions in the marketplace. Some of these, however, are still heavily tied to particular hardware solutions. This isn’t always because they need the hardware to deliver functionality, but rather because the company selling the solution also sells hardware. This is fine as far as it goes, but I find myself increasingly wary of SDS solutions that are tied to a particular vendor’s interpretation of what off the shelf hardware is.

The killer feature of SDS is the idea that you can do policy-based provisioning and management of data storage in a programmatic fashion, and do this independently of the underlying hardware. Arguably, with everything offering some kind of RESTful API capability, this is the case. But I think it’s the vendors who are thinking beyond simply dishing up NFS mount points or S3-compliant buckets that will ultimately come out on top. People want to be able to run this stuff anywhere – on crappy whitebox servers and in the public cloud – and feel comfortable knowing that they’ll be able to manage their storage based on a set of business-focused rules, not a series of constraints set out by a hardware vendor. I think we’re close to seeing that with a number of solutions, but I think there’s still some way to go.
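To sketch what that looks like from the consumer’s side, here’s a hypothetical example in Python. The endpoint, payload and policy names are all invented for illustration – every product’s API is different – but the shape of the idea is the same: describe the outcome you want in business terms and let the platform worry about placement.

```python
import json
from urllib import request

def provision_volume(name, size_gb, policy):
    """Ask a (hypothetical) SDS control plane for a volume that meets a business policy."""
    payload = {
        "name": name,
        "size_gb": size_gb,
        # Business-focused rules, not hardware constraints, e.g.:
        # {"performance": "gold", "protection": "replicated", "placement": "any"}
        "policy": policy,
    }
    req = request.Request(
        "https://sds.example.local/api/v1/volumes",  # made-up endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Real code needs authentication and error handling; this is just the shape.
    with request.urlopen(req) as resp:
        return json.load(resp)

# provision_volume("appx-db01", 2048, {"performance": "gold", "protection": "replicated"})
```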

 

HCI As Silo. Discuss.

His thoughts on HCI were, in my opinion, a little more controversial. “Hyperconverged infrastructure (HCI) aims to meet data’s changing needs through automatic tiering and centralized management. HCI systems have plenty of appeal as a fast fix to pay as you grow, but in the long run, these systems represent just another larger silo for enterprises to manage. In addition, since hyperconverged systems frequently require proprietary or dedicated hardware, customer choice is limited when more compute or storage is needed. Most environments don’t require both compute and storage in equal measure, so their budget is wasted when only more CPU or more capacity is really what applications need. Most HCI architecture rely on layers of caches to ensure good storage performance.  Unfortunately, performance is not guaranteed when a set of applications running in a compute node overruns a caches capacity.  As IT begins to custom-tailor storage capabilities to real data needs with metadata management software, enterprises will begin to move away from bulk deployments of hyperconverged infrastructure and instead embrace a more strategic data management role that leverages precise storage capabilities on premises and into the cloud.”

There’re a few nuggets in this one that I’d like to look at further. Firstly, the idea that HCI becomes just another silo to manage is an interesting one. It’s true that HCI as a technology is a bit different to the traditional compute / storage / network paradigm that we’ve been managing for the last few decades. I’m not convinced, however, that it introduces another silo of management. Or maybe, what I’m thinking is that you don’t need to let it become another silo to manage. Rather, I’ve been encouraging enterprises to look at their platform management at a higher level, focusing on the layer above the compute / storage / network to deliver automation, orchestration and management. If you build that capability into your environment, then whether you consume compute via rackmount servers, blade or HCI becomes less and less relevant. It’s easier said than done, of course, as it takes a lot of time and effort to get that layer working well. But the sweat investment is worth it.

Secondly, the notion that “[m]ost environments don’t require both compute and storage in equal measure, so their budget is wasted when only more CPU or more capacity is really what applications need” is accurate, but most HCI vendors are offering a way to expand storage or compute now without necessarily growing the other components (think Nutanix with their storage-only nodes and NetApp’s approach to HCI). I’d posit that architectures have changed enough with the HCI market leaders to the point that this is no longer a real issue.

Finally, I’m not convinced that “performance is not guaranteed when a set of applications running in a compute node overruns a caches capacity” is as much of a problem as it was a few years ago. Modern hypervisors have a lot of smarts built into them in terms of service quality and the modelling for capacity and performance sizing has improved significantly.

 

Conclusion

I like Lance, and I like what Primary Data bring to the table with their policy-based SDS solution. I don’t necessarily agree with him on some of these points (particularly as I think HCI solutions have matured a bunch in the last few years) but I do enjoy the opportunity to think about some of these ideas when I otherwise wouldn’t. So what will 2018 bring in my opinion? No idea, but it’s going to be interesting, that’s for sure.

How Soon Is Now?

This is one of those posts that is really just a loose collection of thoughts that have been bouncing around my head recently regarding software lifecycles. I’m not a software developer, merely a consumer. It’s not supported by any research and should be treated as nothing more than opinion. It should also not be used to justify certain types of behaviour. If this kind of hippy-dippy stuff isn’t for you, I’ll not be offended if you ignore this article. I also apologise for the lack of pictures in this one.

 

Picture This

I’ve been doing a lot of work recently with various enterprises using unsupported versions of software. In this particular case, the software is Windows 2003. The fact that Windows 2003 reached its end of extended support this time two years ago is neither here nor there. At least, for the purposes of this article it doesn’t matter. The problem for me isn’t the lack of support of this operating system as I don’t spend a lot of time directly involved in OS support nowadays. Rather, the problem is that vendors tell me that any software running on that platform is not supported either, as the OS may be the cause of issues I encounter and Microsoft won’t help them anymore. This is a perfectly valid position from a support point of view, as software companies are invariably very careful about ensuring the platform they run on is supported by the platform vendor. Commercially, it’s not a great look in the marketplace to be selling old stuff – it’s just not as sexy.

 

So What’s Your Point?

There’re a few things at play here that I want to explore a bit. I’ll reiterate that these are likely poorly expressed opinions at best, so please bear with me.

Stop Telling Me About Your App Store

The technology software vendors love to talk about their “app store capabilities”, particularly when it comes to cool new things like cloud. We’re all relatively happy to accept a rapid development and update cycle for our phones, and we want to pick out the services we need from a web page and deploy them quickly. Why can’t we do that with enterprise software? Well, you can up to a point. But there’s a metric shit tonne of work that needs to be done organisationally before most shops are really ready to leverage that capability. There, I’ve said it. I don’t think you can write the words Agile and DevOps in your proposals and magically be at that point. I’m not saying that there’s no value in these movements – I think they’re definitely valuable – but I still maintain there’s work to be done. As an aside, go and read The Phoenix Project. I don’t care if you’re in ops or not, just read it. It’s very cheap on Kindle. No I don’t get a cut.

What If It Breaks?

Enterprises don’t like to update their software platforms because they are inordinately afraid that something will break. To the average neckbeard, this is no big deal. We’ll reach for the backups (hehe), roll back the change and try to work out what happened so that it doesn’t happen again. But in enterprises, they aren’t the ones making the decisions. Their neck isn’t on the block if something goes wrong. It’s some middle manager you’ve never heard of in charge of a particular division within the company whose sole purpose is to support whatever business function this particular bit of software services. And the last guy who really understood anything about this critical software left the company seven years ago. And it was a bit of off the shelf software that was heavily customised and lightly documented. And so people have been clinging to this working version of the software on a particularly crusty platform for a very long time. And they are so very scared that your upgraded platform, besides causing them a lot of testing work, will break things in the environment that no one understands (and fewer still will be able to fix). I’ve worked in these environments a lot during the past 15 – 20 years. At times I’ve considered finding new employment rather than be the bunny pushing the buttons on the upgrade of Widget X to Widget X v2 for fear that something spectacularly bad happens. You think I’m exaggerating? There’s a whole consulting industry built around this crap.

But You Said This Was The Best Ever Version

When I lived in customer land, I had any number of vendors tell me about the latest versions of their products, explaining, somewhat breathlessly, just how good this particular version was. And how much better than the old version it was. And how I should upgrade before my current support runs out. I have this conversation frequently with customers:

Me: “Version 7 of Super Software is coming to end of support life, you’ll need to upgrade to Version 7.8”

Customer: “But what’s changed that Version 7 won’t do what I need it to do anymore?”

Me: “Nothing. But we won’t support it because the platform is no longer supported”

Customer “…”

I know there are reasons, like end of support for operating systems, that mean that it just doesn’t make sense, fiscally speaking, to keep supporting old versions of products. I also understand that customers are usually given plenty of notice that their favourite version of something is coming up to end of support. I still feel that we’re a little too focused on fast development of software (and improvements, of course), without always considering just how clunky some organisations are (and how difficult it can be to get the right resources in place to upgrade line of business applications). Granted, there are plenty of places who deal just fine with rapid release cycles, but large enterprises do not. And what is it that one day suddenly stops a bit of software from working? If my version goes end of support tomorrow, what changes from a technical perspective? Nothing, right? Yes and no. Nothing has changed with the version you’re running, but chances are you’re two major revisions behind the current one. I bet there’ve been a bunch of new features (some of which might be useful to you) introduced since that version came out. You can also guarantee that you’ll be in something of a bad way when new security flaws are discovered either in your old software or the old platform, because the vendors won’t be rushing to help you. It will be “best effort” if you’re lucky.
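If nothing else, keep a list of what you run and when it falls out of support. A minimal sketch of that exercise is below; the products (borrowing “Super Software” from the conversation above) and the dates are obviously made up.

```python
from datetime import date

# Know what you run and when it falls out of support. Dates and products
# here are illustrative placeholders only.
inventory = {
    "Super Software 7": date(2018, 6, 30),
    "Super Software 7.8": date(2021, 12, 31),
    "Crusty Line-of-Business App 2": date(2016, 1, 1),
}

today = date(2018, 1, 15)
for product, end_of_support in sorted(inventory.items(), key=lambda item: item[1]):
    days_left = (end_of_support - today).days
    status = "OUT OF SUPPORT" if days_left < 0 else f"{days_left} days of support left"
    print(f"{product}: {status}")
```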

 

But You Don’t Understand My Business

It may be startling for some in the tech community to discover, but 99% of companies in the world are not focused (primarily) on matters of an IT nature. It doesn’t matter that major vendors get up on stage at their conferences and talk about how every company is an IT company. The simple fact is that most companies still treat IT as an expense, not an enabler. When vendors come along and decide that the software they told you was awesome two years ago is now terrible and you should really burn it with fire, you’re generally not going to be impressed. Because it’s possible that you’re going to have to pay to upgrade that software. And it’s very likely it’s going to cost you in terms of effort to get the software upgraded. But if your business is focused on putting beer in bottles and the current version of software is doing that for you, why should you change? On the flip side of this, software companies have demonstrated over time that it’s very hard to generate consistent revenue from net new customers. You need to keep the current ones upgrading (and paying) regularly as well. It has also been explained to me (as both a customer and integrator) that software companies are not charities. So there you go.

 

What’s The Answer Then, Smarty?

No idea. Enterprise IT is hard. It always has been. It may not be in the future. But it is right now. And software companies are still doing what software companies have always done, for good and bad reasons. I really just wanted to put some thoughts down on paper that reflected my feeling that enterprise IT is hard. And we shouldn’t always criticise people just because they’re not running the latest iteration of whatever we’re selling them.

Okay, fine. The answer is to try and keep within support where you can. And minimise your exposure in the places where you can’t. Is that what you wanted to hear? I thought so.

Enterprise IT is hard.

Opinion: How Much Do You Really Care About Your Data?

I’m sure you care a lot about it. I care a lot about my data. And I get paid to care about other people’s data too (up to a point). I did this as a topic for vBrownBag at Dell EMC World 2017. Unfortunately I wasn’t as prepared as I should have been and had a brain freeze towards the end and cut it short. The premise of the talk was around sizing for data protection activities. Whilst I didn’t want to go into specifics from a technical perspective, I think some people have been missing some fundamental stuff and I thought it might be useful to post some concepts that may prove useful.

 

Build The Foundation

Get your data protection foundation in place before you buy anything. It seems obvious, but you need to do a whole lot of preparatory work before you can put a reliable data protection solution in place. The key to a solid foundation, in my opinion, is understanding the answers to the following questions:

  • What are you trying to do?
  • What’s really important to the business?
  • How do you categorise the information and articulate the value?
  • What about dead data?

Once you’ve been able to successfully answer these questions, you can start to think about how the answers can be converted into a data protection solution.

 

What Are You Trying To Do?

Understanding what you’re actually trying to do is important. It seems a simple thing, but I run into a lot of well-meaning people who don’t understand the difference between backup and recovery, disaster recovery, and archiving. You might perform backup activities on a daily / weekly / monthly basis to provide a mechanism to recover files in the event of deletion, system failure, or some kind of corruption. You provide disaster recovery facilities in the event of a massive system failure, usually due to a significant environmental failure (for example, one of your data centres is flooded or your primary storage array has fallen over and can’t get up). An archive is something that you need to keep for a defined period of time but is not something that you access frequently. It’s a bit like those old photos you keep in a shoe box under your bed (remember when photos were actual things we held in our hands?). You might access them from time to time but they’re kept in a different spot from the documents you access on a daily basis. It’s important not to confound these three activities, as the technical solutions that can provide these services often look very different.
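One way to keep the three activities straight is a simple summary of purpose, cadence and retention. The values in the sketch below are typical starting points only, not rules; your business requirements (and legislation) set the real ones.

```python
# Backup vs disaster recovery vs archive - typical characteristics, assumed for illustration.
ACTIVITIES = {
    "backup": {
        "purpose": "get files or systems back after deletion, failure or corruption",
        "typical cadence": "daily (sometimes hourly)",
        "typically kept for": "weeks to months",
    },
    "disaster recovery": {
        "purpose": "get whole services running again after a major event",
        "typical cadence": "continuous or near-continuous replication",
        "typically kept for": "the current state, plus recent points in time",
    },
    "archive": {
        "purpose": "retain records you rarely touch, for as long as you must",
        "typical cadence": "on ingest or at end of life",
        "typically kept for": "years",
    },
}

for activity, detail in ACTIVITIES.items():
    print(f"{activity}: {detail['purpose']} "
          f"({detail['typical cadence']}; kept {detail['typically kept for']})")
```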

 

What’s Really Important To The Business?

So now you understand the kind of activity you’re trying to conduct. At this point it’s a good idea to try and understand what data you’re trying to protect. Spoiler alert – you’ll need to talk to the business. Sure, they might come back to you and tell you that everything is important and everything needs to be kept forever. You can park that for the time being. More importantly, they’ll be able to tell you what is the mostest of the most important applications and data, and when those applications and data are accessed. This information is important when it comes to assessing the capability of the proposed data protection solution. Some customers I’ve consulted with only run during business hours and don’t care if it takes a full weekend to do their backups. Other businesses run applications that can’t be down while backups are running, so they need to look at alternative approaches to data protection that can be done in a shorter / non-disruptive timeframe (and usually for a higher cost). The business can also guide you on what really keeps the company running. I’ve lost count of the times I’ve walked into a company to do some consulting and observed a lot of people in the IT department who didn’t understand what was important to the business or why the business had placed such an emphasis on one particular application. You don’t need to be a world-leading expert on Widget X, but if that’s the main product sold by your company, it’s a good idea to at least understand the basics and the systems that support those basics.
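The backup window conversation usually comes down to simple arithmetic. The throughput and sizes below are assumptions; measure your own before you promise anything to the business.

```python
# Will the full backup fit in the window? All figures are assumed for illustration.
data_tb = 40                   # full backup size
throughput_mb_per_sec = 400    # effective, end to end (not the brochure number)
window_hours = 60              # Friday evening to Monday morning

hours_needed = (data_tb * 1024 * 1024) / throughput_mb_per_sec / 3600
verdict = "fits" if hours_needed <= window_hours else "does not fit"
print(f"Estimated {hours_needed:.1f} hours for a full backup "
      f"({verdict} in a {window_hours} hour window)")
```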

 

How Do You Categorise The Information And Articulate The Value?

Understanding the data you have to protect is important. But how do you understand its value? You need to talk to the business about it. Once you understand what’s important to them, you can start to articulate the various levels of “value” that you can assign to data. And it might not just be application data that is valuable to your company. Obviously, the infrastructure platforms hosting that data are also important (and certainly worthy of attention). Some companies find it simpler to articulate the value of data in terms of their core business, or in terms of revenue generated, or sometimes in terms of whether someone will die if the system is down (this is more often a healthcare system consideration than something you might see in retail). It’s also important that, once you think you’ve identified the data and estimated its value, you get someone important in the business to review these assumptions and sign off on them. I’ve worked in plenty of places where the business has one idea of what’s being done with the data and the IT department has a whole other idea of what’s going on. It’s also important to revisit these assumptions on a regular basis to ensure that your protection systems haven’t been left behind when the company “pivots”.

 

What About Dead Data?

Finally, consider data lifecycles. It’s okay to delete things. It’s sometimes even a good thing. Not just because it clears up space or provides some level of catharsis; there may also be legislative considerations that require you to get rid of old records to reduce the company’s exposure to potential litigation. Not everything needs to be kept forever. If it does, you may not need to back it up every day. Keeping everything forever will eventually cost you a lot of money to support. Unfortunately it’s often at the point that people have been doing this for a few years that they realise this approach may not be the best one.
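Even a basic retention check will surface how much dead data you’re paying to protect. A minimal sketch is below, with placeholder retention periods; your legal and records-management people set the real ones.

```python
from datetime import date, timedelta

# Placeholder retention periods - real ones come from legislation and records policy.
RETENTION = {
    "finance_records": timedelta(days=7 * 365),
    "general_files": timedelta(days=365),
}

def past_retention(category, created, today=date(2017, 6, 1)):
    """True if the item is older than its retention period and is a deletion candidate."""
    return (today - created) > RETENTION[category]

print(past_retention("general_files", date(2015, 1, 1)))     # True - consider deleting
print(past_retention("finance_records", date(2015, 1, 1)))   # False - keep it
```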

 

Conclusion

Data protection can be hard. Particularly if you haven’t had the opportunity to understand the business and consult with them regarding what’s important to them (and how you can help them, not just get in the way). Hopefully I’ve provided some useful pointers here that will get you on the right path. Obviously, everyone’s situation is different, and what might be important to you may not be important to someone else. Life is like that. The point of this is that there’s a whole lot of work that needs to happen before you get to the point of even thinking about what the solution will look like. It seems like a common sense notion, but it’s one that is often dismissed in the rush to get solutions delivered in the enterprise.

While I’m on my soapbox, Preston is a lot better at this stuff than I am, so you should check out his latest book on data protection – it’s a ripper.

And if you want to see jet-lagged me ramble on (albeit briefly) during my vBrownBag debut, here you go.

OT – Career Advice

If you’ve ever checked out my LinkedIn profile you’ll know I’m not necessarily a shining light of consistency in terms of the work I do and who I do it for. That said, while I’m not a GreyBeard yet, my sideburns have silvered somewhat and I’m nothing if not opinionated when it comes to giving advice about working in IT (for good and bad). Funnily enough, someone I know on the Internet (Neil) was curious about what IT folk had to say about getting into IT and put together a brief article with quotes from myself and 110 other people who know a bit about this stuff. I hate the term “guru”, but there are certainly a bunch of smart folk giving out some great advice here. Check it out when you have a moment.

Faith-based Computing – Just Don’t

I’d like to be very clear up front that this post isn’t intended as a swipe at people with faith. I have faith. Really, it’s a swipe at people who can’t use the tools available to them.

 

The Problem

I get cranky when IT decisions are based on feelings rather than data. As an example, I’ve been talking to someone recently who has outsourced support of their IT to a third party. However, they’re struggling immensely with their inability to trust someone else to look after their infrastructure. I asked them why it was a problem. They told me they didn’t think the other party could do it as well as they did. I asked for evidence of this assertion. There was none forthcoming. Rather, they just didn’t feel that the other party could do the job.

 

The Data

In IT organisations / operations there’s a lot of data available. You can get uptime statistics, performance statistics, measurements of performance against time allocated for case resolution, all kinds of stuff. And you can get it not only from your internal IT department, but also from your vendors, and across most technology in the stack from the backend to the client-facing endpoint. Everyone’s into data nowadays, and everyone wants to show you theirs. So what I don’t understand is why some people insist on ignoring the data at hand, making decisions based solely on “feelings” rather than the empirical evidence laid out in front of them.
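This is the sort of simple sum the data lets you do, and it tends to focus a conversation far better than feelings: what an availability figure actually means in minutes of downtime per year.

```python
# Convert availability percentages into downtime per year.
minutes_per_year = 365 * 24 * 60

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime = minutes_per_year * (1 - availability / 100)
    print(f"{availability}% availability = {downtime:,.0f} minutes of downtime a year")
```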

 

What’s Happening?

I call this focus on instinct “faith-based computing”. It’s similar to faith-based medicine. While I’m a believer, I’m also a great advocate of going to my doctor when I’m suffering from an ailment. Pray for my speedy recovery by all means, but don’t stop me from talking to people of science. Faith-based computing is the idea that you can make significant decisions regarding IT based on instinct rather than the data in front of you. I’m not suggesting that in life there aren’t opportunities for instinct to play a bigger part in how you do things rather than scientific data, but IT has technology in the name. Technology is a science, not a pseudo-science like numerology. Sure, I agree there’re a bunch of factors that influence our decision-making, including education, cultural background, shiny things, all kinds of stuff.

 

Conclusion

I come across organisations on a daily basis operating without making good use of the data in front of them. This arguably keeps me in business as a consultant, but doesn’t necessarily make it fun for you. Use the metrics at hand. If you must make a decision based on instinct or non-technical data, at least be sure that you’ve evaluated the available data. Don’t just dismiss things out of hand because you don’t feel like it’s right.

2017 – The New What Next

I’m not terribly good at predicting the future, particularly when it comes to technology trends. I generally prefer to leave that kind of punditry to journalists who don’t mind putting it out there and are happy to be proven wrong on the internet time and again. So why do a post referencing a great Hot Water Music album? Well, one of the PR companies I deal with regularly sent me a few quotes through from companies that I’m generally interested in talking about. And let’s face it, I haven’t had a lot to say in the last little while due to day job commitments and the general malaise I seem to suffer from during the onset of summer in Brisbane (no, I really don’t understand the concept of Christmas sweaters in the same way my friends in the Northern Hemisphere do).

Long intro for a short post? Yes. So I’ll get to the point. Here’s one of the quotes I was sent. “As concerns of downtime grow more acute in companies around the globe – and the funds for secondary data centers shrink – companies will be turning to DRaaS. While it’s been readily available for years, the true apex of adoption will hit in 2017-2018, as prices continue to drop and organizations become more risk-averse. There are exceptional technologies out there that can solve the business continuity problem for very little money in a very short time.” This was from Justin Giardina, CTO of iland. I was fortunate enough to meet Justin at the Nimble Storage Predictive Flash launch event in February this year. Justin is a switched on guy and while I don’t want to give his company too much air time (they compete in places with my employer), I think he’s bang on the money with his assessment of the state of play with DR and market appetite for DR as a Service.

I think there are a few things at play here, and it’s not all about technology (because it rarely is). The CxO’s fascination with cloud has been (rightly or wrongly) fiscally focused, with a lot of my customers thinking that public cloud could really help reduce their operating costs. I don’t want to go too much into the accuracy of that idea, but I know that cost has been front and centre for a number of customers for some time now. Five years ago I was working in a conservative environment where we had two production DCs and a third site dedicated to data protection infrastructure. They’ve since reduced that to one production site and are leveraging outsourced providers for both DR and data protection capabilities. The workload hasn’t changed significantly, nor has the requirement to have the data protected and recoverable.

Rightly or wrongly the argument for appropriate disaster recovery infrastructure seems to be a difficult one to make in organisations, even those that have been exposed to disaster and have (through sheer dumb luck) survived the ordeal. I don’t know why it is so difficult for people to understand that good DR and data protection is worth it. I suppose it is the same as me taking a calculated risk on my insurance every year and paying a lower annual rate and gambling on the fact that I won’t have to make a claim and be exposed to higher premiums.
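The insurance comparison can be made a bit more concrete with the standard annualised loss expectancy sum. The numbers below are invented purely to show the shape of the argument, not to prove anything about your environment.

```python
# Annualised Loss Expectancy (ALE) = cost of a single incident x expected events per year.
# All numbers here are made up for illustration.
single_loss_expectancy = 2_000_000   # cost of a serious outage / data loss event
annual_rate_of_occurrence = 0.1      # one such event every ten years
dr_spend_per_year = 150_000          # hypothetical cost of a decent DR capability

ale = single_loss_expectancy * annual_rate_of_occurrence
verdict = "worth it" if dr_spend_per_year < ale else "harder to justify"
print(f"Expected loss per year without DR: ${ale:,.0f}")
print(f"Annual DR spend: ${dr_spend_per_year:,.0f} - {verdict} on these numbers")
```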

It’s not just about cost though. I’ve spoken to plenty of people who just don’t know what they’re doing when it comes to DR and data protection. And some of these people have been put in the tough position of having lost some data, or had a heck of a time recovering after a significant equipment failure. In the same way that I have someone come and look at my pool pump when water is coming out of the wrong bit, these companies are keen to get people in who know what they’re doing. If you think about it, it’s a smart move. While it can be hard to admit, sometimes knowing your limitations is actually a good thing.

It’s not that we don’t have the technology, or the facilities (even in BrisVegas) to do DR and data protection pretty well nowadays. In most cases it’s easier and more reliable than it ever was. But, like on-premises email services, it seems to be a service that people are happy to make someone else’s problem. I don’t have an issue with that as a concept, as long as you understand that you’re only outsourcing some technology and processes – you’re not magically doing away with the risk and the consequences when something goes pear-shaped. If you’re a small business without a dedicated team of people to look after your stuff, it makes a lot of sense. Even the bigger players can benefit from making it someone else’s thing to worry about. Just make sure you know what you’re getting into.

Getting back to the original premise of this post, I agree with Justin that we’re at a tipping point regarding DRaaS adoption, and I think 2017 is going to be really interesting in terms of how companies make use of this technology to protect their assets and keep costs under control.

Pure Storage – “Architecture Matters”

I received my XtremIO Upgrade Survival Kit from Pure Storage last week and wanted to provide a little commentary on it. I know it’s “old news” now, but it’s been on my mind for a while and the gift pack prompted me to burst into print.


Firstly, it was interesting to see the blogosphere light up when news broke that the upgrade from 2.4 to 3 was destructive. You can read a few of the posts from Nigel here, Chris here and Enrico here. Chad responded with a typically insightful (and when you’re a VP with the vendor you hope it’s insightful) post that defended a number of the decisions that got them to that point and was basically a mea culpa combined with a broader discussion around architecture. The competing vendors didn’t miss their chance either, with Vaughn having his say here and an interesting post by Calvin that you can read here.

But the post that I think put everything in perspective was Stephen’s. Yes, it’s all technically a bit of a mess. But we’ve been conditioned for so long to read between the lines of vendor glossies and not believe that anything is ever really non-disruptive. Every NDU carries a risk that something will go pear-shaped, and we prepare for it. Most people have had an upgrade go wrong before, particularly if your job has been enterprise storage field upgrades for the last 5 – 10 years. It’s never pretty, it’s never fun, but nowadays we’re generally prepared for it.

While I enjoy the generally ballsy marketing from Pure Storage for calling out EMC on this problem, I think that ultimately we (partners, customers) are probably all not that fussed about it really. Not that I think it’s good that we’re still having these problems. Architecture does matter. But sometimes things get stuffed up.

As an aside though, how good would it be if you worked in an environment where all you needed to do was fill out a paper slip to do a change?