
Executive Summary: Is a cloud-first strategy sensible for Disaster Recovery?

This is part one of a three-part Business Continuity blog series. In this first post we look at whether a cloud-first strategy is sensible for Disaster Recovery (DR), what you should look for in a ‘best-practice’ DR solution, and how a good DR strategy can stop a ransomware attack from becoming a disaster.

Using the storage systems that you already own, the Peer Global File Service (PeerGFS) software monitors multi-vendor file systems in real time to create a highly available, active-active DR solution, intelligently helps prevent ransomware from spreading, and creates an off-site backup at the same time.


Part 1: Is a cloud-first strategy sensible for Disaster Recovery?

There are some great backup solutions that leverage the scalability and availability of public cloud. But are they the right choice as part of an enterprise disaster recovery strategy?

Let’s take a look at workload considerations to help determine what works well in the cloud, and what works better in a data centre.

Workloads Overview

Workloads ideal for public cloud tend to be more ‘bursty’ in nature, a great fit for an elastic compute model that can scale up as well as down, such as:

  • DevOps
  • Variable workloads, such as seasonal retail that would need more compute to be spun up in the lead-up to Christmas
  • Compute intensive workloads, such as analytics / machine learning

To maintain hardware in a data centre all year round that could cope with the busier, more compute-heavy times, you would have a data centre full of tin (iron) that for most of the year sits idle, just waiting for those busier times. You would still need to power it, maintain it, insure it, and so on. That doesn’t make economic sense.

But there are some workloads that often aren’t suitable for public cloud. Examples could be:

  • Primary backups, because of restore times for large data sets over an Internet connection, and possibly the cost of egress back to the data centre
  • High-performance applications that constantly demand a lot of disk I/O and network throughput

Some workloads need to be kept running all the time, whilst others are more sporadic in nature. Because public cloud allows anyone to rent a virtual machine and associated resources by the hour, pay only for the hours consumed, and then tear down the infrastructure when it’s no longer needed, more variable workloads are very attractive candidates for public cloud.

Conversely, if you rent a VM by the hour and run it ragged 24 hours a day, seven days a week, there comes a point when it probably becomes cheaper to run that workload on your own hardware in your own data centre.
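As a rough illustration of that break-even point, here is a minimal sketch in Python. Every price and utilisation figure in it is an assumption made up for the example, not a real quote from any cloud provider or hardware vendor:

```python
# Back-of-the-envelope break-even between an hourly cloud VM and owned hardware.
# All figures below are illustrative assumptions, not real prices.

HOURS_PER_MONTH = 730  # average hours in a month


def cloud_monthly_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go: you only pay for the hours the VM is actually running."""
    return hourly_rate * hours_used


def on_prem_monthly_cost(purchase_price: float, lifespan_months: int,
                         monthly_running_costs: float) -> float:
    """Owned hardware: amortised purchase price plus power, maintenance, insurance, etc."""
    return purchase_price / lifespan_months + monthly_running_costs


if __name__ == "__main__":
    hourly_rate = 0.50        # assumed cloud price per VM-hour
    purchase_price = 8000.00  # assumed server purchase price
    lifespan_months = 48      # amortise the server over four years
    monthly_running = 150.00  # assumed power, maintenance, insurance per month

    on_prem = on_prem_monthly_cost(purchase_price, lifespan_months, monthly_running)

    for utilisation in (0.10, 0.25, 0.50, 1.00):
        hours = HOURS_PER_MONTH * utilisation
        cloud = cloud_monthly_cost(hourly_rate, hours)
        cheaper = "cloud" if cloud < on_prem else "on-prem"
        print(f"{utilisation:>4.0%} utilisation: "
              f"cloud {cloud:7.2f} vs on-prem {on_prem:7.2f} per month -> {cheaper}")
```

In this made-up example, the pay-by-the-hour model wins comfortably at low utilisation, but once the VM is running flat out all month the owned hardware comes out cheaper, which is exactly the point above.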

It’s similar to the company car that I don’t have. Let me explain.

My work day often consists of sitting in front of a keyboard, mashing my palms into the keys and hoping something legible comes out as a result – much like I’m doing now whilst writing this. I have Teams or Zoom meetings with my colleagues, customers, prospective customers and technology partners, but for the most part, I don’t need to travel.

Except when I do! And that’s when I would hire a nice car to drive to a meeting or a conference. If I were on the road several times a week, then a company car would make economic sense, but as I’m not, it works out cheaper each year to hire one when I need it and then give it back again. And of course, I don’t need to worry about maintenance, car tax, depreciation, and so on.

It’s worth keeping in mind that, like car rental companies, nobody is running a cloud business as a charity, so of course there’s some margin in there somewhere. I just need to weigh up which option is more cost-effective for me, or for my workloads.

I recently watched a YouTube video on public cloud repatriation, discussing this very topic and why some companies are bringing their workloads back from public cloud into their own data centres. I was very impressed by Bobby Allen, whose LinkedIn profile describes him as a “Cloud Therapist at Google – living at the intersection of cloud computing & sustainability”.

After wondering for a moment if he was given that title by the Senior Vice Architect of Made-Up Job Names, I realised that he was actually asking some very pertinent questions, such as:

“As the expense of running a workload in public cloud increases, does the value of the workload increase? If not, it probably shouldn’t be in public cloud.”

And:

 “Should an application that doesn’t have unlimited value be put in a place that has unlimited scale and spend?”

Take a moment to consider that second one. That sounds like a pretty wise question to ask when considering a workload for public cloud, if you ask me.

He further stated:

“When you run a virtualisation solution on-prem, you have a finite set of functionality, under a cost ceiling. When you move workloads to the cloud, they can use or consume so many more services, spin up and connect to so many more things, and generally have the ability to scale up quickly, and the unknown and unpredictable cost of that can make many IT Directors fearful of making the switch to public cloud.”

That makes total sense! They would no longer have that security blanket of a cost ceiling.

No one wants to suddenly be hit by an unexpected bill, because they haven’t accounted for something, or there has been mission creep that turned out to be expensive when the bill comes in. It’s an uncomfortable conversation to have with the business stakeholders.

Workloads Suitable for the Cloud or Data Centre

So, some workload types ARE suitable for public cloud hosting, and some definitely are not. What about, for example, backing up to a cloud-hosted VM?

In my opinion, it can definitely form part of a disaster recovery solution, especially as part of a mitigation strategy for cyber attacks such as ransomware, but perhaps not as your ONLY or primary backup. Here is a good reason why.

An important factor when designing a disaster recovery strategy is the Recovery Time Objective, or RTO: in other words, how long it will take following a disaster to get everything up and running as normal again. Of course, if you need to recover a single file or a few files and folders over an Internet connection, that’s probably realistic and an acceptable RTO.

But what if you had to restore a larger amount of data over that Internet connection?

What if an entire file server, hosting multiple terabytes of data, needed restoring, or heaven forbid, an entire data centre’s worth of data? The RTO would be astronomical and not at all realistic. It could take weeks or maybe even months to restore, and the organisation should see that as a business risk that rules it out.
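To see why, it helps to do the arithmetic. Here is a minimal sketch of the restore-time calculation; the data sizes, line speed and 80% efficiency factor are assumptions for illustration, not measurements of any particular environment:

```python
# Rough RTO estimate for restoring data over an Internet connection.
# Data sizes and line speeds are illustrative assumptions only.

def restore_time_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to pull data_tb terabytes down a link_mbps connection,
    allowing for protocol overhead via a simple efficiency factor."""
    data_bits = data_tb * 1e12 * 8                # terabytes -> bits (decimal TB)
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return data_bits / effective_bps / 3600       # seconds -> hours


if __name__ == "__main__":
    scenarios = [
        (0.001, 100),   # a single ~1 GB file over 100 Mbit/s
        (10, 100),      # a 10 TB file server over 100 Mbit/s
        (10, 1000),     # the same server over 1 Gbit/s
        (100, 1000),    # a 100 TB data centre restore over 1 Gbit/s
    ]
    for data_tb, link_mbps in scenarios:
        hours = restore_time_hours(data_tb, link_mbps)
        print(f"{data_tb:>7} TB over {link_mbps:>5} Mbit/s "
              f"~ {hours:8.1f} hours ({hours / 24:6.1f} days)")
```

Even with these fairly optimistic figures, a 10 TB restore over a 100 Mbit/s line is measured in days, and that is before contention, restore-software overhead and verification time are taken into account.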

There are solutions on the market that can redirect a user who’s trying to access a locally corrupted or infected file to the non-corrupted ‘good’ version in public cloud, without having to copy it back on-premises. That sounds like a pretty good solution, but the degree of vendor lock-in that you would be subjected to would put many people off. If you decided to stop using their gateway to the cloud, you wouldn’t have access to your cloud-hosted files. You might have business continuity, but how long is it still going to take to repair your on-site file server?


About the author

Spencer Allingham
Presales Engineer at Peer Software

A thirty-year veteran of the IT industry, Spencer has progressed from technical support and e-commerce development through IT systems management to, for the last ten years, technical pre-sales engineering. Focussing much of that time on the performance and utilisation of enterprise storage, Spencer has spoken on these topics at VMworld, European VMUGs and TechUG conferences, as well as at Gartner conferences.

At Peer Software, Spencer assists customers with deployment and configuration of PeerGFS, Peer’s Global File Service for multi-site, multi-platform file synchronisation.