DRaaS on Multi-cloud or How to Avoid the Fate of Parler

Overview

In pursuit of excellence, companies rely more and more on cloud providers, which offer both the tools to achieve it and very trustworthy SLAs. The Internet is full of success stories of companies migrating completely to the cloud: AWS, Azure, GCP, and others. However, there is always the other side of the coin: vendor lock-in. Focusing on a single cloud provider leads to complete dependency, and sometimes it ends in disaster. In this article, we’ll explain what might happen and how to prepare for any kind of cloud disaster. Let’s start with what happened to Parler.

Terminology

Regardless of the solution one might adopt, it is vital to have a disaster recovery plan that clearly and explicitly states what happens during an outage and what actions need to be taken to bring the application back online. Let’s start with the core terminology and requirements.

Disaster recovery plan

The most common issue when first creating a disaster recovery plan is understanding what it actually means and what needs to be included in it. When Alpacked was first asked to create this document, we did the same thing anyone else would do: we tried to Google it. Unfortunately, almost everything we could find was either meaningless articles written merely to exist rather than to give real advice, or enterprise-grade articles without any specifics. What everyone looks for is a real example of how companies deal with outages and prevent them from happening. So we decided to share our experience and fill the vacuum of real-world scenarios.

  1. Notify users of the disruption of service
  2. Determine the severity of the disaster
  3. Implement a proper application recovery plan dependent on the extent of the disaster
  4. Monitor progress
  5. Verify service health and stability
  6. Notify users of the recovery of service
  7. Release incident report

  1. Navigate to Route53 and change the DB name record to point to the newly promoted master instance

  1. Delete promoted replica from another region
  2. Update RDS to create a new read replica
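The seven high-level steps above can be sketched as a small runbook object. This is only an illustration under stated assumptions, not the process Alpacked actually automates: the `Severity` levels, the `Runbook` class, and the recovery actions in the usage example are all hypothetical names, and notifications are simply appended to an in-memory log instead of being sent to a status page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Severity(Enum):
    # Hypothetical severity scale; the article does not define one.
    DEGRADED = 1
    REGION_DOWN = 2
    PROVIDER_DOWN = 3


@dataclass
class Runbook:
    # One recovery action per severity level (step 3 depends on step 2).
    recovery_plans: Dict[Severity, Callable[[], str]]
    log: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        # Stand-in for a real user notification channel.
        self.log.append(f"notify: {message}")

    def run(self, assess: Callable[[], Severity],
            is_healthy: Callable[[], bool]) -> List[str]:
        self.notify("service disruption detected")           # step 1
        severity = assess()                                   # step 2
        self.log.append(f"severity: {severity.name}")
        action = self.recovery_plans[severity]()              # step 3
        self.log.append(f"recovery: {action}")
        self.log.append("monitor: recovery in progress")      # step 4
        if not is_healthy():                                  # step 5
            self.log.append("verify: unhealthy, escalate")
            return self.log
        self.log.append("verify: healthy")
        self.notify("service recovered")                      # step 6
        self.log.append("report: incident report released")   # step 7
        return self.log


# Usage with hypothetical recovery actions.
plans = {
    Severity.DEGRADED: lambda: "restart affected services",
    Severity.REGION_DOWN: lambda: "promote read replica in another region",
    Severity.PROVIDER_DOWN: lambda: "fail over to secondary cloud",
}
runbook = Runbook(plans)
log = runbook.run(assess=lambda: Severity.REGION_DOWN, is_healthy=lambda: True)
```

Keeping the recovery actions in a mapping keyed by severity mirrors step 3: the plan that runs depends on the extent of the disaster, while the surrounding notification and verification steps stay the same.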

Going above and beyond

So far we have reviewed a way to prepare for an outage within a single cloud provider. But what if the whole cloud provider goes down or, god forbid, you face the same situation as Parler did? That requires more preparation. This is where a multi-cloud or hybrid cloud setup comes into play.

Disaster recovery as a service

It may sound counterintuitive, but the greatest challenge of disaster recovery is not creating a plan or setting up infrastructure, but continuous maintenance and testing. Infrastructure and applications are subject to constant change. Even a small change might make a plan outdated and therefore worthless. Keeping a disaster recovery plan up to date and verified requires thorough work and a lot of effort, which usually ends in the need to outsource it to a team of professionals. Alpacked has been supporting customers in this area for a while and has developed a set of frameworks, processes, and automation tools that allow us to implement it quickly regardless of the infrastructure type and ensure compliance with an SLA.
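One way to make "up to date and verified" checkable is to compare when the plan was last tested with when the infrastructure last changed. The sketch below is a minimal illustration, not one of Alpacked’s tools: `PlanStatus`, `plan_is_stale`, and the 90-day default are hypothetical, and in practice the two timestamps would come from your drill records and deployment history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PlanStatus:
    last_verified: datetime      # when the plan was last tested end to end
    last_infra_change: datetime  # most recent infrastructure or application change


def plan_is_stale(status: PlanStatus, now: datetime,
                  max_age: timedelta = timedelta(days=90)) -> bool:
    # Assumption: any change after the last test invalidates the plan,
    # and a plan untested for longer than max_age is also treated as stale.
    changed_since_test = status.last_infra_change > status.last_verified
    too_old = now - status.last_verified > max_age
    return changed_since_test or too_old
```

A check like this could run on a schedule and open a ticket whenever it returns True, so that re-testing the plan becomes routine rather than something remembered only after an outage.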

FAQ

Q: What is a Disaster Recovery Plan?
