Cloud and Disaster Recovery: Workload Management and Documentation

New cloud concepts are allowing data centers and businesses to become far more agile. This means new ways to deliver applications, support new users, and even test new technologies. The other major factor cloud introduces is greater resiliency: organizations have many more options for workload delivery, even when there’s an emergency.

With that in mind, let’s pause here and discuss cloud, virtualization, and disaster recovery (DR). Whenever DR planning is brought up, numerous other elements need to be discussed for the project to be successful. One of those is managing virtual workloads. Within a virtual environment, administrators deploy servers and, on top of those, applications.

All applications differ in their requirements and in how they operate in a virtual state. Of course, there are packaged apps which are fairly standard, but what about custom or more in-depth applications? Here are some critical factors to consider when working with diverse applications:

  • In a virtual environment, it’s very important to continuously monitor and maintain the application environment. This means using tools to see how well an application is running and what type of workload it’s able to handle. Remember, an application without the right resources won’t function properly, and the result will be poor end-user performance. Even if a server has the resources, the application will suffer unless they are properly allocated.
  • Production applications in a virtual environment may need additional support. This means deploying monitoring tools to ensure optimal performance at any given time. Deployment of a virtual application must be planned as well. Can the application run in a 64-bit environment? Does it require special patching if it’s running on a VM? Can we replicate the data should the server go down? These are all questions which have to be answered before an application is placed in a virtual state.
  • The great thing about a virtual environment is the ability to test and develop deployment methodologies without affecting live environments.
    • Administrators are able to create clones of a production VM and test patching, updates, or even conduct stress testing.
    • This can all be done in a safe, segmented environment where production data isn’t affected. By testing real workloads in a test environment, administrators can catch bugs and issues, then apply the resulting fixes to their production systems.
    • For more granular testing, administrators are able to run applications and monitor them with graphs and set metrics. These metrics are then gathered together over a set period of time to help find an application’s optimal performance environment based on the existing virtual infrastructure.
  • The better managed an application environment is, the better the DR planning process. If administrators are unaware of how their applications will perform during a spike or a disaster event, they cannot properly protect them. The more testing and development done in a controlled environment, the better prepared staff can be should an unforeseen event occur.
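
The metric gathering described in the list above can be sketched in a few lines of Python. This is only an illustration, not a feature of any particular monitoring tool: the function name `summarize`, the use of CPU utilization percentages as the metric, and the 80 percent threshold are all assumptions made for the example.

```python
import math
from statistics import mean


def summarize(samples, limit=80.0):
    """Summarize utilization samples (in percent) gathered over a test window.

    `limit` is a hypothetical threshold: if the 95th percentile is above it,
    the workload probably needs more resources than the test VM was given.
    """
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    # Nearest-rank 95th percentile: the smallest sample with at least
    # 95% of all samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    p95 = ordered[rank - 1]
    return {
        "mean": mean(ordered),
        "p95": p95,
        "peak": ordered[-1],
        "needs_more_resources": p95 > limit,
    }
```

Running this against samples collected from a cloned VM during a stress test gives a rough baseline that can be compared with what the production VM is actually allocated.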

With all of this in mind, there’s one element that is very often overlooked when working with cloud, virtualization, and disaster recovery. Knowledge plays a big role when planning out a DR initiative, especially in a virtual environment. Administrators must take the time to learn as much as they can about their virtual workloads in order to properly back them up and protect them.

Creating DR documentation

With DR planning comes the important task of documentation. As I mentioned earlier, this step is often forgotten, or saved for last and treated as the least important. That couldn’t be more wrong. Documentation for a DR strategy is critical to the success of the failover plan.

Administrators must not only create current distributed environment documentation, but they must also create what is known as a “living DR workbook.”

  • This workbook is a truly all-encompassing document which will evolve as the environment changes.
  • The document will reflect each IT team and their direct responsibilities should an event occur.
  • This document will also spell out different scenarios for different departments.
  • There will be remediation steps for each team, and each person responsible will have a task when an outage or pre-designated event occurs.
  • Managers must continuously present this workbook to their staff and ensure that they understand their roles and functions should an event happen.
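
One way to keep such a workbook "living" is to store it as structured data rather than only as prose, so it can be queried and checked for gaps. The sketch below is a minimal illustration under that assumption. The class names (`Step`, `Scenario`, `Workbook`), their fields, and the 90-day review interval are hypothetical and are not taken from any DR standard or product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Step:
    description: str
    owner: str = ""  # person responsible; empty means nobody is assigned yet


@dataclass
class Scenario:
    name: str        # e.g. "primary site outage"
    department: str  # department the scenario is written for
    steps_by_team: Dict[str, List[Step]] = field(default_factory=dict)


@dataclass
class Workbook:
    last_reviewed: date
    scenarios: List[Scenario] = field(default_factory=list)

    def tasks_for(self, scenario_name: str, team: str) -> List[Step]:
        """Return the remediation steps one team owns in one scenario."""
        for scenario in self.scenarios:
            if scenario.name == scenario_name:
                return scenario.steps_by_team.get(team, [])
        return []

    def unassigned(self) -> List[str]:
        """List step descriptions that have no responsible person."""
        return [
            step.description
            for scenario in self.scenarios
            for steps in scenario.steps_by_team.values()
            for step in steps
            if not step.owner
        ]

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """True when the workbook has gone too long without a review."""
        return (today - self.last_reviewed).days > max_age_days
```

With a structure like this, a manager could call `unassigned()` before each review to find steps nobody owns, and `is_stale()` to see whether the workbook is overdue for another walkthrough with staff.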

All IT team staff and key business personnel must be trained in DR event management. Should an actual disaster occur, all key people involved, business or IT, must know the course of action to be taken. This will include alerting, immediate remediation, and damage control. Remember, the only way a DR plan stays relevant is if there is continuous training happening at all levels.

DR documentation, training, and awareness all serve a key purpose. The idea is simple: what good is a robust DR plan if no one knows what to do when a disaster actually happens? The only way a distributed environment can be used properly for disaster recovery is if all the right people are able to make good decisions based on a planned-out directive. Today’s businesses are heavily reliant on their IT infrastructure, which means business stakeholders must have both a say and an action item in the living DR plan.
