Everyone’s got a [high availability] plan ’til they get punched in the face

Paul Koufalis

Ahhh… Mike Tyson. You gotta love Mike Tyson. You, on the other hand, are probably more like Marvis Frazier. I see the puzzled look on your face: no, not Smokin’ Joe Frazier, the first man to beat Muhammad Ali, in 1971. I’m talking about Marvis Frazier, his son. Look it up: KO’d by Tyson in 30 seconds, courtesy of a nice uppercut in the first round. When people tell me about their high availability measures, I think about Marvis Frazier. Good-looking guy. He talked the talk and walked the walk, but when it came time to deliver, he failed. Twice. Most of your high availability planning is the same: good enough to impress the 24-year-old junior auditor from Deloitte, but it will never deliver when you actually need it.

So, what do you do?

Start by reading the precursor to this article: The Top 5 Business Continuity Excuses We Hear Every Day. Then read on for some concrete suggestions.

High Availability vs. Disaster Recovery vs. Business Continuity

Before we start, I want to clarify the differences between these three terms. Here are my $0.02 CAD definitions:

The Business Continuity Plan (BCP) defines what needs to happen if there is a major disruption to any part of the business, not just IT.
The Disaster Recovery Plan (DR) is more IT-centric: the data center is under 4 feet of water. Now what?
High Availability Planning (HA) is at the service level: what do I need to do to ensure that my OpenEdge-based ERP system is available to satisfy realistic business requirements?

When a massive ice storm hit Montreal in 1998, all those redundant power supplies didn’t help one bit. Not only did the power go out for 7 to 30 days, depending on your location, but the entire downtown core was shut down because it was too dangerous to walk around: think 5 cm (2-inch) thick slabs of ice falling from 30-story buildings.

With the downtown core inaccessible, the corporate BCP covers temporary workspace or has people working from home. With no power to the data center, the DR plan details relocating it to a secondary site, preferably in an area with a more agreeable climate (like Aruba!). Finally, the HA plan protects the ERP system from localized disruptions such as server reboots or lost connections to external systems.

Get Started with a High Availability Workshop

Different people interact with your OpenEdge system in different ways: users, developers, tech support, DBAs, management… Start by getting representatives from all of these groups into a room for a brainstorming workshop. The technical knowledge will come from the I.T. side, but management needs to be there to represent the business, and users will know things like the critical data transfer with an important vendor or customer that happens every Tuesday at 11:41.

Map Out the Components and Interconnections

I’m shocked (OK, not really, but I should be) at how often I discover that clients do not have comprehensive, easily accessible documentation of the architecture of their OpenEdge environments and all of their dependencies. You need to map out every hardware system, application, web service, network component, and internal and external interface that your critical application ecosystem needs in order to run. For example, imagine you rely on an external web service for forex data: what if that service goes down? Or a firewall blip blocks access to it? What if a router goes down and you can’t print labels in the warehouse?
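
One way to keep the map useful is to record it as data you can query instead of a diagram on a wall. The minimal Python sketch below assumes a simple table of each component and the things it needs; every component name in it is hypothetical and stands in for your own inventory. Given a failed component, it walks the map backwards and lists everything that goes down with it, which is exactly the "what if the firewall blips?" question.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it needs to work.
# Names are illustrative only; replace them with your own inventory.
DEPENDS_ON: dict[str, list[str]] = {
    "erp_app": ["openedge_db", "app_server", "forex_service"],
    "openedge_db": ["db_server", "storage"],
    "app_server": ["db_server", "core_network"],
    "db_server": ["storage", "core_network"],
    "forex_service": ["firewall", "internet_link"],
    "warehouse_labels": ["erp_app", "warehouse_router", "label_printer"],
}


def impacted_by(failed: str) -> set[str]:
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so we can ask "who needs this component?"
    needed_by: dict[str, list[str]] = {}
    for component, needs in DEPENDS_ON.items():
        for need in needs:
            needed_by.setdefault(need, []).append(component)

    # Breadth-first walk up the dependency chain from the failed component.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in needed_by.get(current, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


if __name__ == "__main__":
    for failure in ("firewall", "warehouse_router", "storage"):
        print(f"{failure} down -> {sorted(impacted_by(failure))}")
```

A component whose failure takes the ERP application down with it is an obvious candidate for the "highly critical" bucket when you define your needs.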

Define Your Needs

Notice how late this section comes in the article. You would think that something as important as “define your needs” would come first, but not every component and interconnection has the same high availability requirements, and you can’t rank them until you have the list. Take the list from the previous section and decide how important each component is, from the business’s perspective, to the availability of the entire service. The OpenEdge databases, core ABL application, Linux servers, storage subsystems and most of the network infrastructure will be highly critical. Conversely, if you only upload/download EDI a few times per day, the multi-hour gaps between transfers allow a much more liberal interpretation of “available”.
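
Part of the reason to rank components, rather than demand the maximum everywhere, is simple arithmetic. Under the simplifying assumption that failures are independent, a service that needs every component in a chain to be up has an availability equal to the product of the components’ availabilities, while a redundant group is only down when all of its members are down. The Python sketch below shows both rules; the function names and percentages are invented for illustration.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Service needs ALL components up (availabilities as fractions, e.g. 0.999).

    Assumes failures are independent.
    """
    return prod(components)


def redundant_availability(members: list[float]) -> float:
    """Service is up if ANY member of a redundant group is up.

    Independence is the optimistic assumption here: a shared power feed
    (remember the ice storm) can take every "redundant" member down at once.
    """
    return 1 - prod(1 - a for a in members)


if __name__ == "__main__":
    # Five critical components in a chain, each 99.9% available (made-up figures).
    print(f"chain of five:  {serial_availability([0.999] * 5):.4%}")
    # Two servers, each 99% available, either of which can carry the load.
    print(f"redundant pair: {redundant_availability([0.99, 0.99]):.4%}")
```

Five components at 99.9% each already pull the chain down to roughly 99.5%, so every component you place in the critical path lowers the availability of the service as a whole.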

The relationship between I.T. and the business is key to the success of this phase. Without realistic cost numbers, the business will naturally request 99.999% uptime for every component (roughly 5 minutes of downtime per year), and the cost of that will likely be exorbitant. And without meaningful input from the business, it will be difficult for I.T. to assign criticality to some of the lesser-known components and connections.
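
To put the "nines" in terms the business can react to, convert each target into a yearly downtime budget. A minimal Python sketch, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime allowed by an availability target given in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime -> {minutes:,.1f} minutes of downtime per year")
```

99.999% works out to about 5.3 minutes a year, while 99.9% allows 525.6 minutes (close to 9 hours): a useful pair of numbers to put next to the cost estimates.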

What’s it Going to Take?

Now for the fun part: what will it take to meet each component’s availability requirement? List candidate solutions for each requirement, including estimated material and consulting costs, internal man-hours and expected time to implement. Discuss the solutions internally and refine them until you have an action plan that management can approve.
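
A spreadsheet works fine for this; the point is that every requirement gets the same set of numbers so the options can be compared and totalled. Below is a hypothetical Python sketch of that roll-up. The class and field names, the example solutions and all figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """One proposed solution for one component + availability requirement."""
    component: str
    requirement: str
    material_cost: float    # hardware, licences, services (any one currency)
    consulting_cost: float
    internal_hours: float
    duration_weeks: float


def summarize(plan: list[Solution]) -> dict[str, float]:
    """Roll individual estimates up into plan-level totals for management."""
    return {
        "material_cost": sum(s.material_cost for s in plan),
        "consulting_cost": sum(s.consulting_cost for s in plan),
        "internal_hours": sum(s.internal_hours for s in plan),
        # Worst case: assumes the items are implemented one after another.
        "duration_weeks": sum(s.duration_weeks for s in plan),
    }


if __name__ == "__main__":
    plan = [
        Solution("openedge_db", "standby database on a second server",
                 20_000, 8_000, 80, 6),
        Solution("forex_service", "keep last known rates if the service is down",
                 0, 2_000, 40, 2),
    ]
    for name, total in summarize(plan).items():
        print(f"{name}: {total:,.0f}")
```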

Ready, Set, GO!

I wish I could say “now comes the easy part”, but none of you would believe me. Implementing your plan will be challenging, and you will hit multiple obstacles along the way. In the end, though, you will have a plan that everyone supports, because everyone was involved in creating it.

Don’t Be Marvis

It isn’t over. Marvis, like you, trained hard to get in the ring with Tyson, but clearly he wasn’t up to the task. Stay prepared by exercising your plan at least once or twice a year. Adjust and improve it after each workout so that one day, when it’s your turn to enter the ring, you’ll be ready to dodge the uppercut and deliver your own knockout punch!
