After-Imaging Business Continuity

New DBA Series: No More AIxcuses – Part Two

Mark Bonneville

If you follow the guidelines suggested here, you can expect more hours of uninterrupted sleep at the very least, and at best? You will experience minimal disruption to your system should your OpenEdge databases be damaged for reasons beyond your control.

In Part 1, we covered the fundamentals of AI, a bit on scripting, the AI File Management Daemon (aka AI Archiver aka AIMGT), and the AI-related commands. Here in Part 2, we cover the trail left by AI in your environment, maintaining and monitoring your AI subsystem plus the crux of After Imaging – restoring your database with after image roll forward!

AI Artifacts

Now that you have After Imaging and the AIMGT running, let’s review the new artifacts after imaging creates in your environment.

Database Structure file (.st)

The lines in your structure file that begin with “a” are the AI files you added when enabling AI in Part I:

a /user_data/db/atm/atm.a1
a /user_data/db/atm/atm.a2
a /user_data/db/atm/atm.a3
a /user_data/db/atm/atm.a4

Your Actual AI Files

The entries above correspond to the AI files on disk that hold the AI notes replayed against your backup should a roll forward be required. According to how you implemented the archiver, these are the files the AIMGT archives (-aircinterval n, or as needed if not set). Place these on a dedicated file system on the database server that will not compete with your database files or your BI file.

Database Log File Entries (.lg)

With Afer-Imaging and AIMGT enabled, you will see many new entries in your database log file. Here are just a few common ones:

BACKUP105: (1362) Full backup started.
BACKUP105: (3777) Switched to ai extent /user_data/db/atm/atm.a2.
BACKUP105: (3778) This is after-image file number 2 since the last AIMAGE BEGIN
AIMGT 104: (3777) Switched to ai extent /user_data/db/atm/atm.a3.
AIMGT 104: (3778) This is after-image file number 3 since the last AIMAGE BEGIN

AI Archiver entries show up in the database log file as “AIMGT,” e.g., AIMGT 5: (13213) A new archive interval 120 has been set.

After Image Archive Directory (-aiarcdir) and Archived After Image Files

This is where the AIMGT places copies of your full AI files before marking them as empty for re-use. This directory is best placed on a local (NOT NFS) drive, with the files transferred separately to a remote server, preferably in another city (or at least another building). You’ll need these files plus a full probkup of your database to roll forward to a point in time.

WARNING #1: Verify and monitor that the AIMGT can see and write to your AI Archive Directory. If there’s a problem writing to an archive directory, the AIMGT will permanently discard that directory from its list, potentially leaving it with no place to archive.

WARNING #2: Resist the temptation to archive directly to a network drive, whether UNC, NFS, Samba or other. We have seen this situation lead to a frozen database on numerous occasions.

What’s In a Name?

The archived ai files are named with 5 components:

Now THAT’S a file name!

The good – all the information you need is stored in the file name. The bad – all the information you don’t need is stored in the file name. The ugly:

The full file path of the source database – /user_data/db/atm/atm.db
The date of the “aimage begin” backup
The time of the “aimage begin” backup
The AI sequence number
The original AI file name

CAUTION: This file naming convention changed in OE version 10.1. In older versions, the format included the date and time of the last backup in the filename. This turned out to be confusing, so later versions dropped them.

AI Archiver Log

The AI Archiver creates its own log – <dbname>.archival.log in your database directory the first time an entry is made. This log contains all of the relevant information about your database backups and archived after-image files required to reconstruct your database to a point in time. Use this to inform a manual recovery effort or to create an automated database recovery system.

Here is an example entry:

Whoa!! Just a bit challenging to read! In OE 10.1c and later, each line contains the following fields, separated by commas.

Entry Type
Explicit Database Name
Backup Date
Backup Time
Date of After Image Begin
Time of After Image Begin
AI Extent Sequence Number
Original Name of the Archived Extent
Name of the Archived Extent

Some Final Setup Steps to Ensure Reliability

Monitor the Moving Parts

Periodically verify that the AIMGT is up. On *nix, e.g., you can script “$ ps –ef | grep aimgt” and send out an alert email if it’s not there.
Check for AIMGT errors in the database log file
Set up alerts on key backup and AI indicators such as backup age, the number of full extents and the number of locked extents. You will see these at the top of the ProTop Trends Main Dashboard for the database:

Manage Your Files

Archive Log – Just like your database log, if not properly managed, your <dbname>.archival.log can grow very quickly depending on your configuration. Be sure to archive and truncate regularly!
AI Archive Files – have a purge process running that deletes archived AI files older than some X days, where X is determined by your internal retention policies. You will have to script this yourself.

How to Restore and Roll Forward

Restore Your Backup

To restore your last full OE database backup – you know where your last full backup is, right? Ideally, it’s within easy reach, perhaps on disk on your database machine.

If this is production, SAVE the “bad” database and all its related files (BI, AI, etc.). You might need the AI files that were active when you lost your database. Easiest way is to rename the directories containing the database files
1. But why not a probkup? Because in a corruption/crash situation where you need to restore and roll forward, it is possible that a probkup is neither possible nor desirable
2. This is why you should have four and even five times the database size as available disk space: old DB, old backup, maybe a new backup, maybe a dump, maybe a load and/or restore…or two. Lack of disk space should not be the reason why you’re down for 4-6-8 hours after an event.
Set your OE environment variables/path
cd into the target directory
prostrct create mydbname mydbname.st (optional – if your AI/BI/DB files are in different directories)
prorest mydbname /bkup/mydbname.bkup

Roll Forward Your AI Archive Files (and full or busy AI files)

Identify the required ai archives. Find them in your <database>.archival.log or <database>.log. Include in your list any actual AI files that were active but not archived, usually just the AI extent that was busy when the incident occurred. This is one of the reasons why we save off the “bad” db and ALL of its files!

If the required archives are not already there, copy them to a temporary directory on your database server while your restore is going on.

The archives are named such that an alphabetical listing (*nix ls, WIN dir) will show them in the order in which they should be rolled forward:

$ ls
user_data~db~atm~atm.20110601.003551.00000003.atm.a3
user_data~db~atm~atm.20110601.003551.00000004.atm.a4
user_data~db~atm~atm.20110601.003551.00000005.atm.a1
user_data~db~atm~atm.20110601.003551.00000006.atm.a2
user_data~db~atm~atm.20110601.003551.00000007.atm.a3
user_data~db~atm~atm.20110601.003551.00000008.atm.a4

The easiest way to roll forward is to list the file names in a text file, then use the -ailist parameter:

rfutil $DB -C roll forward -ailist ai_list.txt

Alternately, below is a scripted example for applying complete ai archives on *nix.

for file in $(ls $TMP_AI_DIR)
do 
   rfutil $DB –C roll forward –a $file
done

(If push comes to shove, take a guess and try one – the error message will tell you the correct archive file by sequence to apply first.)

Point In Time Recovery

Where human (or other) error damaged the data in the database, you must recover your database just before when the damage began. Applying all of your ai archives and the last busy extent (if available) will repeat the damage. You must determine as closely as possible the point in time at which the damage began.

Similarly, if you have multiple databases, you may need to roll them all forward to a common point in time.

With that point in time in mind, execute the roll forward command like this:

rfutil <database> -C roll forward endtime <time> -a <ai-file>

where <time> is in the form: yyyy:mm:dd:hh:mm:ss

Now restart your environment and validate.

Again, these are just the highlights; refer to your OpenEdge documentation for complete outage scenarios and details.

Use AI! There are no more AIxcuses, Period.

Congratulations! Now you know enough about OpenEdge after imaging to not be dangerous. AI and AI archive files are named intelligently, making restore and roll-forward simple. The AI <database>.archival.log file shows which files go with which backup. The AI archiver daemon makes archive management eminently sustainable. And remember, maintain and monitor your AI archive directory, so all the files you need are there when you need them.

Now don’t just go and tuck all these great AI ideas away for later, put them to use and you’ll sleep more soundly!