New DBA Series: Basics of Database Storage – Part One
Why is storage so important in a world of 99.9% buffer hits? I’m glad you asked!
Getting started
This is a BIG topic, so we are going to break it out into a multi-part saga. First, let’s talk about some basics.
Database block size
When you create an OpenEdge database from scratch, one of the first properties that you need to specify is the database block size. By default, the value is 4 KB on Windows and Linux and 8 KB on “other UNIX”, i.e. AIX, HP-UX, and Solaris.
The most important thing is that the block size be at least 4 KB and no smaller than the file system block size. We generally find that bigger blocks yield slightly better performance, so if performance is critical we often go with 8 KB. There are, however, arguments* for matching the DB block size to the file system block size, since file system operations occur at that size.
* There is a rare and esoteric problem called a torn page which slightly favors matching the database and file system block sizes. However, in my [Paul] nearly thirty years as an OpenEdge DBA, I have suspected that torn pages were a potential explanation for a database corruption once…maybe twice.
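To make that concrete, here is a minimal sketch of creating a database with an 8 KB block size – the database name, paths, and structure file are placeholders, not a recommendation:

    # create the physical structure with an 8 KB block size
    prostrct create /db/prod/sports /db/prod/sports.st -blocksize 8192

    # copy in the matching empty 8 KB database shipped with OpenEdge
    procopy $DLC/empty8 /db/prod/sports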
Type 0-1-2 storage areas
Storage areas provide the DBA with tools to manage the storage location of objects within a database: essentially, what goes where.
Over the years many of our customer databases have gone through three generations of storage area structure, with each generation adding huge improvements. From this experience it is clear that, unless you are stuck running Progress version 9 or earlier, you should be using type 2 storage areas exclusively. But you don’t get them automatically! You have to take steps to define the needed storage areas (we will talk about that in a future post) and move your data into them. If your database has simply been converted from v8 to v9 to OE10 and onward, then it is very likely that all of your data is still in the “Schema Area” and that you are not getting the benefit of these powerful storage features.
Type “0”
There isn’t really anything officially called a “type 0” storage area – we’re making that up for convenience. Prior to version 9, a Progress DBA did not have any tools to manage objects within the database. The DBA would create the database on disk, load the empty schema, then hand off control to development who would then add application tables and indexes. Blocks in the database contained either table data or index trees, and a table block could contain data from any number of tables.
From a practical perspective, this meant that data was [mostly] stored in chronological order. This was OK for day-to-day operations, as the working set (work being done by the business *right now*) was often small enough to [mostly] stay in the -B buffer cache. Any kind of reporting, on the other hand, was often painful, as data was scattered across a wide array of data blocks. Imagine querying all orders from a customer over the past three months: those order and order-line records would have been spread over three months’ worth of application data from ALL parts of the business, mixed in with invoices, shipping records, work orders, inventory data, GL entries… you get the idea. The application needed to read a relatively huge number of physical blocks to extract the relatively small number of records requested by the user!
This arrangement also severely limited the ultimate size of the database. As database sizes grew rapidly in the ’90s, Progress needed to do something…
Type 1
With Progress version 9, the Engine Crew added type 1 storage areas. This gave DBAs the ability to segregate data by assigning database objects to storage areas. You could have thousands of storage areas! Each storage area was like opening up a wormhole in space, and worries about database size limits receded into the background. The DBA could also specify the maximum number of records per block (usually abbreviated RPB) for cases where the default value of 32 RPB (4 KB blocks) or 64 RPB (8 KB blocks) was not optimal. Data blocks could still contain mixed records from any of the tables assigned to the area. Best of all, you could adopt storage areas without any impact on the application code – there was no need to modify the application to take advantage of them!
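For illustration only, a type 1 area in the structure (.st) file is just a name, an area number, and an optional records-per-block value – the area name, number, and paths below are made up:

    # type 1 area: area 8, 64 records per block,
    # one fixed 1 GB extent plus a variable overflow extent
    d "Inventory":8,64 /db/prod f 1024000
    d "Inventory":8,64 /db/prod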
DBAs were excited! The first thing we did was to segregate large and/or very active tables into their own storage areas. GL, audit, and inventory tables were common candidates, and the users were happy! Or at least the users whose DBA took advantage of the new feature were.
Type 2
With OpenEdge 10, Progress introduced type 2 storage areas, which added two main features to the physical structure of the database. Once again these features were added transparently – application code has no awareness of storage areas!
First, and most important, data was segregated at the block level. Any database block in a type 2 area contains data from only one object. This means that data is much less physically scattered across the database, as all the records for a given table are stored in far fewer blocks than if they were interleaved with other data.
Second, the concept of clusters was introduced, defining the minimum number of database blocks (Blocks per Cluster or BPC) to add when expanding a storage area. If insufficient space is available to add a record, the area is expanded by BPC blocks (i.e. one cluster) and those blocks are assigned to the object that needed them.
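In the structure file, the visible difference is the blocks-per-cluster value after the records-per-block value: a BPC of 8, 64, or 512 makes the area type 2. A hypothetical example, with made-up names, numbers, and paths:

    # type 2 data area: 128 records per block, 64 blocks per cluster
    d "Order Data":10,128;64 /db/prod f 4096000
    d "Order Data":10,128;64 /db/prod

    # indexes often get their own type 2 area with a small cluster size
    d "Order Index":11,1;8 /db/prod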
Users whose DBAs took advantage of this saw instant performance improvements with some of their larger reports, and those improvements tended to degrade much more slowly over time! Wow! From a DBA’s perspective, this means that annual or biannual dump-and-loads are often (not always) no longer required. That’s a pretty significant improvement!
The before image file
Aahhhh…the BI file. I cannot count the number of times that a client presented us with a Ferrari of a server, only to pair it with Mini Cooper tires.
Writes to the BI file are single-threaded, and all changes to the database are first logged to the BI file so that if there is a crash, the database can be recovered. Yes, with thousands of users, access to the physical BI file is one-at-a-time. Properly configured, this works very well, and at WSS we have clients with many thousands of concurrent users. But improper configuration, either at the database level or within the storage subsystem, inevitably leads to poor performance and unhappy users.
We suggest a few simple tweaks (sketched as commands after the list), and we’ll dive into the topic a bit more in subsequent blog posts. Note that these are for Enterprise RDBMS databases, not Workgroup.
- Set the BI block size to 16 KB
- Set the number of BI buffers to around 100–200. The cost is insignificant (100 x 16 KB), but please note that more is not necessarily better; there is no advantage to going crazy and using a value like 10,000!
- Set the BI cluster size to at least 8MB.
- Start the Before Image Writer (BIW)
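Roughly, those tweaks look like this on the command line – the database name, port, and values are illustrative, and the BI block and cluster sizes must be changed while the database is offline:

    # offline: 16 KB BI block size, 16 MB (16384 KB) BI cluster size
    proutil /db/prod/sports -C truncate bi -biblocksize 16 -bi 16384

    # at startup: give the broker a healthy number of BI buffers
    proserve /db/prod/sports -S 10000 -bibufs 150

    # then start the before-image writer
    probiw /db/prod/sports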
After image files
As with the BI file, all changes to the database are logged to the AI files before being written to the database. The main difference is that the AI files are meant to be archived and saved, in case you need to restore a backup and recover changes made to the database after that backup was taken.
The same performance concepts for the BI file apply to the AI subsystem, and optimizing the speed at which data can be written to disk is paramount.
Our suggested tweaks are also similar to those given above (commands are sketched after the list):
- Create at least 8 AI files; it is sometimes handy to have more, but don’t go crazy – you probably don’t want 30+
- Set the AI block size to 16 KB
- Set the number of AI buffers to the same value as the BI buffers – that old kbase article you found about using 150% is obsolete and deprecated
- Start the After Image Writer (AIW)
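Again roughly, and assuming the AI extents already exist in the structure, the AI side looks something like this – names, paths, and values are placeholders:

    # offline, with after-imaging disabled: 16 KB AI block size
    rfutil /db/prod/sports -C aimage truncate -aiblocksize 16

    # enable after-imaging (a fresh backup is required first)
    probkup /db/prod/sports /backup/sports.bak
    rfutil /db/prod/sports -C aimage begin

    # at startup: match AI buffers to BI buffers, then start the writer
    proserve /db/prod/sports -S 10000 -bibufs 150 -aibufs 150
    proaiw /db/prod/sports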
Fixed or variable AI extents?
This is a great question, and one that does not have a super simple answer. 98% of databases will hum along happily and speedily with 8 variable-length AI extents. This is what we recommend for nearly all of our clients.
The other 2% are databases subject to such a monstrously large number of database changes that the cost of constantly expanding the variable-length AI extents actually affects the business. For those databases, large, fixed-length AI extents are preferable. But again, the performance cost must be felt by the business, not solely by the DBA!
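The difference shows up in the structure file: a fixed-length AI extent gets a size, a variable-length one does not. Purely illustrative paths and sizes:

    # variable-length AI extents (our usual recommendation; repeat for all eight)
    a /db/ai
    a /db/ai

    # fixed-length alternative: extents pre-allocated at 512 MB (524288 KB) each
    a /db/ai f 524288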
Before you all decide to use fixed-length extents, understand that there are some pretty important risks involved. If you choose to switch AI extents only when they fill, then you cannot know to what point in time you could recover from your offsite backups: your rotation interval will be dictated by the volume of changes being made in the database and written to the AI files. You need a fixed AI rotation interval to meet a defined RPO.
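In practice that usually means switching extents on a schedule rather than waiting for them to fill. A hypothetical cron entry – the interval and paths are placeholders, and the archiving step is whatever your AI management process already does:

    # switch to the next AI extent every 15 minutes to bound the RPO
    */15 * * * *  rfutil /db/prod/sports -C aimage new

    # a separate job must then archive each full extent and mark it
    # empty (rfutil -C aimage empty), or you can let the OpenEdge AI
    # file management daemon handle the archiving for you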
However, if something goes wrong with your archiving process and you’re not monitoring with a tool like ProTop, once you fill that last fixed-length AI extent your database will stall or crash. With variable-length extents you have a little more breathing room, as that last busy extent will continue to grow until it runs out of file system space.
Also, recovering from a scenario where all the AI files are full requires some knowledge and experience. It isn’t difficult, per se, but unless you have experienced this in the past, you might have some trouble diagnosing the root cause and applying solutions, leaving your database down for an extended period of time.
Your foundation
Like the foundation of your home, database storage is the structure on which great performance and reliability are built. Respect the storage. Take care of it. Monitor it.
See you next week!