A big PostgreSQL upgrade

I don’t write here very often about stuff I do at work, but I just finished a project that was a hassle all out of scale with the actual utility, and I wanted to document it for posterity, in the hope that the next time I need to do this, maybe I’ll have some better notes.

Introduction

A bit of backstory. I have three personal PostgreSQL database “clusters” (what PostgreSQL calls a single server, for some reason unknown to me), plus I manage two-and-a-half more at work. All of them run FreeBSD, and all of them are installed from a custom package repository (so that, among other things, we can configure them to use the right Kerberos implementation). These systems were all binary-upgraded from PostgreSQL 9.4 to 9.6 in the summer of 2020, but all are much older than that — one of the databases started out 20 years ago running PostgreSQL 7.0, and another one was at some point ported over from a 32-bit Debian server (although not by binary upgrade). We have a shared server for folks who want an easy but capable backend for web sites that’s managed by someone else, and that has about 50 GB of user data in it (including, apparently, an archive of 4chan from a decade or so ago that was used for a research project). For my personal servers, I have applications that depend on two of the database “clusters” (one on my home workstation and another on my all-purpose server), and the third “cluster” is used for development of work projects. At work, in addition to the big shared server, we have a couple of core infrastructure applications (account management and DNS/DHCP) that depend on the other “cluster” — these are separate to avoid dependency loops — and the “half” was originally supposed to be an off-site replication target for that infrastructure server, but since I never managed that, I could use it for testing the upgrade path.

Now, as I said, we were running PostgreSQL 9.6. As of December 1, that version was still supported in the FreeBSD ports collection, and so we could build packages for it, but it had just gone end-of-life as far as the PostgreSQL Global Development Project was concerned. The FreeBSD ports maintainers have a history and practice of not keeping old versions of packages that are no longer supported upstream — unlike on Linux, there’s no big corporate entity selling support contracts and funding continued maintenance of obsolete software packages, and therefore no “megafreeze”. So it was clear that we needed to get off of 9.6 and onto something modern — preferably 14.1, the most recent release, so that we wouldn’t have to do this again for another few years. But if I got stuck anywhere in the upgrade process, I wanted it to be on as recent a release train as I could possibly get onto. For that reason, I decided to step through each major release using the binary pg_upgrade process, identifying and resolving issues at a point where it would still be relatively easy to roll back if I needed to do some manual tweaking of the database contents (which, as it happened, did turn out to be necessary).

All but one of these databases are small enough that it would be practical to upgrade them by using a full dump and restore procedure, but of course the 50GB shared database is too big for that. I wanted to maximize my chances of finding any pitfalls before having to upgrade that database, which meant the same pg_upgrade in-place binary upgrade process for all of them. Running pg_upgrade on FreeBSD is a bit involved, because different PostgreSQL versions cannot be installed together in the same filesystem, but this part of the procedure is fairly well documented in other sources online. I have two separate package build systems, one for work and one for personal, because the work one doesn’t need to bother with time-consuming stuff like web browsers and X, whereas the personal one is what’s used by all my workstations so it has all of that. In both cases, though, package repositories are just served directly from the repository poudriere creates after every package build.

Building packages

Because poudriere builds packages in a clean environment, there is no difficulty in building a package set that includes multiple PostgreSQL releases. The challenge comes with the packages that depend on PostgreSQL (or more specifically, on postgresql-client) — they can only be built against one PostgreSQL version: either the default defined in the FreeBSD ports framework, or (most relevant for my case) the one set in the DEFAULT_VERSIONS variable in /usr/local/etc/poudriere.d/make.conf. poudriere has a concept of “package sets”, packages built with different build parameters but otherwise from the same sources and build environment, which makes it easy to build the six different package repositories required for this project: we can just create a pgsql10-make.conf, pgsql11-make.conf, and so on, and then use the -z setname option to poudriere bulk to build each repository.
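
As a concrete sketch of what that looks like (the jail name, ports-tree name, and package-list path below are placeholders, not necessarily what I use), a per-set make.conf only needs to pin the PostgreSQL default, and the set is selected at build time with -z:

# /usr/local/etc/poudriere.d/pgsql12-make.conf
# (the regular make.conf is still read; it should not pin pgsql itself)
DEFAULT_VERSIONS+=	pgsql=12
#
# then build (or incrementally rebuild) the repository for that set:
poudriere bulk -j pkgbuild -p default -z pgsql12 -f /usr/local/etc/poudriere.d/pkglist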

Now, one of the things poudriere is good at, by design, is figuring out which packages in a repository need to be rebuilt based on current configuration and package sources — so we don’t need to actually rebuild all of the packages six times. First, I added all the newer PostgreSQL packages (postgresql{10,11,12,13,14}-{client,server,contrib,docs}) to my package list, and built my regular package set with my regular make.conf and all of those versions included. (I lie: actually I’ve been doing this for quite some time.) Then, I made six copies of my repository (I could have used hard links to avoid copying, but I had the disk space available) using cp -pRP, after first checking the poudriere manual page to verify where the setname goes in the path. (For the record, it’s jail-portstree-setname.) Then I could step through each setname with poudriere bulk and only rebuild those packages which depended on postgresql-client. All I needed to do was make these additional sets available through my web server and I would be ready to go.
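
For what it’s worth, the copy-and-rebuild step looked roughly like this (a sketch; the poudriere data directory and the jail/ports-tree names are assumptions, matching the example above):

cd /usr/local/poudriere/data/packages
for set in pgsql10 pgsql11 pgsql12 pgsql13 pgsql14; do
	cp -pRP pkgbuild-default pkgbuild-default-"$set"
done
# after which "poudriere bulk -z $set" only rebuilds the packages whose
# dependencies changed, i.e. the ones that depend on postgresql-client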

Because I knew this was going to be a time-consuming project, I chose to freeze my ports tree after the Thanksgiving holiday: I could either manage a regular package update cycle (which I usually do once a month) or I could do the database upgrade, not both.

The “easy” systems

For obvious reasons, I started out working on my personal machines; they all have slightly different configurations, with their own motley collections of databases, loaded extensions, and client applications. They were set up at different times and were not under any sort of configuration management, so they were all subject to some amount of configuration drift, but none support remote access, so I didn’t have to worry about synchronizing pg_hba.conf or TLS certificates — which was a concern for the work servers, which had exclusively remote clients. And since I was the only direct user of these machines, it was easy for me to make the upgrades in lockstep on all three servers, so I wouldn’t get stranded with different machines requiring different PostgreSQL releases. (That wouldn’t have been a crisis on any of those machines, but would have been a bigger issue at work where the package repository configuration is under configuration management.)

The whole process actually was pretty easy, at least at first, and worked pretty much as the various how-to articles suggest: create a directory tree where you can install the old packages, stop the old server, initdb, run pg_upgrade, start the new server, and do whatever pg_upgrade told you to do. This is not automatable! You have to actually pay attention to what pg_upgrade says, which will vary depending on database configuration, what extensions are loaded, and on the specific contents of the database cluster, in addition to which “old” and “new” PostgreSQL releases are targeted. (You must always use the pg_upgrade supplied with the new server release.) I’ll give a full rundown of the process at the end of this post.

The first showstopper issue I ran into is that PostgreSQL 12 dropped support for WITH OIDS. If you’re not familiar, in early versions of PostgreSQL (indeed, even before it was called that), every row in every table would automatically get a column called oid, which was a database-wide unique numerical identifier. There are a whole bunch of reasons why this turned out to scale poorly, but the most important of these was that the original implementation stored these identifiers in an int32 on disk, so if you had more than four billion tuples in your database, they would no longer be unique (and nothing in the implementation would enforce uniqueness, because that was too expensive). The oid type served a useful function in the internals of the database, but by PostgreSQL 9.x, actually using the oid column was deprecated, and the default was changed to create new tables WITHOUT OIDS.

It should not surprise you, if you’ve read this far, that some of these databases dated back to PostgreSQL 8, and therefore were created WITH OIDS, even if they didn’t actually use the implicit oid column for anything. (I had to carefully check, because some of my applications actually did use them in the past, but I was able to convince myself that all of those mistakes had been fixed years ago.) None of this was an issue until I got to the step of upgrading from PostgreSQL 11 to 12 — because PostgreSQL 12 entirely dropped support for WITH OIDS tables: you can’t upgrade from 11 to 12 without first either dropping the old tables or using ALTER TABLE ... SET WITHOUT OIDS while running the old release of the server. pg_upgrade can’t patch this over for you. On more than one occasion, I got the 12.x packages installed only to have pg_upgrade fail and have to roll back to 11.x.

The first time this happened, I was almost ready to give up, but I was able to find a howto on the web with the following extremely helpful bit of SQL to find all of the WITH OIDS tables in a database:

SELECT 'ALTER TABLE "' || n.nspname || '"."' || c.relname || '" SET WITHOUT OIDS;'
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE 1=1
  AND c.relkind = 'r'
  AND c.relhasoids = true
  AND n.nspname <> 'pg_catalog'
ORDER BY n.nspname, c.relname;

(Hint: replacing the semicolon at the end with \gexec will directly execute the DDL statements returned by the query, so you don’t have to cut and paste.) Note that this procedure must be run on every database in the cluster, using the database superuser account, to get rid of any remaining OIDs.
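
For what it’s worth, here is roughly how to run it across every database in one go (a sketch: it assumes the query above has been saved as /tmp/set_without_oids.sql with the final semicolon replaced by \gexec, and that the database superuser is called pgsql, as discussed below):

su pgsql -c 'psql -At -c "SELECT datname FROM pg_database WHERE datallowconn" postgres' |
while read -r db; do
	su pgsql -c "psql -f /tmp/set_without_oids.sql $db"
done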

Another important thing I ran into is that some PostgreSQL server-side extensions define new data types, and the code in those extensions must be available to the old server binaries for pg_upgrade to work. The easiest way to ensure this is to install the old versions of all of the extension packages in the same temporary filesystem tree as the old server packages. In my case this was easy because I was only using extensions included in the postgresql-contrib package, which in the process above was built for every version I was stepping through.
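
If you’re not sure what is loaded where, a query like the following lists the extensions installed in a given database (a sketch; “somedb” is just a placeholder name):

su pgsql -c 'psql -c "SELECT extname, extversion FROM pg_extension;" somedb'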

Once I fixed the WITH OIDS issue, I completed the upgrade to 12.x and let it burn in for a day before continuing on with 13 and 14, so the whole process took about a week, but I was confident that I could do it in under four hours for the work servers if I could just deal with the OID issue.

The hard systems

I used the query above to check all of my work databases for OIDful tables, and there were quite a few. I was able to confirm that the ones for our internal applications were just ancient cruft. Frankly, most of the data in our shared server is also ancient cruft, so I largely did the same there, but several of the tables belonged to someone who was still an active user, and so I asked first. (He told me I could just drop the whole schema, which was convenient.) Finally, I was ready, and sent out an announcement that our internal applications would be shut down and the shared database server would be unavailable for some unknown number of hours this week. The process ended up taking about 2½ hours, most of which was spent copying 50 GB of probably-dead data.

pg_upgrade has two modes of operation: normally, it copies all of the binary row-data files from the old version to the new version, which allows you to restart the old database with the old data if necessary. There is another mode, wherein pg_upgrade hard-links the row-data files between the two versions; this is much faster, and obviously uses much less space, but at the cost of not being able to easily roll back to the old server version if something goes wrong. All of our servers use ZFS, so a rollback is less painful than a full restore from backups would be, but it’s still much better if I don’t have to exercise that option. On the big shared server, it would simply take too long (and too much space) to copy all of the data for every upgrade, but it made sense to copy the data for the upgrade from 9.6 (old-production) to 10.x, and then use link mode for each successive upgrade, guaranteeing that I could restart the old production database at any point in the process but not worrying so much about the intermediate steps that would be overtaken by events in short order.
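
Since everything is on ZFS anyway, a snapshot of the data directory’s dataset before each linked step keeps even the intermediate versions cheaply recoverable; a sketch, with an assumed dataset name:

zfs snapshot rootvg/usr/local/pgsql@pre-13-upgrade
# and, if a linked step goes badly wrong:
# zfs rollback rootvg/usr/local/pgsql@pre-13-upgrade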

Those other servers are included in our configuration management, which I had to stop during the upgrade process (otherwise it would keep on trying to revert the package repository to the production one and then reinstall the old PostgreSQL packages). This also required paying more attention to the server configuration files, since those were managed and I didn’t want to start the database server without the correct configuration or certificates (having had some painful and confusing recent experiences with this). I had to stop various network services and cron jobs on half a dozen internal servers, and call out to our postmaster to keep the mail system from trying to talk to the account database while it was down (all of these applications are held together with chewing gum and sticky tape, so if someone tried to create a mailing-list while the account database was down, the result would likely be an inconsistent state rather than a clean error). I started by copying the existing network and accounts database to the off-site server, so that I could run through the complete upgrade process on real data but on a server nobody was relying on. (I initially tried to use pg_basebackup for this, but it didn’t work, and I fell back to good old tar.) It was in running through this process that I discovered I had neglected to account for a few pieces of our managed configuration. That dealt with, I then proceeded to the production account and network database, and finally the big shared database full of junk.
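
The fallback copy was nothing fancier than something like this (a sketch; the destination host name is a placeholder, and the source server has to be stopped, or its dataset snapshotted, while the data directory is copied):

service postgresql stop
tar -cf - -C /usr/local/pgsql data96 | ssh offsite-db 'tar -xpf - -C /usr/local/pgsql'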

The actual upgrade procedure

Note that as a consequence of having previously upgraded from PostgreSQL 9.4 to 9.6, our package builds override PG_USER and PG_UID to their old values; thus, the database superuser is called pgsql and not postgres as in the current stock FreeBSD packages. The procedure assumes that you are typing commands (and reading their output!) at a root shell.

Preparatory steps on each server

Before doing anything, check /etc/rc.conf and /usr/local/pgsql to verify that there is nothing special about the database configuration. Most importantly, check for a postgresql_data setting in rc.conf: if this is set, it will need to be changed for every step in the upgrade. Also, check for any custom configuration files in the data directory itself; these will need to be copied or merged into the new server configuration. (Because of the way our configuration management works, I could just copy these, except for two lines that needed to be appended to postgresql.conf.)
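
In practice those checks amount to something like this (a sketch; the data-directory name obviously depends on the version you are starting from):

grep -i postgresql /etc/rc.conf        # postgresql_data, postgresql_flags, etc.
ls -l /usr/local/pgsql                 # which data directories already exist?
ls -l /usr/local/pgsql/data96          # any locally added *.conf files to carry over?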

# create a dataset to hold a minimal install tree for the _old_ PostgreSQL
# packages (with this pool layout it mounts at /tmp/pg_upgrade)
zfs create rootvg/tmp/pg_upgrade
# populate it with a world matching the running release
cd /usr/src
make -s installworld DESTDIR=/tmp/pg_upgrade
mount -t devfs none /tmp/pg_upgrade/dev

Note that these steps will be different for servers running in a jail — you will probably have to do all of this from the outside. The mount of a devfs inside the destination tree is probably unnecessary; it’s not a complete chroot environment and the package setup scripts aren’t needed.

At this point, I would stop configuration management and set downtime in monitoring so the on-call person doesn’t get paged.

conffiles="cacert.pem keytab pg_hba.conf pg_ident.conf postgresql_puppet_extras.conf server.crt server.key"

This just sets a shell variable, for convenience, listing the managed configuration files to restore before starting the server after each upgrade. It might not be necessary at all in your environment, and if it is, you will almost certainly need to customize the list.

The following is the main loop of the upgrade procedure. You’ll repeat this for each release you need to step through, substituting the appropriate version numbers in each command.

# install the _old_ database release in the temporary install tree
pkg -r /tmp/pg_upgrade/ install -y postgresql96-server postgresql96-contrib
# Update the package repository configuration to point to the repo for the _new_ release
vi /usr/local/etc/pkg/repos/production.conf 
cd /usr/local/pgsql
service postgresql stop
# If and only if you set postgresql_data in rc.conf, update it for the new release
sysrc postgresql_data="/path/to/data/$newvers"
# Try to install the new release. This is known to fail going from 9.6 to 10.
pkg install postgresql10-server postgresql10-contrib
# If the above fails, run "pkg upgrade" and then repeat the "pkg install" command

# Create the new database control files. Will fail if the data directory already exists.
service postgresql initdb
# Check whether the upgrade can work
su pgsql -c 'pg_upgrade -b /tmp/pg_upgrade/usr/local/bin -B /usr/local/bin -d /usr/local/pgsql/data96 -D /usr/local/pgsql/data10 -c'
# The most common reason for this to fail is if the locale is misconfigured.
# You may need to set postgresql_initdb_flags in rc.conf to fix this, but you
# will have to delete the data10 directory and redo starting from the initdb.

# Same as the previous command, without "-c". Add "-k" to use link mode.
su pgsql -c 'pg_upgrade -b /tmp/pg_upgrade/usr/local/bin -B /usr/local/bin -d /usr/local/pgsql/data96 -D /usr/local/pgsql/data10'

# pg_upgrade will likely have output some instructions, but we need to start
# the server first, which means fixing the configuration.
for a in ${conffiles}; do cp -pRP data96/$a data10/; done
tail -2 data96/postgresql.conf  >> data10/postgresql.conf
service postgresql start

# If pg_upgrade told you to update extensions, do that now:
su pgsql -c 'psql -f update_extensions.sql template1'
# If pg_upgrade told you to rebuild hash indexes, do that now:
su pgsql -c 'psql -f reindex_hash.sql template1'

# For a big database, this can be interrupted once it gets to "medium"
# (make sure to let it complete once you have gotten to the final version).
su pgsql -c ./analyze_new_cluster.sh 
# If the new version is 14.x or above, run the following command instead:
su pgsql -c 'vacuumdb --all --analyze-in-stages'

Before moving on to the next version in your upgrade path, you should probably check that the server is running properly and authenticating connections in accordance with whatever policy you’ve defined.
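
My idea of “checking” was nothing elaborate; roughly (a sketch, with a placeholder host and database name for the remote test):

service postgresql status
su pgsql -c 'psql -c "SELECT version();" postgres'
# and, from a client machine, confirm that authentication still works:
# psql "host=db.example.org dbname=somedb" -c 'SELECT 1;'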

I followed this procedure for all of my servers, and ran into only one serious issue — of course it was on the big 50 GB shared server. pg_upgrade -c fails to diagnose the case where the database contains a custom aggregate of the sort described in this mailing-list thread, and the upgrade process errors out when loading the schema into 14.x. The only fix is to drop the aggregates in question (after first reinstalling and starting the 13.x server) and then recreate them the “new way” after completing the upgrade. Thankfully this was easy for me, but it might not have been.
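
For the record, the shape of that fix (the aggregate and database names here are purely illustrative placeholders; the real ones come straight out of the pg_upgrade failure output):

# while the 13.x server is running again:
su pgsql -c 'psql -c "DROP AGGREGATE array_accum(anyelement);" somedb'
# ...then finish the upgrade to 14.x and recreate the aggregate with an
# explicit CREATE AGGREGATE (sfunc = ..., stype = ...) per the current docs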

After all that, once you’ve verified that your applications are all functioning, you are almost done: rebuild the production package set, restart configuration management, and remove the temporary install tree (there’s nothing in it that is needed any more).
