PG18 preserves planner statistics on upgrade — even from PG14
Postgres 18 can preserve planner statistics during major version upgrades. But can it work when upgrading from an older version to PG18 — which many plan to do soon? Let's test and see.
PG18 eliminates the post-upgrade performance cliff:
- Before PG18, every major upgrade wiped planner statistics — causing slow queries, timeouts, and potential outages until analyze completed
- PG18 preserves statistics during pg_upgrade, even when upgrading from PG14/15/16/17 — not just PG18→PG19
- For large databases (hundreds of tables, billions of rows), post-upgrade analyze alone could take hours
- pg_upgrade in PG18 handles this automatically — no manual steps for basic stats
The problem: upgrading Postgres has always meant losing statistics
Every major Postgres upgrade — 14 to 15, 15 to 16, 16 to 17 — has reset planner statistics to zero. The moment you switch to the new cluster, the query planner is blind: it falls back to default assumptions, produces terrible plans, and your application hits a performance cliff. This means you need to run ANALYZE, either in a single connection or in parallel with vacuumdb --analyze-only -j$N (which has issues with partitioned tables; this gets much better in Postgres 19, but that's another story).
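For reference, the pre-PG18 post-upgrade routine looks roughly like this (a sketch; the job count is an assumption to adapt to your environment):

```shell
# Recompute planner statistics for all databases, 8 jobs in parallel
vacuumdb --all --analyze-only --jobs=8

# Alternative: three passes with increasing statistics detail,
# to get minimally useful stats as quickly as possible
vacuumdb --all --analyze-in-stages
```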
This is not a theoretical risk. It happens constantly (and it is one of my favorite reasons to blame managed platforms for their lack of upgrade automation: their tradition is to leave this step on your shoulders):
- AWS RDS upgrade, PG14 to 16: a developer skipped the post-upgrade ANALYZE. All looked good during the weekend. Then Monday came, and a frequent query fell back to a SeqScan and put the server down. P0 incident, painful RCA, and a trivial mitigation that came too late: ANALYZE;.
- On-prem upgrade, 16 to 17, 70+ databases: CPU spiked immediately after cutover. Everything had looked great in planning. The sad part: the team had parallelized stats recalculation with vacuumdb -j, and it had worked smoothly in the past, but they had recently partitioned lots of tables... Guess which databases experienced performance issues? Yep, those with partitioned tables.
Another thing: the speed of ANALYZE. As Greg Sabino Mullane noted back in 2016, analyze can be painfully slow — slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade.
CYBERTEC echoed this for large-scale environments:
"For large databases with hundreds of tables, some containing billions of rows and dozens of columns, the post-upgrade analyze could take hours."
And if default_statistics_target is adjusted to a higher value (e.g., 1000), the duration of ANALYZE increases proportionally.
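The target controls how many most-common values and histogram buckets ANALYZE keeps per column, and how many rows it samples (300 × target), so the cost grows with it. A quick sketch:

```sql
-- Default target: 100 (up to 100 MCVs / histogram buckets per column,
-- sample of ~30,000 rows)
show default_statistics_target;

-- 10x more detail means a 10x larger sample to read and sort: slower ANALYZE
set default_statistics_target = 1000;
analyze demo;

-- Or per column, which persists and is used by plain ANALYZE
alter table demo alter column val set statistics 1000;
```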
Multiple cloud providers and Postgres vendors — including Azure and AWS — document this problem in their upgrade guides. It is a well-known source of post-upgrade performance regression.
What PG18 changes
Postgres 18 introduces statistics export and import. From the release notes:
"Before PostgreSQL 18, these statistics didn't carry over on a major version upgrade, which could cause significant query performance degradations on busy systems until the ANALYZE finished running. PostgreSQL 18 introduces the ability to keep planner statistics through a major version upgrade, which helps an upgraded cluster reach expected performance more quickly after the upgrade."
But can we use PG18's new pg_dump --statistics-only to extract stats from an older server — say, PG16 or even PG14? Let's find out.
Let's test it
Start PG16, create some data, then use PG18's pg_dump to extract stats from PG16 and import them into an empty PG18 table. A shared volume lets both containers access the dump file:
mkdir -p /tmp/pgstats-demo
docker run -d --name pg16 \
-p 5416:5432 \
-v /tmp/pgstats-demo:/shared \
-e POSTGRES_PASSWORD=postgres \
postgres:16
until docker exec pg16 \
pg_isready -U postgres 2>/dev/null
do sleep 1; done
docker exec pg16 psql -U postgres -c "
create table demo as
select g as id, md5(g::text) as val
from generate_series(1, 10000000) g;
analyze demo;"
# PG18's pg_dump extracts stats from PG16
docker run --rm \
--network host \
-v /tmp/pgstats-demo:/shared \
postgres:18 \
bash -c "PGPASSWORD=postgres \
pg_dump -h localhost -p 5416 \
-U postgres --statistics-only \
postgres > /shared/pg16_stats.sql"
docker rm -f pg16
docker run -d --name pg18 \
-p 5418:5432 \
-v /tmp/pgstats-demo:/shared \
-e POSTGRES_PASSWORD=postgres \
postgres:18
until docker exec pg18 \
pg_isready -U postgres 2>/dev/null
do sleep 1; done
docker exec pg18 psql -U postgres \
-c "create table demo (id int, val text)"
docker exec pg18 \
bash -c "psql -U postgres < /shared/pg16_stats.sql"
Now verify — the PG18 table has zero rows, but the planner sees 10M:
select reltuples, relpages,
(select count(*) from demo) as actual_rows
from pg_class
where relname = 'demo';
reltuples | relpages | actual_rows
--------------+----------+-------------
9.999034e+06 | 83392 | 0
reltuples ≈ 10M on an empty table; a fresh ANALYZE here would set it back to 0. The stats came from PG16, via PG18's pg_dump. It works. And this is exactly what pg_upgrade does internally — automatically, no manual steps.
Why it works
The reason is simple once you see the architecture:
1. pg_upgrade always uses the NEW cluster's pg_dump.
In src/bin/pg_upgrade/dump.c, pg_upgrade invokes the target cluster's pg_dump binary (new_cluster.bindir) and points it at the old cluster (cluster_conn_opts(&old_cluster)). When you upgrade PG16 to PG18, it is PG18's pg_dump — the one with --statistics support — that connects to your PG16 server. Using a newer pg_dump against an older server is officially supported: "pg_dump can also dump from PostgreSQL servers older than its own version."
2. pg_dump reads from standard catalog views.
The statistics export reads from pg_class (for relpages, reltuples, relallvisible) and pg_stats (for per-column statistics like null_frac, n_distinct, avg_width, correlation, most_common_vals, histogram_bounds). These are standard system catalog views that have existed in Postgres for decades.
3. The restore functions only need to exist on the target.
pg_restore_relation_stats() and pg_restore_attribute_stats() are new PG18 functions. They run on the new cluster during restore. The old cluster never needs to know they exist.
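To make steps 2 and 3 concrete, here is a sketch using the demo table from above. The key/value list in a real dump is longer (it also carries a 'version' entry, MCV lists, and histogram bounds), so treat this as illustrative rather than exact:

```sql
-- Step 2: what the export reads on the OLD cluster (any version)
select relpages, reltuples, relallvisible
from pg_class where relname = 'demo';

select attname, null_frac, n_distinct, avg_width, correlation
from pg_stats where tablename = 'demo';

-- Step 3: roughly what the generated dump runs on the NEW (PG18) cluster;
-- the restore functions take 'key', value pairs (values shortened here)
select * from pg_catalog.pg_restore_relation_stats(
    'schemaname', 'public',
    'relname', 'demo',
    'relpages', 83392::integer,
    'reltuples', 9.999034e+06::real,
    'relallvisible', 0::integer
);

select * from pg_catalog.pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname', 'demo',
    'attname', 'id',
    'inherited', false,
    'null_frac', 0::real,
    'avg_width', 4::integer,
    'n_distinct', -1::real
);
```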
The implementation (commit 1fd1bd871012 by Corey Huinker and Jeff Davis, with Nathan Bossart's follow-up pg_restore_extended_stats()) also handles version differences gracefully:
- Pre-v14 clusters: reltuples = 0 gets remapped to -1 (the modern "never analyzed" convention)
- Pre-v17 clusters: range type statistics are skipped — rebuilt on the first ANALYZE
- Pre-v18 clusters: relallfrozen (new in PG18) defaults to 0 — set by the first VACUUM
Caveats
One important limitation: extended statistics created with create statistics (multivariate n_distinct, functional dependencies, multivariate MCV lists) are not preserved. Single-column statistics from pg_stats (including per-column most_common_vals and histogram_bounds) and relation-level statistics from pg_class are all carried over — it's only the multi-column extended stats that require re-analysis.
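For example, objects like this (names are illustrative) arrive on the new cluster without populated statistics and need a fresh ANALYZE:

```sql
-- Multivariate extended statistics: NOT carried over by pg_upgrade in PG18
create statistics demo_id_val_stats (ndistinct, dependencies)
    on id, val from demo;

-- On the upgraded cluster, this must run again to populate them
analyze demo;
```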
The official pg_upgrade documentation recommends a two-step post-upgrade process:
vacuumdb --all \
--analyze-in-stages \
--missing-stats-only
vacuumdb --all \
--analyze-only
The first command uses --missing-stats-only (also new in PG18) to quickly regenerate only the statistics that were not carried over — extended statistics and expression index stats. The second command re-analyzes everything, which is still worthwhile: the new major version may have improved statistics collection algorithms, so fresh stats can produce better plans than the carried-over ones.
Since the stats dump is metadata-only — no table data is read, just catalog queries — it adds seconds, not minutes, to the pg_upgrade process even for large schemas with thousands of tables.
Planning your next major upgrade
PG18's statistics preservation removes one of the biggest risks in major Postgres upgrades, and it works retroactively — you do not need to be on PG18 already to benefit. If you are on PG14, PG15, PG16, or PG17, upgrading to PG18 will preserve your planner statistics.
At PostgresAI, we specialize in zero-downtime, zero-data-loss, reversible major Postgres upgrades. Our methodology — battle-tested at GitLab scale (multi-TB databases, 100k+ TPS) — combines physical-to-logical replication with pause/resume capabilities. PG18's statistics preservation complements this approach perfectly: your upgraded cluster starts with optimal query plans from the first second.
Note: statistics preservation applies to pg_upgrade-based workflows. Logical replication upgrades (including our zero-downtime approach) still require a post-upgrade ANALYZE on the target. The key difference with our methodology: that ANALYZE runs while the old cluster is still serving production traffic, so there is never a moment when the planner is blind.
Learn about zero-downtime upgrades | See what customers say about our help
