Skip to main content
View rawEdit

Database- and cluster-level metrics

PostgreSQL system-level metrics from various pg_stat_* views.

Data sources​

ViewDescription
pg_stat_databaseDatabase-level aggregates
pg_stat_bgwriterBackground writer statistics
pg_stat_archiverWAL archiver status
pg_stat_replicationReplication status
pg_settingsConfiguration parameters

Database metrics​

Transaction metrics​

MetricTypeDescription
pg_stat_database_xact_commit_totalCounterTransactions committed
pg_stat_database_xact_rollback_totalCounterTransactions rolled back
pg_stat_database_deadlocks_totalCounterDeadlocks detected
pg_stat_database_conflicts_totalCounterRecovery conflicts (replicas)

Connection metrics​

MetricTypeDescription
pg_stat_database_numbackendsGaugeActive connections
pg_settings_max_connectionsGaugeMaximum allowed connections

Buffer metrics​

MetricTypeDescription
pg_stat_database_blks_read_totalCounterBlocks read from disk
pg_stat_database_blks_hit_totalCounterBlocks found in buffer cache

Temporary file metrics​

MetricTypeDescription
pg_stat_database_temp_files_totalCounterTemporary files created
pg_stat_database_temp_bytes_totalCounterTemporary file bytes written

Background writer metrics​

MetricTypeDescription
pg_stat_bgwriter_checkpoints_timed_totalCounterScheduled checkpoints
pg_stat_bgwriter_checkpoints_req_totalCounterRequested checkpoints
pg_stat_bgwriter_checkpoint_write_time_seconds_totalCounterCheckpoint write time
pg_stat_bgwriter_checkpoint_sync_time_seconds_totalCounterCheckpoint sync time
pg_stat_bgwriter_buffers_checkpoint_totalCounterBuffers written during checkpoints
pg_stat_bgwriter_buffers_clean_totalCounterBuffers written by background writer
pg_stat_bgwriter_buffers_backend_totalCounterBuffers written by backends
pg_stat_bgwriter_buffers_alloc_totalCounterBuffers allocated

WAL archiver metrics​

MetricTypeDescription
pg_stat_archiver_archived_count_totalCounterWAL files archived
pg_stat_archiver_failed_count_totalCounterFailed archive attempts
pg_stat_archiver_last_archived_timeGaugeLast successful archive timestamp

Replication metrics​

MetricTypeDescription
pg_stat_replication_sent_lsnGaugeWAL sent to replica
pg_stat_replication_write_lsnGaugeWAL written on replica
pg_stat_replication_flush_lsnGaugeWAL flushed on replica
pg_stat_replication_replay_lsnGaugeWAL replayed on replica
pg_replication_lag_bytesGaugeReplication lag in bytes
pg_replication_lag_secondsGaugeEstimated replication lag

Labels​

LabelDescriptionExample
datnameDatabase namemyapp
cluster_nameCluster identifierproduction
node_nameNode identifierprimary

Common queries​

Transactions per second​

sum(rate(pg_stat_database_xact_commit_total[5m]))

Rollback ratio​

rate(pg_stat_database_xact_rollback_total[5m])
/
(rate(pg_stat_database_xact_commit_total[5m]) + rate(pg_stat_database_xact_rollback_total[5m]))

Buffer cache hit ratio​

sum(rate(pg_stat_database_blks_hit_total[5m]))
/
(sum(rate(pg_stat_database_blks_hit_total[5m])) + sum(rate(pg_stat_database_blks_read_total[5m])))

Connection utilization​

sum(pg_stat_database_numbackends)
/
pg_settings_max_connections

Checkpoint frequency​

rate(pg_stat_bgwriter_checkpoints_timed_total[5m])
+
rate(pg_stat_bgwriter_checkpoints_req_total[5m])

Backend buffer writes (problematic)​

High values indicate checkpoint tuning needed:

rate(pg_stat_bgwriter_buffers_backend_total[5m])
/
(
rate(pg_stat_bgwriter_buffers_checkpoint_total[5m])
+
rate(pg_stat_bgwriter_buffers_clean_total[5m])
+
rate(pg_stat_bgwriter_buffers_backend_total[5m])
)

Replication lag​

pg_replication_lag_seconds

WAL archive status​

time() - pg_stat_archiver_last_archived_time

Dashboard usage​

These metrics are used in:

Interpreting system metrics​

Buffer cache hit ratio​

Target: > 99% for frequently accessed data

select
datname,
blks_hit,
blks_read,
round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as hit_ratio
from pg_stat_database
where datname not like 'template%';

Low hit ratio causes:

  • shared_buffers too small
  • Working set larger than memory
  • Cold cache after restart

Checkpoint behavior​

Healthy checkpoint pattern:

  • Mostly timed checkpoints (scheduled)
  • Low requested checkpoints (forced by WAL volume)
  • Minimal backend buffer writes
select
checkpoints_timed,
checkpoints_req,
buffers_checkpoint,
buffers_clean,
buffers_backend
from pg_stat_bgwriter;

Temporary files​

High temporary file usage indicates:

  • work_mem too low
  • Complex queries with large sorts/joins
  • Hash joins spilling to disk
select
datname,
temp_files,
pg_size_pretty(temp_bytes) as temp_size
from pg_stat_database
where temp_files > 0;

Troubleshooting​

Metrics show zero after restart​

Background writer and database statistics reset on restart. This is expected.

Replication lag metrics missing​

Verify:

  1. Replication is configured and active
  2. Monitoring connects to primary (not replica)
  3. pg_stat_replication is accessible:
    select * from pg_stat_replication;

Connection count doesn't match application​

numbackends includes:

  • Application connections
  • Monitoring connections
  • Superuser reserved connections
  • Replication connections

Check pg_stat_activity for breakdown:

select backend_type, count(*)
from pg_stat_activity
group by backend_type;