04. Wait event analysis (Active Session History)

Active Session History (ASH) style dashboard for deep wait event analysis.

ASH by wait type

ASH by wait event

ASH by query

Purpose

Similar to AWS RDS Performance Insights or Oracle ASH, this dashboard provides:

Detailed breakdown of what sessions are waiting on
Historical wait event patterns
Correlation between waits and performance issues

When to use

Database is slow but CPU is not high
Investigating lock contention
Understanding I/O patterns
Diagnosing intermittent slowdowns

Prerequisites

This dashboard uses wait event data collected from pg_stat_activity by pgwatch. No additional extensions are required.

Key panels

Wait event distribution

What it shows:

Stacked area chart of wait events over time
Each color represents a wait event type

Wait event categories:

Category	Description	Common events
CPU	On-CPU processing	`CPU`
IO	Disk I/O operations	`DataFileRead`, `WALWrite`
LWLock	Internal PostgreSQL locks	`BufferContent`, `LockManager`
Lock	Row/table locks	`tuple`, `transactionid`
BufferPin	Buffer pinning	`BufferPin`
Activity	Background processes	`LogicalLauncherMain`
IPC	Inter-process communication	`BgWorkerStartup`

Top wait events

What it shows:

Ranked list of most common wait events
Percentage of total wait time

Interpretation guide:

Wait event	Meaning	Action
`CPU`	Query processing	Normal if workload-appropriate
`DataFileRead`	Reading from disk	Check shared_buffers, add memory
`DataFileWrite`	Writing to disk	Normal for writes
`WALWrite`	WAL I/O	Check storage speed
`LWLock:BufferContent`	Buffer contention	Scale up, partition tables, reduce bloat
`Lock:tuple`	Row lock wait	Check for lock conflicts
`Lock:transactionid`	Transaction wait	Long transactions blocking

Wait events by database

Breakdown showing which databases contribute most to waits.

Wait events by query

Links wait events to specific queries (when available).

Limitations:

Only captures queries active during sampling
Short queries may be missed
Use with 02. Query analysis for complete picture

Understanding wait events

CPU waits

High CPU wait is normal for compute-bound workloads. Investigate if:

CPU wait is disproportionate to actual CPU usage
Query throughput is lower than expected

I/O waits

Common I/O wait events:

Event	Cause	Solution
`DataFileRead`	Reading data not in cache	Increase `shared_buffers`, add indexes
`DataFileExtend`	Growing data files	Pre-extend files, faster storage
`DataFileFlush`	Flushing dirty pages	Tune `checkpoint_*` settings
`WALSync`	Syncing WAL	Faster WAL storage, adjust `commit_delay`

Lock waits

Lock wait events indicate heavyweight lock contention:

Event	Cause	Solution
`Lock:tuple`	Row-level lock wait	Avoid multiple sessions modifying same rows; use `FOR UPDATE SKIP LOCKED` or `NOWAIT`
`Lock:transactionid`	Waiting for transaction	Find and address long transactions
`Lock:relation`	Table-level lock	Use low `lock_timeout` with retries for DDL. See Zero-downtime schema migrations

LWLock waits

Lightweight locks are internal to PostgreSQL:

Event	Cause	Solution
`LWLock:BufferContent`	Concurrent access to same buffer page	Scale up instance, partition tables, reduce foreign keys, address bloat with `VACUUM FULL` or `pg_repack`
`LWLock:LockManager`	Fast-path lock limit exceeded (16 locks/backend)	Enable partition pruning, reduce indexes, use connection pooler. See Postgres Marathon series

Variables

Variable	Purpose
`cluster_name`	Cluster filter
`node_name`	Node filter
`db_name`	Database filter
`wait_event_type`	Filter by event type

Overview — 01. Node overview
Lock details — 13. Lock contention
Query correlation — 02. Query analysis

Troubleshooting

No wait event data

Verify pgwatch is collecting metrics:

docker compose logs pgwatch | grep -i wait

Check VictoriaMetrics has wait event data:

curl 'http://localhost:8428/api/v1/query?query=pg_stat_activity_count'

Ensure pg_stat_activity is accessible to the monitoring user

Wait events don't match RDS Performance Insights

Different sampling rates and methodologies may cause variations. PostgresAI monitoring uses:

Default: 10ms sampling interval
Aggregation into time buckets for visualization

RDS Performance Insights may use different intervals.

"Other" category too large

The "Other" category contains infrequent or miscellaneous waits. If large:

Check for uncommon extensions
Review background worker activity
Consider if sampling rate is appropriate

ASH by wait type​

ASH by wait event​

ASH by query​

Purpose​

When to use​

Prerequisites​

Key panels​

Wait event distribution​

Top wait events​

Wait events by database​

Wait events by query​

Understanding wait events​

CPU waits​

I/O waits​

Lock waits​

LWLock waits​

Variables​

Related dashboards​

Troubleshooting​

No wait event data​

Wait events don't match RDS Performance Insights​

"Other" category too large​