Skip to main content
View rawEdit

04. Wait event analysis (Active Session History)

Active Session History (ASH) style dashboard for deep wait event analysis.

ASH by wait type

ASH by wait type

ASH by wait event

ASH by wait event

ASH by query

ASH by query

Purpose

Similar to AWS RDS Performance Insights or Oracle ASH, this dashboard provides:

  • Detailed breakdown of what sessions are waiting on
  • Historical wait event patterns
  • Correlation between waits and performance issues

When to use

  • Database is slow but CPU is not high
  • Investigating lock contention
  • Understanding I/O patterns
  • Diagnosing intermittent slowdowns

Prerequisites

This dashboard uses wait event data collected from pg_stat_activity by pgwatch. No additional extensions are required.

Key panels

Wait event distribution

What it shows:

  • Stacked area chart of wait events over time
  • Each color represents a wait event type

Wait event categories:

CategoryDescriptionCommon events
CPUOn-CPU processingCPU
IODisk I/O operationsDataFileRead, WALWrite
LWLockInternal PostgreSQL locksBufferContent, LockManager
LockRow/table lockstuple, transactionid
BufferPinBuffer pinningBufferPin
ActivityBackground processesLogicalLauncherMain
IPCInter-process communicationBgWorkerStartup

Top wait events

What it shows:

  • Ranked list of most common wait events
  • Percentage of total wait time

Interpretation guide:

Wait eventMeaningAction
CPUQuery processingNormal if workload-appropriate
DataFileReadReading from diskCheck shared_buffers, add memory
DataFileWriteWriting to diskNormal for writes
WALWriteWAL I/OCheck storage speed
LWLock:BufferContentBuffer contentionScale up, partition tables, reduce bloat
Lock:tupleRow lock waitCheck for lock conflicts
Lock:transactionidTransaction waitLong transactions blocking

Wait events by database

Breakdown showing which databases contribute most to waits.

Wait events by query

Links wait events to specific queries (when available).

Limitations:

  • Only captures queries active during sampling
  • Short queries may be missed
  • Use with 02. Query analysis for complete picture

Understanding wait events

CPU waits

High CPU wait is normal for compute-bound workloads. Investigate if:

  • CPU wait is disproportionate to actual CPU usage
  • Query throughput is lower than expected

I/O waits

Common I/O wait events:

EventCauseSolution
DataFileReadReading data not in cacheIncrease shared_buffers, add indexes
DataFileExtendGrowing data filesPre-extend files, faster storage
DataFileFlushFlushing dirty pagesTune checkpoint_* settings
WALSyncSyncing WALFaster WAL storage, adjust commit_delay

Lock waits

Lock wait events indicate heavyweight lock contention:

EventCauseSolution
Lock:tupleRow-level lock waitAvoid multiple sessions modifying same rows; use FOR UPDATE SKIP LOCKED or NOWAIT
Lock:transactionidWaiting for transactionFind and address long transactions
Lock:relationTable-level lockUse low lock_timeout with retries for DDL. See Zero-downtime schema migrations

LWLock waits

Lightweight locks are internal to PostgreSQL:

EventCauseSolution
LWLock:BufferContentConcurrent access to same buffer pageScale up instance, partition tables, reduce foreign keys, address bloat with VACUUM FULL or pg_repack
LWLock:LockManagerFast-path lock limit exceeded (16 locks/backend)Enable partition pruning, reduce indexes, use connection pooler. See Postgres Marathon series

Variables

VariablePurpose
cluster_nameCluster filter
node_nameNode filter
db_nameDatabase filter
wait_event_typeFilter by event type

Troubleshooting

No wait event data

  1. Verify pgwatch is collecting metrics:

    docker compose logs pgwatch | grep -i wait
  2. Check VictoriaMetrics has wait event data:

    curl 'http://localhost:8428/api/v1/query?query=pg_stat_activity_count'
  3. Ensure pg_stat_activity is accessible to the monitoring user

Wait events don't match RDS Performance Insights

Different sampling rates and methodologies may cause variations. PostgresAI monitoring uses:

  • Default: 10ms sampling interval
  • Aggregation into time buckets for visualization

RDS Performance Insights may use different intervals.

"Other" category too large

The "Other" category contains infrequent or miscellaneous waits. If large:

  1. Check for uncommon extensions
  2. Review background worker activity
  3. Consider if sampling rate is appropriate