Skip to main content
View rawEdit

04. Wait event analysis (Active Session History)

Active Session History (ASH) style dashboard for deep wait event analysis.

ASH by wait type​

ASH by wait type

ASH by wait event​

ASH by wait event

ASH by query​

ASH by query

Purpose​

Similar to AWS RDS Performance Insights or Oracle ASH, this dashboard provides:

  • Detailed breakdown of what sessions are waiting on
  • Historical wait event patterns
  • Correlation between waits and performance issues

When to use​

  • Database is slow but CPU is not high
  • Investigating lock contention
  • Understanding I/O patterns
  • Diagnosing intermittent slowdowns

Prerequisites​

This dashboard uses wait event data collected from pg_stat_activity by pgwatch. No additional extensions are required.

Key panels​

Wait event distribution​

What it shows:

  • Stacked area chart of wait events over time
  • Each color represents a wait event type

Wait event categories:

CategoryDescriptionCommon events
CPUOn-CPU processingCPU
IODisk I/O operationsDataFileRead, WALWrite
LWLockInternal PostgreSQL locksBufferContent, LockManager
LockRow/table lockstuple, transactionid
BufferPinBuffer pinningBufferPin
ActivityBackground processesLogicalLauncherMain
IPCInter-process communicationBgWorkerStartup

Top wait events​

What it shows:

  • Ranked list of most common wait events
  • Percentage of total wait time

Interpretation guide:

Wait eventMeaningAction
CPUQuery processingNormal if workload-appropriate
DataFileReadReading from diskCheck shared_buffers, add memory
DataFileWriteWriting to diskNormal for writes
WALWriteWAL I/OCheck storage speed
LWLock:BufferContentBuffer contentionScale up, partition tables, reduce bloat
Lock:tupleRow lock waitCheck for lock conflicts
Lock:transactionidTransaction waitLong transactions blocking

Wait events by database​

Breakdown showing which databases contribute most to waits.

Wait events by query​

Links wait events to specific queries (when available).

Limitations:

  • Only captures queries active during sampling
  • Short queries may be missed
  • Use with 02. Query analysis for complete picture

Understanding wait events​

CPU waits​

High CPU wait is normal for compute-bound workloads. Investigate if:

  • CPU wait is disproportionate to actual CPU usage
  • Query throughput is lower than expected

I/O waits​

Common I/O wait events:

EventCauseSolution
DataFileReadReading data not in cacheIncrease shared_buffers, add indexes
DataFileExtendGrowing data filesPre-extend files, faster storage
DataFileFlushFlushing dirty pagesTune checkpoint_* settings
WALSyncSyncing WALFaster WAL storage, adjust commit_delay

Lock waits​

Lock wait events indicate heavyweight lock contention:

EventCauseSolution
Lock:tupleRow-level lock waitAvoid multiple sessions modifying same rows; use FOR UPDATE SKIP LOCKED or NOWAIT
Lock:transactionidWaiting for transactionFind and address long transactions
Lock:relationTable-level lockUse low lock_timeout with retries for DDL. See Zero-downtime schema migrations

LWLock waits​

Lightweight locks are internal to PostgreSQL:

EventCauseSolution
LWLock:BufferContentConcurrent access to same buffer pageScale up instance, partition tables, reduce foreign keys, address bloat with VACUUM FULL or pg_repack
LWLock:LockManagerFast-path lock limit exceeded (16 locks/backend)Enable partition pruning, reduce indexes, use connection pooler. See Postgres Marathon series

Variables​

VariablePurpose
cluster_nameCluster filter
node_nameNode filter
db_nameDatabase filter
wait_event_typeFilter by event type

Troubleshooting​

No wait event data​

  1. Verify pgwatch is collecting metrics:

    docker compose logs pgwatch | grep -i wait
  2. Check VictoriaMetrics has wait event data:

    curl 'http://localhost:8428/api/v1/query?query=pg_stat_activity_count'
  3. Ensure pg_stat_activity is accessible to the monitoring user

Wait events don't match RDS Performance Insights​

Different sampling rates and methodologies may cause variations. PostgresAI monitoring uses:

  • Default: 10ms sampling interval
  • Aggregation into time buckets for visualization

RDS Performance Insights may use different intervals.

"Other" category too large​

The "Other" category contains infrequent or miscellaneous waits. If large:

  1. Check for uncommon extensions
  2. Review background worker activity
  3. Consider if sampling rate is appropriate