04. Wait event analysis (Active Session History)
Active Session History (ASH) style dashboard for deep wait event analysis.
Dashboard views: ASH by wait type, ASH by wait event, and ASH by query.
Purpose
Similar to AWS RDS Performance Insights or Oracle ASH, this dashboard provides:
- Detailed breakdown of what sessions are waiting on
- Historical wait event patterns
- Correlation between waits and performance issues
When to use
- Database is slow but CPU is not high
- Investigating lock contention
- Understanding I/O patterns
- Diagnosing intermittent slowdowns
Prerequisites
This dashboard uses wait event data collected from pg_stat_activity by pgwatch. No additional extensions are required.
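Conceptually, an ASH-style collector repeatedly samples `pg_stat_activity` and aggregates what active sessions are waiting on. A single sample can be taken with a query like the following (an illustrative sketch, not pgwatch's exact metric SQL; note that an active backend with a NULL `wait_event` is conventionally shown as "CPU"):

```sql
-- One sample: what are active backends waiting on right now?
SELECT
    coalesce(wait_event_type, 'CPU') AS wait_event_type,
    coalesce(wait_event, 'CPU')      AS wait_event,
    count(*)                         AS sessions
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()  -- exclude this monitoring session
GROUP BY 1, 2
ORDER BY sessions DESC;
```

Sampling this at a fixed interval and stacking the counts over time yields the wait event distribution charts below.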
Key panels
Wait event distribution
What it shows:
- Stacked area chart of wait events over time
- Each color represents a wait event type
Wait event categories:
| Category | Description | Common events |
|---|---|---|
| CPU | On-CPU processing | CPU |
| IO | Disk I/O operations | DataFileRead, WALWrite |
| LWLock | Internal PostgreSQL locks | BufferContent, LockManager |
| Lock | Row/table locks | tuple, transactionid |
| BufferPin | Buffer pinning | BufferPin |
| Activity | Background processes | LogicalLauncherMain |
| IPC | Inter-process communication | BgWorkerStartup |
Top wait events
What it shows:
- Ranked list of most common wait events
- Percentage of total wait time
Interpretation guide:
| Wait event | Meaning | Action |
|---|---|---|
| CPU | Query processing | Normal if workload-appropriate |
| DataFileRead | Reading from disk | Check shared_buffers, add memory |
| DataFileWrite | Writing to disk | Normal for write-heavy workloads |
| WALWrite | WAL I/O | Check storage speed |
| LWLock:BufferContent | Buffer contention | Scale up, partition tables, reduce bloat |
| Lock:tuple | Row lock wait | Check for lock conflicts |
| Lock:transactionid | Waiting on another transaction | Find and shorten long-running transactions |
Wait events by database
Breakdown showing which databases contribute most to waits.
Wait events by query
Links wait events to specific queries (when available).
Limitations:
- Only captures queries active during sampling
- Short queries may be missed
- Use with 02. Query analysis for complete picture
Understanding wait events
CPU waits
High CPU wait is normal for compute-bound workloads. Investigate if:
- CPU wait is disproportionate to actual CPU usage
- Query throughput is lower than expected
I/O waits
Common I/O wait events:
| Event | Cause | Solution |
|---|---|---|
| DataFileRead | Reading data not in cache | Increase shared_buffers, add indexes |
| DataFileExtend | Growing data files | Pre-extend files, faster storage |
| DataFileFlush | Flushing dirty pages | Tune checkpoint_* settings |
| WALSync | Syncing WAL | Faster WAL storage, adjust commit_delay |
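When `DataFileRead` dominates, it is worth checking whether the buffer cache hit ratio is low before adding memory. A rough per-database check (a sketch against the standard `pg_stat_database` view):

```sql
-- Rough buffer cache hit ratio per database; persistently low values
-- alongside DataFileRead waits suggest shared_buffers is undersized.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname NOT LIKE 'template%'
ORDER BY blks_read DESC;
```

Note this counts hits in PostgreSQL's shared buffers only; reads served from the OS page cache still appear as `blks_read`.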
Lock waits
Lock wait events indicate heavyweight lock contention:
| Event | Cause | Solution |
|---|---|---|
| Lock:tuple | Row-level lock wait | Avoid multiple sessions modifying same rows; use FOR UPDATE SKIP LOCKED or NOWAIT |
| Lock:transactionid | Waiting for transaction | Find and address long transactions |
| Lock:relation | Table-level lock | Use low lock_timeout with retries for DDL. See Zero-downtime schema migrations |
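The `FOR UPDATE SKIP LOCKED` pattern mentioned above lets competing workers claim rows without blocking on each other, which removes most `Lock:tuple` waits in queue-style workloads. A minimal sketch, using a hypothetical `jobs` table:

```sql
-- Queue-worker pattern: claim one pending job without waiting on rows
-- already locked by other workers ("jobs" is a hypothetical table).
BEGIN;
SELECT id
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ...process the claimed job, then mark it done and COMMIT.
COMMIT;

-- For DDL, bound the Lock:relation wait instead of queueing behind it:
SET lock_timeout = '100ms';
-- ALTER TABLE ... ;  -- retry in a loop if it fails with a timeout
```

With `SKIP LOCKED`, a worker that finds all candidate rows locked simply gets an empty result and can retry, rather than showing up as a `Lock:tuple` wait on this dashboard.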
LWLock waits
Lightweight locks are internal to PostgreSQL:
| Event | Cause | Solution |
|---|---|---|
LWLock:BufferContent | Concurrent access to same buffer page | Scale up instance, partition tables, reduce foreign keys, address bloat with VACUUM FULL or pg_repack |
LWLock:LockManager | Fast-path lock limit exceeded (16 locks/backend) | Enable partition pruning, reduce indexes, use connection pooler. See Postgres Marathon series |
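To see whether backends are exceeding the fast-path slot limit, the `fastpath` column of `pg_locks` shows which relation locks spilled into the shared lock manager (a diagnostic sketch; thresholds vary by version and workload):

```sql
-- Fast-path vs. regular (shared lock manager) relation locks per backend.
-- Backends holding more than ~16 relation locks spill into the shared
-- lock manager, which can surface as LWLock:LockManager waits.
SELECT pid,
       count(*) FILTER (WHERE fastpath)     AS fastpath_locks,
       count(*) FILTER (WHERE NOT fastpath) AS regular_locks
FROM pg_locks
WHERE locktype = 'relation'
GROUP BY pid
ORDER BY regular_locks DESC;
```

Queries touching many partitions or heavily indexed tables are the usual culprits, which is why partition pruning and index reduction appear in the table above.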
Variables
| Variable | Purpose |
|---|---|
| cluster_name | Cluster filter |
| node_name | Node filter |
| db_name | Database filter |
| wait_event_type | Filter by event type |
Related dashboards
- Overview — 01. Node overview
- Lock details — 13. Lock contention
- Query correlation — 02. Query analysis
Troubleshooting
No wait event data
- Verify pgwatch is collecting metrics: `docker compose logs pgwatch | grep -i wait`
- Check that VictoriaMetrics has wait event data: `curl 'http://localhost:8428/api/v1/query?query=pg_stat_activity_count'`
- Ensure `pg_stat_activity` is accessible to the monitoring user
Wait events don't match RDS Performance Insights
Different sampling rates and methodologies may cause variations. PostgresAI monitoring uses:
- Default: 10ms sampling interval
- Aggregation into time buckets for visualization
RDS Performance Insights may use different intervals.
"Other" category too large
The "Other" category contains infrequent or miscellaneous waits. If large:
- Check for uncommon extensions
- Review background worker activity
- Consider if sampling rate is appropriate