Dashboard overview

PostgresAI monitoring includes 14 pre-built Grafana dashboards (numbered 01 to 14) designed for expert-level PostgreSQL troubleshooting, plus a Self-monitoring dashboard for the monitoring stack itself.

Dashboard categories

Triage and overview

#	Dashboard	Purpose
01	Node overview	High-level node health, wait events, sessions
02	Query analysis	Top-N queries by various metrics
03	Single query	Deep-dive into specific queryid

Wait events and locks

#	Dashboard	Purpose
04	Wait events	Active session history (ASH-style)
13	Lock contention	Lock waits and blocking chains

Storage and maintenance

#	Dashboard	Purpose
05	Backups	Backup status and WAL archiving
07	Autovacuum & xmin horizon	Autovacuum, dead tuples, bloat, and xmin-horizon root cause analysis
08	Table stats	Aggregated table metrics
09	Single table	Deep-dive into specific table
10	Index health	Index usage and bloat
11	Single index	Deep-dive into specific index
12	SLRU	SLRU cache statistics

Replication and HA

#	Dashboard	Purpose
06	Replication	Replication lag and slot status

I/O

#	Dashboard	Purpose
14	I/O statistics	I/O by backend type (`pg_stat_io`, PostgreSQL 16+)
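Dashboard 14 depends on `pg_stat_io`, which first appeared in PostgreSQL 16, so it stays empty on older servers. The following is a minimal sketch that checks this. It assumes you already have the integer from `SHOW server_version_num;` on the monitored node. The function name is hypothetical and not part of PostgresAI monitoring.

```python
def supports_pg_stat_io(server_version_num: int) -> bool:
    """Return True if the server exposes pg_stat_io (PostgreSQL 16+).

    server_version_num is the integer reported by
    `SHOW server_version_num`, e.g. 160002 for PostgreSQL 16.2.
    Since PostgreSQL 10 the major version is server_version_num // 10000.
    """
    major = server_version_num // 10000
    return major >= 16


# Hypothetical version numbers for illustration.
for version_num in (150007, 160002, 170000):
    print(version_num, supports_pg_stat_io(version_num))
```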

Stack health

#	Dashboard	Purpose
--	Self-monitoring	Monitoring stack health

Common variables

Most dashboards share these filter variables:

Variable	Purpose	Example
`cluster_name`	Cluster identifier	`production`, `staging`
`node_name`	Node within cluster	`primary`, `replica-1`
`db_name`	Database filter	`myapp`, `All`

Exceptions:

- 06. Replication and Self-monitoring have no template variables at all (06 is a placeholder; Self-monitoring reports on the single monitoring instance).
- 14. I/O statistics has only `cluster_name` and `node_name`. It has no database filter because `pg_stat_io` is instance-level.
- 11. Single index names its database variable `datname` (label "DB name") instead of `db_name`.
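Grafana accepts template-variable values in a dashboard URL as `var-<name>=<value>` query parameters, alongside `from` and `to` for the time range. The following is a minimal sketch that builds such a link while honoring the exceptions above. It assumes a Grafana base URL and dashboard UIDs that you look up in your own installation; both are hypothetical below. The `dashboard_url` helper and the `EXCEPTIONS` table are illustrative and not part of PostgresAI monitoring.

```python
from urllib.parse import urlencode

# Shared filter variables used by most dashboards.
COMMON_VARS = ("cluster_name", "node_name", "db_name")

# Exceptions described above, keyed by dashboard number. Each entry maps a
# common variable to the name the dashboard actually uses, or to None if
# the dashboard has no such variable.
EXCEPTIONS = {
    "06": {"cluster_name": None, "node_name": None, "db_name": None},
    "self-monitoring": {"cluster_name": None, "node_name": None, "db_name": None},
    "14": {"db_name": None},       # pg_stat_io is instance-level
    "11": {"db_name": "datname"},  # database variable is named datname
}


def dashboard_url(base_url, uid, dashboard, values, time_from="now-1h", time_to="now"):
    """Build a Grafana dashboard link with template variables preset.

    uid is the Grafana dashboard UID (hypothetical here); values maps
    common variable names to the values you want selected.
    """
    renames = EXCEPTIONS.get(dashboard, {})
    params = []
    for name in COMMON_VARS:
        if name not in values:
            continue
        target = renames.get(name, name)
        if target is None:
            continue  # this dashboard has no such variable
        params.append((f"var-{target}", values[name]))
    params += [("from", time_from), ("to", time_to)]
    return f"{base_url.rstrip('/')}/d/{uid}?{urlencode(params)}"


values = {"cluster_name": "production", "node_name": "primary", "db_name": "myapp"}
print(dashboard_url("https://grafana.example.com", "abc123", "11", values))
# https://grafana.example.com/d/abc123?var-cluster_name=production&var-node_name=primary&var-datname=myapp&from=now-1h&to=now
```

Keeping the exceptions in one table means a link generator only needs updating in one place if a dashboard's variables change.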

Recommended workflow

Incident response

Start with 01. Node overview
- Check wait event distribution
- Look for session count anomalies
- Note TPS/QPS patterns
Identify the bottleneck
- High CPU wait events: check queries (02)
- High IO wait events: check disk activity and queries
- High LWLock waits: check the specific lock type (13)
Drill down
- Use 02. Query analysis to find problematic queries
- Use 03. Single query for detailed metrics on specific queryid
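The "identify the bottleneck" step amounts to finding the dominant wait class and mapping it to the next dashboard. The following is a minimal sketch of that mapping. It assumes you have sampled `wait_event_type` for active sessions (for example from `pg_stat_activity`), and that a missing wait event counts as CPU, as ASH-style views commonly do. The `classify` helper and its fallback to 04. Wait events are illustrative choices, not product behavior.

```python
from collections import Counter

# Next dashboard to open for each dominant wait class, following the
# incident-response steps above. PostgreSQL has no "CPU" wait event type;
# we assume active sessions with no wait event are on CPU.
NEXT_STEP = {
    "CPU": "02. Query analysis",
    "IO": "02. Query analysis (plus disk activity)",
    "LWLock": "13. Lock contention",
}


def classify(samples):
    """Return (dominant wait class, suggested dashboard).

    samples is a list of wait_event_type values for active sessions,
    with None meaning the session was on CPU rather than waiting.
    """
    counts = Counter("CPU" if s is None else s for s in samples)
    if not counts:
        raise ValueError("no active-session samples")
    dominant, _ = counts.most_common(1)[0]
    # Assumed fallback: unlisted classes are explored in 04. Wait events.
    return dominant, NEXT_STEP.get(dominant, "04. Wait events")


print(classify([None, None, "IO", "LWLock", None, "IO"]))
# ('CPU', '02. Query analysis')
```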

Routine monitoring

Task	Dashboard	What to look for
Query review	02. Query analysis	New slow queries, regression
Index health	10. Index health	Unused indexes, bloat
Table health	08. Table stats	Bloat, sequential scans
Vacuum status	07. Autovacuum & xmin horizon	Dead tuple accumulation, xmin-horizon blockers
I/O attribution	14. I/O statistics	Reads/writes by backend type (PG16+)

Legend options

02. Query analysis has a "Query texts" variable (`legend_label`) that switches how query texts are rendered in legends:

Option	Value	Shows
Smart truncation (default)	`displayname_long`	Query text with smart truncation
Raw texts	`displayname_raw_long`	Full raw query text

Select the format using the Query texts variable at the top of the dashboard.

Top-N filtering

Many dashboards limit each panel to the top-N series. For example, the `top_n` variable on 02. Query analysis offers 5, 10, 15, 20, 50, 100, and 500. These panels use plain PromQL `topk($top_n, ...)`, which keeps only the highest-ranked series and drops the long tail; it does not sum the remainder into a separate bucket. The per-relation dashboards (08. Table stats, 10. Index health) use the same `topk($top_n, ...)` approach.

If the objects you care about are not visible, raise `top_n` or drill into the corresponding single-object dashboard to see the detail.
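To make the "no remainder bucket" point concrete, the following is a minimal Python sketch of what `topk` does at a single evaluation instant. The series names and values are made up. The real panels run PromQL in Prometheus, not this code.

```python
import heapq


def topk(k, series):
    """Keep the k series with the highest value, like PromQL topk.

    series maps a label (e.g. a queryid) to its value. Everything outside
    the top k is dropped; nothing is summed into an "other" bucket.
    """
    return dict(heapq.nlargest(k, series.items(), key=lambda item: item[1]))


# Hypothetical per-query values, e.g. total execution time in ms.
values_ms = {"q1": 120.0, "q2": 950.0, "q3": 40.0, "q4": 610.0, "q5": 5.0}
shown = topk(2, values_ms)
hidden = sorted(q for q in values_ms if q not in shown)
print(shown)   # {'q2': 950.0, 'q4': 610.0}
print(hidden)  # ['q1', 'q3', 'q5']
```

In a Grafana time-series panel, PromQL evaluates `topk` at every step of the range query. A series that ranks in the top k only part of the time therefore appears with gaps, and the panel can show more than k distinct series overall. The sketch models only one instant.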

Time range tips

As of version 0.15, dashboards default to a `now-1h` time range, chosen so that recent patterns are readable out of the box.

- Incident investigation: the default `now-1h` shows recent patterns; widen the range as needed.
- Trend analysis: use 24h to 7d for capacity planning.
- Comparison: use the "Compare to" feature for week-over-week analysis.

Next steps

01. Node overview — Start here for incident response
02. Query analysis — Top queries breakdown
