PostgresAI monitoring includes 14 pre-built Grafana dashboards designed for expert-level PostgreSQL troubleshooting.
## Dashboard categories

### Triage and overview

### Wait events and locks

### Storage and maintenance

### Replication and HA
| # | Dashboard | Purpose |
|---|-----------|---------|
| 06 | Replication | Replication lag and slot status |
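If you need to cross-check replication state directly on the primary while looking at dashboard 06, a minimal sketch using the standard system views (assuming PostgreSQL 10 or newer, where the LSN column names below apply) might look like this:

```sql
-- Run on the primary. Per-standby lag in bytes of WAL not yet replayed:
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Slot status: inactive slots keep retaining WAL and can fill the disk.
SELECT slot_name,
       slot_type,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;
```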
### Stack health
## Common variables
All dashboards share these filter variables:
| Variable | Purpose | Example |
|----------|---------|---------|
| cluster_name | Cluster identifier | production, staging |
| node_name | Node within cluster | primary, replica-1 |
| db_name | Database filter | myapp, All |
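As a rough illustration of how such a variable feeds into a panel (the actual panel queries and datasources in these dashboards may differ), a hypothetical PostgreSQL-datasource query could interpolate db_name like this:

```sql
-- Hypothetical panel query: Grafana substitutes the selected db_name value
-- before sending the query; selecting "All" typically expands to every value,
-- depending on how the variable is configured.
SELECT datname,
       numbackends,
       xact_commit,
       xact_rollback
FROM pg_stat_database
WHERE datname = '$db_name';
```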
## Recommended workflow

### Incident response
1. Start with 01. Node overview
   - Check wait event distribution (see the query sketch after this list)
   - Look for session count anomalies
   - Note TPS/QPS patterns
2. Identify the bottleneck
   - High CPU wait events: check queries (02)
   - High IO wait events: check disk activity and queries
   - High LWLock waits: check the specific lock type (13)
3. Drill down
   - Use 02. Query analysis to find problematic queries
   - Use 03. Single query for detailed metrics on a specific queryid
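For a quick out-of-band check of the same signals, a sketch against the system catalogs could look like the following; it assumes pg_stat_statements is installed, and PostgreSQL 14+ with compute_query_id enabled if you want to correlate queryid values across views:

```sql
-- Step 1: current wait event distribution and active session counts
-- (roughly what 01. Node overview visualizes).
SELECT wait_event_type,
       wait_event,
       count(*) AS sessions
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;

-- Step 3: drill into one query by its id (target_queryid is a psql variable,
-- pass the id taken from the dashboard, e.g. psql -v target_queryid=...).
SELECT queryid,
       calls,
       total_exec_time,
       mean_exec_time,
       rows,
       left(query, 80) AS query_sample
FROM pg_stat_statements
WHERE queryid = :target_queryid;
```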
### Routine monitoring
| Task | Dashboard | What to look for |
|------|-----------|-------------------|
| Query review | 02. Query analysis | New slow queries, regression |
| Index health | 10. Index health | Unused indexes, bloat |
| Table health | 08. Table stats | Bloat, sequential scans |
| Vacuum status | 07. Autovacuum | Dead tuple accumulation |
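Dashboards 07, 08, and 10 track these signals over time; as a rough spot-check of the same raw counters, assuming the standard cumulative statistics views, you could run:

```sql
-- Dead tuples, vacuum recency, and scan patterns per table (07 and 08).
SELECT relname,
       n_dead_tup,
       n_live_tup,
       last_autovacuum,
       seq_scan,
       idx_scan
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Indexes never scanned since the last stats reset (10. Index health).
-- Note: exclude indexes backing primary keys or unique constraints before
-- treating any of these as droppable.
SELECT schemaname,
       relname,
       indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```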
## Legend options
Most query-related dashboards support multiple legend formats:
| Format | Shows | Use case |
|--------|-------|----------|
| queryid | Numeric ID only | Compact view |
| displayname | Truncated query | Default |
| displayname_long | Full query with context | Debugging |
Select the format with the Query texts variable at the top of each dashboard.
## Time range tips
- Incident investigation: Start with 15m-1h to see recent patterns
- Trend analysis: Use 24h-7d for capacity planning
- Comparison: Use "Compare to" feature for week-over-week analysis
## Next steps