
Dashboard overview

PostgresAI monitoring includes 14 pre-built Grafana dashboards designed for expert-level PostgreSQL troubleshooting.

Dashboard categories

Triage and overview

| # | Dashboard | Purpose |
|---|---|---|
| 01 | Node overview | High-level node health, wait events, sessions |
| 02 | Query analysis | Top-N queries by various metrics |
| 03 | Single query | Deep-dive into a specific queryid |

Wait events and locks

| # | Dashboard | Purpose |
|---|---|---|
| 04 | Wait events | Active session history (ASH-style) |
| 13 | Lock contention | Lock waits and blocking chains |

Storage and maintenance

| # | Dashboard | Purpose |
|---|---|---|
| 05 | Backups | Backup status and WAL archiving |
| 07 | Autovacuum | Vacuum progress and bloat |
| 08 | Table stats | Aggregated table metrics |
| 09 | Single table | Deep-dive into a specific table |
| 10 | Index health | Index usage and bloat |
| 11 | Single index | Deep-dive into a specific index |
| 12 | SLRU | SLRU cache statistics |

Replication and HA

| # | Dashboard | Purpose |
|---|---|---|
| 06 | Replication | Replication lag and slot status |

Stack health

| # | Dashboard | Purpose |
|---|---|---|
| -- | Self-monitoring | Monitoring stack health |

Common variables

All dashboards share these filter variables:

| Variable | Purpose | Example |
|---|---|---|
| cluster_name | Cluster identifier | production, staging |
| node_name | Node within cluster | primary, replica-1 |
| db_name | Database filter | myapp, All |
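
Because these are ordinary Grafana template variables, they can usually be preset through `var-<name>` URL query parameters, which is handy for sharing a pre-filtered link during an incident. The sketch below is illustrative only: the base URL and dashboard path are hypothetical placeholders, and it assumes the variables are exposed under exactly the names listed above.

```python
from urllib.parse import urlencode


def dashboard_url(base_url, dashboard_path, cluster_name, node_name, db_name):
    """Build a Grafana dashboard link with the shared filter variables preset.

    base_url and dashboard_path are hypothetical; substitute the values
    from your own Grafana instance.
    """
    params = {
        # Grafana reads template variables from "var-<name>" query parameters.
        "var-cluster_name": cluster_name,
        "var-node_name": node_name,
        "var-db_name": db_name,
    }
    return f"{base_url.rstrip('/')}/{dashboard_path.lstrip('/')}?{urlencode(params)}"


url = dashboard_url(
    "https://grafana.example.com",        # hypothetical host
    "/d/node-overview/01-node-overview",  # hypothetical dashboard path
    cluster_name="production",
    node_name="primary",
    db_name="myapp",
)
print(url)
```

Copy the real dashboard path from the browser address bar rather than guessing it; only the `var-` prefix convention is standard Grafana behavior.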

Incident response

  1. Start with 01. Node overview

    • Check wait event distribution
    • Look for session count anomalies
    • Note TPS/QPS patterns
  2. Identify the bottleneck

    • High CPU wait events: check queries (02. Query analysis)
    • High IO wait events: check disk activity and queries
    • High LWLock: check the specific lock type (13. Lock contention)
  3. Drill down

    • Use 02. Query analysis to find problematic queries
    • Use 03. Single query for detailed metrics on specific queryid
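
The triage steps above can be sketched as a small lookup, for example in a runbook script. This is a minimal illustration rather than part of the product: the `NEXT_STEP` table and `suggest_dashboard` function are hypothetical names, the input is assumed to be a count of active sessions per wait event class, and the fallback to 04. Wait events for unlisted classes is an assumption.

```python
# Hypothetical mapping that mirrors step 2 of the incident workflow.
NEXT_STEP = {
    "CPU": "02. Query analysis",
    "IO": "02. Query analysis (also check disk activity)",
    "LWLock": "13. Lock contention",
}


def suggest_dashboard(wait_counts):
    """Return the dashboard to open next, given sessions per wait event class."""
    if not wait_counts:
        # Nothing sampled yet: stay on the starting dashboard.
        return "01. Node overview"
    # The class with the most active sessions is treated as the bottleneck.
    dominant = max(wait_counts, key=wait_counts.get)
    # Assumption: classes not covered by the workflow go to the wait events view.
    return NEXT_STEP.get(dominant, "04. Wait events")


print(suggest_dashboard({"CPU": 12, "IO": 3, "LWLock": 1}))
```

Picking the single largest class is a deliberate simplification; in practice, two classes of similar size are worth investigating together.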

Routine monitoring

| Task | Dashboard | What to look for |
|---|---|---|
| Query review | 02. Query analysis | New slow queries, regressions |
| Index health | 10. Index health | Unused indexes, bloat |
| Table health | 08. Table stats | Bloat, sequential scans |
| Vacuum status | 07. Autovacuum | Dead tuple accumulation |

Legend options

Most query-related dashboards support multiple legend formats:

| Format | Shows | Use case |
|---|---|---|
| queryid | Numeric ID only | Compact view |
| displayname | Truncated query | Default |
| displayname_long | Full query with context | Debugging |

Select the format with the Query texts variable at the top of each dashboard.

Time range tips

  • Incident investigation: Start with 15m-1h to see recent patterns
  • Trend analysis: Use 24h-7d for capacity planning
  • Comparison: Use the "Compare to" feature for week-over-week analysis
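
Grafana also accepts relative time ranges as `from` and `to` URL parameters, so the windows suggested above can be baked into a saved link. A minimal sketch, assuming the widest window from each tip; the `TIME_RANGES` table and the task names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical presets taken from the upper bound of each suggested window.
TIME_RANGES = {
    "incident": "now-1h",  # incident investigation: 15m-1h
    "trend": "now-7d",     # trend analysis: 24h-7d
}


def time_range_query(task):
    """Return the Grafana time range query string for a task type."""
    return urlencode({"from": TIME_RANGES[task], "to": "now"})


print(time_range_query("incident"))
print(time_range_query("trend"))
```

Append the resulting string to a dashboard URL after `?`, or after `&` if the URL already carries other parameters.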

Next steps