Skip to main content
View rawEdit

Architecture

Deep-dive into PostgresAI monitoring system components and data flow.

System overview​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgresAI Monitoring β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ PostgreSQL β”‚ β”‚ PostgreSQL β”‚ β”‚ PostgreSQL β”‚ β”‚
β”‚ β”‚ Cluster A β”‚ β”‚ Cluster B β”‚ β”‚ Cluster C β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ pgwatch β”‚ β”‚
β”‚ β”‚ (collector)β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ Prometheus format β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚VictoriaMetricsβ”‚ β”‚
β”‚ β”‚ (storage) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ PromQL β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Grafana β”‚ β”‚
β”‚ β”‚(visualization)β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components​

pgwatch β€” Metrics collector​

Purpose: Collect PostgreSQL metrics and expose in Prometheus format

Key functions:

  • Execute SQL queries against PostgreSQL
  • Transform results to Prometheus metrics
  • Expose the /pgwatch metrics endpoint on :9091 (Prometheus sink)

Configuration:

SettingDefaultDescription
Scrape interval30sVictoriaMetrics scrapes the pgwatch-prometheus job every 30s (the 15s global default is overridden for this job; the separate query-info job β€” metrics_path: /query_info_metrics β€” runs every 300s)
Collection intervalper-metricEach metric group has its own interval in metrics.yml (most 30s; pg_stat_activity/wait_events 15s)

Collected data sources:

ViewMetrics
pg_stat_statementsQuery performance
pg_stat_activitySession state
wait_events (from pg_stat_activity)Wait event sampling
pg_stat_all_tables / table_statsTable access patterns
pg_stat_all_indexesIndex usage
db_stats (from pg_stat_database)Database-level stats
bgwriterCheckpoint behavior

VictoriaMetrics β€” time-series database​

Purpose: Store and query metrics

Key functions:

  • Ingest metrics from pgwatch
  • Compress and store time-series data
  • Execute PromQL queries

Performance characteristics:

AspectVictoriaMetricsPrometheus
Compression10x betterBaseline
Query speed2-5x fasterBaseline
Memory usage3-5x lowerBaseline
High availabilityBuilt-in clusteringFederation

Storage model:

In this deployment VictoriaMetrics is started with -storageDataPath=/victoria-metrics-data, and the victoria_metrics_data Docker volume is mounted at /victoria-metrics-data. The on-disk layout under that path follows VictoriaMetrics' standard structure (recent vs. historical data parts, a label index, and optional snapshots).

Grafana β€” Visualization​

Purpose: Dashboard and alerting UI

Key functions:

  • Render time-series charts
  • Dashboard templating with variables
  • Unified alerting

Dashboard structure:

PostgresAI dashboards:
β”œβ”€β”€ 01. Node overview (cluster-level)
β”œβ”€β”€ 02. Query analysis (top-N queries)
β”œβ”€β”€ 03. Single query (query deep-dive)
β”œβ”€β”€ 04. Wait events (session analysis)
β”œβ”€β”€ 05. Backups (WAL archiving)
β”œβ”€β”€ 06. Replication (lag monitoring)
β”œβ”€β”€ 07. Autovacuum (vacuum status)
β”œβ”€β”€ 08. Table stats (table analysis)
β”œβ”€β”€ 09. Single table (table deep-dive)
β”œβ”€β”€ 10. Index health (index analysis)
β”œβ”€β”€ 11. Single index (index deep-dive)
β”œβ”€β”€ 12. SLRU (cache stats)
β”œβ”€β”€ 13. Lock contention (lock waits)
β”œβ”€β”€ 14. I/O statistics (pg_stat_io, PG16+)
└── Self-monitoring (stack health)

Data flow​

Collection flow​

1. pgwatch connects to PostgreSQL
└── Uses monitoring user credentials
└── Executes metric collection queries

2. Query results transformed to metrics
└── Column values β†’ metric values
└── Column names β†’ labels

3. Metrics exposed on the `/pgwatch` endpoint (`:9091`)
└── Prometheus exposition format
└── Timestamp attached

4. VictoriaMetrics scrapes the pgwatch-prometheus sink
└── HTTP GET pgwatch-prometheus:9091/pgwatch
└── `pgwatch-prometheus` job scrape_interval: 30s (scrape_timeout 25s)

5. Metrics stored in VictoriaMetrics
└── Compressed time-series storage
└── Indexed by labels

Query flow​

1. User opens Grafana dashboard
└── Dashboard loads panel queries

2. Grafana sends PromQL to VictoriaMetrics
└── Variables substituted
└── Time range applied

3. VictoriaMetrics executes query
└── Index lookup by labels
└── Data retrieval from storage
└── Aggregation/calculation

4. Results returned to Grafana
└── Time series data
└── Rendered as charts

Metric naming​

Convention​

pgwatch exports series as pgwatch_<metric-group>_<column>. The Prometheus metric type is driven by each metric group's gauges: list in config/pgwatch-prometheus/metrics.yml: a column is emitted as a Prometheus gauge only if its group lists it (or uses gauges: ['*']); otherwise it is emitted as a counter. Note this is the exported type, not the PostgreSQL semantics β€” the db_stats and pg_stat_statements groups use gauges: ['*'] / explicit gauge lists, so their cumulative columns (e.g. xact_commit, exec_time_total) are exported as gauges even though they only ever increase. Cumulative columns in the pg_stat_database family are also not _total-suffixed.

pgwatch_<metric-group>_<column>

Examples (Type = the exporter's emitted Prometheus type):
pgwatch_db_stats_xact_commit # Gauge (transactions committed; db_stats uses gauges: ['*'])
pgwatch_db_stats_numbackends # Gauge (current backends)
pgwatch_pg_stat_statements_exec_time_total # Gauge (total exec time, ms; listed in pg_stat_statements gauges)

Labels​

The cluster label is cluster (set from custom_tags.cluster). cluster_name is only the Grafana template variable; dashboard filters select with cluster="$cluster_name".

pgwatch_<metric-group>_<column>{
cluster="production",
node_name="primary",
datname="myapp",
schemaname="public",
relname="users"
}

Storage requirements​

Calculation​

Storage = metrics_per_second Γ— bytes_per_sample Γ— retention_seconds

Typical values:
- metrics_per_second: 50-200 per database
- bytes_per_sample: 3-5 bytes (VictoriaMetrics compressed)
- retention: 1,209,600 seconds (14 days)

Example: 5 databases, 14-day retention
= 5 Γ— 100 Γ— 1.5 Γ— 1,209,600
= 907,200,000 bytes β‰ˆ 907 MB (β‰ˆ 865 MiB)

Scaling factors​

FactorImpact
More databasesLinear increase
More tables/indexesSublinear (only active tracked)
Longer retentionLinear increase
Shorter scrape intervalLinear increase

High availability​

HA architecture​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Load Balancer β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚ β”‚
β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
β”‚pgwatch-1β”‚ β”‚pgwatch-2β”‚ β”‚pgwatch-3β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ remote_write
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ VictoriaMetrics Cluster β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚vmstorage1β”‚ β”‚vmstorage2β”‚ β”‚vmstorage3β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
β”‚ Grafana β”‚
β”‚ (HA) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Failure modes​

Component failureImpactRecovery
Single pgwatchPartial data lossAutomatic failover
Single vmstorageNo data loss (replication)Automatic
All pgwatchCollection stopsManual restart
All vmstorageQuery unavailableRestore from backup
GrafanaUI unavailableLoad balancer failover

Data privacy β€” metadata only​

PostgresAI monitoring collects only database metadata β€” no actual data or query parameters are ever accessed.

Collected data types​

Data typeExampleStorage location
Database statisticsConnections, transactions, cache hit ratioPrometheus (VictoriaMetrics)
Normalized queriesselect * from users where id = $1PostgreSQL sink
Wait eventsCPU, IO, Lock, LWLockPrometheus (VictoriaMetrics)
Table statisticsRow count, dead tuples, last vacuumPrometheus (VictoriaMetrics)
Index statisticsSize, scans, tuples readPrometheus (VictoriaMetrics)
Column statisticsFrom pg_statistic for bloat estimatesPrometheus (VictoriaMetrics)

NOT collected​

  • Actual table data (row contents)
  • Query parameter values ($1, $2 remain as placeholders)
  • Application secrets or credentials
  • Connection passwords

Metric definitions​

Review exactly what is collected:

Verify monitoring database role and its permissions​

# See exact SQL for creating monitoring role
npx [email protected] prepare-db --print-sql

Security architecture​

Network​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DMZ / Public β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Grafana β”‚ ← HTTPS (443) β”‚
β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Internal Network β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ VictoriaMetrics (sink-prometheus) β”‚ β”‚
β”‚ β”‚ (port 9090 internal, host 59090) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ scrapes pgwatch-prometheus:9091/pgwatch β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ pgwatch (pgwatch-postgres, pgwatch-prometheus)β”‚ β”‚
β”‚ β”‚ metrics scraped on pgwatch-prometheus:9091 β”‚ β”‚
β”‚ β”‚ (web/health ports 8080/8089 are internal) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Database Network β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚PostgreSQLβ”‚ β”‚PostgreSQLβ”‚ β”‚PostgreSQLβ”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Credentials​

ComponentCredential typeStorage
PostgreSQLPasswordEnvironment variable / secret
VictoriaMetricsBasic authConfig file
GrafanaOAuth/LDAPDatabase

Performance characteristics​

Collection overhead​

Frequencies below are the full preset intervals from config/pgwatch-prometheus/metrics.yml (see the per-metric note above):

Metric typeQuery costFrequency
pg_stat_database (db_stats)Low30s
pg_stat_statementsMedium30s
pg_stat_all_tables / table_statsMedium30s
Bloat estimation (pg_table_bloat, pg_btree_bloat)High7200s (2h)

Query performance​

Query typeTypical latency
Instant query10-100ms
Range query (1h)50-200ms
Range query (24h)200-500ms
Range query (7d)500ms-2s

Resource usage​

ComponentCPUMemoryDisk I/O
pgwatchLow256 MiBMinimal
VictoriaMetricsMedium2 GiB+Medium
GrafanaLow512 MiBLow