Skip to main content
View rawEdit

Architecture

Deep-dive into PostgresAI monitoring system components and data flow.

System overview​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgresAI Monitoring β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ PostgreSQL β”‚ β”‚ PostgreSQL β”‚ β”‚ PostgreSQL β”‚ β”‚
β”‚ β”‚ Cluster A β”‚ β”‚ Cluster B β”‚ β”‚ Cluster C β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ pgwatch β”‚ β”‚
β”‚ β”‚ (collector)β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ Prometheus format β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚VictoriaMetricsβ”‚ β”‚
β”‚ β”‚ (storage) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ PromQL β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Grafana β”‚ β”‚
β”‚ β”‚(visualization)β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components​

pgwatch β€” Metrics collector​

Purpose: Collect PostgreSQL metrics and expose in Prometheus format

Key functions:

  • Execute SQL queries against PostgreSQL
  • Transform results to Prometheus metrics
  • Expose /metrics endpoint

Configuration:

SettingDefaultDescription
Scrape interval15sCollection frequency
Statement timeout30sMax query duration
Max connections3Connections per database

Collected data sources:

ViewMetrics
pg_stat_statementsQuery performance
pg_stat_activitySession state, wait events
pg_stat_user_tablesTable access patterns
pg_stat_user_indexesIndex usage
pg_stat_databaseDatabase-level stats
pg_stat_bgwriterCheckpoint behavior

VictoriaMetrics β€” Time-series database​

Purpose: Store and query metrics

Key functions:

  • Ingest metrics from pgwatch
  • Compress and store time-series data
  • Execute PromQL queries

Performance characteristics:

AspectVictoriaMetricsPrometheus
Compression10x betterBaseline
Query speed2-5x fasterBaseline
Memory usage3-5x lowerBaseline
High availabilityBuilt-in clusteringFederation

Storage model:

Data directory structure:
/var/lib/victoriametrics/
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ small/ # Recent data (in-memory)
β”‚ └── big/ # Historical data (on-disk)
β”œβ”€β”€ indexdb/ # Label indexes
└── snapshots/ # Point-in-time backups

Grafana β€” Visualization​

Purpose: Dashboard and alerting UI

Key functions:

  • Render time-series charts
  • Dashboard templating with variables
  • Unified alerting

Dashboard structure:

PostgresAI dashboards:
β”œβ”€β”€ 01. Node overview (cluster-level)
β”œβ”€β”€ 02. Query analysis (top-N queries)
β”œβ”€β”€ 03. Single query (query deep-dive)
β”œβ”€β”€ 04. Wait events (session analysis)
β”œβ”€β”€ 05. Backups (WAL archiving)
β”œβ”€β”€ 06. Replication (lag monitoring)
β”œβ”€β”€ 07. Autovacuum (vacuum status)
β”œβ”€β”€ 08. Table stats (table analysis)
β”œβ”€β”€ 09. Single table (table deep-dive)
β”œβ”€β”€ 10. Index health (index analysis)
β”œβ”€β”€ 11. Single index (index deep-dive)
β”œβ”€β”€ 12. SLRU (cache stats)
β”œβ”€β”€ 13. Lock contention (lock waits)
└── Self-monitoring (stack health)

Data flow​

Collection flow​

1. pgwatch connects to PostgreSQL
└── Uses monitoring user credentials
└── Executes metric collection queries

2. Query results transformed to metrics
└── Column values β†’ metric values
└── Column names β†’ labels

3. Metrics exposed on /metrics endpoint
└── Prometheus exposition format
└── Timestamp attached

4. VictoriaMetrics scrapes pgwatch
└── HTTP GET /metrics
└── Configurable interval (default 15s)

5. Metrics stored in VictoriaMetrics
└── Compressed time-series storage
└── Indexed by labels

Query flow​

1. User opens Grafana dashboard
└── Dashboard loads panel queries

2. Grafana sends PromQL to VictoriaMetrics
└── Variables substituted
└── Time range applied

3. VictoriaMetrics executes query
└── Index lookup by labels
└── Data retrieval from storage
└── Aggregation/calculation

4. Results returned to Grafana
└── Time series data
└── Rendered as charts

Metric naming​

Convention​

{source}_{view}_{metric}_{unit}_{type}

Examples:
pg_stat_database_xact_commit_total # Counter
pg_stat_database_numbackends # Gauge
pg_stat_statements_total_exec_time_seconds # Counter

Labels​

{metric_name}{
cluster_name="production",
node_name="primary",
datname="myapp",
schemaname="public",
relname="users"
}

Storage requirements​

Calculation​

Storage = metrics_per_second Γ— bytes_per_sample Γ— retention_seconds

Typical values:
- metrics_per_second: 50-200 per database
- bytes_per_sample: 3-5 bytes (VictoriaMetrics compressed)
- retention: 1,209,600 seconds (14 days)

Example: 5 databases, 14-day retention
= 5 Γ— 100 Γ— 1.5 Γ— 1,209,600
= ~900 MiB

Scaling factors​

FactorImpact
More databasesLinear increase
More tables/indexesSublinear (only active tracked)
Longer retentionLinear increase
Shorter scrape intervalLinear increase

High availability​

HA architecture​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Load Balancer β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚ β”‚
β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
β”‚pgwatch-1β”‚ β”‚pgwatch-2β”‚ β”‚pgwatch-3β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ remote_write
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ VictoriaMetrics Cluster β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚vmstorage1β”‚ β”‚vmstorage2β”‚ β”‚vmstorage3β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
β”‚ Grafana β”‚
β”‚ (HA) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Failure modes​

Component failureImpactRecovery
Single pgwatchPartial data lossAutomatic failover
Single vmstorageNo data loss (replication)Automatic
All pgwatchCollection stopsManual restart
All vmstorageQuery unavailableRestore from backup
GrafanaUI unavailableLoad balancer failover

Data privacy β€” metadata only​

PostgresAI monitoring collects only database metadata β€” no actual data or query parameters are ever accessed.

Collected data types​

Data typeExampleStorage location
Database statisticsConnections, transactions, cache hit ratioPrometheus (VictoriaMetrics)
Normalized queriesselect * from users where id = $1PostgreSQL sink
Wait eventsCPU, IO, Lock, LWLockPrometheus (VictoriaMetrics)
Table statisticsRow count, dead tuples, last vacuumPrometheus (VictoriaMetrics)
Index statisticsSize, scans, tuples readPrometheus (VictoriaMetrics)
Column statisticsFrom pg_statistic for bloat estimatesPrometheus (VictoriaMetrics)

NOT collected​

  • Actual table data (row contents)
  • Query parameter values ($1, $2 remain as placeholders)
  • Application secrets or credentials
  • Connection passwords

Metric definitions​

Review exactly what is collected:

Verify monitoring database role and its permissions​

# See exact SQL for creating monitoring role
npx postgresai@latest prepare-db --print-sql

Security architecture​

Network​

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DMZ / Public β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Grafana β”‚ ← HTTPS (443) β”‚
β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Internal Network β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ VictoriaMetrics β”‚ β”‚
β”‚ β”‚ (port 8428 internal) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ pgwatch β”‚ β”‚
β”‚ β”‚ (port 8080 internal) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Database Network β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚PostgreSQLβ”‚ β”‚PostgreSQLβ”‚ β”‚PostgreSQLβ”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Credentials​

ComponentCredential typeStorage
PostgreSQLPasswordEnvironment variable / secret
VictoriaMetricsBasic authConfig file
GrafanaOAuth/LDAPDatabase

Performance characteristics​

Collection overhead​

Metric typeQuery costFrequency
pg_stat_databaseLow15s
pg_stat_statementsMedium15s
pg_stat_user_tablesMedium60s
Bloat estimationHigh300s

Query performance​

Query typeTypical latency
Instant query10-100ms
Range query (1h)50-200ms
Range query (24h)200-500ms
Range query (7d)500ms-2s

Resource usage​

ComponentCPUMemoryDisk I/O
pgwatchLow256 MiBMinimal
VictoriaMetricsMedium2 GiB+Medium
GrafanaLow512 MiBLow