Skip to main content
View rawEdit

Architecture

Deep-dive into PostgresAI monitoring system components and data flow.

System overview

┌─────────────────────────────────────────────────────────────────────┐
│ PostgresAI Monitoring │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PostgreSQL │ │ PostgreSQL │ │ PostgreSQL │ │
│ │ Cluster A │ │ Cluster B │ │ Cluster C │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ pgwatch │ │
│ │ (collector)│ │
│ └──────┬──────┘ │
│ │ Prometheus format │
│ ┌──────▼────────┐ │
│ │VictoriaMetrics│ │
│ │ (storage) │ │
│ └──────┬────────┘ │
│ │ PromQL │
│ ┌──────▼────────┐ │
│ │ Grafana │ │
│ │(visualization)│ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Components

pgwatch — Metrics collector

Purpose: Collect PostgreSQL metrics and expose in Prometheus format

Key functions:

  • Execute SQL queries against PostgreSQL
  • Transform results to Prometheus metrics
  • Expose /metrics endpoint

Configuration:

SettingDefaultDescription
Scrape interval15sCollection frequency
Statement timeout30sMax query duration
Max connections3Connections per database

Collected data sources:

ViewMetrics
pg_stat_statementsQuery performance
pg_stat_activitySession state, wait events
pg_stat_user_tablesTable access patterns
pg_stat_user_indexesIndex usage
pg_stat_databaseDatabase-level stats
pg_stat_bgwriterCheckpoint behavior

VictoriaMetrics — Time-series database

Purpose: Store and query metrics

Key functions:

  • Ingest metrics from pgwatch
  • Compress and store time-series data
  • Execute PromQL queries

Performance characteristics:

AspectVictoriaMetricsPrometheus
Compression10x betterBaseline
Query speed2-5x fasterBaseline
Memory usage3-5x lowerBaseline
High availabilityBuilt-in clusteringFederation

Storage model:

Data directory structure:
/var/lib/victoriametrics/
├── data/
│ ├── small/ # Recent data (in-memory)
│ └── big/ # Historical data (on-disk)
├── indexdb/ # Label indexes
└── snapshots/ # Point-in-time backups

Grafana — Visualization

Purpose: Dashboard and alerting UI

Key functions:

  • Render time-series charts
  • Dashboard templating with variables
  • Unified alerting

Dashboard structure:

PostgresAI dashboards:
├── 01. Node overview (cluster-level)
├── 02. Query analysis (top-N queries)
├── 03. Single query (query deep-dive)
├── 04. Wait events (session analysis)
├── 05. Backups (WAL archiving)
├── 06. Replication (lag monitoring)
├── 07. Autovacuum (vacuum status)
├── 08. Table stats (table analysis)
├── 09. Single table (table deep-dive)
├── 10. Index health (index analysis)
├── 11. Single index (index deep-dive)
├── 12. SLRU (cache stats)
├── 13. Lock contention (lock waits)
└── Self-monitoring (stack health)

Data flow

Collection flow

1. pgwatch connects to PostgreSQL
└── Uses monitoring user credentials
└── Executes metric collection queries

2. Query results transformed to metrics
└── Column values → metric values
└── Column names → labels

3. Metrics exposed on /metrics endpoint
└── Prometheus exposition format
└── Timestamp attached

4. VictoriaMetrics scrapes pgwatch
└── HTTP GET /metrics
└── Configurable interval (default 15s)

5. Metrics stored in VictoriaMetrics
└── Compressed time-series storage
└── Indexed by labels

Query flow

1. User opens Grafana dashboard
└── Dashboard loads panel queries

2. Grafana sends PromQL to VictoriaMetrics
└── Variables substituted
└── Time range applied

3. VictoriaMetrics executes query
└── Index lookup by labels
└── Data retrieval from storage
└── Aggregation/calculation

4. Results returned to Grafana
└── Time series data
└── Rendered as charts

Metric naming

Convention

{source}_{view}_{metric}_{unit}_{type}

Examples:
pg_stat_database_xact_commit_total # Counter
pg_stat_database_numbackends # Gauge
pg_stat_statements_total_exec_time_seconds # Counter

Labels

{metric_name}{
cluster_name="production",
node_name="primary",
datname="myapp",
schemaname="public",
relname="users"
}

Storage requirements

Calculation

Storage = metrics_per_second × bytes_per_sample × retention_seconds

Typical values:
- metrics_per_second: 50-200 per database
- bytes_per_sample: 3-5 bytes (VictoriaMetrics compressed)
- retention: 1,209,600 seconds (14 days)

Example: 5 databases, 14-day retention
= 5 × 100 × 1.5 × 1,209,600
= ~900 MiB

Scaling factors

FactorImpact
More databasesLinear increase
More tables/indexesSublinear (only active tracked)
Longer retentionLinear increase
Shorter scrape intervalLinear increase

High availability

HA architecture

┌─────────────────────────────────────────────────────┐
│ Load Balancer │
└────────────────────────┬────────────────────────────┘

┌───────────────┼───────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│pgwatch-1│ │pgwatch-2│ │pgwatch-3│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└───────────────┼───────────────┘
│ remote_write

┌────────────────────▼────────────────────┐
│ VictoriaMetrics Cluster │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │vmstorage1│ │vmstorage2│ │vmstorage3│ │
│ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────┬────────────────────┘

┌────▼────┐
│ Grafana │
│ (HA) │
└─────────┘

Failure modes

Component failureImpactRecovery
Single pgwatchPartial data lossAutomatic failover
Single vmstorageNo data loss (replication)Automatic
All pgwatchCollection stopsManual restart
All vmstorageQuery unavailableRestore from backup
GrafanaUI unavailableLoad balancer failover

Data privacy — metadata only

PostgresAI monitoring collects only database metadata — no actual data or query parameters are ever accessed.

Collected data types

Data typeExampleStorage location
Database statisticsConnections, transactions, cache hit ratioPrometheus (VictoriaMetrics)
Normalized queriesselect * from users where id = $1PostgreSQL sink
Wait eventsCPU, IO, Lock, LWLockPrometheus (VictoriaMetrics)
Table statisticsRow count, dead tuples, last vacuumPrometheus (VictoriaMetrics)
Index statisticsSize, scans, tuples readPrometheus (VictoriaMetrics)
Column statisticsFrom pg_statistic for bloat estimatesPrometheus (VictoriaMetrics)

NOT collected

  • Actual table data (row contents)
  • Query parameter values ($1, $2 remain as placeholders)
  • Application secrets or credentials
  • Connection passwords

Metric definitions

Review exactly what is collected:

Verify monitoring database role and its permissions

# See exact SQL for creating monitoring role
npx postgresai@latest prepare-db --print-sql

Security architecture

Network

┌─────────────────────────────────────────────────────┐
│ DMZ / Public │
│ │
│ ┌─────────┐ │
│ │ Grafana │ ← HTTPS (443) │
│ └────┬────┘ │
└─────────────────────────┼───────────────────────────┘

┌─────────────────────────┼───────────────────────────┐
│ Internal Network │
│ │ │
│ ┌──────────────────────▼─────────────────────────┐ │
│ │ VictoriaMetrics │ │
│ │ (port 8428 internal) │ │
│ └──────────────────────┬─────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼─────────────────────────┐ │
│ │ pgwatch │ │
│ │ (port 8080 internal) │ │
│ └──────────────────────┬─────────────────────────┘ │
│ │ │
└─────────────────────────┬───────────────────────────┘

┌─────────────────────────┼───────────────────────────┐
│ Database Network │
│ │ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │PostgreSQL│ │PostgreSQL│ │PostgreSQL│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────┘

Credentials

ComponentCredential typeStorage
PostgreSQLPasswordEnvironment variable / secret
VictoriaMetricsBasic authConfig file
GrafanaOAuth/LDAPDatabase

Performance characteristics

Collection overhead

Metric typeQuery costFrequency
pg_stat_databaseLow15s
pg_stat_statementsMedium15s
pg_stat_user_tablesMedium60s
Bloat estimationHigh300s

Query performance

Query typeTypical latency
Instant query10-100ms
Range query (1h)50-200ms
Range query (24h)200-500ms
Range query (7d)500ms-2s

Resource usage

ComponentCPUMemoryDisk I/O
pgwatchLow256 MiBMinimal
VictoriaMetricsMedium2 GiB+Medium
GrafanaLow512 MiBLow