Architecture
Deep-dive into PostgresAI monitoring system components and data flow.
System overviewβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PostgresAI Monitoring β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββ βββββββββββββββ βββββββββββββββ β
β β PostgreSQL β β PostgreSQL β β PostgreSQL β β
β β Cluster A β β Cluster B β β Cluster C β β
β ββββββββ¬βββββββ ββββββββ¬βββββββ ββββββββ¬βββββββ β
β β β β β
β ββββββββββββββββββββΌβββββββββββββββββββ β
β β β
β ββββββββΌβββββββ β
β β pgwatch β β
β β (collector)β β
β ββββββββ¬βββββββ β
β β Prometheus format β
β ββββββββΌβββββββββ β
β βVictoriaMetricsβ β
β β (storage) β β
β ββββββββ¬βββββββββ β
β β PromQL β
β ββββββββΌβββββββββ β
β β Grafana β β
β β(visualization)β β
β βββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Componentsβ
pgwatch β Metrics collectorβ
Purpose: Collect PostgreSQL metrics and expose in Prometheus format
Key functions:
- Execute SQL queries against PostgreSQL
- Transform results to Prometheus metrics
- Expose
/metricsendpoint
Configuration:
| Setting | Default | Description |
|---|---|---|
| Scrape interval | 15s | Collection frequency |
| Statement timeout | 30s | Max query duration |
| Max connections | 3 | Connections per database |
Collected data sources:
| View | Metrics |
|---|---|
| pg_stat_statements | Query performance |
| pg_stat_activity | Session state, wait events |
| pg_stat_user_tables | Table access patterns |
| pg_stat_user_indexes | Index usage |
| pg_stat_database | Database-level stats |
| pg_stat_bgwriter | Checkpoint behavior |
VictoriaMetrics β Time-series databaseβ
Purpose: Store and query metrics
Key functions:
- Ingest metrics from pgwatch
- Compress and store time-series data
- Execute PromQL queries
Performance characteristics:
| Aspect | VictoriaMetrics | Prometheus |
|---|---|---|
| Compression | 10x better | Baseline |
| Query speed | 2-5x faster | Baseline |
| Memory usage | 3-5x lower | Baseline |
| High availability | Built-in clustering | Federation |
Storage model:
Data directory structure:
/var/lib/victoriametrics/
βββ data/
β βββ small/ # Recent data (in-memory)
β βββ big/ # Historical data (on-disk)
βββ indexdb/ # Label indexes
βββ snapshots/ # Point-in-time backups
Grafana β Visualizationβ
Purpose: Dashboard and alerting UI
Key functions:
- Render time-series charts
- Dashboard templating with variables
- Unified alerting
Dashboard structure:
PostgresAI dashboards:
βββ 01. Node overview (cluster-level)
βββ 02. Query analysis (top-N queries)
βββ 03. Single query (query deep-dive)
βββ 04. Wait events (session analysis)
βββ 05. Backups (WAL archiving)
βββ 06. Replication (lag monitoring)
βββ 07. Autovacuum (vacuum status)
βββ 08. Table stats (table analysis)
βββ 09. Single table (table deep-dive)
βββ 10. Index health (index analysis)
βββ 11. Single index (index deep-dive)
βββ 12. SLRU (cache stats)
βββ 13. Lock contention (lock waits)
βββ Self-monitoring (stack health)
Data flowβ
Collection flowβ
1. pgwatch connects to PostgreSQL
βββ Uses monitoring user credentials
βββ Executes metric collection queries
2. Query results transformed to metrics
βββ Column values β metric values
βββ Column names β labels
3. Metrics exposed on /metrics endpoint
βββ Prometheus exposition format
βββ Timestamp attached
4. VictoriaMetrics scrapes pgwatch
βββ HTTP GET /metrics
βββ Configurable interval (default 15s)
5. Metrics stored in VictoriaMetrics
βββ Compressed time-series storage
βββ Indexed by labels
Query flowβ
1. User opens Grafana dashboard
βββ Dashboard loads panel queries
2. Grafana sends PromQL to VictoriaMetrics
βββ Variables substituted
βββ Time range applied
3. VictoriaMetrics executes query
βββ Index lookup by labels
βββ Data retrieval from storage
βββ Aggregation/calculation
4. Results returned to Grafana
βββ Time series data
βββ Rendered as charts
Metric namingβ
Conventionβ
{source}_{view}_{metric}_{unit}_{type}
Examples:
pg_stat_database_xact_commit_total # Counter
pg_stat_database_numbackends # Gauge
pg_stat_statements_total_exec_time_seconds # Counter
Labelsβ
{metric_name}{
cluster_name="production",
node_name="primary",
datname="myapp",
schemaname="public",
relname="users"
}
Storage requirementsβ
Calculationβ
Storage = metrics_per_second Γ bytes_per_sample Γ retention_seconds
Typical values:
- metrics_per_second: 50-200 per database
- bytes_per_sample: 3-5 bytes (VictoriaMetrics compressed)
- retention: 1,209,600 seconds (14 days)
Example: 5 databases, 14-day retention
= 5 Γ 100 Γ 1.5 Γ 1,209,600
= ~900 MiB
Scaling factorsβ
| Factor | Impact |
|---|---|
| More databases | Linear increase |
| More tables/indexes | Sublinear (only active tracked) |
| Longer retention | Linear increase |
| Shorter scrape interval | Linear increase |
High availabilityβ
HA architectureβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Load Balancer β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
ββββββΌβββββ ββββββΌβββββ ββββββΌβββββ
βpgwatch-1β βpgwatch-2β βpgwatch-3β
ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ
β β β
βββββββββββββββββΌββββββββββββββββ
β remote_write
β
ββββββββββββββββββββββΌβββββββββββββββββββββ
β VictoriaMetrics Cluster β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βvmstorage1β βvmstorage2β βvmstorage3β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
ββββββββββββββββββββββ¬βββββββββββββββββββββ
β
ββββββΌβββββ
β Grafana β
β (HA) β
βββββββββββ
Failure modesβ
| Component failure | Impact | Recovery |
|---|---|---|
| Single pgwatch | Partial data loss | Automatic failover |
| Single vmstorage | No data loss (replication) | Automatic |
| All pgwatch | Collection stops | Manual restart |
| All vmstorage | Query unavailable | Restore from backup |
| Grafana | UI unavailable | Load balancer failover |
Data privacy β metadata onlyβ
PostgresAI monitoring collects only database metadata β no actual data or query parameters are ever accessed.
Collected data typesβ
| Data type | Example | Storage location |
|---|---|---|
| Database statistics | Connections, transactions, cache hit ratio | Prometheus (VictoriaMetrics) |
| Normalized queries | select * from users where id = $1 | PostgreSQL sink |
| Wait events | CPU, IO, Lock, LWLock | Prometheus (VictoriaMetrics) |
| Table statistics | Row count, dead tuples, last vacuum | Prometheus (VictoriaMetrics) |
| Index statistics | Size, scans, tuples read | Prometheus (VictoriaMetrics) |
| Column statistics | From pg_statistic for bloat estimates | Prometheus (VictoriaMetrics) |
NOT collectedβ
- Actual table data (row contents)
- Query parameter values (
$1,$2remain as placeholders) - Application secrets or credentials
- Connection passwords
Metric definitionsβ
Review exactly what is collected:
- Prometheus metrics: pgwatch-prometheus/metrics.yml
- PostgreSQL metrics (with query texts): pgwatch-postgres/metrics.yml
Verify monitoring database role and its permissionsβ
# See exact SQL for creating monitoring role
npx postgresai@latest prepare-db --print-sql
Security architectureβ
Networkβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DMZ / Public β
β β
β βββββββββββ β
β β Grafana β β HTTPS (443) β
β ββββββ¬βββββ β
βββββββββββββββββββββββββββΌββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββΌββββββββββββββββββββββββββββ
β Internal Network β
β β β
β ββββββββββββββββββββββββΌββββββββββββββββββββββββββ β
β β VictoriaMetrics β β
β β (port 8428 internal) β β
β ββββββββββββββββββββββββ¬ββββββββββββββββββββββββββ β
β β β
β ββββββββββββββββββββββββΌββββββββββββββββββββββββββ β
β β pgwatch β β
β β (port 8080 internal) β β
β ββββββββββββββββββββββββ¬ββββββββββββββββββββββββββ β
β β β
βββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββΌββββββββββββββββ ββββββββββββ
β Database Network β
β β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βPostgreSQLβ βPostgreSQLβ βPostgreSQLβ β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Credentialsβ
| Component | Credential type | Storage |
|---|---|---|
| PostgreSQL | Password | Environment variable / secret |
| VictoriaMetrics | Basic auth | Config file |
| Grafana | OAuth/LDAP | Database |
Performance characteristicsβ
Collection overheadβ
| Metric type | Query cost | Frequency |
|---|---|---|
| pg_stat_database | Low | 15s |
| pg_stat_statements | Medium | 15s |
| pg_stat_user_tables | Medium | 60s |
| Bloat estimation | High | 300s |
Query performanceβ
| Query type | Typical latency |
|---|---|
| Instant query | 10-100ms |
| Range query (1h) | 50-200ms |
| Range query (24h) | 200-500ms |
| Range query (7d) | 500ms-2s |
Resource usageβ
| Component | CPU | Memory | Disk I/O |
|---|---|---|---|
| pgwatch | Low | 256 MiB | Minimal |
| VictoriaMetrics | Medium | 2 GiB+ | Medium |
| Grafana | Low | 512 MiB | Low |