Multi-cluster monitoring
Centralized monitoring for multiple PostgreSQL clusters from a single Grafana instance.
Architecture options
Option 1: Single pgwatch, multiple targets
Best for: 5-20 clusters in the same network
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Cluster A  │     │  Cluster B  │     │  Cluster C  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                    ┌──────▼──────┐
                    │   pgwatch   │
                    └──────┬──────┘
                           │
                    ┌──────▼────────┐
                    │VictoriaMetrics│
                    └──────┬────────┘
                           │
                    ┌──────▼──────┐
                    │   Grafana   │
                    └─────────────┘
Option 2: Distributed pgwatch, central storage
Best for: Clusters in different networks/regions
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Cluster A  │     │  Cluster B  │     │  Cluster C  │
│  + pgwatch  │     │  + pgwatch  │     │  + pgwatch  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ remote_write
                    ┌──────▼────────┐
                    │VictoriaMetrics│
                    │   (central)   │
                    └──────┬────────┘
                           │
                    ┌──────▼──────┐
                    │   Grafana   │
                    └─────────────┘
Configuration
Adding multiple clusters
docker-compose.yml approach:
services:
  pgwatch:
    environment:
      # Use environment variable substitution for credentials
      PW_TARGETS: |
        postgresql://${PGWATCH_USER}:${PGWATCH_PASSWORD}@cluster-a:5432/postgres?cluster_name=cluster-a
        postgresql://${PGWATCH_USER}:${PGWATCH_PASSWORD}@cluster-b:5432/postgres?cluster_name=cluster-b
        postgresql://${PGWATCH_USER}:${PGWATCH_PASSWORD}@cluster-c:5432/postgres?cluster_name=cluster-c
Define PGWATCH_USER and PGWATCH_PASSWORD in your .env file or use Docker secrets for production deployments.
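Passwords that contain characters such as `@`, `:` or `/` break a connection URI unless they are percent-encoded. The following is a minimal sketch, assuming the target format shown above; the helper names `build_target` and `build_targets` are hypothetical and not part of pgwatch.

```python
from urllib.parse import quote


def build_target(user, password, host, cluster_name, port=5432, dbname="postgres"):
    """Return one connection string in the PW_TARGETS format shown above."""
    # Percent-encode credentials so '@', ':' or '/' in a password cannot be
    # mistaken for URI delimiters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    safe_name = quote(cluster_name, safe="")
    return (
        f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{dbname}"
        f"?cluster_name={safe_name}"
    )


def build_targets(user, password, hosts):
    """Join one target per host; here the host name doubles as cluster_name."""
    return "\n".join(build_target(user, password, h, h) for h in hosts)


if __name__ == "__main__":
    print(build_targets("pgwatch", "p@ss:w/rd", ["cluster-a", "cluster-b"]))
```

The output can be pasted into `PW_TARGETS` or written to a file that is kept out of version control.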
CLI approach:
# Add clusters one at a time
postgresai mon add-target \
  --cluster-name "production-us" \
  postgresql://user@prod-us:5432/postgres
postgresai mon add-target \
  --cluster-name "production-eu" \
  postgresql://user@prod-eu:5432/postgres
Cluster naming conventions
Use consistent, descriptive names:
| Pattern | Example | Use case |
|---|---|---|
| env-region | production-us-east | Multi-region |
| app-env | orders-prod | Per-application |
| team-purpose | platform-analytics | Per-team |
# Good
--cluster-name="production-us-east-1"
# Avoid - too generic
--cluster-name="db1"
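A naming convention is easier to keep when it is checked automatically. The sketch below is one possible check, not an official rule: it assumes names consist of two to five lowercase alphanumeric segments joined by hyphens, and it flags a few generic names such as `db1`. Both regular expressions are assumptions to adapt to your own convention.

```python
import re

# Assumed convention: 2 to 5 lowercase alphanumeric segments joined by hyphens.
NAME_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+){1,4}$")
# Assumed list of generic names to reject, optionally followed by digits.
GENERIC_RE = re.compile(r"^(db|pg|postgres|server|cluster)-?\d*$")


def check_cluster_name(name):
    """Return a list of problems found in a cluster name (empty if none)."""
    problems = []
    if GENERIC_RE.match(name):
        problems.append("too generic")
    if not NAME_RE.match(name):
        problems.append("expected hyphen-separated lowercase segments")
    return problems


if __name__ == "__main__":
    for candidate in ("production-us-east-1", "orders-prod", "db1"):
        print(candidate, check_cluster_name(candidate) or "ok")
```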
Distributed collection
Remote write configuration
Each pgwatch instance writes to central VictoriaMetrics:
# pgwatch config at each site
remote_write:
  url: https://central-vm.example.com/api/v1/write
  basic_auth:
    username: pgwatch
    password: ${REMOTE_WRITE_PASSWORD} # Use environment variable
  tls_config:
    insecure_skip_verify: false
Never commit plaintext passwords. Use environment variables or a secrets manager.
Authentication
Use unique credentials per pgwatch instance:
# Central VictoriaMetrics
basic_auth_users:
  - username: pgwatch-us-east
    password: <BCRYPT_HASH> # Generate with: htpasswd -nbB pgwatch-us-east <password>
  - username: pgwatch-eu-west
    password: <BCRYPT_HASH>
Network considerations
| Requirement | Configuration |
|---|---|
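| Firewall | Allow outbound traffic from pgwatch to the central VictoriaMetrics endpoint (port 8428 by default; 443 when it is served over HTTPS as in the URL above) |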
| TLS | Use HTTPS for remote write |
| Compression | Enable gzip (remote_write.compress: true) |
| Buffering | Configure local queue for network failures |
Label strategy
Required labels
Every metric should include:
| Label | Purpose | Example |
|---|---|---|
| cluster_name | Primary identifier | production-us |
| node_name | Primary/replica distinction | primary, replica-1 |
| datname | Database name | orders |
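A quick way to audit the required labels is to check the label set of each returned series. This is a small sketch, assuming label sets are available as plain dictionaries (for example the `metric` object of a query result); `missing_labels` is a hypothetical helper name.

```python
REQUIRED_LABELS = ("cluster_name", "node_name", "datname")


def missing_labels(labels):
    """Return the required labels that are absent or empty in a label set."""
    return [name for name in REQUIRED_LABELS if not labels.get(name)]


if __name__ == "__main__":
    series = {"cluster_name": "production-us", "node_name": "primary"}
    print(missing_labels(series))  # ['datname']
```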
Optional labels
| Label | Purpose | Example |
|---|---|---|
| region | Geographic region | us-east-1 |
| environment | Environment classification | production, staging |
| team | Ownership | platform |
Adding external labels
# pgwatch config
external_labels:
  region: us-east-1
  environment: production
  team: platform
Dashboard configuration
Cluster selector variable
All dashboards include a cluster_name variable:
# Variable definition
name: cluster_name
query: label_values(pg_stat_database_xact_commit_total, cluster_name)
multi: true
include_all: true
Cross-cluster queries
Compare metrics across clusters:
# TPS comparison
sum by (cluster_name) (
  rate(pg_stat_database_xact_commit_total[5m])
)
Alert on any cluster:
# Alert if any cluster has high connection usage
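# numbackends is reported per database, so sum it per node before dividing
max by (cluster_name) (
  sum by (cluster_name, node_name) (pg_stat_database_numbackends)
    / max by (cluster_name, node_name) (pg_settings_max_connections)
) > 0.8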
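The same queries can be run outside Grafana through the Prometheus-compatible HTTP API that VictoriaMetrics exposes at `/api/v1/query`. The sketch below is illustrative only: the base URL is an assumption, and `query_url`, `values_by_cluster` and `fetch` are hypothetical helper names. It builds the request URL and turns an instant-query response into a `cluster_name` to value mapping.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, promql):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def values_by_cluster(response):
    """Map cluster_name -> float value from an instant-query JSON response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    result = {}
    for series in response["data"]["result"]:
        cluster = series["metric"].get("cluster_name", "<unlabelled>")
        _timestamp, value = series["value"]  # value arrives as a string
        result[cluster] = float(value)
    return result


def fetch(base_url, promql, timeout=10):
    """Run the query against a live server (requires network access)."""
    with urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return values_by_cluster(json.load(resp))


if __name__ == "__main__":
    # Hand-written sample in the documented response shape, not real data.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"cluster_name": "production-us"}, "value": [1700000000, "120.5"]},
                {"metric": {"cluster_name": "production-eu"}, "value": [1700000000, "80"]},
            ],
        },
    }
    print(values_by_cluster(sample))
```

Against a running instance, `fetch("http://localhost:8428", "sum by (cluster_name) (rate(pg_stat_database_xact_commit_total[5m]))")` would return the TPS comparison as a dictionary.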
Cluster overview dashboard
Create a dashboard showing all clusters:
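# Cluster health summary
# Status: 1 = healthy, 0 = issues (the bool modifier makes comparisons return 0/1)
(
  # Connection health: per-node usage below 80% of max_connections
  (
    sum by (cluster_name, node_name) (pg_stat_database_numbackends)
      / max by (cluster_name, node_name) (pg_settings_max_connections)
    < bool 0.8
  )
  *
  # Recent activity
  (max by (cluster_name, node_name) (time() - pg_stat_database_stats_reset) < bool 3600)
)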
# Note: For replication health, create a separate alert:
# pg_replication_lag_seconds > 60
High availability
Redundant pgwatch
Run multiple pgwatch instances for HA:
services:
  pgwatch-1:
    environment:
      PW_INSTANCE_ID: pgwatch-1
      PW_HA_MODE: active-passive
      PW_HA_PEERS: pgwatch-1:8080,pgwatch-2:8080
  pgwatch-2:
    environment:
      PW_INSTANCE_ID: pgwatch-2
      PW_HA_MODE: active-passive
      PW_HA_PEERS: pgwatch-1:8080,pgwatch-2:8080
VictoriaMetrics cluster
For large deployments, use VictoriaMetrics cluster mode:
services:
  vmstorage-1:
    image: victoriametrics/vmstorage
  vmstorage-2:
    image: victoriametrics/vmstorage
  vminsert:
    image: victoriametrics/vminsert
    command:
      - -storageNode=vmstorage-1:8400,vmstorage-2:8400
      - -replicationFactor=2
  vmselect:
    image: victoriametrics/vmselect
    command:
      - -storageNode=vmstorage-1:8401,vmstorage-2:8401
Scaling considerations
Metrics volume
| Clusters | Estimated metrics/sec | VictoriaMetrics RAM |
|---|---|---|
| 1-5 | 100-500 | 2 GiB |
| 5-20 | 500-2000 | 4 GiB |
| 20-50 | 2000-5000 | 8 GiB |
| 50+ | 5000+ | 16 GiB+ |
Storage planning
Storage per cluster = (metrics/sec) × 4 bytes × retention_seconds # VictoriaMetrics compressed
Example: 10 clusters at 100 metrics/sec each, 30-day retention
= 10 × 100 × 4 × (30 × 86400)
= 10,368,000,000 bytes ≈ 9.7 GiB
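The storage formula above can be evaluated as a small function, which makes it easy to try other retention periods. This is a sketch under stated assumptions: 4 bytes per stored sample as in the formula, and 100 metrics/sec per cluster as an illustrative rate; `storage_bytes` is a hypothetical helper name.

```python
BYTES_PER_SAMPLE = 4      # per the formula above (compressed size, assumed)
SECONDS_PER_DAY = 86_400


def storage_bytes(clusters, metrics_per_sec_per_cluster, retention_days,
                  bytes_per_sample=BYTES_PER_SAMPLE):
    """Total bytes = clusters x rate x bytes per sample x retention in seconds."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return clusters * metrics_per_sec_per_cluster * bytes_per_sample * retention_seconds


if __name__ == "__main__":
    estimate = storage_bytes(clusters=10, metrics_per_sec_per_cluster=100, retention_days=30)
    print(f"{estimate:,} bytes = {estimate / 2**30:.1f} GiB")
    # prints: 10,368,000,000 bytes = 9.7 GiB
```

Treat the result as a lower bound for disk provisioning and leave headroom for indexes and merges.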
Troubleshooting
Cluster not appearing
- Check pgwatch logs for connection errors
- Verify cluster_name is set in the connection string
- Check VictoriaMetrics is receiving data (--data-urlencode keeps curl from interpreting the braces):
  curl -s 'http://localhost:8428/api/v1/query' --data-urlencode 'query=up{cluster_name="missing-cluster"}'
Mixed-up metrics
Symptoms: Metrics from one cluster appearing under another
Cause: Duplicate cluster_name labels
Solution: Ensure unique cluster_name per connection:
grep -r "cluster_name" /etc/pgwatch/
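Beyond `grep`, duplicates can be detected programmatically by parsing each connection string. The following is a minimal sketch, assuming targets use the `?cluster_name=...` query parameter shown earlier; `duplicate_cluster_names` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def duplicate_cluster_names(targets):
    """Return {cluster_name: [hosts]} for names used by more than one target."""
    hosts_by_name = defaultdict(list)
    for target in targets:
        parts = urlsplit(target)
        names = parse_qs(parts.query).get("cluster_name", [])
        # Group targets without a cluster_name under a visible placeholder.
        name = names[0] if names else "<missing>"
        hosts_by_name[name].append(parts.hostname)
    return {name: hosts for name, hosts in hosts_by_name.items() if len(hosts) > 1}


if __name__ == "__main__":
    targets = [
        "postgresql://user@cluster-a:5432/postgres?cluster_name=cluster-a",
        "postgresql://user@cluster-b:5432/postgres?cluster_name=cluster-a",
        "postgresql://user@cluster-c:5432/postgres?cluster_name=cluster-c",
    ]
    print(duplicate_cluster_names(targets))
```

An empty dictionary means every target has its own `cluster_name`.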
High latency for remote clusters
- Enable compression:
  remote_write:
    compress: true
- Increase batch size:
  remote_write:
    queue_config:
      max_samples_per_send: 5000
- Consider regional VictoriaMetrics instances with federation