Performance tuning
Optimizing PostgresAI monitoring for better performance and lower resource usage.
Diagnosing performance issues​
Symptoms​
| Symptom | Likely cause |
|---|---|
| Slow dashboard loading | Complex queries, many time series |
| High CPU on target database | Expensive collection queries |
| VictoriaMetrics using high memory | Large cardinality, long retention |
| Grafana timeouts | Query timeout too short |
Quick diagnostics​
# Check resource usage
docker stats
# Check query times in Grafana
# Dashboard → Panel → Query Inspector → Stats
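For a one-shot snapshot that is easier to scan than the live `docker stats` view, something like the following works; the container names are assumptions about the default Compose setup, so adjust them to match `docker ps` output:

```bash
# One-shot snapshot of CPU and memory per monitoring container
# (container names are assumptions; list yours with `docker ps`)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" \
  pgwatch victoriametrics grafana
```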
Target database impact​
Reducing collection overhead​
1. Increase collection interval:
# Default: 15s, increase for less load
PW_SCRAPE_INTERVAL=30s
2. Use appropriate preset:
| Preset | Load | Coverage |
|---|---|---|
| basic | Low | Essential metrics only |
| standard | Medium | Most use cases |
| full | Higher | Complete visibility |
postgresai mon local-install --preset basic
3. Disable expensive metrics:
PW_DISABLED_METRICS="bloat_tables,bloat_indexes"
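How these settings are applied depends on how you installed the stack. A minimal sketch, assuming a Compose-based install with a `.env` file and a `pgwatch` service (both assumptions):

```bash
# Example .env entries combining the settings above
cat >> .env <<'EOF'
PW_SCRAPE_INTERVAL=30s
PW_DISABLED_METRICS="bloat_tables,bloat_indexes"
EOF

# Recreate the collector so the new environment takes effect
docker compose up -d pgwatch
```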
Monitoring query overhead​
Check which queries monitoring runs:
select
  query,
  calls,
  mean_exec_time,
  total_exec_time
from pg_stat_statements
where query like '%pg_stat%'
order by total_exec_time desc
limit 10;
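This query requires the pg_stat_statements extension on the target database. A quick check, with the connection string placeholder being an assumption:

```bash
# Verify pg_stat_statements is installed before analyzing monitoring overhead
psql "$TARGET_DB_URL" -c "select extname, extversion from pg_extension where extname = 'pg_stat_statements';"
```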
VictoriaMetrics tuning​
Memory optimization​
Reduce active time series:
# Limit cardinality
VM_STORAGE_MAX_UNIQUE_SERIES=1000000
Adjust cache sizes:
# Reduce if memory constrained
VM_STORAGE_CACHE_SIZE_STORAGE_TSID=128MB
VM_STORAGE_CACHE_SIZE_INDEX_DB=64MB
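To confirm the effect of cache changes, VictoriaMetrics reports its own cache usage on the /metrics endpoint; a quick check (default port assumed) might look like:

```bash
# Current cache sizes reported by VictoriaMetrics itself
curl -s http://localhost:8428/metrics | grep '^vm_cache_size_bytes'
```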
Query performance​
Increase query timeout:
VM_SEARCH_QUERY_TIMEOUT=60s
Limit concurrent queries:
VM_SEARCH_MAX_CONCURRENT_REQUESTS=8
Reduce query memory:
VM_SEARCH_MAX_MEMORY_PER_QUERY=256MB
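A simple way to tell whether these limits help is to time a representative query directly against the VictoriaMetrics HTTP API; the query below is just a placeholder, substitute one from a slow panel:

```bash
# Time a representative PromQL query against VictoriaMetrics
time curl -s 'http://localhost:8428/api/v1/query?query=up' > /dev/null
```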
Storage optimization​
Shorter retention:
VM_RETENTION_PERIOD=7d # Down from 14d
Compression
VictoriaMetrics compresses data on disk by default, so there is nothing to enable. To see how many series and label pairs are currently stored, query the TSDB status endpoint:
curl http://localhost:8428/api/v1/status/tsdb
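The on-disk footprint itself is exposed through VictoriaMetrics' own /metrics endpoint; a quick way to check it (default port assumed):

```bash
# On-disk size of stored data, split by partition type (big/small/indexdb)
curl -s http://localhost:8428/metrics | grep '^vm_data_size_bytes'
```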
Grafana optimization​
Dashboard design​
Reduce panels per dashboard:
- Limit to 20-30 panels
- Use collapsed rows for less-used panels
Optimize panel queries:
- Use rate() instead of raw counters
- Limit time series with topk() or bottomk()
- Add {cluster_name="production"} filters
Example — limit to top 10:
topk(10, rate(pg_stat_statements_calls_total[5m]))
Query caching​
Enable in Grafana:
[caching]
enabled = true
ttl = 60s
Data point reduction​
Limit data points returned:
# grafana.ini
[dataproxy]
max_idle_connections = 100
row_limit = 10000
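Changes to grafana.ini only take effect after a restart; with a Compose-based install that is roughly (service name assumed):

```bash
# Restart Grafana so grafana.ini changes take effect
docker compose restart grafana
```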
pgwatch tuning​
Connection pooling​
Limit connections per database:
PW_MAX_PARALLEL_CONNECTIONS_PER_DB=2 # Down from 3
Increase connection timeout:
PW_CONNECT_TIMEOUT=15s
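To see how many connections the collector actually holds on a target database, query pg_stat_activity; the `pgwatch` role name below is an assumption, so substitute your monitoring user:

```bash
# Count connections held by the monitoring user on the target database
psql "$TARGET_DB_URL" -c "select usename, state, count(*) from pg_stat_activity where usename = 'pgwatch' group by 1, 2;"
```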
Collection scheduling​
Stagger collection times:
For multiple databases, avoid simultaneous collection:
# Configure different scrape offsets
databases:
  - name: db1
    scrape_offset: 0s
  - name: db2
    scrape_offset: 5s
  - name: db3
    scrape_offset: 10s
Resource allocation​
Minimum requirements​
| Component | CPU | Memory | Disk |
|---|---|---|---|
| pgwatch | 0.5 cores | 256 MiB | minimal |
| VictoriaMetrics | 1 core | 2 GiB | 10 GiB/week |
| Grafana | 0.5 cores | 512 MiB | 100 MiB |
Scaling recommendations​
Per monitored database:
- Add 50 MiB RAM to VictoriaMetrics
- Add 5 GiB storage per week
Example — 10 databases, 30-day retention:
VictoriaMetrics RAM: 2 GiB + (10 × 50 MiB) = 2.5 GiB
VictoriaMetrics Disk: 10 × 4 weeks × 5 GiB = 200 GiB
Docker resource limits​
# docker-compose.yml
services:
  pgwatch:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
  victoriametrics:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'
  grafana:
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
High cardinality issues​
Identify high cardinality​
curl http://localhost:8428/api/v1/status/tsdb | jq '.data.totalSeries'
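The same endpoint can break the total down by metric name, which usually points straight at the offending collector:

```bash
# Top metric names by series count (topN is supported by the TSDB status API)
curl -s 'http://localhost:8428/api/v1/status/tsdb?topN=10' | jq '.data.seriesCountByMetricName'
```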
Common cardinality sources​
| Source | Impact | Mitigation |
|---|---|---|
| queryid labels | High | Use query digest instead |
| Per-table metrics | Medium | Filter to important tables |
| Per-index metrics | Medium | Filter to important indexes |
| Multiple clusters | Additive | Separate VictoriaMetrics instances |
Reduce cardinality​
Drop high-cardinality labels:
# VictoriaMetrics relabel config
relabel_configs:
  - action: labeldrop
    regex: query  # Drop full query text
Aggregate metrics:
# Instead of per-table
sum by (datname) (pg_stat_user_tables_seq_scan_total)
Monitoring the monitoring​
Use the Self-Monitoring dashboard to track:
- Collection latency
- Query durations
- Memory usage
- Disk usage
Set up alerts for monitoring health:
- alert: MonitoringCollectionSlow
  expr: pgwatch_collection_duration_seconds > 30
  for: 5m
  labels:
    severity: warning
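Before relying on the alert, confirm that the metric it references is actually collected in your setup; a quick hedged check:

```bash
# Returns a non-zero count if the alert's metric exists in VictoriaMetrics
curl -s 'http://localhost:8428/api/v1/query?query=pgwatch_collection_duration_seconds' | jq '.data.result | length'
```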
Troubleshooting slow dashboards​
Step 1: Identify slow panels​
- Open dashboard
- Click panel → Inspect → Query
- Check "Query" tab for execution time
Step 2: Analyze query​
Look for:
- Missing time range filter
- High cardinality selectors
- Expensive aggregations
Step 3: Optimize​
# Before (slow)
sum(rate(pg_stat_statements_calls_total[5m]))
# After (faster - add filter)
sum(rate(pg_stat_statements_calls_total{cluster_name="$cluster"}[5m]))