Performance tuning

Optimizing PostgresAI monitoring for better performance and lower resource usage.

Diagnosing performance issues

Symptoms

Symptom                           | Likely cause
Slow dashboard loading            | Complex queries, many time series
High CPU on target database       | Expensive collection queries
VictoriaMetrics using high memory | Large cardinality, long retention
Grafana timeouts                  | Query timeout too short

Quick diagnostics

# Check resource usage
docker stats

# Check query times in Grafana
# Dashboard → Panel → Query Inspector → Stats
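
To focus on the monitoring stack itself, point docker stats at the individual containers; the container names below are assumptions based on a default Compose setup:

# One-shot snapshot of the monitoring containers only
# (container names are assumptions; adjust to your deployment)
docker stats --no-stream pgwatch victoriametrics grafana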

Target database impact

Reducing collection overhead

1. Increase collection interval:

# Default: 15s, increase for less load
PW_SCRAPE_INTERVAL=30s

2. Use appropriate preset:

Preset   | Load   | Coverage
basic    | Low    | Essential metrics only
standard | Medium | Most use cases
full     | Higher | Complete visibility

postgresai mon local-install --preset basic

3. Disable expensive metrics:

PW_DISABLED_METRICS="bloat_tables,bloat_indexes"

Monitoring query overhead

Check which queries monitoring runs:

select
  query,
  calls,
  mean_exec_time,
  total_exec_time
from pg_stat_statements
where query like '%pg_stat%'
order by total_exec_time desc
limit 10;
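
The %pg_stat% pattern also matches application queries that touch the statistics views. If the collector connects with a dedicated role, filtering on that role is more precise; the sketch below assumes the role is named pgwatch and that pg_stat_statements is available in the database you connect to:

# Top monitoring queries by total execution time, filtered by role
# (role name 'pgwatch' and connection settings are assumptions)
psql "host=localhost dbname=postgres" -c "
select left(s.query, 60) as query,
       s.calls,
       round(s.mean_exec_time::numeric, 2) as mean_ms,
       round(s.total_exec_time::numeric, 2) as total_ms
from pg_stat_statements s
join pg_roles r on r.oid = s.userid
where r.rolname = 'pgwatch'
order by s.total_exec_time desc
limit 10;"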

VictoriaMetrics tuning

Memory optimization

Reduce active time series:

# Limit cardinality
VM_STORAGE_MAX_UNIQUE_SERIES=1000000

Adjust cache sizes:

# Reduce if memory constrained
VM_STORAGE_CACHE_SIZE_STORAGE_TSID=128MB
VM_STORAGE_CACHE_SIZE_INDEX_DB=64MB
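
To check whether the caches are actually what consumes memory, VictoriaMetrics reports its own memory and cache sizes on its /metrics endpoint; the metric names below are assumptions based on its standard self-metrics:

# Resident memory and per-cache sizes reported by VictoriaMetrics
# (metric names are assumptions; adjust if your version differs)
curl -s http://localhost:8428/metrics | grep -E '^(process_resident_memory_bytes|vm_cache_size_bytes)'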

Query performance

Increase query timeout:

VM_SEARCH_QUERY_TIMEOUT=60s

Limit concurrent queries:

VM_SEARCH_MAX_CONCURRENT_REQUESTS=8

Reduce query memory:

VM_SEARCH_MAX_MEMORY_PER_QUERY=256MB
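
To see whether these settings help, time a representative query against the VictoriaMetrics HTTP API directly, bypassing Grafana; the metric name in the query is an assumption, so substitute one from a slow panel:

# Time one PromQL query against VictoriaMetrics directly
# (metric name is an assumption; copy one from a slow dashboard panel)
time curl -s 'http://localhost:8428/api/v1/query' \
  --data-urlencode 'query=sum(rate(pg_stat_database_xact_commit[5m]))'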

Storage optimization

Shorter retention:

VM_RETENTION_PERIOD=7d  # Down from 14d

Compression:

VictoriaMetrics compresses data by default, so there is nothing to enable. Inspect the TSDB statistics:

curl http://localhost:8428/api/v1/status/tsdb
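
For a rough compression figure, compare on-disk size against the number of stored samples from the self-metrics endpoint; the metric names below are assumptions based on VictoriaMetrics' standard /metrics output:

# On-disk size vs. row counts; bytes per sample gives a rough compression figure
# (metric names are assumptions)
curl -s http://localhost:8428/metrics | grep -E '^vm_(data_size_bytes|rows)\{'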

Grafana optimization

Dashboard design

Reduce panels per dashboard:

  • Limit to 20-30 panels
  • Use collapsed rows for less-used panels

Optimize panel queries:

  • Use rate() instead of raw counters
  • Limit time series with topk() or bottomk()
  • Add {cluster_name="production"} filters

Example — limit to top 10:

topk(10, rate(pg_stat_statements_calls_total[5m]))

Query caching

Enable in Grafana:

[caching]
enabled = true
ttl = 60s

Data point reduction

Limit data points returned:

# grafana.ini
[dataproxy]
max_idle_connections = 100
row_limit = 10000
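
To confirm which values Grafana actually picked up, the admin settings API returns the effective configuration; the URL, admin credentials, and section key below are assumptions:

# Effective dataproxy settings as seen by Grafana
# (URL, credentials, and section key are assumptions)
curl -s -u admin:admin http://localhost:3000/api/admin/settings | jq '.dataproxy'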

pgwatch tuning

Connection pooling

Limit connections per database:

PW_MAX_PARALLEL_CONNECTIONS_PER_DB=2  # Down from 3

Increase connection timeout:

PW_CONNECT_TIMEOUT=15s
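
To verify how many sessions the collector actually holds against a target database, count them in pg_stat_activity; the role name pgwatch is an assumption:

# Current sessions opened by the monitoring role
# (role name 'pgwatch' is an assumption)
psql -c "select count(*) from pg_stat_activity where usename = 'pgwatch';"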

Collection scheduling

Stagger collection times:

For multiple databases, avoid simultaneous collection:

# Configure different scrape offsets
databases:
  - name: db1
    scrape_offset: 0s
  - name: db2
    scrape_offset: 5s
  - name: db3
    scrape_offset: 10s

Resource allocation

Minimum requirements

Component       | CPU       | Memory  | Disk
pgwatch         | 0.5 cores | 256 MiB | minimal
VictoriaMetrics | 1 core    | 2 GiB   | 10 GiB/week
Grafana         | 0.5 cores | 512 MiB | 100 MiB

Scaling recommendations

Per monitored database:

  • Add 50 MiB RAM to VictoriaMetrics
  • Add 5 GiB storage per week

Example — 10 databases, 30-day retention:

VictoriaMetrics RAM: 2 GiB + (10 × 50 MiB) = 2.5 GiB
VictoriaMetrics Disk: 10 × 4 weeks × 5 GiB = 200 GiB
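
The same arithmetic as a small shell sketch, so you can plug in your own database count and retention; the baseline figures come from the table above:

# Back-of-the-envelope sizing; adjust DBS and WEEKS for your setup
DBS=10; WEEKS=4
echo "VictoriaMetrics RAM:  $((2048 + DBS * 50)) MiB"   # 2 GiB baseline + 50 MiB per database
echo "VictoriaMetrics Disk: $((DBS * WEEKS * 5)) GiB"   # 5 GiB per database per week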

Docker resource limits

# docker-compose.yml
services:
  pgwatch:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'

  victoriametrics:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'

  grafana:
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
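
To confirm the limits were applied to the running containers, inspect them directly; the container names are assumptions, so adjust to your deployment:

# Memory and CPU limits Docker applied to each container
# (container names are assumptions)
docker inspect --format '{{.Name}}: mem={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}' \
  pgwatch victoriametrics grafana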

High cardinality issues

Identify high cardinality

curl http://localhost:8428/api/v1/status/tsdb | jq '.data.totalSeries'
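
To see which metric names contribute the most series, the same endpoint accepts a topN parameter; the response field name below assumes VictoriaMetrics' TSDB status API:

# Top 10 metric names by series count
curl -s 'http://localhost:8428/api/v1/status/tsdb?topN=10' | jq '.data.seriesCountByMetricName'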

Common cardinality sources

Source            | Impact   | Mitigation
queryid labels    | High     | Use query digest instead
Per-table metrics | Medium   | Filter to important tables
Per-index metrics | Medium   | Filter to important indexes
Multiple clusters | Additive | Separate VictoriaMetrics instances

Reduce cardinality

Drop high-cardinality labels:

# VictoriaMetrics relabel config
relabel_configs:
  - action: labeldrop
    regex: query  # Drop full query text

Aggregate metrics:

# Instead of per-table
sum by (datname) (pg_stat_user_tables_seq_scan_total)

Monitoring the monitoring

Use the Self-Monitoring dashboard to track:

  • Collection latency
  • Query durations
  • Memory usage
  • Disk usage

Set up alerts for monitoring health:

- alert: MonitoringCollectionSlow
  expr: pgwatch_collection_duration_seconds > 30
  for: 5m
  labels:
    severity: warning

Troubleshooting slow dashboards

Step 1: Identify slow panels

  1. Open dashboard
  2. Click panel → Inspect → Query
  3. Check the Stats tab for the query execution time

Step 2: Analyze query

Look for:

  • Missing time range filter
  • High cardinality selectors
  • Expensive aggregations

Step 3: Optimize

# Before (slow)
sum(rate(pg_stat_statements_calls_total[5m]))

# After (faster - add filter)
sum(rate(pg_stat_statements_calls_total{cluster_name="$cluster"}[5m]))