
Performance tuning

Optimizing PostgresAI monitoring for better performance and lower resource usage.

Diagnosing performance issues

Symptoms

| Symptom | Likely cause |
|---|---|
| Slow dashboard loading | Complex queries, many time series |
| High CPU on target database | Expensive collection queries |
| VictoriaMetrics using high memory | Large cardinality, long retention |
| Grafana timeouts | Query timeout too short |

Quick diagnostics

# Check resource usage
docker stats

# Check query times in Grafana
# Dashboard → Panel → Query Inspector → Stats
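
For a point-in-time snapshot instead of the live view, `docker stats` supports Go-template formatting (these are standard Docker template fields):

# One-shot snapshot of CPU and memory per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"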

Target database impact

Reducing collection overhead

1. Increase collection interval:

# Default: 15s; increase to reduce load on the target (see the Compose sketch after step 3)
PW_SCRAPE_INTERVAL=30s

2. Use appropriate preset:

| Preset | Load | Coverage |
|---|---|---|
| basic | Low | Essential metrics only |
| standard | Medium | Most use cases |
| full | Higher | Complete visibility |

postgresai mon local-install --preset basic

3. Disable expensive metrics:

PW_DISABLED_METRICS="bloat_tables,bloat_indexes"
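
These knobs are environment variables; where to set them depends on how the stack was installed. A minimal sketch for a Compose-based deployment (the `pgwatch` service name and the override-file approach are assumptions, adjust to your setup):

# docker-compose.override.yml (hypothetical; adapt service name to your stack)
services:
  pgwatch:
    environment:
      PW_SCRAPE_INTERVAL: 30s
      PW_DISABLED_METRICS: "bloat_tables,bloat_indexes"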

Monitoring query overhead

Check which queries monitoring runs:

select
  query,
  calls,
  mean_exec_time,
  total_exec_time
from pg_stat_statements
where query like '%pg_stat%'
order by total_exec_time desc
limit 10;
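
To express the overhead as a share of total query time, a rough follow-up (a sketch; it reuses the same `%pg_stat%` pattern as the heuristic above):

-- Percentage of total execution time spent on monitoring queries
select round((100 * sum(total_exec_time) filter (where query like '%pg_stat%')
       / sum(total_exec_time))::numeric, 2) as monitoring_pct
from pg_stat_statements;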

VictoriaMetrics tuning

Memory optimization

Reduce active time series:

# Limit cardinality
VM_STORAGE_MAX_UNIQUE_SERIES=1000000

Adjust cache sizes:

# Reduce if memory constrained
VM_STORAGE_CACHE_SIZE_STORAGE_TSID=128MB
VM_STORAGE_CACHE_SIZE_INDEX_DB=64MB

Query performance

Increase query timeout:

VM_SEARCH_QUERY_TIMEOUT=60s

Limit concurrent queries:

VM_SEARCH_MAX_CONCURRENT_REQUESTS=8

Reduce query memory:

VM_SEARCH_MAX_MEMORY_PER_QUERY=256MB

Storage optimization

Shorter retention:

VM_RETENTION_PERIOD=7d  # Down from 14d

Compression:

VictoriaMetrics compresses data by default, so there is nothing to enable. Inspect storage statistics with:

curl http://localhost:8428/api/v1/status/tsdb
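
The status endpoint reports series and label counts; for actual on-disk bytes, VictoriaMetrics' own /metrics endpoint exposes gauges such as `vm_data_size_bytes` (metric name from upstream VictoriaMetrics; confirm it is present in your version):

# On-disk data size per storage type
curl -s http://localhost:8428/metrics | grep '^vm_data_size_bytes'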

Grafana optimization

Dashboard design

Reduce panels per dashboard:

  • Limit to 20-30 panels
  • Use collapsed rows for less-used panels

Optimize panel queries:

  • Use rate() instead of raw counters
  • Limit time series with topk() or bottomk()
  • Add {cluster_name="production"} filters

Example — limit to top 10:

topk(10, rate(pg_stat_statements_calls_total[5m]))

Query caching

Enable in Grafana:

[caching]
enabled = true
ttl = 60s

Data point reduction

Cap the amount of data returned through Grafana's data proxy:

# grafana.ini
[dataproxy]
max_idle_connections = 100
row_limit = 10000

pgwatch tuning

Connection pooling

Limit connections per database:

PW_MAX_PARALLEL_CONNECTIONS_PER_DB=2  # Down from 3

Increase connection timeout:

PW_CONNECT_TIMEOUT=15s

Collection scheduling

Stagger collection times:

With multiple monitored databases, avoid collecting from all of them simultaneously:

# Configure different scrape offsets
databases:
  - name: db1
    scrape_offset: 0s
  - name: db2
    scrape_offset: 5s
  - name: db3
    scrape_offset: 10s
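
There is no single right offset; one simple scheme (an assumption, not a built-in feature) is to spread databases evenly across the scrape interval:

# Evenly spread offsets for N databases over a 30s interval (hypothetical helper)
N=3; INTERVAL=30
for i in $(seq 0 $((N - 1))); do
  echo "db$((i + 1)): scrape_offset $(( i * INTERVAL / N ))s"
done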

Resource allocation

Minimum requirements

| Component | CPU | Memory | Disk |
|---|---|---|---|
| pgwatch | 0.5 cores | 256 MiB | Minimal |
| VictoriaMetrics | 1 core | 2 GiB | 10 GiB/week |
| Grafana | 0.5 cores | 512 MiB | 100 MiB |

Scaling recommendations

Per monitored database:

  • Add 50 MiB RAM to VictoriaMetrics
  • Add 5 GiB storage per week

Example — 10 databases, 30-day retention:

VictoriaMetrics RAM: 2 GiB + (10 × 50 MiB) = 2.5 GiB
VictoriaMetrics Disk: 10 × 4 weeks × 5 GiB = 200 GiB
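
The same arithmetic as a reusable sketch (the constants come from the per-database rules above):

# Hypothetical sizing helper: RAM and disk for N databases over W weeks
DBS=10; WEEKS=4
echo "VictoriaMetrics RAM:  $(( 2048 + DBS * 50 )) MiB"   # 2 GiB base + 50 MiB per database
echo "VictoriaMetrics disk: $(( DBS * WEEKS * 5 )) GiB"   # 5 GiB per database per week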

Docker resource limits

# docker-compose.yml
services:
  pgwatch:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'

  victoriametrics:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'

  grafana:
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
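
After editing the file, recreate the services and confirm the limits took effect (recent Docker Compose versions honor deploy.resources.limits without Swarm):

docker compose up -d
docker stats --no-stream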

High cardinality issues

Identify high cardinality

curl http://localhost:8428/api/v1/status/tsdb | jq '.data.totalSeries'
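
The same endpoint breaks the total down. The `seriesCountByMetricName` field (part of the standard Prometheus-compatible TSDB status response) shows which metrics contribute most:

# Top metric names by series count
curl -s http://localhost:8428/api/v1/status/tsdb | \
  jq '.data.seriesCountByMetricName[:5]'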

Common cardinality sources

| Source | Impact | Mitigation |
|---|---|---|
| queryid labels | High | Use query digest instead |
| Per-table metrics | Medium | Filter to important tables |
| Per-index metrics | Medium | Filter to important indexes |
| Multiple clusters | Additive | Separate VictoriaMetrics instances |

Reduce cardinality

Drop high-cardinality labels:

# VictoriaMetrics relabel config
relabel_configs:
  - action: labeldrop
    regex: query  # Drop full query text

Aggregate metrics:

# Instead of per-table
sum by (datname) (pg_stat_user_tables_seq_scan_total)

Monitoring the monitoring

Use the Self-Monitoring dashboard to track:

  • Collection latency
  • Query durations
  • Memory usage
  • Disk usage

Set up alerts for monitoring health:

- alert: MonitoringCollectionSlow
  expr: pgwatch_collection_duration_seconds > 30
  for: 5m
  labels:
    severity: warning
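
A companion alert for storage headroom, assuming VictoriaMetrics' own /metrics endpoint is scraped (the `vm_free_disk_space_bytes` gauge comes from upstream VictoriaMetrics; confirm it exists in your setup):

- alert: MonitoringDiskLow
  expr: vm_free_disk_space_bytes < 5 * 1024 * 1024 * 1024  # less than 5 GiB free
  for: 10m
  labels:
    severity: warning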

Troubleshooting slow dashboards

Step 1: Identify slow panels

  1. Open the dashboard
  2. Click the panel → Inspect → Stats
  3. Check the total request and processing times

Step 2: Analyze query

Look for:

  • Missing time range filter
  • High cardinality selectors
  • Expensive aggregations

Step 3: Optimize

# Before (slow)
sum(rate(pg_stat_statements_calls_total[5m]))

# After (faster - add filter)
sum(rate(pg_stat_statements_calls_total{cluster_name="$cluster"}[5m]))
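
When the dashboard time range varies, Grafana's built-in `$__rate_interval` variable (available for Prometheus-compatible data sources) picks a safe rate window automatically, which a fixed 5m cannot:

# Let Grafana choose a rate window that matches the current zoom level
sum(rate(pg_stat_statements_calls_total{cluster_name="$cluster"}[$__rate_interval]))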