Self-monitoring dashboard

Monitor the health of the monitoring stack itself.

Screenshot note

The screenshot shows a containerized demo environment. Some host-level panels (CPU, memory, disk, network) require node_exporter or cAdvisor with Docker socket access. In production environments with proper host metrics collection, all panels display data.

Purpose

Ensure the monitoring infrastructure is functioning correctly:

Metrics collection is working
Storage has capacity
No data gaps
Alert pipeline is healthy

When to use

Regular monitoring stack health checks
After monitoring stack updates
When dashboards show "No data"
Capacity planning for monitoring infrastructure

Key panels

Scrape success rate

What it shows:

Percentage of successful metric scrapes
Per-target breakdown

Healthy state:

100% success rate
Consistent scrape intervals

Warning signs:

Scrape failures — check target availability
Timeouts — target may be overloaded

Metrics ingestion rate

What it shows:

Samples ingested per second
Trend over time

Use for:

Capacity planning
Detecting metric explosion

Storage usage

What it shows:

VictoriaMetrics disk usage
Projected capacity based on retention

Warning threshold:

Alert when > 80% capacity

Active time series

What it shows:

Number of unique metric series
Growth trend

Monitoring series growth:

Sudden spikes may indicate cardinality explosion
Gradual growth expected as you add targets

Query performance

What it shows:

Grafana query latency
Slow queries

Variables

Variable	Purpose
`cluster_name`	Filter by monitored cluster

Health check commands

Check VictoriaMetrics status

curl http://localhost:8428/api/v1/status/tsdb

Check pgwatch status

docker compose logs pgwatch --tail=50

Check Prometheus/VM targets

curl http://localhost:8428/api/v1/targets

Verify metrics collection

curl 'http://localhost:8428/api/v1/query?query=up'

Common issues

Dashboards show "No data"

Check scrape targets are up:

curl http://localhost:8428/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

Verify metric exists:

curl 'http://localhost:8428/api/v1/label/__name__/values' | jq '.data[]' | grep pg_

Check time range alignment

High storage growth

Check for cardinality explosion:

curl 'http://localhost:8428/api/v1/status/tsdb' | jq '.data.totalSeries'

Review high-cardinality metrics:

curl 'http://localhost:8428/api/v1/status/tsdb' | jq '.data.seriesCountByMetricName | to_entries | sort_by(-.value) | .[0:10]'

Adjust retention if needed:

# docker-compose.yml
victoriametrics:
  command:
    - "-retentionPeriod=30d"  # Reduce from 90d

Scrape timeouts

Increase scrape timeout:

# prometheus.yml
scrape_configs:
  - job_name: 'pgwatch'
    scrape_timeout: 30s

Check target database performance
Review pgwatch resource allocation

Capacity planning

Estimating storage needs

Factor	Impact
Number of databases	Linear increase
Scrape interval	Shorter = more data
Retention period	Longer = more storage
Query cardinality	High = more series

Formula:

Daily storage ≈ (series_count × samples_per_day × bytes_per_sample) / compression_ratio

Typical values:

Bytes per sample: ~2-4 (compressed)
Compression ratio: 10-15x
Samples per day at 60s interval: 1,440

Scaling recommendations

Databases	Recommended resources
1-5	2 CPU, 2 GiB RAM, 20 GiB disk
5-20	4 CPU, 4 GiB RAM, 100 GiB disk
20-50	8 CPU, 8 GiB RAM, 500 GiB disk

Target database health — 01. Node Overview

Purpose​

When to use​

Key panels​

Scrape success rate​

Metrics ingestion rate​

Storage usage​

Active time series​

Query performance​

Variables​

Health check commands​

Check VictoriaMetrics status​

Check pgwatch status​

Check Prometheus/VM targets​

Verify metrics collection​

Common issues​

Dashboards show "No data"​

High storage growth​

Scrape timeouts​

Capacity planning​

Estimating storage needs​

Scaling recommendations​

Related dashboards​

Purpose

When to use

Key panels

Scrape success rate

Metrics ingestion rate

Storage usage

Active time series

Query performance

Variables

Health check commands

Check VictoriaMetrics status

Check pgwatch status

Check Prometheus/VM targets

Verify metrics collection

Common issues

Dashboards show "No data"

High storage growth

Scrape timeouts

Capacity planning

Estimating storage needs

Scaling recommendations

Related dashboards