# Alerting configuration

Configure alert rules and notification channels for PostgresAI monitoring.

## Alert rule basics
PostgresAI includes pre-configured alert rules for common PostgreSQL issues.
### Alert structure

```yaml
groups:
  - name: postgresql_alerts
    rules:
      - alert: HighConnectionUsage
        expr: |
          sum(pg_stat_database_numbackends)
          /
          scalar(max(pg_settings_max_connections))
          > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Connection usage above 80%"
          description: "{{ $labels.cluster_name }} has {{ $value | humanizePercentage }} connections used"
```
### Alert components

| Component | Purpose |
|---|---|
| `expr` | PromQL expression that triggers the alert |
| `for` | How long the condition must remain true before the alert fires |
| `labels` | Metadata for routing and filtering |
| `annotations` | Human-readable alert details |

## Pre-configured alerts

### Connection alerts

| Alert | Condition | Severity |
|---|---|---|
| HighConnectionUsage | > 80% of max_connections | warning |
| CriticalConnectionUsage | > 95% of max_connections | critical |
| IdleInTransactionLong | Session idle in transaction > 5 min | warning |

### Performance alerts

| Alert | Condition | Severity |
|---|---|---|
| HighTransactionRollbackRate | Rollbacks > 5% of commits | warning |
| LowBufferCacheHitRatio | Buffer hit ratio < 95% | warning |
| HighDeadTupleRatio | Dead tuples > 20% of live | warning |
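
To see how these conditions translate into PromQL, here is a minimal sketch of a rollback-rate rule. The metric names (`pg_stat_database_xact_rollback`, `pg_stat_database_xact_commit`) follow postgres_exporter conventions and are assumptions here; the rule actually bundled with PostgresAI may differ.

```yaml
# Sketch only: rollbacks as a fraction of commits over the last 5 minutes.
# Metric names assume postgres_exporter-style collectors.
- alert: HighTransactionRollbackRate
  expr: |
    sum(rate(pg_stat_database_xact_rollback[5m]))
    /
    sum(rate(pg_stat_database_xact_commit[5m]))
    > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Rollback rate above 5% of commits"
```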
### Replication alerts

| Alert | Condition | Severity |
|---|---|---|
| ReplicationLagHigh | Lag > 100 MB | warning |
| ReplicationLagCritical | Lag > 1 GiB | critical |
| ReplicaDisconnected | Replica not in pg_stat_replication | critical |
### Storage alerts

| Alert | Condition | Severity |
|---|---|---|
| TableBloatHigh | Estimated bloat > 50% | warning |
| IndexBloatHigh | Estimated bloat > 30% | warning |
| TempFileUsageHigh | Temp files > 1 GiB/hour | warning |
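
The storage conditions map onto PromQL in the same way; here is a sketch of a temp-file rule, assuming the postgres_exporter-style `pg_stat_database_temp_bytes` counter (the bundled rule may differ):

```yaml
# Sketch: more than 1 GiB of temp files written over the last hour.
# Metric name assumes postgres_exporter's pg_stat_database collector.
- alert: TempFileUsageHigh
  expr: sum(increase(pg_stat_database_temp_bytes[1h])) > 1073741824
  for: 10m
  labels:
    severity: warning
```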
## Custom alert rules

### Creating custom rules

- Create a rules file:

  ```yaml
  # custom-alerts.yml
  groups:
    - name: custom_postgresql
      rules:
        - alert: SlowQueryDetected
          expr: |
            pg_stat_statements_mean_exec_time_seconds
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Slow query detected"
            description: "Query {{ $labels.queryid }} averaging {{ $value }}s"
  ```

- Mount it into the container (see the Compose sketch below):

  ```yaml
  volumes:
    - ./custom-alerts.yml:/etc/prometheus/rules/custom-alerts.yml
  ```
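
In a Docker Compose deployment, for example, the mount and rule path might be wired up as below. The service names, image, and URLs are assumptions about your stack (the query examples on this page use VictoriaMetrics on port 8428), so adapt them to the services you actually run:

```yaml
# Hypothetical Compose fragment; service names, image, and URLs are placeholders.
services:
  vmalert:
    image: victoriametrics/vmalert:latest
    volumes:
      - ./custom-alerts.yml:/etc/prometheus/rules/custom-alerts.yml:ro
    command:
      - -rule=/etc/prometheus/rules/*.yml
      - -datasource.url=http://victoriametrics:8428
      - -notifier.url=http://alertmanager:9093
```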
### Alert rule best practices

Use the `for` duration wisely:

- Too short: false positives from transient spikes
- Too long: delayed notification

Recommended `for` values:

| Alert type | Duration |
|---|---|
| Critical outages | 1m |
| Performance issues | 5m |
| Resource usage | 10m |
| Trend alerts | 30m |
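
One way to apply these recommendations is to pair two severities of the same condition with different `for` durations, so short spikes only page when they persist. Here is a sketch reusing the connection metrics shown earlier (alert names and thresholds are illustrative):

```yaml
# Sketch: the same condition at two severities with escalating "for" durations.
- alert: ConnectionUsageWarning
  expr: sum(pg_stat_database_numbackends) / scalar(max(pg_settings_max_connections)) > 0.8
  for: 10m    # resource-usage tier: tolerate short spikes
  labels:
    severity: warning
- alert: ConnectionUsageCritical
  expr: sum(pg_stat_database_numbackends) / scalar(max(pg_settings_max_connections)) > 0.95
  for: 1m     # critical tier: notify quickly
  labels:
    severity: critical
```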
## Notification channels

### Email

```yaml
receivers:
  - name: email-team
    email_configs:
      - to: dba-team@example.com
        from: alertmanager@example.com
        smarthost: smtp.example.com:587
        auth_username: alertmanager@example.com
        auth_password: ${SMTP_PASSWORD}  # Use an environment variable
```

**Security:** Never hardcode SMTP passwords. Use environment variable interpolation or external secrets management.
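
One common way to interpolate the variable is to render the configuration from a template at deploy time, for example with `envsubst` from GNU gettext; the file names below are placeholders:

```bash
# Render the config from a template so the secret never lives in the file under version control.
export SMTP_PASSWORD='...'   # in practice, injected by your secret manager
envsubst '${SMTP_PASSWORD}' < alertmanager.yml.tmpl > alertmanager.yml
```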
### Slack

```yaml
receivers:
  - name: slack-alerts
    slack_configs:
      - api_url: https://hooks.slack.com/services/xxx/yyy/zzz
        channel: '#postgres-alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.description }}'
```
### PagerDuty

```yaml
receivers:
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: <PAGERDUTY_SERVICE_KEY>
        severity: '{{ .CommonLabels.severity }}'
```
### OpsGenie

```yaml
receivers:
  - name: opsgenie
    opsgenie_configs:
      - api_key: your-api-key
        priority: '{{ if eq .CommonLabels.severity "critical" }}P1{{ else }}P3{{ end }}'
```
## Alert routing

### Route configuration

```yaml
route:
  receiver: default
  group_by: [alertname, cluster_name]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: slack-alerts
      repeat_interval: 4h
```
### Routing labels

| Label | Purpose |
|---|---|
| `severity` | critical, warning, info |
| `cluster_name` | Target a specific cluster |
| `team` | Route to a team channel |
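
For example, a `team` label set in a rule's `labels` block can be matched in the routing tree; the receiver name below is a placeholder for one defined in your configuration:

```yaml
# Sketch: route alerts labeled team=app-platform to that team's receiver.
routes:
  - match:
      team: app-platform
    receiver: slack-app-platform
    repeat_interval: 4h
```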
## Silencing alerts

### Temporary silence

```bash
# Via the Alertmanager API
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "HighConnectionUsage", "isRegex": false}
    ],
    "startsAt": "2024-01-15T00:00:00Z",
    "endsAt": "2024-01-15T06:00:00Z",
    "createdBy": "admin",
    "comment": "Planned maintenance"
  }'
```
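
If `amtool` (shipped with Alertmanager) is available, the same silence can be created from the command line; adjust the URL and duration to your environment:

```bash
# Create the equivalent silence via amtool instead of the raw API.
amtool silence add alertname=HighConnectionUsage \
  --alertmanager.url=http://localhost:9093 \
  --comment="Planned maintenance" \
  --duration=6h
```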
### Inhibition rules

Suppress dependent alerts when a more fundamental alert is already firing:

```yaml
inhibit_rules:
  - source_match:
      alertname: PostgresDown
    target_match:
      severity: warning
    equal: [cluster_name]
```
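
For this rule to suppress anything, a `PostgresDown` alert must exist as the source. Here is a minimal sketch using the exporter's `pg_up`-style reachability gauge (verify the metric name in your deployment):

```yaml
# Sketch: source alert that inhibits warning-level alerts on the same cluster.
# pg_up is the conventional postgres_exporter "server reachable" gauge.
- alert: PostgresDown
  expr: pg_up == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "PostgreSQL instance is down"
```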
## Grafana alerting

### Creating Grafana alerts

- Open the panel in edit mode
- Click the "Alert" tab
- Configure conditions:

  ```yaml
  conditions:
    - evaluator:
        type: gt
        params: [0.8]
      query:
        params: [A, 5m, now]
      reducer:
        type: avg
  ```
### Grafana contact points

```yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack
    receivers:
      - uid: slack-1
        type: slack
        settings:
          url: https://hooks.slack.com/xxx
```
## Testing alerts

### Dry run

```bash
# Check rule syntax
promtool check rules custom-alerts.yml

# Test the PromQL expression
curl 'http://localhost:8428/api/v1/query?query=...'
```
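
`promtool` can also unit-test rules against synthetic series; here is a minimal sketch for the `SlowQueryDetected` rule defined above (the test file name is arbitrary):

```yaml
# custom-alerts-test.yml -- run with: promtool test rules custom-alerts-test.yml
rule_files:
  - custom-alerts.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'pg_stat_statements_mean_exec_time_seconds{queryid="123"}'
        values: '2+0x20'   # constant 2s mean execution time for 20 minutes
    alert_rule_test:
      - eval_time: 12m
        alertname: SlowQueryDetected
        exp_alerts:
          - exp_labels:
              severity: warning
              queryid: "123"
```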
### Alert testing

```bash
# Fire a test alert
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {"alertname": "TestAlert", "severity": "warning"},
    "annotations": {"summary": "Test alert"}
  }]'
```
## Troubleshooting

### Alert not firing

- Check that the expression returns data:

  ```bash
  curl 'http://localhost:8428/api/v1/query?query=<expression>'
  ```

- Verify the `for` duration has elapsed
- Check that Alertmanager received the alert:

  ```bash
  curl http://localhost:9093/api/v2/alerts
  ```
### Alert not delivered

- Check Alertmanager logs
- Verify notification channel configuration
- Test the channel directly:

  ```bash
  curl -X POST https://hooks.slack.com/xxx -d '{"text":"test"}'
  ```
### Common issues

| Issue | Cause | Solution |
|---|---|---|
| No alerts | Expression returns empty | Check that the metric exists and labels match |
| Too many alerts | Threshold too sensitive | Adjust the threshold or add a `for` duration |
| Duplicate alerts | Multiple Alertmanagers | Configure HA clustering |