Alerting configuration

Configure alert rules and notification channels for PostgresAI monitoring.

Alert rule basics

PostgresAI includes pre-configured alert rules for common PostgreSQL issues.

Alert structure

groups:
  - name: postgresql_alerts
    rules:
      - alert: HighConnectionUsage
        expr: |
          sum(pg_stat_database_numbackends)
          /
          scalar(max(pg_settings_max_connections))
          > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Connection usage above 80%"
          description: "{{ $labels.cluster_name }} has {{ $value | humanizePercentage }} connections used"

Alert components

| Component | Purpose |
|---|---|
| expr | PromQL expression that triggers the alert |
| for | How long the condition must hold before the alert fires |
| labels | Metadata for routing and filtering |
| annotations | Human-readable alert details |

Pre-configured alerts

Connection alerts

| Alert | Condition | Severity |
|---|---|---|
| HighConnectionUsage | > 80% of max_connections | warning |
| CriticalConnectionUsage | > 95% of max_connections | critical |
| IdleInTransactionLong | Session idle in transaction > 5 min | warning |

Performance alerts

| Alert | Condition | Severity |
|---|---|---|
| HighTransactionRollbackRate | Rollbacks > 5% of commits | warning |
| LowBufferCacheHitRatio | Buffer cache hit ratio < 95% | warning |
| HighDeadTupleRatio | Dead tuples > 20% of live tuples | warning |

Replication alerts

| Alert | Condition | Severity |
|---|---|---|
| ReplicationLagHigh | Lag > 100 MB | warning |
| ReplicationLagCritical | Lag > 1 GiB | critical |
| ReplicaDisconnected | Replica not in pg_stat_replication | critical |

Storage alerts

| Alert | Condition | Severity |
|---|---|---|
| TableBloatHigh | Estimated bloat > 50% | warning |
| IndexBloatHigh | Estimated bloat > 30% | warning |
| TempFileUsageHigh | Temp files > 1 GiB/hour | warning |

Custom alert rules

Creating custom rules

  1. Create a rules file:

     # custom-alerts.yml
     groups:
       - name: custom_postgresql
         rules:
           - alert: SlowQueryDetected
             expr: |
               pg_stat_statements_mean_exec_time_seconds > 1
             for: 10m
             labels:
               severity: warning
             annotations:
               summary: "Slow query detected"
               description: "Query {{ $labels.queryid }} averaging {{ $value }}s"

  2. Mount it into the container so the rule evaluator can load it (see the sketch below):

     volumes:
       - ./custom-alerts.yml:/etc/prometheus/rules/custom-alerts.yml
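
For the mounted file to actually be evaluated, the rules directory must be referenced from the rule evaluator's configuration. A minimal sketch, assuming a Prometheus-style configuration file; the config file path and the alertmanager:9093 target are assumptions to verify against your deployment:

# prometheus.yml (excerpt)
rule_files:
  - /etc/prometheus/rules/*.yml          # picks up the mounted custom-alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # assumed Alertmanager service name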

Alert rule best practices

Use the `for` duration wisely:

  • Too short: false positives from transient spikes
  • Too long: delayed notifications

Recommended `for` durations:

| Alert type | Duration |
|---|---|
| Critical outages | 1m |
| Performance issues | 5m |
| Resource usage | 10m |
| Trend alerts | 30m |
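
For example, a critical-tier connection rule pairs a tighter threshold with the short 1m duration recommended above. A sketch modeled on the HighConnectionUsage rule from Alert structure; the 95% threshold mirrors the pre-configured CriticalConnectionUsage alert:

- alert: CriticalConnectionUsage
  expr: |
    sum(pg_stat_database_numbackends)
    /
    scalar(max(pg_settings_max_connections))
    > 0.95
  for: 1m                    # critical outage: page quickly
  labels:
    severity: critical
  annotations:
    summary: "Connection usage above 95%"
    description: "{{ $labels.cluster_name }} has {{ $value | humanizePercentage }} connections used"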

Notification channels

Email

receivers:
  - name: email-team
    email_configs:
      - to: dba-team@example.com
        from: alertmanager@example.com
        smarthost: smtp.example.com:587
        auth_username: alertmanager@example.com
        auth_password: ${SMTP_PASSWORD}   # use environment variable interpolation

Security: never hardcode SMTP passwords. Use environment variable interpolation or external secrets management.
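
If your deployment cannot interpolate environment variables into the Alertmanager config, Alertmanager 0.25+ can read the SMTP secret from a file instead, which pairs well with Docker or Kubernetes secrets. A sketch; the /run/secrets/smtp_password path is an assumed mount point:

receivers:
  - name: email-team
    email_configs:
      - to: dba-team@example.com
        from: alertmanager@example.com
        smarthost: smtp.example.com:587
        auth_username: alertmanager@example.com
        auth_password_file: /run/secrets/smtp_password   # secret mounted as a file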

Slack

receivers:
  - name: slack-alerts
    slack_configs:
      - api_url: https://hooks.slack.com/services/xxx/yyy/zzz
        channel: '#postgres-alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.description }}'

PagerDuty

receivers:
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: <PAGERDUTY_SERVICE_KEY>
        severity: '{{ .CommonLabels.severity }}'

OpsGenie

receivers:
  - name: opsgenie
    opsgenie_configs:
      - api_key: your-api-key
        priority: '{{ if eq .CommonLabels.severity "critical" }}P1{{ else }}P3{{ end }}'

Alert routing

Route configuration

route:
  receiver: default
  group_by: [alertname, cluster_name]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 1h

    - match:
        severity: warning
      receiver: slack-alerts
      repeat_interval: 4h

Routing labels

| Label | Purpose |
|---|---|
| severity | critical, warning, info |
| cluster_name | Target specific teams |
| team | Route to a team channel |
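
For example, a team label set in the alert rule can drive a dedicated sub-route. A sketch; the team: platform value and the slack-platform-team receiver are placeholders, not part of the shipped configuration:

# nested under the top-level route: block
routes:
  - match:
      team: platform
    receiver: slack-platform-team        # placeholder receiver name
    group_by: [alertname, cluster_name]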

Silencing alerts

Temporary silence

# Via the Alertmanager API
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "HighConnectionUsage", "isRegex": false}
    ],
    "startsAt": "2024-01-15T00:00:00Z",
    "endsAt": "2024-01-15T06:00:00Z",
    "createdBy": "admin",
    "comment": "Planned maintenance"
  }'

Inhibition rules

Suppress dependent alerts:

inhibit_rules:
  - source_match:
      alertname: PostgresDown
    target_match:
      severity: warning
    equal: [cluster_name]
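
The same mechanism prevents double notifications across severity tiers: while any critical alert is firing for a cluster, that cluster's warnings can be muted. A sketch using the severity labels defined above:

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: [cluster_name]   # mute a cluster's warnings while a critical alert fires there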

Grafana alerting

Creating Grafana alerts

  1. Open panel edit mode
  2. Click the "Alert" tab
  3. Configure conditions:

     conditions:
       - evaluator:
           type: gt
           params: [0.8]
         query:
           params: [A, 5m, now]
         reducer:
           type: avg

Grafana contact points

apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack
    receivers:
      - uid: slack-1
        type: slack
        settings:
          url: https://hooks.slack.com/xxx
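
Contact points only define where notifications go; a notification policy ties them to incoming alerts. A sketch assuming Grafana 9+ file provisioning, with grouping that mirrors the Alertmanager route above:

apiVersion: 1
policies:
  - orgId: 1
    receiver: slack
    group_by: [alertname, cluster_name]
    group_wait: 30s
    repeat_interval: 4h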

Testing alerts

Dry run

# Check rule syntax
promtool check rules custom-alerts.yml

# Test PromQL expression
curl 'http://localhost:8428/api/v1/query?query=...'
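
Beyond syntax checks, promtool can unit-test rules against synthetic series. A minimal sketch for the SlowQueryDetected rule defined in custom-alerts.yml above; the queryid and sample values are made up for the test:

# custom-alerts-test.yml (run with: promtool test rules custom-alerts-test.yml)
rule_files:
  - custom-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      - series: 'pg_stat_statements_mean_exec_time_seconds{queryid="1234"}'
        values: '2+0x20'        # mean execution time stuck at 2s for 20 minutes
    alert_rule_test:
      - eval_time: 12m          # past the 10m "for" duration
        alertname: SlowQueryDetected
        exp_alerts:
          - exp_labels:
              severity: warning
              queryid: "1234"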

Alert testing

# Fire a test alert
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {"alertname": "TestAlert", "severity": "warning"},
    "annotations": {"summary": "Test alert"}
  }]'

Troubleshooting

Alert not firing

  1. Check that the expression returns data:

     curl 'http://localhost:8428/api/v1/query?query=<expression>'

  2. Verify the `for` duration has elapsed

  3. Check that Alertmanager received the alert:

     curl http://localhost:9093/api/v2/alerts

Alert not delivered

  1. Check the Alertmanager logs
  2. Verify the notification channel configuration
  3. Test the channel directly:

     curl -X POST https://hooks.slack.com/xxx -d '{"text":"test"}'

Common issues

| Issue | Cause | Solution |
|---|---|---|
| No alerts | Expression returns empty | Check that the metric exists and labels match |
| Too many alerts | Threshold too sensitive | Adjust the threshold or add a `for` duration |
| Duplicate alerts | Multiple Alertmanagers | Configure HA clustering (see the sketch below) |
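
For the duplicate-alerts row, a minimal sketch of Alertmanager HA clustering in Docker Compose; the service names and the 9094 gossip port are assumptions, while the --cluster.* flags are standard Alertmanager options. Clustered instances share silences and notification state so each alert is delivered only once:

# docker-compose.yml (excerpt): two Alertmanager replicas forming a cluster
services:
  alertmanager-1:
    image: prom/alertmanager
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
      - --cluster.listen-address=0.0.0.0:9094
      - --cluster.peer=alertmanager-2:9094
  alertmanager-2:
    image: prom/alertmanager
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
      - --cluster.listen-address=0.0.0.0:9094
      - --cluster.peer=alertmanager-1:9094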