Prometheus Alertmanager Routing
How Alertmanager's tree-based routing works, and the patterns that hold up in practice.
The route tree
Alertmanager routes form a tree. The top-level route matches everything; sub-routes match more specific labels and inherit settings from their parents. An alert enters at the root and descends into the sub-routes whose matchers it satisfies, ending at the deepest matching route. Each route has matchers (label-based filters) and a receiver (where to send the notification); the tree shape is what makes complex routing tractable.
- Tree structure. Top-level route matches everything; sub-routes match more specific labels.
- Matchers and receiver. Each route has label-based filters and a destination receiver.
- Inheritance. Sub-routes inherit settings from parents; override specific fields at the sub-route level.
- Per-route documented intent. Give each route a comment explaining what it catches and why; this speeds up investigation when routing breaks.
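The tree described above might look like the following minimal sketch. All receiver names and label values here (default-slack, oncall-pagerduty, dba-pagerduty, team = "database") are hypothetical, and the receivers are left without integrations to keep the focus on the route shape.

```yaml
# Illustrative route tree; receiver names and label values are assumptions.
route:
  # Root route: no matchers, so it catches any alert no sub-route claims.
  receiver: default-slack
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Catches critical alerts from any team and pages the on-call engineer.
    - matchers:
        - severity = "critical"
      receiver: oncall-pagerduty
      # Overrides one inherited field; grouping settings still come from the root.
      repeat_interval: 1h
      routes:
        # Catches critical database alerts and sends them to the DBA rotation.
        - matchers:
            - team = "database"
          receiver: dba-pagerduty

receivers:
  # Integrations omitted; a receiver with only a name sends nothing.
  - name: default-slack
  - name: oncall-pagerduty
  - name: dba-pagerduty
```

In this sketch a critical database alert lands on dba-pagerduty with the 1h repeat_interval inherited from its parent, while a warning alert falls through to default-slack.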
Matchers
Matchers are how routes select alerts. Label equality, label regex, and AND-combined multiple matchers cover most cases; continue: true lets a sub-route match without halting processing, which is what allows an alert to reach multiple receivers.
- Label equality. severity = critical; the simplest matcher shape.
- Label regex. service =~ 'payment.*'; pattern-match across services.
- AND-combined matchers. Multiple matchers all required; supports precise route selection.
- continue: true. Sub-route matches but processing continues; useful when an alert needs multiple receivers.
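As a sketch of how these matcher shapes combine, the fragment below uses hypothetical receiver names (payments-audit-webhook, payments-oncall, oncall) and assumes alerts carry severity and service labels.

```yaml
# Illustrative matchers; receiver names and label values are assumptions.
route:
  receiver: default
  routes:
    # Regex matcher. Alertmanager anchors regexes, so this matches
    # "payment", "payment-api", "payments", and so on.
    - matchers:
        - service =~ "payment.*"
      receiver: payments-audit-webhook
      # Keep evaluating later sibling routes after this one matches.
      continue: true
    # Two matchers on one route are ANDed: both must hold.
    - matchers:
        - severity = "critical"
        - service =~ "payment.*"
      receiver: payments-oncall
    # Plain equality matcher.
    - matchers:
        - severity = "critical"
      receiver: oncall

receivers:
  - name: default
  - name: payments-audit-webhook
  - name: payments-oncall
  - name: oncall
```

With this layout, a critical alert with service = payment-api would reach both payments-audit-webhook and payments-oncall, and then stop, because the second route does not set continue.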
Receiver types
Receivers are where notifications land. PagerDuty, Slack, email, and webhook each have their own configuration block, and receivers can compose so that a single named receiver fires to PagerDuty and Slack together. Inhibition rules, which are configured in their own top-level section rather than inside a receiver, suppress lower-priority alerts while higher-priority ones are firing.
- PagerDuty, Slack, email, webhook. Each has its own configuration block; the receiver layer is pluggable.
- Composable receivers. A single named receiver fires to multiple destinations; PagerDuty AND Slack together.
- Inhibition rules. Lower-priority alerts are suppressed while a higher-priority alert is firing; for example, a region-down alert inhibits per-pod alerts.
- Per-receiver test fixture. Test routes against synthetic alerts for each receiver; this builds confidence before deploy.
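A composed receiver and an inhibition rule could be sketched as follows. The receiver name, the Slack channel, the RegionDown alert name, and the scope and region labels are all hypothetical, and the key and URL are placeholders to replace with real values.

```yaml
# Illustrative receiver and inhibition config; names and labels are assumptions.
route:
  receiver: oncall-page-and-chat

receivers:
  # One named receiver, two destinations: every notification goes to both.
  - name: oncall-page-and-chat
    pagerduty_configs:
      - routing_key: "REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE/WITH/REAL"
        channel: "#incidents"
        send_resolved: true

# Inhibition is a top-level section, separate from receivers.
inhibit_rules:
  # While RegionDown fires, mute pod-scoped alerts in the same region.
  - source_matchers:
      - alertname = "RegionDown"
    target_matchers:
      - scope = "pod"
    # Source and target must agree on these labels for the rule to apply.
    equal: ['region']
```

The equal list is what keeps the rule narrow: a region-down alert in one region would not silence pod alerts in another.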
Grouping
Grouping reduces alert spam during incidents. group_by labels combine alerts with matching values into one notification; group_wait sets how long to wait for additional alerts before sending; group_interval sets the cadence of updates for an existing group.
- group_by. Alerts with matching label values combine into one notification; reduces spam during incidents.
- group_wait. How long to wait for additional alerts before sending; 30 seconds typical.
- group_interval. How often to send updates for an existing group; 5 minutes typical.
- repeat_interval. How long to wait before re-sending a notification for a group whose alerts are still firing; gives a steady reminder without page-flooding.
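The grouping settings above fit together roughly as in this sketch. The label set in group_by and the receiver names are assumptions; the timing values are the typical ones mentioned above, plus an illustrative repeat_interval.

```yaml
# Illustrative grouping config; labels and receiver names are assumptions.
route:
  receiver: default
  # Alerts sharing the same values for all of these labels form one group.
  group_by: ['alertname', 'cluster', 'service']
  # Wait this long after a new group's first alert before notifying,
  # so alerts that arrive together go out in one message.
  group_wait: 30s
  # Minimum time between notifications about changes to an existing group.
  group_interval: 5m
  # Re-send for a group that is still firing and otherwise unchanged.
  repeat_interval: 4h
  routes:
    - matchers:
        - severity = "critical"
      receiver: oncall
      # Critical alerts remind more often; grouping fields are inherited.
      repeat_interval: 1h

receivers:
  - name: default
  - name: oncall
```

Grouping by fewer labels yields fewer, larger notifications; adding a label such as service splits them so each notification maps to one owner.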
Operating the routing
Operating Alertmanager routing well is a discipline. amtool config routes test verifies that sample alerts route correctly; per-route delivery rate metrics surface underused or overused routes; configuration lives in git and deploys via CI, so UI access stays read-only and changes go through review.
- amtool config routes test. Verifies that sample alerts, given as label sets, reach the expected receivers; catches misconfigurations before deploy.
- Per-route delivery metrics. Surfaces routes that are underused (worth retiring) or overused (worth splitting).
- Config in git, deployed via CI. UI access read-only; changes go through review.
- Quarterly routing review. Audit routes against actual alert volume each quarter; retire or split the ones that no longer fit.