Agentic SRE, Advanced. By Samson Tanimawo, PhD. Published Apr 5, 2026.

Multi-Agent Workflows for Postmortem Generation

One agent gathers data. One writes. One reviews. One files. Here is the workflow, with the inter-agent messages typed and bounded.

The pipeline

Four agents form the pipeline. The Gatherer pulls the timeline, metrics, deploy logs, and action history; it is read-only and produces a structured incident object. The Writer drafts the postmortem text from the incident object, producing sections 1-4 of the standard template. The Reviewer reads the draft, identifies gaps, asks clarifying questions, and edits where appropriate. The Filer posts the final draft to the postmortem store, notifies the team, and opens action-item tickets.

Gatherer. Read-only; pulls timeline, metrics, deploy logs, action history.
Writer. Drafts text from incident object; produces sections 1-4.
Reviewer. Identifies gaps; asks clarifying questions; edits where appropriate.
Filer. Posts to store; notifies team; opens action-item tickets.
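The four roles above can be sketched as plain functions chained in order. This is a minimal illustration rather than the author's implementation: the function names, dictionary keys, and stub bodies are all assumptions, and a real agent would call tools or a model where the stubs return placeholders. The human checkpoints are left out to keep the shape of the pipeline visible.

```python
# Hypothetical stand-ins for the four agents. All names and fields are
# illustrative assumptions, not taken from a real system.

def gatherer(incident_id: str) -> dict:
    # Read-only: collects data and changes nothing upstream.
    return {
        "incident_id": incident_id,
        "timeline": [],
        "metrics": {},
        "deploy_logs": [],
        "action_history": [],
    }


def writer(incident: dict) -> dict:
    # Drafts sections 1-4 of the template from the incident object.
    sections = {
        n: f"(draft of section {n} for {incident['incident_id']})"
        for n in range(1, 5)
    }
    return {"incident_id": incident["incident_id"], "sections": sections}


def reviewer(draft: dict) -> dict:
    # Adds gaps, clarifying questions, and suggested action items.
    return {**draft, "gaps": [], "questions": [], "action_items": []}


def filer(reviewed: dict) -> dict:
    # Posts to the store, notifies the team, opens one ticket per action item.
    return {
        "incident_id": reviewed["incident_id"],
        "posted": True,
        "notified": True,
        "tickets_opened": len(reviewed["action_items"]),
    }


def run_pipeline(incident_id: str) -> dict:
    # Each stage consumes only the previous stage's output.
    return filer(reviewer(writer(gatherer(incident_id))))
```

Because each stage sees only its predecessor's output, any single agent can be replaced or tested in isolation.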

Inter-agent messages

Each agent boundary has a typed schema. Gatherer to Writer: the incident object (timeline, metrics summary, services affected, actions taken, duration). Writer to Reviewer: the draft (text with section markers, confidence per section, gaps identified). Reviewer to Filer: the reviewed draft (text, action items, owner suggestions, follow-up dates). Every message is validated against its schema before the next agent sees it, and a schema mismatch halts the pipeline.

Gatherer to Writer: incident object. Timeline, metrics summary, services, actions, duration.
Writer to Reviewer: draft. Text with section markers, confidence, gaps.
Reviewer to Filer: reviewed draft. Text, action items, owner suggestions, dates.
Schema-validated messages. Mismatches halt the pipeline; the contract is enforced.
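One way to express these contracts is with dataclasses and a small validator that raises on any mismatch. The sketch below is an assumption about how such schemas might look: the field names follow the lists above, while the field types, the `SchemaMismatch` exception, and the `parse` helper are hypothetical.

```python
from dataclasses import dataclass, fields


class SchemaMismatch(Exception):
    """Raised to halt the pipeline when a message fails validation."""


@dataclass(frozen=True)
class IncidentObject:  # Gatherer -> Writer
    timeline: list
    metrics_summary: dict
    services_affected: list
    actions_taken: list
    duration_minutes: int


@dataclass(frozen=True)
class Draft:  # Writer -> Reviewer
    text: str
    confidence_per_section: dict
    gaps: list


@dataclass(frozen=True)
class ReviewedDraft:  # Reviewer -> Filer
    text: str
    action_items: list
    owner_suggestions: dict
    follow_up_dates: dict


def parse(schema, payload: dict):
    """Build a typed message from a raw dict, or raise SchemaMismatch."""
    expected = {f.name: f.type for f in fields(schema)}
    missing = expected.keys() - payload.keys()
    extra = payload.keys() - expected.keys()
    if missing or extra:
        raise SchemaMismatch(
            f"{schema.__name__}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    for name, typ in expected.items():
        if not isinstance(payload[name], typ):
            raise SchemaMismatch(
                f"{schema.__name__}.{name}: expected {typ.__name__}, "
                f"got {type(payload[name]).__name__}"
            )
    return schema(**payload)
```

A producer would call `parse(Draft, raw_output)` before handing off. Letting the exception propagate is what stops the run, so a malformed message never reaches the next agent.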

Bounded inter-agent messages

Bounded message sizes prevent runaway context. Each message has a size cap: the Gatherer’s incident object is at most 10k tokens, and the Writer’s draft is at most 5k tokens. Without caps, the pipeline accumulates state at every handoff and slows down. When a message would exceed its cap, the producer summarises it, so summarisation is part of the API rather than an afterthought.

10k token cap on incident object. Gatherer’s output bounded.
5k token cap on draft. Writer’s output bounded.
Caps prevent runaway. Without them, pipeline accumulates state and slows.
Producer summarises at cap. Summary is part of the API; the agent must be good at it.
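The cap-then-summarise rule can be sketched as a small guard that each producer runs before sending. Only the 10k and 5k limits come from the text; everything else is an assumption. In particular, `count_tokens` approximates tokens by whitespace-separated words, and `truncate_summary` is a placeholder that simply keeps the first words where a real producer would write an actual summary.

```python
from typing import Callable

# Caps from the text, in tokens. The dictionary keys are hypothetical.
CAPS = {"incident_object": 10_000, "draft": 5_000}


def count_tokens(text: str) -> int:
    # Assumption: word count as a crude proxy for a real tokenizer.
    return len(text.split())


def truncate_summary(text: str, cap: int) -> str:
    # Placeholder summariser: keeps only the first `cap` words.
    return " ".join(text.split()[:cap])


def enforce_cap(
    kind: str,
    text: str,
    summarise: Callable[[str, int], str] = truncate_summary,
) -> str:
    """Return text unchanged if it fits, otherwise the producer's summary."""
    cap = CAPS[kind]
    if count_tokens(text) <= cap:
        return text
    summary = summarise(text, cap)
    if count_tokens(summary) > cap:
        # A summary that still overflows is a contract violation.
        raise ValueError(f"{kind}: summary still exceeds {cap} tokens")
    return summary
```

Checking the summary against the same cap keeps the bound strict: a summariser that fails to shrink the message is an error, not a silent overflow.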

Where humans intervene

Three points need human input. First, gaps the Writer flagged: sections it could not draft confidently are filled in by humans. Second, action items the Reviewer suggested: humans pick which ones to file and assign owners. Third, final approval before filing: the Filer does not auto-publish, so a human must approve.

Writer-flagged gaps. Sections the agent couldn’t draft confidently; humans fill.
Action item selection. Reviewer suggests; humans pick which to file and assign.
Final approval before filing. Filer does not auto-publish; human approves.
Documented decision at each handoff. Every human decision is recorded, which supports later review.
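The three checkpoints and the decision record could look like the sketch below. It is an assumption about structure, not a description of a real system: the checkpoint names, the `ReviewState` class, and `file_postmortem` are hypothetical, and the filing step only returns a string where a real Filer would post to the store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical names for the three human checkpoints.
CHECKPOINTS = ("gap_fill", "action_item_selection", "final_approval")


@dataclass
class HumanDecision:
    checkpoint: str
    decided_by: str
    detail: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ReviewState:
    decisions: list = field(default_factory=list)

    def record(self, checkpoint: str, decided_by: str, detail: str) -> None:
        # Every human decision is appended, never overwritten.
        if checkpoint not in CHECKPOINTS:
            raise ValueError(f"unknown checkpoint: {checkpoint}")
        self.decisions.append(HumanDecision(checkpoint, decided_by, detail))

    def approved(self) -> bool:
        return any(d.checkpoint == "final_approval" for d in self.decisions)


def file_postmortem(draft_text: str, state: ReviewState) -> str:
    # The Filer never auto-publishes: no recorded approval, no filing.
    if not state.approved():
        raise PermissionError("final human approval is required before filing")
    return f"filed with {len(state.decisions)} human decisions on record"
```

Keeping the decisions in an append-only list gives later reviewers the who, what, and when behind each handoff.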

Speed-up vs solo human

The speed-up is real. A solo human takes 4-6 hours from incident close to filed postmortem. The pipeline takes 90 minutes: 50 minutes of agent work plus 40 minutes of human review. That is a speed-up of roughly 2.7x to 4x, depending on which end of the baseline you compare against, and it survives quality review: the drafts are not perfect, but they are starting points that humans finish.

Solo human: 4-6 hours. Incident close to filed postmortem; the baseline.
Pipeline: 90 minutes. 50 agent + 40 human review.
Up to 4x speed-up. Roughly 2.7x against the 4-hour baseline and 4x against the 6-hour one; survives quality review.
Drafts as starting points. Not perfect; humans finish; the team scales.
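The arithmetic behind the comparison, using only the figures quoted above, is short enough to check directly. The variable names are illustrative. Note that the 4x figure corresponds to the 6-hour end of the solo baseline.

```python
# Figures from the text, in minutes.
solo_minutes = (4 * 60, 6 * 60)        # solo human: 4-6 hours
agent_minutes, human_minutes = 50, 40  # pipeline split

pipeline_minutes = agent_minutes + human_minutes

# Elapsed-time speed-up at each end of the solo baseline.
speedups = tuple(round(m / pipeline_minutes, 1) for m in solo_minutes)

print(pipeline_minutes, speedups)  # 90 (2.7, 4.0)
```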