On-Call & Team Health · Practical
By Samson Tanimawo, PhD. Published Sep 8, 2025.

Acknowledgment Time SLA

Under five minutes for sev 1.

Standard targets

Standard mean-time-to-acknowledge (MTTA) targets state explicitly how fast a page must be acknowledged at each severity. Without published targets, slow acknowledgement goes unnoticed and leadership only finds out during a customer escalation.

Sev 1. Under five minutes. Customer-impacting incidents need immediate response, and the page should reach a fully awake responder within that window.
Sev 2. Under fifteen minutes. Significant but bounded; time to acknowledge is forgiving, but time to action is the metric that actually matters.
Sev 3. Under one hour during business hours; next business day for pages sent off-hours. Issues worth knowing about, not worth waking up for.
Published team target. The bar is written down and visible. New on-call engineers know what is expected; operational reviews have a number to measure against.
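
One way to make the published bar checkable is to keep it as data. The sketch below is a minimal illustration, assuming the thresholds from the list above. The names ACK_TARGETS and met_ack_target are hypothetical, and the off-hours sev 3 rule (next business day) is deliberately left out because it needs a business calendar.

```python
from datetime import timedelta

# Thresholds taken from the targets above; the dictionary layout is illustrative.
ACK_TARGETS = {
    "sev1": timedelta(minutes=5),
    "sev2": timedelta(minutes=15),
    "sev3": timedelta(hours=1),  # business hours only; off-hours is not modelled here
}


def met_ack_target(severity: str, time_to_ack: timedelta) -> bool:
    """Return True if the page was acknowledged strictly under its target."""
    return time_to_ack < ACK_TARGETS[severity]
```

Because the targets read "under" a threshold, the comparison is strict: a sev 1 page acknowledged at exactly five minutes counts as a miss in this sketch.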

Measuring it

What you measure shapes the conclusion. Page-sent-to-ack timing, the p95 cut rather than the median, and multiple slices keep the metric honest rather than performative.

Page-to-ack timer. Measure from page-sent to ack. Starting from incident-detected conflates detection latency with responder behaviour and hides the actionable signal.
P95, not median. The pages that take thirty minutes to acknowledge are where things go wrong. Median hides them; p95 surfaces them.
Multiple cuts. Per-engineer, per-rotation, and per-time-of-day views. Each surfaces a different signal: skill, staffing, or shift quality.
Quarterly trend chart. The MTTA trajectory over time. Catches slow drift that any single quarter would call noise.
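
The measurement choices above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: pages are assumed to be dictionaries with a hypothetical "ack_minutes" field plus whatever slice keys you record (for example "engineer" or "shift"), and the percentile uses the simple nearest-rank definition.

```python
import math
from collections import defaultdict
from datetime import datetime


def ack_minutes(page_sent: datetime, acked: datetime) -> float:
    """Page-to-ack time in minutes, measured from page-sent, not incident-detected."""
    return (acked - page_sent).total_seconds() / 60


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by(pages, key: str) -> dict:
    """Group pages by a slice key (engineer, rotation, shift) and take p95 per group."""
    groups = defaultdict(list)
    for page in pages:
        groups[page[key]].append(page["ack_minutes"])
    return {name: percentile(samples, 95) for name, samples in groups.items()}
```

With twenty samples where eighteen acks land between two and six minutes and two take thirty and thirty-five, the median stays in single digits while the nearest-rank p95 lands on the thirty-minute page, which is the behaviour the text argues for.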

How to consistently hit the target

Hitting the target is its own discipline. Multi-channel paging, backup on-call, and quarterly drills are the operational machinery; without them, ack time depends on luck and individual conscientiousness.

Multi-channel paging. Phone, app push, SMS, and voice escalation. Each channel can fail independently; redundancy is what keeps the page from being missed.
Backup on-call. Explicit escalation policy: if primary does not ack in N minutes, secondary gets paged. Tested monthly so the path is known to work.
Quarterly path test. A synthetic page that exercises the full chain. Surprises during real incidents are expensive; surprises during drills are free.
Named secondary. Each rotation has a named backup engineer. Continuity through PTO, illness, and time-zone gaps becomes routine rather than improvised.
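
The escalation rule ("if primary does not ack in N minutes, secondary gets paged") can be written down as data plus one small function. The sketch below is hypothetical: EscalationStep and responders_paged are names invented for illustration, and the wait times in any policy you build with them are your own choice, since the text leaves N open. Real paging tools have their own policy formats.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EscalationStep:
    responder: str
    wait_minutes: int  # how long to wait for an ack before paging the next step


def responders_paged(policy: Sequence[EscalationStep], minutes_since_page: float) -> List[str]:
    """Return everyone who should have been paged by now, assuming nobody has acked."""
    paged = []
    elapsed = 0
    for step in policy:
        if minutes_since_page < elapsed:
            break  # this step's turn has not come yet
        paged.append(step.responder)
        elapsed += step.wait_minutes
    return paged
```

A synthetic path test can then assert the expected chain at chosen timestamps, for example that a policy of primary (wait 5) then secondary (wait 10) has paged both responders once five minutes pass without an ack.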

Reading the trends

Trends tell the truth. Up is bad, down is mostly good, and per-time-of-day cuts surface staffing or shift-quality issues that the aggregate metric hides.

MTTA trending up. Degradation signal. Common causes: paging tool issues, on-call burnout, rotation understaffing. Investigate the tool first, then the people.
MTTA trending down. Mostly good, but watch for over-eager ack. Track time-to-action separately so engineers cannot game the metric by acking without engaging.
Per-time-of-day. Night shifts often run higher MTTA than day shifts, which is expected and acceptable. Consistent night degradation that worsens over quarters suggests staffing gaps that need addressing.
Per-engineer outlier check. Personal MTTA per rotation. Catches struggling engineers early so support arrives before burnout, not after.
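
Two of these readings lend themselves to simple checks: sustained upward drift, and acks that are not followed by action. The sketch below assumes a list of quarterly p95 values (oldest first) and page records with hypothetical "id", "ack_minutes", and "action_minutes" fields, both measured from page-sent. The gap threshold is an assumption to tune, not a recommendation.

```python
def rising_streak(quarterly_p95) -> int:
    """Count consecutive quarter-over-quarter increases ending at the latest quarter."""
    streak = 0
    for i in range(len(quarterly_p95) - 1, 0, -1):
        if quarterly_p95[i] > quarterly_p95[i - 1]:
            streak += 1
        else:
            break
    return streak


def acked_without_action(pages, max_gap_minutes: float):
    """Return ids of pages whose first action came long after the ack."""
    return [
        page["id"]
        for page in pages
        if page["action_minutes"] - page["ack_minutes"] > max_gap_minutes
    ]
```

A streak of several quarters is the slow drift that a single quarter would dismiss as noise; a growing list from acked_without_action alongside a falling MTTA is a hint that the ack metric is being gamed.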

When MTTA is bad

When MTTA misses target, walk the layers in order: tool, rotation, process. Skipping straight to "people need to be faster" is the cheap answer and almost always the wrong one.

Tool reliability first. Delivery-rate dashboard for the paging vendor. Lost pages mean missed acknowledgements; the engineer is not the problem.
Rotation health second. Burnout, vacation gaps, off-hours coverage. Survey, staff up, and address root causes before pushing harder on individuals.
Process third. Routing accuracy and severity classification. Often the MTTA target is reachable with better routing rather than more responsiveness.
Post-incident MTTA review. Each significant miss includes an MTTA-driver line in the postmortem. Patterns emerge across incidents that no single review would catch.
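
The "tool, rotation, process" order can be encoded so a review always starts at the same layer. This is a rough sketch under loud assumptions: the snapshot fields (pages_sent, pages_delivered, uncovered_shifts, misrouted_pages) and the thresholds (99% delivery, zero uncovered shifts, at most 5% misrouted) are all hypothetical placeholders, not values from any vendor or from the article.

```python
from typing import Callable, Optional, Sequence, Tuple

LayerCheck = Tuple[str, Callable[[dict], bool]]

# Order matters: tool first, then rotation, then process.
# Field names and thresholds are illustrative assumptions; pages_sent must be > 0.
LAYER_CHECKS: Sequence[LayerCheck] = (
    ("tool", lambda s: s["pages_delivered"] / s["pages_sent"] >= 0.99),
    ("rotation", lambda s: s["uncovered_shifts"] == 0),
    ("process", lambda s: s["misrouted_pages"] / s["pages_sent"] <= 0.05),
)


def first_failing_layer(snapshot: dict, checks: Sequence[LayerCheck] = LAYER_CHECKS) -> Optional[str]:
    """Return the first unhealthy layer in review order, or None if all pass."""
    for name, is_healthy in checks:
        if not is_healthy(snapshot):
            return name
    return None
```

Returning only the first failing layer is a deliberate choice here: it keeps the conversation on lost pages until delivery is fixed, rather than jumping ahead to rotation or routing.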