Deliberately introducing failures (latency, errors, dropped packets) into a system to see how it behaves under stress.
Fault injection is the technical core of chaos engineering: tools introduce specific failures into a running system. Typical examples are killing a pod (Kubernetes), dropping 30% of packets to a service (toxiproxy, Chaos Mesh), injecting 500 ms of latency on a downstream call (Istio fault injection), or force-failing a region's API. The injected fault should be observable, scoped (blast radius limited), and reversible. The team then measures whether the system degraded gracefully or the failure cascaded.
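The three properties of an injected fault can be sketched in a few lines of Python. This is a minimal in-process illustration, not the API of toxiproxy, Chaos Mesh, or Istio: `FaultInjector`, `InjectedFault`, `fetch_price_with_fallback`, and the target names `"pricing"` and `"inventory"` are all hypothetical. The fault applies to a single named target (scoped), is raised as a distinct exception type (observable), and is removed when the `with` block exits (reversible).

```python
import random
import time
from contextlib import contextmanager


class InjectedFault(Exception):
    """Raised instead of a real downstream error while a fault is active."""


class FaultInjector:
    """Hypothetical wrapper around downstream calls; illustration only."""

    def __init__(self, rng=None, sleep=time.sleep):
        self._rng = rng if rng is not None else random.Random()
        self._sleep = sleep
        self._active = None  # None means no fault is currently injected

    @contextmanager
    def inject(self, target, error_rate=0.0, latency_s=0.0):
        """Activate a fault for one target and always revert it on exit."""
        self._active = {
            "target": target,
            "error_rate": error_rate,
            "latency_s": latency_s,
        }
        try:
            yield self
        finally:
            self._active = None  # reversible: the fault never outlives the block

    def call(self, target, fn, *args, **kwargs):
        """Call fn, applying the active fault only if it matches the target."""
        fault = self._active
        if fault is not None and fault["target"] == target:  # scoped
            if fault["latency_s"] > 0:
                self._sleep(fault["latency_s"])  # added latency
            if self._rng.random() < fault["error_rate"]:
                raise InjectedFault(f"injected failure calling {target}")
        return fn(*args, **kwargs)


def fetch_price_with_fallback(injector, fetch, fallback):
    """Graceful degradation: serve a fallback value when the call fails."""
    try:
        return injector.call("pricing", fetch)
    except InjectedFault:
        return fallback


if __name__ == "__main__":
    injector = FaultInjector()
    with injector.inject("pricing", error_rate=1.0, latency_s=0.5):
        # The pricing call fails, so the caller should serve the fallback.
        print(fetch_price_with_fallback(injector, lambda: 42, fallback=-1))
    # Outside the block the fault is gone and the real value is returned.
    print(fetch_price_with_fallback(injector, lambda: 42, fallback=-1))
```

A real experiment would inject the fault at the network or infrastructure layer instead of in application code, but the checks are the same: the caller degrades to its fallback while the fault is active, other targets are unaffected, and normal behaviour returns once the fault is removed.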
Disaster-recovery tabletop exercises are theory; fault injection is practice. The first time a team injects a region failure into staging and watches the failover succeed (or fail) is the first time its DR plan is real. Make fault injection a standing part of pre-launch readiness for any service that needs better than 99.9% availability.
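To see why that threshold matters, assume the 99.9% figure refers to availability and convert it into a downtime budget. The helper below is a hypothetical illustration, and the 30-day window is an assumption chosen for the arithmetic, not something the text specifies.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.2f} minutes of downtime")
```

At 99.9% the budget is roughly 43 minutes in a 30-day window, and at 99.99% it shrinks to about 4 minutes. With a budget that small, a failover that has never been exercised can consume it in a single incident.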
See the part of the platform that handles fault injection in production.