Chaos Engineering – Instrumenting Failure To Prevent Failure
Chaos Engineering is a modern software development practice, pioneered by Netflix, in which engineers intentionally inject controlled failures into a system to test its resiliency and expose the second-order effects of specific failure cases.
While deploying systems to the cloud is intended to reduce the risk of such failures, unexpected outages still occur and have caused billions of dollars in losses. In April 2011, a configuration error in an AWS Availability Zone caused network traffic to fail over to the wrong link, triggering cascading failures across the entire region. Hundreds of companies were offline or experienced severely degraded service during the outage.
Critics say the real responsibility lies with those companies: the applications running on the AWS infrastructure were never properly tested for such a failure. Designing for failure may save organizations that are increasingly reliant on cloud infrastructure.
How did Netflix weather the April 2011 outage? Well enough. Netflix’s “Chaos Monkey” and larger “Simian Army” services constantly inject failure conditions into the company’s production infrastructure. When Netflix noticed increasing failures in the affected region, it was able to shift its workloads to unaffected regions quickly, with minimal disruption to its customers.
Jenn Bergstrom takes us on a tour of the history of Chaos Engineering, shows where we’re implementing it here at Parsons, and explains how you can use Chaos Engineering as a force multiplier for assessing risk in software development. One thing is certain: failures will happen. Will your system survive them?
Meet the Presenter
Jenn Bergstrom is a software engineer at Parsons. She came to Parsons through an acquisition as an enterprise-scale Java application developer and has supported a broad range of projects within the company. Jenn has nine AWS certifications and an Azure certification. She builds secure, cloud-agnostic solutions for internal and external customers and uses DevSecOps and Chaos Engineering to prove out the reliability and interoperability of those solutions. In her free time, Jenn enjoys creating art and music and cheering her kiddos on in their endeavors.