Site Reliability Engineering

by Niall Richard Murphy, Betsy Beyer, Chris Jones, and Jennifer Petoff
Feb 1, 2024

How Google runs production systems.

Site Reliability Engineering is Google’s handbook for running production systems at massive scale. It treats reliability as a feature and shows how software engineering practices can improve operations.

A must-read for engineers interested in reliability, monitoring, incident response, and automation.

