Principles of SRE

From https://sre.google/workbook/how-sre-relates/ Chapter 1 - How SRE Relates to DevOps

Operations Is a Software Problem

The basic tenet of SRE is that doing operations well is a software problem. SRE should therefore use software engineering approaches to solve that problem.

Manage by Service Level Objectives (SLOs)

SRE does not attempt to give everything 100% availability.

Instead, the product team and the SRE team select an appropriate availability target for the service and its user base, and the service is managed to that SLO. Deciding on such a target requires strong collaboration from the business. SLOs have cultural implications as well: because they are collaborative decisions among stakeholders, SLO violations bring teams back to the drawing board, blamelessly.
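To make "managed to that SLO" concrete, here is a minimal Python sketch of the arithmetic behind an availability target. The function names (allowed_downtime, slo_met), the 30-day window, and the 99.9% target are illustrative assumptions, not taken from the workbook. The allowance it computes is what SRE literature commonly calls an error budget.

```python
from datetime import timedelta


def allowed_downtime(slo: float, window: timedelta) -> timedelta:
    """Return how much unavailability an availability SLO permits in a window.

    `slo` is a fraction such as 0.999 (hypothetical target, for illustration).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return window * (1.0 - slo)


def slo_met(good_events: int, total_events: int, slo: float) -> bool:
    """Check a request-based SLO: the share of good events must reach the target."""
    if total_events == 0:
        # No traffic means nothing violated the objective.
        return True
    return good_events / total_events >= slo


if __name__ == "__main__":
    budget = allowed_downtime(0.999, timedelta(days=30))
    print(f"Allowed downtime at 99.9% over 30 days: {budget}")
    print("SLO met:", slo_met(good_events=9_995, total_events=10_000, slo=0.999))
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of allowed unavailability, which shows why 100% is not the goal: each extra "nine" shrinks that allowance by a factor of ten.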

Work to Minimize Toil

If a machine can perform a desired operation, then a machine often should.

This is a distinction (and a value) not often seen in other organizations, where toil is the job, and that’s what you’re paying a person to do.

For SRE in the Google context, toil is not the job—it can’t be. Any time spent on operational tasks means time not spent on project work—and project work is how we make our services more reliable and scalable.

Automate This Year’s Job Away

The real work in this area is determining what to automate, under what conditions, and how to automate it.

Over time, an SRE team winds up automating all that it can for a service, leaving behind things that can’t be automated (the Murphy-Beyer effect).
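One rough way to approach the "what to automate, under what conditions" question is a break-even estimate. The sketch below is an assumption added for illustration, not a method described in the workbook. The function name breakeven_runs and the hour figures are hypothetical.

```python
def breakeven_runs(
    build_hours: float,
    manual_hours_per_run: float,
    automated_hours_per_run: float = 0.0,
) -> float:
    """Return how many runs of a task it takes for automation to pay for itself.

    All inputs are estimates in engineer-hours (hypothetical units).
    """
    saved_per_run = manual_hours_per_run - automated_hours_per_run
    if saved_per_run <= 0:
        # Automation that saves nothing per run never breaks even.
        return float("inf")
    return build_hours / saved_per_run


if __name__ == "__main__":
    # A task taking 30 minutes by hand, with 40 hours to automate.
    print(breakeven_runs(build_hours=40, manual_hours_per_run=0.5))
```

A time-only estimate like this understates the case for automation when manual runs are error-prone, so treat the result as a lower bound on the benefit rather than a decision rule.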

Move Fast by Reducing the Cost of Failure

The later in the product lifecycle a problem is discovered, the more expensive it is to fix. SREs are specifically charged with reducing undesirably late problem discovery, which benefits the company as a whole.

Share Ownership with Developers

Use the Same Tooling, Regardless of Function or Job Title

The more your tooling diverges across teams and roles, the less your company benefits from each effort to improve an individual tool.
