Site reliability engineering (SRE) is the practice of applying software engineering expertise to DevOps and operations problems. SRE, which was popularized by the 2016 publication of Site Reliability Engineering: How Google Runs Production Systems, often means proactively writing code and developing internal applications to combat reliability and performance concerns.
In SRE, service levels describe services provided to users within a given period of time in measurable terms. Service level objectives (SLOs) are the goals set for the availability expected out of a system. Service level indicators (SLIs) are the key measurements and metrics to determine the availability of a system. Service level agreements (SLAs) are the legal contracts that explain what is agreed upon and what happens if systems don’t meet SLOs.