When a company starts to embrace DevOps, it soon recognizes the need for, and the benefits of, adopting site reliability engineering (SRE). SRE has no single agreed definition, but Ben Treynor Sloss, the Google VP of engineering who introduced the practice and coined the term, describes it as “what happens when you ask a software engineer to design an operations function”.
How does SRE work with DevOps?
While both SRE and DevOps aim to bridge the gap between development and operations, SRE focuses more narrowly on system availability and reliability. In other words, SRE applies DevOps principles with service reliability as its main objective. The following sections describe how SRE works with DevOps.
Eliminating silos
The SRE team applies engineering practices to help solve the issues that the DevOps team encounters. By using the same tools as the DevOps team, the SRE team shares responsibility for reducing bugs during sprints.
Accepting failure as normal
Like DevOps, SRE accepts that failures will happen. SRE uses an error budget, an approach that tolerates an acceptable level of failure risk. Because the budget is derived from the service level objective (SLO), the team gets a clear view of how much risk is acceptable and can focus its effort on improving reliability.
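To make the error budget concrete, here is a minimal sketch of how one could be derived from an SLO. It assumes a request-based availability SLO; the function names, the 99.9% target, and the request counts are illustrative assumptions rather than anything prescribed by SRE itself.

```python
import math


def allowed_failures(slo_target: float, total_requests: int) -> int:
    """Return how many failed requests the SLO tolerates in the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # Round before flooring to absorb floating-point error in (1 - slo_target).
    return math.floor(round((1.0 - slo_target) * total_requests, 6))


def remaining_budget(slo_target: float, total_requests: int, failed_requests: int) -> int:
    """Return the unspent error budget; a negative value means it is overspent."""
    return allowed_failures(slo_target, total_requests) - failed_requests


def can_release(slo_target: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical release gate: allow risky changes only while budget remains."""
    return remaining_budget(slo_target, total_requests, failed_requests) > 0


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests against a 99.9% availability SLO.
    print(allowed_failures(0.999, 1_000_000))
    print(remaining_budget(0.999, 1_000_000, 400))
    print(can_release(0.999, 1_000_000, 1_200))
```

A gate like `can_release` is one possible way to act on the budget: while budget remains the team can ship changes, and once it is spent the focus shifts to reliability work.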
Making gradual changes
Both SRE and DevOps promote continual improvement through change. SRE favors small, frequent changes so that they can be tested and deployed at any time. SRE also focuses on detecting and resolving issues as early as possible by following the practice “roll back early, roll back often”: when an error or risk is detected in a release, the team rolls back first and investigates the problem second. As a result, the mean time to recovery (MTTR) is reduced.
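MTTR can be tracked with very little code. The sketch below assumes each incident is recorded with the time it was detected and the time service was restored (for example, when the rollback completed); the `Incident` type and the timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Incident:
    detected_at: datetime   # when the error or risk was detected
    recovered_at: datetime  # when service was restored, e.g. rollback finished


def mean_time_to_recovery(incidents: Sequence[Incident]) -> timedelta:
    """Average the detection-to-recovery duration over all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (i.recovered_at - i.detected_at for i in incidents),
        timedelta(),  # start value so that sum() adds timedeltas, not ints
    )
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical incidents: one recovered in 10 minutes, one in 30 minutes.
    history = [
        Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10)),
        Incident(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 30)),
    ]
    print(mean_time_to_recovery(history))
```

Because recovery is counted from detection to restoration, rolling back before investigating shortens each incident's duration and therefore the average.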
Leveraging tools
Site reliability engineers share responsibility for product success by using the same tools as the DevOps team. Various tools support SRE at each stage of a DevOps pipeline. For example, project management and tracking tools such as Jira can be used in the ‘plan’ stage, and container orchestration services such as Kubernetes or Mesosphere in the ‘package’ stage.
Measuring everything
Both SRE and DevOps are data-driven approaches. DevOps treats metrics as crucial: every change is measured to confirm that it produces the expected outcome. SRE measures service availability using the three related concepts below.
- Service level indicator (SLI) – A quantitative measure of service performance
- Service level objective (SLO) – The target threshold for each SLI that defines acceptable service availability and quality
- Service level agreement (SLA) – An agreement between the customer and the service provider that the product will meet a certain service level over a certain period
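The relationship between the three can be sketched in a few lines. This example assumes an availability SLI computed as the fraction of successful requests, and it assumes the SLA threshold is set looser than the internal SLO, which is a common arrangement but not a rule. All names and numbers are illustrative.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: fraction of events that were served successfully."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def classify(sli: float, slo_target: float, sla_target: float) -> str:
    """Compare a measured SLI against the internal SLO and the customer SLA."""
    if sla_target > slo_target:
        # Assumption of this sketch: the SLA is never stricter than the SLO.
        raise ValueError("sla_target is expected to be at or below slo_target")
    if sli >= slo_target:
        return "ok"
    if sli >= sla_target:
        return "slo_breached"
    return "sla_breached"


if __name__ == "__main__":
    # Hypothetical measurement: 9,995 successful requests out of 10,000.
    sli = availability_sli(9_995, 10_000)
    print(sli, classify(sli, slo_target=0.999, sla_target=0.995))
```

In this arrangement, a result of `slo_breached` acts as an internal warning, since the team has missed its own target but has not yet broken the agreement with the customer.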
In summary, SRE was not introduced to challenge the popularity of DevOps as an approach to modern software engineering. Rather, it strengthens the ability of DevOps to improve the customer experience. SRE changes the mindset of prioritising speed and innovation over reliability.