"One accurate measurement is worth a thousand expert opinions," said computer scientist Grace Hopper. When it comes to software development and delivery, developers can never be fast enough—because of that, measuring performance is key to keeping up with the constant pressure to deliver at high speeds.
That is where DORA metrics enter the picture. The DORA metrics consist of four key measurements that serve as a guideline for evaluating DevOps teams’ performance and helping them reach higher performance levels. This article will walk you through these DORA metrics in detail and the challenges that come with them.
DORA, the DevOps Research and Assessment team, was a startup acquired by Google in 2018 that provided assessments and reports on companies’ DevOps efforts. The mechanism behind DORA's studies, which pinpoint the most efficient ways to develop and deliver software, lies in behavioral science.
It represents over six years of research and more than 31,000 data points, making it the longest-running academically rigorous research program of its kind. In addition, DORA strives to help DevOps teams achieve high performance and improve capabilities through its metrics.
DORA identified four metrics to assess the performance of DevOps teams. These are:

- Deployment Frequency
- Mean Lead Time for Changes
- Change Failure Rate
- Time to Recovery
The first two metrics, Deployment Frequency and Mean Lead Time for Changes, measure the speed aspect, whereas the Change Failure Rate and Time to Recovery evaluate the stability in DevOps.
By analyzing how teams score in these four aspects, you can categorize them into Elite, High-performing, Medium, and Low-performing, using the performance benchmarks for each metric. Let’s explore each of these metrics in more detail.
Fundamentally, Deployment Frequency measures how often your organization gets changes into production: that is, how often code is successfully released to the end user or deployed to production. It is an essential metric that evaluates average throughput and reflects the cadence of delivering value to the client. Continuous delivery, where changes are shipped in small, ongoing batches, is pivotal to achieving a high Deployment Frequency.
A higher Deployment Frequency grants you the benefit of consistent feedback and stronger customer retention, because value reaches customers more quickly. Elite DevOps performers deploy on demand, often multiple times per day, whereas low-performing teams often score a Deployment Frequency of about once per month.
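To make the metric concrete, here is a minimal Python sketch of how Deployment Frequency could be computed from a log of successful production deployments. The function name and sample data are hypothetical illustrations, not part of any official DORA tooling:

```python
from datetime import datetime

def deployment_frequency(deploy_timestamps, period_days):
    """Average number of successful production deployments per day
    over the given measurement window."""
    return len(deploy_timestamps) / period_days

# Hypothetical deployment log for a 7-day window
deploys = [
    datetime(2023, 3, 1, 10, 0),
    datetime(2023, 3, 2, 14, 30),
    datetime(2023, 3, 4, 9, 15),
]
print(deployment_frequency(deploys, period_days=7))  # ≈ 0.43 deployments per day
```

In practice, the timestamps would come from your CI/CD system's deployment records rather than a hard-coded list.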
The Mean Lead Time for Changes is the time it takes the DevOps team to get committed code to a successfully running state in production. It helps DevOps teams determine how healthy their cycle time is and whether the team can handle a high volume of requests. For instance, average-performing teams take about one week to complete this process successfully, whereas elite performers often achieve it in less than a day.
Separate test teams and shared test environments are often responsible for inflating a team's lead time. To identify and narrow down the obstacles preventing your DevOps team from improving its lead time, try analyzing intermediate metrics in the development pipeline such as Time to Merge, Time to Open, and Time to First Review.
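As a rough sketch, Mean Lead Time for Changes can be computed by averaging the commit-to-deploy duration of each change. The function name and sample pairs below are hypothetical:

```python
from datetime import datetime, timedelta
from statistics import mean  # not used below, but handy for scalar variants

def mean_lead_time(changes):
    """Mean time from commit to successful production deployment.

    `changes` is a list of (committed_at, deployed_at) datetime pairs.
    """
    deltas = [deployed - committed for committed, deployed in changes]
    return sum(deltas, timedelta()) / len(deltas)

# Hypothetical commit/deploy pairs
changes = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 17, 0)),   # 8 hours
    (datetime(2023, 3, 2, 10, 0), datetime(2023, 3, 3, 10, 0)),  # 24 hours
]
print(mean_lead_time(changes))  # 16:00:00, i.e. well under a day
```

Real data would typically pair Git commit timestamps with deployment events from your pipeline.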
Bugs and failures are inevitable in a production environment where changes are frequent. The Change Failure Rate is a percentage calculated as the ratio of failed deployments to the total number of deployments. It is a unique metric that indicates what proportion of changes result in failures in production. The failures mentioned here can be failed deployments, rollbacks, or incidents that require fixing.
As for the benchmark values, elite, high, and average performers score Change Failure Rates of around 0-15%, whereas low performers often sit between 40% and 65%. Maintaining a low Change Failure Rate is essential to ensure that your DevOps team delivers quality code.
Teams with more automation tools tend to have lower Change Failure Rates because their development process is usually more established and consistent. A lack of automation often results in a higher Change Failure Rate, since changes get batched into larger, riskier releases instead of the many small ones that automation makes practical.
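The ratio described above takes only a few lines to compute; the function name here is a hypothetical illustration:

```python
def change_failure_rate(failed_deployments, total_deployments):
    """Percentage of deployments that caused a failure in production
    (failed deployment, rollback, or incident requiring a fix)."""
    if total_deployments == 0:
        return 0.0
    return 100 * failed_deployments / total_deployments

# 3 failures out of 40 deployments
print(change_failure_rate(3, 40))  # 7.5, within the elite/high 0-15% band
```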
Time to Recovery is a critical metric that directly impacts DevOps stability. Essentially, it is the time it takes to restore the service when a defect, accidental outage, or service impairment occurs. Usually, the Mean Time to Recovery (MTTR) for elite teams is less than one hour, whereas low-performing teams can take from one week to one month. Improving observability and monitoring in DevOps teams will help ensure that failures are identified and recovered from as soon as possible.
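As a final sketch, MTTR can be computed by averaging the downtime of each incident. The function name and incident log below are hypothetical:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Mean duration between incident start and service restoration.

    `incidents` is a list of (started_at, restored_at) datetime pairs.
    """
    downtime = [restored - started for started, restored in incidents]
    return sum(downtime, timedelta()) / len(downtime)

# Hypothetical incident log
incidents = [
    (datetime(2023, 3, 5, 2, 0), datetime(2023, 3, 5, 2, 45)),    # 45 min outage
    (datetime(2023, 3, 9, 11, 0), datetime(2023, 3, 9, 11, 15)),  # 15 min outage
]
print(mean_time_to_recovery(incidents))  # 0:30:00, under an hour
```

In a real setup the start/restore times would come from your incident management or monitoring system.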
Goals and evidence are essential components a company requires to establish progress and stability. DORA metrics are a starting point that aids in determining where your team currently is and where they need to be by providing insights and evidence of the team's performance corresponding with the industry standards. These help the DevOps leads identify bottlenecks to achieve high performance and stability and improve decision-making.
Despite their many benefits, DORA metrics are a mixed blessing that needs constant attention to prevent adverse side effects. Each enterprise is different, and its metric goals, problems, and delivery environments can differ too. The scores you get need to be contextualized according to company needs and objectives.
Then there's the data. You need to collect and analyze vast amounts of data from various sources to get your scores. Collecting the raw data, then transforming it into calculable units, can be a laborious process.
Lastly, DevOps isn't a linear process. It's tempting to turn these metrics into KPIs and shift your focus to getting high DORA metric scores. You may even be tempted to compare your teams’ scores with other companies’ to see how well you’re doing in the market. However, you cannot measure success by numbers alone, and customers are the ones who ultimately evaluate your product. Plus, many other factors must be considered in DevOps. One primary concern is security.
DORA metrics are concentrated purely around DevOps performance speed and stability. But if you focus solely on the DORA metrics to evaluate the software development and delivery process, you are unintentionally compromising security. This will lead to more bugs and vulnerabilities in the application at later stages. Such a situation burdens your team with unnecessary workloads, worsens the user experience, and results in developer burnout.
Plus, the security world is ever-changing, and your security measures need to adapt to it. Long gone are the days when a year-long plan works. You must be agile to protect your software, get the right tools at the right time, and adapt to market changes. DORA metrics don’t adapt to the times, but you can, especially when combining them with value stream management. Value stream management is about continuous improvement: adjusting to customer needs, unforeseen problems, and market opportunities.
A big part of this is continuous security - ensuring that your code is protected from development all the way to production. Following DevSecOps practices, such as implementing shift-left security, will help you avoid ending up in a disadvantageous cycle that strips the organization of its productivity and puts your software at risk.
DORA metrics are a stepping stone for an organization to achieve high performance and stability in software delivery. But a robust DevSecOps practice is a must-have to fulfill DORA metrics’ missing elements. How about we simplify continuous security for you? Jit orchestrates and unifies security tools and controls into all stages of your CI/CD pipeline. When a pull request is created, Jit runs the relevant security tools so your developers can act upon the detected issues using our guidelines or automated remediation. Easy, isn’t it? You can get started for free.
About the Author: Ariel Beck
Ariel is a Software Architect at Jit.io. He holds a B.Sc. in Computer Science from the Academic College of Tel Aviv-Yaffo. He has over 10 years of experience as a software architect in various fields and technologies, focusing on cloud-based solutions. Ariel is skilled in Kubernetes and AWS serverless technologies and has led multiple teams in adopting microservices. He is dedicated to helping JIT build a scalable and well-architected solution.