Since my organization moved to DevOps, we have struggled to implement metrics that address operational and cultural aspects of the model in order to drive improvement.
The struggle is an entirely normal part of the transition, but in hindsight, some of the mistakes we made prolonged our suffering, including which metrics we chose to measure our performance.
Pain is inevitable, suffering is optional.
My organization didn’t make all of the mistakes in this article, but sharing what I learned along the way, antidotes included, may help you avoid a fatal mistake.
Ultimately, the cost of employing the wrong metrics can be devastating to your team culture, your business, or both if you do not stay engaged and manage them properly.
Implementing the wrong metrics can create dangerous anti-patterns with devastating effects. Continued use of the wrong metrics (aka “tuning”) will rust the anti-patterns into place, reinforcing culture decline in the form of vanity, conflict and bureaucracy.
The first law of holes states: “if you find yourself in a hole, stop digging.”
If you recognize any of the following performance metrics being used in your culture and process, I suggest a coalition review to determine if those metrics are helping or hurting.
Vanity metrics are only skin deep; actionable metrics are forever.
They emphasize volume activity over quality activity and promote hero mentality in individuals. Certain individuals will excel or game this system to get ahead while others falter. This fosters a toxic team culture with hollow results.
Examples of vanity:
- total lines of code
- total tests added
- total deployments
Energy spent measuring vanity drives your team away from quality and is a complete waste of your time.
Implement and promote metrics that focus your team on collaborative quality activities like reducing tech debt, simplifying design or automating a manual process.
Metrics that pit teams or individuals against each other undermine team affinity. Nothing kills affinity faster in a group than a DevOps metric that promotes the individual over the team.
Examples of conflict:
- ranking teams based on failure metrics (broken builds, etc.)
- ranking individuals based on failure metrics
- ranking individuals using different standards (aka “making excuses”)
- rewarding top performers who do not collaborate
Any incentive that promotes self-preservation will introduce artificial restrictions on your innovation.
Sports teams with “a cancer in the locker room” do not win because the team’s potential is destroyed from the inside by conflict. The same is true for engineering teams. The wrong metrics allow this cancer to metastasize across the entire team.
This is an urgent issue that must be tackled immediately.
These metrics sometimes manifest as a motivational strategy to drive overall performance by inciting the competitive juices of your individual players, but this frequently backfires on cross-functional teams that must work closely to succeed.
Implement and promote metrics that focus your team on performance improvement activities like demonstrating how career and business goals coexist, providing training and mentorship, and increasing responsibilities.
Last and certainly not least is the most common mistake: employing traditional KPIs in your DevOps model.
The impact is two-fold – not only are these metrics not designed to span teams or cross-functional areas (i.e. dev, test, ops, projects, performance, etc.), but they also put the wrong levers in the hands of the executive staff who are trying to fix problems.
Examples of bureaucracy:
- metrics that focus on business velocity over quality or culture
- metrics that are difficult to maintain
- metrics that are familiar (aka “traditional”) because promoting the new model is “too hard” (aka “lack of commitment”)
By the time you realize your traditional agile (or gasp! waterfall) metrics do not fit the DevOps model, it might be too late. Making the right choices at the outset of your journey is critical.
Be sure to do your homework – understanding the DevOps model and how it can help your business is crucial for success. You may have to defend your metrics against short-sighted or unfounded suggestions from within your coalition or executive staff.
Educate yourself on the right metrics that support your operations and your culture, especially as they relate to your business goals.
The best antidote for bad metrics is picking good ones. However, if your criteria for picking metrics are poor, you will prolong your suffering.
Good metrics should be:
- traceable in all environments
- behind a single pane of glass
Personally, I like to organize my metrics into categories so they align with the familiar objectives-and-goals construct. Additionally, having a metric for each goal demonstrates the thoroughness and depth of your coverage and commitment:
DevOps teams that focus on the speed of their development, delivery and response times for issues affecting customers will address business opportunity.
Consider the following:
- Lead Time: time it takes to go from dev to release; indicates process efficiency and highlights areas for improvement; should trend down over time
- Change Complexity: ability to track complexity of projects for comparison; indication of project difficulty; trends may spike but should level off over time
- Deployment Frequency: new code release frequency; indicates throughput; should trend up or remain stable over time.
- MTTR (Mean Time to Recovery): time it takes for team to recover from an incident; indicates agility; trend should decrease over time.
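The speed metrics above reduce to simple elapsed-time arithmetic. Here is a minimal Python sketch, using made-up deployment and incident timestamps, of one way Lead Time, Deployment Frequency, and MTTR might be computed:

```python
from datetime import datetime

# Hypothetical deployment records: (first commit time, release time)
deployments = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 3, 15, 0)),
    (datetime(2024, 5, 6, 10, 0), datetime(2024, 5, 7, 12, 0)),
    (datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 10, 9, 0)),
]
# Hypothetical incidents: (detected, resolved)
incidents = [
    (datetime(2024, 5, 4, 2, 0), datetime(2024, 5, 4, 3, 30)),
    (datetime(2024, 5, 9, 11, 0), datetime(2024, 5, 9, 11, 45)),
]

def mean_hours(pairs):
    """Average elapsed time in hours across (start, end) pairs."""
    total = sum((end - start).total_seconds() for start, end in pairs)
    return total / len(pairs) / 3600

lead_time_hours = mean_hours(deployments)   # dev -> release; should trend down
mttr_hours = mean_hours(incidents)          # detect -> recover; should trend down

# Deployment Frequency: releases per week over the observed window
window_days = (deployments[-1][1] - deployments[0][0]).days or 1
deploys_per_week = len(deployments) / window_days * 7
```

The record shapes here are illustrative; in practice these timestamps would come from your VCS, CI/CD pipeline, and incident tracker.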
DevOps teams that focus on continuous improvements in quality will address business durability.
Consider the following:
- Deployment Success Rate: percentage of Prod deployments without an outage/rollback; indicates release quality; should trend up over time
- Application Error Rate: total unique errors per application, total applications with errors; indicates application health; should trend down over time
- Escaped Defects: total issues found in Prod; indicates release quality; trends down over time.
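The quality metrics above are ratios and counts. A small sketch, assuming a hypothetical release log and made-up per-release defect tallies:

```python
# Hypothetical release log: True = clean deploy, False = outage or rollback
releases = [True, True, False, True, True, True, True, False, True, True]

# Deployment Success Rate: percent of Prod deployments without a rollback
success_rate = 100 * sum(releases) / len(releases)   # should trend up

# Escaped Defects: issues found in Prod, tallied per release (made-up numbers)
escaped = {"v1.4": 5, "v1.5": 3, "v1.6": 1}
counts = list(escaped.values())
trending_down = all(a >= b for a, b in zip(counts, counts[1:]))
```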
DevOps teams that focus on proactive monitoring of their platforms will address business resiliency.
- Availability: ability to meet demands without interruption; indicates customer impact; trends up or levels off at SLA.
- Scalability: ability to meet normal/spike load demands; indicates resilience; trends up and levels off over time
- Latency: event processing durations; indicates robustness; trends down over time.
- Resource Utilization: usage patterns and deviations; indicates raw performance; trends down or levels off over time.
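Availability in particular is simple arithmetic once downtime is measured. A quick sketch with illustrative numbers:

```python
# Availability over a 30-day window, from hypothetical monitoring data
minutes_in_window = 30 * 24 * 60        # 43,200 minutes in 30 days
downtime_minutes = 21.6                 # total unplanned downtime observed

availability = 100 * (1 - downtime_minutes / minutes_in_window)
# 21.6 minutes of downtime in 30 days works out to 99.95% availability
```

Comparing this figure against your SLA each month is what lets the trend "level off at SLA" as described above.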
DevOps teams that focus on customers, both internal and external, will address business capability.
- Usability: engagement and ease-of-use after a deployment
- Defect Age: number of customer submitted defects older than self-defined thresholds; ability to prioritize customer issues over internal defects.
- Subscription Renewals: customers vote with their clicks and their wallets; Deployment Success and Lead Time latency for defects/suggestions boost confidence in your ability to deliver on your commitments (aka your power rating).
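Defect Age, for example, is just a threshold filter over filing dates. A minimal sketch with hypothetical defect IDs and dates:

```python
from datetime import date, timedelta

# Hypothetical customer-submitted defects and the date each was filed
today = date(2024, 6, 1)
defects = {
    "BUG-101": date(2024, 3, 10),
    "BUG-107": date(2024, 5, 20),
    "BUG-112": date(2024, 5, 30),
}
threshold = timedelta(days=30)   # self-defined age limit

# Defect Age metric: customer issues older than the threshold
aged = [key for key, filed in defects.items() if today - filed > threshold]
```

Anything that lands in `aged` is a customer issue that has outlived your self-defined threshold and should jump the queue ahead of internal defects.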
When in doubt, keep it simple! Taking the initiative by picking the low hanging fruit gets results in your hands faster and provides much needed momentum.
No matter what, experiment! You need a clear understanding of your business to address your organization’s performance issues, but taking a stroll outside the box might reveal additional insight that’s invaluable to your business.