2025 April 13
You can measure stuff
In 2019 I read How to Measure Anything by Douglas Hubbard. At the time, I worked on an employee engagement metrics SaaS product at Emplify (later acquired by 15Five). The book is dense with practical statistics. I think I have just enough statistics background (3 college courses) that I could have worked through the math. But instead, I read it as a business book and glossed over some of the finer details.
As a business book, my biggest takeaway is a paraphrase of the book's thesis:
If something is important to you, you can observe it. If you can observe it, you can measure it.
The impact of that message for me was this:
- Regarding my job at the time: A renewed belief that the "intangibles" of employee engagement were actually observable and thus measurable.
- Regarding my profession: An antidote against the deflating effect of Goodhart's law, which in the extreme implies that all measurements are worthless.
3 years later, I was managing the Cloud IDE team at dbt Labs. I tried to get buy-in from the team to elevate "PRs merged per week" to our primary metric. In my mind, this was a "golden metric." Totally ungameable. If individuals decided to juice the metric by making their PRs even smaller, all the better! More frequent, lower risk deployments.
I failed to get buy-in. They were still worried about the perverse incentives this metric could create. They were also rightfully concerned that shipping frequently does not mean you are shipping anything valuable. They were much more interested in holding themselves accountable to user-centric or product-centric metrics. Which - I mean - what an awesome team. I think they were on the right track with that kind of thinking.
That said, I do still believe PRs merged or deployments per week is a solid metric. But my proposal lacked guardrails.
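The metric itself is simple to compute. As a sketch (not anything we actually ran at dbt Labs), here's how you might bucket merge dates by ISO week, assuming you've already pulled merge timestamps from your Git host's API:

```python
from collections import Counter
from datetime import date

def prs_merged_per_week(merge_dates):
    """Count PRs merged in each ISO (year, week) bucket."""
    weeks = Counter()
    for d in merge_dates:
        iso = d.isocalendar()
        weeks[(iso[0], iso[1])] += 1
    return dict(weeks)

# Hypothetical merge dates, stand-ins for real API data
merges = [date(2022, 3, 7), date(2022, 3, 9), date(2022, 3, 9), date(2022, 3, 15)]
print(prs_merged_per_week(merges))  # {(2022, 10): 3, (2022, 11): 1}
```

The counting is the easy part; the hard part, as the rest of this post argues, is making sure the number means what you want it to mean.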
Nate Berkopec linked to the post Goodhart's Law Isn't as Useful as You Might Think by Cedric Chin in his "4 Lines Friday" newsletter. Cedric quotes academics and practitioners throughout. One quote that encapsulates my main takeaway is this:
Make it difficult to distort the system. Make it difficult to distort the data. Give people the slack necessary to improve the system.
I trusted my team to not distort the system. And they had every reason to trust themselves. But if I wanted buy-in on the metric's value, then I needed guardrails that protected other valuable pieces of our process.
I now realize I was suggesting we measure only 1 of the 4 DORA metrics, and I didn't have a proposal to measure our impact on the business or customers. At the very least, measuring "change fail percentage" would have been a safeguard against gaming the system by quickly shipping low quality code.
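To make the guardrail concrete: change fail percentage is just the share of deployments that caused a failure in production needing remediation. A minimal sketch, assuming you track a boolean failure flag per deployment (the field name here is made up for illustration):

```python
def change_fail_percentage(deployments):
    """Percent of deployments that caused a production failure.

    `deployments` is a list of dicts with a boolean "failed" flag --
    a simplified stand-in for real deploy/incident records.
    """
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d["failed"])
    return 100.0 * failures / len(deployments)

deploys = [{"failed": False}, {"failed": True}, {"failed": False}, {"failed": False}]
print(change_fail_percentage(deploys))  # 25.0
```

Paired with deployment frequency, this makes the obvious gaming strategy self-defeating: rushing out low quality changes raises the failure percentage as fast as it raises the deploy count.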
dbt Labs and I have both matured a lot since I was a manager in 2021. Next time I lead a team, I intend to find ways to measure what is important in a way that incentivizes system improvement.