What makes a good metric?

I got into a discussion at work today about metrics – a discussion about correctness vs utility – and I wrote something that I thought would be of general interest.

------

The important feature of metrics is that they are useful, which generally means the following:

a) Sensitive to the actual thing that you are trying to measure (i.e., when the underlying value changes, the metric changes).

b) Positively correlated with the thing you are trying to measure (a change in the underlying value produces a move in the correct direction of the metric).

c) Not unduly influenced by other factors outside of the underlying value (i.e., a change in something other than the underlying value does not have a significant effect on the metric).

Those give you a decent measure. It’s nice to have other things – linearity, where a 10% change in the underlying value results in a 10% move in the metric – but they aren’t a requirement for utility in many cases.
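As a rough illustration of properties a) through c), here is a minimal sketch that scores a candidate metric against synthetic data. Everything in it is an assumption made up for the example: the names `evaluate_metric`, `good_metric`, and `gamed_metric`, the uniform ranges, and the coefficients. It also assumes you can simulate the underlying value, which you usually cannot do directly in the field, so treat it as a thinking aid rather than a recipe.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def evaluate_metric(metric, n=2000, seed=0):
    """Score a metric function on synthetic (value, confounder) pairs.

    Hypothetical helper: `metric` takes the underlying value and one
    unrelated factor, and returns the number you would report.
    """
    rng = random.Random(seed)
    values = [rng.uniform(1.0, 10.0) for _ in range(n)]       # what you care about
    confounders = [rng.uniform(1.0, 10.0) for _ in range(n)]  # what you don't
    readings = [metric(v, c) for v, c in zip(values, confounders)]
    return {
        # Large magnitude covers a), positive sign covers b).
        "tracks_value": pearson(values, readings),
        # Near zero covers c).
        "tracks_confounder": pearson(confounders, readings),
    }


def good_metric(value, confounder):
    # Assumed example: driven mostly by the underlying value.
    return 3.0 * value + 0.1 * confounder


def gamed_metric(value, confounder):
    # Assumed example: driven mostly by something else.
    return 0.1 * value + 3.0 * confounder


good = evaluate_metric(good_metric)
gamed = evaluate_metric(gamed_metric)
print(good)
print(gamed)
```

With these made-up coefficients, `good_metric` should show a strong positive correlation with the underlying value and almost none with the confounder, and `gamed_metric` should show the reverse. Correlation is only one way to express "moves with" and "moves in the right direction"; a metric can pass this check and still be far from linear.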

To determine utility, you typically do a static analysis, where you look at how the metric is calculated, figure out how that relates to what you are trying to measure, and generally try to come up with scenarios that would break it. And you follow that up with empirical analysis, where you look at how it behaves in the field and see if it is generating the utility that you need.

The requirements for utility vary drastically across applications. If you are doing metrics to drive an automated currency trading system, then you need a bunch of analysis to decide that a metric works correctly. In a lot of other cases, a very gross metric is good enough – it all depends on what use you are going to make of it.

------

Two things to add to what I wrote:

Some of you have undoubtedly noticed that my definition for the goodness of a metric – utility – is the same definition that is used for scientific theories. That makes me happy, because science has been so successful, and then it makes me nervous, because it seems a bit too convenient.

The metrics I was talking about were ones coming out of customer telemetry, so the main factors I was worried about were how closely the telemetry reflected actual customer behavior and whether we were making realistic simplifying assumptions in our data processing. Metrics come up a lot in the agile/process world, and in those cases confounding factors are your main issue; people are very good at figuring out how to drive your surrogate measure in artificial ways without actually driving the underlying thing that you value.