I don’t know about you, but I have serious doubts about any decision we’ve made using scorecards after going through so many iterations of adjusting the weights like that.
To be clear, I’m not saying there’s no valid combination of scores that correlates with ideal outcomes, but when was the last time you actually checked that correlation?
In fact, there is so much wrong with this approach that it might even be driving you to make worse decisions than pure gut feel would. Let’s explore all the ways things can go wrong.
Problem 1 – Bad Math
Ordinal Values
Believe it or not, the scores you assign aren’t quantitative values. They are what are known as ordinal numbers: a 4 ranks higher than a 2, but it is not necessarily twice as much. The math of addition and multiplication that you’re used to simply does not apply to ordinal values. Adding and multiplying these numbers has unintended consequences and is more likely to add errors to your analysis than insight¹.
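To see this concretely, here’s a quick illustration in Python. All the numbers are made up purely to show how ratios of ordinal scores say nothing about ratios of the underlying quantities:

# Hypothetical example: three projects with real dollar impacts, and
# the 1-5 ordinal scores a team might assign to them.
impacts = {"A": 1_000_000, "B": 120_000, "C": 100_000}
scores = {"A": 5, "B": 3, "C": 2}

# On the ordinal scale, B looks 50% better than C (3 vs. 2) and A looks
# only ~67% better than B (5 vs. 3). In dollars, B is just 20% better
# than C, while A is more than 8x better than B.
print(scores["A"] / scores["B"])    # 1.67
print(impacts["A"] / impacts["B"])  # 8.33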
Problem 2 – Extreme Rounding Errors
Range Compression
Let’s say one of your evaluation columns is return on investment (ROI). You may assign two projects a score of 5 on ROI when, in reality, one of them has 3x the ROI of the other. When you’re using an ordinal scale, you have no choice but to map the underlying real-world values onto a small range of numbers. This “compresses” the range of possible outcomes in a way that does not reflect their real-world distance. Range compression is something all ordinal scales suffer from, and it gets amplified if you’re t-shirt sizing or using a small scale like 1-5.
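Here’s what that compression looks like in code. The bucketing below is hypothetical, but any mapping of real values onto a 1-5 scale behaves the same way:

# Map a real ROI multiple onto a 1-5 score using made-up cutoffs.
def roi_to_score(roi):
    cutoffs = [0.5, 1.0, 1.5, 2.0]
    return 1 + sum(roi > c for c in cutoffs)

print(roi_to_score(2.1))  # 5
print(roi_to_score(6.3))  # 5 -- 3x the ROI, yet an identical score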
Problem 3 – Ambiguous Labels
The Illusion of Communication
Although everyone nods along when you say something like “Impact”, these words mean completely different things to different people. Maybe “Impact” means “revenue” to a CFO, or maybe it means “reduced downtime” to an engineer. Even if you agree on your definitions upfront, there is so much variation in people’s perspectives that you can never really derive the same exact meaning.
Similarly, who’s to say that a score of 4 for me is the same as a 4 for you? If we’re talking about costs, I could be thinking $10,000 and you could be thinking $20,000, and we’d never know the difference because we seemingly agreed with each other.
Even if you clearly define and document rules for what constitutes each score value, your decisions will still suffer from the range compression mentioned above.
Problem 4 – Avoiding Extremes
Centering Bias
To make matters worse, people tend to have a centering bias², causing their responses to cluster around only a few values and compressing the range even further! You might notice this phenomenon when around 75% of the values fall between 3 and 4 on a 5-point scale, because the participants were “reserving” the extreme scores for the most extreme outcomes.
Problem 5 – The Biggest Problem of All
Pseudo-Quantitative Results
The biggest trouble at the heart of scorecards is that they don’t represent the quantitative values you care about.
Do I care about a score?
No, I care about profit, risk, costs, and so on. Instead of making up a number to represent those things, we can measure and model the values we’re actually interested in!
Does your project increase efficiency? Quantify the actual time saved by employees.
Does your strategy reduce the risk of failure? Measure that in expected dollars lost.
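As a minimal sketch (with hypothetical probabilities and dollar amounts), that risk question becomes a simple expected-value calculation instead of a 1-5 “risk” score:

# Estimated chance of a costly failure per year, before and after the
# proposed strategy. All numbers here are made up for illustration.
p_failure_before = 0.08
p_failure_after = 0.02
loss_if_failure = 250_000  # estimated cost of one failure, in dollars

expected_savings = (p_failure_before - p_failure_after) * loss_if_failure
print(f"Expected annual loss avoided: ${expected_savings:,.0f}")  # $15,000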
What to do instead
There are a number of alternative techniques you can use to model such decisions. While they take slightly more effort, at least they won’t add more error to your analysis.
Slightly Better – Sum of Z-Scores
The simplest modification you can make is a small transformation that converts your scores to standardized z-scores. Take the values of one attribute across all of the evaluated options and standardize them so that the average is zero and each value becomes the number of standard deviations it sits above or below the mean: z = (value − mean) / standard deviation. Finally, add up all the z-scores to get the final result.
Initially proposed by Robyn Dawes, this method removes some of the problems with scorecards associated with varying scales of the input attributes (for example, if Impact is on a scale of 1-5 but Cost is on a scale of 1-10).
Let’s see it in action for our example above.
- Convert your attributes to cardinal values instead of ordinal scores. For example, instead of “Impact”, “Effort”, and “Cost”, we could use “Additional Revenue”, “Implementation Time”, and “Cost of Implementation” estimates. This alone will greatly improve the decisions made with this model, because you become less susceptible to the effects of ambiguity, range compression, and centering bias. For the sake of simplifying comparisons, I will keep the scores as they are.
- Compute the mean of each attribute across all items. For example, the mean Impact in Excel would be calculated as
=AVERAGE(4,10,7) = 7
- Compute the standard deviation for each column. You can do this in Excel using the formula
=STDEVP(4,10,7) = 2.4
- For each value in the column, calculate the z-score by subtracting the mean and dividing by the standard deviation.
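For example, the z-score of the first Impact value (4) works out to
=(4 - AVERAGE(4,10,7)) / STDEVP(4,10,7) = -1.2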
- The final outcome for each column is a z-score that typically falls between -3 and +3. Simply adding these up across columns gives you a better relative score you can use to compare options.
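To tie the steps together, here is a sketch of the whole calculation in Python. The Impact values (4, 10, 7) come from the example above; the Effort and Cost columns, and the sign flip for “lower is better” attributes, are assumptions added for illustration:

from statistics import mean, pstdev

# Impact matches the example above; Effort and Cost are hypothetical.
options = {
    "Option A": {"Impact": 4, "Effort": 3, "Cost": 6},
    "Option B": {"Impact": 10, "Effort": 5, "Cost": 9},
    "Option C": {"Impact": 7, "Effort": 2, "Cost": 4},
}
attributes = ["Impact", "Effort", "Cost"]

# For attributes where lower is better, flip the sign so every z-score
# contributes in the same "higher is better" direction.
lower_is_better = {"Effort", "Cost"}

# Per-attribute mean and population standard deviation
# (pstdev matches Excel's STDEVP).
stats = {a: (mean(o[a] for o in options.values()),
             pstdev(o[a] for o in options.values()))
         for a in attributes}

for name, attrs in options.items():
    total = 0.0
    for a in attributes:
        mu, sigma = stats[a]
        z = (attrs[a] - mu) / sigma
        total += -z if a in lower_is_better else z
    print(f"{name}: {total:+.2f}")

Running this ranks Option C first (+2.20), mostly on the strength of its low Effort and Cost despite its middling Impact.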