2899: Goodhart's Law
Title text: [later] I'm pleased to report we're now identifying and replacing hundreds of outdated metrics per hour.
Explanation
This explanation may be incomplete or incorrect: Created by a BOT - Please change this comment when editing this page. Do NOT delete this tag too soon. If you can address this issue, please edit the page! Thanks.
Transcript
This transcript is incomplete. Please help editing it! Thanks.
- [Cueball is talking to White Hat. White Hat has a hand on his chin.]
- Cueball: When a metric becomes a target, it ceases to be a good metric.
- White Hat: Sounds bad. Let's offer a bonus to anyone who identifies a metric that has become a target.
Discussion
I don't think there's anything else that could be included in the transcript, so I'm deleting the incomplete tag. If anyone has an idea to make it better, just add it. I know it seems too soon, but there's really nothing else to the comic. New editor (talk) 22:17, 26 February 2024 (UTC)
This happens all the time. For instance, a call center whose metric-turned-target is the number of calls handled per hour (which sounds good in theory) is incentivised to hang up on callers, who then call back. That boosts its "performance" as measured by the target, because it both shortens each call (making time for more calls) and increases the volume of incoming calls. Of course, the side effect is ticked-off customers heading to competitors instead (which often doesn't affect the call center, as it's a third party). If the metric-turned-target is getting a good survey response at the end of the call, then on any call that is going poorly, treating the customer so badly that they hang up (and thus never take the survey) becomes a viable way of improving the measurement of the center's performance. Creating good targets is HARD. 172.70.43.157 22:38, 26 February 2024 (UTC)
Moderator (talk) 23:12, 26 February 2024 (UTC) Moderator (talk) 23:12, 26 February 2024 (UTC) Moderator (talk) 23:12, 26 February 2024 (UTC)
- The above, by 'Moderator', appears to be a meta-joke, i.e. an attempt to boost 'times signed', which of course isn't even a useful measure, at the expense of bringing anything useful to the situation. It was even done in just one edit, so it didn't even increase the standard 'contributions' count that an actual target-hitter might try to hit.
- Either that or they messed up/have other machinations in mind. But I just thought I'd 'dissect the frog' for future readers. 172.70.91.165 04:19, 27 February 2024 (UTC)
The main problem with metrics is that there can be too many (everything is a metric, so you're chasing targets even if you're just trying to be as average as possible and not an outlier) or too few (everything is 'boiled down' to a single figure of 'success', with no nuance available to work out why something is marked as "good" rather than "excellent"). Or both at the same time! That said, I think changing a target system into a less-worse target system is often the worst of all worlds, as every meaningful measure is changed, and/or the means of measuring them are changed, all of this impinging upon the actual job of work that was always supposed to be done, regardless... 172.70.91.165 04:19, 27 February 2024 (UTC)
- Probably the worst metric/target is the perpetual growth delusion. Your office furniture sales figures are down fifteen percent from this month last year. Never mind that they were up three thousand percent last year because your biggest customer had to replace the furniture lost in a fire. 172.71.26.16 06:33, 27 February 2024 (UTC)
Feels like this comic is really about how incentives are difficult. A metric only becomes a target if there's an incentive attached, and that's only a problem if the incentive is poorly conceived. Anyone who hasn't spent much time thinking about metrics might read this comic and conclude that metrics are the crux of the issue; they're not, incentive design is. Laser813 (talk) 11:53, 27 February 2024 (UTC)
- Yes, and no. Metrics in and of themselves have a psychological power and tend to direct attention, and therefore action, to the things being measured. So good incentive design (and other psychological framing) is then needed to counteract that biasing effect. 172.70.90.28 14:08, 27 February 2024 (UTC)
- The issue comes in the moment the incentive is to "improve the metric" rather than "improve the thing the metric is intended to indicate." For example, there's the Hot Waitress Economic Index, whereby the sexier the average waitress, the worse the economy is doing (as attractive women usually have no problem getting jobs in sales when the economy is doing well). If someone comes up with the brilliant idea of fixing the economy by recruiting more unattractive waitresses, the metric no longer measures the thing it is supposed to at all. 172.69.247.49 18:22, 27 February 2024 (UTC)
- Exactly. It can be incredibly mundane things - a store I worked in encouraged the inclusion of accessories with main purchases, obviously, but also used to discourage us from selling accessories if customers remembered as they were leaving, after the main sale. If we "allowed" it, the Average Transaction Value and Items Per Basket indicators would both be down. Same stuff being sold, but if it was sold separately from the thing it supplemented, that was a bad thing.
- It can also be much bigger, more important things - good figures for DEI targets don't necessarily mean attitudes towards people from traditionally disadvantaged demographics have improved, just that firms have been told to employ more of them. If somebody is given a leg up but you only measure how many are sitting up high... how do you tell if the need for a leg up is lessening? And are you really combating the wider need for leg-ups if you keep giving them to ensure targets are met? What's the incentive for improving the big picture if the obsession is with improving a few small details? Yorkshire Pudding (talk) 22:47, 27 February 2024 (UTC)
- Ugh, yes. Some companies meet their DEI targets by interviewing people based on DEI criteria instead of looking at skills and experience. I was once denied an interview that way - being a white male candidate, the hiring manager explicitly told me I couldn't be considered until all the "diversity candidates" had been rejected. 172.70.42.150 00:05, 29 February 2024 (UTC)
In the early days of computer programming, managers tried to assess the performance of programmers the way they would assess the performance of assembly line workers, and decided to use the metric of "lines of code per day". The results were laughable. There is also the possibly apocryphal story from the old Soviet Union, where the government rewarded automobile plants for meeting quotas on the number of cars produced, and rewarded scrap metal facilities for meeting quotas on the number of cars demolished. It wasn't long before the facilities figured out that delivering cars of dubious value straight to the junkyards was the most efficient and rewarding way to operate. Rtanenbaum (talk) 21:08, 27 February 2024 (UTC)
- Unfortunately, some languages including Java still have built-in support for such measurements. -- Hkmaly (talk) 15:26, 29 February 2024 (UTC)
Need to remove some text
The whole section from the headline "Discussion of the promises and perils of operational measurement" up to the transcript should be eliminated. This page is supposed to be an explanation of a comic, not an exposition on operational management! If such an exposition is really needed (IMO it's not, but there's room for disagreement), please just put in a link to one, don't copy it here. DKMell (talk) 04:09, 29 February 2024 (UTC)