Difference between revisions of "2899: Goodhart's Law"
(→Explanation)
(Typo)
Revision as of 11:22, 27 February 2024
Goodhart's Law
Title text: [later] I'm pleased to report we're now identifying and replacing hundreds of outdated metrics per hour.
Explanation
This explanation may be incomplete or incorrect: Created by a METRIC OF METRICS PER METRIC METRIC - Please change this comment when editing this page. Do NOT delete this tag too soon. If you can address this issue, please edit the page! Thanks.
A metric or performance indicator is a measure used to determine if a system is working as intended. Goodhart's law is the idea that when the metric becomes the only thing people focus on, all efforts will be directed to improving that metric rather than improving the system. For example, the number of cars sold could be used to measure how healthy a particular car company's business is, because the company makes money with each car sold. However, if salespeople and dealerships are measured only by the number of cars sold, this could cause them to offer excessive discounts to make more sales, even if the sales are not profitable. This causes the metric to lose its correlation with the thing it was indirectly measuring (how much money the company was making based on how many cars it sold), and the metric ceases to be a 'good' metric.
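The car-sales example can be sketched in a few lines of Python. Every figure here (list price, unit cost, discounts, number of cars) is invented purely for illustration and comes from neither the comic nor any real dealership; the sketch only shows how the count of cars sold can rise while the profit it was meant to indicate falls.

```python
# Hypothetical figures, chosen only to illustrate the car-sales example.
UNIT_COST = 20_000   # assumed cost to the company per car
LIST_PRICE = 22_000  # assumed price before any discount


def total_profit(cars_sold: int, discount: int) -> int:
    """Profit when every car is sold at LIST_PRICE minus a flat discount."""
    return cars_sold * (LIST_PRICE - discount - UNIT_COST)


# Before the metric became a target: modest discount, fewer sales.
before = {"cars_sold": 100, "profit": total_profit(100, 500)}
# After: steep discounts push the count up while each sale loses money.
after = {"cars_sold": 150, "profit": total_profit(150, 2_500)}

print(before)
print(after)
```

With these assumed numbers, the targeted metric (cars sold) improves by half while the profit turns negative, so the metric no longer tracks the health of the business.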
Metrics can still be useful indicators but are, by their very nature, a narrow view of a more complicated situation. A school's exam results may suggest how well the school works with its pupils, but chasing them may lead to rigidly "teaching to the exams", leaving pupils with less enjoyment of, and ability for, lifelong learning, and less flexibility in non-academic activities. A hospital may initially assess all its incoming patients within very strict deadlines, but then end up having misdiagnosed many of them. A country's exports could go beyond expectations, but that could be associated with its own currency being effectively worthless on the world market.
In the comic, White Hat suggests addressing the problem of 'bad metrics' by creating a metric of how many metrics have become a target, and immediately suggests a way to direct all efforts to improving that metric (to give a bonus to anyone who finds a metric that has been targeted), thereby making it, by design, not a good metric. The financial incentive to find metrics that are targets, presumably to then remove them, will have people concentrating more on interpreting metrics in the worst possible way (to be rewarded and/or to reduce the amount of rewards given out), to the detriment of properly fulfilling the tasks that the metrics were originally designed to measure (especially if they are now no longer even being measured).
The title text continues the joke, by presenting a new metric (changes per hour) in a pleased manner, although that many changes (and offered bonuses per hour) would likely be ruinous to their organization, as "change for change's sake" becomes the overriding factor in every aspect of its operation. Indeed, that they are replacing metrics implies that any actual attempt at streamlining operations has become secondary to useless cyclic activity.
Transcript
- [Cueball and White Hat are standing and talking, White Hat with hand on his chin.]
- Cueball: When a metric becomes a target, it ceases to be a good metric.
- White Hat: Sounds bad. Let's offer a bonus to anyone who identifies a metric that has become a target.
Discussion
I don't think there's anything else that could be included in the transcript, so I'm deleting the incomplete tag. If anyone has an idea to make it better, just add it. I know it seems too soon, but there's really nothing else to the comic. New editor (talk) 22:17, 26 February 2024 (UTC)
This happens all the time. For instance, a call center whose metric-turned-target is number of calls handled per hour (which sounds good in theory) is incentivised to hang up on callers, who then call back - increasing their "performance" as measured by the target, as it both decreases the time each call takes (thus making time for more calls) and increases the volume of incoming calls. Of course, the side effect is ticked-off customers heading to competitors instead. (Which often doesn't affect the call center as it's a third party.) If the metric-turned-target is getting a good survey response at the end of the call, treating the customer so badly they hang up (and thus don't take the survey) for any call that is going poorly becomes a viable way of improving the measurement of their performance. Creating good targets is HARD. 172.70.43.157 22:38, 26 February 2024 (UTC)
Moderator (talk) 23:12, 26 February 2024 (UTC) Moderator (talk) 23:12, 26 February 2024 (UTC) Moderator (talk) 23:12, 26 February 2024 (UTC)
- The above, by 'Moderator' appears to be a meta-joke. i.e. trying to enhance 'times signed', which of course isn't even a useful measure, at the expense of bringing anything useful to the situation. It was even done in just one edit, so didn't even increase the standard 'contributions' measure that an actual target-hitter might try to hit.
- Either that or they messed up/have other machinations in mind. But I just thought I'd 'dissect the frog' for future readers. 172.70.91.165 04:19, 27 February 2024 (UTC)
The main problem with metrics is that there can be too many (everything is a metric, you're chasing targets even if just trying to be the most average and not to be an outlier) or there are too few (everything is 'boiled down' to a single figure of 'success', with no nuance available to work out why it's marked as "good" rather than "excellent"). Or both at the same time! That said, I think changing a target-system to be a less-worse-target-system is often the worst of all worlds, as every meaningful measure is changed, and/or the means to measure them are changed, all this impinging upon the actual job of work that was actually always supposed to be done, regardless... 172.70.91.165 04:19, 27 February 2024 (UTC)
- Probably the worst metric/target is the perpetual growth delusion. Your office furniture sales figures are down fifteen percent from this month last year. Never mind that they were up three thousand percent last year because your biggest customer had to replace the furniture lost in a fire. 172.71.26.16 06:33, 27 February 2024 (UTC)
Feels like this comic is really about how incentives are difficult. A metric only becomes a target if there's an incentive, and that's only a problem if the incentive is poorly conceived. For anyone who hasn't spent a lot of time thinking about metrics and reads this comic and thinks that metrics are the crux of the issue, they're not; incentive design is. Laser813 (talk) 11:53, 27 February 2024 (UTC)
- Yes, and no. Metrics in and of themselves have a psychological power and tend to direct attention, and therefore action, to the things being measured. So good incentive design (and other psychological framing) is then needed to counteract that biasing effect. 172.70.90.28 14:08, 27 February 2024 (UTC)
- The issue comes in the moment the incentive is to "improve the metric" rather than "improve the thing the metric is intended to indicate." For example, there's the Hot Waitress Economic Index, whereby the sexier the average waitress, the worse the economy is doing (as attractive women usually have no problem getting jobs in sales when the economy is doing well). If someone comes up with the brilliant idea of fixing the economy by recruiting more unattractive waitresses, the metric no longer measures the thing it is supposed to at all. 172.69.247.49 18:22, 27 February 2024 (UTC)
- Exactly. It can be incredibly mundane things - a store I worked in encouraged the inclusion of accessories with main purchases, obviously, but also used to discourage us from selling accessories if customers remembered as they were leaving, after the main sale. If we "allowed" it, the Average Transaction Value and Items Per Basket indicators would both be down. Same stuff being sold, but if it was sold separately from the thing it supplemented, that was a bad thing.
- It can also be much bigger, more important things - good figures for DEI targets don't necessarily mean attitudes towards people from traditionally disadvantaged demographics have improved, it just means firms have been told to employ more of them. If somebody is given a leg up but you only measure how many are sitting up high... how do you tell if the need for a leg up is lessening? And are you really combating the wider need for leg ups to be given if you keep giving them to ensure targets are met? What's the incentive for improving the big picture if the obsession is with improving a few small details? Yorkshire Pudding (talk) 22:47, 27 February 2024 (UTC)
- Ugh, yes. Some companies meet their DEI targets by interviewing people based on DEI criteria instead of looking at skills and experience. I was once denied an interview that way - being a white male candidate, the hiring manager explicitly told me I couldn't be considered until all the "diversity candidates" had been rejected. 172.70.42.150 00:05, 29 February 2024 (UTC)
In the early days of computer programming, managers tried to assess the performance of programmers the way they would assess the performance of assembly line workers, and decided to use the metric of "lines of code per day". The results were laughable. There was also the, possibly apocryphal, story from the old Soviet Union where the government rewarded automobile plants for meeting certain quotas for number of cars produced, and rewarded scrap metal facilities for meeting certain quotas for number of cars demolished, and it wasn't long before the facilities figured out that delivering the cars of dubious value straight to junk yards was the most efficient and rewarding way to operate. Rtanenbaum (talk) 21:08, 27 February 2024 (UTC)
- Unfortunately, some languages including Java still have built-in support for such measurements. -- Hkmaly (talk) 15:26, 29 February 2024 (UTC)
Need to remove some text
The whole section from the headline "Discussion of the promises and perils of operational measurement" up to the transcript should be eliminated. This page is supposed to be an explanation of a comic, not an exposition on operational management! If such an exposition is really needed (IMO it's not, but there's room for disagreement), please just put in a link to one, don't copy it here. DKMell (talk) 04:09, 29 February 2024 (UTC)