2899: Goodhart's Law
==Explanation==

{{incomplete|Created by a METRIC OF METRICS PER METRIC METRIC (IN <s>IMPERIAL</s> US CUSTOMARY UNITS) - Please change this comment when editing this page. Do NOT delete this tag too soon.}}
  
In this comic, [[White Hat]] suggests creating a meta-metric, "number-of-metrics-that-have-become-targets," and making it a target.
  
First, Cueball introduces and defines {{w|Goodhart's Law}}, which is the observation that when a metric — a {{w|performance indicator|measure of performance}} — becomes a goal, efforts will be unhelpfully directed to improving that ''metric'' at the expense of systemic objectives.
  
 
For example, imagine a scenario in which a car dealership is looking to grow profits, and its managers decide to focus on increasing a component metric of profit: how many cars it sells. So they offer a bonus to their salespeople to sell more cars. But then the salespeople offer deep discounts to rack up sales, rendering the car sales unprofitable. This example shows how a ''metric'' (cars sold) can become the ''target'', replacing the real target, profit growth, if individual incentives are not properly managed.
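The dealership scenario can be made concrete with a toy calculation (all numbers are invented for illustration, not taken from the comic): discounting doubles the metric (cars sold) while driving the real target (profit) negative.

```python
def profit(cars_sold, price, cost_per_car=20_000):
    """Total profit from selling `cars_sold` cars at `price` each.
    All figures are hypothetical, chosen only to illustrate the point."""
    return cars_sold * (price - cost_per_car)

# Before the bonus: fewer sales, but each at a healthy margin.
baseline = profit(cars_sold=10, price=25_000)  # 10 * 5,000 = 50,000

# After the bonus: the metric doubles, but deep discounts push the
# margin below cost, so the real target (profit) collapses.
gamed = profit(cars_sold=20, price=19_000)     # 20 * -1,000 = -20,000

assert gamed < baseline  # higher metric, worse outcome
```

The point of the sketch is that optimizing the component metric in isolation can move the underlying objective in the opposite direction.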
  
Hearing about Goodhart's Law, White Hat suggests eliminating metrics that have become targets.
White Hat's suggestion could be a good or a bad idea. It all depends on how the bonus incentive is awarded:
* A '''well-designed implementation''' would award bonuses only for finding metrics which truly aren't serving their purpose, so that managers could fix the measurement issues (assuming the fix isn't worse than the status quo), and would employ sufficient management oversight to discourage trivial submissions. If submissions are made in good faith, bonuses are awarded only for approved submissions, and the identifications result in real improvements, the organization will likely be better off.
* A '''poorly-designed implementation''' would offer a bonus for every identification, regardless of quality. This would incentivize the removal of metrics that still serve a useful purpose despite the target behaviour that has developed around them (reality is not quite as black and white as Goodhart's Law suggests), and perhaps even the ''creation'' of new metrics-as-targets for the sole purpose of then removing them and collecting the bounty.
The title text imagines this '''poorly-designed implementation''', leading to the creation of a new metric (metric changes per hour) and the organization identifying — and ''replacing'' — hundreds of metrics per hour, crowding out actual focus on the organization's true goals. It's the ultimate example of "change for change's sake."

Part of the joke is that White Hat's original suggestion — the new metric causing the issue, and one that ''should'' be replaced — ironically seems to be surviving the replacement of hundreds of other metrics.
This comic illustrates that the thoughtless combination of Goodhart's Law and poorly designed incentives can have ruinous results for an organization.
  
 
The proper usage of organizational metrics and incentives is the focus of {{w|managerial accounting}}, a field within organizational management.
 
===Discussion of the promises and perils of operational measurement===
 
While there is a temptation to game any metric, measurement is the main objective way of describing the success of an activity and assessing the effect of changes. "Data-driven" or "evidence-based" approaches are used to drive measurable improvements in various areas of society.
 
 
Discussions of Goodhart's Law have noted [https://commoncog.com/goodharts-law-not-useful/] that people may respond to a metric in one of three ways: (1) improving the system, (2) distorting the system (see the examples below), or (3) distorting the data (e.g., governments publishing false or cherry-picked economic statistics). Channeling energy toward genuine improvement requires an organization to make option (1) more appealing (through flexibility and culture) and the other two less appealing (through transparency, culture, and reduced pressure to meet unrealistic goals). Figuring out how to do that is a slow and thoughtful process, unlike White Hat's kneejerk jump to a new metric.
 
  
 
===Additional examples of Goodhart's Law===
* The classic example of Goodhart's Law is the {{w|Perverse_incentive#The_original_cobra_effect|Cobra Effect}}: anecdotally, the British colonial government in India paid bounties for dead cobras as a pest-control effort. People quickly realized that more cobras meant more bounties to collect, and began actively breeding cobras.
* School test scores are intended as a metric for how well a school is teaching its students. When that becomes an incentivized target, schools are forced to design their curriculum around the exams, which can create a rigid system that fails to engage students and teachers. In extreme cases, this can motivate decisions to remove underperforming students from school districts, or encourage teachers to allow or even facilitate cheating.
* A hospital measures inpatient ''Length of Stay'' because shorter stays save money and free up beds for other patients, including those waiting in the ER. But if improperly incentivized, this metric may encourage doctors to discharge patients too soon, which not only puts patients at risk but can also result in costly re-admissions.
* A call center measures the number of calls handled per hour as a measure of worker productivity. Over-incentivizing this metric can drive workers to rush through calls, terminating them as quickly as possible, which leads to short, frustrating interactions with customers.
 
* The hypothetical {{w|Instrumental convergence#Paperclip maximizer|Paperclip Maximizer}} concept demonstrates how having a seemingly benign metric as a goal might still result in almost unlimited adverse effects, if unchecked.
 
  
 
==Transcript==
