Editing 2610: Assigning Numbers

{{comic
| number    = 2610
| date      = April 22, 2022
| title     = Assigning Numbers
| image     = assigning_numbers.png
| titletext = Gödel should do an article on which branches of math have the lowest average theorem number.
}}

==Explanation==
{{incomplete|Created by a DATA YOU CAN DO *MATH* ON - Please change this comment when editing this page. Do NOT delete this tag too soon.}}

Kurt Gödel introduced {{w|Gödel numbering}} in the proof of his landmark {{w|incompleteness theorems}}. The scheme numbers twelve basic arithmetic and logical symbols from one to twelve, then uses these numbers together with prime numbers to write any logical or mathematical statement as a single number. This made it possible to express statements about mathematics within mathematics itself. Gödel then constructed a statement that essentially says "this statement has no proof". If the statement could be proved, it would be false, and mathematics would be inconsistent because it had proved a false statement. The only other possibility is that the statement is true but has no mathematical proof, so mathematics is incomplete. Gödel's theorem led to a fundamental reckoning in the world of mathematics when it was published.
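
The prime-power encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not Gödel's exact construction: the symbol table <code>SYMBOLS</code> and the function names are assumptions chosen for the example, and variables are left out entirely. The i-th symbol of a formula becomes the exponent of the i-th prime, so the formula can be recovered by factoring.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

# Illustrative symbol table (an assumption): twelve symbols numbered 1 to 12.
SYMBOLS = ["~", "v", ">", "E", "=", "0", "s", "(", ")", ",", "+", "*"]
CODE = {sym: i for i, sym in enumerate(SYMBOLS, start=1)}

def godel_number(formula):
    """Encode a string of symbols as a product of prime powers."""
    number = 1
    for p, sym in zip(primes(), formula):
        number *= p ** CODE[sym]
    return number

def decode(number):
    """Recover the symbol string by reading off the prime exponents."""
    out = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= len(SYMBOLS):
            raise ValueError("not a valid encoding under this table")
        out.append(SYMBOLS[exponent - 1])
    return "".join(out)

print(godel_number("0=0"))  # 2**6 * 3**5 * 5**6 = 243000000
print(decode(243000000))    # 0=0
```

Even the three-symbol formula <code>0=0</code> already maps to a nine-digit number, which hints at how quickly these values grow.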

{{w|Data science}} tries to extract knowledge and insights from noisy data. The comic expresses the irony that the mechanism underlying one of the most profound theorems of 20th-century mathematics is also used to implement all bad data science. While it's possible to assign numeric values to random pieces of data, these numbers are generally not meaningful enough to compute with and draw inferences from. Statistical analysis is generally meaningful only on actual measurements, not on arbitrarily assigned values.

Machine learning algorithms, which are commonly used by data scientists, typically require all their inputs to be numerical. However, many datasets contain categorical features (e.g. the description of a piece of furniture: chair, table, ...). Data scientists therefore use encoding techniques to convert these categorical features to a numerical form so they can be used as inputs to a machine learning model. For instance, label encoding consists of arbitrarily assigning an integer to each category (chair=0, table=1, ...), which may appear meaningless to most observers.
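
As a rough sketch of label encoding, the following Python assigns integers to categories in order of first appearance. The function names and the furniture list are hypothetical, written for this example rather than taken from any library. A one-hot variant is included for comparison, since it avoids implying an order between categories.

```python
def label_encode(values):
    """Map each distinct category to an integer in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping

def one_hot_encode(values, mapping):
    """Give each category its own 0/1 column instead of a single integer."""
    width = len(mapping)
    return [[1 if mapping[value] == i else 0 for i in range(width)]
            for value in values]

furniture = ["chair", "table", "sofa", "chair"]
labels, mapping = label_encode(furniture)

print(labels)                             # [0, 1, 2, 0]
print(sum(labels) / len(labels))          # 0.75
print(one_hot_encode(furniture, mapping))
```

The mean of the labels is 0.75, but "three quarters of the way from a chair to a table" describes nothing in the data, which is the kind of math the comic is poking fun at.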

The title text suggests that Gödel should perform such an analysis on different branches of mathematics, by calculating the average Gödel number of each field's theorems. This is nonsensical for a number of reasons:
:1) Gödel is long dead, so he can't write an article{{citation needed}};
:2) Gödel numbers grow very large very quickly, and depend heavily on the specific values assigned to each logical operator. Therefore the results could be manipulated simply by changing the numbering order of each operator;
:3) It may be very hard to gather all theorems in a field, or even a representative sample;
:4) Different fields of science, like biology or human behaviour, may not be able to write their theorems in the mathematical language of Gödel's incompleteness theorem.
If anyone were to attempt this form of analysis, it would be an example of the bad data science described in the caption.
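
The second objection above can be made concrete with a small sketch. Everything here is an assumption made for illustration: the two "fields" are toy symbol strings rather than real theorems, and the two symbol tables differ only in swapping the codes for <code>=</code> and <code>+</code>. Under those assumptions, the field with the lowest average Gödel number changes when the numbering changes.

```python
from fractions import Fraction

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    n = 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def godel_number(formula, code):
    """Encode a symbol string as a product of prime powers under `code`."""
    number = 1
    for p, sym in zip(first_primes(len(formula)), formula):
        number *= p ** code[sym]
    return number

def average_godel_number(theorems, code):
    """Exact mean of the encodings of a list of symbol strings."""
    total = sum(godel_number(t, code) for t in theorems)
    return Fraction(total, len(theorems))

# Illustrative symbol table (an assumption), numbered 1 to 12.
SYMBOLS = ["~", "v", ">", "E", "=", "0", "s", "(", ")", ",", "+", "*"]
code_a = {sym: i for i, sym in enumerate(SYMBOLS, start=1)}

# Second table: identical except that "=" and "+" trade numbers.
code_b = dict(code_a)
code_b["="], code_b["+"] = code_a["+"], code_a["="]

# Toy stand-ins for two branches of mathematics (not real theorems).
fields = {
    "equalities": ["0=0", "s0=s0"],
    "sums": ["0+0", "s0+s0"],
}

def lowest_field(code):
    """Name of the field with the smallest average Gödel number."""
    return min(fields, key=lambda name: average_godel_number(fields[name], code))

print(lowest_field(code_a))  # equalities
print(lowest_field(code_b))  # sums
```

Because the ranking follows from the arbitrary symbol numbering rather than from anything about the fields themselves, the "result" says nothing about mathematics.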

==Transcript==

Cueball thinking: If I assign numbers to each of these things, then it becomes *data*, and I can do *math* on it!

Caption: The same basic idea underlies Gödel's Incompleteness Theorem and all bad data science.


{{incomplete transcript|Do NOT delete this tag too soon.}}

{{comic discussion}}
@@ Line 8: / Line 8: @@
 ==Explanation==
+{{incomplete|Created by a DATA YOU CAN DO *MATH* ON - Please change this comment when editing this page. Do NOT delete this tag too soon.}}
-'''This explanation is by mathematical necessity either incomplete or incorrect.'''
+Kurt Gödel introduced {{w|Gödel numbering}} with his landmark {{w|incompleteness theorems}}. It numbered twelve basic arithmetic and logical operations from one to twelve, and then used this numbering system and prime numbers to create a way to write any logical or mathematical statement as a single number. This made it possible to create statements about mathematics from mathematics. Gödel then introduced a statement that essentially said "this statement has no proof". If the statement could be proved, the statement would be false, there should be no proof, and mathematics would be inconsistent. The only other possibility is that the statement is true without a mathematical proof, and mathematics is incomplete. Gödel's theorem led to a fundamental reckoning in the world of mathematics when it was published.
-[[Cueball]] is falling into a common trap, because a little knowledge is a dangerous thing. Faced with some sort of information, of an unknown kind but seemingly not intrinsically mathematical in nature, he has decided that one possible way to proceed is to somehow translate everything into values which can be combined and compared numerically.
+{{w|Data science}} tries to extract knowledge and insights from noisy data. The comic expresses the irony that this mechanism that underlies one of the most profound theorems of 20th century mathematics is also used to implement all bad data science. While it's possible to assign numeric values to random pieces of data, these numbers are generally not meaningful enough to compute with and draw inferences from. It is generally only possible to perform statistical analysis on actual measurements, not arbitrarily-assigned values.
-This is a very common thing to do, in fields as diverse as {{w|computational linguistics}} or {{w|sports analytics}}, and can be a powerful tool for understanding and learning new things about a subject as {{w|Data science}} tries to extract knowledge and insights from potentially noisy and disordered facts. But it is also used to implement bad science by using incorrect or misguided ideas about how to represent the source material. While it's possible to casually assign numeric values to random pieces of data, these numbers are generally not meaningful enough to compute with and draw any useful inferences from. It is generally possible to perform statistical analysis only on actual measurements, not on what may effectively be arbitrarily-assigned values.
+Machine learning algorithms, which are commonly used by data scientists, typically require all their inputs to be numerical. However, most datasets contains categorical features (e.g. the description of a piece of furniture: chair, table, ...). Data scientists therefore use encoding techniques to convert these categorical features to a numerical form so they can be used as inputs to a machine learning model. For instance, label encoding consists in arbitrarily assigning an integer to a category (chair=0, table=1, ...) which may appear meaningless to most observers.
-Machine learning algorithms, which are commonly used by data scientists, typically require all their inputs to be numerical. However, most datasets contains categorical features (e.g. the description of a piece of furniture: chair, table, ...). Data scientists therefore use encoding techniques to convert these categorical features to a numerical form so they can be used as inputs to a machine learning model. For instance, label encoding consists of arbitrarily assigning an integer to a category (chair=0, table=1, ...) which may appear meaningless to most observers. In various cases, they may be right.
-So, as well as being the mechanism that underlies one of the most profound theorems of 20th century mathematics, it can be mis-used for all kinds of bad or misguided science. From Cueball's attitude, it is far from clear that his attempt will reliably translate his project into a numerical system, nor that his attempt to "do math on it!" will be any more competent.
-One of the major characters who looked at the concept is Kurt Gödel. He introduced the idea of {{w|Gödel numbering}} with his landmark {{w|incompleteness theorems}}. In it a unique natural number is assigned to each axiom, statement, and proof, which might otherwise be difficult to accurately process in any other kind of approach. Instead, it is now possible to create metamathematical statements in the language of mathematics.
-This allowed Gödel to make the statement "This statement cannot be proven based on the axioms provided" in a mathematically rigorous way. A simple proof by contradiction shows that the statement cannot be false, and therefore (in most logical systems) must be true. The proof goes as follows: 1. Assume that "This statement cannot be proven from the axioms" (Call this statement G) is false.<ref>Call this assumption A.</ref> 2. Therefore G can be proven from the axioms.<ref>Because the negation of the negation is an affirmation.  Based only on A.</ref> 3. The axioms exist.<ref>Call this assumption B</ref> 4. Therefore, G is true.<ref>via {{w|Modus ponens}} applied to 2 and 3, based on A and B</ref> 5. Therefore, G and also not G.<ref>via {{w|Conjunction introduction}} applied to 1 and 4, based on A and B</ref> 6.  This is a contradiction, and therefore A (that is, 'not G') or B (ZFC) must be wrong. We are not willing to sacrifice assumption B, so we must conclude that A is false, given B.<ref>{{w|Reductio ad absurdum}} applied to 1,3, and 5</ref> 7.  Therefore, G.
-===Explanatory footnotes for the above===
-<references />
-Notice that the truth of Gödel's statement does not depend on any particular set of axioms, and adding axioms (such as "Gödel's particular statement is true") only opens up new iterations of the statement which cannot be proven based on the expanded set of axioms (A statement such as "All statements of a similar nature to Gödel's particular statement" is not precise enough to serve as an axiom.).  As such, with a little more legwork, it can be proven that any logical system robust enough to accommodate arithmetic must necessarily contain facts that are true within the system but cannot be proven or disproven within the system.  The importance of this result cannot be understated, as it upended the entire philosophy of mathematics.  {{w|David Hilbert}}'s famous proclamation "We must know, we will know" is simply incorrect. ... Either that, or (ironically) Gödel used an "inconsistent" or "incomplete" system to produce his result.
 The title text suggests that Gödel should perform such an analysis on different branches of mathematics, by calculating the average of all the fields' theorems' Gödel numbers. This is nonsensical for a number of reasons:
-:1) Gödel is long dead, and dead people can't write articles;{{Dubious}}<sup> - see [[599: Apocalypse]]</sup>
+:1) Gödel is long dead, so he can't write an article{{citation needed}};
 :2) Gödel numbers grow very large very quickly, and depend heavily on the specific values assigned to each logical operator. Therefore the results could be manipulated simply by changing the numbering order of each operator;
-:3) It may be very hard to gather all theorems in a field, or even a representative sample
+:3) It may be very hard to gather all theorems in a field, or even a representative sample;
+:4) Different fields of science, like biology or human behaviour, may not be able to write their theorems in the the mathematical language of Gödel's incompleteness theorem
 If anyone were to attempt this form of analysis, it would be an example of the bad data science described in the caption.
 ==Transcript==
-:[Cueball holds a hand up to his chin while he ponders the contents of what may be a whiteboard. There are five general lines of unreadable scribbling on the board, and between the two bottom lines, there is a square frame to the right with another scribble to the left. Cueball's thoughts are shown above him in a large thought bubble.]
-:Cueball's thinking: If I assign numbers to each of these things, then it becomes '''''data''''', and I can do '''''math''''' on it!
-:[Caption  beneath the panel:]
+Cueball thinking: If I assign numbers to each of these things, then it becomes *data*, and I can do *math* on it!
-:The same basic idea underlies Gödel's Incompleteness Theorem and all bad data science.
+Caption: The same basic idea underlies Gödel's Incompleteness Theorem and all bad data science.
+{{incomplete transcript|Do NOT delete this tag too soon.}}
 {{comic discussion}}
-[[Category:Math]]
-[[Category:Comics featuring Cueball]]
-[[Category:Logic]]