Editing 2610: Assigning Numbers
Latest revision | Your text | ||
Line 8: | Line 8: | ||
==Explanation== | ==Explanation== | ||
+ | {{incomplete|Created by a DATA YOU CAN DO *MATH* ON - Please change this comment when editing this page. Do NOT delete this tag too soon.}} | ||
− | + | Kurt Gödel introduced {{w|Gödel numbering}} in the proof of his landmark {{w|incompleteness theorems}}. The scheme assigns a number to each basic arithmetic and logical symbol (twelve of them, numbered one to twelve, in a common presentation) and then uses prime numbers to write any logical or mathematical statement as a single number. This made it possible to express statements about mathematics within mathematics itself. Gödel then constructed a statement that essentially says "this statement has no proof". If the statement could be proved, it would be false, and mathematics would be inconsistent. The only other possibility is that the statement is true but has no proof, in which case mathematics is incomplete. Gödel's theorem led to a fundamental reckoning in the world of mathematics when it was published. | |
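The prime-power encoding described above can be sketched in a few lines of Python. This is only an illustration: the symbol codes in <code>SYMBOL_CODES</code> are made up for the example and are not Gödel's original assignment, and the helper names are hypothetical. A formula whose symbols have codes c1, c2, c3, ... is encoded as 2^c1 × 3^c2 × 5^c3 × ..., and unique prime factorization lets the formula be recovered from the number.

```python
from itertools import count

# Hypothetical symbol codes for illustration; not Gödel's original assignment.
SYMBOL_CODES = {"0": 1, "s": 2, "=": 3, "+": 4, "(": 5, ")": 6}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short formulas)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def godel_number(formula):
    """Encode a symbol string as 2**c1 * 3**c2 * 5**c3 * ..."""
    result = 1
    for p, symbol in zip(primes(), formula):
        result *= p ** SYMBOL_CODES[symbol]
    return result


def decode(number):
    """Recover the symbol string by reading off the prime exponents."""
    by_code = {code: sym for sym, code in SYMBOL_CODES.items()}
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(by_code[exponent])
    return "".join(symbols)


print(godel_number("0=0"))  # 2**1 * 3**3 * 5**1 = 270
print(decode(270))          # 0=0
```

Even this three-symbol formula needs a three-digit number, which hints at how quickly Gödel numbers grow for longer statements.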
− | + | {{w|Data science}} tries to extract knowledge and insights from noisy data. The comic expresses the irony that the mechanism underlying one of the most profound theorems of 20th-century mathematics also underlies all bad data science. While it is possible to assign numeric values to arbitrary pieces of data, these numbers are generally not meaningful enough to compute with and draw inferences from. Statistical analysis is generally only meaningful on actual measurements, not on arbitrarily assigned values. | |
− | + | Machine learning algorithms, which are commonly used by data scientists, typically require all their inputs to be numerical. However, most datasets contain categorical features (e.g. the description of a piece of furniture: chair, table, ...). Data scientists therefore use encoding techniques to convert these categorical features to a numerical form so they can be used as inputs to a machine learning model. For instance, label encoding consists of arbitrarily assigning an integer to each category (chair=0, table=1, ...), which may appear meaningless to most observers. | |
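As a minimal sketch of label encoding, the plain-Python example below maps furniture categories to integers and then takes the mean of the result. The data and the <code>label_encode</code> helper are invented for illustration and do not come from any particular library. The point is that the "average piece of furniture" changes when the arbitrary category order changes, so arithmetic on the labels says nothing about the furniture.

```python
furniture = ["chair", "table", "sofa", "chair", "table"]  # made-up sample data


def label_encode(values, categories):
    """Map each value to the position of its category in `categories`."""
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values]


def mean(numbers):
    return sum(numbers) / len(numbers)


# Two equally arbitrary orderings of the same three categories.
codes_a = label_encode(furniture, ["chair", "table", "sofa"])
codes_b = label_encode(furniture, ["sofa", "chair", "table"])

print(codes_a, mean(codes_a))  # [0, 1, 2, 0, 1] 0.8
print(codes_b, mean(codes_b))  # [1, 2, 0, 1, 2] 1.2
```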
− | |||
− | Machine learning algorithms, which are commonly used by data scientists, typically require all their inputs to be numerical. However, most datasets contains categorical features (e.g. the description of a piece of furniture: chair, table, ...). Data scientists therefore use encoding techniques to convert these categorical features to a numerical form so they can be used as inputs to a machine learning model. For instance, label encoding consists | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
The title text suggests that Gödel should perform such an analysis on different branches of mathematics, by calculating the average of all the fields' theorems' Gödel numbers. This is nonsensical for a number of reasons: | The title text suggests that Gödel should perform such an analysis on different branches of mathematics, by calculating the average of all the fields' theorems' Gödel numbers. This is nonsensical for a number of reasons: | ||
− | :1) Gödel is long dead, | + | :1) Gödel is long dead, so he can't write an article{{citation needed}}; |
:2) Gödel numbers grow very large very quickly, and depend heavily on the specific values assigned to each logical operator. Therefore the results could be manipulated simply by changing the numbering order of each operator; | :2) Gödel numbers grow very large very quickly, and depend heavily on the specific values assigned to each logical operator. Therefore the results could be manipulated simply by changing the numbering order of each operator; | ||
− | :3) It may be very hard to gather all theorems in a field, or even a representative sample | + | :3) It may be very hard to gather all theorems in a field, or even a representative sample; |
+ | :4) Different fields of science, like biology or human behaviour, may not be able to write their theorems in the mathematical language of Gödel's incompleteness theorem. | ||
If anyone were to attempt this form of analysis, it would be an example of the bad data science described in the caption. | If anyone were to attempt this form of analysis, it would be an example of the bad data science described in the caption. | ||
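Point 2 in the list above can be made concrete with a toy calculation. In the sketch below, the two "fields" are just short symbol strings standing in for theorems, and both numberings are hypothetical. Swapping the codes of two symbols is enough to reverse which field has the larger average Gödel number.

```python
PRIMES = [2, 3, 5, 7, 11, 13]  # enough for strings of up to six symbols


def godel_number(formula, codes):
    """Encode a symbol string as 2**c1 * 3**c2 * 5**c3 * ..."""
    result = 1
    for p, symbol in zip(PRIMES, formula):
        result *= p ** codes[symbol]
    return result


def average_godel(formulas, codes):
    return sum(godel_number(f, codes) for f in formulas) / len(formulas)


# Toy stand-ins for the theorems of two fields (not real theorems).
field_x = ["0=0", "0=00"]
field_y = ["s=s", "s=ss"]

# Two hypothetical numberings that differ only in the codes of "0" and "s".
numbering_a = {"0": 1, "=": 2, "s": 3}
numbering_b = {"0": 3, "=": 2, "s": 1}

for name, codes in [("A", numbering_a), ("B", numbering_b)]:
    print(name, average_godel(field_x, codes), average_godel(field_y, codes))
```

Under numbering A, field X has the smaller average; under numbering B, it has the larger one, even though the strings themselves never changed.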
==Transcript== | ==Transcript== | ||
− | |||
− | |||
− | : | + | Cueball thinking: If I assign numbers to each of these things, then it becomes *data*, and I can do *math* on it! |
− | :The same basic idea underlies Gödel's Incompleteness Theorem and all bad data science. | + | |
+ | Caption: The same basic idea underlies Gödel's Incompleteness Theorem and all bad data science. | ||
+ | |||
+ | |||
+ | {{incomplete transcript|Do NOT delete this tag too soon.}} | ||
{{comic discussion}} | {{comic discussion}} | ||
− | |||
− | |||
− |