2435: Geothmetic Meandian
| titletext = Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.
}}

==Explanation==
There are a number of different ways to identify the '{{w|average}}' value of a series of values. The most common unweighted methods are the {{w|median}} (the central value of the sorted list if there is an odd number of values, or the value halfway between the two middle values if there is an even number) and the {{w|arithmetic mean}} (add all the numbers up, then divide by how many numbers there are). The {{w|geometric mean}} is less well known but works similarly to the arithmetic mean: to take the geometric mean of ''n'' values, multiply them together and then take the ''n''th root of the product. For a list of identical values it returns that same value, just as the arithmetic mean does, but it responds differently when the values vary. Equivalently, the geometric mean can be computed by taking the arithmetic mean of the logarithms of the values and then exponentiating the result.
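As a minimal sketch in Python (the language of the code later on this page, assuming Python 3.8 or later for `math.prod`), the three averages for the comic's input can be computed directly; the variable names are illustrative only. The logarithm line shows the "average the logs, then exponentiate" route giving the same geometric mean.

```python
import math
from statistics import median

values = [1, 1, 2, 3, 5]   # the comic's input
n = len(values)

# Arithmetic mean: add everything up, divide by how many values there are.
arithmetic = sum(values) / n

# Geometric mean: multiply everything together, then take the n-th root.
geometric = math.prod(values) ** (1 / n)

# The same geometric mean via logarithms: average the logs, then exponentiate.
geometric_via_logs = math.exp(sum(math.log(v) for v in values) / n)

# Median: middle of the sorted list (mean of the two middle values if n is even).
middle = median(values)

print(arithmetic, geometric, geometric_via_logs, middle)
```

For (1, 1, 2, 3, 5) this gives an arithmetic mean of 2.4, a geometric mean of about 1.974 (by either route) and a median of 2.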

The arithmetic mean, geometric mean and {{w|harmonic mean}} (not used in the comic) are collectively known as the {{w|Pythagorean means}}. They are special cases of a more general formula, the {{w|generalized mean}} (or power mean), which also covers other means such as the quadratic and cubic means.
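A sketch of the generalized (power) mean, assuming positive inputs; `power_mean` is a hypothetical helper name, not a library function. Setting the exponent p to 1, 0 and -1 recovers the arithmetic, geometric and harmonic means, and p = 3 gives the cubic mean.

```python
import math

def power_mean(values, p):
    """Generalized (power) mean of positive values with exponent p.

    p = 1: arithmetic, p = -1: harmonic, p = 3: cubic.
    p = 0 is defined by its limit, which is the geometric mean.
    """
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1 / p)

values = [1, 1, 2, 3, 5]
for name, p in [("harmonic", -1), ("geometric", 0), ("arithmetic", 1),
                ("quadratic", 2), ("cubic", 3)]:
    print(f"{name:>10} (p={p:>2}): {power_mean(values, p):.6f}")
```

Because the power mean never decreases as p grows, the printed values come out in increasing order: harmonic ≤ geometric ≤ arithmetic ≤ quadratic ≤ cubic.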

{{w|Outlier}}s and biases within the original sample can distort any attempt to boil a set of values down to a single 'average', and the choice of method may produce a misleading value that exaggerates or suppresses the significance of any anomalies.
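To illustrate how much the choice of average matters, the sketch below (using Python's standard `statistics` module, Python 3.8 or later) replaces the 5 in the comic's input with a hypothetical outlier of 500, as if from a data-entry error.

```python
from statistics import geometric_mean, mean, median

clean = [1, 1, 2, 3, 5]            # the comic's input
with_outlier = [1, 1, 2, 3, 500]   # hypothetical data-entry error: 5 typed as 500

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean={mean(data):.3f}  "
          f"geomean={geometric_mean(data):.3f}  median={median(data)}")
```

The single bad value drags the arithmetic mean from 2.4 to 101.4 and the geometric mean from about 1.97 to about 4.96, while the median stays at 2.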

<!-- Either here or after the next paragraph, demonstrate how (1,1,2,3,5) resolves in each individual method, perhaps? -->
In this depiction, the three named methods of averaging are combined into a single function, F, that produces a sequence of three values, one for each method. Since that output is itself a series of values, Randall suggests that it is ideally suited to being ''itself'' averaged in the same way: not just once, but as many times as it takes to narrow down to three values that are very close to one another.
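The iteration itself can be sketched in a few lines of Python; `F` is a hypothetical function name mirroring the comic's notation, and the standard `statistics` module (Python 3.8 or later) supplies the three averages. Printing each step should reproduce the rows F0 to F10 of the table below.

```python
from statistics import geometric_mean, mean, median

def F(values):
    """One application of the comic's F: arithmetic mean, geometric mean, median."""
    return [mean(values), geometric_mean(values), median(values)]

steps = [[1, 1, 2, 3, 5]]    # F0: the comic's input
for _ in range(10):          # compute F1 .. F10
    steps.append(F(steps[-1]))

for n, row in enumerate(steps):
    print(f"F{n}:", ", ".join(f"{x:.9f}" for x in row))
```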
The comic's value of 2.089 for GMDN(1,1,2,3,5) can be verified by applying F repeatedly:
{| class="wikitable"
! Step !! Arithmetic mean !! Geometric mean !! Median
|-
| F0 || colspan="3" | 1, 1, 2, 3, 5 (input)
|-
| F1 || 2.4 || 1.974350486 || 2
|-
| F2 || 2.124783495 || 2.116192461 || 2
|-
| F3 || 2.080325319 || 2.079536819 || 2.116192461
|-
| F4 || 2.0920182 || 2.091948605 || 2.080325319
|-
| F5 || 2.088097374 || 2.088090133 || 2.091948605
|-
| F6 || 2.089378704 || 2.089377914 || 2.088097374
|-
| F7 || 2.088951331 || 2.088951244 || 2.089377914
|-
| F8 || 2.089093496 || 2.089093487 || 2.088951331
|-
| F9 || 2.089046105 || 2.089046103 || 2.089093487
|-
| F10 || '''2.089061898''' || '''2.089061898''' || '''2.089046105'''
|}
The function GMDN in the comic is not quite properly defined. F takes a list of numbers and returns a list of three numbers, so applying F repeatedly always yields another three-element vector, on which the arithmetic mean, geometric mean and median can be computed again. GMDN, however, is shown returning a single real number rather than a vector, so the definition is missing a final step that returns one of the components (which, in the limit, are all equal). Each row of the table above shows F<sub>n</sub>, made up of the arithmetic mean, geometric mean and median of the previous row, with the sequence (1,1,2,3,5) as the initial F<sub>0</sub>. Since all three functions are forms of averaging, and repeatedly composing averages acts as a smoothing operation, GMDN converges to a single value for any set of positive starting values. This can be compared to a heat equation approaching equilibrium.
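One way to patch the definition, written here as an illustration rather than as Randall's own notation, is to take the limit of repeated applications of F and then project onto a single component. Below, F<sup>k</sup> means F applied k times, and π<sub>1</sub> is an assumed notation for "take the first component"; any component would do, since all three converge to the same value.

```latex
\begin{aligned}
F(x_1, \dots, x_n) &= \left( \frac{x_1 + x_2 + \cdots + x_n}{n},\ \sqrt[n]{x_1 x_2 \cdots x_n},\ \operatorname{median}(x_1, \dots, x_n) \right) \\
\operatorname{GMDN}(x_1, \dots, x_n) &= \lim_{k \to \infty} \pi_1\!\left( F^{k}(x_1, \dots, x_n) \right)
\end{aligned}
```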
The caption's stats tip suggests that this will save you the trouble of committing to the 'wrong' average, as the iteration gradually shaves down any 'outlier average' that is unduly affected by anomalies in the original inputs. There is no danger of the values diverging, since for positive inputs all three averages stay within the interval spanned by the input values (and, unless all of the inputs are equal, the two means stay strictly inside that interval).
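The containment claim can be checked numerically with a short sketch (again assuming Python 3.8 or later for `statistics.geometric_mean`). At every step it asserts that all three outputs lie in the previous [min, max] interval, with the two means strictly inside, and records the width of the new interval.

```python
from statistics import geometric_mean, mean, median

def F(values):
    """Arithmetic mean, geometric mean and median of positive values."""
    return [mean(values), geometric_mean(values), median(values)]

values = [1, 1, 2, 3, 5]
widths = []
for step in range(1, 11):
    new = F(values)
    lo, hi = min(values), max(values)
    # All three outputs stay inside the previous [min, max] interval ...
    assert all(lo <= x <= hi for x in new)
    # ... and, since the inputs are not all equal, both means lie strictly inside it.
    assert lo < new[0] < hi and lo < new[1] < hi
    widths.append(max(new) - min(new))
    values = new

print(widths)
```

The width can never grow, because each new interval is contained in the previous one; after ten steps it is already below 0.0001.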
The title text may also be a sly reference to an actual mathematical theorem, namely that if one performs this procedure on two numbers using only the arithmetic mean and the harmonic mean, the result converges to their geometric mean. Randall suggests that the (non-Pythagorean) median, which does not have such good mathematical properties with respect to convergence, is in fact the secret sauce in his definition.
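For two positive numbers, the arithmetic-harmonic iteration can be sketched as below; `arithmetic_harmonic_mean` is a hypothetical helper. The key step is that the product of the arithmetic and harmonic means of a and b equals ab, so the product is preserved while the two values squeeze together, forcing the limit to be the geometric mean √(ab).

```python
import math

def arithmetic_harmonic_mean(a, b, tol=1e-12, max_steps=100):
    """Iterate (a, b) -> (arithmetic mean, harmonic mean) for positive a, b.

    The product is invariant: ((a + b) / 2) * (2 * a * b / (a + b)) == a * b,
    so the two values close in on sqrt(a * b), the geometric mean.
    """
    for _ in range(max_steps):
        if abs(a - b) <= tol * max(a, b):
            break
        a, b = (a + b) / 2, 2 * a * b / (a + b)
    return a

print(arithmetic_harmonic_mean(1, 5), math.sqrt(5))
```

For example, starting from 1 and 5 the loop settles on √5 ≈ 2.236 after a handful of steps.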
There does exist an {{w|arithmetic-geometric mean}}, which is defined in the same way as this except that it iterates only the arithmetic and geometric means of two numbers; it has practical uses, for example in computing elliptic integrals and in fast algorithms for calculating π. In some ways GMDN is also philosophically similar to the {{w|truncated mean}} (the extremes of the value range, e.g. the highest and lowest 10% of values, are discarded and not counted) or the {{w|Winsorized mean}} (instead of being discarded, those extreme values are replaced by the cutoff values they lie beyond, so that they still count as 'edge' values), only with a strange dilution-and-compromise approach rather than one where values are culled or neutered just for being unexpectedly different from most of the other data.
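The sketch below shows the arithmetic-geometric mean alongside simple truncated and Winsorized means. The helper names are hypothetical, and for simplicity `k` is a count of values trimmed from each end rather than a percentage (with five values, k = 1 trims 20% from each end); it assumes 2k is smaller than the number of values.

```python
import math

def agm(a, b, tol=1e-15, max_steps=100):
    """Arithmetic-geometric mean: iterate (a, b) -> ((a + b) / 2, sqrt(a * b))."""
    for _ in range(max_steps):
        if abs(a - b) <= tol * max(a, b):
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def truncated_mean(values, k):
    """Drop the k smallest and k largest values, then average the rest."""
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    """Clamp the k smallest/largest values to the nearest kept value, then average."""
    s = sorted(values)
    lo, hi = s[k], s[len(s) - k - 1]
    clamped = [min(max(v, lo), hi) for v in s]
    return sum(clamped) / len(clamped)

data = [1, 1, 2, 3, 5]
print(agm(1, 5), truncated_mean(data, 1), winsorized_mean(data, 1))
```

For (1, 1, 2, 3, 5) with k = 1, both the truncated and the Winsorized mean come out at 2, while agm(1, 5) is about 2.604.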
The following Python code (inefficiently) implements the above algorithm:
<pre>
from functools import reduce
from statistics import median


def f(*xs):
    """One application of F: arithmetic mean, geometric mean and median of xs."""
    n = len(xs)
    arithmetic = sum(xs) / n
    geometric = reduce(lambda a, b: a * b, xs) ** (1 / n)
    return [arithmetic, geometric, median(xs)]


# Enough iterations to reach the 1e-8 tolerance below (10 is not).
max_number_of_iterations = 100
l0 = [1, 1, 2, 3, 5]
l = l0
for iterations in range(max_number_of_iterations):
    fst, *rest = l
    # Stop once all three averages agree to within 1e-8.
    if all(abs(r - fst) < 0.00000001 for r in rest):
        break
    l = f(*l)
print(l[0], iterations)
</pre>
}
The input sequence (1,1,2,3,5) chosen by Randall is also the start of the {{w|Fibonacci sequence}}. It may have been chosen because the Fibonacci sequence also has a convergence property: the ratio of two consecutive numbers in the sequence approaches the [https://en.wikipedia.org/wiki/Golden_ratio#Relationship_to_Fibonacci_sequence golden ratio] as the sequence goes on.
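A quick numerical sketch of that convergence, continuing the comic's 1, 1, 2, 3, 5 for twenty more terms and comparing each ratio with (1 + √5)/2:

```python
import math

golden_ratio = (1 + math.sqrt(5)) / 2

a, b = 1, 1                  # first two terms: 1, 1, 2, 3, 5, ...
ratios = []
for _ in range(20):
    a, b = b, a + b          # advance one term
    ratios.append(b / a)     # ratio of consecutive terms

for n, r in enumerate(ratios, start=1):
    print(n, r, abs(r - golden_ratio))
```

The ratios alternate above and below the golden ratio (2, 1.5, 1.667, 1.6, 1.625, ...) while closing in on it, at about 1.618.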

==Transcript==
{{incomplete transcript|Do NOT delete this tag too soon.}}

F(x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub>) = ( (x<sub>1</sub> + x<sub>2</sub> + ... + x<sub>n</sub>)/n [bracket: arithmetic mean], <sup>n</sup>√(x<sub>1</sub>x<sub>2</sub>...x<sub>n</sub>) [bracket: geometric mean], x<sub>(n+1)/2</sub> [bracket: median] )

GMDN(x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub>) = F(F(F(...F(x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub>)...))) [bracket: geothmetic meandian]

GMDN(1, 1, 2, 3, 5) ≈ 2.089

Caption: Stats tip: If you aren't sure whether to use the mean, median, or geometric mean, just calculate all three, then repeat until it converges
{{comic discussion}}
[[Category:Math]]
[[Category:Statistics]]