Editing 2435: Geothmetic Meandian



The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 6: Line 6:
 
| titletext = Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.
 
| titletext = Pythagorean means are nice and all, but throwing the median in the pot is really what turns this into random forest statistics: applying every function you can think of, and then gradually dropping the ones that make the result worse.
 
}}
 
}}
 +
 
==Explanation==
 
==Explanation==
This is another one of [[Randall|Randall's]] [[:Category:Tips|Tips]], this time a stats tip. This came as the first tip comic after the statistics tip in [[2400: Statistics]].
+
{{incomplete|Created by a MEAN MEDIAN. What, actually, is the joke? Do NOT delete this tag too soon.}}
  
There are a number of different ways to identify the "{{w|average}}" value of a series of values, the most common unweighted methods being the {{w|median}} (take the central value from the ordered list of values if there is an odd number of them, or the value half-way between the two central values if there is an even number) and the {{w|arithmetic mean}} (add all the numbers up, divide by the number of numbers). The {{w|geometric mean}} is less well-known but works similarly to the arithmetic mean: the geometric mean of ''n'' positive numbers is the ''n''th root of the product of those numbers. If all of the numbers in a sequence are identical, then its arithmetic mean, geometric mean and median will all be equal to that common value. However, if the sequence is not constant, then {{w|Inequality_of_arithmetic_and_geometric_means#Geometric_interpretation|the arithmetic mean will be greater than the geometric mean}}, and the median may differ from both of those means.
+
Geothm means "counting earths" (From Ancient Greek γεω- (geō-), combining form of γῆ (gê, “earth”) and ἀριθμός arithmos, 'counting').  Geothmetic means "art of Geothming" based on the etymology of Arithmetic (from Ancient Greek ἀριθμητική (τέχνη) (arithmētikḗ (tékhnē), “(art of) counting”). This is an exciting new terminology that is eminently suitable for modern cosmology & high energy physics - particularly when doing math on the multiverse.  However, it is unlikely this etymology is related to the term "geothmetic meandian" as coined by Randall, as it can be more simply explained as a portmanteau of the three averages in its construction: '''geo'''metric mean, ari'''thmetic mean''', and me'''dian'''.
  
The geometric mean, arithmetic mean, and the {{w|harmonic mean}} (not shown) are collectively known as the {{w|Pythagorean means}}. They are special cases of a more generalized mean formula that extends to arbitrarily many other variants (the cubic mean, etc.).
+
There are a number of different ways to identify the '{{w|average}}' value of a series of values, the most common unweighted methods being the {{w|median}} (take the central value from the ordered list of values if there is an odd number of them, or the value half-way between the two central values if there is an even number) and the {{w|arithmetic mean}} (add all the numbers up, divide by the number of numbers). The {{w|geometric mean}} is less well-known but works similarly to the arithmetic mean. To take the geometric mean of ''n'' values, they are multiplied together and then the ''n''th root is taken. For identical values this returns that single value, just as the arithmetic calculation (serial addition followed by division) would, but it reacts differently to any perturbed values. Equivalently, you can take the arithmetic mean of the logarithms of the values and then exponentiate the result.
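The logarithm route can be checked against the direct definition with a minimal sketch. The function names here are illustrative only (they do not come from the comic), and positive inputs are assumed:

```python
import math

def geometric_mean_direct(values):
    # Multiply all values, then take the n-th root (positive inputs assumed).
    return math.prod(values) ** (1 / len(values))

def geometric_mean_via_logs(values):
    # Arithmetic mean of the logarithms, then exponentiate the result.
    return math.exp(sum(math.log(v) for v in values) / len(values))

values = [1, 1, 2, 3, 5]
direct = geometric_mean_direct(values)
via_logs = geometric_mean_via_logs(values)
```

Both routes should agree up to floating-point rounding; the logarithm form is usually preferred for long lists, where the raw product could overflow.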
  
{{w|Outlier}}s and internal biases within the original sample can distort any attempt to boil a set of values down into a single 'average'. Depending on which method you choose, the resulting value may be misleading, exaggerating or suppressing the significance of any blips.
+
The geometric mean, arithmetic mean and {{w|harmonic mean}} (not shown) are collectively known as the {{w|Pythagorean means}}. They are special cases of a more generalised mean formula that extends to arbitrarily many other variants (the cubic mean, etc.).
 +
 
 +
{{w|Outlier}}s and internal biases within the original sample can distort any attempt to boil a set of values down into a single 'average'. Depending on which method you choose, the resulting value may be misleading, exaggerating or suppressing the significance of any blips.
 +
 
 +
<!-- Either here or after the next paragraph, demonstrate how (1,1,2,3,5) resolves in each individual method, perhaps? -->
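As a quick illustration of how (1,1,2,3,5) resolves under each individual method, here is a small sketch using Python's standard <code>statistics</code> module (the variable names are illustrative only):

```python
import statistics

values = [1, 1, 2, 3, 5]

# Arithmetic mean: (1 + 1 + 2 + 3 + 5) / 5
arithmetic = statistics.mean(values)

# Geometric mean: fifth root of 1 * 1 * 2 * 3 * 5 = 30
geometric = statistics.geometric_mean(values)

# Median: middle element of the sorted list
median = statistics.median(values)
```

This gives 2.4 for the arithmetic mean, about 1.9744 for the geometric mean, and 2 for the median, matching the F1 row of the table.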
  
 
In this depiction, the three named methods of averaging are embedded within a single function that produces a sequence of three values - one output for each of the methods. Being a series of values, Randall suggests that this is ideally suited to being ''itself'' subjected to the comparative 'averaging' method. Not just once, but as many times as it takes to narrow down to a sequence of three values that are very close to one another.  
 
In this depiction, the three named methods of averaging are embedded within a single function that produces a sequence of three values - one output for each of the methods. Being a series of values, Randall suggests that this is ideally suited to being ''itself'' subjected to the comparative 'averaging' method. Not just once, but as many times as it takes to narrow down to a sequence of three values that are very close to one another.  
Line 19: Line 24:
 
The value of 2.089 given in the comic for GMDN(1,1,2,3,5) can be verified:
 
The value of 2.089 given in the comic for GMDN(1,1,2,3,5) can be verified:
  
{| border=1 width=100% cellpadding=5 class="wikitable"
+
{|-
!
+
| F0 || 1 || 1 || 2 || 3 || 5  
! Arithmetic mean
 
! Geometric mean
 
! Median
 
 
  |-
 
  |-
! F1
+
  |   || Arithmean || Geomean || Median ||
  | 2.4 || 1.974350486 || 2
 
 
  |-
 
  |-
! F2
+
  | F1 || 2.4 || 1.974350486 || 2
  | 2.124783495 || 2.116192461 || 2
 
 
  |-
 
  |-
! F3
+
  | F2 || 2.124783495 || 2.116192461 || 2
  | '''2.080325319''' || 2.079536819 || 2.116192461
 
 
  |-
 
  |-
! F4
+
  | F3 || 2.080325319 || 2.079536819 || 2.116192461
  | 2.0920182 || 2.091948605 || '''2.080325319'''
 
 
  |-
 
  |-
! F5
+
  | F4 || 2.0920182 || 2.091948605 || 2.080325319
  | '''2.088097374''' || 2.088090133 || 2.091948605
 
 
  |-
 
  |-
! F6
+
  | F5 || 2.088097374 || 2.088090133 || 2.091948605
  | 2.089378704 || 2.089377914 || '''2.088097374'''
 
 
  |-
 
  |-
! F7
+
  | F6 || 2.089378704 || 2.089377914 || 2.088097374
  | '''2.088951331''' || 2.088951244 || 2.089377914
 
 
  |-
 
  |-
! F8
+
  | F7 || 2.088951331 || 2.088951244 || 2.089377914
  | 2.089093496 || 2.089093487 || '''2.088951331'''
 
 
  |-
 
  |-
! F9
+
  | F8 || 2.089093496 || 2.089093487 || 2.088951331
  | '''2.089046105''' || 2.089046103 || 2.089093487
 
 
  |-
 
  |-
  ! F10
+
  | F9 || 2.089046105 || 2.089046103 || 2.089093487
  | 2.089061898 || 2.089061898 || '''2.089046105'''
+
|-
 +
  | F10 || '''2.089061898''' || '''2.089061898''' || '''2.089046105'''
 
  |}
 
  |}
  
The function GMDN in the comic is properly defined in the second row, since F acts on a vector to produce another 3-vector. However, GMDN in the last line is shown to produce a single real number rather than a vector, and is thus missing a final operation that returns a single component. Each row in this table shows Fn(..), composed of the average, geomean and median computed on the previous row, with the sequence {1,1,2,3,5} as the initial F0. While GMDN is not differentiable, due to the median, the process can be interpreted as somewhat similar to a heat equation, which approaches equilibrium through averaging. Interestingly, from F2 onwards the maximum value alternates between the average and the median (highlighted in bold in the table), while the minimum value alternates between the geomean and the median. This holds for many inputs, thus providing the basis for a possible proof-by-induction of convergence on the range (see discussions).
+
The function GMDN in the comic is not properly defined: F acts on a vector to produce a 3-vector, so repeated applications of F will always result in a 3-vector, for which the average, geomean and median can be computed again. However, GMDN is shown to produce a single real number rather than a vector. It is thus missing a final operation that returns one of the components of the vector. Each row shows Fn(..), composed of the average, geomean and median computed on the previous row, with the sequence {1,1,2,3,5} as the initial F0. Since the average, geomean and median are all forms of averaging, and the composition of averages behaves like a smoothing function, the value of GMDN converges to a single value for any set of positive starting values. This can be interpreted as similar to a heat equation, which approaches equilibrium.
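The row-by-row behaviour can be explored with a short sketch. The helper name <code>f</code> and the bookkeeping lists are illustrative assumptions, not part of the comic. The sketch records which component (0 = average, 1 = geomean, 2 = median) holds the largest and smallest value in each of the rows F1 to F8:

```python
import statistics

def f(values):
    # One application of F: (arithmetic mean, geometric mean, median).
    return (
        statistics.fmean(values),
        statistics.geometric_mean(values),
        statistics.median(values),
    )

row = f([1, 1, 2, 3, 5])  # F1
max_positions, min_positions = [], []
for n in range(1, 9):  # inspect F1 .. F8
    max_positions.append(row.index(max(row)))
    min_positions.append(row.index(min(row)))
    row = f(row)  # after the loop, row holds F9
```

For this input the largest value sits in the average for F1 and F2 and then alternates between the median and the average, while the smallest value alternates between the geomean and the median from F1 onwards.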
  
 
The comment in the title text suggests that this will save you the trouble of committing to the 'wrong' analysis, as it gradually shaves down any 'outlier average' that is unduly affected by anomalies in the original inputs. It is a method without any danger of divergence of values, since all three averaging methods stay within the interval covering the input values (and two of them will stay strictly within that interval).
 
The comment in the title text suggests that this will save you the trouble of committing to the 'wrong' analysis, as it gradually shaves down any 'outlier average' that is unduly affected by anomalies in the original inputs. It is a method without any danger of divergence of values, since all three averaging methods stay within the interval covering the input values (and two of them will stay strictly within that interval).
Line 62: Line 56:
 
The title text may also be a sly reference to an actual mathematical theorem, namely that if one performs this procedure only using the arithmetic mean and the harmonic mean, the result will converge to the geometric mean. Randall suggests that the (non-Pythagorean) median, which does not have such good mathematical properties with relation to convergence, is, in fact, the secret sauce in his definition.
 
The title text may also be a sly reference to an actual mathematical theorem, namely that if one performs this procedure only using the arithmetic mean and the harmonic mean, the result will converge to the geometric mean. Randall suggests that the (non-Pythagorean) median, which does not have such good mathematical properties with relation to convergence, is, in fact, the secret sauce in his definition.
  
The question of which mean to use is especially relevant for the arithmetic and harmonic means in the following example.
+
There does exist an {{w|arithmetic-geometric mean}}, which is defined identically to this except that it uses only the arithmetic and geometric means, and sees some use in calculus. In some ways it's also philosophically similar to the {{w|truncated mean}} (extremities of the value range, e.g. the highest and lowest 10%, are ignored as not acceptable and not counted) or {{w|Winsorized mean}} (instead of being ignored, the values are readjusted to the chosen floor/ceiling values that they lie beyond, so they are still effectively counted as 'edge' conditions), only with a strange dilution-and-compromise method rather than one where quantities can be culled or neutered just for being unexpectedly different from most of the other data.
  * Cueball has some US Dollars and wishes to buy Euros. Suppose the bank will exchange US Dollars to Euros at a rate of €5 for $6 (about 0.83333€/$ or 1.20000$/€).
 
  * Megan  has some Euros and wishes to buy US Dollars. Suppose the bank will exchange Euros to US Dollars at a rate of $7 for €6 (about 0.85714€/$ or 1.16667$/€).
 
[[Cueball]] and [[Megan]] decide to complete the exchange between themselves in order to save the {{w|Bid-ask spread}} of the {{w|Exchange rate}} which is the cost the bank imposes on Cueball and Megan for its service as a {{w|Market maker}}.
 
  * Cueball offers to split the difference by averaging the rates €5:$6 and €6:$7 yielding a rate of €71:$84 (about 0.84524€/$ or 1.18310$/€).
 
  * Megan  offers to split the difference by averaging the rates $6:€5 and $7:€6 yielding a rate of €60:$71 (about 0.84507€/$ or 1.18333$/€).
 
In one direction (€/$), Cueball is using the arithmetic mean but Megan is using the harmonic mean, while in the other direction ($/€), Megan is using the arithmetic mean but Cueball is using the harmonic mean. This creates two new exchange rates which are closer together than the original rates, but the new rates are still different from each other. Megan and Cueball can then iterate this process and the rates will converge to the geometric mean of the original rates, namely:
 
  * sqrt((5/6)*(6/7)) = sqrt(5/7) = 0.84515€/$ or
 
  * sqrt((6/5)*(7/6)) = sqrt(7/5) = 1.18322$/€.
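The negotiation above can be sketched in a few lines. The function name and the tolerance are illustrative assumptions. Each round replaces the two rates with their arithmetic and harmonic means, which leaves their product unchanged and drives both toward the geometric mean:

```python
import math

def arithmetic_harmonic_limit(a, b, tol=1e-12):
    # Repeatedly replace (a, b) with their arithmetic and harmonic means.
    # Both right-hand sides are evaluated with the old a and b.
    while abs(a - b) > tol:
        a, b = (a + b) / 2, 2 * a * b / (a + b)
    return a

# Starting rates in euros per dollar: 5/6 and 6/7.
rate = arithmetic_harmonic_limit(5 / 6, 6 / 7)
expected = math.sqrt(5 / 7)  # geometric mean of the two starting rates
```

The first round reproduces the offers in the example (71/84 and 60/71), and later rounds close the remaining gap very quickly.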
 
  
There does exist an {{w|arithmetic-geometric mean}}, which is defined identically to this except that it uses only the arithmetic and geometric means, and sees some use in calculus.  In some ways it's also philosophically similar to the {{w|truncated mean}} (extremities of the value range, e.g. the highest and lowest 10%, are ignored as not acceptable and not counted) or {{w|Winsorized mean}} (instead of being ignored, the values are readjusted to the chosen floor/ceiling values that they lie beyond, so they are still effectively counted as "edge" conditions), only with a strange dilution-and-compromise method rather than one where quantities can be culled or neutered just for being unexpectedly different from most of the other data.
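For comparison, here is a minimal sketch of the truncated and Winsorized means. As a simplifying assumption, it takes a count <code>k</code> of values to cut or clamp at each end rather than a percentage, and the function names are illustrative only:

```python
def truncated_mean(values, k):
    # Drop the k smallest and k largest values, then average the rest.
    # Assumes 2 * k < len(values).
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    # Clamp the k smallest and k largest values to the nearest kept value,
    # then average all of them. Assumes 2 * k < len(values).
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    clamped = [min(max(v, low), high) for v in ordered]
    return sum(clamped) / len(clamped)

values = [1, 1, 2, 3, 5]
trimmed = truncated_mean(values, 1)      # averages 1, 2, 3
winsorized = winsorized_mean(values, 1)  # averages 1, 1, 2, 3, 3
```

With one value handled at each end, both methods return 2.0 for the comic's input.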
+
The input sequence of numbers (1,1,2,3,5) chosen by Randall is also the opening of the {{w|Fibonacci sequence}}.  This may have been selected because the Fibonacci sequence also has a convergent property: the ratio of two adjacent numbers in the sequence approaches the [https://en.wikipedia.org/wiki/Golden_ratio#Relationship_to_Fibonacci_sequence golden ratio] as the length of the sequence approaches infinity.
 
 
The input sequence of numbers (1, 1, 2, 3, 5) chosen by Randall is also the opening of the {{w|Fibonacci sequence}}.  This may have been selected because the Fibonacci sequence also has a convergent property: the ratio of two adjacent numbers in the sequence approaches the [https://en.wikipedia.org/wiki/Golden_ratio#Relationship_to_Fibonacci_sequence golden ratio] as the length of the sequence approaches infinity.
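The convergence of the ratio of adjacent Fibonacci numbers can be seen with a short sketch (the function name is an illustrative assumption):

```python
def fibonacci_ratios(count):
    # Ratios of adjacent Fibonacci numbers: 2/1, 3/2, 5/3, ...
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios

golden_ratio = (1 + 5 ** 0.5) / 2
ratios = fibonacci_ratios(30)
```

The ratios oscillate around the golden ratio (about 1.618) and close in on it within a few dozen terms.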
 
 
 
Here is a table of averages classified by the various methods referenced:
 
 
 
{|border =1 width=100% cellpadding=5 class="wikitable"
 
|+ averages using various methods
 
! Method
 
! Value
 
! Formula
 
|-
 
! Arithmetic
 
| 2.4 || Add all numbers, then divide by the number of terms.
 
|-
 
! Geometric
 
| 1.9743504858348
 
| Multiply all numbers, then take the nth root of the product, where n is the number of terms.
 
|-
 
! Median
 
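| 2 || Sort the numbers, then take the middle value (or the mean of the two middle values if the number of terms is even).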
 
|-
 
! GMDN
 
| 2.089 ||
 
|}
 
  
 
==Transcript==
 
==Transcript==
 +
{{incomplete transcript|Do NOT delete this tag too soon.}}
  
 
F(x1, x2, ..., xn) = ( (x1 + x2 + ... + xn)/n [bracket: arithmetic mean], nth root of (x1·x2·...·xn) [bracket: geometric mean], x_((n+1)/2) [bracket: median] )
 
F(x1, x2, ..., xn) = ( (x1 + x2 + ... + xn)/n [bracket: arithmetic mean], nth root of (x1·x2·...·xn) [bracket: geometric mean], x_((n+1)/2) [bracket: median] )
Line 107: Line 70:
  
 
Caption: Stats tip: If you aren't sure whether to use the mean, median, or geometric mean, just calculate all three, then repeat until it converges
 
Caption: Stats tip: If you aren't sure whether to use the mean, median, or geometric mean, just calculate all three, then repeat until it converges
 +
  
 
==Trivia==
 
==Trivia==
Geothm means "counting earths" (From Ancient Greek γεω- (geō-), combining form of γῆ (gê, “earth”) and ἀριθμός arithmos, 'counting').  Geothmetic means "art of Geothming" based on the etymology of Arithmetic (from Ancient Greek ἀριθμητική (τέχνη) (arithmētikḗ (tékhnē), “(art of) counting”).  This is an exciting new terminology that is eminently suitable for modern cosmology & high energy physics - particularly when doing math on the multiverse.  However, it is unlikely this etymology is related to the term "geothmetic meandian" as coined by Randall, as it can be more simply explained as a portmanteau of the three averages in its construction: '''geo'''metric mean, ari'''thmetic mean''', and me'''dian'''.
+
The following Python code (inefficiently) implements the above algorithm:
 
 
The following Python code (inefficiently) implements the above algorithm:
 
  
 
<pre>
 
<pre>
 
from functools import reduce
 
from functools import reduce
 +
from itertools import count
  
  
Line 128: Line 91:
  
  
max_iterations = 10
+
max_number_of_iterations = 10
l = [1, 1, 2, 3, 5]
+
l0 = [1, 1, 2, 3, 5]
for iterations in range(max_iterations):
+
l = l0
 +
for iterations in range(max_number_of_iterations):
 
     fst, *rest = l
 
     fst, *rest = l
 
     if all((abs(r - fst) < 0.00000001 for r in rest)):
 
     if all((abs(r - fst) < 0.00000001 for r in rest)):
Line 136: Line 100:
 
     l = f(*l)
 
     l = f(*l)
 
print(l[0], iterations)
 
print(l[0], iterations)
</pre>
 
Here is a slightly more efficient version of the Python code:
 
 
<pre>
 
from scipy.stats.mstats import gmean
 
import numpy as np
 
 
 
def get_centers(a, tol=0.00001, print_rows = True):
 
    a = np.array(a)
 
    l_of_a = len(a)
 
    if l_of_a == 1:
 
        return a[0]
 
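    else:
        # Converged once all values agree to within tol (absolute spread).
        # The previous ratio test (a[0] / a[1] <= tol) could never succeed
        # for values that are close to one another.
        result = np.max(a) - np.min(a) <= tol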
 
        if result:
 
            return a[0]
 
    res = [np.mean(a), np.median(a), gmean(a)]
 
 
    if print_rows:
 
        print(res)
 
    # Pass print_rows along so the caller's choice applies to every recursion level.
    return get_centers(res, tol, print_rows)
 
 
 
</pre>
 
</pre>
  
Line 198: Line 133:
 
[[Category:Statistics]]
 
[[Category:Statistics]]
 
[[Category:Portmanteau]]
 
[[Category:Portmanteau]]
[[Category:Tips]]
 
