Talk:2048: Curve-Fitting

2018-09-24T21:44:04Z

JohnWhoIsNotABot:

House of Cards: Not a real method, but a common consequence of misapplying statistical methods: a curve can be generated that fits the data extremely well, but it becomes absurd as soon as one glances outside the range of the training data, and the analysis comes crashing down "like a house of cards". This is a type of ''overfitting''.

I'm pretty sure it refers to the TV show house of cards, the dots representing the quality of the series increasing until Netflix renewed it a bit too much {{unsigned ip|172.68.26.65}}
:This was my initial interpretation as well, since you can hypothetically extend a literal house of cards indefinitely.[[Special:Contributions/172.68.58.83|172.68.58.83]] 14:23, 20 September 2018 (UTC)

Could someone familiar with the show expand on this? ''Also a potential reference to the TV show, House of Cards ("WAIT NO, NO, DON'T EXTEND IT!").'' Some context on what that line meant in House of Cards would be helpful. - [[User:CRGreathouse|CRGreathouse]] ([[User talk:CRGreathouse|talk]]) 14:20, 21 September 2018 (UTC)

I'm a little mystified by the alt-text. Cauchy and Lorentz both seem like mathematically capable people. What am I missing? [[Special:Contributions/172.69.62.226|172.69.62.226]] 17:46, 19 September 2018 (UTC)

: Google-Fu reveals that it's a continuous probability distribution. This isn't bad per se, but it is quite visually distinctive and also can be quite...concerning if the data set isn't one where probability should be an issue. [[User:Werhdnt|Werhdnt]] ([[User talk:Werhdnt|talk]]) 18:00, 19 September 2018 (UTC)

:: This is not the issue. The issue is that the moments of the distribution (such as the mean and variance) don't exist, because the integrals that define them don't converge. See edited explanation. So if you wanted to estimate the parameters of the distribution, the sample mean, for example, will not converge as the number of data points grows, and is therefore a bad estimator to attempt. It is more mathematically alarming than alarmingly mathematical. [[User:GamesAndMath|GamesAndMath]]

:: My own Google-Fu brought me to a page with this information: “The distribution is important in physics as it is the solution to the differential equation describing forced resonance, while in spectroscopy it is the description of the line shape of spectral lines.” (from here: https://www.boost.org/doc/libs/1_53_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/cauchy_dist.html) [[User:Justinjustin7|Justinjustin7]] ([[User talk:Justinjustin7|talk]]) 18:09, 19 September 2018 (UTC)

:: True, but the "check what field I originally worked in" indicates that there might be something else going on with the meaning. [[Special:Contributions/108.162.237.238|108.162.237.238]] 12:47, 20 September 2018 (UTC)

:: I believe the point of "check what field I originally worked in" is that if somebody wasn't trained in statistics, using an exotic distribution is highly suspect and suggests that either they are torturing the data to get the desired results or they have no idea what they are doing. [[Special:Contributions/108.162.246.11|108.162.246.11]] 05:19, 21 September 2018 (UTC)
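A minimal sketch of the non-convergence described above, assuming standard Cauchy samples drawn by the inverse-CDF method. The function names, the seed and the checkpoints are illustrative choices, not anything from the comic or the explanation.

```python
import math
import random
import statistics


def cauchy_from_uniform(u):
    """Map a uniform value u in (0, 1) to a standard Cauchy sample (inverse CDF)."""
    return math.tan(math.pi * (u - 0.5))


def running_means(samples, checkpoints):
    """Return the sample mean of the first n samples for each n in checkpoints."""
    return [sum(samples[:n]) / n for n in checkpoints]


if __name__ == "__main__":
    rng = random.Random(2048)  # arbitrary seed, only for repeatability
    checkpoints = [100, 1_000, 10_000, 100_000]
    size = checkpoints[-1]

    normal = [rng.gauss(0.0, 1.0) for _ in range(size)]
    cauchy = [cauchy_from_uniform(rng.random()) for _ in range(size)]

    # The normal means settle toward 0; the Cauchy means keep jumping around.
    print("normal means:", running_means(normal, checkpoints))
    print("cauchy means:", running_means(cauchy, checkpoints))
    # The sample median, by contrast, is a well-behaved location estimate.
    print("cauchy median:", statistics.median(cauchy))
```

The median is printed alongside the means because it does converge for Cauchy data, which is one reason location is usually estimated that way for this distribution.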

To be honest, I'm a bit disappointed. I kinda expected a special comic with such a nice round number.. Been counting down since comic #2000... [[Special:Contributions/162.158.92.184|162.158.92.184]] 18:14, 19 September 2018 (UTC)

Different anon here, I think this is very special and if Randall makes a poster available I will be buying several to give away. Of course, part of my business is experimental data analysis and modeling...and this is a fantastic summary of common errors. {{unsigned ip|162.158.75.22}}

: Agreed. This is a very special comic, and a highly subtle title text. Direct any of your friends who do data analysis here. Sort of the next stage from the classic "correlation is not causation" comic https://xkcd.com/552/ . {{unsigned|GamesAndMath}}

'''Curve-Fitting'''

How fitting works needs to be explained. f(x)=mx+b works fine for single values, but how do we get that red line from the data set? --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 20:12, 19 September 2018 (UTC)

:Generally, you choose some error function and then search for the parameters that minimize the sum of errors over all data points. -- [[User:Hkmaly|Hkmaly]] ([[User talk:Hkmaly|talk]]) 22:07, 19 September 2018 (UTC)

:A typical error function is the square of the difference between the fit and the actual data point, hence "sum of squares" method. There are well-known standard formulas for finding m and b in the case of linear regression. In a linear algebra class, I saw a general method that would work for several of these (any where the fit is y = af(x)+bg(x)+...+ch(x), which includes log, exponential, quadratic, cubic, etc). I wish I could remember it. [[User:Blaisepascal|Blaisepascal]] ([[User talk:Blaisepascal|talk]]) 22:39, 19 September 2018 (UTC)
::I'm still looking for an easy example. Let's say five points (x/y) and then calculating the straight line (without and maybe with the zero-point because this is often the assumed start). Just be simple, everything else derives from that. --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 21:00, 20 September 2018 (UTC)
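A possible worked example along the lines requested above: five made-up points, fitted by ordinary least squares both with an intercept and with the line forced through the zero point. The data and function names are hypothetical; the formulas are the standard closed-form least-squares solutions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def fit_line_through_origin(xs, ys):
    """Least-squares fit of y = m*x (line forced through the zero point)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sum_of_squares(xs, ys, m, b=0.0):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Five illustrative points (x, y); not taken from the comic.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)             # roughly m = 0.6, b = 2.2
m0 = fit_line_through_origin(xs, ys)  # roughly m = 1.2

print("with intercept:", m, b, sum_of_squares(xs, ys, m, b))
print("through origin:", m0, sum_of_squares(xs, ys, m0))
```

Forcing the line through the origin removes one free parameter, so its sum of squares can never be smaller than that of the fit with an intercept.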

:I wish we could include the graphics at the top of [https://en.wikipedia.org/wiki/Linear_regression#Introduction] and [https://en.wikipedia.org/wiki/Linear_regression#Interpretation] in the explanation. A lot of people are going to look at this one. [[Special:Contributions/172.68.133.168|172.68.133.168]] 17:51, 20 September 2018 (UTC)
::I've included one picture with a small explanation to the linear regression section. I think that explains it well. --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 21:00, 20 September 2018 (UTC)
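The general method mentioned earlier in this thread, for fits of the form y = a f(x) + b g(x) + ... + c h(x), is probably the normal-equations approach from linear algebra: build a matrix A whose columns are the basis functions evaluated at each x, then solve (AᵀA) c = Aᵀy. The sketch below assumes that is the method meant. All names are hypothetical, and it only applies when the model is linear in its coefficients.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_basis(xs, ys, basis):
    """Least-squares coefficients for y = sum(c_i * basis[i](x))."""
    design = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    # Normal equations: (A^T A) c = A^T y.
    ata = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)


# Quadratic fit: basis functions 1, x, x^2 on made-up data.
quadratic = [lambda x: 1.0, lambda x: x, lambda x: x * x]
coeffs = fit_basis([0, 1, 2, 3, 4], [2, 5, 14, 29, 50], quadratic)
print(coeffs)  # close to [2, 0, 3], since the data follow y = 2 + 3x^2
```

Swapping in other basis functions (for instance `math.log` for a logarithmic fit) reuses the same solver. For badly conditioned problems a QR-based solver is usually preferred over the normal equations.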

The data points do not have error bars, which makes the choice of fit even more ludicrous, in my opinion. If the data are that good, then I don't believe there is a correlation, it's random with some distribution. I might hang this up at work...[[User:Arppix|Arppix]] ([[User talk:Arppix|talk]]) 02:46, 20 September 2018 (UTC)
:And of course in serious science data points have error bars. This makes the fitting even more complicated and should be mentioned in the explanation. Because Randall doesn't use error bars, I'm sure he is referring to presentations not based on real science. This should also be mentioned there. --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 21:06, 20 September 2018 (UTC)

I hate to be negative here, as obviously some users have put a lot of effort into explaining the details behind each of the curve-fitting methods, but there's absolutely no explanation for Randall's comments on each method. While someone might learn something about the various methods by reading the explanation, they would not gain any insight on what Randall is saying about each method. In addition, the Connecting Lines explanation totally missed the fact that this isn't really even a curve-fitting method - it's just a feature of graphing software (in this case, Excel) where a smooth line is drawn through each data point from left to right rather than an example of overfitting to the data set. I think we could do better. [[User:Ianrbibtitlht|Ianrbibtitlht]] ([[User talk:Ianrbibtitlht|talk]]) 02:53, 21 September 2018 (UTC)
:You're not negative, Randall's comments are missing which I've just added into the incomplete reason. And sure other explanations still need a review. --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 20:32, 21 September 2018 (UTC)

Everyone is missing the deeper trolling here of the fisheries community at large, which shall become blindingly clear here. First, this is cartoon number 2048 (2^11), a highly interesting number. Notably, this is the year by which all fisheries were projected to have collapsed by Worm et al. (2006) Science 314:787-790, a prediction which gained huge attention in the media and took on a life of its own. The prediction was based on fitting a power curve to some data on collapses in catch trends. Numerous rebuttals followed, one of which pointed out that a linear fit to the data is a better fit and predicts that all fisheries will have collapsed in 2114 (Jaenike et al. 2007, Science 316:1285a). A list of rebuttals is found here: https://sites.google.com/a/uw.edu/most-cited-fisheries/controversies/2048-projection. Later work by the same author and critics found a different prediction and showed that rebuilding of fisheries is likely (Worm et al. 2009 Science 325:578-585). Second, lest you think this is a conspiracy theory, I note that in xkcd cartoon 887, Munroe specifically notes this prediction: "The future according to google search results... 2048: 'Salt-water fish extinct from overfishing'" (https://xkcd.com/887/). Third, this kind of model-fitting exercise has long plagued fisheries researchers attempting to predict recruitment from spawning biomass. {{unsigned ip|108.162.246.11}}

"Ad hoc filter: Drawing a bunch of different lines by hand, keeping in only the data points perceived as "good". Also not useful. " – I guess it rather refers to data filtering, where for each point you take several points around and try to calculate some kind of mean, e.g. by rejecting most extreme points, or calculating median (see https://en.wikipedia.org/wiki/Median_filter). So it is an algorithm, not actually drawing lines by hand. Still it is tricky to draw conclusions and you can easily fool yourself with this method. {{unsigned ip|162.158.93.21}}
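A small sketch of the median filter mentioned above, assuming a sliding window that is simply truncated at the ends of the series. The function name, window size and sample data are illustrative.

```python
import statistics


def median_filter(values, window=3):
    """Replace each value with the median of the window centered on it.

    Near the ends of the list the window is truncated rather than padded.
    """
    half = window // 2
    filtered = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        filtered.append(statistics.median(values[lo:hi]))
    return filtered


# One obvious outlier (100) in otherwise smooth made-up data.
noisy = [1, 2, 100, 4, 5]
print(median_filter(noisy))  # [1.5, 2, 4, 5, 4.5]
```

The outlier disappears entirely, which is both the appeal and the danger: the choice of window size is ad hoc, and a different choice can produce a noticeably different curve.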

Anyways, what is the actual regression of the plot? {{unsigned ip|162.158.154.241}}
:This also must be better explained: We don't know what the points represent. The fraction of apples vs. bananas harvested by time, the position of stars in the sky, on a logarithmic scale, linear, or maybe the height of mountains in New Jersey... There are just some dots on paper with no further meaning. Thus everything Randall presents is valid by some means but an actual regression does not exist. --[[User:Dgbrt|Dgbrt]] ([[User talk:Dgbrt|talk]]) 20:32, 21 September 2018 (UTC)

Just want to note that piecewise modelling is actually often used in housing economics. It has been used to check whether different types of housing are priced according to different rules. [[Special:Contributions/172.68.34.34|172.68.34.34]] 22:05, 21 September 2018 (UTC)

Excel's "smooth lines" are actually splines ([https://blog.splitwise.com/2012/01/31/mystery-solved-the-secret-of-excel-curved-line-interpolation/ third-order Bezier splines, apparently]) so they're not completely without mathematical merit. Still wildly unsuited for extrapolation, but often very well suited to interpolation. [[User:JohnWhoIsNotABot|JohnWhoIsNotABot]] ([[User talk:JohnWhoIsNotABot|talk]]) 21:44, 24 September 2018 (UTC)
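To illustrate the interpolation-versus-extrapolation point, here is a sketch of evaluating a single cubic Bezier segment. It is not Excel's actual algorithm; the control points are made up and the function name is hypothetical. Inside the parameter range 0 to 1 the curve stays near its control points, while outside that range it runs away quickly.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t using the Bernstein form."""
    u = 1.0 - t
    w0 = u * u * u
    w1 = 3.0 * u * u * t
    w2 = 3.0 * u * t * t
    w3 = t * t * t
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


# Hypothetical segment: endpoints p0 and p3, inner control points p1 and p2.
p0, p1, p2, p3 = (0, 0), (1, 2), (2, 2), (3, 0)

print(cubic_bezier(p0, p1, p2, p3, 0.0))  # starts exactly at p0
print(cubic_bezier(p0, p1, p2, p3, 0.5))  # (1.5, 1.5), between the endpoints
print(cubic_bezier(p0, p1, p2, p3, 1.0))  # ends exactly at p3
print(cubic_bezier(p0, p1, p2, p3, 2.0))  # (6.0, -12.0), far below the data
```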

Talk:2026: Heat Index

2018-08-03T18:11:44Z

JohnWhoIsNotABot:

Look at the formula, then at the table, and try to tell with a straight face that those tables were computed from the formulae and not the other way around. -- [[User:Hkmaly|Hkmaly]] ([[User talk:Hkmaly|talk]]) 22:38, 30 July 2018 (UTC)
: The Wikipedia page explicitly says that the various formulae try to approximate the table. Can't be more explicit. [[Special:Contributions/172.69.226.119|172.69.226.119]] 06:36, 31 July 2018 (UTC)
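For reference, one such approximation is the Rothfusz regression commonly cited for the US National Weather Service heat index. The sketch below uses the coefficients as they are usually quoted; treat them, and the stated range of validity, as assumptions to check against an official source. The function name is hypothetical.

```python
def heat_index_f(temp_f, rel_humidity):
    """Approximate heat index in degrees Fahrenheit (Rothfusz regression).

    temp_f is the air temperature in Fahrenheit, rel_humidity is relative
    humidity in percent. The regression is a polynomial fitted to a table,
    and is usually quoted as intended for roughly 80 F and above.
    """
    t = temp_f
    rh = rel_humidity
    return (-42.379
            + 2.04901523 * t
            + 10.14333127 * rh
            - 0.22475541 * t * rh
            - 0.00683783 * t * t
            - 0.05481717 * rh * rh
            + 0.00122874 * t * t * rh
            + 0.00085282 * t * rh * rh
            - 0.00000199 * t * t * rh * rh)


# 96 F at 65% relative humidity comes out at about 121 F.
print(round(heat_index_f(96, 65)))
```

Nine fitted coefficients with no physical interpretation are a strong hint that the polynomial was fitted to the table rather than derived from first principles.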
What confuses me is that even at 40% humidity the heat index is a lot hotter than the actual temperature. If 110 degrees at the lowest humidity that occurs commonly feels like 130 degrees, then what does it mean to feel like 110 degrees?[[User:Probably not Douglas Hofstadter|Probably not Douglas Hofstadter]] ([[User talk:Probably not Douglas Hofstadter|talk]]) 15:46, 31 July 2018 (UTC)

Ironically, it is actually when humidity is at 100% (your sweat can't evaporate) that you feel the actual temperature. The lower humidity makes you feel cooler than the actual temperature. Similarly the windier it is (in cold weather) the more body heat is removed and the closer the actual temperature is to what it feels like.
[[Special:Contributions/162.158.62.141|162.158.62.141]] 21:31, 2 August 2018 (UTC)

I don't think it's accurate that "Human skin does not directly detect temperature - only the rate of heat gain or loss." Isn't it more that skin temperature is a dynamic equilibrium between the body's internal temperature and heat loss? I.e., skin feels colder in cold water than cold air because its equilibrium temperature is lower with faster heat loss? Also, is skin temperature even relevant here, or is it more about core temperature? [[User:JohnWhoIsNotABot|JohnWhoIsNotABot]] ([[User talk:JohnWhoIsNotABot|talk]]) 18:11, 3 August 2018 (UTC)
