Curve-Fitting
Title text: Cauchy-Lorentz: "Something alarmingly mathematical is happening, and you should probably pause to Google my name and check what field I originally worked in."
Explanation
An illustration of several plots of the same data with curves fitted to the points, paired with conclusions that you might draw about the person who made them.
When modeling a phenomenon statistically, it is common to search for trends, and fitted curves can help reveal these trends. Much of the work of a data scientist or statistician is knowing which fitting method to use for the data in question.
In general, the researcher will specify the form of an equation for the line to be drawn, and an algorithm will produce the actual line.
- Linear: f(x) = mx + b
- Quadratic: f(x) = ax^2 + bx + c
- Logarithmic: f(x) = a*log_b(x) + c
- Exponential: f(x) = a*b^x + c
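- Loess: w(d) = (1-|d|^3)^3 (the tricube weight function, where d is the scaled distance from the point being estimated; Loess has no single global equation, but fits a weighted local regression around each point)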
- Linear, No Slope: f(x) = c
- Logistic: f(x) = L / (1 + e^(-k(x-b)))
- Confidence Interval: not a type of curve fitting, but a method of depicting the predictive power of a curve
- Piecewise: Mapping different curves to different segments of the data. This is a legitimate strategy, but the different segments should be meaningful, for example because they were drawn from different populations.
- Connecting lines: Not useful whatsoever, but it looks nice!
- Ad-Hoc Filter: Drawing a bunch of different lines by hand. Also not useful.
- House of Cards: Not a real method, but a common consequence of misapplying statistical methods: a curve can be generated that fits the data extremely well, but becomes absurd as soon as one glances outside the range of the training data, and the analysis comes crashing down "like a house of cards". This is a type of _overfitting_.
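As a rough sketch of the "researcher picks the form, algorithm picks the line" workflow, and of the house-of-cards failure, the Python below fits a linear form and a degree-12 polynomial to the same points with NumPy's least-squares polynomial fitting. The data set, the random seed, the degree 12, and all variable names are assumptions made up for illustration; none of them come from the comic.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical data (not from the comic): 30 points with a noisy upward trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 0.5 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# The researcher chooses the form f(x) = m*x + b;
# least squares then chooses the values of m and b.
linear = Polynomial.fit(x, y, deg=1)
b, m = linear.convert().coef  # coefficients are ordered lowest degree first

# Same algorithm, far more flexible form: a degree-12 polynomial.
wiggly = Polynomial.fit(x, y, deg=12)


def sse(model):
    """Sum of squared residuals of a fitted model on the training data."""
    return float(np.sum((model(x) - y) ** 2))


print(f"linear fit: f(x) = {m:.2f}x + {b:.2f}")
print(f"training error, linear:    {sse(linear):.2f}")
print(f"training error, degree 12: {sse(wiggly):.2f}")

# Glance outside the sampled range (x only runs from 0 to 10).
x_outside = 13.0
print(f"linear prediction at x={x_outside}:    {linear(x_outside):.2f}")
print(f"degree-12 prediction at x={x_outside}: {wiggly(x_outside):.2f}")
```

The flexible curve can never fit the training points worse than the straight line, because a line is a special case of a degree-12 polynomial. That is exactly why a small training error alone says little about whether the curve means anything outside the sampled range.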
Transcript
- Curve-Fitting Methods
- and the messages they send
- [In a single frame, twelve scatter plots with unlabeled x- and y-axes are shown. Each plot consists of the same data set of approximately thirty points scattered all over the plot but clustered slightly more around the diagonal. Every plot shows a different fitting method in red, labeled on top in gray.]
- [The first plot shows a line starting at the left bottom above the x-axis rising towards the points to the right.]
- Linear
- "Hey, I did a regression."
- [The second plot shows a curve falling slightly down and then rising up to the right.]
- Quadratic
- "I wanted a curved line, so I made one with Math."
- [In the third plot the curve starts near the bottom left and rises more and more slowly to the right.]
- Logarithmic
- "Look, it's tapering off!"
- [The fourth plot shows a curve starting near the bottom left and rising more and more steeply to the right.]
- Exponential
- "Look, it's growing uncontrollably!"
- [The fifth plot uses a fitting to match many points. It starts at the bottom left, increases, then decreases, then rapidly increases again, and finally reaches a plateau.]
- Loess
- "I'm sophisticated, not like those bumbling polynomial people."
- [The sixth plot simply shows a line above but parallel to the x-axis.]
- Linear, no slope
- "I'm making a scatter plot but I don't want to."
- [Plot #7 starts at a plateau above the x-axis, then increases, and finally reaches a higher plateau.]
- Logistic
- "I need to connect these two lines, but my first idea didn't have enough Math."
- [Plot #8 shows two red lines enclosing most of the points, with the area between them shaded in red.]
- Confidence interval
- "Listen, science is hard. But I'm a serious person doing my best."
- [Plot #9 shows two unconnected lines, one in the lower left half and one higher up on the right. Both have smaller curved lines in light red above and below.]
- Piecewise
- "I have a theory, and this is the only data I could find."
