Difference between revisions of "Talk:1571: Car Model Names"

Revision as of 11:11, 3 September 2015

Suzuki Sexism kinda has a ring to it... Bbruzzo (talk) 14:39, 31 August 2015 (UTC)

Worth noting that there actually was an engine manufacturer named "Coventry Climax", who produced a range of racing engines and specialty machinery like forklift trucks. Coventry Climax's engine works were eventually bought out by Jaguar Cars in the 1960s. 141.101.98.154 (talk) (please sign your comments with ~~~~)

Considering the existence of the Civic RX and the CR-V EX, Cervixxx should have been a Honda model. - Frankie (talk) 16:44, 2 September 2015 (UTC)

A simple Lua script I wrote to calculate these ratings: http://pastebin.ubuntu.com/12259822/ Run it with your favorite Lua interpreter, and it should ask for a name. 108.162.216.160 03:01, 3 September 2015 (UTC)

Interestingly, "xkcd" has a high score of 4.1. 199.27.129.59 (talk) (please sign your comments with ~~~~)

Scores

Anyone know how the averages are calculated? I tried a couple but I don't arrive at the same numbers:

HONDA { -44 -80 -46 -21 -14 } Sum: -205 Avg: -41
2CHAINZ { +6 +27 -44 -14 -21 -46 +83 } Sum: -9 Avg: -1.2857142857142857142857142857143
Combined: (-205 -9) / (5 + 7) = -17.833333333333333333333333333333

SG 01 (talk) 15:29, 31 August 2015 (UTC)


I think only the model should be considered. Xhfz (talk) 15:36, 31 August 2015 (UTC)

2CHAINZ { +6 +27 -44 -14 -21 -46 +83 } Sum: -9 Avg: -1.29 Index: -0.13
CLIMAX { +27 +12 -21 +19 -14 +126} Sum: 149 Avg: 24.83 Index: 2.48

Obviously it's the average divided by 10. Xhfz (talk) 15:44, 31 August 2015 (UTC)

Ah, it's so obvious now, thanks :) SG 01 (talk) 16:00, 31 August 2015 (UTC)

I worked it out to be average divided by 10 early on but why divided by 10? Is it because each category has 10 cars listed? This is the piece I've been stuck at. Understanding that part of the logic. --R0hrshach (talk) 16:05, 31 August 2015 (UTC)

The only thing I can think of is that it keeps the numbers below 10, as a lot of scoring is done on that scale. Then again, that scale doesn't usually include numbers below 1 (on a scale from 1 - 10). Also, in 3x3cutrix the i is worth -21, not -45 (which is E), and the x in 3x3 is treated as a normal x with a score of 126.

3X3CUTRIX { +55 +126 +55 +27 -68 -18 +8 -21 +126 } Sum: 290 Avg: 32.222... Index: 3.22

SG 01 (talk) 16:17, 31 August 2015 (UTC)

OK, my mistake. Thanks. Xhfz (talk) 16:27, 31 August 2015 (UTC) BTW: 3X3CUTRIX { +55 +126 +55 +27 -68 -18 +8 -21 +126 } Sum: 290

Yea, made a typo there originally, did edit-fix it ^^ Also SIXAXLE4x4 { +15 -21 +126 -14 +126 +12 -45 +35 +126 +35 } Sum: 395 Avg: 39.5 Index: 3.95 (which is the number next to it)

SG 01 (talk) 16:33, 31 August 2015 (UTC)
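
The rule worked out above (sum the per-character scores of the model name, average them, divide by 10) can be sketched in Python. This is only a sketch under assumptions: the `SCORES` table is partial and holds just the values quoted on this page, the name `name_index` is made up for illustration, and skipping hyphens and spaces is a guess rather than anything the comic states.

```python
# Partial score table: only the values quoted on this talk page.
# Any character not listed here is unknown to this sketch.
SCORES = {
    "2": 6, "3": 55, "4": 35,
    "A": -14, "C": 27, "D": -21, "E": -45, "H": -44, "I": -21,
    "L": 12, "M": 19, "N": -46, "O": -80, "R": 8, "S": 15,
    "T": -18, "U": -68, "X": 126, "Z": 83,
}


def name_index(model: str) -> float:
    """Average the character scores of a model name and divide by 10."""
    # Assumption: punctuation and spaces are ignored, case is irrelevant.
    chars = [c for c in model.upper() if c.isalnum()]
    if not chars:
        raise ValueError("model name has no letters or digits")
    # A KeyError here means the character's score was not quoted on this page.
    total = sum(SCORES[c] for c in chars)
    return total / len(chars) / 10


if __name__ == "__main__":
    for name in ("2CHAINZ", "CLIMAX", "3X3CUTRIX", "SIXAXLE4x4"):
        print(name, round(name_index(name), 2))
```

Only the model part of the name is passed in, following Xhfz's suggestion that the make should not be counted.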

Mercedes 3X-WIF3 scores a decent 3.33 198.41.243.9 18:46, 31 August 2015 (UTC)

Anyone want a Porsche 911? Mikemk (talk) 18:53, 31 August 2015 (UTC)

The Saab Y. Worst possible car name. The Oldsmobile XXX. Best possible car name. 173.245.54.4 19:33, 31 August 2015 (UTC)

Seems worth mentioning somewhere that 3x3cutrix is semi leet/133+ for the English word executrix, the feminine form of executor, but I don't know quite where it belongs. Miamiclay (talk) 20:49, 31 August 2015 (UTC)

"The letters F and B, with scores of 5 and -5, respectively, are about as common in English as in car models." Looked odd, at first reading. May need re-writing to point out that ±5 is as close to zero (parity between English and car-speak) as you get in this example. Perhaps "...scores of merely +5 and -5, respectively", or similar? But that also seems too brief. 141.101.99.108 01:37, 1 September 2015 (UTC)

Forgot to add what I meant to put here... Apostrophes. Very rare in car names (just the Kia Cee'd), fairly often (over)used in standard English text. I wonder what its value is? (Not as easily 'assume it's a letter' as the x/times symbol.) 141.101.99.108 01:44, 1 September 2015 (UTC)

Order of the scores

There are two possible explanations

Score(x) = Frequency_in_cars(x) - Frequency_in_English(x)

I'm pretty sure it's a comparative scale between cars and English, not just a car-like/not-car-like scale.

Randall uses positive numbers if a letter is more common in car models than in typical English (as X) which he then calls carlike. He used negative numbers if a letter's relative frequency in car models is lower than in typical English (as O) and he calls it English-like (more suitable for readable text). The letters F and B, with scores of 5 and -5, respectively, are about as common in English as in car models. With this nomenclature, the most English-like letter is Y because, while not the most common English letter, it is apparently extremely rare in car models.

Score(x) = Frequency_in_cars(x)

English has no relationship with the score

It seems that Randall arbitrarily used positive and negative numbers: if a letter is very common in car models (as X) he calls it carlike. If a letter is very uncommon in car models (as O) he calls it English-like. With this nomenclature the most English-like letter is Y, but actually Y is the least carlike letter. The most common letter in ordinary English is E. Y on the other hand is just in the middle (place 13), which can't be called English-like.

Xhfz (talk) 12:56, 1 September 2015 (UTC)

"Y (...) can't be called English-like". Well, it can be, as it's not uncommon. And on the relative scale, it's much more indicative of being English than it is of being a car. And I'm going to give the explanation a further tweak, I think, hopefully small and agreeable. Also don't think the reversion helped (without checking the edit-changes), it was almost right. 141.101.99.108 13:24, 1 September 2015 (UTC)

Now I understood your idea. I think I tweaked it to be more understandable. X is a letter that supports your claim. Xhfz (talk) 13:41, 1 September 2015 (UTC)

I'd like to suggest a third possibility, I figured it was a ratio: Score(x) = 100*(Frequency_in_cars(x) / Frequency_in_English(x) - 1). This allows numbers to be negative or positive and would explain the questions raised above. Djbrasier (talk) 13:53, 1 September 2015 (UTC)

Well, my "little tweak" became a big overhaul, then edit-conflicted. For the record, it became the following monstrosity:

Scores for letters and numbers are presumably taken from their frequency in car models. Randall doubtless analysed a car-name database, in a manner similar to that used to derive the letter frequency statistics for written English against which the former seems to have been compared.  From these, letters that appeared equally commonly in both lists (either rare or frequent, but consistently between the two) would have been given a hypothetical value of zero, whilst ones that were almost exclusively in one medium would have a high-magnitude score; positive for more car-like and negative for more English-like.
Without the raw car-letter frequency data it's hard to derive the exact formula used, but taking the mathematical log value of a ratio would give us zero for 1:1 (equally car-like and English-like) and high positive/negative values for comparisons skewed more towards the former/latter.
The closest letters to zero in the comic are F at +5 and B at -5 and may hover somewhere around the same ratios in car-names as in English (around 2.2% and 1.4% of total usage in the above link), with just a slight car/English dominance.  The most 'car-like' letter is X, that seems to be quite common in cars whilst very rare (<1% of usage) in English.
The most 'English-like' letter in the comic is Y with a score of -90.  Y is not common in English (~2%), but presumably even more disproportionately uncommon in car names.  The next most 'English-like' letter is O, with a given score of -80.  It is significantly more frequent in English (~7.5%, and perhaps the fourth most encountered individual letter), and so is likely also more frequent in the raw car-name data alone, albeit similarly much less than 'expected' from its English occurrences.
It makes some sense that rarer English letters are over-chosen (for the novelty and stand-out effect) for car names, at the general expense of several commoner English letters without particular bias, thus the highest positive peak is greater in magnitude than the lowest negative trough.  Although you could also point out that 'x' (used for 'times') is also a more useful car-name 'letter', whilst the letter O might be suppressed in alphanumeric sequences so as not to be confused with a zero.
When looking at the numbers in the table, Randall's analysis may have dealt with the decimal digits entirely separately, based upon something like Benford's Law for the natural occurrence of numbers in common data, rather than from their disproportionately rare occurrence within largely alphabetic English.  It is thus not unexpected that the 1 that is most common in data is underrepresented within numbers in car-names, whilst sub-average 5 becomes a 'power number' in the world of cars, and the third most car-like character in the comic.
There are 19 positive scores and 17 negative scores.  They each add up to a score of 735 and -722, respectively, with the grand total being +13, suggesting that without rounding errors the whole system could have a neutral score.  The numbers alone  give a total offset of -0, the letters alone thus account for a not particularly unreasonable +0.5 'error' per character, and may also support the idea of separate analyses of these two sets.

...there was no easy way to resolve the differences, so the above is FYI. (TLDR: perhaps it's a Log function?) In editing it down, I'd also had another bit:

The letters I and T may appear in non-word model-name strings to represent "Injection" and "Turbo", respectively, but with their overwhelming commonality already in English text they still appear more in English than in cars.

...which looked less useful and too wordy even for me, but might also be a useful fragment to consider. 141.101.99.108 15:09, 1 September 2015 (UTC)
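
For comparison, here is a rough Python sketch of three candidate formulas floated in this section: the plain difference, Djbrasier's ratio form, and the log-of-ratio idea. None of them is confirmed as Randall's method. The function names are invented for illustration, and the frequencies in the demo loop are made-up numbers, not measured data.

```python
import math


def score_difference(car_freq: float, english_freq: float) -> float:
    """Candidate 1: Frequency_in_cars(x) - Frequency_in_English(x)."""
    return car_freq - english_freq


def score_ratio(car_freq: float, english_freq: float) -> float:
    """Candidate 2: 100 * (Frequency_in_cars(x) / Frequency_in_English(x) - 1)."""
    return 100 * (car_freq / english_freq - 1)


def score_log_ratio(car_freq: float, english_freq: float) -> float:
    """Candidate 3: natural log of the ratio, zero when both are equal."""
    return math.log(car_freq / english_freq)


if __name__ == "__main__":
    # Hypothetical (car, English) relative frequencies, for illustration only.
    examples = {"equal": (0.02, 0.02), "car-heavy": (0.04, 0.02), "English-heavy": (0.01, 0.02)}
    for label, (car, eng) in examples.items():
        print(label,
              round(score_difference(car, eng), 4),
              round(score_ratio(car, eng), 1),
              round(score_log_ratio(car, eng), 3))
```

All three give zero for a letter that is equally common in both sources. The ratio form can never go below -100 but has no upper limit, whereas the log form is symmetric around zero.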

Typo or Deliberate?

Randall gave REV-4 as an example car name. Did he accidentally misspell the (Toyota) RAV4, or was this a deliberate reference to chapter 4 of Revelations?--173.245.54.26 02:31, 1 September 2015 (UTC)

Old Goths

49 is a reasonable age for those who grew up Goth in the 80s, just sayin'. --141.101.99.123 08:47, 1 September 2015 (UTC)

I thought this too. It could be a joke on a youth sub-culture growing up (old). -- 108.162.229.157 11:28, 1 September 2015 (UTC)

'Quick' and Dirty Car Data

Examining this page, which has notable exceptions (I specifically looked for the Toyota Yaris and the Kia Cee'd, neither of which were there), using a quick script to isolate the car names, a lengthy manual process of sanitising all the exceptions the quick script couldn't handle and then another script to analyse letter frequencies of the model names (not the make/marque part), I came up with the following undefinitive data, that is almost certainly flawed but may yet be useful:

<spaces> = 85 (but this count of whitespace may not be accurate and is superfluous...
& = 1  (...as are these first four items of punctuation, given their absence from Randall's chart)
- = 23
. = 3
/ = 10
0 = 104
1 = 73
2 = 54
3 = 43
4 = 35
5 = 54
6 = 35
7 = 18
8 = 26
9 = 17
A = 231 (includes à)
B = 30
C = 95
D = 54
E = 210 (includes é and ë)
F = 46
G = 52
H = 18
I = 122
J = 12
K = 13
L = 113
M = 83
N = 99
O = 145 (includes ó)
P = 80
Q = 4
R = 202
S = 127 (includes Š)
T = 166
U = 45
V = 38
W = 19
X = 25
Y = 33
Z = 14

Comparing just B and F (natural frequency 1.4% and 2.2%, above 30 to 46, both instances being approximately 1:1.5 when comparing the two letters within the same source), this matches the similarly close-to-zero scores given to them by Randall. O vs. Y is 4.4:1, above, real life is 3.8:1 and adjusting for O being 1/9th 'more carlike' we get a similar value. But Z vs J is 7:6, real life it's 1:2 and I can't reconcile that with the 1.3:1 on Randall's chart. Probably indicates something non-linear (e.g. a log function) along the way, if O:Y wasn't so easy to distinguish. Might, of course, be a differently biased dataset and thus GIGO. 141.101.99.108 00:35, 2 September 2015 (UTC)
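
The within-source ratios quoted above can be rechecked with a few lines of Python. This is a sketch only: `CAR_COUNTS` copies six of the counts from the list above, the helper names are invented for illustration, and the data is the same admittedly flawed data.

```python
from fractions import Fraction

# Counts copied from the quick-and-dirty list above (six letters only).
CAR_COUNTS = {"B": 30, "F": 46, "J": 12, "O": 145, "Y": 33, "Z": 14}


def count_ratio(a: str, b: str) -> float:
    """How many times letter a occurs for each occurrence of letter b."""
    return CAR_COUNTS[a] / CAR_COUNTS[b]


def exact_ratio(a: str, b: str) -> Fraction:
    """The same ratio as a reduced fraction, e.g. 14:12 becomes 7/6."""
    return Fraction(CAR_COUNTS[a], CAR_COUNTS[b])


if __name__ == "__main__":
    for a, b in (("F", "B"), ("O", "Y"), ("Z", "J")):
        print(f"{a}:{b} = {count_ratio(a, b):.2f}:1 ({exact_ratio(a, b)})")
```

This reproduces the roughly 1.5:1 for F against B, the 4.4:1 for O against Y, and the 7:6 for Z against J.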