1726: Unicode

I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.
Explanation

Cueball is a highway engineer who has placed two traffic signs in a river in a vain attempt to guide the flow of the water. He ends up talking to the water, trying to make it take a detour instead of going under the bridge. On the distant bank two other engineers are arguing, with gestures, in what is presumably a heated manner. They may be arguing about where to place a third sign lying next to them, shouting at the water to make it behave a certain way, or calling out to the crazy Cueball in the river to come back in.

As rivers flow according to the landscape, this plan will not work and the river will continue on its course. Cueball is very frustrated by this and is still trying to make the river obey the traffic laws. The caption lays out the punchline: The comic compares the useless approach of Cueball attempting to divert a flowing, moving river with fixed signs that do nothing, with the Unicode Consortium's attempt to define the diverse and ever-changing human language with strict technical standards.

Unicode is a largely successful attempt to have a standard for representing all the letters, digits and symbols that make up human writing in all languages. This includes the Roman letters used in this article, characters with modifiers like ê (available both as single precomposed characters and as base letters with separately selectable modifiers), logographic characters like those in Chinese, syllabic writing systems like the Japanese kana, right-to-left and/or top-to-bottom writing systems, mathematical symbols and many other writing systems.
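As a small illustration of the ê case, here is a minimal Python sketch using the standard library's `unicodedata` module. It assumes the two usual spellings of ê: the single code point U+00EA, and the letter e followed by the combining circumflex U+0302. The variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00ea"   # ê as a single code point
decomposed = "e\u0302"   # e followed by COMBINING CIRCUMFLEX ACCENT

# The two spellings look the same on screen but are different strings.
same_without_normalizing = precomposed == decomposed

# Normalization converts one spelling into the other.
nfc = unicodedata.normalize("NFC", decomposed)   # compose into one code point
nfd = unicodedata.normalize("NFD", precomposed)  # split into base + modifier

print(same_without_normalizing, nfc == precomposed, nfd == decomposed)
```

Software that compares text usually normalizes both sides first, which is one of the many rules the Unicode standard has to spell out.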

Emoji, one of the trendier parts of Unicode, are also referenced in the title text (see below). The symbols on the signs in the river are real road signs, but interestingly enough they also both exist in Unicode: the warning sign, a triangle with an exclamation mark ⚠, has code point U+26A0, and the black rightwards arrow ➡ has code point U+27A1. As can be imagined, coping with the wide variety of character sizes, orientations, modifiers, capitalization rules, etc. can be very challenging as the Unicode Consortium tries to write rules that accommodate how written language is actually used. Emoji have become a recurrent theme on xkcd.

The title text refers to a proposal to add three dinosaur heads to the official list of emoji.

This is likely to stir a glorious internet argument between a half-dozen opposing (and pedantic) camps that may now be brought together, such as the following:

  • Those who favor the inclusion of more emoji vs. those who oppose emoji on principle.
  • Those who accept the existence of Brontosaurus vs. those who deny its status as a species unique from Apatosaurus.
    • Randall has made it clear what he believes in 636: Brontosaurus.
    • New developments since the release of that comic suggest that Brontosaurus is distinct from Apatosaurus after all, but that is still debated.
  • Those who favor a traditional, scaly image of dinosaurs vs. those who have accepted the feathered-dinosaur paradigm.
  • Those who want Brontosaurus depicted as an ordinary or shrinkwrapped sauropod vs. those who want it depicted with extra soft tissue, especially the heavy neck padding thought to be used for elephant-seal-like duels (the "Brontosmash" hypothesis).
  • Those who prefer a different dinosaur species be included instead.
  • Those who point out that two of the dinosaurs in the "Jurassic Emoji" set actually come from the Cretaceous period, and as such renaming is necessary vs. those who think that "Jurassic" is a cooler word (because of the Jurassic Park movies).
  • Those who for religious or other reasons deny the existence of dinosaurs.

See also this discussion about this comic on the Unicode mailing list.

Highway engineers were also the subject of 253: Highway Engineer Pranks and 781: Ahead Stop.

Transcript

[Cueball is standing in a river close to its right bank, the water reaching up to his thighs. He is holding on to a traffic sign standing to his right. It has a label and, below this, an arrow pointing to the right bank. With his other arm he is pointing to the left at the advancing water masses. Further up the river is another street sign; this sign has an exclamation mark inside a triangle. The water flow is indicated with several lines on the river surface, mainly moving along the river, but around Cueball and the signs there are circular lines. In the distance on the left bank of the river two people are standing and making gestures with raised arms. The left one has white hair (could be either sex) and the other is a Cueball-like guy. A third sign is lying on the ground to the left of them, face down. Behind them is a slope up to a road with a parked car. The road continues out over a bridge that crosses the river. The river passes under it both left and right of a central pillar. At that distance the right bank of the river (and thus the right end of the bridge) is not visible, being outside the panel. On each river bank grass can be seen, and on the right bank also a small stone.]
Cueball: No, go this way, not-
Cueball: Are you even listening!?
Cueball: ... Hey! That's not what that area is for!
Sign with arrow: Detour
Sign with triangle: !
[Caption below the panel:]
Watching the Unicode people try to govern the infinite chaos of human language with consistent technical standards is like watching highway engineers try to steer a river using traffic signs.

  1. Proposal by Courtney Milan - 3 dinosaurs: http://unicode.org/L2/L2016/16072-jurassic-emoji.pdf
  2. Feedback by Andrew West - 13 dinosaurs: http://www.unicode.org/L2/L2016/16103-jurassic-fdbk.pdf
  3. Article by Becky Ferreira - they should have feathers: http://motherboard.vice.com/read/dinosaur-emojis

Sebastian 12:14, 29 August 2016 (UTC)--

Regarding the brontosaurus reference, there is also some material in the intro of the wikipedia page. Chtit draco (talk) 14:33, 29 August 2016 (UTC)
Comic could be a reference to WE’RE ALL USING THESE EMOJI WRONG - http://www.wired.com/2015/05/using-emoji-wrong/ where the 😪 emoji is supposed to be a sleepy emoji and not a side-tear emoji - http://emojipedia.org/sleepy-face/ - see facebook's interpretation vs Samsung's (talk) (please sign your comments with ~~~~)
Indeed. However IMHO the problem lies not in the standardisation attempt, but on the choice of non-obvious pictograms (which is a font-designer problem). The sleepy emoji would not be used wrong if it unquestionably looked like sleepy. Chinese solved this problem long ago by switching from pictograms to abstract ideogram designs. 14:13, 30 August 2016 (UTC) Sylvain M.

I thought it was funny that the two people in the upper left (who, at the time of this comment, were noted to be "helping" Cueball) are actually impeding the quixotic quest by arguing amongst themselves. 23:38, 29 August 2016 (UTC)

Personally, I'm still dumbfounded by the lack of a marijuana leaf. There are pills, a syringe, a cigarette, rice wine, plus *multiple* Emoji for both wine & beer. I hate the fact that Emoji are *not* implemented in a sensible, standardized fashion: For instance, the guy Emoji may or may not have a mustache, or gray hair. The "short hair" female may be blonde, or brunette & may even have a coiffure instead of short hair! I think they should be far more specific with their definitions. Personally, I'm sticking with emoticons until they get this sorted out.  ; P As for dinosaur Emoji, contrary to my previous statement about specificity, I believe you only need three dinomoji: Carnivore head (raptor or T-rex, non-specific), long-neck herbivore in profile, & winged. Anything more specific than that should probably be expressed with, y'know, WORDS. 07:35, 30 August 2016 (UTC)

Words? Weird concept ;) Elektrizikekswerk (talk) 07:47, 30 August 2016 (UTC)
There's already a winged dinosaur emoji and has been since 2010 http://emojipedia.org/bird/ Jeremyp (talk) 09:33, 30 August 2016 (UTC)

There is a good amount of detail regarding why/how the Unicode people are arguing over emoji (in reference to the title text), but there is not much information regarding what Randall is referring to in the main strip, e.g. an example of what kind of language regulations the Unicode group tries to impose. While the current explanation does a good job of explaining why there is a lot of drama regarding a Brontosaurus emoji, the meat and potatoes of the article is in reference to language itself. I have never encountered anyone trying to communicate in English using letters that are not part of the current alphabet. Since English uses predefined Roman symbols for sound representation, and the Unicode people only deal with the representation of symbols, I am having a difficult time comprehending how the group in charge of rendering English into text would have any part in the changes that (at least) English is undergoing (which are largely related to spelling and grammar, not the symbols themselves). Snowblinded (talk) 08:19, 30 August 2016 (UTC)

I think the main point of this comic is about using characters from different alphabets to get a funny look (or fool anti-spam). In Unicode, characters sharing the same design but from different alphabets have separate code points. For example: U+0041 (Latin "A"), U+0391 (Greek "Alpha") and U+0410 (Cyrillic "A") look exactly the same but are not interchangeable... neither in Unicode nor in real life, since writing English with Greek letters doesn't make sense anyway. Example 2: U+0049 (Latin "I"), U+2160 (Roman numeral one) and U+30A8 (Japanese katakana "E") have a similar yet different look (and very different meanings), and so have different code points (seems logical). One may want to mix them to get funny typing... as long as writing proper English is not a concern. Conclusion: I hardly see how Unicode restricts anything, since the "consistent technical standards" pretty much already exist in any language. 11:55, 30 August 2016 (UTC) Sylvain M.
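The look-alike point can be checked with a short Python sketch. It uses only the standard `unicodedata` module; the helper `scripts_used` and the test word are hypothetical, and taking the first word of the character name as the "script" is a crude heuristic, not how real confusable detection works.

```python
import unicodedata

# Three capital letters that render almost identically in most fonts.
lookalikes = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

def scripts_used(text):
    """Crude heuristic: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

# "PAYPAL" with the first A swapped for Cyrillic U+0410.
spoofed = "P\u0410YPAL"
mixed = scripts_used(spoofed)
print(mixed)
```

A string that mixes scripts like this compares unequal to the plain Latin spelling even though it looks the same, which is why the code points are kept separate.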

I feel like he isn't trying to steer the river but the two confused-looking people across the river. What else is their role if that's not the case? 14:01, 30 August 2016 (UTC)

They have another sign lying on the ground, so they seem to be fighting about where to put said sign. Psu256 (talk) 17:45, 30 August 2016 (UTC)

I think that the "Hey! That's not what that area is for!" line is about how people use features of Unicode in unintended ways.--Henke37 (talk) 12:33, 31 August 2016 (UTC)

You don't need to go as far as emoji to show how Unicode is doomed; the CJK (Chinese, Japanese, Korean) charsets, used in probably most developed countries outside of America/Europe, have had a pretty tough time getting settled and still have a few problems. 18:01, 31 August 2016 (UTC)

Can you elaborate or give a reference? Thanks 20:45, 31 August 2016 (UTC) Sylvain M.

Okay. Since I'm Korean, let me start with Hangul, which is used to write the Korean language. The beauty of Hangul is that a complete letter consists of 2~3 'jamo's (consonants or vowels). The first one is a consonant and called 'chosung', the second one is a vowel and called 'joongsung', and the last one is a consonant and called 'jongsung'. The possible numbers for each are 125, 95 and 138, so the total possible number of letters is 125 × 95 × 138 = 1,638,750. But that's a theoretical number, and the letters actually in frequent use are far fewer. So in Unicode 1.0 there were 2,350 complete letters. However, that trimmed too much and quite a lot of letters were missing. So 4,516 letters were added in Unicode 1.1. Unfortunately, this time the order of the charset table was all messed up. You need a program to construct a letter from jamos, and it was almost impossible to make a program that does consistent conversion. So in Unicode 2.0 these areas were totally scrapped, and 11,172 letters were allocated in a new area.
The Hangul charset was mostly settled there. The rest of the 1,638,750 Hangul letters, which are rarely used, are constructed by another method: writing three jamos in sequence. You might ask why we didn't use this method in the first place; that's because there would be too much overhead. We could have ended up using 4~6 bytes per complete letter, instead of 2 bytes per letter...
You can still find "CJK unified ideographs" being added even in recent Unicode versions. Since these ideographs are used over such a vast area and in different countries, there are many similar but different characters. AFAIK these are mostly needed in Japanese names. 15:08, 1 September 2016 (UTC)
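The arithmetic in the Hangul comment above can be sketched in Python. The figures 125, 95 and 138 are taken from the comment. The 19 × 21 × 28 breakdown of the 11,172 letters and the composition formula based at U+AC00 are not in the comment; they are assumed here from the Unicode standard's description of the Hangul Syllables block. The function name `compose` is hypothetical.

```python
import unicodedata

# Theoretical count from the comment: initial x medial x final jamo.
theoretical = 125 * 95 * 138

# Assumed from the Unicode standard: 19 initials, 21 vowels,
# 28 finals (27 consonants plus "no final").
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
modern = L_COUNT * V_COUNT * T_COUNT

def compose(l_index, v_index, t_index=0):
    """Build a precomposed syllable from zero-based jamo indices."""
    return chr(0xAC00 + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

han = compose(18, 0, 4)  # initial index 18, vowel index 0, final index 4

# NFD splits the syllable back into a sequence of three jamo.
jamo_sequence = unicodedata.normalize("NFD", han)

# Size in UTF-16: one 2-byte unit versus three.
precomposed_bytes = len(han.encode("utf-16-be"))
sequence_bytes = len(jamo_sequence.encode("utf-16-be"))

print(theoretical, modern, han, precomposed_bytes, sequence_bytes)
```

The byte counts illustrate the overhead mentioned above: the same syllable takes 2 bytes precomposed and 6 bytes as a jamo sequence in UTF-16.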

would a brontosaurus have feathers? 01:21, 2 September 2016 (UTC)

Also, it's possible people who like to argue over how Unicode should define things could get drawn in? 04:23, 2 September 2016 (UTC)

