Difference between revisions of "1726: Unicode"

(Explanation: added dinosaur-denial)
(See also)
* [http://www.fileformat.info/info/unicode/char/26a0/index.htm Unicode Character 'WARNING SIGN']
* [http://www.fileformat.info/info/unicode/char/27a1/index.htm Unicode Character 'BLACK RIGHTWARDS ARROW']
* [http://www.unicode.org/mail-arch/unicode-ml/y2016-m08/0103.html Discussion about this comic on the Unicode mailinglist]

Revision as of 07:28, 30 August 2016

Title text: I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.


This explanation may be incomplete or incorrect: first time making page. Needs much more work including links, sources and the Brontosaurus reference.
If you can address this issue, please edit the page! Thanks.

Cueball, along with two other figures, is placing traffic signs in a river. As rivers flow according to the landscape, this plan will not work and the river will continue on its course. Cueball is very frustrated by this and is still trying to make the river obey traffic laws. The caption lays out the punchline: the comic compares the useless approach of Cueball attempting to divert a flowing, moving river with fixed signs that do nothing, with the Unicode Consortium's attempt to define the diverse and ever-changing human language with strict technical standards.

Unicode is a largely successful attempt to create a single standard for representing all the letters, numerals and symbols that make up human writing in all languages. This includes the Roman letters used in this article, characters with modifiers like ê (available both as a single precomposed character and as a base letter plus a separately selectable modifier), ideographic characters as in Chinese, syllabic writing systems like Japanese kana, right-to-left and top-to-bottom writing systems, mathematical symbols, emoji, and many others. The symbols on the signs in the river are, in fact, Unicode characters: the warning triangle with an exclamation mark ⚠ has code point U+26A0, and the black rightwards arrow ➡ has code point U+27A1. As can be imagined, coping with the wide variety of character sizes, orientations, modification schemes, capitalization rules, etc. can get very challenging as the Unicode Consortium tries to write rules that accommodate how written language is actually used.
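As a rough illustration of the code points mentioned above, here is a minimal sketch using Python's standard `unicodedata` module. The variable names `precomposed` and `decomposed` are only illustrative; nothing here comes from the comic itself.

```python
import unicodedata

# The two symbols shown on the signs in the comic.
for ch in ("\u26a0", "\u27a1"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# "ê" can be stored as one precomposed code point, or as "e" followed by
# a separate combining circumflex; normalization converts between the two.
precomposed = "\u00ea"
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])

# Recomposing gives back the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

The two forms of ê render identically but compare as different strings unless they are normalized first, which is one small example of the rules the Consortium has to write down.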

The title text refers to a proposal to add three dinosaur heads to the official list of emoji. This is likely to stir debate between the following opposing camps:

  • those who favor the inclusion of more emoji vs. those who oppose emoji on principle
  • those who accept the existence of Brontosaurus vs. those who deny its status as a genus distinct from Apatosaurus
  • those who favor a traditional, scaly image of dinosaurs vs. those who have accepted the feathered-dinosaur paradigm
  • those who point out that two of the dinosaurs in the "Jurassic Emoji" set actually come from the Cretaceous period, and as such renaming is necessary vs. those who think that "Jurassic" is a cooler word
  • those who for religious or other reasons deny the existence of dinosaurs.

See also

636: Brontosaurus


[Single panel scene: Cueball is standing waist-deep in a river. With one arm he is holding on to a traffic sign that says "Detour" with an arrow pointing to the right. The other arm is pointing horizontally. Further up the river is another street sign apparently in around 0.5 metres of water; this sign has an exclamation mark inside a triangle. In the distance on one bank of the river, two people are standing and making gestures, with a sign lying on the ground next to them. Behind them is a parked car on a road that crosses a bridge over the river.]

Cueball: No, go this way, not —

Are you even listening!?

Hey! That's not what this area is for!


Watching the Unicode people try to govern the infinite chaos of human language with consistent technical standards is like watching highway engineers try to steer a river using traffic signs.



  1. Proposal by Courtney Milan - 3 dinosaurs: http://unicode.org/L2/L2016/16072-jurassic-emoji.pdf
  2. Feedback by Andrew West - 13 dinosaurs: http://www.unicode.org/L2/L2016/16103-jurassic-fdbk.pdf
  3. Article by Becky Ferreira - they should have feathers: http://motherboard.vice.com/read/dinosaur-emojis

Sebastian 12:14, 29 August 2016 (UTC)--

Regarding the brontosaurus reference, there is also some material in the intro of the wikipedia page. Chtit draco (talk) 14:33, 29 August 2016 (UTC)
Comic could be a reference to WE’RE ALL USING THESE EMOJI WRONG - http://www.wired.com/2015/05/using-emoji-wrong/ where the 😪 emoji is supposed to be a sleepy emoji and not a side-tear emoji - http://emojipedia.org/sleepy-face/ - see facebook's interpretation vs Samsung's (talk) (please sign your comments with ~~~~)
Indeed. However IMHO the problem lies not in the standardisation attempt, but in the choice of non-obvious pictograms (which is a font-designer problem). The sleepy emoji would not be used wrongly if it unquestionably looked sleepy. Chinese solved this problem long ago by switching from pictograms to abstract ideogram designs. 14:13, 30 August 2016 (UTC) Sylvain M.

I thought it was funny that the two people in the upper left (who, at the time of this comment, were noted to be "helping" Cueball) are actually impeding the quixotic quest by arguing amongst themselves. 23:38, 29 August 2016 (UTC)

Personally, I'm still dumbfounded by the lack of a marijuana leaf. There are pills, a syringe, a cigarette, rice wine, plus *multiple* Emoji for both wine & beer. I hate the fact that Emoji are *not* implemented in a sensible, standardized fashion: For instance, the guy Emoji may or may not have a mustache, or gray hair. The "short hair" female may be blonde, or brunette & may even have a coiffure instead of short hair! I think they should be far more specific with their definitions. Personally, I'm sticking with emoticons until they get this sorted out.  ; P As for dinosaur Emoji, contrary to my previous statement about specificity, I believe you only need three dinomoji: Carnivore head (raptor or T-rex, non-specific), long-neck herbivore in profile, & winged. Anything more specific than that should probably be expressed with, y'know, WORDS. 07:35, 30 August 2016 (UTC)

Words? Weird concept ;) Elektrizikekswerk (talk) 07:47, 30 August 2016 (UTC)
There's already a winged dinosaur emoji and has been since 2010 http://emojipedia.org/bird/ Jeremyp (talk) 09:33, 30 August 2016 (UTC)

There is a good amount of detail regarding why/how the Unicode people are arguing over emoji (in reference to the title text), but there is not much information regarding what Randall is referring to in the main strip, e.g. an example of what kind of language regulations the Unicode group tries to impose. While the current explanation does a good job of explaining why there is a lot of drama regarding a Brontosaurus emoji, the meat and potatoes of the comic is about language itself. I have never encountered anyone trying to communicate in English using letters that are not part of the current alphabet. Since English uses predefined Roman symbols for sound representation, and the Unicode people only deal with the representation of symbols, I am having a difficult time comprehending how the group in charge of rendering English into text would have any part in the changes that (at least) English is undergoing (which are largely related to spelling and grammar, not the symbols themselves). Snowblinded (talk) 08:19, 30 August 2016 (UTC)

I think the main point of this comic is about using characters from different alphabets to get a funny look (or fool anti-spam). In Unicode, characters sharing the same design but from different alphabets have separate code points. For example: U+0041 (Latin "A"), U+0391 (Greek "Alpha") and U+0410 (Cyrillic "A") look exactly the same but are not interchangeable... neither in Unicode nor in real life, since writing English with Greek letters doesn't make sense anyway. Example 2: U+0049 (Latin "I"), U+2160 (Roman numeral 1) and U+30A8 (Japanese katakana "E") have a similar yet different look (and very different meanings), and so have different code points (seems logical). One may want to mix them to get funny-looking text... as long as writing proper English is not a concern. Conclusion: I hardly see how Unicode restricts anything, since the "consistent technical standards" pretty much already exist in any language. 11:55, 30 August 2016 (UTC) Sylvain M.
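The point about look-alike letters can be checked with a short sketch, assuming Python's standard `unicodedata` module. The names `lookalikes`, `names` and `all_distinct` are made up for this example.

```python
import unicodedata

# Three capital letters that usually render identically.
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Each one has its own code point and its own official character name.
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

# String comparison works on code points, not on appearance.
all_distinct = len(set(lookalikes)) == 3
print(all_distinct)
```

This is why a word spelled with a Cyrillic "A" will not match a search for the same word spelled with a Latin "A", even though the two look the same on screen.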

I feel like he isn't trying to steer the river but the two confused-looking people across the river. What else is their role if that's not the case? 14:01, 30 August 2016 (UTC)

They have another sign lying on the ground, so they seem to be fighting about where to put said sign. Psu256 (talk) 17:45, 30 August 2016 (UTC)

I think that the "Hey! That's not what this area is for!" line is about how people use features of Unicode in unintended ways.--Henke37 (talk) 12:33, 31 August 2016 (UTC)

You don't need to go as far as emoji to show how Unicode is doomed; the CJK (Chinese, Japanese, Korean) charsets, used in probably most developed countries outside of America/Europe, have had a pretty tough time getting settled and still have a few problems. 18:01, 31 August 2016 (UTC)

Can you elaborate or give a reference? Thanks 20:45, 31 August 2016 (UTC) Sylvain M.

Okay. Since I'm Korean, let me start with Hangul, which is used to write the Korean language. The beauty of Hangul is that a complete letter consists of 2~3 'jamo' (consonants or vowels). The first is a consonant called 'chosung', the second is a vowel called 'joongsung', and the last is a consonant called 'jongsung'. The possible numbers of each are 125, 95 and 138, so the total possible number of letters is 1,638,750. But that's a theoretical number, and the letters in frequent use are far fewer. So Unicode 1.0 had 2,350 complete letters. However, that trimmed too much and left out quite a lot of letters, so 4,516 letters were added in Unicode 1.1. Unfortunately, this time the order of the charset table was all messed up. You need a program to construct a letter from jamo, and it was almost impossible to write one that did consistent conversion. So in Unicode 2.0 these areas were scrapped entirely, and 11,172 letters were allocated in a new area.
The Hangul charset was mostly settled at that point. The rest of the 1,638,750 Hangul letters, which are rarely used, are constructed by another method: writing three jamo in sequence. You might ask why we didn't use this method in the first place; that's because there would be too much overhead. We could have ended up using 4~6 bytes per complete letter instead of 2 bytes per letter...
You can still find "CJK unified ideographs" being added even in recent Unicode versions. Since these ideographs are used over such a vast area and in different countries, there are many similar but different characters. AFAIK these are mostly needed for Japanese names. 15:08, 1 September 2016 (UTC)
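For anyone who wants to check the Hangul arithmetic above, here is a minimal Python sketch. The jamo counts 125, 95 and 138 are taken from the comment as given. The 19 × 21 × 28 breakdown of the 11,172 modern syllables and the U+AC00 composition formula are not in the comment and are added here as background. The helper name `compose` is hypothetical.

```python
import unicodedata

# Figures quoted in the comment above (they include archaic jamo).
theoretical = 125 * 95 * 138
print(theoretical)  # 1638750

# Modern Hangul: 19 initials, 21 medials, 27 finals plus "no final".
modern = 19 * 21 * 28
print(modern)  # 11172

def compose(initial, medial, final=0):
    """Index arithmetic for the precomposed syllable block starting at U+AC00."""
    return chr(0xAC00 + (initial * 21 + medial) * 28 + final)

# Indices 18, 0, 4 select the jamo ㅎ, ㅏ, ㄴ, which form the syllable 한.
han = compose(18, 0, 4)
print(f"U+{ord(han):04X}")  # U+D55C

# Canonical decomposition writes the same syllable as three jamo in sequence.
jamo = unicodedata.normalize("NFD", han)
print(len(jamo))  # 3

# Byte cost in UTF-16: one precomposed code point vs. the jamo sequence.
print(len(han.encode("utf-16-be")), len(jamo.encode("utf-16-be")))  # 2 6
```

The last line matches the overhead described above: 2 bytes for a precomposed letter against 6 bytes for three jamo (4 bytes when there is no final consonant).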

Would a brontosaurus have feathers? 01:21, 2 September 2016 (UTC)

Also, it's possible that people who like to argue over how Unicode should define things could get drawn in? 04:23, 2 September 2016 (UTC)