1726: Unicode

Revision as of 20:51, 29 August 2016 by 162.158.214.230 (talk) (Japanese is syllabic, not logographic)
Title text: I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.

Explanation

This explanation may be incomplete or incorrect: first time making page. Needs much more work including links, sources and the Brontosaurus reference.

Cueball, along with two other figures, is placing traffic signs in a river. Because a river flows according to the landscape and not according to signage, the plan cannot work, and the river continues on its course. Cueball is frustrated by this and keeps trying to make the river obey traffic laws. The caption supplies the punchline: the comic compares highway engineers futilely trying to divert a flowing river with fixed signs to the Unicode Consortium's attempt to pin down diverse and ever-changing human language with strict technical standards.

Unicode is a largely successful attempt to create a single standard for representing all the letters, numerals, and symbols that make up human writing in all languages. This includes the Roman letters used in this article; characters with modifiers like ê (available both as a single precomposed character and as a base letter followed by a separately encoded combining mark); logographic characters such as those used in Chinese; syllabic scripts such as Japanese kana; right-to-left and top-to-bottom writing systems; mathematical symbols; emoji; and many other writing systems. As can be imagined, coping with the wide variety of character sizes, orientations, modification mechanisms, capitalization rules, and so on becomes very challenging as the Unicode Consortium tries to write rules that accommodate how written language is actually used.
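As a small illustration of the ê case, the following Python sketch uses the standard `unicodedata` module to show that the single-code-point form and the base-letter-plus-modifier form are different strings until they are normalized. The variable names and the `code_points` helper are made up for this example.

```python
import unicodedata

# Two ways of writing "ê" (names here are illustrative only).
precomposed = "\u00ea"   # one code point: LATIN SMALL LETTER E WITH CIRCUMFLEX
decomposed = "e\u0302"   # "e" followed by COMBINING CIRCUMFLEX ACCENT


def code_points(text):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The raw strings differ even though they render the same way.
same_raw = precomposed == decomposed

# NFC composes base + combining mark; NFD splits the precomposed form.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(code_points(precomposed), code_points(decomposed))
print(same_raw, same_nfc, same_nfd)
```

Software that compares or searches text usually has to pick one normalization form first, which is one of the places where the "consistent technical standards" meet messy real-world input.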

The title text refers to a proposal to add three dinosaur heads to the official list of emoji. This is likely to stir debate between the following opposing camps:

  • those who favor the inclusion of more emoji vs. those who oppose emoji on principle
  • those who accept the existence of Brontosaurus vs. those who deny its status as a genus distinct from Apatosaurus
  • those who favor a traditional, scaly image of dinosaurs vs. those who have accepted the feathered-dinosaur paradigm

See also

636: Brontosaurus

Jurassic Emoji proposal

Transcript

[Single panel scene: Cueball is standing waist-deep in a river. With one arm he is holding on to a traffic sign that says "Detour" with an arrow pointing to the right. The other arm is pointing horizontally. Further up the river is another street sign apparently in around 0.5 metres of water; this sign has an exclamation mark inside a triangle. In the distance on one bank of the river, two people are standing and making gestures, with a sign lying on the ground next to them. Behind them is a parked car on a road that crosses a bridge over the river.]

Cueball: No, go this way, not —

Are you even listening!?

Hey! That's not what this area is for!

[Caption]

Watching the Unicode people try to govern the infinite chaos of human language with consistent technical standards is like watching highway engineers try to steer a river using traffic signs.



Discussion

  1. Proposal by Courtney Milan - 3 dinosaurs: http://unicode.org/L2/L2016/16072-jurassic-emoji.pdf
  2. Feedback by Andrew West - 13 dinosaurs: http://www.unicode.org/L2/L2016/16103-jurassic-fdbk.pdf
  3. Article by Becky Ferreira - they should have feathers: http://motherboard.vice.com/read/dinosaur-emojis

We now have a sauropod 🦕 and a T-Rex 🦖 emoji approved in Unicode 10.0 in 2017, yay: https://emojipedia.org/sauropod/ https://emojipedia.org/t-rex/ --172.68.50.221 07:46, 5 April 2022 (UTC)

Sebastian 162.158.83.168 12:14, 29 August 2016 (UTC)--

Regarding the brontosaurus reference, there is also some material in the intro of the wikipedia page. Chtit draco (talk) 14:33, 29 August 2016 (UTC)
Comic could be a reference to WE’RE ALL USING THESE EMOJI WRONG - http://www.wired.com/2015/05/using-emoji-wrong/ where the 😪 emoji is supposed to be a sleepy emoji and not a side-tear emoji - http://emojipedia.org/sleepy-face/ - see facebook's interpretation vs Samsung's 162.158.49.60 (talk) (please sign your comments with ~~~~)
Indeed. However, IMHO the problem lies not in the standardisation attempt, but in the choice of non-obvious pictograms (which is a font-designer problem). The sleepy emoji would not be used wrong if it unquestionably looked sleepy. Chinese solved this problem long ago by switching from pictograms to abstract ideogram designs. 108.162.229.49 14:13, 30 August 2016 (UTC) Sylvain M.

I thought it was funny that the two people in the upper left (who, at the time of this comment, were noted to be "helping" Cueball) are actually impeding the quixotic quest by arguing amongst themselves. 108.162.237.222 23:38, 29 August 2016 (UTC)

Personally, I'm still dumbfounded by the lack of a marijuana leaf. There are pills, a syringe, a cigarette, rice wine, plus *multiple* Emoji for both wine & beer. I hate the fact that Emoji are *not* implemented in a sensible, standardized fashion: For instance, the guy Emoji may or may not have a mustache, or gray hair. The "short hair" female may be blonde, or brunette & may even have a coiffure instead of short hair! I think they should be far more specific with their definitions. Personally, I'm sticking with emoticons until they get this sorted out.  ; P As for dinosaur Emoji, contrary to my previous statement about specificity, I believe you only need three dinomoji: Carnivore head (raptor or T-rex, non-specific), long-neck herbivore in profile, & winged. Anything more specific than that should probably be expressed with, y'know, WORDS. 108.162.221.87 07:35, 30 August 2016 (UTC)

Words? Weird concept ;) Elektrizikekswerk (talk) 07:47, 30 August 2016 (UTC)
There's already a winged dinosaur emoji and has been since 2010 http://emojipedia.org/bird/ Jeremyp (talk) 09:33, 30 August 2016 (UTC)

There is a good amount of detail regarding why/how the Unicode people are arguing over emoji (in reference to the title text), but there is not much information regarding what Randall is referring to in the main strip, e.g. an example of what kind of language regulations the Unicode group tries to impose. While the current explanation does a good job of explaining why there is a lot of drama regarding a Brontosaurus emoji, the meat and potatoes of the comic is language itself. I have never encountered anyone trying to communicate in English using letters that are not part of the current alphabet. Since English uses predefined Roman symbols for sound representation, and the Unicode people only deal with the representation of symbols, I am having a difficult time comprehending how the group in charge of rendering English into text would have any part in the changes that (at least) English is undergoing (which are largely related to spelling and grammar, not the symbols themselves). Snowblinded (talk) 08:19, 30 August 2016 (UTC)

I think the main point of this comic is about using characters from different alphabets to get a funny look (or fool anti-spam). In Unicode, characters sharing the same design but from different alphabets have separate code points. For example: U+0041 (Latin "A"), U+0391 (Greek "Alpha") and U+0410 (Cyrillic "A") look exactly the same but are not interchangeable... neither in Unicode nor in real life, since writing English with Greek letters doesn't make sense anyway. Example 2: U+0049 (Latin "I"), U+2160 (Roman numeral one) and U+30A8 (Japanese katakana "E") have a similar yet different look (and very different meanings), and so have different code points (seems logical). One may want to mix them to get funny typing... as long as writing proper English is not a concern. Conclusion: I hardly see how Unicode restricts anything, since the "consistent technical standards" pretty much already exist in any language. 108.162.229.49 11:55, 30 August 2016 (UTC) Sylvain M.
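For anyone who wants to check the look-alike code points from the first example, here is a minimal Python sketch using the standard `unicodedata` module. The `scripts_used` helper is hypothetical and only takes the first word of each character's Unicode name as a rough script label; it is not the real Unicode Script property.

```python
import unicodedata

# The three look-alike capital letters mentioned above.
lookalikes = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]


def scripts_used(text):
    """Rough script labels: the first word of each character's Unicode name.

    This is an illustrative heuristic, not the official Script property.
    """
    return {unicodedata.name(ch).split()[0] for ch in text}


# A word that looks like plain Latin but hides a Cyrillic "а" (U+0430).
spoofed = "p\u0430ypal"
genuine = "paypal"

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")
print(scripts_used(genuine), scripts_used(spoofed), spoofed == genuine)
```

A check along these lines is one simple way to flag mixed-alphabet strings of the kind used to fool spam filters.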

I feel like he isn't trying to steer the river but the two confused-looking people across the river. What else is their role if that's not the case? 162.158.166.39 14:01, 30 August 2016 (UTC)

They have another sign lying on the ground, so they seem to be fighting about where to put said sign. Psu256 (talk) 17:45, 30 August 2016 (UTC)

I think that the "Hey! That's not what that area is for!" line is about how people use features of Unicode in unintended ways.--Henke37 (talk) 12:33, 31 August 2016 (UTC)

You don't need to go as far as emoji to show how Unicode is doomed; the CJK (Chinese, Japanese, Korean) charsets, used in probably most developed countries outside of America/Europe, have had a pretty tough time getting settled and still have a few problems 141.101.84.120 18:01, 31 August 2016 (UTC)

Can you elaborate or give a reference? Thanks 108.162.229.49 20:45, 31 August 2016 (UTC) Sylvain M.
Okay. Since I'm Korean, let me start with Hangul, which is used to write the Korean language. The beauty of Hangul is that a complete letter consists of 2~3 'jamo's (consonants or vowels). The first is a consonant, called 'chosung'; the second is a vowel, called 'joongsung'; the last is a consonant, called 'jongsung'. The possible numbers of each are 125, 95, and 138, so the total possible number of letters is 125 × 95 × 138 = 1,638,750. But that's a theoretical number, and the letters actually in frequent use are far fewer. So Unicode 1.0 had 2,350 complete letters. However, that trimmed too much and left out quite a lot of letters, so 4,516 letters were added in Unicode 1.1. Unfortunately, this time the order of the charset table was all messed up. You need a program to construct a letter from jamos, and it was almost impossible to write a program that did the conversion consistently. So in Unicode 2.0 these areas were totally scrapped, and 11,172 letters were allocated in a new area.
The Hangul charset was mostly settled there. The rest of the 1,638,750 Hangul letters, which are rarely used, are constructed by another method: writing three jamos in sequence. You might ask why we didn't use this method in the first place; that's because there would be too much overhead. We could have ended up using 4~6 bytes per complete letter, instead of 2 bytes per letter...
You can still find "CJK unified ideographs" being added even in recent Unicode versions. Since these ideographs are used across such a vast area and in different countries, there are many similar but different characters. AFAIK these are mostly needed for Japanese names. 141.101.84.120 15:08, 1 September 2016 (UTC)
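The arithmetic in the Hangul comment above can be checked with a short Python sketch. Two things here are not stated in the comment and are assumptions of this example: that the 11,172-letter block corresponds to 19 initial × 21 medial × 28 final (including "no final") modern jamo starting at U+AC00, and that the byte counts refer to UTF-16. The `compose` helper is hypothetical.

```python
import unicodedata

# Counts quoted in the comment above: all possible jamo combinations.
theoretical = 125 * 95 * 138

# Assumption: the precomposed block covers 19 initials x 21 medials
# x 28 finals (27 consonants plus "no final"), starting at U+AC00.
modern = 19 * 21 * 28


def compose(l_index, v_index, t_index=0):
    """Arithmetically compose a precomposed syllable from jamo indices."""
    return chr(0xAC00 + (l_index * 21 + v_index) * 28 + t_index)


# The same syllable written as three conjoining jamo in sequence,
# then composed by standard NFC normalization.
jamo_sequence = "\u1112\u1161\u11ab"
syllable = unicodedata.normalize("NFC", jamo_sequence)

# Size of each form in UTF-16 (without a byte-order mark).
precomposed_bytes = len(syllable.encode("utf-16-be"))
sequence_bytes = len(jamo_sequence.encode("utf-16-be"))

print(theoretical, modern)
print(syllable, compose(18, 0, 4) == syllable)
print(precomposed_bytes, sequence_bytes)
```

Under these assumptions the precomposed letter takes 2 bytes while the three-jamo sequence takes 6, which matches the overhead described above.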

would a brontosaurus have feathers?141.101.98.19 01:21, 2 September 2016 (UTC)

Also, it's possible that people who like to argue over how Unicode should define things could get drawn in? 108.162.249.162 04:23, 2 September 2016 (UTC)

I want to see the brontosaurus emoji with the vomiting modifier from comic 1813 162.158.62.65 (talk) 15:02, 25 August 2022 (UTC) (please sign your comments with ~~~~)