{"id":4073,"date":"2016-03-19T11:33:39","date_gmt":"2016-03-19T11:33:39","guid":{"rendered":"https:\/\/really.zonky.org\/?p=4073"},"modified":"2016-03-19T11:34:44","modified_gmt":"2016-03-19T11:34:44","slug":"the-weird-characters-of-unicode","status":"publish","type":"post","link":"https:\/\/really.zonky.org\/?p=4073","title":{"rendered":"The Weird Characters Of Unicode"},"content":{"rendered":"<p>As you may or may not know, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Unicode\">Unicode<\/a> is the standard for encoding text into &#8216;computer speak&#8217;. There have been many different encodings of characters (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Grapheme\">graphemes<\/a>, and other symbols) into &#8220;computer speak&#8221; by different manufacturers; all of which were severely limited. Almost all of them included the normal Roman alphabet plus a variety of other symbols.\u00a0Amongst other things, this had two main problems.<\/p>\n<p>Firstly, it excludes large parts of the world that use other writing systems from using computers sensibly &#8211; computers are difficult enough to use (at least at first) without learning a whole new writing system. And even then storing a document by transliterating it is not ideal &#8211; you have changed the original and introduced another possible source of errors.<\/p>\n<p>Secondly there is the problem of errors introduced into documents when moving them from one computer system to another &#8211; for example it was not unknown for a &#8220;#&#8221; to become a &#8220;\u00a3&#8221; (or very similarly,\u00a0&#8220;\u00a3&#8221; will appear as &#8220;\u00c2\u00a3&#8221;). 
Very much more extreme examples <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mojibake\">exist<\/a>.<\/p>\n<p>Of course, because of the enormous number of symbols in Unicode, there is a great deal of fun to be had with it &#8211; not infrequently by <a href=\"http:\/\/t-a-w.blogspot.co.uk\/2008\/12\/funny-characters-in-unicode.html\">poking fun<\/a> at Unicode for including ridiculous symbols. And why not? There&#8217;s no harm in having a bit of fun :-<\/p>\n<p><a href=\"http:\/\/www.springfrog.com\/converter\/upside-down-text.htm\">\u0265\u0287\u0131p\u01dd\u0279\u01dd\u026f \u01dd\u029e\u0131\u026f<\/a><\/p>\n<p>However, it is worth pointing out that the Unicode standard is a serious business, and slipping in &#8220;fun&#8221; symbols is not likely to happen. Although I was not directly involved, I did help out one of the people who pushed for the inclusion of medieval Slavonic characters within the Unicode standard, and it is not a trivial process.<\/p>\n<p>And now for some &#8220;fun&#8221; Unicode characters&#8230;<\/p>\n<h1>\u00fe<\/h1>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Thorn_(letter)\">Thorn<\/a>. The English letter that got away. Before the age of printing, English had an additional letter (I&#8217;m over-simplifying here) which was used instead of the digraph <em>th<\/em>, so words such as <em>the<\/em> would have been spelt <em>\u00fee.\u00a0<\/em>By the time printing arrived, the shape of the thorn letter was becoming more like a &#8220;y&#8221;, and because the printers imported their equipment from countries that did not have a thorn, the printed books tended to use &#8220;y&#8221; instead of \u00fe. Which is of course where we get &#8220;Ye olde Shoppe&#8221; from.<\/p>\n<p>Of course, it was confusing to print\u00a0<em>ye<\/em> when we said\u00a0<em>\u00fee<\/em> (or\u00a0<em>the<\/em>), so the printers settled on\u00a0<em>the<\/em>.<\/p>\n<p>So why is \u00fe in Unicode? 
Because you cannot discuss the letter without including it, and perhaps more importantly cannot encode a historical document that used \u00fe without an encoding for it.<\/p>\n<h1>\u2603 and probably \u2615<\/h1>\n<p>Ah! The snowman (and the cup of coffee). What sense of fun allowed these symbols into the standard?<\/p>\n<p>Well, according to the Unicode standard, the snowman is contained within a block of weather symbols, so it was almost certainly part of a TV station&#8217;s encoding standard for weather forecasts. And you cannot claim to be a universal standard for text encoding without including the symbols found in other encodings.<\/p>\n<h1>\u203d<\/h1>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Interrobang\">interrobang<\/a>. The punctuation symbol used (if rarely) for signalling both a question and an exclamation: What the bleep are you doing\u203d<\/p>\n<p>Whilst not commonly used today, it had something of a vogue in the 1960s, and so there are documents that use this symbol and need encoding.<\/p>\n<h1>\u2620, \u26a0, \u2622<\/h1>\n<p>These look like fun, don&#8217;t they? They certainly do to me, but in fact they are international symbols for various dangers &#8211; poison (\u2620), warning (\u26a0) of a general nature, and radioactivity (\u2622). All pretty serious stuff; and you\u00a0<em>really<\/em> don&#8217;t want those symbols garbled in a document.<\/p>\n<h1>\u0e5b<\/h1>\n<p>This is a Thai &#8220;letter&#8221; and I picked it out because it&#8217;s made fun of elsewhere, but it stands in for all the non-European symbols used in language.<\/p>\n<p>It may look kind of funny, but it probably doesn&#8217;t look so funny to someone who knows Thai. 
To put it another way, if you told me that we&#8217;re not going to include the &#8220;M&#8221; in a character encoding because it looks too silly, I&#8217;d be very, very annoyed (my name contains two of &#8217;em).<\/p>\n<p>And yes I can type all of the above and the following into a text terminal \ud83d\ude03<\/p>\n<p>&nbsp;<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4076\" src=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-03-19_1119.png?resize=695%2C127&#038;ssl=1\" alt=\"2016-03-19_1119\" width=\"695\" height=\"127\" srcset=\"https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-03-19_1119.png?w=736&amp;ssl=1 736w, https:\/\/i0.wp.com\/really.zonky.org\/wp-content\/uploads\/2016-03-19_1119.png?resize=300%2C55&amp;ssl=1 300w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As you may or may not know, Unicode is the standard for encoding text into &#8216;computer speak&#8217;. There have been many different encodings of characters (graphemes, and other symbols) into &#8220;computer speak&#8221; by different manufacturers; all of which were severely limited. 
Almost all of them included the normal Roman alphabet plus a variety of other <a href='https:\/\/really.zonky.org\/?p=4073' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_share_on_mastodon":"0"},"categories":[4],"tags":[1393,1395,1396,1394,1388,862,1390],"class_list":["post-4073","post","type-post","status-publish","format-standard","hentry","category-it","tag-1393","tag-1395","tag-1396","tag-interrobang","tag-thorn","tag-unicode","tag-th","category-4-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"share_on_mastodon":{"url":"","error":""},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1f2KI-13H","_links":{"self":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/4073","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4073"}],"version-history":[{"count":4,"href":"https:\/\/really.zonky.org\/index.ph
p?rest_route=\/wp\/v2\/posts\/4073\/revisions"}],"predecessor-version":[{"id":4078,"href":"https:\/\/really.zonky.org\/index.php?rest_route=\/wp\/v2\/posts\/4073\/revisions\/4078"}],"wp:attachment":[{"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4073"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4073"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/really.zonky.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4073"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}