Unicode

Archived Posts from this Category

IUC45 talk: “Top issues in Universal Acceptance”

Posted by on 30 Jun 2021 | Tagged as: meetings and conferences, Unicode, Universal Acceptance

I’m delighted to be presenting, once again, to the 45th Internationalization and Unicode Conference (IUC45).  The conference is the gathering of my “tribe”, people who are as enthusiastic about language, text, and software as I am. If you like this stuff, it’s the best place in the world to be for those three days. Or, given the pandemic, the conference might be partially or completely virtual, so that webcast is the best UDP session in the world. In either case, please register and join us there.

Continue Reading »

Earth, Moon, and abolishing leap seconds: the curious astronomy and politics of time() (IUC44 session)

Posted by on 30 Nov 2020 | Tagged as: culture, meetings and conferences, time, Unicode

Last month was the pandemic-distanced rendition of the Internationalization and Unicode Conference. This year is the 44th conference, or IUC44.  In addition to a tutorial (blogged about last month), I delivered a presentation: Earth, Moon, and abolishing leap seconds: the curious astronomy and politics of time(). Here are my slides, and a video of me talking through my slides.

Continue Reading »

Email addresses and domain names are NON-latin! Now what? (IUC44 tutorial)

Posted by on 31 Oct 2020 | Tagged as: meetings and conferences, Unicode, Universal Acceptance

Two weeks ago was the pandemic-distanced rendition of the Internationalization and Unicode Conference. This year is the 44th conference, or IUC44.  In addition to a presentation (to be blogged later), I delivered a tutorial: Email addresses and domain names are NON-latin! Now what? Here are my slides, and a video of me talking through my slides.

Continue Reading »

PostScript code converting UTF-8 to UTF-16

Posted by on 31 Aug 2020 | Tagged as: robobait, software engineering, Unicode

This is a little bit of code which was fun and nostalgic to write, even though the motivating project fell through. I wrote PostScript language functions to convert strings with UTF-8 contents, into strings with UTF-16 contents. This was intended to be part of a batch tool to convert PDF documents to PDF/A format, but that did not work out. However, the code works, and here it is.
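For readers who want the gist of the conversion without reading PostScript, here is a minimal Python sketch of the same idea: decode UTF-8 bytes into code points, then re-encode them as UTF-16. This is not the PostScript code from the post. The function names are hypothetical, the output is assumed to be big-endian with no byte order mark, and the decoder assumes reasonably well-formed input (it does not reject overlong forms or encoded surrogates).

```python
def utf8_to_code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of Unicode code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: one byte (ASCII)
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:       # 110xxxxx: one continuation byte
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:      # 1110xxxx: two continuation bytes
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:     # 11110xxx: three continuation bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + 1 + extra > len(data):
            raise ValueError("truncated UTF-8 sequence")
        for b in data[i + 1 : i + 1 + extra]:
            if b & 0xC0 != 0x80:       # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)
        code_points.append(cp)
        i += 1 + extra
    return code_points


def code_points_to_utf16be(code_points: list[int]) -> bytes:
    """Encode code points as big-endian UTF-16 without a byte order mark."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x10000:
            out += cp.to_bytes(2, "big")
        else:
            # Code points above U+FFFF become a surrogate pair.
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            out += high.to_bytes(2, "big") + low.to_bytes(2, "big")
    return bytes(out)


def utf8_to_utf16be(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16BE bytes via code points."""
    return code_points_to_utf16be(utf8_to_code_points(data))


if __name__ == "__main__":
    print(utf8_to_utf16be("aə😀".encode("utf-8")).hex(" "))
```

In everyday Python you would simply write `data.decode("utf-8").encode("utf-16-be")`. The long form is shown only because a language without built-in transcoding needs this bit arithmetic spelled out.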

Continue Reading »

A settler’s guide to reading, typing, and spelling Vancouver’s new shibboleths

Posted by on 30 Jun 2018 | Tagged as: community, culture, Unicode, Vancouver

My home, Vancouver B.C., just announced new names for two public places: “šxʷƛ̓ənəq Xwtl’e7énḵ Square” and “šxʷƛ̓exən Xwtl’a7shn”. In contrast to just about every other name in this town, these names are not Scottish- or English-derived. Nor are they a Chinese phoneticisation of a Scottish-derived name. Instead, at long last our town asked the First Nations leaders, whose people have been here the longest by far, to contribute the names. I think it is awesome. It is a step towards reconciliation, tiny but real. I think these names will become Vancouver’s new shibboleths.

But names like these represent change, and change is unsettling. The characters are unfamiliar-looking! We don’t know how to pronounce them! There are rectangular boxes showing missing text! There is no ə key on our keyboards! Heh. We seem to have no problem expecting immigrants who grew up with Chinese or Ge’ez or Gujarati writing to learn how to write and pronounce “Granville”, but we are reluctant to step up when it’s our turn.

Never fear. I’m a software engineer specialising in internationalisation and Unicode. Let me explain how to read, type, and spell these names.  It’s really very interesting. Continue Reading »

Top Posts: Why Unicode has separate codepoints for “characters with identical glyphs”

Posted by on 31 May 2018 | Tagged as: i18n, multilingual, robobait, software engineering, Unicode

I post on various forums around the net. Sometimes I am able to tap into such inspiration that I want to add that essay to my portfolio. Such was the case here. The question: Why does Unicode have separate codepoints for characters with identical glyphs? My response begins: The short answer to this question is, “Unicode encodes characters, not glyphs”. But like many questions about Unicode, a related answer is “plain text may be plain, but it’s not simple”.… Continue Reading »
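A small illustration of "characters, not glyphs", as a Python sketch of my own rather than an excerpt from the essay: three capital letters that most fonts draw identically, yet which have different names and different lowercase forms. The names `LOOKALIKES` and `describe` are just for this example.

```python
import unicodedata

# Latin A, Greek Alpha, Cyrillic A: usually the same glyph, never the same character.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point label, Unicode character name, lowercase form)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.lower())


if __name__ == "__main__":
    for ch in LOOKALIKES:
        label, name, lower = describe(ch)
        print(f"{label}  {name}  lowercase: {lower}")
```

If the three shared one code point, software could not know which lowercase letter to produce, nor which script a piece of plain text belongs to.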

Email addresses and domain names are NON-latin! Now what? (IUC41 tutorial)

Posted by on 28 Feb 2018 | Tagged as: i18n, meetings and conferences, multilingual, Unicode, web technology

Last fall I attended the Internationalization and Unicode Conference. That year was the 41st conference, or IUC41. In addition to a presentation (described in a blog post last October), I delivered a tutorial: Email addresses and domain names are NON-latin! Now what? I should have blogged about my slides last October, but better late than never. Here are my slides. Continue Reading »

Universal Acceptance of non-Latin email addresses and domain names: how does your framework rate? (IUC41 presentation)

Posted by on 31 Oct 2017 | Tagged as: i18n, meetings and conferences, multilingual, Unicode, web technology

One of my treats each year is to attend the Internationalization and Unicode Conference. This year was the 41st conference, or IUC41.  As I often do, I made a presentation. This year, the title was, Universal Acceptance of non-Latin email addresses and domain names: how does your framework rate? I’d like to share my slides. Continue Reading »

“Building Localization Capacity Through Non-specialist Developers”, at IUC 39

Posted by on 30 Sep 2015 | Tagged as: meetings and conferences, software engineering, Unicode

I’m delighted to be presenting, once again, to the 39th Internationalization and Unicode Conference (IUC39).  The conference is the gathering of my “tribe”, people who are as enthusiastic about language, text, and software as I am. If you like this stuff, it’s the best place in the world to be for those three days, so please register and join us there.

My presentation is, Building Localization Capacity Through Non-specialist Developers. Here’s the abstract: Continue Reading »

‘’tain’t right’, says he: storm in apostrophe

Posted by on 14 Jun 2015 | Tagged as: culture, language, Unicode

A friend pointed me to an interesting blog post, Which Unicode character should represent the English apostrophe? (And why the Unicode committee is very wrong.) by Ted Clancy, 3 June 2015. The argument: “The Unicode committee is very clear that U+2019 (RIGHT SINGLE QUOTATION MARK) should represent the English apostrophe…. This is very, very wrong. The character you should use to represent the English apostrophe is U+02BC (MODIFIER LETTER APOSTROPHE). I’m here to tell you why….” [Emphasis in the original.]

I understand that there might be many people on this planet who actually don’t care about English language orthography concerning the apostrophe, contractions, and Unicode plain text representations thereof. Go ahead, skip this post and go on with your day. I am completely captivated by such questions. I started writing a quick reply, which grew to the point where it seemed better to host it on my blog than on Clancy’s comments page. Continue Reading »
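To see how differently software treats the two candidate characters, here is a short Python sketch (my own illustration, with a hypothetical helper name `words`). It prints each character's Unicode general category, then shows how Python's `\w` word pattern splits "don't" depending on which apostrophe is used.

```python
import re
import unicodedata

QUOTE = "\u2019"     # RIGHT SINGLE QUOTATION MARK
MODIFIER = "\u02BC"  # MODIFIER LETTER APOSTROPHE


def words(text: str) -> list[str]:
    """Split text into runs of Python's Unicode word characters."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for ch in (QUOTE, MODIFIER):
        # The category is "Pf" (final punctuation) for U+2019 and
        # "Lm" (modifier letter) for U+02BC.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
    print(words("don" + QUOTE + "t"))     # the punctuation mark breaks the word
    print(words("don" + MODIFIER + "t"))  # the modifier letter stays inside it
```

This shows only how one regular expression engine reads the character properties. It does not settle which character is the right choice for English text.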
