i18n

Archived Posts from this Category

Choosing between UTF-8 and UTF-16: which has the better bytes-per-character ratio?

Posted by on 31 Dec 2010 | Tagged as: i18n, language, software engineering, Unicode

Software engineers are sometimes called on to specify which encoding a text file format should use. These days, the top contenders are UTF-8 and UTF-16, both based on the Unicode Standard. One factor in choosing between them (amongst several, and perhaps not the most compelling) is storage efficiency: the number of bytes per character, or the amount of storage per unit of text. If a given text takes a kilobyte of storage in UTF-8 and twice that in UTF-16, that’s a difference which may be meaningful.

I recently looked for quantitative data about the space efficiency of UTF-8 and UTF-16, and couldn’t find very much. Engineering discussions about storage efficiency are better informed by quantitative data than by opinion and supposition. I want to give one morsel of quantitative data more visibility, and clarify this issue.
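To see how the ratio depends on the text itself, here is a minimal sketch (not the data this post refers to) that measures bytes per character for a string in each encoding. It assumes "character" means a Unicode code point, which is what Python's `len()` counts on a `str`; user-perceived characters (grapheme clusters) can differ. The function name `bytes_per_char` and the sample strings are illustrative choices.

```python
def bytes_per_char(text: str) -> tuple[float, float]:
    """Return (UTF-8, UTF-16) bytes per code point for a non-empty string."""
    if not text:
        raise ValueError("text must be non-empty")
    n = len(text)  # number of Unicode code points
    utf8 = len(text.encode("utf-8"))
    # "utf-16-le" omits the 2-byte byte-order mark that plain "utf-16" prepends
    utf16 = len(text.encode("utf-16-le"))
    return utf8 / n, utf16 / n


samples = {
    "German": "Alle Menschen sind frei und gleich an Würde und Rechten geboren.",
    "Japanese": "すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。",
}
for label, text in samples.items():
    u8, u16 = bytes_per_char(text)
    print(f"{label}: UTF-8 {u8:.2f} B/char, UTF-16 {u16:.2f} B/char")
```

Every character in the Japanese sample lies at or above U+0800, so it costs 3 bytes in UTF-8 but 2 in UTF-16; the mostly-ASCII German sample costs just over 1 byte per character in UTF-8 versus 2 in UTF-16. Characters outside the Basic Multilingual Plane take 4 bytes in both encodings.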

Building Multilingual Websites in Drupal and Joomla, at IUC34

Posted by on 31 Oct 2010 | Tagged as: CMS, drupal, i18n, Joomla, meetings and conferences, multilingual, Unicode

Once again I was fortunate enough to be invited to present at this year’s Internationalization and Unicode Conference (IUC). I have posted the paper and slides for my tutorial, Building Multilingual Websites in Drupal and Joomla, over on jdlh.com.

This was my abstract, from the Unicode conference program for my talk:

How about an IMLIG (Internationalisation, Multilingual, Localisation Interest Group) for Vancouver?

Posted by on 27 Jun 2010 | Tagged as: i18n, language, meetings and conferences, multilingual, Unicode, Vancouver, web technology

There is a lot of international, multilingual, and multicultural activity in Vancouver. Also, there’s a thriving tech scene. But there’s no place for the people in the intersection of those two circles — those interested in and working on the internationalisation, localisation, and multilingual aspects of technology projects — to get together and share ideas. I think there ought to be.

And I’ll even propose a name: IMLIG1604, the I18n L10n M10l I6t G3p (Internationalisation, Localisation, and Multilingual Interest Group) for North America’s 604 area code. If you can decipher the title, you’re in the club!
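These abbreviations follow the usual i18n numeronym pattern: keep the first and last letters and replace everything between them with a count of the omitted letters. A minimal Python sketch of that rule (the function name `numeronym` is just illustrative):

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) <= 3:
        return word  # too short for the abbreviation to save anything
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for w in ["Internationalisation", "Localisation"]:
    print(w, "->", numeronym(w))  # I18n, then L10n
```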


Will Machine-only Translation Always Fall Short?

Posted by on 22 Jun 2009 | Tagged as: culture, i18n, language

I encountered a new blog from my i18n tribe today, Localization Best Practices. Their post, “Pidgins and Creoles” or “Why Machine-only Translation Will Always Fall Short”, caught my eye.  It is interesting, even if I don’t fully agree with them.

Jonathan writes that, at a recent conference on localisation:

…an audience member asked me about machine translation, and if it would ever completely take the place of human linguists in the industry. I answered “No,” although I did concede that machine translation is consistently making strides and does have a place in the localization community. He then mentioned that a scientific group in Europe recently had success with a robot performing a live human appendectomy. He believed that if something that delicate could be automated, what made something a “simple” as language beyond the scope of machines and artificial intelligence? I thought about his question and then simply said, “Because there are no pidgins or creoles for appendectomies.”

“International and multilingual Drupal and Joomla! sites” at LinuxFest Northwest

Posted by on 29 Apr 2009 | Tagged as: CMS, drupal, i18n, Joomla, meetings and conferences, multilingual, web technology

Last week I gave a presentation, International and multilingual Drupal and Joomla! sites. I’ve posted my slides and handouts at that link for anyone who wants to catch up on them.

The occasion was LinuxFest Northwest 2009, held at Bellingham Technical College in Bellingham, WA, USA. It was a delightful event. It’s thoroughly grassroots and volunteer-run, with a friendly and accessible vibe, yet it attracts very knowledgeable people.


International and multilingual Drupal sites

Posted by on 22 Nov 2008 | Tagged as: CMS, drupal, i18n, meetings and conferences, Vancouver

I gave a presentation about “International and multilingual Drupal sites” to the friendly folks at the Vancouver Drupal Users Group on November 20, 2008. Follow the link above to see the slides.

This was a great opportunity for me to investigate Drupal 6’s internationalisation (i18n) support. As part of the research for my paper, I set up a basic Drupal 6 site with UI strings and content translated into Japanese and English. I found that Drupal 6 has very good support for hosting multilingual sites. However, installing the right modules and then setting up the system configuration had some tricky aspects. I summarise them in the presentation, but it’s probably worth writing some better documentation.


Jim is a panelist at the Internet Marketing Conference, Vancouver, Sept 12

Posted by on 07 Sep 2008 | Tagged as: culture, i18n, language, meetings and conferences, Vancouver

I’m going to be a panelist at the Internet Marketing Conference Vancouver 2008, which runs from September 11 to 12, 2008. The panel is called “Writing for the Web”. It is full of experts on writing, and then there’s me. I’ll be approaching the topic crosswise, talking about international and multilingual issues: in other words, how your writing is affected if it will be translated or is part of a multinational project.

The panelists are an interesting bunch. I’m looking forward to meeting them. They are:


Simple script-detection algorithm for font switching?

Posted by on 26 Aug 2008 | Tagged as: i18n, language, multilingual, software engineering, Unicode

Does anybody know of a simple script-detection algorithm (or heuristic) for font switching?

This came up with one of my clients. Suppose you have a guest book on your web site, and seven visitors left you the following inspiring messages:

  1. すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。
  2. 人人生而自由,在尊严和权利上一律平等。
  3. Semua orang dilahirkan merdeka dan mempunyai martabat dan hak-hak yang sama.
  4. 人人生而自由,在尊嚴和權利上一律平等。
  5. Alle Menschen sind frei und gleich an Würde und Rechten geboren.
  6. ‘Ολοι οι άνθρωποι γεννιούνται ελεύθεροι και ίσοι στην αξιοπρέπεια και τα δικαιώματα.
  7. 모든 인간은 태어날 때부터 자유로우며 그 존엄과 권리에 있어 동등하다.

(It looks like your visitors all read the Universal Declaration of Human Rights, courtesy of the UDHR in Unicode project.)

Now suppose you are so touched that you want to lay out all seven messages in a PDF file and print it out as a booklet. You have a beautiful layout template, and various complementary fonts: Latin script, Japanese, Korean, Simplified Chinese, Traditional Chinese, and Greek script.

Which font do you apply to each message? More importantly, is there a simple heuristic by which software can make the choice? (More after the jump.)
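One possible heuristic, offered only as a minimal sketch and not as a known answer to the question: classify each character by hard-coded code point ranges (a rough stand-in for the Unicode Script property, which Python's standard `unicodedata` module does not expose), then check the most specific evidence first. Kana strongly suggest Japanese, Hangul suggests Korean even when Han characters also appear, and Han alone suggests Chinese. Telling Simplified from Traditional Chinese needs a real character mapping; the two tiny hint sets here, and the names `char_script` and `pick_font`, are hypothetical and purely illustrative.

```python
import unicodedata
from typing import Optional

# Hypothetical, deliberately tiny hint sets for illustration only. Real
# Simplified/Traditional detection needs a full character mapping table.
SIMPLIFIED_HINTS = set("严权这说们国")
TRADITIONAL_HINTS = set("嚴權這說們國")


def char_script(ch: str) -> Optional[str]:
    """Roughly classify one character by hard-coded code point ranges."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:  # Hiragana, Katakana
        return "Kana"
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "Hangul"  # syllables, Jamo, compatibility Jamo
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "Han"  # CJK Unified Ideographs and Extension A
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return "Greek"  # Greek and Coptic, Greek Extended
    if ch.isalpha() and "LATIN" in unicodedata.name(ch, ""):
        return "Latin"
    return None  # spaces, digits, punctuation: no script evidence


def pick_font(text: str) -> str:
    """Choose a font family for a message, most specific evidence first."""
    scripts = {s for s in map(char_script, text) if s is not None}
    if "Kana" in scripts:
        return "Japanese"  # kana are, in practice, a strong signal of Japanese
    if "Hangul" in scripts:
        return "Korean"  # checked before Han, since Korean text may include hanja
    if "Han" in scripts:
        chars = set(text)
        if chars & TRADITIONAL_HINTS and not chars & SIMPLIFIED_HINTS:
            return "Traditional Chinese"
        return "Simplified Chinese"  # default when evidence is absent or mixed
    if "Greek" in scripts:
        return "Greek"
    return "Latin"
```

Run over the seven guest-book messages above, this picks the intended font for each, but the Simplified/Traditional split works only because messages 2 and 4 happen to contain 严/嚴 and 权/權. Defaulting ambiguous Han text to Simplified Chinese is an arbitrary design choice.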


Jim presents to Joomla Day Vancouver this Saturday, June 14, 2008

Posted by on 11 Jun 2008 | Tagged as: CMS, i18n, Joomla, meetings and conferences, multilingual, Vancouver

There is a Joomla! Day in Vancouver this Saturday. I’ll be giving a brief presentation on jdlh.com as an example of a multilingual Joomla! website with human-friendly URLs.


“Web 2.0 goes to Babel: Multilingual websites and user-supplied content” at IUC32

Posted by on 31 May 2008 | Tagged as: CMS, i18n, Joomla, meetings and conferences, multilingual, Unicode

Oh right, I forgot to mention: I’ve been accepted to present to the 32nd Internationalization & Unicode Conference this September! I’m presenting on a topic which I’ve been working on lately: multilingual websites. The title is: Web 2.0 goes to Babel: Multilingual websites and user-supplied content.

