Unicode

Archived Posts from this Category

Trip report: IUC37

Posted by Jim DeLaHunt on 31 Oct 2013 | Tagged as: Joomla, Unicode, drupal, meetings and conferences

Delightful! Last week I came home from the gathering of my tribe, the 37th Internationalization and Unicode Conference. My tutorial on Building multilingual websites in Drupal 7 and Joomla! 3 again went well. And I found inspiration, new knowledge, and old friends there.

If you are looking for my slides and handouts, they are at the preceding link. You are welcome to share them, per their Creative Commons license. I’d appreciate credit when you share them. And I’d appreciate your feedback in this blog’s comments. Continue Reading »

“Building multilingual websites in Drupal 7 and Joomla 3” (IUC37 tutorial)

Posted by Jim DeLaHunt on 30 Sep 2013 | Tagged as: Joomla, Unicode, drupal, meetings and conferences

I can’t believe I didn’t announce this before now. I’m delighted to be asked, once again, to present a tutorial on Building multilingual websites in Drupal 7 and Joomla! 3, at the 37th Internationalization and Unicode Conference (IUC37), this October in Santa Clara, California, USA.

This is my abstract, from the Unicode conference program for my talk: Continue Reading »

Top Posts: StackOverflow “How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?”

Posted by Jim DeLaHunt on 31 Jul 2013 | Tagged as: Unicode, robobait, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my second best-voted answer in StackOverflow so far.

The question, How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?, was asked by user kvedananda in February 2012. In abbreviated form, it was:

Continue Reading »

Top Posts: StackOverflow “Django headache with simple non-ascii string”

Posted by Jim DeLaHunt on 31 May 2013 | Tagged as: Python, Unicode, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my top-voted answer in StackOverflow so far.

The question, Django headache with simple non-ascii string, was asked by user Ezequiel in January 2010. In abbreviated form, it was:

Continue Reading »

I18n and Unicode conference, and tutorial on multilingual Drupal and Joomla web sites, complete

Posted by Jim DeLaHunt on 31 Oct 2012 | Tagged as: CMS, Joomla, Unicode, culture, digital preservation, drupal, i18n, meetings and conferences, multilingual

Another stimulating Internationalization and Unicode Conference (IUC36) just finished up last week (October 22-24, 2012). As usual it was rich with interesting people, stimulating subjects, and inspiration. My tutorial, Building multilingual websites in Drupal 7 and Joomla! 2.5, was well-attended and seemed to go well. My final paper and slides are posted at the preceding link.

Continue Reading »

“Building multilingual websites in Drupal 7 and Joomla 2.5” (IUC36 tutorial)

Posted by Jim DeLaHunt on 30 Jun 2012 | Tagged as: Joomla, Unicode, drupal, meetings and conferences

I’m delighted to be asked, once again, to present a tutorial on Building multilingual websites in Drupal 7 and Joomla! 2.5, at the 36th Internationalization and Unicode Conference (IUC36), this October in Santa Clara, California, USA.

Continue Reading »

Choosing between UTF-8 and UTF-16: which has the better bytes-per-character ratio?

Posted by Jim DeLaHunt on 31 Dec 2010 | Tagged as: Unicode, i18n, language, software engineering

Software engineers are sometimes called on to specify which encoding a text file format should use. These days, the top contenders are UTF-8 and UTF-16, both defined by the Unicode Standard. One factor (amongst several, and perhaps not the most compelling) in choosing between them is storage efficiency: the number of bytes per character, or amount of storage per unit of text. If a given text takes a kilobyte of storage in UTF-8 and twice that in UTF-16, that difference may be meaningful.

I recently looked for quantitative data about space efficiency of UTF-8 and UTF-16, and couldn’t find very much. Engineering discussions about storage efficiency are better informed by quantitative data than by opinion and supposition. I want to give one morsel of quantitative data more visibility, and clarify this issue. Continue Reading »
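As a rough illustration of what a bytes-per-character ratio means, here is a minimal Python 3 sketch. The helper name `bytes_per_char` and the two sample strings are illustrative assumptions, not the data referred to in this post. It counts characters as Unicode code points, and it encodes UTF-16 without a byte order mark.

```python
def bytes_per_char(text):
    """Return (UTF-8, UTF-16) bytes per code point for a non-empty string."""
    n_chars = len(text)  # Python 3 str length counts Unicode code points
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" omits the 2-byte byte order mark that plain "utf-16" prepends.
    utf16_bytes = len(text.encode("utf-16-le"))
    return utf8_bytes / n_chars, utf16_bytes / n_chars


# Hypothetical samples, chosen only for illustration.
samples = {
    "ASCII": "hello",
    "Japanese": "日本語",
}
for label, text in samples.items():
    utf8_ratio, utf16_ratio = bytes_per_char(text)
    print(f"{label}: UTF-8 {utf8_ratio:.2f}, UTF-16 {utf16_ratio:.2f} bytes per character")
```

The ASCII sample takes 1 byte per character in UTF-8 and 2 in UTF-16, while the three-character Japanese sample takes 3 and 2. Which encoding is more compact therefore depends on the mix of characters in the text being stored.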

Building Multilingual Websites in Drupal and Joomla, at IUC34

Posted by Jim DeLaHunt on 31 Oct 2010 | Tagged as: CMS, Joomla, Unicode, drupal, i18n, meetings and conferences, multilingual

Once again I was fortunate enough to be invited to present at this year’s Internationalization and Unicode Conference (IUC). I have posted the paper and slides for my tutorial, Building Multilingual Websites in Drupal and Joomla, over on jdlh.com.

This was my abstract, from the Unicode conference program for my talk: Continue Reading »

11 Django gotchas

Posted by Jim DeLaHunt on 31 Aug 2010 | Tagged as: Python, Unicode, robobait, software engineering, web technology

This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding.  I decided to use the Django web framework and the Python language for the Twanguages analysis code.  I know Python, but I was learning Django for the first time.

Django is really, really marvellous.  When I tried this expression, and got the Python array of records I was expecting,

q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)

I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”

But Django gave me fits.  It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log.  Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles.  I share them in the hopes of helping another Django novice.

Here are my Django gotchas.  They are ranked from the most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.

  1. API fails unhelpfully. I wrote a simple query expression like:
    S2 = models.TwStatus.objects.get( key )

    I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:

    S2 = models.TwStatus.objects.get( id__exact=key )

    The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development. Continue Reading »
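Until such a debug mode exists, one workaround is to do the defensive check in your own code. The sketch below is only an assumption about how that might look. `get_by_id` is a hypothetical helper, not part of Django, and `FakeManager` is a stand-in so that the example runs without Django installed. It is written for Python 3, whereas Django 1.1 itself ran on Python 2.

```python
def get_by_id(manager, key):
    """Hypothetical helper: look up one record by id, failing early and clearly."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(key, bool) or not isinstance(key, (int, str)):
        raise TypeError(f"key must be an int or str, got {type(key).__name__}")
    # Always pass the key as a keyword argument, never positionally.
    return manager.get(id__exact=key)


class FakeManager:
    """Stand-in for a Django model manager, so the sketch runs without Django."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, **lookups):
        return self._rows[lookups["id__exact"]]


fake_objects = FakeManager({42: "status 42"})
print(get_by_id(fake_objects, 42))
```

In a real project the first argument would be something like `models.TwStatus.objects`. A wrapper like this cannot make `.get()` itself stricter, but a wrong key type then fails with a clear message at the call site.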

How about an IMLIG (Internationalisation, Multilingual, Localisation Interest Group) for Vancouver?

Posted by Jim DeLaHunt on 27 Jun 2010 | Tagged as: Unicode, Vancouver, i18n, language, meetings and conferences, multilingual, web technology

There is a lot of international, multilingual, and multicultural activity in Vancouver. Also, there’s a thriving tech scene. But there’s no place for the people in the intersection of those two circles — those interested in and working on the internationalisation, localisation, and multilingual aspects of technology projects — to get together and share ideas. I think there ought to be.

And I’ll even propose a name: IMLIG1604, the I18n L10n M11l I6t G3p (Internationalisation, Localisation, and Multilingual Interest Group) for North America’s 604 area code. If you can decipher the title, you’re in the club!

Continue Reading »
