Again this year, I joined Vancouver open data enthusiasts in celebrating Open Data Day last Saturday. Despite limited time and schedule conflicts, I was able to make progress on an interesting project: a “dataset dataset” for the City of Vancouver’s Open Data Catalogue.
Think of the applications programming interface (API) for an application environment: an operating system, a markup language, a language’s standard library. What internationalisation (i18n) functionality would you expect to see in such an API? There are some obvious candidates: a text string substitution-from-resources capability like gettext(). A mechanism for formatting dates, numbers, and currencies in culturally appropriate ways. Data formats for text that can handle text in a variety of languages. Some way to determine what cultural conventions and language the user prefers. There is clearly a whole list one could make.
Wouldn’t it be interesting, and useful, to have such a list? Probably many organisations have made such lists in the past. Who has made such a list? Are they willing to share it with the internationalisation and localisation community? Is there value in developing a “good practices” statement with such a list? And, most importantly, who would like to read such a list? How would it help them? In what way would such a list add value?
Delightful! Last week I came home from the gathering of my trip, the 37th Internationalisation and Unicode Conference. My tutorial on Building multilingual websites in Drupal 7 and Joomla! 3 again went well. And I found inspiration, new knowledge, and old friends there.
Those of you looking for my slides and handouts, they are at the preceding link. You are welcome to share them, per their Creative Commons license. I’d appreciate credit when you share them. And I’d appreciate your feedback on this blog’s comments.
I can’t believe I didn’t announce this before now. I’m delighted to be asked, once again, to present a tutorial on Building multilingual websites in Drupal 7 and Joomla! 3, at the 37th Internationalization and Unicode Conference (IUC37), this October in Santa Clara, California, USA.
This is my abstract, from the Unicode conference program for my talk:
Today back in 1998, my uncle Spencer Boise asked me, “Jim and Ducky, do you both recognize the rights and responsibilities inherent in the marriage contract?” and I replied, “I do. I have come here freely to take this woman to be my wife. I promise to love her, comfort her, honor her, and keep her, above all others.”
Top Posts: StackOverflow “How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?”
I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my second best-voted answer in StackOverflow so far.
The question, How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?, was asked by user kvedananda in February 2012. In abbreviated form, it was:
I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my top-voted answer in StackOverflow so far.
Canada Post and the US Postal Service raised their postage rates again in January 2013. I was busy then, but I’ve grabbed a moment and updated my handy Canada Post and USPS postage rate quick reference card. The Canada Post rate increases were effective January 14, 2013, and the USPS increases were effective January 27.
My Canada Post and USPS Postage Rates project page, http://jdlh.com/en/pr/postage_card.html, has links to download the latest charts as I update them. The spreadsheet source file for the charts is also there. Both are licensed CC-BY-SA, so please feel free to re-use and modify them (as long as you attribute my work and share your product as freely).
Heads up: Canada Post has already received approval for first-class mail rate increases in 2014. The 2013 increases of both agencies came almost exactly one year after their 2012 increases, so I won’t be surprised if this becomes an annual event. The good news is that both Canada Post and USPS offer “perpetual” or “forever” stamps, which are worth first-class basic domestic postage, whatever the price may increase to.
OpenDataDay 2013 was celebrated last Saturday, February 23rd 2013, at over 100 hackathons and work days in 38 countries around the world. The City of Vancouver hosted a hackathon at Vancouver City Hall, and I joined in. My project was a language census of Vancouver’s open data datasets. Here’s what I set out to do.
Vancouver’s Open Data Day was a full house of some 80 grassroots activists, with attendance throughout the day by city staff, including Linda, the caretaker of the Vancouver Open Data portal and the voice of @VanOpenData on Twitter. I missed the “Speed Data-ing” session in the morning, where participants could circulate among city providers of datasets to talk directly about what was available and what each side wanted. I’m told that national minister the Honourable Tony Clement was also there. (He is now responsible for the Government of Canada’s Open Data portal, data.gc.ca, but in 2010 he also helped turn off the spigot of open data at its source by killing the long form census.) I saw Councilmember Andrea Reimer there for the afternoon working session, listening to the day-end wrap-ups and tweeting summaries of each project. I won’t try to describe all the projects. Take a look at the Vancouver Open Data Day 2013 wiki page, or the tweets tagged #vodhd13 (for Vancouver), and tagged #OpenData (worldwide).
I gave myself two goals for the hackathon. First, provide expertise and increased visibility for internationalisation and multi-lingual issues among the participants. Second, work on a modest project which would move internationalisation of local data forward.
My vision is that apps based on Vancouver open data should be localised into all the languages in which Vancouver residents want them. Over 30% of the people in the Vancouver region speak a language other than English at home, says Stats Canada. That is over 700,000 people of the 2.9m people in the area. Now of course localising those apps and web sites is a task for the developer. My discipline, internationalisation (i18n), is a set of design and implementation techniques to make it cheaper and easier to localise an app or web site. At some point, an app or web site presents data sourced from an open data dataset. In order for the complete user experience to be localised, the dataset also needs to be localised. A challenge of enabling localisation of open data-sourced apps is to set up formats, social structures, and incentive structures that make it easier for datasets to get localised into the languages which matter to the end users.
To that end, I picked a modest project for the day. It was to make a language census of the city of Vancouver’s Open Data datasets. The link is to a project page I started on the Open Data Day wiki. I intended it to be a simple table describing the Vancouver datasets, but it ended up with a good deal of explanation in the front matter. I won’t repeat all that, but just give a couple of examples.
The 3-1-1 Contact Centre Interactions dataset (CSV format) has rows like (I’ve simplified):
Category1 , Category2 , Category3 , Mode , 2012-11, 2012-12, 2013-1
CSG - Licenses, Animal Control, Dead Animals Pickup, Voice In, 22, 13, 13
While the Animal Control Inventory Deceased Animals dataset (CSV format) has rows like (again, simplified):
ID, Date ,CatOther , Description ,Sex,ACO , Bag
7126,2013-02-23,SDC , Tan/black medium hair cat, ,Duty driver- JT, 13-00033
7127,2013-02-23,Dead Budgie, , ,Duty driver-JT , 13-00034
7128,2013-02-26,Cat , Black and White ,F , , 13-00035
Note that most of the fields are simply data: dates, numbers, codes. These do not need to be localised. Some of the fields, like the Category fields in the 311 Interactions, are English-language phrases. But they are pulled from a controlled vocabulary, and so could be translated once into the target language, and would not usually need to be updated when new data is released. In contrast, a few fields in the Animal Control Inventory dataset, e.g. CatOther, Description, and ACO, seem to contain free text in English. Potentially, every new record in the dataset represents a new translation task.
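To make the "translate once" point concrete, here is a minimal Python sketch. It assumes a hypothetical lookup table, `CATEGORY_FR`, whose French phrases are my own illustrative translations and not anything published by the city. The input is the simplified 3-1-1 row shown above.

```python
import csv
import io

# Simplified sample from the 3-1-1 Contact Centre Interactions dataset.
SAMPLE = """Category1,Category2,Category3,Mode,2012-11,2012-12,2013-1
CSG - Licenses,Animal Control,Dead Animals Pickup,Voice In,22,13,13
"""

# Hypothetical translation table for controlled-vocabulary terms.
# The French phrases are illustrative assumptions, not official translations.
CATEGORY_FR = {
    "Animal Control": "Contrôle des animaux",
    "Dead Animals Pickup": "Ramassage d'animaux morts",
    "Voice In": "Appel entrant",
}

# Fields assumed to hold controlled-vocabulary phrases.
VOCAB_FIELDS = ("Category1", "Category2", "Category3", "Mode")


def localise_row(row, table):
    """Return a copy of row with controlled-vocabulary fields translated.

    Terms missing from the table are left in English.
    """
    out = dict(row)
    for field in VOCAB_FIELDS:
        out[field] = table.get(row[field], row[field])
    return out


rows = [localise_row(r, CATEGORY_FR) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0]["Category2"], "/", rows[0]["Category3"])
```

New monthly rows would reuse the same table unchanged. Free-text fields such as Description have no such table to lean on, which is why each new record there is potentially a new translation task.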
The purpose of the language census is to go through the datasets in the Vancouver Open Data catalogue, and the fields for each dataset, and simply identify which fields are data, which are controlled vocabulary, and which are free text. It’s not a major exercise. It doesn’t involve programming. Yet I believe it’s an important building block towards the vision of localised apps driven by open data.
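The census itself is a manual exercise, but a script could offer a first guess for a person to review. The following Python sketch is only that: the `classify` function, its regular expression, and the `max_vocab_ratio` threshold are all assumptions of mine, and the input is the simplified Animal Control sample from above.

```python
import csv
import io
import re

# Simplified sample from the Animal Control Inventory Deceased Animals dataset.
SAMPLE = """ID,Date,CatOther,Description,Sex,ACO,Bag
7126,2013-02-23,SDC,Tan/black medium hair cat,,Duty driver- JT,13-00033
7127,2013-02-23,Dead Budgie,,,Duty driver-JT,13-00034
7128,2013-02-26,Cat,Black and White,F,,13-00035
"""

# Assumed shape of "plain data": digits, optionally joined by hyphens or dots
# (covers integers, ISO dates such as 2013-02-23, and codes such as 13-00033).
DATA_PATTERN = re.compile(r"^\d+([-.]\d+)*$")


def classify(values, max_vocab_ratio=0.5):
    """Guess whether a column is data, controlled vocabulary, or free text."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "unknown"
    if all(DATA_PATTERN.match(v) for v in values):
        return "data"
    # Heuristic: text that repeats often is probably a controlled vocabulary.
    distinct_ratio = len(set(values)) / len(values)
    if distinct_ratio <= max_vocab_ratio:
        return "controlled vocabulary"
    return "free text"


reader = csv.DictReader(io.StringIO(SAMPLE))
columns = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, value in row.items():
        columns[name].append(value)

census = {name: classify(vals) for name, vals in columns.items()}
for name, kind in census.items():
    print(f"{name}: {kind}")
```

On a three-row sample there are too few repeated values to tell a vocabulary from free text, so every text column, including Sex, comes out as free text. A real pass would need the full file, and a human would still need to confirm each guess.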
Incidentally, this exercise inspired me to propose another dataset for the Vancouver catalogue: a dataset listing the datasets. There are 130 datasets in the Vancouver Open Data catalogue, and more are on the way. The only listing of them is an HTML page intended for human consumption. It would be nice to have a machine-readable table in CSV or XML format, describing the names and URLs and formats of the datasets in some structured way.
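As a rough idea of what such a "dataset dataset" might look like, here is a short Python sketch that writes a catalogue as CSV. The column names (`name`, `format`, `url`) are my assumption, and the URLs are placeholders on example.org, not the city's real addresses.

```python
import csv
import io

# Hypothetical catalogue entries. The dataset names come from this post;
# the URLs are placeholders, not real locations.
CATALOGUE = [
    {
        "name": "3-1-1 Contact Centre Interactions",
        "format": "CSV",
        "url": "https://example.org/datasets/311-interactions.csv",
    },
    {
        "name": "Animal Control Inventory Deceased Animals",
        "format": "CSV",
        "url": "https://example.org/datasets/deceased-animals.csv",
    },
]

FIELDS = ["name", "format", "url"]


def catalogue_to_csv(entries):
    """Serialise catalogue entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


text = catalogue_to_csv(CATALOGUE)
print(text)
```

A dataset with several formats could appear as one row per format, which keeps the table flat and easy to filter.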
I’m happy to report success at my first goal, also. Several participants stopped by to talk with me about language support and internationalisation. I’m hopeful that it will help the non-English localisation of the apps, and city datasets, happen a little bit sooner.
If you would like to help in the language census, the project page is a wiki, and you are welcome to make constructive edits. See you there! Or, add a comment below.
With the year coming to an end, it is the season of making donations to organisations doing good in the world. In both Canada and the USA, this is motivated by a tax deadline; donations to certain charities by December 31 can be tax deductions for that year. It’s an opportunity to lay out here a concept that I helped draft a decade ago: the “Social Justice Tithe”.
The Social Justice Tithe means giving at least 10% of your income to some combination of charities, religious groups, and political groups that enact your values.