Archived Posts from this Category
Again this year, I joined Vancouver open data enthusiasts in celebrating Open Data Day last Saturday. Despite limited time and schedule conflicts, I was able to make progress on an interesting project: a “dataset dataset” for the City of Vancouver’s Open Data Catalogue.
Think of the applications programming interface (API) for an application environment: an operating system, a markup language, a language’s standard library. What internationalisation (i18n) functionality would you expect to see in such an API? There are some obvious candidates: a text string substitution-from-resources capability like gettext(). A mechanism for formatting dates, numbers, and currencies in culturally appropriate ways. Data formats for text that can handle text in a variety of languages. Some way to determine what cultural conventions and language the user prefers. There is clearly a whole list one could make.
Wouldn’t it be interesting, and useful, to have such a list? Probably many organisations have made such lists in the past. Who has made such a list? Are they willing to share it with the internationalisation and localisation community? Is there value in developing a “good practices” statement with such a list? And, most importantly, who would like to read such a list? How would it help them? In what way would such a list add value?
I want to pass along a tip about confusing field names used in the Ads Factory component for Joomla for geographic data. I encountered this while customising this component for a client. At first I thought it was a bug, but now I think it’s just an odd naming convention.
Ads Factory, by Romanian developers The Factory, is a commercial component for Joomla 1.5 which lets you add classified ads to your Joomla site. (My client had me working with version 1.x on Joomla 1.5, but I see there is also a version 2.1 of Ads Factory which is Joomla 1.6 native.) There are quite a few places where Ads Factory includes geographic information: each user record can record a latitude and longitude for that user; each ad can record a latitude and longitude for the advertised merchandise; and there is a way to make a “radius search”, i.e. find all ads within a given distance of a user-specified location.
These latitude and longitude values are stored in database fields with name suffixes “X” and “Y”. The user’s latitude and longitude are stored in fields “GoogleX” and “GoogleY” of the Ads Factory user table. Similarly, but not completely consistently, the ad’s latitude and longitude are stored in fields “MapX” and “MapY” of the Ads Factory ads table. The confusion comes in understanding which field stores the latitude, and which stores the longitude.
Latitude is, of course, the signed number of degrees north of the equator of a point on the earth’s surface. It ranges from +90.0 (the North Pole) to 0.0 (the Equator) to -90.0 (the South Pole). Thus, it’s a vertical coordinate. Longitude is the signed number of degrees east of the 0° meridian (roughly Greenwich, England). It ranges from +180.0 to -180.0. My part of North America is 122-123° west of Greenwich, so we have longitudes of -123.0 to -122.0 or so. It’s a horizontal coordinate. This is a well-established convention in many mapping standards.
Tidy Cartesian mathematicians like me use the convention of (X,Y) coordinates, where X is the horizontal coordinate and Y is the vertical coordinate. This is a well-established convention in geometry and graphics (though there are some exceptions).
My first interpretation of Ads Factory field names like “GoogleX” and “GoogleY” was to interpret them according to the Cartesian convention: X is horizontal, and so stores longitude, while Y is vertical, and so stores latitude. Thus (MapX, MapY) would be (longitude, latitude), the opposite of what one expects from mapping. Odd. I was surprised to find some parts of the code storing latitude in X (the horizontal coordinate!) and longitude in Y (the vertical!), which was surely a bug. I was horrified when it appeared that every part of this code had the same bug!
Then I understood the convention. Ads Factory’s developers appear to have used the (X, Y) convention to indicate just the order of the coordinates, but not their Cartesian meaning. (MapX, MapY) means (latitude, longitude), as is conventional in mapping. X is the vertical coordinate, Y is the horizontal coordinate, in the Ads Factory context. If you remember that X means “first”, not horizontal, and Y means “second”, not vertical, the Ads Factory field names are self-consistent, and the code uses them correctly.
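One way to keep the convention straight in custom code is to wrap it in a small helper with explicit names. Here is a minimal sketch in Python (Ads Factory itself is PHP; the helper name ad_position and the dictionary-style row are hypothetical, and only the field names MapX and MapY come from the component):

```python
from typing import Mapping, NamedTuple


class LatLon(NamedTuple):
    latitude: float   # degrees north of the equator, -90.0 to +90.0
    longitude: float  # degrees east of Greenwich, -180.0 to +180.0


def ad_position(row: Mapping[str, float]) -> LatLon:
    """Read an ad's position from a (hypothetical) row keyed by field name.

    In the Ads Factory convention X means "first" (latitude) and
    Y means "second" (longitude), not horizontal and vertical.
    """
    lat = float(row["MapX"])
    lon = float(row["MapY"])
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"MapX should hold a latitude, got {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"MapY should hold a longitude, got {lon}")
    return LatLon(lat, lon)
```

The range check catches a swapped pair only when the longitude lies outside -90 to +90, which happens to be true for my part of the world (-123 is not a valid latitude), but not everywhere.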
I haven’t seen any Ads Factory documentation which explains this, so I hope this note will help some of you Ads Factory enhancers who are using these fields.
Postscript: what did my client ask me to do with Ads Factory for their site? Modify the radius search to search around the user’s latitude and longitude, instead of a location the user enters. Also, to sort the keyword and category search results by distance from the user. Quite straightforward to do, though it requires customisations to the Ads Factory code that have to be re-done every time one upgrades the Ads Factory component.
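For the curious, the idea behind a radius search sorted by distance can be sketched in a few lines of Python. This is not the Ads Factory code (which is PHP) nor my customisation of it. It is an illustration that assumes the haversine great-circle formula, a mean Earth radius of 6371 km, and hypothetical dictionaries keyed by the Ads Factory field names:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ads_within_radius(ads, user, radius_km):
    """Return (distance, ad) pairs within radius_km of the user, nearest first.

    `ads` and `user` are hypothetical dictionaries using the Ads Factory
    field names: MapX/GoogleX hold latitude, MapY/GoogleY hold longitude.
    """
    hits = []
    for ad in ads:
        d = haversine_km(user["GoogleX"], user["GoogleY"], ad["MapX"], ad["MapY"])
        if d <= radius_km:
            hits.append((d, ad))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return hits
```

Note that the X fields are passed as latitudes and the Y fields as longitudes, following the convention described above.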
I use the EasyEclipse distribution of Eclipse, the free (libre) software development environment. I just figured out how to fix an obscure error message:
Eclipse Web tools editors (2.0.1) requires plug-in "system.bundle"
Eclipse Data Tools (1.5.1) requires plug-in "system.bundle"
When I would start up EasyEclipse (version 1.3.1 for Mac OS X, with Python, C++, Java, PHP and more support added), it would tell me that I had some outdated components, and offer to update them for me. But when I opened the menu item Help… Software Updates… Manage configuration, I would get the ominous error alert:
“The current configuration contains errors and this operation can have unpredictable results. Do you want to continue? [Cancel] [OK]”.
I wasn’t able to find documentation about this problem specifically. (My purpose in writing this is to help others benefit from what I learned.)
Are you using PHP (or libGD) to generate PNG images? Are you having problems getting your text anti-aliased, and also having your “transparent” colour recognised as transparent? Well, I had that problem too. libGD, the component which PHP uses to handle image operations, gives you a choice: you can have anti-aliased text, or a designated colour as transparent… but not both. Here’s why, and what you can do about it.
This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding. I decided to use the Django web framework and the Python language for the Twanguages analysis code. I know Python, but I was learning Django for the first time.
Django is really, really marvellous. When I tried this expression, and got the Python array of records I was expecting,
q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)
I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”
But Django gave me fits. It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log. Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles. I share them in the hopes of helping another Django novice.
Here are my Django gotchas. They are ranked from the most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.
S2 = models.TwStatus.objects.get( key )
I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:
S2 = models.TwStatus.objects.get( id__exact=key )
The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development.
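To see how a positional argument can fail so obscurely, here is a toy model in plain Python. It is not Django’s actual code, and toy_get and checked_get are hypothetical names. It assumes, as the error messages suggest, that each positional argument eventually gets unpacked as a (lookup, value) pair somewhere deep inside, and contrasts that with the kind of up-front check a debug mode could make:

```python
def toy_get(*args, **kwargs):
    """Toy stand-in for a lookup that trusts its positional arguments."""
    lookups = dict(kwargs)
    for arg in args:
        name, value = arg  # fails obscurely if arg is not a 2-item pair
        lookups[name] = value
    return lookups


def checked_get(*args, **kwargs):
    """Same lookup, but validating the arguments before using them."""
    for arg in args:
        if not (isinstance(arg, tuple) and len(arg) == 2):
            raise TypeError(
                "get() expects keyword lookups such as id__exact=...; "
                f"got positional argument {arg!r}"
            )
    return toy_get(*args, **kwargs)
```

Calling toy_get("12345") raises a ValueError about unpacking, and toy_get(12345) raises a TypeError, neither of which mentions the real mistake. checked_get() fails in both cases with a message that names the stray positional argument and points at the keyword form.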
There is a lot of international, multilingual, and multicultural activity in Vancouver. Also, there’s a thriving tech scene. But there’s no place for the people in the intersection of those two circles — those interested in and working on the internationalisation, localisation, and multilingual aspects of technology projects — to get together and share ideas. I think there ought to be.
And I’ll even propose a name: IMLIG1604, the I18n L10n M11l I6t G3p (Internationalisation, Localisation, and Multilingual Interest Group) for North America’s 604 area code. If you can decipher the title, you’re in the club!
Monday, 30. November 2009, 18:30-20:30h. At The Network Hub, 422 Richards Street, 3rd floor, Vancouver, BC V6B 2Z3. tel +1 604 767 8778.
A monthly meeting of the Vancouver Joomla User Group. Admission free. All people interested in learning more about the content management system, and helping others learn more, are welcome.
One of the many nice touches of the Django framework is that it provides tools and instructions to make a standalone Django documentation set from its distribution. (Django is an application framework for the Python language that helps with database access and web applications.) Standalone docs are great for people like me who work on a laptop and are sometimes off the net. But I’m using Mac OS X, I get my code through Macports, and Django’s instructions don’t quite cover this case. So I just figured it out. Here are the tricks I needed. Maybe they will help you.
What “twanguage” do you “tweet”? Twitter, the buzzing conversation of brief web and SMS messages, exploded into wide use in 2009. But just how wide? To how many countries has it spread? And into which languages? I’m aiming to find out.
I’ve started a project named “Twanguages”, a language census of a sample of Twitter’s global traffic. I’m curious: which are the top languages? Are #hashtags localised? How does language correlate with location? And which Unicode character is the most rarely used?
I’ll be presenting our results at the 33rd Internationalization and Unicode Conference (IUC33), held in San Jose, California, on October 14-16, 2009. I have a place cleared for a Twanguages project page, and I’ll post interim results there as they become available (right now it’s only a placeholder). Stay tuned!