A few days ago, I had to set the clock on my GPS unit. My GPS unit! It talks to over 24 satellites, each of which has atomic clocks accurate to nanoseconds — yet it didn’t know that Daylight Savings Time began in March instead of April.
We software engineers use a variety of abstractions to represent points and durations in time. We synchronise to time servers which are accurate to the millisecond. Behind them are atomic clocks. Our abstractions can represent times centuries in the past and future. They are tidy, regular abstractions. But Daylight Savings Time is a reminder that the reality of time is messy and irregular. It is affected in small ways by astronomy, and in large ways by politics and human idiosyncrasy. That’s why I had to manually set the clock on my GPS unit.
One of the things I love about engineering is where technology meets human idiosyncrasy. The technology bends to support the idiosyncrasy, and the human idiosyncrasy bends to fit the technology. Time representation is one such case.
Many software systems have an abstraction that represents time in terms of the number of seconds since some special reference date and time. <time.h> is the POSIX manifestation of this, storing times as integer numbers of seconds and microseconds since a 1970 reference time. From the integers, the system computes human-friendly structures like years, months, dates, hours, and minutes (leap years are no big deal). There’s a mechanism for converting from Coordinated Universal Time (UTC) to local time in various time zones. This includes Daylight Savings Time. All very tidy.
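To make that abstraction concrete, here is a minimal sketch in Python, whose `time` and `calendar` modules expose the same seconds-since-1970 idea as <time.h>. The timestamp value is an arbitrary one I picked for illustration.

```python
import calendar
import time

# An arbitrary count of seconds since the 1970 reference time (illustrative value).
seconds_since_epoch = 1_205_064_000

# Break the integer down into human-friendly UTC fields.
utc = time.gmtime(seconds_since_epoch)
print(utc.tm_year, utc.tm_mon, utc.tm_mday, utc.tm_hour, utc.tm_min)  # 2008 3 9 12 0

# Go back from the broken-down UTC fields to the integer.
round_trip = calendar.timegm(utc)
print(round_trip == seconds_since_epoch)  # True
```

The tidy part is that the stored value is a single integer; everything a human reads is computed from it.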
But when you start to look hard at time zones, things start to get interesting. Most of us live in time zones which are whole numbers of hours before or after UTC. In Vancouver, Canada, local time is eight hours earlier than UTC during Pacific Standard Time, and seven hours earlier than UTC during Pacific Daylight Time. But the time in Newfoundland and Labrador, on Canada’s east coast, is 30 minutes rather than an hour offset from its neighbour to the west. Afghanistan, India, Iran, three zones in Australia, and Venezuela also have time zones 30 minutes out of phase with most of the world. Nepal and Chatham Island, New Zealand, are 45 minutes out of phase: when it’s 05:00h in Vancouver, it’s 01:45h on Chatham Island. Time zones were regularised greatly in the 19th and 20th centuries. Before then, some time zones were arbitrary minutes and seconds out of sync with each other. (The TZ data set, tzdata2008b.tar.gz or its successor, is a fascinating read, with lots of historical time zone information and bibliographies.)
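The Vancouver and Chatham Island comparison can be checked with a small Python sketch. The two offsets are hard-coded by me for a mid-March 2008 date, when both places were on daylight time; a real program would look them up in time zone data instead.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, hard-coded for illustration only (assumed for mid-March 2008).
PDT = timezone(timedelta(hours=-7), "PDT")                   # Vancouver, daylight time
CHADT = timezone(timedelta(hours=13, minutes=45), "CHADT")   # Chatham Island, daylight time

vancouver = datetime(2008, 3, 15, 5, 0, tzinfo=PDT)
chatham = vancouver.astimezone(CHADT)

print(vancouver.strftime("%Y-%m-%d %H:%M"))  # 2008-03-15 05:00
print(chatham.strftime("%Y-%m-%d %H:%M"))    # 2008-03-16 01:45
```

The point of the sketch is that an offset has to be stored in minutes (or finer), not whole hours.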
Daylight Saving Time has a richer and more controversial history than you may have known. David Prerau’s book “Seize the Daylight: The Curious and Contentious Story of Daylight Saving Time” gives three centuries of that history. It’s reasonably well known in North America that some jurisdictions observe daylight saving time and some don’t. What’s less appreciated is that the rules for daylight saving time vary over time. Many software systems, even when they provide for time zones and daylight saving time, do so through static data tables. They get caught out when, as in the Energy Policy Act of 2005 in the USA, or every year in Israel, somebody decides to change daylight saving rules. Most software systems don’t have a way of describing all the historical changes in daylight saving time; they do really well to store one historical rule for a time zone. This is what happened to my GPS unit. I’ll have to manually set its daylight saving time until the maker comes up with a firmware patch that corrects the time zone tables.
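Here is a rough Python sketch of what a table with more than one historical rule might look like. The names (`US_RULES`, `is_daylight_time`) are my own invention, the table covers only the two most recent U.S. rules, and it works at whole-day granularity, ignoring the 2 a.m. changeover.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Date of the n-th Sunday of a month (n = 1, 2, ...)."""
    first = date(year, month, 1)
    # weekday(): Monday is 0 and Sunday is 6.
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)

def last_sunday(year, month):
    """Date of the last Sunday of a month."""
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last_day = next_month - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)

# Each entry: (first year, last year, start-of-DST rule, end-of-DST rule).
US_RULES = [
    # First Sunday in April to last Sunday in October.
    (1987, 2006, lambda y: nth_sunday(y, 4, 1), lambda y: last_sunday(y, 10)),
    # Second Sunday in March to first Sunday in November.
    (2007, 9999, lambda y: nth_sunday(y, 3, 2), lambda y: nth_sunday(y, 11, 1)),
]

def is_daylight_time(day):
    """True if the date falls in DST under the rule recorded for its year."""
    for first_year, last_year, start, end in US_RULES:
        if first_year <= day.year <= last_year:
            return start(day.year) <= day < end(day.year)
    raise ValueError("no rule recorded for that year")

print(is_daylight_time(date(2006, 3, 20)))  # False: old rule, DST starts in April
print(is_daylight_time(date(2008, 3, 20)))  # True: new rule, DST starts in March
```

A device that stores only the first row gives the wrong answer for late March 2008, which is the mistake my GPS unit made.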
About those time zone labels, like “CST”: they are ambiguous! “CST” can mean “U.S./Canada Central Standard Time, Australian Central Standard Time, China Standard Time, or Cuba Summer Time”, points out Raymond Chen in “The Old New Thing”. Software systems generally don’t deal well with the ambiguity. Specifications for representing dates in plain text, such as the W3C’s profile of ISO 8601, “Date and Time Formats”, or RFC 2822, “Internet Message Format”, section 3.3, “Date and Time Specification”, are important because they offer a way to write dates unambiguously.
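As a small illustration, Python's standard library can write the same instant in both styles. The instant and its offset are values I chose for the example; note that both outputs carry a numeric offset rather than a label like “CST”.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# 05:00 on March 9, 2008, at a fixed offset of seven hours behind UTC
# (an illustrative instant; the offset is numeric, not a zone label).
moment = datetime(2008, 3, 9, 5, 0, tzinfo=timezone(timedelta(hours=-7)))

iso_text = moment.isoformat()          # ISO 8601 style, as in the W3C profile
mail_text = format_datetime(moment)    # RFC 2822 style, as in e-mail headers

print(iso_text)    # 2008-03-09T05:00:00-07:00
print(mail_text)   # Sun, 09 Mar 2008 05:00:00 -0700
```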
Another problem that crops up when time zone definitions vary, be it for daylight saving or other changes, is that it becomes more complicated to calculate time spans. If I want to calculate the number of seconds between April 1, 2005 08:00h and April 1, 2008 Vancouver local time, I’ll need to allow for the fact that the 2005 time is standard time but the 2008 time is daylight savings.
I’ll also need to allow for leap seconds. Remember that abstraction that each day has 24 hours, each hour has 60 minutes, and each minute has 60 seconds? Well, usually that’s true. But some minutes in UTC are defined to be 61 seconds long, so some days are 86,401 (instead of 86,400) seconds long. These leap seconds get added by the International Earth Rotation and Reference Systems Service (IERS) in order to keep the sun rising on time. It turns out the earth actually takes a bit longer than 24 hours * 60 minutes * 60 seconds to turn one day, so without leap seconds the sun would rise later and later UTC. (You’ll be glad to know there will be no leap second on June 30, 2008. Way to turn, planet Earth!)
So to calculate that time span between 2005 and 2008 correctly, I’ll need to allow for the minute of December 31, 2005 23:59h UTC being 61 seconds long.
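Putting the last three paragraphs together, the calculation might be sketched in Python like this. I am assuming 08:00h local time on both dates, hard-coding the two Pacific offsets, and listing only the one leap second mentioned above; Python's own date arithmetic knows nothing about leap seconds, so they are added by hand.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # Pacific Standard Time
PDT = timezone(timedelta(hours=-7))  # Pacific Daylight Time

start = datetime(2005, 4, 1, 8, 0, tzinfo=PST)
end = datetime(2008, 4, 1, 8, 0, tzinfo=PDT)  # the 08:00 end time is an assumption

# Final regular second of each UTC day that was followed by a leap second.
# Only the one from the text is listed; a real table would be longer.
LEAP_SECOND_DAYS = [datetime(2005, 12, 31, 23, 59, 59, tzinfo=timezone.utc)]

# Wall-clock difference, ignoring the change of offset.
naive_seconds = int((end.replace(tzinfo=None) - start.replace(tzinfo=None)).total_seconds())
# Difference once each time is converted to UTC.
offset_aware_seconds = int((end - start).total_seconds())
# Add one second for every leap second that fell inside the span.
leap_seconds = sum(1 for leap in LEAP_SECOND_DAYS if start <= leap < end)
elapsed_seconds = offset_aware_seconds + leap_seconds

print(naive_seconds)         # 94694400
print(offset_aware_seconds)  # 94690800
print(elapsed_seconds)       # 94690801
```

The three answers differ by an hour and by a second, which is exactly the kind of error that goes unnoticed until it matters.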
So seconds are small and finicky. Surely we can be confident about the date, right? Well, that’s complicated too.
Trivia question: on what date in 1917 did Russia’s “October Revolution” occur? Well, on November 7, of course! It was October 25 in the Julian calendar in use in Russia at the time (“old style”), but November 7 in the current Gregorian calendar (“new style”). They differ by 13 days this century. These calendar differences are widespread across the world over the last 1000 years. Until 1752, England’s civil year started on March 25, not January 1. Thus January 30 1649 (“new style”) was known at the time as January 30 1648 (“old style”). Wikipedia can tell you way more about these “Old Style” and “New Style” date complexities.
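The Julian-to-Gregorian shift can be computed by going through a day count. This is a minimal Python sketch using the usual integer formula for the Julian Day Number; the function name is mine, and it handles only the calendar difference, not the question of when the year starts.

```python
from datetime import date

def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date (AD years) to a Gregorian datetime.date."""
    # Julian Day Number of the Julian-calendar date (standard integer formula).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Python ordinals count days from Gregorian 0001-01-01 (ordinal 1),
    # whose Julian Day Number is 1721426.
    return date.fromordinal(jdn - 1721425)

print(julian_to_gregorian(1917, 10, 25))  # 1917-11-07
```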
On many world maps, there is an “international date line” zig-zagging across the Pacific ocean. To the east of this line, local time is earlier than UTC; to the west, later than UTC. I hate to be the one to break it to you, but this line doesn’t actually exist. Time zones are the choice, essentially political, of human jurisdictions. That zig-zag on the map is the cartographer’s way of showing you which time zones are which side of UTC.
Being political, time zones can change dates. The Pacific island republic of Kiribati stretches from 172° E to 150° W longitude. Centered in the Gilbert Islands in a time zone 12 hours after UTC, upon independence it acquired islands to the east in time zones 11 and 10 hours before UTC. This meant that the ends of the country observed different dates, and only four days per work week overlapped. Effective January 1, 1995, Kiribati changed the Phoenix Island and Line Island time zones to 13 and 14 hours after UTC respectively. This changed their dates, so finally, the whole country was on the same date. But beware if you have to compute a time span between 1994 and 1995 for Kiribati, because their local time didn’t include December 31, 1994! Nevertheless, they didn’t (pace Wikipedia’s Geography of Kiribati article) move the International Date Line, just some time zones.
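To see the missing day, here is a small Python sketch for the Line Islands. The offsets come from the paragraph above; the exact instant of the switch (local midnight at the end of December 30, 1994) is my assumption, as are the names.

```python
from datetime import datetime, timedelta, timezone

OLD_OFFSET = timezone(timedelta(hours=-10))  # Line Islands before the change
NEW_OFFSET = timezone(timedelta(hours=14))   # Line Islands after the change

# Assumed switch instant: the moment December 30, 1994 ended on the old clock.
switch_utc = datetime(1994, 12, 31, 10, 0, tzinfo=timezone.utc)

def line_islands_local(instant_utc):
    """Local time in the Line Islands for a UTC instant near the change."""
    offset = OLD_OFFSET if instant_utc < switch_utc else NEW_OFFSET
    return instant_utc.astimezone(offset)

before = line_islands_local(switch_utc - timedelta(seconds=1))
after = line_islands_local(switch_utc)

print(before.strftime("%Y-%m-%d %H:%M:%S"))  # 1994-12-30 23:59:59
print(after.strftime("%Y-%m-%d %H:%M:%S"))   # 1995-01-01 00:00:00
```

Two instants one second apart land on local dates two days apart, with no December 31 in between.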
Usually the <time.h> abstraction of time, as a count of seconds and microseconds after a certain reference date and time, works plenty well for software engineering. There are certainly many sources of error in our time data (inaccurate clocks, clobbered time stamps) that are much greater than the limitations of this abstraction. But remember, time measurement is a human convention, and human conventions almost always have really interesting complexity, and variation over time.
Don’t let the tidiness of the abstraction blind you to the richness of the reality.
I want my site, jdlh.com, to be a multilingual site that communicates the business I want to do and lets me explore the tools for being world-ready. For nearly two years, I’ve worked to get a combination of tools that would do the job. I’m happy to say that this week I finally assembled a plausible solution. The final piece was sh404SEF, after some patching, with Joomla! 1.0.x and Joom!Fish.
jdlh.com supports content in multiple languages (English, Japanese, and German so far), and also a user interface in multiple languages (the same three now, but could differ). Each URL can include a language code between the domain name (“jdlh.com”) and the path to the content. The language codes look like “/en/” for English, “/de/” for German, and “/ja/” for Japanese. The codes are based on RFC 3066. Where there is a language code in a URL, the site presents content localised for that language, to the extent possible. The content may not always be available in that language, so the site may present the content in a fall-back language.
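The URL scheme can be sketched in a few lines of Python. This is not how Joomla! or Joom!Fish do it; the function names, the example path, and the choice of English as the fall-back are all assumptions of mine for illustration.

```python
SUPPORTED = ("en", "ja", "de")  # the three languages named above
FALLBACK = "en"                 # assumed fall-back language

def split_language(path):
    """Split '/de/some/page' into ('de', '/some/page'); code is None if absent."""
    parts = path.lstrip("/").split("/", 1)
    code = parts[0].lower()
    if code in SUPPORTED:
        rest = "/" + (parts[1] if len(parts) > 1 else "")
        return code, rest
    return None, path

def pick_translation(translations, code):
    """Return the text for the requested language, or the fall-back text."""
    return translations.get(code, translations[FALLBACK])

print(split_language("/de/services"))                   # ('de', '/services')
print(split_language("/services"))                      # (None, '/services')
print(pick_translation({"en": "Hello", "ja": "こんにちは"}, "de"))  # Hello
```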
Where there is no language code in a URL, especially in the basic domain name http://jdlh.com/, the site looks at the HTTP Accept-Language header to determine which language the user prefers, and redirects the browser to content with that language code.
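The language negotiation might look roughly like this in Python. It is a simplified sketch, not the code my site runs: it matches only on the primary language subtag, ignores the “*” wildcard, and assumes English as the default. The function names are hypothetical.

```python
def preferred_language(accept_language, supported=("en", "ja", "de"), default="en"):
    """Pick the supported language with the highest q-value in the header."""
    candidates = []
    for item in accept_language.split(","):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a language with no q parameter has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        primary = tag.strip().lower().split("-")[0]  # "de-AT" -> "de"
        if primary in supported and quality > 0:
            candidates.append((quality, primary))
    if not candidates:
        return default
    # max() keeps the first of equally weighted entries, i.e. header order.
    return max(candidates, key=lambda pair: pair[0])[1]

def redirect_target(path, accept_language):
    """Path with a language code in front, e.g. '/' -> '/ja/'."""
    return "/" + preferred_language(accept_language) + path

print(preferred_language("ja,en-US;q=0.8,en;q=0.6"))       # ja
print(preferred_language("fr-CA,de-AT;q=0.7,en;q=0.3"))    # de
print(redirect_target("/", "ja,en-US;q=0.8"))              # /ja/
```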
It’s important to me that the URLs of content on my site be concise, comprehensible to humans, and stable over time. I like Jakob Nielsen’s “URL as UI” column, and the W3C’s “Cool URIs don’t change”, and try to follow them.
jdlh.com is built using Joomla!, a free software content management system (CMS). Version 1.0.x of Joomla!, which I use as of early 2008, can be coaxed into using UTF-8 text encoding and tolerating multi-lingual content. I add in Joom!Fish, a Joomla component which helps manage content in multiple parallel languages, and provides useful language utilities like that UI widget at the top of the page, to select between languages.
Joomla has many strengths, but easy-to-read URLs aren’t among them. Left to itself, a Joomla URL is an opaque stream of numbers and codes. Turning those URLs into human-friendly URLs, which are concise, comprehensible to humans, and stable over time, is the work of a “SEF” (Search-Engine-Friendly) component. Joomla has had several, but the first which satisfied us for jdlh.com is one called sh404SEF (see also sh404SEF on Joomla extensions and sh404SEF on siliana.net).
There has been a tough interaction between Joomla, Joom!Fish, and sh404SEF (and its ill-starred predecessors). Since mid-2006, Joomla would work with either of the other two, but not both together. Even as Joomla! moved forward to version 1.5.x, which has a better foundation for multilingual sites, I was held back to Joomla 1.0.x because Joom!Fish didn’t support the new version yet. Finally, in late February 2008, I discovered version 1.3.1 “TEST PR build 255” of sh404SEF, which seemed to work well with Joom!Fish (currently 1.8.2) and Joomla (currently 1.0.15).
I made a patch to sh404SEF, one of the modules that extends the Joomla! content management system that runs this website. What the patch does is to ensure that all three of the languages supported on this website are treated equally in the URLs of this site. Without the patch, the “/en/” tag for URLs of English-language content would be missing in some cases. See my article “Default-language patch for sh404SEF published” for a description of the patch, and a link to the code.