language
Archived Posts from this Category
Posted by Jim DeLaHunt on 30 Jul 2009 | Tagged as: Unicode, language, meetings and conferences, multilingual, web technology
What “twanguage” do you “tweet”? Twitter, the buzzing conversation of brief web and SMS messages, exploded into wide use in 2009. But just how wide? To how many countries has it spread? And into which languages? I’m aiming to find out.
I’ve started a project named “Twanguages”, a language census of a sample of Twitter’s global traffic. I’m curious: which are the top languages? Are #hashtags localised? How does language correlate with location? And which Unicode character is the most rarely used?
I’ll be presenting our results at the 33rd Internationalization and Unicode Conference (IUC33), held in San Jose, California, on October 14-16, 2009. I have a place cleared for a Twanguages project page, and I’ll post interim results there as they become available (right now it’s only a placeholder). Stay tuned!
Posted by Jim DeLaHunt on 22 Jun 2009 | Tagged as: culture, i18n, language
I encountered a new blog from my i18n tribe today, Localization Best Practices. Their post, “Pidgins and Creoles” or “Why Machine-only Translation Will Always Fall Short”, caught my eye. It is interesting, even if I don’t fully agree with them.
Jonathan writes that, at a recent conference on localisation:
…an audience member asked me about machine translation, and if it would ever completely take the place of human linguists in the industry. I answered “No,” although I did concede that machine translation is consistently making strides and does have a place in the localization community. He then mentioned that a scientific group in Europe recently had success with a robot performing a live human appendectomy. He believed that if something that delicate could be automated, what made something as “simple” as language beyond the scope of machines and artificial intelligence? I thought about his question and then simply said, “Because there are no pidgins or creoles for appendectomies.”
Posted by Jim DeLaHunt on 19 Feb 2009 | Tagged as: British Columbia, language, multilingual
Do you know a blog which is by or for people in BC, and is in some language other than English? If so, submit it for the BC Polyglot Blog Directory!
I created this directory in honour of the 2009 Northern Voice conference, which starts tomorrow at UBC. I wanted to highlight all those minority-language bloggers in BC. In a little bit of searching I already have blogs in French, Traditional Chinese, and Japanese. I fully expect to find blogs in Simplified Chinese and Punjabi as well. After all, 18% of people in BC use a language other than English at home, according to Statistics Canada and the 2006 census.
I’ve created the directory on my site, at http://jdlh.com/en/pr/bc_polyglot_blogs.html. See the Rules and Q&A there for more information. You can submit listings for the directory by leaving a comment on this post, or by sending a message using that website’s Contact form for Jim DeLaHunt. Please supply the name of the blog, the URL, the language(s) in which it publishes, where the blog is located, and what geography it addresses.
I look forward to seeing this baby grow!
Posted by Jim DeLaHunt on 07 Sep 2008 | Tagged as: Vancouver, culture, i18n, language, meetings and conferences
I’m going to be a panelist at the Internet Marketing Conference Vancouver 2008, which runs from September 11-12, 2008. The panel is called “Writing for the Web”. It is full of experts on writing, and then there’s me. I’ll be approaching the topic crosswise, talking about international and multilingual issues. In other words, how your writing is affected if it will be translated, or is part of a multinational project.
The panelists are an interesting bunch. I’m looking forward to meeting them. They are:
Posted by Jim DeLaHunt on 26 Aug 2008 | Tagged as: Unicode, i18n, language, multilingual, software engineering
Does anybody know of a simple script-detection algorithm (or heuristic) for font switching?
This came up with one of my clients. Suppose you have a guest book on your web site, and seven visitors left you the following inspiring messages:
(It looks like your visitors all read the Universal Declaration of Human Rights courtesy of the UDHR in Unicode project).
Now suppose you are so touched that you want to lay out all seven messages in a PDF file, and print it out as a booklet. You have a beautiful layout template, and various complementary fonts: Latin script, Japanese, Korean, Simplified Chinese, Traditional Chinese, and Greek script.
Which font do you apply to each message? More importantly, is there a simple heuristic by which software can make the choice? (More after the jump.)
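To make the question concrete, here is a minimal sketch of one possible heuristic in Python. It is not the answer from the full post, just an illustration under stated assumptions: it uses only the standard `unicodedata` module, guesses the script from the first word of each character's Unicode name, and the function names, labels, and sample strings are all made up for this example. Note that it cannot tell Simplified from Traditional Chinese, or Chinese from Japanese text written purely in kanji; that would need more information than the characters alone provide.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from the first word of a Unicode character name
# to a coarse script label. Real script data (the Unicode Script property)
# is richer than this; the standard library does not expose it directly.
NAME_PREFIXES = {
    "HIRAGANA": "kana",
    "KATAKANA": "kana",
    "HANGUL": "hangul",
    "CJK": "han",
    "GREEK": "greek",
    "LATIN": "latin",
}


def count_scripts(text):
    """Count the letters in text by coarse script label."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and spaces
        prefix = unicodedata.name(ch, "").split(" ", 1)[0]
        counts[NAME_PREFIXES.get(prefix, "other")] += 1
    return counts


def guess_font(text):
    """Pick a font family label for a message (illustrative rules only)."""
    counts = count_scripts(text)
    if counts["kana"]:
        return "Japanese"  # kana is a strong signal for Japanese
    if counts["hangul"]:
        return "Korean"
    if counts["han"]:
        # Han characters alone do not reveal which Chinese font to use.
        return "Chinese (variant undetermined)"
    if counts["greek"] > counts["latin"]:
        return "Greek"
    return "Latin"


# Illustrative sample strings, not the guest book messages from the post.
for message in ["All human beings are born free", "人人生而自由"]:
    print(guess_font(message), "<-", message)
```

The ordering of the rules matters: Japanese text usually mixes kana with Han characters, so kana is checked before Han. The undetermined Chinese case is exactly where a simple heuristic runs out and extra hints (a language tag, the visitor's locale) would be needed.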
Posted by Jim DeLaHunt on 02 Apr 2008 | Tagged as: culture, i18n, language, software engineering
One of the answers I occasionally write at LinkedIn Answers seemed worth reposting here. The question was: “Do all languages in the world use Western numerals (1, 2, 3 etc) to express numerical values?”. My answer (slightly revised):
The simple answer to your question is, “No”. Or, “Yes”. It depends which exact question you are asking.
Is it the case that all languages in the world use only Western numerals (usually known as “Arabic” or “Hindu-Arabic numerals”, by the way) to express numerical values? No. Many languages use multiple number forms, depending on context. In the English language, for example, a numerical value could be expressed with words (“one”) in text, Hindu-Arabic numerals (“1”) in a technical context, or Roman numerals (“i”, “I”) in lists. Arabic, Hindi, Japanese, and Chinese all have native characters to express numerical values, which are used in some contexts.
Do all languages in the world use Western numerals sometimes, in some contexts, to express numerical values? Yes — mostly, probably. The qualifications are because I hate to make generalisations about human culture; it’s so diverse. And, note that languages without written forms probably don’t use Hindu-Arabic numerals at all.
Is it the case that Western numerals are — in all cultures, in all contexts — the idiomatic, preferred way to express numerical values? No. They aren’t even sufficient for all contexts in English (viz “one”, “i”).
Do all cultures which use Western numerals to express numerical values do so in the same way? No. In particular, the punctuation between the whole and the fractional part of a number, and the grouping of digits, differ by culture. North America uses “1,234,567.89”; many European cultures use “1.234.567,89”; I’ve seen Japanese texts that say “123,4567.89”. See the CLDR number format patterns, creating international number formats in Excel, and the user guide to ICU formatting numbers.
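The three formats above differ in only three settings: the grouping separator, the decimal separator, and the group size. A small Python sketch can show that. The function `format_number` and its parameters are hypothetical names for this illustration; a real product would more likely take these settings from CLDR data through a library such as ICU than hard-code them.

```python
def format_number(value, group_sep=",", decimal_sep=".", group_size=3, decimals=2):
    """Format a number with explicit separators and digit-group size."""
    # Render with a fixed number of decimals, then split whole and fraction.
    fixed = f"{abs(value):.{decimals}f}"
    whole, _, frac = fixed.partition(".")

    # Peel groups of digits off the right-hand end of the whole part.
    groups = []
    while len(whole) > group_size:
        groups.insert(0, whole[-group_size:])
        whole = whole[:-group_size]
    groups.insert(0, whole)

    result = ("-" if value < 0 else "") + group_sep.join(groups)
    if decimals > 0:
        result += decimal_sep + frac
    return result


amount = 1234567.89
print(format_number(amount))                                  # 1,234,567.89
print(format_number(amount, group_sep=".", decimal_sep=","))  # 1.234.567,89
print(format_number(amount, group_size=4))                    # 123,4567.89
```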
Let’s shift focus from expressing numbers in cultures to implementing numbers in software products.
If you were making priority decisions for a software product (that’s my background) to expand its market internationally, and that product expresses numerical values using Hindu-Arabic numerals in some contexts appropriate in North America, can you be confident that it’s the only system you’ll need to express numerical values? No. You of course need to look at the cultural requirements of each new market as you go. But I’m confident that over time, some market will require some system other than Hindu-Arabic numerals to express numerical values. So I’m confident that sooner or later, you will have to give that software product the ability to express numerical values in a variety of ways (i.e., to internationalise it).
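As one small illustration of what “a variety of ways” can mean in code, the sketch below swaps Hindu-Arabic digits for two other decimal digit sets that Unicode encodes: Arabic-Indic (starting at U+0660) and Devanagari (starting at U+0966). The table keys and the function name `to_native_digits` are assumptions made for this example. It covers only digit substitution; systems such as Roman numerals or Chinese numerals are not simple digit-for-digit replacements and need their own logic.

```python
# Code point of the digit zero in each numbering system (illustrative labels).
DIGIT_ZERO = {
    "arab": 0x0660,  # ARABIC-INDIC DIGIT ZERO
    "deva": 0x0966,  # DEVANAGARI DIGIT ZERO
}


def to_native_digits(text, system):
    """Replace ASCII digits 0-9 in text with the digits of another system."""
    zero = DIGIT_ZERO[system]
    table = {ord("0") + i: chr(zero + i) for i in range(10)}
    return text.translate(table)


year = to_native_digits("2009", "arab")
print(year)
# Python's int() accepts any Unicode decimal digits, so the value round-trips.
print(int(year))
```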
Postscript: the questioner, LinkedIn product manager Minna King, was kind enough to mark this as the “Best Answer” of the six posted.
[Edited for clarity based on reader feedback.]