What “twanguage” do you “tweet”? Twitter, the buzzing conversation of brief web and SMS messages, exploded into wide use in 2009. But just how wide? To how many countries has it spread? And into which languages? I’m aiming to find out.
I’ve started a project named “Twanguages”, a language census of a sample of Twitter’s global traffic. I’m curious: which are the top languages? Are #hashtags localised? How does language correlate with location? And which Unicode character is the most rarely used?
I’ll be presenting our results at the 33rd Internationalization and Unicode Conference (IUC33), held in San Jose, California, on October 14-16, 2009. I have a place cleared for a Twanguages project page, and I’ll post interim results there as they become available (right now it’s only a placeholder). Stay tuned!