I encountered a new blog from my i18n tribe today, Localization Best Practices. Their post, “Pidgins and Creoles” or “Why Machine-only Translation Will Always Fall Short”, caught my eye.  It is interesting, even if I don’t fully agree with them.

Jonathan writes that, at a recent conference on localisation:

…an audience member asked me about machine translation, and if it would ever completely take the place of human linguists in the industry. I answered “No,” although I did concede that machine translation is consistently making strides and does have a place in the localization community. He then mentioned that a scientific group in Europe recently had success with a robot performing a live human appendectomy. He believed that if something that delicate could be automated, what made something as “simple” as language beyond the scope of machines and artificial intelligence? I thought about his question and then simply said, “Because there are no pidgins or creoles for appendectomies.”

(A pidgin, of course, is a blend of languages which humans from different language groups create for mutual communication. A creole is a pidgin which has matured and grown richer, becoming a stable language.)

…original pidgins were created throughout a 5-7 year period of cross-communication between two separate languages, with the transition to creole occurring at the start of the next generation. However, as communication tools increase in scope and availability, the rate of change for both has been cut by almost 75%. This means that completely new languages are being created every 5-6 years!… Machines, no matter how advanced, cannot keep up with the evolution of local languages.…

…Machines can never fully replace humans in language and translation. The complexities of verbal communication between people are too complicated, random and “human” for machines to completely grasp.…

My take?

Firstly, beware of saying “never” in forecasting technological progress. Many such forecasts have looked unimaginative a few decades later.

I think Jonathan might be blurring the issues of verbal and written communication here. I’d guess that most, though not all, translation is for written text (while interpretation is about verbal communication). He used as an example the word “fo-shizzle”, common among his younger sister’s friends. That may not be the best example, because it seems to be a verbal code for reinforcing in-group membership. An example from written text might be better (“tweet”, “blog”). That said, it’s certainly true that written communication is complicated, random, and “human”, just as verbal communication is.

I think there are three issues with machine translation adapting to pidgins, creoles, and language change: language fluency, motivation, and data access.

I’m cautious about saying that machines will “never” have the fluency to master any given pidgin or creole. In particular, statistical approaches to translation (what Google Translate is really good at) might eventually prove uncanny in their ability. However, motivation (or its proxy, money) is a major constraint. Machines may not get good at some creoles simply because those with the machines won’t find the effort lucrative enough. There will always, of course, be humans fluent in a thriving pidgin or creole, because speaking it is a human activity.

Data access may prove to be the biggest rate limiter to machines gaining fluency in particular languages. A corpus of parallel texts in different languages is a prerequisite for statistical translation. Computers have an easy time extracting text from web documents; a harder time from printed documents; and a very hard time from verbal communication.
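To make the parallel-text point concrete, here is a minimal sketch of one classic statistical technique, IBM Model 1 word alignment trained with expectation-maximization. I am not claiming this is what Google Translate or any other product actually runs; the three-sentence English-German corpus and the names `toy_corpus`, `train_word_table`, and `best_translation` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source tokens, target tokens) pairs.
toy_corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]


def train_word_table(pairs, iterations=20):
    """Estimate t(target_word | source_word) with IBM Model 1 style EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start with no preference: every target word is equally likely.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) co-occurrences
        total = defaultdict(float)  # expected occurrences of each source word
        for src, tgt in pairs:
            for f in tgt:
                # Split one unit of credit for f across the source words
                # in the same sentence pair, in proportion to current t.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the table from the expected counts.
        t = {pair: c / total[pair[1]] for pair, c in count.items()}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


table = train_word_table(toy_corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Everything the model learns comes from counting which words show up together in aligned sentence pairs. With no pairs there is nothing to count, which is why a language that lives mostly in speech and off the net gives this kind of system so little to work with.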

I’d expect to see slower progress in machine translation for pidgins and creoles which are mostly spoken, largely absent from the net, and used by communities with thin wallets. But I expect that would be primarily a matter of motivation and data access, not the ability of the machine to grasp a pidgin per se.

All that said, I’m glad to have found another blog enriching the translation discipline. I wish them much success.