10 Dec, 2006 in Chinese Studies. Tags: Chinese Studies

Countless translation applications and services have been built over the years, based on various methods, but most of them are quite bad. According to NIST, Google Translate has been producing better translations than most other engines by using its unique translation-by-search scheme.

Here’s “Not Lost in Translation” (thanks, Omer):

Last week, the National Institute of Standards and Technology (NIST) released the results of its yearly evaluation of computer algorithms that translate Arabic and Mandarin Chinese texts into English. Topping the charts was Google, whose translations in both languages received higher marks than 39 other entries. A machine-calculated metric called BLEU (BiLingual Evaluation Understudy) used scores from professional human translators to assign a single, final score between zero and one. The higher the score, the more the machine translation approximated a human effort.

“If you get a good score, you’re doing well,” says Peter Norvig, Google’s head of research. “If you get a bad score, then either you did poorly or you did something so novel that the translator didn’t see it.”
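To make the zero-to-one score more concrete, here is a minimal sketch of a BLEU-style calculation. It is not NIST's evaluation code: it assumes a single human reference translation, whitespace tokenization, and only unigrams and bigrams, and the name `simple_bleu` is made up for this example. BLEU as usually defined compares the machine output's word sequences with human reference translations, and that is the idea shown here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=2):
    """Toy single-reference BLEU: clipped n-gram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # assumption: no smoothing, so any empty level scores zero
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat", reference))
    print(simple_bleu("a dog barked", reference))
```

A translation identical to the reference scores 1.0, a translation with no words in common scores 0.0, and a correct but truncated translation lands in between because of the brevity penalty. This also illustrates Norvig's caveat: a novel but valid wording that shares few word sequences with the reference would score poorly.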

The Google team, led by Franz Och, designed an algorithm that first isolates short sequences of words in the text to be translated and then searches current translations to see how those word sequences have been translated before. The program looks for the most likely correct interpretation, regardless of syntax.

“We look for matches between texts and find several different translations,” Norvig says. “You take all these possibilities and ask, What is the most probable in terms of what’s been done in the past?”

By comparing the same document (a newspaper article, for example) in two languages, the software builds an active memory that correlates words and phrases. Google’s statistical approach, Norvig says, reflects an organic approach to language learning. Rather than checking every translated word against the rules and exceptions of the English language, the program begins with a blank slate and accumulates a more accurate view of the language as a whole. It “learns” the language as the language is used, not as the language is prescribed. (Google’s program is still in development, but other publicly available webpage translators use a similar method.)
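The idea described so far (isolate short word sequences, look up how they were translated before, pick the most probable option) can be sketched in a few lines. This is only a toy illustration under loud assumptions, not Google's system: the phrase pairs in `ALIGNED_PHRASES` are invented by hand (toneless pinyin, for readability), they are assumed to be already aligned, and the function names are hypothetical. A real statistical translator would have to learn the alignments from the paired documents and would weigh many more candidates.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made phrase pairs standing in for pairs mined from
# documents that exist in both languages.
ALIGNED_PHRASES = [
    ("ni hao", "hello"),
    ("ni hao", "hello"),
    ("ni hao", "how are you"),
    ("xie xie", "thank you"),
    ("xie xie", "thanks"),
    ("xie xie", "thank you"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was translated each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, phrase):
    """Return (translation, relative frequency) of the most common past translation."""
    counts = table.get(phrase)
    if not counts:
        return None
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


def translate(table, text, max_len=2):
    """Greedily split text into known phrases (longest first) and translate each."""
    words = text.split()
    output, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            best = most_probable(table, " ".join(words[i:i + n]))
            if best:
                output.append(best[0])
                i += n
                break
        else:
            output.append(words[i])  # unknown word: pass it through unchanged
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    table = build_phrase_table(ALIGNED_PHRASES)
    print(most_probable(table, "ni hao"))
    print(translate(table, "ni hao xie xie"))
```

Note that nothing in the sketch encodes a grammar rule. The table only records what translators did in the past, which is the "blank slate" point made above.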

“This is a more natural way to approach language,” Norvig says. “We’re not saying we don’t like rules, or there’s something wrong with them, but right now we don’t have the right data … We’re getting most of the benefit of having grammatical rules without actually formally naming them.”


  1. Sidonie Billon  |  December 10th, 2006 at 6:48 pm

    Honestly speaking, I’ve never tested “countless translation applications”. The only thing I do is use a free online translator, e.g. http://www.online-translator.com, http://www.systransoft.com, or http://www.freetranslation.com, when I need a translation from a language I’m not familiar with. Perhaps many translation applications really are “quite bad”, and it may be that my favorite programs are below the mark, but they do work and do help me a lot, and more often than not the translation accuracy is as high as 80%.

  2. fiLi  |  December 11th, 2006 at 8:32 am

    Systran is actually the engine used by Altavista, which is one of the most popular translators on the web and one that I use as well (due to my preference for Traditional Chinese, which Google Translate doesn’t fully support yet).

    You’re right, those are good tools. I think the idea behind Google, that search is a solution to many things in life, including translation, is a very interesting approach. It’s quite amazing to see that in most cases they are able to demonstrate better results with their methods.

  3. Google Translation « I’ve Got A Fang Blog  |  March 8th, 2007 at 7:18 pm

    [...] I wanted to check this out further so I did more searching on the Web. I found several anecdotal blog posts about how Google has been getting dramatically better at translating things over time, and then I found this post explaining how they have a whole team of researchers working on this and developing some cool new methods. In short, they are doing complicated statistical analyses where they choose the best word in context based on other previously vetted translations. [...]
