7 Machine Language Translator Startups
Artificial intelligence is now so smart that it can master the world’s most complicated strategy game, Go, without studying a single human game. Given nothing but the rules, the AI algorithms figured out the rest through trial and error and became the best Go players on the entire planet in a matter of days. Yet these same algorithms cannot understand what the 1.37 billion people in China are saying. That’s because Chinese uses thousands of characters to express thoughts in a manner so complex that even the world’s best translation algorithms often produce nothing but Engrish. The same goes for the languages of Asia’s other developed countries, such as Japan and South Korea. While we enjoy some good Engrish just as much as any other traveler, we still find it mind-blowing that computers can’t understand over one-fifth of the people on the planet. That’s why, when we saw a recent news article claiming that Microsoft had achieved human parity translating Chinese articles into English, we decided to take a closer look at the language translation space.
Translation sites and blogs seem to agree that the hardest languages to translate into English are Chinese, Korean, Japanese, and Arabic. The difficulty comes from differences in syntactic structure, elements left implicit in sentences, and unique idioms, not to mention the writing systems. Here is a simple “hello” in the four hardest languages to translate:
你好 / 안녕하세요 / こんにちは / مرحبا
Most people probably can’t even tell which is which, and that’s for just one word, one of the most commonly used in any language. Google Translate has upgraded its approach, but it still produces rubbish on a simple round-trip test: translate a sentence once, then translate the result back into the original language and see whether the meaning has changed. Voice translation of simple sentences might be enough for tourists, but complex website and app localization and other professional documents need larger, more flexible models to reach a useful level of accuracy. How long until we can translate Chinese newspapers, for example?
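To make the round-trip test concrete, here is a minimal Python sketch. The two tiny word-for-word dictionaries are assumptions standing in for a real translation engine, and the word-overlap score (Jaccard similarity of the two word sets) is just one crude way to measure drift, not how Google or anyone else grades translations. The French idiom “le chat a froid” (literally “the cat has cold”) shows how an idiom comes back mangled:

```python
# Toy word-for-word dictionaries standing in for a real translation engine (hypothetical).
FORWARD = {"the": "le", "cat": "chat", "is": "a", "cold": "froid"}    # English -> French
BACKWARD = {"le": "the", "chat": "cat", "a": "has", "froid": "cold"}  # French -> English


def translate(sentence: str, table: dict) -> str:
    """Look up each word; words missing from the table pass through unchanged."""
    return " ".join(table.get(word, word) for word in sentence.lower().split())


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets (1.0 = same words, 0.0 = none shared)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def round_trip(sentence: str, forward: dict, backward: dict):
    """Translate out and back, then score how much of the original wording survived."""
    there = translate(sentence, forward)
    back = translate(there, backward)
    return there, back, word_overlap(sentence, back)


if __name__ == "__main__":
    print(round_trip("The cat is cold", FORWARD, BACKWARD))
    # ('le chat a froid', 'the cat has cold', 0.6)
```

A real check would compare meaning rather than exact words, but even this toy version flags the idiom problem: only 3 of the 5 distinct words across both versions match.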
So how do machine translators (MTs) work? Statistical MTs were developed in the 1990s and rely on two separate databases (or models): a translation database and a language database. The translation database contains bilingual phrase pairs with probability scores attached; the better the translation, the higher the probability that the MT picks that phrase. The language database is a collection of monolingual text used to make sure the output reads as grammatically and stylistically correct. Training such a system is resource-intensive: you need to add large volumes of good-quality translations to the phrase table and re-estimate the probability scores.
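The interplay between the two databases can be sketched in a few lines of Python. Everything here is hypothetical: a one-entry phrase table, a hand-filled bigram language model, and a crude floor probability for unseen word pairs. The point is that the language model can overrule the phrase table when a candidate is a likely translation but poor English:

```python
import math

# Hypothetical phrase table (the "translation database"):
# source phrase -> {candidate English phrase: P(candidate | source)}
PHRASE_TABLE = {
    "maison bleue": {"house blue": 0.5, "blue house": 0.4, "blue home": 0.1},
}

# Hypothetical bigram language model (the "language database"), estimated from
# monolingual English text: P(word | previous word); "<s>" marks sentence start.
BIGRAMS = {
    ("<s>", "blue"): 0.2, ("blue", "house"): 0.3, ("blue", "home"): 0.05,
    ("<s>", "house"): 0.1, ("house", "blue"): 0.01,
}
UNSEEN = 1e-4  # crude floor for word pairs never seen in the monolingual text


def lm_logprob(sentence: str) -> float:
    """Log-probability of an English sentence under the bigram model."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(words, words[1:]))


def best_translation(source: str) -> str:
    """Pick the candidate maximizing log P(candidate | source) + log P_LM(candidate)."""
    candidates = PHRASE_TABLE[source]
    return max(candidates, key=lambda c: math.log(candidates[c]) + lm_logprob(c))


if __name__ == "__main__":
    options = PHRASE_TABLE["maison bleue"]
    print(max(options, key=options.get))     # translation model alone: house blue
    print(best_translation("maison bleue"))  # with the language model: blue house
```

Classic noisy-channel systems score P(source | candidate) rather than P(candidate | source), and real phrase-based decoders also segment and reorder whole sentences and weight several features against each other, so treat this unweighted sum as a simplification.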
Then came adaptive MTs, which learn dynamically. Whenever an output is corrected, that single correction is saved in the phrase library and given a better score than the original output, which makes the process more efficient. Finally, there are neural MTs, which use neural network models to predict the likelihood of a sequence of words, modeling entire sentences at once rather than translating phrase by phrase and then working out how the phrases fit together. Research output in this field has grown rapidly, which has naturally led to a wave of startups trying to solve translation problems. Here are 7 such startups taking on some of the most difficult languages in the world.
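An adaptive update along these lines might look like the following sketch, using a phrase table that maps each source phrase to scored candidates. The `boost` value and the renormalization step are assumptions for illustration; commercial adaptive systems do not publish their exact update rules:

```python
def apply_correction(table: dict, source: str, corrected: str, boost: float = 0.1) -> None:
    """Store a human correction so it outranks every existing option for `source`."""
    options = table.setdefault(source, {})
    # Score the correction above the current best (including the rejected machine output).
    options[corrected] = max(options.values(), default=0.0) + boost
    # Renormalize so the scores still sum to 1, like probabilities.
    total = sum(options.values())
    for phrase in options:
        options[phrase] /= total


if __name__ == "__main__":
    table = {"bonjour": {"good day": 0.7, "hello": 0.3}}
    apply_correction(table, "bonjour", "hello")  # an editor replaced "good day" with "hello"
    print(max(table["bonjour"], key=table["bonjour"].get))  # hello
```

The next time “bonjour” comes up, the corrected phrase is the top-scoring choice, which is the single-correction learning the paragraph above describes.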
Founded in 2013, Portuguese startup Unbabel has received $31 million in funding from the likes of Y Combinator and Google Ventures to build a platform that combines adaptive machine translation with human editing. Text is translated automatically, then broken up into smaller segments and sent to human editors. Corrections to the draft become inputs for Unbabel’s “proprietary machine learning algorithm”, then the full text is reassembled and proofread one final time.
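The workflow described above can be sketched as a pipeline of plain text-to-text functions. All the step functions are hypothetical stand-ins (the real MT engine and “proprietary machine learning algorithm” are not public), the round-robin assignment of segments to editors is an assumption, and the sentence splitter is deliberately naive:

```python
import re
from typing import Callable, List, Tuple

Step = Callable[[str], str]  # any text -> text function (MT engine, human editor, proofreader)


def split_segments(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_with_review(text: str, machine_translate: Step, editors: List[Step],
                          proofread: Step, training_log: List[Tuple[str, str]]) -> str:
    draft = machine_translate(text)                       # 1. automatic draft
    edited = []
    for i, segment in enumerate(split_segments(draft)):   # 2. smaller segments
        fixed = editors[i % len(editors)](segment)        # 3. round-robin to human editors
        if fixed != segment:
            training_log.append((segment, fixed))         # 4. corrections become training data
        edited.append(fixed)
    return proofread(" ".join(edited))                    # 5. reassemble, final proofread
```

Passing lambdas in place of real services shows the flow: `translate_with_review("Helo world. How are you?", lambda t: t, [lambda s: s.replace("Helo", "Hello"), lambda s: s], lambda s: s, log)` returns `"Hello world. How are you?"` and leaves the single correction `("Helo world.", "Hello world.")` in `log`.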
Unbabel employs a community of 50,000 freelance editors and claims that translations are performed in under an hour, with the most popular language pairs taking only 10 minutes. Their service currently covers 28 languages and has native integration with Salesforce, Zendesk, and Freshdesk. Unbabel also offers an Application Programming Interface (API) that clients can use to integrate Unbabel technology into their systems. It’s quicker and cheaper than human translation and offers freelance work to the masses, at least until the AI learns to walk alone.
Update 09/23/19: Unbabel has raised $60 million in Series C funding to “consolidate its footprint” across its existing markets in Europe and the U.S. and expand into Asia. This brings the company’s total funding to $91.2 million to date.
Founded in 2015 by ex-Google Translate employees, Silicon Valley startup Lilt has raised $3 million to develop a machine-assisted translation system for language translators. Based on adaptive MT technology, it is a predictive typing tool that proposes better suggestions with each correction.
Lilt claims its ‘neural feedback loop’ makes human translators 3 to 5 times faster and machine translation more accurate. The tool comes equipped with proofreading features and a terminology database that let translators concentrate on the text. An API is available for easy integration.
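Predictive typing of this kind can be illustrated with a toy sketch: suggest the rest of the highest-scoring candidate translation that matches what the translator has typed so far, then promote whatever the translator finally writes. The candidate list and scores are made up, and simple prefix matching is a stand-in assumption; Lilt’s actual system generates its suggestions with a neural model:

```python
from typing import Dict, Optional


def suggest(prefix: str, candidates: Dict[str, float]) -> Optional[str]:
    """Complete the translator's typed prefix with the best-scoring matching candidate."""
    matches = [c for c in candidates if c.startswith(prefix)]
    if not matches:
        return None  # nothing consistent with what was typed; let the human continue
    best = max(matches, key=lambda c: candidates[c])
    return best[len(prefix):]


def record_final(final: str, candidates: Dict[str, float], boost: float = 1.0) -> None:
    """Feedback step: the translator's finished sentence becomes the top candidate."""
    candidates[final] = max(candidates.values(), default=0.0) + boost


if __name__ == "__main__":
    cands = {"the blue house": 0.6, "the blue home": 0.3, "a blue house": 0.1}
    print(suggest("the blue h", cands))  # ouse
    record_final("the blue home", cands)
    print(suggest("the blue h", cands))  # ome
```

Each accepted correction changes the next suggestion, which is the feedback loop in miniature.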
In April 2017, SDL (LON:SDL), a $490 million provider of language solutions based in the UK, sued Lilt for allegedly infringing three of its patents related to machine translation. Just before the filing, Lilt had run an aggressive marketing campaign comparing its service favorably with competitors, including SDL, and claiming to be the first provider of adaptive machine translation. The dispute was settled in November 2017 “to the mutual satisfaction of both parties” according to the press release, but the settlement terms remain confidential. Now that the dispute is behind them, it’s only a matter of time before AI from one of these companies handles all the corrections on its own, and the humans can “move on to more value-added activities”.
Founded in 2012, Korean startup Flitto has raised $2.2 million to establish a translation crowdsourcing platform. Flitto is a community hub where members translate each other’s requests for points. When you request a translation, anyone can have a go at it. You choose the one which you like best and the translator receives points which he or she can spend on translation requests or take out as cash.
So far, this has nothing to do with machine translation. However, Flitto earns 80% of its revenue from selling the language data it collects, called a “corpus”, to customers like Baidu, Microsoft, Tencent, and NTT DoCoMo. Used to train proprietary MT algorithms, the data contains slang, pop culture references, and dialects that would be hard to obtain elsewhere.
Selling corpus data seems to be the new way to make money in the “translation services” market. Conyac, the Japanese crowdsourcing translation startup acquired by Rozetta (TYO:6182) in 2016, is also selling corpus alongside speech recognition and chatbot data. Rozetta had an IPO on the Tokyo Stock Exchange in November 2015, declaring a 10-year goal to create a fully automated translation device by 2025 in order to “liberate Japan from the burden of its linguistic handicap”.
Founded in 2010, Silicon Valley startup Dakwak has raised $600,000 to develop a “machine learning translation as-a-service platform”. Their technology is built on a mix of machine learning, crowdsourced translation, and professional translation that can be applied to any website instantaneously, then edited manually. Foreign language versions of websites are stored on Dakwak servers, and visible to search engines right away.
Dakwak’s value proposition is simple: there are more than 1.5 billion internet users who don’t speak English and who use search engines in their native languages. Dakwak clients can tap into this market for $1,000 a year, which gets their website localized into 9 languages of their choice, or they can opt for a custom package covering up to 58 languages.
Founded in 2009, German startup DeepL (formerly known as Linguee) has raised an undisclosed amount of funding to develop a neural machine translator to rival Google and Microsoft. Led by former Google researcher Gereon Frahling, DeepL claims to run the 23rd largest supercomputer in the world, one able to translate 1 million words per second. The neural network is trained on data from Linguee, the company’s bilingual dictionary and first product. According to DeepL’s own studies, its translations are preferred over competitors’ by a 3:1 ratio in blind tests. DeepL Translator launched in August 2017 with 7 languages and is currently being trained on Mandarin, Japanese, and Russian. Maybe they’re buying some of that delicious big data they need to feed their AI algorithms from startups like the aforementioned Flitto or Conyac.
Founded in 2011, Irish startup KantanMT has raised an undisclosed amount of funding to develop a subscription-based, customizable machine translation engine. Clients can train their own statistical MT engine and integrate it into their systems. Translators using the service reported a 60% increase in productivity on average. KantanMT provides supplemental modules to analyze and plan translation projects and to measure the accuracy of the MT output. The company is a member of a consortium of MT vendors that was awarded $2.3 million last year to develop the next-generation machine translation platform for the European Commission. All you Irish-Americans out there in New Yawk will be pleased to hear that it even speaks Irish.
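The article doesn’t say how KantanMT measures accuracy, but a widely used automatic metric for MT output is BLEU, which compares a machine translation against a human reference by counting shared word sequences (n-grams). Here is a minimal, unsmoothed, single-reference, sentence-level version in Python, offered as an illustration of the metric rather than KantanMT’s method:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram's count at the number of times it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # a zero precision (or nothing to count) makes the geometric mean zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", ref))  # 1.0
    print(sentence_bleu("the mat sat on the cat", ref))  # 0.0 (no matching 4-gram)
```

Because this sketch skips smoothing, any sentence with no matching 4-gram scores 0, which is one reason tools usually report BLEU over a whole test set rather than sentence by sentence.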
Founded in 2012, Chinese startup UTH International has received between $6 and $8 million in funding to develop a full suite of multilingual information processing solutions built on what it claims is “the world’s largest accessible translation memory corpus”. (Corpus is that morbid-sounding name for a body of language data, in this case translations.) Besides an MT system and language database, their offering includes an e-commerce solution, a search platform, a software localization toolkit, a translation management tool, a customer service system, and a publishing system. UTH’s latest round of investment was led by Sogou, one of China’s top search engines, which is eyeing an IPO in the US. Browsing around the website, we’re not sure they have been eating their own dog food, given sentences like this one with a hint of Engrish lurking around:
“This allows us to continuously improve the intelligent translation quality and assist foreign-affiliated lawyers to gradually eliminate their dependence on the human translation, with its high costs to realize the accessible conversion of multiple languages”
From now on, it should be mandatory for every foreign translation firm out there to translate its own webpage copy as proof of how well its platform works.
What’s more incredible than the fact that Microsoft is only now starting to understand Chinese is that a translation company like Flitto makes its bread and butter selling data, not providing translation services. Translation has essentially become a commodity, and it will only get cheaper as machine learning algorithms improve over time. This may be the only instance where the training data can be produced only by humans, so the algorithms will progress only as fast as thousands of little human hands can type. We have quite a ways to go before Engrish becomes a thing of the past, and all you translators out there will need to “move on to perform more value-added activities”.
If you enjoyed this article, then sign up for our free newsletter - Nanalyze Weekly. About every week, we'll send you a simple summary of all our new articles. If you didn't enjoy this article, share it on Twitter and tell everyone how much you hated it.