Investing in Natural Language Processing

I’m sorry Dave. I’m afraid I can’t do that. Those two sentences are perhaps the most widely-known example of natural language processing before it was even a thing. The movie was 2001: A Space Odyssey, and the year was 1968. Since then, millions of programmers have used the words of HAL 9000 in their code to send cheeky messages to users who probably wouldn’t get it. Today, we want to talk about how humans have been progressing in all aspects of natural language processing thanks to developments in artificial intelligence.

What is Natural Language Processing or NLP?

Human language is used to communicate in spoken, written, and non-verbal form. We think nothing of speech because we use it so seamlessly. Visit China for the first time as a Westerner and you’ll realize how helpless you are if you can’t communicate with people. That’s where you’ll learn how to mime your way through situations using non-verbal language and lots of grunting. When it comes to reading signs, there are now augmented reality apps that will translate Chinese characters on the go. So, we can define natural language processing (NLP) as the ability to take human communication in any form and understand it. That’s why you’ll also hear people talk about natural language understanding.

What is Natural Language Understanding?

Natural language understanding isn’t about a computer taking verbal language and properly translating it to written language. If it were that simple, we wouldn’t have Engrish. We previously wrote a piece titled “When Will AI Make Engrish a Thing of the Past?” If you’ve spent time in Asia, you’ll know about Engrish. What usually happens is that someone decides to use Google Translate to quickly translate a bunch of written Chinese into English and hilarity ensues. Some of this stuff is epic level.

Number One Engrish from Engrish.com

If you have spent any time in Asia, you’ll appreciate the anthropomorphism on display here (so common in Asia) along with the totally innocent reference to sucking the tapioca balls out of the milk tea. 

We don’t mean to poke fun at people who can’t speak English. In fact, we find it extremely endearing. Sometimes, the Chinese actually translate literally. Did you know that a raccoon is called a water bear in Chinese? That’s the literal translation. The owl becomes a “cat head eagle,” a kangaroo becomes a “bag rat,” and the delicious egg-shaped kiwifruit becomes a “goose berry.” Now you can see how literal translations can be problematic. A smart algorithm would know to translate the Chinese word for raccoon into the English word for raccoon.

You also have words that exist in one language but don’t exist in others. We’ve all heard adages that Eskimos have 100 words for snow and the Italians have 100 words for love. (They say the Italians invented making love, but that’s not true. The Greeks did. The Italians were the first to try it with their women. Ba dum tsss.) But what about words that don’t exist in another language? For example, in Japanese there is a special word to describe the way light appears through the trees in a forest. Think about how tough it would be for a human to translate that word appropriately, much less a computer. It’s going to be a while before NLP algorithms are capable of spotting fake news.

NLP for Sentiment

Sometimes, you don’t need perfect accuracy to utilize natural language understanding to gain insights. For example, all that inane drivel that gets posted on social media is usually being monitored by company representatives. (Armchair Twitter CEOs use this to try to strong-arm companies and make them kowtow to every little thing they find offensive. Unfortunately, none of these people have ever stepped outside of their personal echo chambers.) Another useful function for natural language understanding is to screen product reviews to see what people are saying about your products, both the good and the bad.
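
To make the product review idea concrete, here is a minimal sketch of the crudest possible approach: count positive and negative words and compare. The word lists and sample reviews are made up for illustration, and a production system would almost certainly use a trained model rather than a hand-picked lexicon.

```python
import re

# Hypothetical, hand-picked word lists (assumption, for illustration only).
POSITIVE = {"great", "love", "excellent", "good", "fast"}
NEGATIVE = {"bad", "broken", "terrible", "slow", "refund"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def label(text: str) -> str:
    """Map the raw score onto a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample reviews.
reviews = [
    "Love it, great battery and fast shipping",
    "Terrible. Arrived broken and support was slow",
    "It is a phone",
]

for review in reviews:
    print(label(review), "|", review)
```

Even something this blunt shows why perfect accuracy isn’t required: across thousands of reviews, the overall balance of labels can still be informative when individual calls are wrong.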

NLP for Compliance

NLP is also becoming good at detecting shady stuff. A startup called Digital Reasoning built a product called Synthesys which is exceptionally good at understanding what people are actually saying when they use analogies. If you tell your buddies in sales that you “received some snow last night,” you might be in the HR office before you’ve even had a chance to gork a big fat one off the breakroom counter. You can imagine how useful this might be to a compliance department that’s trying to monitor insider trading.

NLP for Stock Indices

Companies that create stock indices often need to allocate stocks into buckets such as industries or sectors. The way to assign companies in this manner usually involves some classification standard such as the aptly titled “Global Industry Classification Standard” or GICS. Using NLP, analysts can now have this process automated according to what the company actually does, not what they say they do. One company called Kensho Technologies is using natural language processing to scour millions of pages of regulatory filings along with other unstructured big data sets to compile “new economy indices.”

Source: Kensho Index Methodology

(Kensho was acquired by S&P Global in 2018.)

Scraping data to gain insights has applications across the entire finance industry. AlphaSense scours the world’s financial data and then lets you ask questions about it using natural language. CB Insights uses NLP algorithms to collect information about startups for their database.

NLP for Transcribing Speech

Natural language understanding is when computers can understand anyone, in any language. We’re clearly not there yet, but we’re getting a whole lot closer thanks to the power of artificial intelligence. One common use case for NLP is transcribing speech in places like courtrooms and boardrooms.

Accurately transcribing voice to text depends on accurately identifying context. For example, we pronounce “blue” and “blew” exactly the same way, but NLP algorithms need to differentiate these words based on their context. Microsoft’s machine learning algorithms reached the level of human accuracy (equal to a 5.1% error rate) in 2017, even though their voice recognition research has been going on since the ’90s. At least 13 startups are working on how to transcribe voice to text.
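
Error rates for transcription are commonly reported as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of words in the reference. Assuming that is the kind of figure quoted above, the sketch below shows how such a number can be computed. The function name and sample sentences are our own illustration, not anything from Microsoft.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the transcript confuses the homophones "blew" and "blue".
reference = "the wind blew the blue kite away"
hypothesis = "the wind blue the blue kite away"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

One wrong word out of seven gives a WER of roughly 14%, which puts a 5.1% rate in perspective: about one word in twenty is wrong.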

What is Natural Language Generation?

Once we’ve mastered natural language understanding, we can master natural language generation (NLG). In simple terms, NLP algorithms analyze language to extract data, while NLG algorithms do the opposite by analyzing data and turning it into language. We can train machine learning algorithms to do this by having humans-in-the-loop to correct them when they make a mistake. Eventually though, the algorithms need to venture out on their own. There’s a funny example in which Microsoft’s chatbot, Tay, started posting offensive content on Twitter because users egged it on. Everyone got all offended but we found it kind of hilarious to imagine how panicked Microsoft’s Head of Communications was when they saw Tay start denying the Holocaust.

Credit: CBS News

Another win for Godwin’s Law, an Internet adage which says that “as an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1.”

As you can see, we’re a long way from letting computers loose to say whatever they want to humans, something we highlighted in our piece on The Myth of the Clever AI Chatbot back in 2016. Some companies like Narrative Science “transform data into stories,” which involves taking structured data and telling a story around it. You can see how this might be used to report earnings for a stock. (Ever see these pieces? You can usually tell that they’re auto-generated because they’re dreadfully boring.) Another application might be reporting scores for sports games and mentioning highlights.
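
The simplest way to turn structured data into a sentence is a fill-in-the-blank template, which also helps explain why auto-generated earnings pieces read so flat. The sketch below is our own illustration with a made-up company and made-up numbers. It is not how Narrative Science or any other vendor actually does it.

```python
from dataclasses import dataclass


@dataclass
class Earnings:
    """Hypothetical structured record for one quarterly report."""
    company: str
    quarter: str
    revenue_musd: float        # revenue in millions of US dollars
    prior_revenue_musd: float  # same quarter a year earlier (assumed non-zero)


def describe(e: Earnings) -> str:
    """Render one earnings record as a single English sentence."""
    change = (e.revenue_musd - e.prior_revenue_musd) / e.prior_revenue_musd * 100
    headline = f"{e.company} reported {e.quarter} revenue of ${e.revenue_musd:,.1f} million"
    if change == 0:
        return f"{headline}, unchanged from a year earlier."
    verb = "rose" if change > 0 else "fell"
    return f"{headline}, which {verb} {abs(change):.1f}% from a year earlier."


# Made-up example record.
print(describe(Earnings("Acme Corp", "Q2", 120.0, 100.0)))
# prints: Acme Corp reported Q2 revenue of $120.0 million, which rose 20.0% from a year earlier.
```

Every report produced this way has the same shape, so more capable NLG systems vary which facts they mention and how they word them.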

Fast forward to today and chatbot technology has progressed quite a bit. Let’s talk about chatbots for a bit.

What Are Chatbots?

Simply put, a chatbot is a messaging interface you communicate with where the person on the other side is actually a machine learning algorithm. Ideally, the user wouldn’t even know they were chatting with an algorithm and they could seamlessly be passed to a human in case the algorithm can’t handle the request. Think about this in the context of a customer support chat. Some companies we spoke with actually insert manual pauses into their chatbots to make them sound more natural. Today, the first point of contact on many websites is an intelligent chatbot that can solicit and qualify leads. New Zealand startup Uneeq is taking chatbots a step further and creating digital humans.

Credit: Uneeq

The digital human seen above speaks to you, so it’s a bit different from a chatbot. The concept is sound, but we still found natural speech to be a bit awkward. What companies like Uneeq are working on is equipping these digital humans with some sentiment analysis.

What we’re finding is that chatbot applications are better suited to industry verticals. That’s because many companies within a particular industry use lots of language that’s specific to what they do. For example, Kasisto has developed a conversational AI platform for banking which can resolve queries 82% of the time without needing a human to be involved. Other industry-specific use cases are emerging around “AI assistants.”

The AI Assistant

In every industry, people sell things. Companies are always looking to optimize the sales process as this translates into increased revenues. We’ve looked at quite a few startups applying artificial intelligence to various customer relationship management scenarios, most of which involved some form of speech recognition. Berlin startup i2x uses NLP to analyze calls in real-time or in archive form (many call centers have mountains of call recordings that sit there gathering dust). Two functions where this can be particularly useful are customer service and sales.

Not everyone can afford a personal assistant, so now a handful of startups are working on AI personal assistants that use NLP to:

  • Coordinate appointments
  • Take meeting notes
  • Manage email
  • Organize files
  • Produce reports

There are also industry-specific AI assistants being developed. At least nine startups are working on virtual assistants for the healthcare industry. In the legal industry, a number of startups are working on using NLP to scour through contracts. The smart voice assistant platform from SoundHound still stops well short of general artificial intelligence, when machines will be able to understand everything—even women—a moment in time referred to as the Singularity.

Machine Learning and Language Translation

A Translator for All Languages – Does One Exist? We asked that question back in 2017 when Doppler Labs was working on their babelfish device. That same year, Doppler Labs couldn’t raise funding and went kaput. So, how are we coming along with a translation tool of any kind that can translate foreign languages, either written or spoken? At least seven startups are using machine learning to improve language translations.

Investing in Chatbots and NLP

It’s important to understand that chatbots are implemented by hundreds of companies, some of which are startups that specialize in chatbot applications for particular industries, like chatbots for lawyers. Others are targeting certain languages. For example, almost half the AI startups in Indonesia we came across were developing chatbots for Bahasa Indonesia. Another group of startups is focused on making chatbots easy for anyone to implement and use. Even the name is changing. We now refer to chatbots as “conversational AI for enterprises.”

Institutional investors are spoiled for choice when it comes to investing in NLP startups across the globe. For retail investors, there aren’t many pure-play stocks around. One company to watch is Nuance. They’ve come a long way since we reviewed their product – the world’s best transcription software – back in 2016. Today, Nuance has reinvented themselves as “an AI technology company,” targeting the healthcare industry with solutions such as Computer-Assisted Physician Documentation (CAPD), a $2 billion opportunity. The basic idea is that NLP algorithms will transcribe your doctor visit and automatically populate your electronic health record (EHR). One of our favorite dividend growth investing stocks, 3M, is also dabbling in this space. So is Augmedix which is slated to trade on the over-the-counter market.

Another pure-play stock for the NLP theme is China’s CooTek (CTK), a company that was founded in 2008 by three former Microsoft employees. CooTek has developed a smartphone app called TouchPal which autocorrects typos and predicts your next word with higher accuracy than that drivel that Android’s type predictor spits out. TouchPal is actively used by over 132 million users every day and supports over 110 languages in more than 240 countries and regions worldwide.

Pure-play disruptive tech stocks are not only hard to find, but investing in them is risky business. That's why we created “The Nanalyze Disruptive Tech Portfolio Report,” which lists 20 disruptive tech stocks we love so much we’ve invested in them ourselves. Find out which tech stocks we love, like, and avoid in this special report, now available for all Nanalyze Premium annual subscribers.