Crowdsourced Big Data is Big Business for Appen

The term “metadata” refers to data that describes data, and it comes in quite handy when you’re training machine learning algorithms. Take for example a picture of a table with a plate of eggs, bacon, and toast along with a couple of hands holding utensils. We can easily label the individual objects – knife, fork, left hand, right hand, glass of orange juice, etc. – but we can also deduce more information from the more subtle details. The bright red nail polish, the wedding ring on one hand, smooth youthful skin, these signs could point to a younger married woman sitting down for breakfast.

A human might be able to arrive at this conclusion quite easily given the ability to connect the dots using context, but an AI algorithm will have a tougher time. However, if we label all those “clues,” then the machine learning algorithms can begin to learn how to interpret imagery in the same way humans can. One company that’s been printing cash lately by providing crowdsourced big data labeling services is a publicly traded Australian firm called Appen (APX). (Numbers below are in USD unless otherwise noted.)
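To make the labeling idea above concrete, here is a minimal Python sketch of what an annotation record for the breakfast photo might look like. The class names, field names, and identifier are all hypothetical and chosen for illustration; this is not Appen's actual data format, and which clue belongs to which hand is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectLabel:
    """One labeled object in an image (hypothetical schema)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class ImageAnnotation:
    """Metadata for one image: labeled objects plus human-inferred context."""
    image_id: str
    objects: list[ObjectLabel]
    inferred_context: list[str] = field(default_factory=list)

    def clues(self) -> list[str]:
        """Collect every attribute across all labeled objects, in order."""
        return [attr for obj in self.objects for attr in obj.attributes]


breakfast = ImageAnnotation(
    image_id="breakfast_001",  # made-up identifier
    objects=[
        ObjectLabel("knife"),
        ObjectLabel("fork"),
        ObjectLabel("glass of orange juice"),
        # Assumption: the article does not say which hand carries which clue.
        ObjectLabel("left hand", ["wedding ring", "smooth youthful skin"]),
        ObjectLabel("right hand", ["bright red nail polish"]),
    ],
    # The conclusion a human annotator might record from the clues.
    inferred_context=["younger married woman", "breakfast"],
)

print(breakfast.clues())
```

The object names are the easy labels; the attributes and inferred context are the "clues" a human annotator adds so that an algorithm has something to learn the connection from.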

About Appen

Appen’s history extends back to 1996, but it was only in 2013 that the company settled on the name Appen, and it completed a successful IPO in January of 2015. Since then, Appen has worked to become a “global leader in the development of high quality, human annotated datasets for machine learning and artificial intelligence.” We recently wrote about their remote workforce of over one million employees who help cleanse and label datasets that include speech, text, image, and video. The data then supports a variety of use cases as seen below.

Use cases for Appen's services
Source: Appen

In a recent investor deck, Appen cites an interesting metric from McKinsey – one quarter of AI applications require data updates on a weekly basis while one third require monthly data updates. Those frequent updates represent recurring revenues for Appen, a company that’s shown some phenomenal revenue growth as a result of strong demand for their services. That growth has been reflected in their share price, which has risen from around $0.40 per share in January of 2015 to around $17.60 per share today, a gain of about +4,300% that gives the company a present-day market cap of around $2 billion. In other words, an investment of $1,000 would be worth around $44,000 today. Of course, past performance is not an indicator of future performance, but revenue growth has been consistently strong over the past five years.
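The share price arithmetic can be checked with a few lines of Python. This is a minimal sketch using the approximate prices quoted above; the function names are our own. Note that a gain of +4,300% multiplies the starting stake by 44 rather than 43, since the original $1,000 is still part of the total.

```python
def percent_gain(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100.0


def final_value(investment: float, start_price: float, end_price: float) -> float:
    """Value today of an investment made at start_price."""
    return investment * end_price / start_price


# Approximate share prices from the article, in USD.
gain = percent_gain(0.40, 17.60)
value = final_value(1_000, 0.40, 17.60)

print(f"Gain: {gain:,.0f}%")              # Gain: 4,300%
print(f"$1,000 becomes: ${value:,.0f}")   # $1,000 becomes: $44,000
```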

Appen’s revenue growth in Australian dollars – Source: Appen

The reddish shades reflect their “content relevance” services and the blue shades represent their “language translation” services. Of these two business areas, content relevance generates the majority of the revenues at the moment – about 86%.

Content Relevance

The power of Appen’s service comes from the one million remote workers they can pull from to crowdsource projects based on each client’s unique needs. Appen can curate a crowd based on a specific skill set, including legal, medical or financial services experience. Once a virtual team has been assembled, they can then be directed to perform any number of tasks that help improve the functionality of machine learning algorithms. For example, Google provides certain “automatic suggestions” when you type in particular terms. These are often indicative of what other people search for. Here’s an example using Tiger Woods.

As you can see, it’s pretty easy to understand how something inappropriate might appear there like “why does Tiger Woods get to sleep with so many hot women.” Google might use Appen’s services to check these auto-suggest lists to make sure nobody gets offended ever. That’s one example of where humans might be able to make a judgment call that an algorithm won’t be able to make anytime soon.

There’s a whole menu of content relevance services that Appen offers as seen below.

When it comes to training machine learning algorithms, it’s not just about making sure the training data is clearly annotated, it’s also about making sure that the output – such as the auto-suggest example we gave earlier – is accurate. If it’s not, the algorithm can be trained to identify these exceptions and become further refined over time.

Language Translation

Appen recently merged their sales and customer service teams for both their “language translation” and “content relevance” segments in order to keep the customer at the center of their focus. They plan to keep reporting on both segments, though, which could imply future acquisitions in the area of language translation. While revenues from this segment may be a small piece of the pie, the company has extensive capabilities with more than 400,000 pre-screened language speakers who can support more than 180 languages. In 2016 alone, they transcribed more than 40 million audio files.

The way Appen uses their giant pool of language speakers extends well beyond basic translation services. For example, they offer a service called “localization” which means that if you develop an app in English, they’ll help you publish the same app in any one of the 180 languages they support. (It’s not as easy as simply translating your application text – that’s where Engrish comes from.) It’s quite interesting to look through their Language Resources Catalog which provides some pretty obscure datasets with prices to match. Take for example the below audio dataset in which 103 Italians utter things like digits, street names, and generic commands while sitting in a running car or traveling in a car at a particular speed.

An Italian automaker might find this dataset useful for training an algorithm which aims to provide hands-free voice-operated services to Italian drivers. Other datasets include people talking with various background noises, people conversing in meetings where oftentimes participants will talk over each other, messages left on voicemails, distinguishing surnames from words, and the list goes on. If you need a particular dataset in any language, they have the manpower to get it done in a short period of time. Imagine trying to do that work yourself.


Appen’s plans for growth this year include spending $6 million on building a global engineering team and expanding their operations in China. Eight out of the top-ten global technology companies use Appen’s services, but their success may be flying under the radar given that they’re traded on the Australian Securities Exchange (ASX). Most retail investors outside of Australia are probably unaware of their success story, which could imply that the stock is undervalued. However, we’re also at the peak of the AI hype cycle which means it could also be overvalued.

The best way to invest in these kinds of stocks is to enter into a position with conviction. That means you enter into a position for the long-term and ignore the stock price except to add to your position on dips. In order to establish your initial position, you dollar-cost average (DCA), which means you purchase a fraction of your overall position at fixed intervals over a set period of time. This obviously becomes easier for investors who have bigger portfolios, since each purchase incurs a fixed transaction cost. The larger your purchases, the less those transaction costs will impact your returns.
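Here is a rough Python sketch of that dollar-cost averaging math, showing why fixed transaction costs hurt smaller positions more. The share prices, the $10 fee per trade, and the budgets are made-up numbers for illustration only, not actual Appen prices or real brokerage fees.

```python
def dollar_cost_average(total_budget: float, prices: list[float], fee_per_trade: float):
    """Split total_budget evenly across one purchase per price.

    Returns (shares_bought, average_cost_per_share, total_fees).
    The fee is deducted from each purchase before buying shares.
    """
    per_purchase = total_budget / len(prices)
    if per_purchase <= fee_per_trade:
        raise ValueError("each purchase must be larger than the trading fee")

    shares = 0.0
    for price in prices:
        shares += (per_purchase - fee_per_trade) / price

    total_fees = fee_per_trade * len(prices)
    average_cost = total_budget / shares  # all-in cost, fees included
    return shares, average_cost, total_fees


# Hypothetical prices at four fixed intervals, and a hypothetical $10 fee.
prices = [16.0, 20.0, 16.0, 20.0]

for budget in (1_000, 10_000):
    shares, avg_cost, fees = dollar_cost_average(budget, prices, fee_per_trade=10.0)
    print(f"${budget:,}: {shares:.1f} shares at ${avg_cost:.2f} average, "
          f"fees are {fees / budget:.1%} of the position")
```

With the same four purchases, the fees take ten times as large a bite out of the $1,000 position as they do out of the $10,000 one, which is the point made above about larger purchases.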

We’re holding five AI stocks in our tech stock portfolio. Is Appen one of them? Find out in the “Nanalyze Disruptive Tech Portfolio Report,” which contains a complete list of disruptive tech stocks and ETFs we’re holding – now available for all Nanalyze Premium annual subscribers.
