Data Entry Automation With Machine Learning

In our recent article on Indonesia’s Big “Big Data” Problem, we looked at how technologies like blockchain and artificial intelligence (AI) are being used to uncover new sets of data that corporations can use to better understand the world’s fourth biggest country by population. A consistent theme throughout the time we spent talking to local tech firms was that great potential was simply waiting to be unlocked, and that probably holds true at a smaller scale for many developed market corporations.

For example, think about something like “data entry.” The mere fact that you require a human to take data from <FORM A> and then manually input that data into <FORM B> means that your archaic business processes need changing. This is the equivalent of those backwards companies that ask you to fax them a form in order to request a change of address for your account. How common a problem is it? Too common.

According to the Bureau of Labor Statistics, there are 180,100 people employed across the United States as “Data Entry Keyers” with an average salary of $32,500 per year. That equates to roughly $5.9 billion a year being spent on something that adds very little value. We haven’t even considered that humans will fat-finger data which will later need to be investigated and rectified. And as it turns out, our back-of-the-napkin estimate is just a fraction of what’s actually being spent. “About $57 billion is spent on data entry per year and the volume of data entry is increasing year over year,” according to the firm we’re going to talk about next, HyperScience.
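The napkin math is easy to reproduce. Here is a minimal Python sketch that uses only the two BLS figures and the HyperScience estimate quoted above; the variable and function names are our own.

```python
# Figures quoted in the article; all names here are illustrative.
DATA_ENTRY_KEYERS = 180_100        # BLS headcount for "Data Entry Keyers"
AVERAGE_SALARY_USD = 32_500        # BLS average annual salary
HYPERSCIENCE_ESTIMATE_USD = 57e9   # annual spend quoted by HyperScience


def annual_salary_spend(headcount: int, salary: int) -> int:
    """Total yearly salary bill: headcount times average salary."""
    return headcount * salary


napkin_total = annual_salary_spend(DATA_ENTRY_KEYERS, AVERAGE_SALARY_USD)
ratio = HYPERSCIENCE_ESTIMATE_USD / napkin_total

print(f"Salary-only estimate: ${napkin_total / 1e9:.2f} billion")
print(f"HyperScience's figure is about {ratio:.1f}x larger")
```

The salary-only number leaves out overhead, error correction, and anyone doing data entry under a different job title, which may explain part of the gap between the two figures.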


Founded in 2014, New Yawk startup HyperScience has taken in nearly $49 million in funding so far with a large chunk of that funding coming in the form of a $30 million Series B round that closed just days ago. The company was founded by a couple of lads from Bulgaria (Vladimir Tzankov and Krasimir Marinov) who previously worked at SoundCloud in Sofia, and that appears to be where they met the company’s CEO, whose last gig was as a Director for SoundCloud. (For all our lovely American readers, Bulgaria is not a city in Germany. It’s a lovely little country, and the only one in Europe that hasn’t changed its name since it was first established. They also eat ketchup and mayo on pizza, but nobody’s perfect.)

Update 10/19/2020: HyperScience has raised $80 million in Series D funding to accelerate product development and international expansion. This brings the company’s total funding to $188.9 million to date. 

Data Entry Automation with Machine Learning

We first came across this startup in our article on Artificial Intelligence in Business Process Automation where we talked about how the startup uses “advanced computer vision techniques to process documents, identify content types, and extract the content.” It’s hard to believe that we’re able to create synthetic life now, but at the same time our need for manual data entry is expanding, so we wanted to understand a bit more about why that is. HyperScience says that this is due to several kinds of documents that large organizations receive in significant volume:

  1. Documents that are filled out either by hand or typed (such as mortgage applications or insurance claim forms)
  2. Documents that are generated on a computer and either printed on paper or emailed as fully digital PDFs (such as invoices or pay stubs)

In both of these cases, data is not being entered into digital forms, which is a problem in itself that needs to be solved. Leaving that aside for a minute, the problem we’re trying to solve here is that neither type of form is machine-readable because, up until now, machines have lacked the ability to understand context. The below example illustrates this problem quite simply:

Example of HyperScience context recognition
Source: HyperScience

While it’s easy to understand how handwriting can be difficult to decipher, there are also problems with processing paper forms even if they are typed. If your company is being sent invoices by 500 different suppliers, the combinations and permutations of field names pose a challenge for a machine that’s trying to recognize them, while a human would do that without thinking. Take a look at the below example, where three invoices contain the exact same type of data but name every field differently.

Invoice processing problems
Source: HyperScience
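To make the field-naming problem concrete, here is a minimal Python sketch of the naive approach: a hand-maintained alias table that maps supplier labels to one canonical name. This is not HyperScience's method. The alias table, field names, and sample invoices are all hypothetical.

```python
import re

# Hypothetical alias table: canonical field -> labels suppliers might use.
FIELD_ALIASES = {
    "invoice_number": {"invoice number", "invoice no", "inv #", "invoice id"},
    "invoice_date": {"invoice date", "date of invoice", "date issued"},
    "total_due": {"total due", "amount due", "balance due", "total"},
}


def _normalize_label(label: str) -> str:
    """Lowercase, replace punctuation other than '#', collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9# ]+", " ", label.lower())
    return " ".join(cleaned.split())


# Invert the table once so each lookup is a single dict access.
_LOOKUP = {
    _normalize_label(alias): canonical
    for canonical, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def canonicalize(extracted):
    """Split extracted fields into (recognized, unrecognized) by label."""
    recognized, unrecognized = {}, {}
    for label, value in extracted.items():
        canonical = _LOOKUP.get(_normalize_label(label))
        if canonical is None:
            unrecognized[label] = value
        else:
            recognized[canonical] = value
    return recognized, unrecognized


# Two made-up suppliers labelling the same data differently.
supplier_a = {"Invoice No.": "A-1001", "Date Issued": "2019-01-21",
              "Amount Due": "450.00"}
supplier_b = {"INV #": "77812", "Invoice Date:": "2019-01-22",
              "Balance Due": "1,200.00", "PO Ref": "X9"}

print(canonicalize(supplier_a))
print(canonicalize(supplier_b))
```

Every new supplier label that isn't already in the table lands in the unrecognized pile, so someone has to keep extending the table by hand. That maintenance burden is the reason a rules-only approach struggles with 500 suppliers.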

Large companies can spend tens of millions a year on data entry, and HyperScience claims to offer an 80% reduction in cost, a 5x increase in speed, and a 67% reduction in error rate. The company previously had a number of product offerings but has consolidated them into a single product called Hyperscience. That’s according to a recent article by TechCrunch, which also states that the product is sold on a “per-document” charge. The process works like this:

  • Mail rooms receive high volumes of incoming forms, which are sorted and scanned to create digital images of the forms
  • HyperScience automatically identifies and understands each type of incoming form, and sorts them accordingly before processing
  • HyperScience reduces the keying workload and increases throughput. Teams review a subset of machine-entered fields instead of all fields (human-in-the-loop)
  • HyperScience understands that the same fields can be called different things or submitted in different formats, and conforms the data to your database conventions
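The human-in-the-loop step above can be sketched as a simple confidence cut-off: fields the model is sure about go straight through, and the rest are queued for a person. This is a minimal Python illustration under our own assumptions, not HyperScience's implementation. The `ExtractedField` type, the confidence scores, and the 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical model score between 0 and 1


REVIEW_THRESHOLD = 0.90  # assumed cut-off, not a HyperScience setting


def route(fields, threshold=REVIEW_THRESHOLD):
    """Return (auto_accepted, needs_review) lists, preserving input order."""
    auto, review = [], []
    for field in fields:
        if field.confidence >= threshold:
            auto.append(field)
        else:
            review.append(field)
    return auto, review


# Made-up extraction results for one mortgage application.
fields = [
    ExtractedField("applicant_name", "Jane Doe", 0.99),
    ExtractedField("loan_amount", "250000", 0.97),
    ExtractedField("date_of_birth", "1984-07-1?", 0.62),
    ExtractedField("ssn_last4", "4821", 0.88),
]

auto, review = route(fields)
print(f"Auto-accepted: {[f.name for f in auto]}")
print(f"Sent to a human: {[f.name for f in review]}")
print(f"Share of fields keyed by machine: {len(auto) / len(fields):.0%}")
```

Raising the threshold sends more fields to reviewers and lowers the risk of a bad value slipping through, while lowering it does the opposite. Where to set it depends on how costly an error is for a given field.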

Let’s look at some real-world success stories.

HyperScience Success Stories

Whenever we talk to AI companies about their product offerings, we’re always on the lookout for accomplishments that seem like they would be impossible without a little bit of help from some artificial intelligence algorithms. That certainly seems like the case with our first example, which involves a financial services firm with 10,000 employees and over $1 trillion in assets under management. Someone from compliance demanded that they “unlock data from over 1 million unlabeled files.” Some files had relevant information, others didn’t. Many forms were handwritten, others were faxed. The purpose of the exercise was to avoid “costly regulatory fines and reputational risk” (though we all know it was just some compliance person being a pain in the a33 and trying to justify their existence like they always do).

The firm’s existing Optical Character Recognition (OCR) teams couldn’t even attempt to tackle the problem, so the plan was to hire and train a team for the sole purpose of doing truckloads of manual data entry in a short period of time. In less than four weeks, the hungry machine learning algorithms were happily munching away on the data and 60 temporary workers were told to look elsewhere for work:

Data entry automation with machine learning – Source: HyperScience

This next data entry automation example is even better and involves an investment services firm with 30,000 employees and $10 billion in revenues. In one department, more than 200 people performed data keying, processing more than 60 million pages of documents per year. The high error rate led to “significant re-keying efforts, slowing customer response times, skyrocketing costs, and disrupted customer experience.” They couldn’t hire people fast enough to keep up with increased document volumes (probably because they were recruiting a bunch of recipe-driven robots from some “emerging market center of excellence” who – to be fair – weren’t looking at data keying as their ideal career aspiration). Because the costs were so high and the process so manual, the over-worked data-keyers were only instructed to extract the bare minimum data from documents, leading to an incomplete data picture. Again, the machine learning algorithms were unleashed and – as they say at BuzzFeed – you won’t believe what happens next:

Data entry automation with machine learning – Source: HyperScience


Just today, Gartner released the results of a survey that says 37% of organizations have implemented AI in some way, a number that was just 13% four years ago. It seems like we just started discussing the usefulness of chatbots and now 52% of telco organizations deploy them to service customers. In the medical profession, 38% of healthcare providers now rely on computer-assisted diagnostics. If you are still using paper forms, you’re in the dark ages. At least consider adopting some data entry automation and freeing the poor souls who have to suffer through transcribing paper forms into a digital system, a job which is about as rewarding as it sounds. Make yourself feel good when you hand out those pink slips by paying for them to get certified in artificial intelligence and then hire them back later. Everyone wins.

