Big Data Privacy and Why You Should be Concerned

Big data privacy is becoming increasingly important as the methods we use to accumulate big data grow more sophisticated. “Big data” has become a catch-all term for the proliferation of massive, sophisticated data sets that we can now analyze using new tools like deep learning. By analyzing big data, we gain insights that create efficiencies and help us become more effective at our jobs across all industries. The problem is that we’re now getting good at analyzing every single piece of publicly available data out there, and some of that information is going to raise big data privacy concerns. We’ve also talked before about the fact that big data companies are watching your every move now.

Big data comes in two forms: structured and unstructured. Big data that comes from the Industrial Internet, for example, is highly structured and fully described. The comments you leave online and the Facebook posts half the world generates every day are examples of unstructured big data. Big data privacy is nonexistent for all these unstructured data sets.

Big Data Privacy

Analyzing these types of unstructured data can uncover a wealth of information that would probably concern most people. It’s not a coincidence that when you set your Facebook status to “I’m Engaged” you get bombarded with marriage-related ads. That’s a very simple example that doesn’t even require artificial intelligence (AI). We decided to single out one cognitive computing / big data company that recently came out of stealth mode and take a look at the services they are offering in the context of big data privacy. The company is called Voyager Labs and they have developed “the world’s next-gen cognitive computing platform for understanding human behavior”.

Founded in 2012, Israeli startup Voyager has taken in $100 million to develop two tools that are out there scouring unstructured data and making meaning of it. Remember how we described building our ultimate AI-powered ETF by feeding it tons of financial big data? That’s kind of what’s happening here. Essentially, anywhere you can go on the internet and read content without logging in, a computer can do the exact same thing. We used to call that “screen scraping” and it’s what developers did back in the day when they wanted to use a publicly available data source without licensing it. Screen scraping isn’t new, but using artificial intelligence to make sense of “unstructured data” is what Voyager is doing that’s new and exciting. You know all those public Facebook posts, forum comments, trip reviews, and all the other things you’ve ever left online? Those are all examples of the unstructured data that can be used to make decisions. Let’s take a closer look at the two tools developed by Voyager Labs.
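Before we get to those tools, here’s a rough idea of what old-school “screen scraping” looks like in practice. This is only a minimal sketch using Python’s standard library, and it has nothing to do with how Voyager actually collects data. The sample page, the `TextExtractor` class, and the `scrape_text` function are all made up for illustration, and the page is a hard-coded string rather than something fetched from a real site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text on a page, skipping script and style blocks."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return the human-readable text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical public page: invented markup standing in for a trip review.
SAMPLE_PAGE = """
<html><head><style>p { color: red; }</style></head>
<body>
  <h1>Trip review</h1>
  <p>Great hotel, but I just got laid off so this was my last vacation.</p>
  <script>trackVisitor();</script>
</body></html>
"""

if __name__ == "__main__":
    for line in scrape_text(SAMPLE_PAGE):
        print(line)
```

Pulling the text out is the easy part. Everything a human can read without logging in comes out the other end as plain text, and deciding what that text says about you is where the AI comes in.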

Voyager Analytics

Their first product offering is called Voyager Analytics and below you can see some common use cases for it:

[Image: Voyager Analytics use cases]
Source: Voyager Labs Website

If we think through what types of applications a cognitive computing platform might be used for, the above use cases seem to make sense. Fraud, for example, can be spotted through irregular purchases that only AI would be smart enough to identify through very obscure data relationships. Organized retail crime could be uncovered by spotting data anomalies that point to certain kinds of “leakage” like employee theft. Criminal investigations are also a no-brainer for this type of tech, but what about private investigations? Would you call background checks performed by HR a form of private investigation? Probably. Now, every single bit of data you’ve put out there that isn’t protected by a password is subject to being evaluated by machine learning algorithms that will ascribe meaning and significance to it, and that could be used in your next hiring decision. If you weren’t already aware, this sort of screening is already commonplace these days, except now we can expect it to become much more efficient and far-reaching.

Voyager Scorpio

What we found even more interesting is the next tool, which is used to evaluate loan applicants. Anyone who has applied for a loan knows that it’s quite easy to fudge certain aspects of your life to appear more qualified for a loan than you actually are. You can continue to say whatever you like, but a lender that uses the Voyager Scorpio cognitive computing platform is going to have AI algorithms assessing your suitability for a loan in real time as seen below:

[Image: Voyager Scorpio assessing a loan applicant]
Source: Voyager Labs Website

It’s going to be really interesting to see how machines decide who gets a loan. Do you think that the income of the county you live in might be a good predictor of your likelihood of paying back a loan? That’s probably one of 50 different variables that the system will use to predict a failing loan. If your employer announced layoffs the day before you walked into the loan office, that’s going to count against you.

Anyone whose name has appeared in the press is probably someone of importance (and likely to pay back their loan), but only if it appeared in the business section. Those whose names appear in the criminal section or civil lawsuit section are going to have their applications thrown straight into the circular filing cabinet. It’s only going to be a matter of time before someone accuses the company of discrimination, at which point the company will just blame the AI algorithms and get on with business. No lender will mention that they use this type of technology unless criticisms like this are raised. If Voyager Scorpio significantly reduces nonperforming loans, it’s going to be used whether people like it or not.
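To make the idea of “predictive variables” a bit more concrete, here is a toy sketch of how signals like the ones described above could be combined into a single risk estimate. It is emphatically not how Voyager Scorpio works, which hasn’t been disclosed. The signal names, the weights, and the `default_risk` function are all invented for illustration, and a simple logistic formula stands in for whatever the real system does.

```python
import math

# Hypothetical weights, invented for illustration only. Positive values push
# the estimated default risk up, negative values push it down.
WEIGHTS = {
    "county_income_below_median": 0.8,
    "employer_announced_layoffs": 1.2,
    "named_in_business_press": -0.9,
    "named_in_lawsuit_coverage": 1.5,
}
BIAS = -2.0  # Hypothetical baseline: low risk when no signals are present.


def default_risk(signals):
    """Return an estimated probability of default between 0 and 1.

    `signals` maps a signal name to True/False. Missing signals count as False.
    """
    score = BIAS + sum(
        weight for name, weight in WEIGHTS.items() if signals.get(name, False)
    )
    # The logistic function squashes the raw score into a probability.
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    applicant = {"employer_announced_layoffs": True}
    print(f"Baseline risk:      {default_risk({}):.2f}")
    print(f"After layoffs news: {default_risk(applicant):.2f}")
```

A real system would learn its weights from historical loan data rather than have someone type them in, which is exactly why a discrimination complaint would be so easy to deflect onto “the algorithm”.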

Voyager Labs News

Voyager came out of stealth just a few days ago so they aren’t saying much about who they are working with or what applications they are focusing on. The two diagrams seen above are probably the most informative content on their website. However, poking around in the “Careers” section of their website shows that they are focused on selling into EMEA and need someone with a European passport who can “travel a lot” to start knocking on doors. Mention is also made of a “cloud-based RESTful service using the company’s infrastructure” which implies that they are not outsourcing their cloud infrastructure to any of the big providers. That’s most likely for security reasons, which is probably why they’re also looking to hire a security architect.

Big Data Privacy and The Government

Stanford Law Review published a paper on big data privacy back in 2012 called “Privacy in the Age of Big Data” which warned that the government needed to step in and address a potential problem. In May 2014, the White House published an 85-page report titled “Big Data: Seizing Opportunities, Preserving Values”. In February 2015, the White House released an interim progress report on its big data privacy initiative. The report proposed six key recommendations.


While a great deal of information you put out into the wide world of social media and the internet probably can’t be traced to you, a lot of it can. Voyager isn’t the only company looking at everything you say online. Hundreds of AI startups are probably moving in this direction with each specializing in a particular area or application like “loan risk modeling”.


People are going to become a lot more shy about expressing their opinions online when they know that everything they say will soon be scrutinized with a fine-tooth comb. You should be equally concerned about what other people are saying about you. We highlighted some social media listening tools that are used to monitor what is said about you or your company. Tomorrow’s aspiring politicians are most certainly using social media today, and everything out there is now fair game forever. Big data privacy does not exist for anyone. You’ve been warned.

