Big Data: Big Brother or Bigger Than Any of Us?

January 3, 2017. 5 min read

You know that brother-in-law you have—the crazy one who talks about alien abductions and government conspiracies? Who thinks the CIA is following him? Well, he might be on to something, and not just on something.

The year just ended will be remembered for many things (don’t even get us started on Princess Leia), but the blowup over big data and privacy will be chief among them. We covered this topic throughout 2016, highlighting many of the heavyweights in this field like Palantir Technologies, now ranked as the third most valuable private start-up in the United States by Business Insider, with a valuation of $20.53 billion. Palantir is the leader of the big data herd but is far from alone in this billion-dollar sector, as companies are needed to collect, store and analyze the huge amounts of data being generated online every day.


Much of that big data is so-called unstructured data, and much of it is generated from various social media platforms, from Yelp and TripAdvisor reviews to Facebook posts and Twitter tweets. How much data? In one minute, according to consulting and technology solutions company Excelacom, internet users post about 350,000 tweets (50,000 from President-elect Donald Trump alone), share more than 500,000 images, log nearly a million swipes on Tinder, and log on to Facebook about 700,000 times. Companies are coming up with all sorts of creative ways to capture that data exhaust—think carbon capture without the credits—and analyze it for fun and profit. Much of the profit has come from analyzing our online behavior and pushing brands and products at us, often based on location data from an IP address or the GPS most of us carry in our back pocket.

It turns out that marketing may be one of the more benign uses of big data.

We learned in October of last year that a startup called Geofeedia gobbles up big data from social media, including Facebook, Twitter, Instagram, and others, and then feeds very specific information about individuals to law enforcement officials. This isn’t just about catching the bad guys, according to the ACLU, which cried foul over the practice. By analyzing the information and location data embedded in social media posts, Geofeedia helped the government track non-criminals like activists and protesters.

Big data backlash

Headquartered in Chicago, Geofeedia has raised $23.79 million since it was founded in 2011, with a big infusion of $17 million in private equity in February 2016 from Silversmith Capital Partners. Its customer base includes—included?—Fortune 500 companies like McDonald’s and Dell, news outlets like the Associated Press, BBC and CNN, and major law enforcement agencies like the Los Angeles County Sheriff’s Department.

It’s the customers in the last category that got Geofeedia’s use of social media big data in big trouble.

From the ACLU:

Using Geofeedia’s analytics and search capabilities and following the recommendations in their marketing materials, law enforcement in places like Oakland, Denver, and Seattle could easily target neighborhoods where people of color live, monitor hashtags used by activists and allies, or target activist groups as ‘overt threats.’ We know for a fact that in Oakland and Baltimore, law enforcement has used Geofeedia to monitor protests.

Facebook, Twitter and Instagram have all since blocked Geofeedia’s access to their data, which appears to have crippled the company. The Chicago Tribune reported in November that Geofeedia laid off about half of its approximately 60-person staff. In the same news story, company CEO and co-founder Phil Harris was quoted as saying the company would be changing direction but didn’t offer any specifics. We assume that, for starters, the new direction will be away from providing law enforcement agencies like the Denver police with details about participants in the city’s annual 4-20 Mary Jane rallies.

One thing to keep in mind in the big data privacy debate is that the technology itself isn’t the problem; how it’s used is. For example, a security company reportedly used Geofeedia data to locate a missing American student in Brazil and to alert clients to possible terrorist threats in Belgium. That’s from a story posted on Security Management, a trade publication for security professionals.

Social media listening tools

The fallout from the scandal continued into December. That’s when Twitter announced it would cut off all geospatial intelligence data being provided to law enforcement through Dataminr, an analytics firm partially owned by Twitter.

The Verge reported that the move has hurt Dataminr, which canceled a CIA contract earlier in 2016 over similar concerns. Despite being hobbled by Twitter’s anti-surveillance policy, Dataminr still provides a limited version of its product, “which provides tailored breaking news alerts based on public Tweets, to those supporting the mission of first response.”

Like Geofeedia, Dataminr appears to be one of a new breed of social media listening tools. It has raised more than $180 million in funding since its founding in 2009. Its investors include such names as Venrock, Institutional Venture Partners, Fidelity, Wellington and Credit Suisse. And like Geofeedia, Dataminr’s customer base includes corporate security companies that need immediate intel to protect employees and assets around the world. More interestingly for investors, Dataminr analyzes and processes big data for hedge funds and investment banks “to take action on early market moving information and gain perspective and context from differentiated information.”

Details are in the big data

One thing should be clear: Despite our uneasiness with big data—not to mention the mad ravings of your brother-in-law—social media is here to stay. Backlashes over privacy only last as long as the next news cycle or the next Trump tweet. And that means more companies will continue to come online to exploit that data in interesting, profitable and possibly dangerous ways.

Boston-based Crimson Hexagon, for example, announced just last month that its social data repository hit one trillion posts. The company (founded in 2007 and backed by about $30 million in investments, including $20 million in Series C funding in March 2016) decided to have a little fun with the milestone. It used its data repository to showcase key cultural and business trends over the past six years.

Some of the cool factoids culled from a trillion particles of data exhaust include:

  • July 2014 is when Airbnb overtook both Marriott and Hilton—America’s two largest hotel chains—in terms of conversation volume on social media.
  • Mac users are more interested in creative pursuits like design and web development, while PC users prefer gaming and news.
  • People are increasingly happy about fats in their food, as long as they’re the good kind. Positive feelings about good fats have nearly doubled since 2010 compared to other, more traditionally popular nutritional components like vitamins and natural ingredients.

The truth is out there—if you have enough big data to find it.

