10 Data Science and Predictive Analytics Startups

Just last week, a data company called Alteryx (NYSE:AYX) went public, raising $120 million and giving them a market cap of around $828 million. Speculation is that now the data theme is heating up and in turn people are starting to blab on about “data driven cultures” and “trusting your data, not your gut”. The point is, this stuff has been around for a while yet some of the applications we’re seeing when “big data” and artificial intelligence come together are frankly mind blowing.

We’ve written extensively and in depth about big data but we never really delved into the work that goes into using this data. Clearly, we can’t just use the traditional tools from the data warehouses of old, meaning that we need a new approach. A lot of this work falls into the lap of guys we call “data scientists” or specialists in the field of “data science”. A data scientist in a large organization follows a certain work process. He defines the project (or problem), explores the data at hand, prepares the data, creates the (analytical) model, deploys (or uses) the model, and once the model is adopted, manages it.

Normally, data scientists handle more or less similar sets of data coming from the same industry, the same products, the same customers, etc., which makes reusing the same mathematical model fairly safe. But businesses are rarely dealing with the same data, which means they want to experiment or get a glimpse of different outcomes from different scenarios. This need translates to different mathematical models. This also means more data scientists, and data scientists don’t come cheap. We need a technology that can make sense of data regardless of industry, source, format, or size in a fraction of the time a data scientist would take.

Here are ten players trying to solve these problems.

Opera Solutions

Founded in 2004, New Jersey startup Opera Solutions has taken in $122.2 million so far, with $70 million of that coming in the form of debt financing late last year. Opera Solutions approaches predictive analytics by identifying “signals” from a client’s big data and they’ve managed to identify 1.5 trillion signals so far. Apparently, you need about 3,000 signals to get “insights” from your big data set. Their platform, “Signal Hub”, can generate those signals and uses artificial intelligence to make predictions. Back in 2009, they beat out 41,000 teams in the memorable “Netflix Challenge” to come in second place with an algorithm that optimized “recommended videos”. Their solution is a “layer” that fits between your raw data and the users of the data. Take two minutes and be wowed by this video where they show their technology applied to an airline.

Did you see at the beginning just how much big data they’re looking at for that airline (which we’re guessing is United)?


DataRobot

Founded in 2012, Boston, Massachusetts startup DataRobot has taken in $111.42 million in 3 rounds of funding so far with the latest round of $54 million closing just this month. Their cloud-based solution requires no data science expertise and is referred to as “world class machine learning as a service”. It deploys quickly and in just a few days, you’ll be solving real-world problems that would otherwise require staffing data scientists. Practically anyone can use it, even MBAs. The platform is industry agnostic and is currently being used in banking, healthcare, fintech, and insurance with use cases like these:

Data Science and Predictive Analytics in Insurance

CTOs take note here – this thing could make you look really good in that next board meeting where you can talk about how you already “failed fast” and how you’re actually “using AI” now which could also make for a nice “thank the troops” speech at your next shareholders meeting.

Update 08/20/2019: For more information on DataRobot, please see our article titled “Automated Machine Learning from DataRobot.”

Update 09/17/2019: DataRobot has raised $206 million in Series E funding to continue building out its product line while looking for acquisition opportunities where it makes sense. This brings the company’s total funding to $430.6 million to date. 

CrowdFlower Inc.

Founded in 2007, San Francisco startup CrowdFlower has raised $38 million in 5 rounds of funding, a big portion of which came from Microsoft Ventures. The CrowdFlower AI platform puts together the best features of AI, machine learning, and human intervention to make sense of big data with predictive models powered by Microsoft’s own Azure Machine Learning cloud service (of course). They call their methods “human in a loop” and essentially they start by gathering all your data and then doing cool isht with it like automating people’s jobs. The cost for a “starter package” is around $1,500 a month so buying the thing is actually cheaper than hiring your typical recipe-driven EMC employee who refuses to think outside the box. If you’re “John in Mumbai”, then this thing should have you shaking in your faux leather boots.


RapidMiner

Founded in 2007, Boston, Massachusetts startup RapidMiner is the industry’s “#1 open source predictive analytics platform” and has taken in $36 million so far to make this happen. Given that it is open-source software, RapidMiner is free to download and use until you decide to start scaling it, at which point the pricing kicks in. The open-source community of over 200,000 users ensures that you’ll always have a data scientist on hand who can answer your questions. Their commercial clients include a slew of big names like Samsung, BMW, Salesforce, Cisco and many others. Remember, data mining (looking for patterns in data) has actually been around for decades, except now we’re using machine learning to identify patterns at a rate that is exponentially faster than humans ever could by using traditional data mining algorithms. RapidMiner has put together a 264-page ebook on the topic, Data Mining for the Masses, if you’re interested in learning more.

DataScience, Inc.

Founded in 2014, Culver City, California startup DataScience has taken in $28 million in 3 rounds of funding so far to develop their platform. DataScience, as the name suggests, does the whole stretch of the data science process and offers expertise through its DataScience Cloud platform.  The idea is that they work with you to ensure that your initiative is successful and that you “fail fast” while trying to “innovate or die”. One of the areas they’re working in is the “recommendation engine” which they say drives $1 billion a year in sales for Netflix and gives Amazon a 20-35% lift in sales every year. Who knew?

Civis Analytics

Founded in 2013, Chicago, Illinois startup Civis Analytics is a spin-off of the work performed for the Obama re-election campaign. The Company has raised $22 million so far from prominent investors like Verizon Ventures and Eric Schmidt of Google, who was the technology adviser for the Obama re-election campaign. Their cloud-based data science platform is built by data scientists, for data scientists, and is being used by companies like Airbnb and the Discovery Channel. In addition to using machine learning to drive insights like the rest, Civis also talks about how “your data will pair perfectly with their enhanced national database of 220 million Americans”. Maybe they should get together with DemystData and really “enhance” that database of Americans. This should raise some really interesting questions about “big data privacy”.


Dataiku

Founded in 2013, French startup Dataiku has taken in $17.7 million in funding to create their Data Science Studio (DSS) platform that allows loading of data from all your data storage platforms and cleansing the data visually, turning the data into models, and turning the models into continuously running applications. The goal is to shorten the load-prepare-train-test cycles of the platform’s machine learning functionality. Does anyone remember when we used to call this stuff ETL? Those were much simpler times.

Update 08/24/2020: Dataiku has raised $100 million in Series D funding to fuel their continued growth. This brings the company’s total funding to $246.8 million to date.    

Domino Data Lab

Founded in 2013, San Francisco startup Domino has taken in $13.6 million in funding so far to develop their data science platform which is structured into layers (security, user, and infrastructure). Domino generates analytic models in the form of application program interfaces (APIs) managed at the “user layer” which can be tested in parallel through “experiments”, deployed, tracked, filtered, and compared. It’s currently being used by companies like Allstate, Zurich, and Monsanto.

Arimo, Inc.

Founded in 2012, Silicon Valley startup Arimo took in a single round of $13 million in funding led by Andreessen Horowitz. The Company cites the strength of their leadership team as a competitive advantage and their Behavioral AI™ solution is currently deployed for yield optimization, cross-selling & upselling, product recommendations, predictive maintenance, demand forecasting, fraud, and other applications as well. If you want to see what their platform is capable of, here’s a walkthrough of how they used it to identify a path to automate $200 billion in Medicare claims adjustments. That task took a whole 10 hours. 


Nutonian

Founded in 2011, Boston, Massachusetts startup Nutonian has taken in a single $4 million funding round to develop a machine learning engine that goes by the tradename Machine Intelligence™. This engine works by doing cross-validation, meaning it splits the data set into two parts: one training the analytic model, the other validating its accuracy. They claim to extract insights at a rate of “billions per second” and presently serve clients like Alcoa, Amazon, NASA, and BP among others.
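For readers who want to see what that two-part split looks like in practice, here is a minimal Python sketch. It is not Nutonian's engine or code. The function names (holdout_split, fit_mean_model, mean_absolute_error), the made-up data, and the toy "always predict the average" model are all our own assumptions for illustration. It shows the simplest version of the idea, a single holdout split; fuller cross-validation schemes repeat the split several times and average the scores.

```python
import random


def holdout_split(rows, validation_fraction=0.5, seed=0):
    """Shuffle a copy of the rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean_model(training):
    """Toy stand-in for a real model: always predict the mean training target."""
    targets = [target for _, target in training]
    return sum(targets) / len(targets)


def mean_absolute_error(prediction, validation):
    """Average absolute gap between the constant prediction and held-out targets."""
    return sum(abs(target - prediction) for _, target in validation) / len(validation)


# Made-up (feature, target) pairs, for illustration only.
data = [(x, 2 * x + 1) for x in range(20)]

training, validation = holdout_split(data, validation_fraction=0.5, seed=42)
model = fit_mean_model(training)                 # fitted on the first part only
error = mean_absolute_error(model, validation)   # scored on rows the model never saw

print(len(training), len(validation), round(error, 2))
```

The point of the exercise is that the error is measured only on the held-out half, so a model that merely memorized its training rows gets no credit for it.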


One company that is worth mentioning (but didn’t make our list because they were acquired by Google) is Kaggle, a platform that hosted predictive modeling competitions. As of last year, Kaggle had 536,000 users, each of whom has a profile which says something about their skills. That’s a whole lot of talent that Google now has access to without having to deal with recruiters.

