Alternative Data for Finance and Investing

February 22, 2019. 7 min read

We’ve all heard some variation of the adage about how 98% of the world’s data was created in the past two years, and at the rate the Internet of Things (IoT) is developing, that should continue to hold true. A firm out there has now developed an IoT sensor that can actually power itself using WiFi signals, so it’s only a matter of time before everything out there has its own digital twin.

Media data is growing quickly as well, though one could argue the vast majority of what gets vomited out across the social media airwaves isn’t worth the ones and zeroes it’s written in. Still, all the world’s news now comes out first in digital form, and as it comes out it is consumed by people who want to make money off of it, something we recently referred to as “generating alpha using an information advantage.” In other words, a bad earthquake in Tokyo means a bad day on the Japanese stock market. Generally speaking.

In the world of finance, data has always been an extremely important part of the business, with traditional financial data usually lumped into broad categories like fundamental data (earnings, revenues, etc.) or market data (volume, price, etc.). Data provides an information advantage, but not if everyone has access to the same data or – more importantly – uses it in the same way. Historically, data has been used in pretty much the same way by everyone, which is even more boring than it sounds. Then came alternative data, something much more exciting. Satellites can now count cars in retail store parking lots, watch the levels of oil tanks at refineries, or count aluminum ingots. Data has become a whole lot cooler, and today we’re going to talk about one of the oldest players in the alternative data game – a firm called RavenPack.

A Leader in Alternative Data

Originally founded in Spain, RavenPack moved to what Americans call the greatest city in the world – New Yawk – where they took their $5 million in funding and proceeded to grow a sizable business selling alternative data to some of the biggest quant traders out there. The simplest way to describe what the company does is that they scour the Internet for news from any source you can think of. As soon as a new article appears, their system consumes it, makes sense of the contents, and makes the information available in a database in near real-time (incredibly, that whole process takes less than one-third of a second).

When we talk about how they “make sense” of data, it’s a whole lot more sophisticated than screen scraping, looking for a name, and then trying to predict positive or negative sentiment. They’re able to identify 7,000 different types of market events, all while differentiating between fact and opinion.

A financially tuned NLP engine
Source: RavenPack

While the platform covers more than 40,000 stocks, it also lets you track organizations, places, commodities, people, and products, which means the tool is applicable across all kinds of asset classes. Most importantly, they’re able to provide historical data (critical for backtesting), since they’ve been capturing this stuff all the way back to 2000. That helps explain why over two-thirds of the top-20 quantitative funds use RavenPack’s alternative data platform, and they claim it’s pretty much a must-have in the systematic trading space given the proven value of “sentiment,” which we’re going to talk about next.

Alternative Data and Sentiment

Using Natural Language Processing (NLP) algorithms, RavenPack goes beyond earnings whispers to look at things like wildfires in Sonoma County or layoffs at any of Lincoln Electric’s 56 global manufacturing locations. Water shortages, wildfires and the acreage affected, a CEO resigning, on-the-job accidents, labor disputes, and even earthquakes with severity measured by death toll and Richter scale: all of these events can be captured in less than a second once the news hits the wire.

What’s also being captured and relayed here is something called sentiment, which isn’t as intuitive as it sounds. Let’s say that some Middle Eastern country lobbed some rockets at another Middle Eastern country. While we would all agree that’s a negative event, we could then see varying degrees of negative sentiment depending on the number of deaths that occurred:

  • Violence in Middle East + 100 Deaths = (more negative sentiment)
  • Violence in Middle East Only = (negative sentiment)
  • Violence in Middle East + Zero Deaths = (less negative sentiment)
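To make the idea of graded sentiment concrete, here is a minimal Python sketch. It is not RavenPack’s model: the baseline score of -0.5, the [-1, 0] range, and the shape of the death-toll curve are all hypothetical choices made purely for illustration.

```python
import math
from typing import Optional

# Hypothetical parameters for illustration only, not RavenPack's scoring model.
BASE_VIOLENCE_SCORE = -0.5   # "violence" reported with no casualty figure
DEATH_TOLL_SCALE = 50.0      # controls how quickly the score approaches -1


def violence_sentiment(deaths: Optional[int]) -> float:
    """Score a violence event on a [-1, 0] scale, adjusted by the reported death toll.

    deaths=None means no casualty figure was reported.
    """
    if deaths is None:
        return BASE_VIOLENCE_SCORE
    if deaths == 0:
        # Confirmed zero casualties: still negative, but less so than the baseline.
        return BASE_VIOLENCE_SCORE / 2
    # More deaths push the score further below the baseline, saturating at -1.
    extra = (1 + BASE_VIOLENCE_SCORE) * (1 - math.exp(-deaths / DEATH_TOLL_SCALE))
    return BASE_VIOLENCE_SCORE - extra


if __name__ == "__main__":
    for label, deaths in [("+ 100 Deaths", 100), ("Only", None), ("+ Zero Deaths", 0)]:
        print(f"Violence in Middle East {label}: {violence_sentiment(deaths):+.2f}")
```

The exponential curve is just one way to make each additional casualty matter a little less than the last; any monotonic mapping would reproduce the ordering in the list above.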

But here’s the thing. Generic sentiment doesn’t tell you much if you’re holding a company in your portfolio that sells missile defense systems to both countries in the spat. More violence in the Middle East means more sales, so you may want to apply your own sentiment to market events, which brings us to how this alternative data is being used.
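Applying your own sentiment can be as simple as a per-holding multiplier on the generic score. The sketch below assumes a hypothetical `Holding` class, made-up tickers, and an invented event-type label `armed-conflict`; none of these come from RavenPack’s data.

```python
from dataclasses import dataclass, field


@dataclass
class Holding:
    """A portfolio position with its own view of how event types affect it (hypothetical)."""
    ticker: str
    # Multiplier on generic sentiment per event type:
    #  1.0 -> the holding suffers when news is negative (the default),
    # -1.0 -> the holding benefits when news is negative.
    sensitivity: dict[str, float] = field(default_factory=dict)

    def impact(self, event_type: str, generic_sentiment: float) -> float:
        """Translate a generic sentiment score into this holding's own sentiment."""
        return generic_sentiment * self.sensitivity.get(event_type, 1.0)


# Made-up tickers and event label, for illustration only.
defense_co = Holding("DEFENSE_CO", sensitivity={"armed-conflict": -1.0})
airline_co = Holding("AIRLINE_CO")

generic = -0.9  # a strongly negative "violence in the Middle East" score
print(defense_co.impact("armed-conflict", generic))  # 0.9: good news for this holding
print(airline_co.impact("armed-conflict", generic))  # -0.9: bad news for this one
```

A single multiplier is the crudest possible mapping; a real desk would more likely estimate these sensitivities from how each holding actually reacted to past events.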

How Alternative Data is Used

One of the lovely marketing folks at RavenPack managed to get us some time with their team to pore over their data sets, and we came away with a new appreciation for alternative data and just how many use cases there are for it. While we were listening to one of their clients talk about the usefulness of the data, the client addressed a question we’ve wondered about before: how is sentiment data different from price momentum? (Price momentum is the rate of change of a stock’s price.) After all, if someone says something bad about a stock, the stock price drops. So if you chart sentiment and price momentum, you’d expect to see two lines side-by-side that behave exactly the same. Whoopee doo. Turns out, that’s not the case.
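To see why the two lines need not move together, it helps to look at how each raw signal is built. Below is a minimal sketch, assuming daily closing prices and one news sentiment score per day on a -1 to +1 scale (both hypothetical inputs, with made-up toy values). Momentum is computed only from prices, while the sentiment signal is a rolling average of news scores, so the two can diverge whenever the news and the price action disagree.

```python
from statistics import mean
from typing import List, Optional


def rate_of_change(prices: List[float], lookback: int) -> List[Optional[float]]:
    """Price momentum: percentage change over `lookback` periods (None until enough history)."""
    return [None] * lookback + [
        prices[i] / prices[i - lookback] - 1 for i in range(lookback, len(prices))
    ]


def rolling_sentiment(scores: List[float], window: int) -> List[Optional[float]]:
    """Sentiment signal: trailing average of daily sentiment scores (None until the window fills)."""
    return [None] * (window - 1) + [
        mean(scores[i - window + 1 : i + 1]) for i in range(window - 1, len(scores))
    ]


# Toy data (made up): the price keeps drifting up while the news turns negative.
prices = [100, 101, 102, 103, 104, 105]
scores = [0.4, 0.2, -0.1, -0.5, -0.6, -0.7]

for day, (mom, sent) in enumerate(zip(rate_of_change(prices, 3), rolling_sentiment(scores, 3))):
    if mom is not None and sent is not None:
        print(f"day {day}: momentum {mom:+.3f}, sentiment {sent:+.2f}")
```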

RavenPack’s client took around 12 years of historical data, tested both signals against the historical price performance of a particular stock index, and showed that the “sentiment” signal consistently outperformed the “momentum” signal over a decade. That outperformance was particularly impressive because it was demonstrated in U.S. markets, widely considered the most efficient under the “efficient market hypothesis,” which holds that all available information is already priced in. This also suggests that these sorts of signals could prove even more effective when applied to less efficient foreign markets.
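A common way to run this kind of comparison is a rank-based long/short backtest: each period, go long the stocks with the highest signal, short those with the lowest, and record the return over the following period. We don’t know the exact methodology RavenPack’s client used; the sketch below is a generic version, and the 20% quantile is an arbitrary assumption.

```python
from math import prod
from statistics import mean
from typing import Dict, List

Signal = Dict[str, float]   # stock -> signal value known at the end of a period
Returns = Dict[str, float]  # stock -> return over the *following* period


def long_short_returns(signals: List[Signal], next_returns: List[Returns],
                       quantile: float = 0.2) -> List[float]:
    """Per-period return of an equal-weighted portfolio that is long the
    top-`quantile` stocks by signal and short the bottom-`quantile` stocks.

    next_returns[t] must cover the period *after* signals[t] is known;
    otherwise the backtest suffers from look-ahead bias.
    """
    results = []
    for sig, fwd in zip(signals, next_returns):
        ranked = sorted(sig, key=sig.get)        # lowest signal first
        n = max(1, int(len(ranked) * quantile))  # names in each leg
        longs, shorts = ranked[-n:], ranked[:n]
        results.append(mean(fwd[s] for s in longs) - mean(fwd[s] for s in shorts))
    return results


def cumulative_return(period_returns: List[float]) -> float:
    """Compound per-period returns into a single total return."""
    return prod(1 + r for r in period_returns) - 1
```

To compare the two signals, run `long_short_returns` once with sentiment values and once with momentum values over the same universe and dates, then compare the resulting return streams.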

In another case, RavenPack data was used to compare analysts’ earnings revisions with actual earnings data pulled straight from news sources. If you’re not familiar with how this works, it goes something like this.

A bunch of analysts put together financial models in complex spreadsheets they learned to build for the small price of a six-figure MBA program. The analysts then forecast a company’s earnings using their models. When they revise those forecasts, that’s referred to as an “earnings revision,” which can in turn move the price of a stock. What RavenPack’s client found was that going straight to the horse’s mouth was a more effective signal than waiting for some hungover analyst to digest the news and then proclaim new views to the world. By that measure, earnings revisions aren’t a valuable signal, because analysts have no information advantage over the news they are reacting to. Algorithms can gain one by going straight to the news and cutting out the middleman. Of course, that doesn’t mean it’s all about speed. The days of using this data for high-speed trading are all but gone.
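The arithmetic behind an earnings revision is simple. This hypothetical sketch measures a single revision as the percentage change in an EPS forecast, and aggregates several analysts’ revisions into a “breadth” ratio (upward minus downward revisions over the total), which is one common convention among several. The forecasts in the example are made up.

```python
from typing import Iterable


def earnings_revision(old_estimate: float, new_estimate: float) -> float:
    """Percentage change in an EPS forecast; positive means the analyst raised it.

    Dividing by abs(old_estimate) keeps the sign meaningful for loss-making
    companies (e.g. -1.00 -> -0.50 counts as an upward revision).
    """
    if old_estimate == 0:
        raise ValueError("revision is undefined when the prior estimate is zero")
    return (new_estimate - old_estimate) / abs(old_estimate)


def revision_breadth(revisions: Iterable[float]) -> float:
    """(upward revisions - downward revisions) / total revisions, in [-1, 1]."""
    revisions = list(revisions)
    if not revisions:
        return 0.0
    ups = sum(1 for r in revisions if r > 0)
    downs = sum(1 for r in revisions if r < 0)
    return (ups - downs) / len(revisions)


# Hypothetical forecasts from three analysts covering the same company.
revs = [earnings_revision(2.00, 2.20), earnings_revision(2.10, 2.00), earnings_revision(1.90, 2.05)]
print([round(r, 3) for r in revs], revision_breadth(revs))
```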

Alternative Data and Time Horizons

Historically, this type of data was used for High Frequency Trading (HFT), though that term means different things to different people. While RavenPack does offer real-time data, clients today are prioritizing smart over fast. Available information is not synonymous with useful information, and the usefulness no longer lies in ultra-low-latency trading measured in microseconds but in horizons measured in days or weeks. And since the more complex a signal is, the longer its alpha tends to last, the emphasis is now on getting this data into the hands of everyone as opposed to just tech-savvy quants.
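One way to check how long a signal keeps working is to measure its rank information coefficient (the rank correlation between today’s signal and subsequent returns) at several horizons. The sketch below assumes hypothetical inputs: a signal value per stock on day `t` and a list of daily closes per stock. In practice you would average the coefficient over many days rather than look at a single one.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List


def _ranks(values: List[float]) -> List[float]:
    """Rank positions 0..n-1 (ties are broken by order, which is fine for a sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = float(rank)
    return ranks


def rank_ic(signal: List[float], forward_returns: List[float]) -> float:
    """Approximate Spearman rank correlation between a signal and subsequent returns.

    Needs at least two stocks, otherwise the rank standard deviation is zero.
    """
    rx, ry = _ranks(signal), _ranks(forward_returns)
    mx, my = mean(rx), mean(ry)
    cov = mean((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / (pstdev(rx) * pstdev(ry))


def ic_by_horizon(signal: Dict[str, float], prices: Dict[str, List[float]],
                  t: int, horizons: Iterable[int]) -> Dict[int, float]:
    """Rank IC of the day-t signal against forward returns over each horizon (in days)."""
    stocks = sorted(signal)
    sig = [signal[s] for s in stocks]
    result = {}
    for h in horizons:
        fwd = [prices[s][t + h] / prices[s][t] - 1 for s in stocks]
        result[h] = rank_ic(sig, fwd)
    return result
```

A signal whose coefficient stays positive at 5- or 20-day horizons, rather than collapsing after the first day, is the kind of slower-burning alpha described above.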

RavenPack is also making sure that people starting out know about their products. If you graduated from an Ivy League university and studied data, there’s a good chance you used RavenPack’s data in your classes. But brand awareness isn’t the only thing putting the data in lots of people’s hands. It’s also the platform they’ve built around it.

Alternative Data Democratization

Bloomberg published a piece late last year titled “Parking Lots Don’t Tell the Whole Story: The Trouble With Alternative Data,” in which they gathered a bunch of talking heads to discuss issues with alternative data. The article argued that the proliferation of data sets is eroding the information advantage. That may well be true for some of the quant funds that were first to play this game, but the potential applications are practically endless. It just means that those who want to generate alpha will need to become more sophisticated and use the data in more complex ways. (Sounds like something AI algorithms might be pretty good at.) RavenPack calls this “alternative data democratization,” and the idea is that everyone should have access to the data so that we can find new and exciting use cases for it. That’s exactly what they’ve done with the platform you see below:

A RavenPack portfolio dashboard
Source: RavenPack

From there, you can simply create your portfolio on their platform and monitor sentiment for your holdings. Portfolio managers use this to keep tabs on relevant news, and private wealth managers use it to get ahead of any news about a client’s holdings so they already have a response prepared when the phone rings. Other use cases include risk management, compliance, and even corporate clients who want to keep tabs on mergers and acquisitions. In the not-so-distant future, RavenPack sees a variation of this product being offered to retail investors as well.
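Under the hood, monitoring a portfolio boils down to filtering a stream of scored news events against a list of holdings. This sketch uses a hypothetical `NewsEvent` record, made-up tickers and headlines, and an arbitrary alert threshold of -0.5; it does not reflect RavenPack’s actual data schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class NewsEvent:
    """A scored news item (hypothetical schema, not RavenPack's)."""
    entity: str        # ticker or entity the story is about
    event_type: str    # e.g. "ceo-resignation", "labor-dispute"
    sentiment: float   # assumed scale: -1 (very negative) to +1 (very positive)
    headline: str


def portfolio_alerts(events: Iterable[NewsEvent], holdings: Set[str],
                     threshold: float = -0.5) -> List[NewsEvent]:
    """Events about our holdings with sentiment at or below `threshold`, most negative first."""
    hits = [e for e in events if e.entity in holdings and e.sentiment <= threshold]
    return sorted(hits, key=lambda e: e.sentiment)


# Made-up events for illustration only.
events = [
    NewsEvent("AAA", "ceo-resignation", -0.7, "AAA chief executive steps down"),
    NewsEvent("BBB", "labor-dispute", -0.6, "Strike halts production at BBB plant"),
    NewsEvent("AAA", "product-launch", 0.4, "AAA unveils new product line"),
]
for alert in portfolio_alerts(events, holdings={"AAA", "CCC"}):
    print(f"{alert.entity}: {alert.headline} ({alert.sentiment:+.1f})")
```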


While there are a lot of data sets out there, companies need to find the ones that can generate alpha. For that, they need historical data for backtesting, broad universe coverage, and well-structured data. That pretty much describes RavenPack’s data, and the tools they built allow anyone with half a brain to use an AI-powered platform to understand what’s happening in the world. At the same time, they also let more savvy users drill down into the underlying data to see how the algorithms arrived at their conclusions. With a platform this slick, and a product used by a great many firms, it doesn’t seem like RavenPack will have any problem finding a suitor when it comes time for an exit.

