Investing in the Explosive Growth of Unstructured Data

Elon Musk suggested that today’s polarized societies can be brought together by finding a common interest. Exploration of the universe is one suggestion he makes, which implies that our media outlets ought to glamorize the sciences a bit more. It’s great that Rihanna looks “stunning” as she waddles around New Yawk City with her weed-smoking gun-toting baby daddy in tow, but just maybe we could have filled that slot with an article on how the smartest AI algorithms on this planet cracked one of the grand challenges of biology in just 18 months? One of mankind’s biggest scientific breakthroughs to date didn’t even manage to get picked up by CNN’s daily political wankathon.

Not a single article on the entire site covers the AlphaFold breakthrough – Credit: CNN

DeepMind’s incredible accomplishment – predicting the structure of almost every protein cataloged by science – has already led to advances in combating malaria, antibiotic resistance, and plastic waste, according to an article by New Scientist that you can find if you dig really hard through all the rubbish headlines out there. But that’s not even the exciting part. Now that DeepMind can predict the structure of known proteins, it can allow us to create unknown proteins. The grey pixel below represents known proteins, while all the beige pixels represent opportunities for incredible breakthroughs.

Image showing a square of pixels in which the single grey pixel represents known proteins and all the beige pixels represent opportunities for incredible breakthroughs.
Credit: TED Talk

Once DeepMind gets done working on the protein problem, it can then start to tackle other datasets out there, many of which are newly emerging thanks to trends like social media, smartphones, IoT sensors, geospatial imaging, and the like.

The Changing Nature of Data

Data that describes data is called metadata. Historically, we’ve described data in simple predictable forms. Phone numbers have a certain format based on country code. Email addresses have a predictable format. People’s names will never exceed a certain length. These are all fields in databases that can be easily described. Every software application has a well-defined database attached to its behind, and that method of storage has largely remained the same since Bill Gates brought us Windows 3.1.

Built on the relational model first proposed in 1970, the relational database management system (RDBMS) has been around for decades and is the backbone of nearly every organization that stores data. Extracting insights from historical data was the domain of data mining companies, while the emergence of AI introduced forward-looking insights – predictive analytics. Companies like Confluent (CFLT) allow us to analyze the data faster, in real time, so that decision-making becomes faster. Up until recently, data was structured in a way that made it easy for analysts to query it using a common language, but over the last decade, the traditional RDBMS has become less suitable for managing the changing nature of data itself.

Structured Data and SQL

An RDBMS contains structured data. That is, every relational database has a schema that describes the tables it holds and the type of data each column will contain. Here’s a simple example of a relational database schema.

A typical relational database schema – Credit: MySQL

Before storing data in an RDBMS, you need to define what you plan to store. Once your database has been defined, you might create stored procedures, which are essentially functions that can be called by a front-end developer to manipulate the data. This serves to provide a security layer and makes sure that the integrity of the data is preserved. Now imagine how tough it would be to add a new field to the database. The schema would need to be altered, the stored procedures updated, and the front-end code changed to match.
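To make that pain point concrete, here’s a minimal sketch using Python’s built-in sqlite3 module. The customers table and its columns are hypothetical names we made up for illustration, and SQLite doesn’t support stored procedures, so this only shows the schema side of the problem: a write that doesn’t match the schema gets rejected until the schema itself is changed.

```python
import sqlite3

# In-memory database; the "customers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    ("Ada", "ada@example.com"),
)

# Writing to a column the schema does not define is rejected outright.
try:
    conn.execute(
        "INSERT INTO customers (name, email, twitter_handle) VALUES (?, ?, ?)",
        ("Bob", "bob@example.com", "@bob"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema has to be altered first; only then does the same insert succeed.
conn.execute("ALTER TABLE customers ADD COLUMN twitter_handle TEXT")
conn.execute(
    "INSERT INTO customers (name, email, twitter_handle) VALUES (?, ?, ?)",
    ("Bob", "bob@example.com", "@bob"),
)
rows = conn.execute(
    "SELECT name, twitter_handle FROM customers ORDER BY id"
).fetchall()
print(rejected, rows)
```

Note that the existing row for Ada simply gets an empty value in the new column. In a real system, every query, procedure, and screen that touches the table would also need a look.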

Anyone who works with databases knows how to use structured query language (SQL), which allows one to manipulate data across nearly any type of RDBMS, give or take some vendor-specific dialect differences, whether it’s made by Microsoft or Oracle or IBM. That’s why the name “NoSQL” probably feels like a threat to some people’s livelihood, so let’s address the elephant in the room.

NoSQL vs. RDBMS

NoSQL is generally taken to stand for “not only SQL,” and it’s an entirely new paradigm for databases that allows unstructured data to be easily stored and accessed by programmers who have been subjected to the pain of relational databases for far too long. It’s growing in popularity because unstructured data is simply exploding.

Structured data can be defined as data that can be stored in relational databases, and unstructured data is everything else. Human-generated unstructured data includes emails, YouTube videos, social media posts, text messages, audio/video files, MS Office documents, presentations, log files, and the list goes on. Machine-generated unstructured data includes satellite imagery, scientific data, digital surveillance, sensor data, and software logs. Every article on our website is unstructured data, as is everything else published on the internet.
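As a rough illustration of what “everything else” looks like to a programmer, here’s a small sketch of document-style storage using nothing but Python’s standard library. This is not MongoDB’s API or any other vendor’s; the records, field names, and the find helper are all hypothetical stand-ins. The point is simply that records with completely different shapes can sit side by side without a schema being declared up front.

```python
import json

# Three hypothetical records with different shapes, as a document store might hold them.
documents = [
    {"type": "email", "sender": "ada@example.com", "body": "Where is my order?"},
    {"type": "review", "stars": 2, "text": "Arrived late", "tags": ["shipping"]},
    {"type": "sensor", "device_id": "t-17", "readings": [21.4, 21.9, 22.3]},
]

# Each record is serialized whole; no shared schema is declared up front.
stored = [json.dumps(doc) for doc in documents]


def find(collection, **criteria):
    """Return the documents whose fields match every key/value in criteria."""
    loaded = (json.loads(raw) for raw in collection)
    return [
        doc for doc in loaded
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


reviews = find(stored, type="review")
# Union of every field name seen across the collection.
fields = sorted({key for raw in stored for key in json.loads(raw)})
print(reviews)
print(fields)
```

Adding a fourth record with a brand-new field would require no change to anything already stored, which is the schema flexibility that the relational approach lacks.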

An excellent blog piece by Cloudera (also unstructured data) pulls together interesting statistics about unstructured data from various sources. Only 10% of unstructured data is actually stored, and even less is analyzed. While structured data is growing by around 12% per year, unstructured data is growing at a rate of 55% to 65% annually.

Infographic showing Structured Data Vs Unstructured Data
Credit: Datamation
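To get a feel for what those growth rates mean once they compound, here’s a quick back-of-the-envelope sketch. Only the 12%, 55%, and 65% figures come from the statistics above; the five-year horizon is our own assumption, picked purely for illustration.

```python
def growth_multiple(annual_rate, years):
    """Return how many times larger something is after compounding annually."""
    return (1 + annual_rate) ** years


YEARS = 5  # assumed horizon, not a figure from the statistics above

multiples = {
    "structured (12%)": growth_multiple(0.12, YEARS),
    "unstructured, low end (55%)": growth_multiple(0.55, YEARS),
    "unstructured, high end (65%)": growth_multiple(0.65, YEARS),
}

for label, multiple in multiples.items():
    print(f"{label}: {multiple:.2f}x after {YEARS} years")
```

If those rates held for five years, structured data would not quite double, while unstructured data would end up somewhere around 9 to 12 times its starting size.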

The general consensus among industry experts is that 80% to 90% of data today is unstructured and 90% of it was created in the past several years. Of course, they’ve been saying that for the last decade, but here’s an even more meaningful statistic – less than 1% of data that’s been produced is being analyzed. For example, imagine what sort of customer insights might be derived from mining call center transcripts, social media posts, product reviews, and chatbot conversations as they’re being generated. In the future, the vast majority of data will be unstructured, and it needs to be stored before AI algorithms can start munching away on it.

For those of you interested in learning how NoSQL databases differ from relational databases on a technical level, there’s plenty of information out there. For your average retail investor, it’s sufficient to say that NoSQL has become quite popular in the last decade as companies look to capitalize on all the unstructured data at their disposal. In addition to handling unstructured data, advantages of NoSQL include high scalability, distributed workloads, lower cost, schema flexibility, and no complex relationships. Perhaps the most strategic advantage is ease of use, which is helping to drive adoption:

Adoption of NoSQL databases has primarily been driven by uptake from developers who find it easier to create various types of applications compared to using relational databases.

Credit: MongoDB

The above statement was made by a company that many deem the market leader in NoSQL – MongoDB.

The NoSQL Market Leader

Our search for a leader always starts with a solid day of sifting through articles and documents to see what the pulse of the community tells us. Everywhere we looked, we read about a firm called MongoDB which seems to have leadership in the NoSQL space. Research firm Slintel claims MongoDB has a 47% market share of the NoSQL opportunity, but that seems to be the result of some automated methodology that churns out insights faster than a room full of Johns in Mumbai. For clarity, we turned to a source that usually sets us straight when it comes to enterprise software leadership – the MBAs at Gartner – but were surprised to see MongoDB was nowhere to be found.

Gartner's Magic Quadrant for cloud database management systems
Credit: Gartner

In reading through the December 2021 Gartner Magic Quadrant for Cloud Database Management Systems, we found the following statement:

Its market performance is outstanding, and it has been one of the most successful vendors in moving to the cloud. This vendor did not respond to requests to participate in this year’s Magic Quadrant. This is the fifth consecutive year of nonparticipation for MongoDB, thus our information on the vendor’s strategy and roadmap is significantly outdated. As a result, we have not attempted to assess MongoDB in this Magic Quadrant.

Credit: Gartner

Maybe they’re too busy executing to jump through Gartner’s hoops. Well, popular public opinion it is then.

The Holy Trinity

A two-part article by VentureBeat described MongoDB, Snowflake, and Databricks (privately held) as “the data world’s hottest trio” that are all aspiring to become “the next-generation default enterprise cloud data platform.” While not accessible to the average layperson, the articles describe how these three firms are able to coexist without stepping on each other’s toes, at least for now. In other words, there’s so much market share to be captured – whether blue ocean or by stealing from the Oracles of the world – that it’s not a zero-sum game.

What we like about MongoDB is their open-source approach and messaging to app developers that “traditional databases have proven to be hurdles, owing to the rigid nature of relational schema and the inability to scale them out.” Every developer has intimately experienced the RDBMS pain point, and adoption will be driven from the bottom up, similar to what Confluent is doing. In short, we’d be keen to own any of these firms at a reasonable valuation.

As we look to boost our exposure to the growth of big data, we’ve looked at some themes and stocks to play them with as seen below (links lead to our past research pieces):

Next, we’re interested in looking at how we might get exposure to the growth of unstructured data. To do so, we’ll start by taking a closer look at MongoDB.

Conclusion

The traditional relational database paradigm has been around for 50 years, providing a foundation for modern computing. The explosion of data as a result of tech trends like social media, IoT sensors, smartphones, and geospatial imaging has led to an unstructured data boom that’s taxing traditional methods of storing data. Emerging onto the scene are technologies like NoSQL that don’t necessarily threaten RDBMS vendors, but create their own blue ocean total addressable market that only stands to grow if unstructured data grows as fast as expected.

We recently looked at data storage as a logical thesis for the growth of big data. What we found was that traditional data storage methods like HDD were being displaced by new technologies like SSD. In other words, investing in data storage is a good idea, provided you know what technologies are coming out ahead. The same holds true for investing in database software. While the Oracles of the world try to dismiss the potential of upcoming technologies like NoSQL, the growth numbers tell a different story. Provided unstructured data grows as expected, there’s a compelling case to be made for investing in technologies like NoSQL.


4 thoughts on “Investing in the Explosive Growth of Unstructured Data”
  1. Nice article, but the main problem is that the stocks discussed here have already risen a lot in the last 2-3 months:
    MDB had its 52-week low on 26th May, now it is +78%.
    CFLT had its 52-week low on 24th May, now it is +100%.
    SNOW had its 52-week low on 14th June, now it is +53%.
    I wouldn’t buy any of them now at the current price. Maybe there is still some opportunity with PSTG: in May it reached a low of $22, now it is over $30, so it is +36%.

    1. Great comment Stan, thanks for doing the math on returns since 52-week lows. We’re thinking the same thing – too rich. You can be almost certain that at some point they’ll “disappoint” Wall Street’s lofty expectations and that will provide an opportunity. We’ll likely set simple valuation ratio targets and wait for an opportunity. Would buy Snowflake at 20.
