Investing in the Explosive Growth of Unstructured Data
Elon Musk has suggested that today’s polarized societies could be brought together by a common interest. One he proposes is exploration of the universe, which implies that our media outlets ought to glamorize the sciences a bit more. It’s great that Rihanna looks “stunning” as she waddles around New Yawk City with her weed-smoking, gun-toting baby daddy in tow, but maybe we could have filled that slot with an article on how the smartest AI algorithms on this planet cracked one of the grand challenges of biology in just 18 months? One of mankind’s biggest scientific breakthroughs to date didn’t even manage to get picked up by CNN’s daily political wankathon.
DeepMind’s incredible accomplishment – predicting the structure of almost every protein cataloged by science – has already led to advances in combating malaria, antibiotic resistance, and plastic waste, according to an article by New Scientist that you can find if you dig really hard through all the rubbish headlines out there. But that’s not even the exciting part. Now that DeepMind can predict the structure of known proteins, it opens the door to designing proteins that don’t yet exist. The grey pixel below represents known proteins, while all the beige pixels represent opportunities for incredible breakthroughs.
Once DeepMind is done with the protein problem, it can start tackling other datasets, many of which are emerging thanks to trends like social media, smartphones, IoT sensors, geospatial imaging, and the like.
The Changing Nature of Data
Data that describes data is called metadata. Historically, we’ve described data in simple, predictable forms. Phone numbers follow a certain format based on country code. Email addresses have a predictable format. People’s names will never exceed a certain length. These are all fields in databases that can be easily described. Every software application has a well-defined database attached to its behind, and that method of storage has remained largely unchanged since Bill Gates brought us Windows 3.1.
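To make “predictable format” concrete, here’s a minimal Python sketch of the kind of rules a structured field can carry. The patterns and the 100-character name limit are our own illustrative assumptions (the phone check follows the E.164 convention of a “+” and at most 15 digits, and the email check is deliberately loose), not any particular database’s rules.

```python
import re
from dataclasses import dataclass

# Illustrative assumptions only, not any specific product's validation rules.
# E.164-style phone number: "+", a non-zero first digit, at most 15 digits total.
PHONE_PATTERN = re.compile(r"\+[1-9]\d{1,14}")
# Deliberately loose email check; real-world email validation is far messier.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
MAX_NAME_LENGTH = 100  # assumed limit, similar to a VARCHAR(100) column


@dataclass
class Contact:
    name: str
    phone: str
    email: str

    def validate(self) -> list[str]:
        """Return the names of fields that break the rules (empty if valid)."""
        errors = []
        if not 0 < len(self.name) <= MAX_NAME_LENGTH:
            errors.append("name")
        if not PHONE_PATTERN.fullmatch(self.phone):
            errors.append("phone")
        if not EMAIL_PATTERN.fullmatch(self.email):
            errors.append("email")
        return errors


print(Contact("Ada Lovelace", "+442071234567", "ada@example.com").validate())  # []
print(Contact("Ada Lovelace", "020 7123 4567", "ada-at-example").validate())  # ['phone', 'email']
```

Rules like these are easy to write precisely because the data is predictable. There’s no equivalent one-liner for “a valid tweet” or “a valid satellite image.”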
First proposed in 1970, the relational database management system (RDBMS) has been around for decades and is the backbone of nearly every organization that stores data. Extracting insights from historical data was the domain of data mining companies, while the emergence of AI introduced forward-looking insights in the form of predictive analytics. Companies like Confluent (CFLT) let us analyze data in real time, so decisions can be made faster. Until recently, data was structured in a way that made it easy for analysts to query using a common language, but over the last decade, the traditional RDBMS has become less suitable for managing the changing nature of data itself.
Structured Data and SQL
An RDBMS contains structured data. That is, every relational database has a schema that defines the columns each row will contain and the type of data in each. Here’s a simple example of a relational database schema.
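For a text version, here’s a hypothetical schema of our own, created with Python’s standard-library sqlite3 module. The customers table and its columns are made up for illustration. One caveat: SQLite accepts VARCHAR(n) but doesn’t enforce the length, whereas engines like PostgreSQL do.

```python
import sqlite3

# Hypothetical schema for illustration: every column and its type is declared
# before a single row can be stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        email       VARCHAR(255) NOT NULL UNIQUE,
        phone       VARCHAR(20),
        signup_date DATE
    )
    """
)

# Rows must fit the declared columns.
conn.execute(
    "INSERT INTO customers (name, email, phone, signup_date) VALUES (?, ?, ?, ?)",
    ("Ada Lovelace", "ada@example.com", "+442071234567", "2021-12-01"),
)

rows = conn.execute("SELECT customer_id, name, email FROM customers").fetchall()
print(rows)  # [(1, 'Ada Lovelace', 'ada@example.com')]
```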
Before storing data in an RDBMS, you need to define what you plan to store. Once your database has been defined, you might create stored procedures, which are essentially functions that a front-end developer can call to manipulate the data. This provides a security layer and helps preserve the integrity of the data. Now imagine how tough it would be to add a new field to the database. The stored procedures would need to be changed, and the developers calling them would need to update their code to accommodate the change.
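Here’s a rough sketch of that pain, again using Python’s sqlite3 module. SQLite doesn’t support stored procedures, so plain Python functions (add_customer and add_customer_v2, both hypothetical names) stand in for them; the table and the twitter_handle field are also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)


def add_customer(conn, name, email):
    """Hypothetical stand-in for a stored procedure: the app's only write path."""
    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", (name, email))


add_customer(conn, "Ada Lovelace", "ada@example.com")

# New requirement: store a social media handle. The schema rejects the new field.
rejected = False
try:
    conn.execute(
        "INSERT INTO customers (name, email, twitter_handle) VALUES (?, ?, ?)",
        ("Grace Hopper", "grace@example.com", "@grace"),
    )
except sqlite3.OperationalError as exc:
    rejected = True
    print("Rejected:", exc)

# Fix #1: migrate the schema.
conn.execute("ALTER TABLE customers ADD COLUMN twitter_handle TEXT")


# Fix #2: replace the write path, and update every caller to pass the new field.
def add_customer_v2(conn, name, email, twitter_handle=None):
    conn.execute(
        "INSERT INTO customers (name, email, twitter_handle) VALUES (?, ?, ?)",
        (name, email, twitter_handle),
    )


add_customer_v2(conn, "Grace Hopper", "grace@example.com", "@grace")
rows = conn.execute(
    "SELECT name, twitter_handle FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # [('Ada Lovelace', None), ('Grace Hopper', '@grace')]
```

In a real shop, each of those two fixes is typically a change request touching database scripts and application code, which is exactly the friction described above.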
Anyone who works with databases knows Structured Query Language (SQL), which lets you manipulate data in any type of RDBMS, whether it’s made by Microsoft, Oracle, or IBM. That’s why the name “NoSQL” probably feels like a threat to some people’s livelihoods, so let’s address the elephant in the room.
NoSQL vs. RDBMS
NoSQL actually stands for “not only SQL” and it’s an entirely new paradigm for databases that allows unstructured data to be easily stored and accessed by programmers who have been subjected to the pain of relational databases for far too long. It’s growing in popularity because unstructured data is simply exploding.
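To illustrate the schema-free idea, here’s a toy in-memory “collection” in Python. It’s a sketch under our own assumptions, not the API of MongoDB or any other NoSQL product; insert and find are hypothetical helper names. Each document is a JSON-style dictionary that can carry whatever fields it likes, with no table definition or migration required.

```python
import json

# Toy in-memory "collection" of JSON documents; purely illustrative and not
# modeled on any particular NoSQL product's API.
collection: list[dict] = []


def insert(doc: dict) -> None:
    # No schema to declare or migrate: each document carries its own fields.
    # Round-trip through JSON to mimic storing the document as serialized text.
    collection.append(json.loads(json.dumps(doc)))


def find(**criteria) -> list[dict]:
    """Return documents that contain every given field with an equal value."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


insert({"name": "Ada Lovelace", "email": "ada@example.com"})
insert({"name": "Grace Hopper", "email": "grace@example.com", "twitter_handle": "@grace"})
insert({"name": "Alan Turing", "tags": ["math", "ai"], "address": {"city": "London"}})

print(find(twitter_handle="@grace"))
# [{'name': 'Grace Hopper', 'email': 'grace@example.com', 'twitter_handle': '@grace'}]
```

Compare this with a relational table, where adding twitter_handle to one record means altering the schema for all of them. The flip side is that nothing stops a typo like twiter_handle from slipping in, which is why MongoDB, for one, offers optional schema validation.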
Structured data can be defined as data that can be stored in relational databases, and unstructured data is everything else. Human-generated unstructured data includes emails, YouTube videos, social media posts, text messages, audio/video files, MS Office documents, presentations, log files, and the list goes on. Machine-generated unstructured data includes satellite imagery, scientific data, digital surveillance, sensor data, and software logs. Every article on our website is unstructured data, as is everything else published on the internet.
An excellent blog post by Cloudera (also unstructured data) pulls together interesting statistics about unstructured data from various sources. Only 10% of unstructured data is actually stored, and even less is analyzed. While structured data is growing by around 12% per year, unstructured data is growing at 55% to 65% annually.
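To see what those growth rates imply, here’s a quick compounding calculation in Python. The five-year horizon and the 80% starting share of unstructured data (the low end of the range commonly quoted by industry experts) are our assumptions for illustration, not forecasts.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """How many times larger a quantity gets after compounding annual_rate for years."""
    return (1 + annual_rate) ** years


def unstructured_share(start_share: float, structured_rate: float,
                       unstructured_rate: float, years: int) -> float:
    """Fraction of all data that is unstructured after `years` of compounding."""
    unstructured = start_share * growth_multiple(unstructured_rate, years)
    structured = (1 - start_share) * growth_multiple(structured_rate, years)
    return unstructured / (unstructured + structured)


for rate in (0.12, 0.55, 0.65):
    print(f"{rate:.0%} a year for 5 years -> {growth_multiple(rate, 5):.1f}x")
# 12% a year for 5 years -> 1.8x
# 55% a year for 5 years -> 8.9x
# 65% a year for 5 years -> 12.2x

# Assumed starting point: 80% of data is unstructured (illustrative only).
print(f"{unstructured_share(0.80, 0.12, 0.55, 5):.1%}")  # 95.3%
```

Even at the low end of the range, unstructured data grows roughly five times faster in absolute terms than structured data over five years, so its share of the pie keeps climbing.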
The general consensus among industry experts is that 80 to 90 percent of data today is unstructured and 90% of it was created in the past several years. Of course, they’ve been saying that for the last decade, but here’s an even more meaningful statistic: less than 1% of the data that’s been produced is being analyzed. For example, imagine what sort of customer insights might be derived from mining call center transcripts, social media posts, product reviews, and chatbot conversations as they’re being generated. In the future, the vast majority of data will be unstructured, and it needs to be stored before AI algorithms can start munching away on it.
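As a crude illustration of mining unstructured text, here’s a Python sketch that tags customer feedback with themes by keyword matching. The messages and the keyword-to-theme mapping are hypothetical, and real systems would use proper natural language processing rather than prefix matching.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for reviews, chat logs, and call transcripts.
feedback = [
    "The app keeps crashing when I try to check out.",
    "Love the new design, but checkout is slow.",
    "Support was helpful. Still, the app crashed twice today.",
    "Shipping was fast and the price was fair.",
]

# Assumed mapping from keyword prefixes to business themes (illustrative only).
themes = {
    "crash": "stability",
    "slow": "performance",
    "check": "checkout",
    "ship": "delivery",
    "price": "pricing",
}

counts = Counter()
for text in feedback:
    words = re.findall(r"[a-z]+", text.lower())
    # Count each theme at most once per message.
    hits = {theme for prefix, theme in themes.items()
            if any(word.startswith(prefix) for word in words)}
    counts.update(hits)

print(counts.most_common())
```

Here “stability” and “checkout” each come up in two of the four messages, which is the kind of signal a product team would want to see as it happens rather than in next quarter’s survey.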
For those of you interested in learning how NoSQL databases differ from relational databases on a technical level, there’s plenty of information out there. For the average retail investor, it’s sufficient to say that NoSQL has become quite popular in the last decade as companies look to capitalize on all the unstructured data at their disposal. Beyond handling unstructured data, NoSQL’s advantages include high scalability, distributed workloads, lower cost, schema flexibility, and simpler data models without complex relationships. Perhaps the most strategic advantage is ease of use, which is helping to drive adoption:
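One reason NoSQL systems scale out easily is that data can be spread across machines by key. Here’s a minimal Python sketch of hash-based sharding under our own assumptions (four shards, SHA-256 of the document ID); it illustrates the general technique, not how MongoDB or any other product implements it.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed cluster size for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document key to a shard using a stable hash.

    Python's built-in hash() is salted per process, so SHA-256 is used
    instead to make the same key land on the same shard every time.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards: dict[int, list[dict]] = defaultdict(list)
for i in range(1_000):
    doc = {"_id": f"user-{i}", "clicks": i % 7}
    shards[shard_for(doc["_id"])].append(doc)

# Each shard ends up with roughly a quarter of the documents, so reads and
# writes for different keys can be served by different machines in parallel.
for shard_id in sorted(shards):
    print(shard_id, len(shards[shard_id]))
```

A design caveat: simple modulo hashing moves most keys whenever the shard count changes, which is why production systems use approaches such as consistent hashing or range-based chunks that can be rebalanced incrementally.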
Adoption of NoSQL databases has primarily been driven by uptake from developers who find it easier to create various types of applications compared to using relational databases. (Credit: MongoDB)
The above statement was made by a company that many deem the market leader in NoSQL – MongoDB.
The NoSQL Market Leader
Our search for a leader always starts with a solid day of sifting through articles and documents to take the pulse of the community. Everywhere we looked, we read about a firm called MongoDB, which appears to lead the NoSQL space. Research firm Slintel claims MongoDB has a 47% share of the NoSQL market, but that figure seems to come from some automated methodology that churns out insights faster than a room full of Johns in Mumbai. For clarity, we turned to a source that usually sets us straight when it comes to enterprise software leadership – the MBAs at Gartner – but were surprised to find MongoDB nowhere in sight.
In reading through the December 2021 Gartner Magic Quadrant for Cloud Database Management Systems, we found the following statement:
Its market performance is outstanding, and it has been one of the most successful vendors in moving to the cloud. This vendor did not respond to requests to participate in this year’s Magic Quadrant. This is the fifth consecutive year of nonparticipation for MongoDB, thus our information on the vendor’s strategy and roadmap is significantly outdated. As a result, we have not attempted to assess MongoDB in this Magic Quadrant. (Credit: Gartner)
Maybe they’re too busy executing to jump through Gartner’s hoops. Well, public opinion it is, then.
The Holy Trinity
A two-part article by VentureBeat described MongoDB, Snowflake, and Databricks (privately held) as “the data world’s hottest trio,” all aspiring to become “the next-generation default enterprise cloud data platform.” While not accessible to the average layperson, the articles describe how these three firms are able to coexist without stepping on each other’s toes, at least for now. In other words, there’s so much market share to be captured, whether from blue-ocean opportunities or by stealing from the Oracles of the world, that it’s not a zero-sum game.
What we like about MongoDB is its open-source approach and its message to app developers that “traditional databases have proven to be hurdles, owing to the rigid nature of relational schema and the inability to scale them out.” Every developer has intimately experienced the RDBMS pain point, and adoption will be driven from the bottom up, similar to what Confluent is doing. In short, we’d be keen to own any of these firms at a reasonable valuation.
As we look to boost our exposure to the growth of big data, we’ve looked at some themes and stocks to play them with as seen below (links lead to our past research pieces):
- Data insights will need to happen quicker
  - Confluent (CFLT) – real-time data analytics
- Data storage needs to offer better performance at a lower cost
  - Pure Storage (PSTG) – flash-native storage
- Data centers are being deployed at a breakneck pace
  - NVIDIA (NVDA) – 43% of revenues from data center hardware
  - Data center REITs – Equinix (EQIX) and Digital Realty (DLR)
- Data warehouses are outdated
  - Snowflake (SNOW) – modern data warehouse with a lower total cost of ownership
- Unstructured data is growing exponentially
Next, we’re interested in looking at how we might get exposure to the growth of unstructured data. To do so, we’ll start by taking a closer look at MongoDB.
The traditional relational database paradigm has been around for 50 years, providing a foundation for modern computing. The explosion of data from tech trends like social media, IoT sensors, smartphones, and geospatial imaging has led to an unstructured data boom that’s taxing traditional methods of storing data. Emerging onto the scene are technologies like NoSQL that don’t necessarily threaten RDBMS vendors but instead create their own blue-ocean total addressable market, one that only stands to grow if unstructured data grows as fast as expected.
We recently looked at data storage as a logical way to play the growth of big data. What we found was that traditional storage technologies like HDDs were being displaced by newer ones like SSDs. In other words, investing in data storage is a good idea, provided you know which technologies are coming out ahead. The same holds true for investing in database software. While the Oracles of the world try to dismiss the potential of upcoming technologies like NoSQL, the growth numbers tell a different story. Provided unstructured data grows as expected, there’s a compelling case to be made for investing in technologies like NoSQL.
Tech investing is extremely risky. Minimize your risk with our stock research, investment tools, and portfolios, and find out which tech stocks you should avoid. Become a Nanalyze Premium member and find out today!
Nice article, but the main problem is that the stocks discussed here have already risen a lot in the last 2-3 months:
MDB hit its 52-week low on 26th May; now it is up 78%.
CFLT hit its 52-week low on 24th May; now it is up 100%.
SNOW hit its 52-week low on 14th June; now it is up 53%.
I wouldn’t buy any of them at the current price. Maybe there is still some opportunity with PSTG: in May it reached a low of $22 and is now over $30, so it is up 36%.
Great comment, Stan, and thanks for doing the math on returns since the 52-week lows. We’re thinking the same thing – too rich. You can be almost certain that at some point they’ll “disappoint” Wall Street’s lofty expectations, and that will provide an opportunity. We’ll likely set simple valuation ratio targets and wait for an opportunity. Would buy Snowflake at 20.
Excellent article and reply. Thank you both. Great stories, not great prices …yet.
Cheers for the feedback, Leigh!
Your conclusions and investment thesis rest on incorrect assumptions and shaky presumptions.
First off, RDBMS systems are no longer limited by rigid schema structures and difficult modification methods.
Second, RDBMS systems now incorporate JSON and thus treat ‘unstructured’ data as a ‘subset’ of structured data – the Oracle Database being one example.
Third, the Oracle database has been re-engineered for the cloud and easy scaling.
Fourth, the NoSQL paradigm does not lend itself well to forthcoming Zero-Trust Architectures while RDBMS’s are better suited as the basis to build a ZTA.
Finally, RDBMS’s such as Oracle have had the ability to manage BLOB data – which is what unstructured data really is. Mongo and other players in the NoSQL space are only a few innovations away from irrelevance.
Thank you for the color.
Glad we didn’t invest in Mongo then, but somebody is buying this stuff.
It’s never nice to take advantage of someone’s subject matter expertise, but here goes 🙂 Any thoughts on where Snowflake sits relative to traditional vendors like Oracle?