
Fivetran: Extract, Load, and Transform

September 25, 2021. 7 min read

There’s nothing worse than being famous and having your tumultuous love life on display for a bunch of shallow muppets to critique. So, have you heard the news? Yesterday, Elon Musk semi-separated from Grimes – that loco-looking tattooed chick that totally wasn’t his type. The key takeaway is that he can now spend an extra 45 minutes per week running his many businesses. And for that, he’s sure to need lots of big data.

Most people haven’t a clue how much data is needed for top-level decision-making in complex organizations. If you’re a data engineer, you have some idea. Here’s a tip for all you ambitious PHDs out there (PHD stands for poor, hungry, and determined): become a competent data engineer and make yourself available for skunkworks projects. You’ll find yourself interfacing with the top brass in no time. That’s because every organization’s unique big data Frankenstack requires highly specialized engineers to make sense of critically important data. Much of this work can be ad hoc, depending on whatever corporate mantra du jour the C-suite is preoccupied with, like “doing more with less.”

The process of moving data around is often referred to as extract, transform, load (ETL), and it’s something Fivetran has turned into a business valued at $5.6 billion.

About Fivetran


Founded in 2012, Oakland, California’s own Fivetran has raised just over $728 million in disclosed funding to build an ETL tool capable of seamlessly navigating all the cloud-based data sources your organization interacts with. It has built “data connectors” that let you connect to various data sources and view them all through a standardized interface. Just provide the appropriate credentials for each data source and you can start the ETL process.
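To make the “standardized interface” idea concrete, here’s a minimal Python sketch of how a connector abstraction might look. This is not Fivetran’s API: `Connector`, `FakeShopConnector`, and `run_sync` are hypothetical names, and the fake source returns two canned records instead of calling a real vendor.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class Connector(ABC):
    """Hypothetical base class: one subclass per data source."""

    def __init__(self, credentials: dict) -> None:
        # Each source needs its own credentials (API key, OAuth token, etc.).
        self.credentials = credentials

    @abstractmethod
    def extract(self) -> Iterator[dict]:
        """Yield raw records from the source."""


class FakeShopConnector(Connector):
    """Stand-in for an e-commerce API; returns canned records instead of calling a vendor."""

    def extract(self) -> Iterator[dict]:
        if "api_key" not in self.credentials:
            raise PermissionError("missing api_key")
        yield {"order_id": 1, "total": 19.99}
        yield {"order_id": 2, "total": 5.00}


def run_sync(connectors: list) -> list:
    # Every source is handled the same way, regardless of which vendor is behind it.
    rows = []
    for connector in connectors:
        for record in connector.extract():
            rows.append({"source": type(connector).__name__, **record})
    return rows


rows = run_sync([FakeShopConnector({"api_key": "example-key"})])
print(rows[0])  # {'source': 'FakeShopConnector', 'order_id': 1, 'total': 19.99}
```

The point of the abstraction is that adding a new vendor means writing one more subclass, while the sync loop stays the same.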

Extract, Transform, Load

Extracting the data is the first step in aggregating data sources. Back in the day, data engineers only had to contend with moving data around between internal databases. Today, many organizations have their data spread out across various cloud computing environments. That’s why Fivetran supports over 150 connectors for pulling data from a long list of vendors. Facebook Ads, Google Analytics, Shopify, Google Drive, Oracle, FTP, and SurveyMonkey are just some of the data sources easily accessed and manipulated through Fivetran’s connectors. Just imagine how tough it would be to build automated scripts to extract data from a dozen vendors, each with its own rules surrounding permissions, data residency, and usage limits. Increasingly, an organization’s most important data is stored in the cloud, and Fivetran provides a tool to access all that data and aggregate it in your big data warehouses/lakes/ponds/lakehouses.

Fivetran has established partnerships with each data source vendor so it can automatically normalize your data and prepare it for analysis. In layman’s terms, this involves ironing out the differences between how each vendor handles similar types of data. Here’s one of those graphics that tells you something and nothing all at the same time.

Credit: Someka

Once the data is extracted, and before it’s loaded into the destination database, data engineers often need to do some data transformation tasks.
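Here’s a rough sketch of the kind of normalization described above, where two vendors report the same type of sale in different shapes. The vendor record formats and the unified `date`/`amount_usd` schema are made up for illustration; real connectors deal with far messier differences.

```python
from datetime import datetime

# Hypothetical raw records: two vendors describing the same kind of sale differently.
vendor_a = [{"created_at": "2021-09-24T10:15:00", "amount_cents": 1999}]
vendor_b = [{"Date": "09/24/2021", "Amount": "$1,250.00"}]


def normalize_a(record: dict) -> dict:
    # Vendor A: ISO 8601 timestamps, amounts in integer cents.
    return {
        "date": datetime.fromisoformat(record["created_at"]).date(),
        "amount_usd": record["amount_cents"] / 100,
    }


def normalize_b(record: dict) -> dict:
    # Vendor B: US-style MM/DD/YYYY dates, amounts as formatted currency strings.
    return {
        "date": datetime.strptime(record["Date"], "%m/%d/%Y").date(),
        "amount_usd": float(record["Amount"].lstrip("$").replace(",", "")),
    }


# One unified table, ready to load into the warehouse.
unified = [normalize_a(r) for r in vendor_a] + [normalize_b(r) for r in vendor_b]
```

After normalization, both records share one date type and one currency unit, so they can be summed or grouped together in the warehouse.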

From ETL to ELT

The process of moving data from a source database to a data warehouse involves cleaning up the data a bit. A simple example might be taking a series of customer phone numbers and prefixing each one with the appropriate country code. Typically, this used to happen in a staging database, and the process involved three environments:

  • SOURCE DATABASE: Extract the data
  • STAGING DATABASE: Transform the data
  • TARGET DATABASE: Load the data
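The three-environment flow can be sketched in a few lines of Python, using plain lists as stand-ins for the source, staging, and target databases and the phone-number example from above. The `DIALING_CODES` table and the record layout are hypothetical; a real pipeline would use a complete dialing-code table or a library such as `phonenumbers`.

```python
# Hypothetical dialing-code lookup (illustrative subset only).
DIALING_CODES = {"US": "1", "GB": "44", "DE": "49"}

# Plain lists standing in for the three databases.
source_db = [
    {"customer": "Ada", "country": "GB", "phone": "20 7946 0958"},
    {"customer": "Bob", "country": "US", "phone": "555-0100"},
]


def extract(db: list) -> list:
    # Copy rows out of the source so the source itself is never modified.
    return [dict(row) for row in db]


def transform(rows: list) -> list:
    # Runs in the staging area, before anything reaches the target.
    for row in rows:
        code = DIALING_CODES[row["country"]]
        row["phone"] = f"+{code} {row['phone']}"  # country code goes in front
    return rows


def load(rows: list, target: list) -> None:
    target.extend(rows)


staging_db = transform(extract(source_db))
target_db: list = []
load(staging_db, target_db)
```

Only cleaned rows ever reach `target_db`; the raw versions exist only in the source and in the intermediate staging step.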

That acronym has now changed to “extract, load, transform,” or ELT, a groundbreaking new method that involves swapping two words and eliminating the need for a staging database. The data transformation simply takes place on the target database. We’re completely oversimplifying this transformational technology, but it translates to higher speeds, lower maintenance, and quicker loading.
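With ELT, the phone-number cleanup described above moves into the target. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database as a stand-in for a cloud warehouse such as Snowflake; the table names are hypothetical. Raw rows are loaded untouched, and the transformation runs as SQL inside the target.

```python
import sqlite3

# In-memory SQLite database standing in for the target warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (customer TEXT, country TEXT, phone TEXT)")
conn.execute("CREATE TABLE dialing_codes (country TEXT PRIMARY KEY, code TEXT)")

# Extract + Load: raw rows land in the target as-is (no staging database).
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [("Ada", "GB", "20 7946 0958"), ("Bob", "US", "555-0100")],
)
conn.executemany(
    "INSERT INTO dialing_codes VALUES (?, ?)",
    [("US", "1"), ("GB", "44"), ("DE", "49")],
)

# Transform: runs as SQL inside the target, producing an analysis-ready table.
conn.execute(
    """
    CREATE TABLE customers AS
    SELECT r.customer, '+' || d.code || ' ' || r.phone AS phone
    FROM raw_customers AS r
    JOIN dialing_codes AS d ON d.country = r.country
    """
)
rows = conn.execute("SELECT customer, phone FROM customers ORDER BY customer").fetchall()
```

Because the raw table stays in the warehouse, the transformation can be rerun or changed later without re-extracting anything from the source.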

Fivetran’s usefulness shows in the high caliber of customers it has managed to land. Over 2,400 organizations use Fivetran, including names like Asics, Conagra, Square, and DocuSign. The latter is one of the tech stocks we’re invested in, and we’re glad to read how Fivetran is helping it make better decisions.

A DocuSign Success Story

DocuSign found that its SQL Server database was falling short of its needs and decided to move to Snowflake. Loading data into Snowflake was a labor-intensive process, so DocuSign took a look at Fivetran, which was connected to Snowflake after just a 20-minute call. Using Fivetran’s data connectors, DocuSign connected its six existing data sources along with 12 additional ones.

The connection between a data source and a data warehouse is often referred to as a pipeline. Each data source needs at least one pipeline, and it could take a highly paid engineer anywhere from three to six months to build one, then up to 20 hours a week afterwards to keep it running. Fivetran automates this work, truly doing more with less. All that extra data DocuSign brought into its data warehouse led to more business intelligence (BI) use cases. In any given month, 80% of people at DocuSign use at least one of the 100+ BI dashboards made possible by Fivetran.

The Competition

We always talk about the importance of investing in leaders. One way to gauge who is leading in a particular domain is by having our lowly paid MBAs examine what highly paid MBAs think. The Gartner Magic Quadrant for Data Integration Tools shows plenty of formidable competition.

Credit: Gartner

The red “x” shows where Fivetran was positioned last year, so it’s clearly on the warpath toward the “leaders” quadrant. We’ve circled Fivetran and HVR because the two companies are about to become one and the same: just days ago, Fivetran announced it is acquiring HVR for $700 million in stock and cash.

About HVR


Founded in 2012, San Francisco startup HVR raised $51 million in disclosed funding to develop “real-time data replication software designed for enterprises.” To understand this value proposition, we need to know a bit more about how development teams operate. Most dev teams work with three primary environments: production, staging, and development. The most recent data is always found in production, and that’s what developers prefer to work with. However, syncing your production databases with your staging and development environments can be time-consuming and taxing on your servers. Here are some commonly used methods for keeping databases in sync:

  • DATE_MODIFIED – Query for every row with a timestamp later than the last time you checked
  • Diff – Compare two databases and list the differences between them
  • Trigger – Set up database triggers that automatically capture every change as it happens
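Here’s a minimal sketch of the three methods against a toy `orders` table, using an in-memory SQLite database as a hypothetical “production” database. Table, column, and function names are made up for illustration. Note that every method does its work on the production side: the DATE_MODIFIED query and the diff both read production tables, and the trigger adds an extra write to every update.

```python
import sqlite3

# In-memory SQLite database standing in for a hypothetical production database.
prod = sqlite3.connect(":memory:")
prod.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, date_modified TEXT);

    -- Trigger method: every update is also written to an audit table.
    CREATE TABLE orders_audit (id INTEGER, total REAL, changed_at TEXT);
    CREATE TRIGGER orders_on_update AFTER UPDATE ON orders
    BEGIN
        INSERT INTO orders_audit VALUES (NEW.id, NEW.total, NEW.date_modified);
    END;
    """
)
prod.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2021-09-20"), (2, 25.0, "2021-09-22"), (3, 7.5, "2021-09-24")],
)


def changed_since(conn: sqlite3.Connection, last_checked: str) -> list:
    # DATE_MODIFIED method: fetch rows stamped after the previous sync.
    # ISO 8601 date strings sort correctly as plain text.
    return conn.execute(
        "SELECT id, total FROM orders WHERE date_modified > ? ORDER BY id",
        (last_checked,),
    ).fetchall()


def diff(source: dict, replica: dict) -> dict:
    # Diff method: compare full snapshots keyed by primary key.
    return {
        "inserted": sorted(source.keys() - replica.keys()),
        "deleted": sorted(replica.keys() - source.keys()),
        "updated": sorted(k for k in source.keys() & replica.keys() if source[k] != replica[k]),
    }


recent = changed_since(prod, "2021-09-21")  # [(2, 25.0), (3, 7.5)]

prod.execute("UPDATE orders SET total = 12.0, date_modified = '2021-09-25' WHERE id = 1")
audit = prod.execute("SELECT * FROM orders_audit").fetchall()  # [(1, 12.0, '2021-09-25')]

snapshot = dict(prod.execute("SELECT id, total FROM orders"))
changes = diff(snapshot, {1: 10.0, 2: 25.0, 4: 3.0})
# {'inserted': [3], 'deleted': [4], 'updated': [1]}
```

The diff is the most expensive of the three, since it needs a full snapshot of both sides every time it runs.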

These methods all require intrusion into production databases and consume computing resources you can’t afford to spare. A more modern method of keeping databases in sync is called log-based Change Data Capture (CDC). Most databases keep a transaction log of changes that they can use to restore themselves if everything goes pear-shaped. However, transaction logs are difficult to decipher because there is no documented standard for how changes are stored, and not all database vendors provide easy access to their logs. Even when they do, reading the logs is often bulky and resource-intensive. That’s where HVR comes in: it established relationships with key vendors so it could perform log-based CDC and replicate data in real time with minimal impact.
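The core idea of log-based CDC can be sketched as replaying an ordered change log onto a replica. The `LogEntry` format below is hypothetical and far simpler than any real transaction log (deciphering those is exactly the hard part); it assumes each insert or update carries the full row after the change. The replica only ever reads the log, never the production tables, and remembers the last log sequence number (LSN) it applied so it can resume without gaps or duplicates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical, simplified log record; real log formats are vendor-specific.
    lsn: int                      # log sequence number: position in the log
    op: str                       # "insert", "update" or "delete"
    key: int                      # primary key of the affected row
    value: Optional[dict] = None  # full row after the change (None for deletes)


class Replica:
    def __init__(self) -> None:
        self.rows: dict = {}
        self.last_lsn = 0  # resume point, so nothing is re-applied or skipped

    def apply(self, log: list) -> int:
        applied = 0
        for entry in log:
            if entry.lsn <= self.last_lsn:
                continue  # already replicated
            if entry.op in ("insert", "update"):
                self.rows[entry.key] = dict(entry.value or {})
            elif entry.op == "delete":
                self.rows.pop(entry.key, None)
            else:
                raise ValueError(f"unknown op {entry.op!r}")
            self.last_lsn = entry.lsn
            applied += 1
        return applied


log = [
    LogEntry(1, "insert", 1, {"total": 10.0}),
    LogEntry(2, "insert", 2, {"total": 25.0}),
    LogEntry(3, "update", 1, {"total": 12.0}),
    LogEntry(4, "delete", 2),
]
replica = Replica()
first = replica.apply(log)  # 4 entries applied
again = replica.apply(log)  # 0: already-seen LSNs are skipped
```

Tracking the LSN is what makes rereading the log safe: the replica can be restarted at any time and picks up exactly where it left off.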

HVR has solved a simple yet pervasive problem: replicating databases of all kinds in all possible scenarios, including systems with extremely high transaction volumes. Being acquired by Fivetran for $700 million is a great vote of confidence in HVR’s technology, which is used by more than 420 customers. There are likely to be plenty of synergies between the two companies, with solutions cross-sold and various functions consolidated. Both also generate revenue from usage-based pricing models.

SaaS vs. Usage-Based Pricing

Based on what Fivetran’s pricing page says, we can only deduce that it offers usage-based pricing. It may also have an enterprise software-as-a-service (SaaS) offering, but we don’t know that for sure. A SaaS business model is more desirable than usage-based pricing.

The market places a premium on SaaS businesses because they provide consistent and diversified revenue streams that are easy to measure and monitor. SaaS businesses have what technical analysts call “support.” If a SaaS company’s stock price ever drops below a certain level, the valuation will start to attract private equity firms. We’ve seen this happen recently with Blue Prism and Alteryx. We love buying beat-down SaaS stocks for this reason. The difference between the two models is that SaaS customers are usually contractually obligated to pay you (typically on two- to three-year contracts), while usage-based customers pay you based on how much they use your product.

For usage-based business models to have the same appeal as SaaS, they cannot offer “nice to have” value propositions. They need to perform mission-critical functions and save companies enough money that usage will remain strong in any economic climate. Fivetran has built a tool that organizations will use regardless of what the economy does. That makes it a very compelling business, even without being SaaS.

Conclusion

It’s hard to appreciate the value on offer here if you’ve never had to create and maintain your own ETL processes in Informatica while some arrogant prick in the C-suite changes his mind every five minutes. The ability to provide custom analytics on demand, quickly, from dozens of data sources is the new normal for data engineers. It’s hard to see that happening without an ELT tool like Fivetran.
