6 Privacy Solutions for Big Data and Machine Learning

Travelers who wander the banana pancake trail through Southeast Asia will all get roughly the same experience. They’ll eat crummy food on one of fifty boats floating around Ha Long Bay, then head up to the highlands of Sa Pa for a faux cultural experience with hill tribes that grow dreadful cannabis. After that, it’s on to Laos to float the river in Vang Vieng while smashed on opium tea. Eventually, they’ll see someone wearing a t-shirt with the classic slogan – “same same, but different.”

The phrase comes from Southeast Asian vendors, who often respond to queries about the authenticity of the fake goods they’re selling with “same same, but different.” It’s a phrase that appropriately describes how the technology world loves to spin things as fresh and new when they’ve hardly changed at all. Look no further than startup Panoply, “a five-year-old, San Francisco-based platform that makes it easier for businesses to set up a data warehouse and analyze that data with standard SQL queries.” So, we’re back to Kimball vs. Inmon again. (Rolls eyes.)

In our recent piece on 9 Technology Trends You Should Know For 2021, we talked about something that’s actually different – the notion of “privacy-enhancing computation,” which lets organizations safely share data in untrusted environments. Two computing technology concepts that you’ll hear used in this context are federated learning and homomorphic encryption.

What is Homomorphic Encryption?

Artificial intelligence algorithms – or machine learning algorithms – are only as good as the big data you feed them. Companies with exclusive access to large proprietary datasets have a competitive advantage because they can extract valuable insights from that data. Now, imagine if the data you want to use falls under the growing list of global privacy and data regulations like CCPA, GDPR, HIPAA, BSA, CYA, etc. You’ll then need to convince the stiff collars in compliance that your “citizen developers” need access to it. Think about how much absolute tripe you’ll have to deal with from Mordac, The Preventer of Information Services.

Source: The remarkably good looking, intelligent, funny man who doesn’t age and never sends us cease and desists when we use his comic strips, Scott Adams

Often referred to as “the Holy Grail of cryptography,” homomorphic encryption makes data privacy concerns a non-issue for development teams. Says Gartner, “homomorphic encryption enables businesses to share data without compromising privacy.” Simply put, it lets you run computations directly on encrypted data and get back an encrypted result that only the key holder can decrypt, so your developers never touch the actual data. Some vendors pair this with a representative data set made of synthetic data that developers can work against instead. While these methods have been around for a while, they’re only now becoming fast enough to be viable. Today, we’ll look at six startups working on variants of the homomorphic encryption theme.
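To make “computing on encrypted data” a bit more concrete, here’s a minimal sketch of the textbook Paillier scheme, which is additively homomorphic: multiply two ciphertexts and you get an encryption of the sum of the plaintexts. It is not what any of the startups below ship. The function names, the tiny hard-coded primes, and the salary figures are all assumptions made up for illustration, and keys this small offer no real security.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic).
# Illustration only: real deployments use primes of 1024+ bits.

def generate_keypair(p, q):
    """Build a (public, private) key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple choice of g
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1               # r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Ciphertext of (m1 + m2), computed without decrypting anything."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_encrypted(pub, c, k):
    """Ciphertext of (k * m) for a public integer constant k."""
    n, _ = pub
    return pow(c, k, n * n)

if __name__ == "__main__":
    pub, priv = generate_keypair(1009, 1013)           # toy primes (assumption)
    salary_a = encrypt(pub, 52000)                     # hypothetical sensitive values
    salary_b = encrypt(pub, 61000)
    total = add_encrypted(pub, salary_a, salary_b)     # done by the untrusted party
    print(decrypt(pub, priv, total))                   # only the key holder can read it
```

The party doing the addition only ever sees ciphertexts. Fully homomorphic schemes go further and support both addition and multiplication on encrypted values, which is where the speed problems mentioned above come from.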

Our first startup believes their competitive advantage is speed, and their pedigree makes that very believable.

Duality – Faster is Better

Those who complain about a lack of women engineers rarely question why women’s magazines often feature celebrities who can’t speak in complete sentences instead of accomplished women like Shafi Goldwasser. She’s a computer scientist whose long list of accomplishments includes a Turing Award in 2012 for pioneering new methods for efficient verification of mathematical proofs in complexity theory. Four years later, she co-founded New Joisey startup Duality Technologies which has taken in $20 million in funding so far from investors that include Intel (INTC) and media giant Hearst.

All that money is being used to build the Duality SecurePlus™ platform which encrypts sensitive data and the machine learning algorithms that learn from it. With applications in financial services, healthcare, and telecommunications, Duality landed a contract with DARPA this summer to use the platform for researching genomic susceptibility to severe COVID-19 symptoms, something they could do 30X faster than alternative solutions. They’re also working with Canada’s Scotiabank to help banks join forces to fight money laundering and financial crime by sharing information without exposing sensitive data. 

DataFleets and Synthetic Data

Founded in 2018, San Francisco startup DataFleets has taken in $4.5 million in disclosed funding, all of it from a seed round that closed last week with investors that include LG Electronics and Mark Cuban. That money is being spent by some Stanford dropouts to build a platform that lets developers conduct extract-transform-load (ETL) operations, business analytics, and machine learning without ever seeing raw row-level data.

Upon connection to a dataset, DataFleets automatically generates synthetic data that is structurally representative of the underlying plaintext. No individual’s data can be “reverse-engineered” from statistical queries or machine learning, even though the analytics themselves are always run on the raw data. All the typical use cases are in scope such as fraud analytics, crossing Chinese walls, try-before-you-buy data, and medical image sharing across institutions while adhering to medical privacy rules.
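What does “structurally representative” synthetic data look like? Here’s a rough sketch, assuming the simplest possible approach: learn each column’s type and rough distribution, then sample brand-new rows from those summaries. This is not DataFleets’ actual method, which the company hasn’t detailed here. The helper names `fit_column` and `synthesize` and the sample records are hypothetical, and independent per-column sampling like this carries no formal privacy guarantee on its own.

```python
import random
import statistics
from collections import Counter

def fit_column(values):
    """Summarize one column: mean/std for numbers, frequencies for categories."""
    if all(isinstance(v, (int, float)) for v in values):
        return ("numeric", statistics.mean(values), statistics.pstdev(values))
    counts = Counter(values)
    return ("categorical", list(counts.keys()), list(counts.values()))

def synthesize(rows, n, seed=0):
    """Generate n fake rows with the same columns and similar per-column stats."""
    rng = random.Random(seed)
    models = {col: fit_column([r[col] for r in rows]) for col in rows[0]}
    fake_rows = []
    for _ in range(n):
        fake = {}
        for col, model in models.items():
            if model[0] == "numeric":
                _, mean, std = model
                fake[col] = round(rng.gauss(mean, std), 2)
            else:
                _, categories, weights = model
                fake[col] = rng.choices(categories, weights=weights, k=1)[0]
        fake_rows.append(fake)
    return fake_rows

if __name__ == "__main__":
    # Hypothetical "raw" records a developer would never be shown
    raw = [
        {"age": 34, "plan": "gold"},
        {"age": 51, "plan": "basic"},
        {"age": 29, "plan": "basic"},
    ]
    for row in synthesize(raw, 3):
        print(row)
```

A developer can build and debug a pipeline against rows like these because the schema and value ranges match, then hand the finished job to the platform to run against the real thing.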

Cosmian’s Cyphercompute

Founded in 2018, French startup Cosmian has taken in around $1.6 million in funding from a bunch of French guys you’ve never heard of. They’ve built a platform, Cyphercompute, that encrypts confidential data such that it stays encrypted during processing and never needs to be revealed in clear text. Any company that wants to extract insights from sensitive and confidential data would be a potential client for Cosmian.

Enveil – Protecting Data in Use

Founded in 2016, Baltimore startup Enveil has taken in $15 million in disclosed funding from investors that include Bloomberg, Capital One, Thomson Reuters, and Mastercard. Enveil’s ZeroReveal® solutions protect data while it’s being used or processed, what they refer to as “data in use.”

Source: Enveil

Founded by U.S. Intelligence Community alumni, they’re the only company certified to provide nation-state level security in the processing layer. The solution is market-ready, scalable, and can integrate without any required changes to existing database and storage technologies.

Inpher – Secret Computing

Founded in 2015, New Yawk startup Inpher has taken in $14 million in disclosed funding from investors that include JPMorgan Chase, which led their last round, a $10 million Series A raised several years ago. Some of the world’s largest financial services, technology, and manufacturing companies are using Inpher’s Secret Computing platform for a variety of use cases, many of which the company details on its website.

One good example is healthcare, where they’re giving clinical trial researchers secure access to distributed, private electronic health record (EHR) repositories for improved patient selection and matching while maintaining privacy and compliance.

Source: Inpher

Today, customers are using Secret Computing® to better detect financial fraud, aggregate model features across private datasets, better predict heart disease, and much more.
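The article doesn’t spell out how Secret Computing works under the hood, so treat the following as a generic illustration rather than Inpher’s implementation. One standard way to aggregate numbers across private datasets is additive secret sharing: each party splits its value into random-looking shares, hands one share to every participant, and only the combined total is ever reconstructed. The function names, the modulus, and the hospital figures below are assumptions for the sketch.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value (assumed size)

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Combine shares back into the value they encode."""
    return sum(shares) % MODULUS

def secure_sum(private_values):
    """Total the parties' inputs without any party seeing another's input."""
    n = len(private_values)
    # Party i splits its value into n shares and sends share j to party j
    distributed = [share(v, n) for v in private_values]
    # Party j adds up the shares it received; each looks like random noise
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    # Only the partial sums are published, and together they reveal just the total
    return reconstruct(partial_sums)

if __name__ == "__main__":
    # Hypothetical patient counts held by three hospitals
    print(secure_sum([120, 45, 310]))
```

Any single share, or any single partial sum, tells an observer nothing about an individual hospital’s count. Only the grand total comes out the other end.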

Fortanix and Runtime Encryption

Last but not least is a startup that offered the first commercially available runtime encryption back in 2017 using Intel® SGX (a set of security-related instruction codes that are built into some modern Intel central processing units). Founded in 2016, Silicon Valley startup Fortanix has taken in $31 million in funding from investors that include Intel, whose technology they’re using to provide a hardware foundation that encrypts sensitive data as it’s being processed. The company provides solutions for confidential computing, encryption, key management, secrets management, tokenization, and hardware security modules. They’ve partnered with VMware to enable cloud service providers to deliver data security as a service, and appear to be sidling up to Microsoft as well.


The six startups we’ve discussed in today’s article are hardly the only companies working on data privacy solutions for big data and machine learning. Large companies like IBM (IBM) are dabbling in this too, and it was IBM scientist Craig Gentry who first created a working instance of homomorphic encryption. We’ve learned not to expect much from IBM, but homomorphic encryption sounds like the perfect solution for securing a hybrid cloud environment.

The real value in homomorphic encryption is that it unlocks all the datasets that were previously inaccessible due to data privacy reasons. It also helps many a CTO sleep well at night with the understanding that there are far fewer ways for a data breach to happen, a breach being the ultimate career-limiting move (CLM) for a CTO. Soon, it may just become a commonly accepted standard for ensuring sensitive data is sufficiently protected.

