Managing Medical Imaging Big Data for Research

We do lots of research here at Nanalyze, and over the years we’ve learned that market size estimates are largely useless. This quickly becomes apparent when you go gather every single estimate out there and compare the numbers. Essentially, what you get are a bunch of MBAs who all use different methods to arrive at different numbers. When you start looking at “market forecasts,” the problem gets even worse. That’s why we can’t really say how big the medical imaging market is. Instead, we’ll just use our own number. $50 billion. It’s a nice round number that sounds big and is easy to remember.

Joking aside, the medical imaging market is pretty darn big. There are MRIs, x-rays, CT scans, and ultrasounds to name a few. We recently visited the Stanford campus to attend a medical imaging conference where a bunch of people who are way smarter than us presented some really cool stuff in a way that your average layperson could easily understand. The person responsible for putting this all together, Joyce Eileen Farrell, spoke to us about how she believes that a big problem today in the research community is the lack of structure around “medical imaging big data.”

Medical Imaging Big Data

When you receive a medical imaging procedure, it involves some sort of scanning process followed by a visit from the radiologist who can then interpret the medical images. They might be stored for any length of time, something that has a cost associated with it. Along with the image, there’s an accompanying report that describes what conclusion was reached by the medical professional – usually a radiologist – who evaluated the image. The image and the accompanying report collectively provide greater value than just the image since that interpretation can be used to train machine learning algorithms to spot problems in medical images.

Now think about this. Every research institution across the globe approaches medical imaging in a different manner. Different machines, different storage methods, different policies for how long the data is stored, and different ways to attach the accompanying interpretation all make it very difficult to combine medical imaging big data sets. We haven’t even started talking about patient privacy issues, which are an increasing concern under regulations like HIPAA. It’s a problem that a startup called Flywheel is trying to solve.

About Flywheel

Founded in 2015, Minneapolis startup Flywheel has taken in around $8 million in funding to build “a leading medical imaging informatics platform for researchers that’s transforming the way research is conducted in academia, clinics, and the pharma industry.”

Update 02/02/2021: Flywheel has raised $15 million in Series B funding to deepen its presence in the life sciences and AI markets and accelerate product development. This brings the company’s total funding to $23.5 million to date.

The idea is that researchers shouldn’t have to concern themselves with IT matters. When this topic was raised during the conference, many attendees nodded agreeably as it was mentioned that some startups managed to blow through all their cash simply trying to handle the IT aspects of medical imaging, something they never actually managed to get past. Flywheel’s solution provides an easy-to-use workflow that facilitates the sharing of data as needed for research activities.

Flywheel chose to use Google’s Cloud Health infrastructure to host their platform as it excels in data storage, computational power, and security. Private health information is strictly regulated, and Google Cloud and Flywheel are fully compliant with federal and institutional requirements to anonymize and safeguard that information in long-term archives. We sat down with Travis Richardson, CEO and President of Flywheel, to talk about his company’s platform.

Medical Imaging Big Data for Research

Most of us probably think about medical imaging in the context of our own personal experiences getting medical imaging procedures performed at a hospital or imaging center. These single-patient use cases are quite simplistic compared to medical imaging that’s performed for research purposes. The slide below shows the three primary research-centric applications for the Flywheel platform – research centers, clinical research, and life sciences.

In research centers, medical imaging is performed on “research subjects” as opposed to “patients,” and the initial part of the imaging process is generally the same. A research subject undergoes a scanning procedure for an hour or so using a machine that’s used by a whole slew of researchers who are all working on different projects. Afterwards, it takes about 15-20 minutes to copy the data off to a USB drive, DVD, or CD so the researcher can then walk back to their desk and find a place for it on the server. The result is an ad-hoc collection of scans which contain many files, folder structures, and formats for each data set. Other steps then take place, such as visually inspecting the data, running quality control checks, running scripts, and converting formats, all of which take up valuable time for the researcher. Enter Flywheel.

As the patient is being scanned, Flywheel captures the data directly from the machine in raw format. The data is then de-identified for privacy concerns and routed to a secure project workspace which allows for collaboration and data sharing. The workflow can be configured so that all routine tasks can be automated. The data is indexed for search, and every action taken on the data is documented for traceability. Because the raw data is being stored, researchers can always return to ground zero if need be to see what actually came out of the imaging machine. Mr. Richardson talked about how machine learning algorithms aren’t just being used to look at the visual depictions of medical images but also the raw data. This increase in granularity results in more accurate diagnoses. Much of the demand for a solution like Flywheel’s comes from the increasing usage of machine learning to analyze medical imagery.
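To make the de-identify-and-document idea a bit more concrete, here is a minimal Python sketch. It is not Flywheel's implementation or API. The field names, the salted-hash approach, and the `ingest` helper are all assumptions made up for illustration, and a real de-identification policy would cover far more fields than this.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical list of identifying fields (an assumption for this sketch).
IDENTIFYING_FIELDS = {"patient_name", "birth_date", "address"}


def de_identify(record, salt):
    """Return a copy of the record without identifying fields.

    The subject ID is replaced by a truncated salted SHA-256 hash so that
    scans from the same subject can still be grouped together.
    """
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    digest = hashlib.sha256((salt + record["subject_id"]).encode("utf-8")).hexdigest()
    cleaned["subject_id"] = digest[:16]
    return cleaned


def ingest(record, salt, audit_log):
    """De-identify a record and append an entry describing the action."""
    cleaned = de_identify(record, salt)
    audit_log.append({
        "action": "de_identify",
        "subject_id": cleaned["subject_id"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return cleaned


# Made-up example record; none of these values come from a real scan.
scan = {
    "subject_id": "S-001",
    "patient_name": "Jane Doe",
    "birth_date": "1980-01-01",
    "modality": "MRI",
}
audit_log = []
result = ingest(scan, "project-salt", audit_log)
print(result)
print(audit_log)
```

The point of the audit list is simply that every action leaves a timestamped record behind, which is the traceability property described above.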

Machine Learning for Medical Imaging

In a recent article, we talked about some of the latest advances in medical imaging such as “x-rays for all,” a Stanford project that will allow anyone to upload a medical image so that it can be analyzed by a machine learning algorithm that’s at least as good as a radiologist. These projects are a result of research institutions that can access large sets of medical imaging big data and use them to train machine learning algorithms. The research institutions that Flywheel works with aren’t interested in monetizing their data, they’re interested in sharing it so that medical advances can be made across multiple projects. That ability to share data has been historically difficult, and the data being shared also needs to be annotated so that the machine learning algorithms know what they’re looking at. Flywheel helps with curation to ensure data quality and help with labeling and annotation workflows — critical preparation before training models. It also provides the horsepower needed to run the models on demand.

Medical imaging data is unique in that the data sets can be massive. Take electrophysiology data for example (this is when we look at the electrical activity of the heart to find where an arrhythmia – abnormal heartbeat – is coming from). Each measurement section takes up about 200GB. Things are changing though, and we’re moving towards techniques that achieve higher resolutions that result in up to one terabyte per measurement section. It’s impossible to imagine trying to do this sort of research work at scale without unlimited data storage and computing power on-demand.
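A quick back-of-the-envelope calculation shows how fast those numbers add up. The per-measurement sizes (200GB today, up to one terabyte at higher resolution) come from the figures above. The study size of 500 measurements is a made-up assumption, as is the use of decimal units (1TB = 1,000GB).

```python
GB_PER_TB = 1000  # decimal units, an assumption for this sketch


def study_storage_tb(num_measurements, gb_per_measurement):
    """Total storage for a study, in terabytes."""
    return num_measurements * gb_per_measurement / GB_PER_TB


NUM_MEASUREMENTS = 500  # hypothetical study size

current_tb = study_storage_tb(NUM_MEASUREMENTS, 200)
high_res_tb = study_storage_tb(NUM_MEASUREMENTS, 1000)

print(f"Today: {current_tb} TB")              # Today: 100.0 TB
print(f"Higher resolution: {high_res_tb} TB")  # Higher resolution: 500.0 TB
```

Under those assumptions, a single study goes from 100 terabytes to half a petabyte just by moving to the higher-resolution technique.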

Even if the world’s leading research institutions could afford to purchase the hardware needed to perform resource-intensive computing tasks, they certainly can’t afford to have it sit around idle. Flywheel helps by managing these large computing workloads. For example, brain segmentation tools like FreeSurfer can easily have a few hundred sessions that require analyses that each take 15-20 hours to run. Flywheel automates the processing and can elastically scale up the cloud resources necessary to run the jobs in parallel, thus saving a significant amount of time. This type of automation helps with productivity as well as quality and consistency.
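Here is a rough Python sketch of why running jobs in parallel matters so much. It is not how Flywheel schedules work. The figures of 300 sessions and 20 hours per session are assumptions picked from the ranges mentioned above, and `analyze_session` is a hypothetical placeholder that returns immediately instead of running a real tool like FreeSurfer.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def wall_clock_hours(num_sessions, hours_per_session, workers):
    """Estimate elapsed time when sessions run in equal-length batches."""
    batches = math.ceil(num_sessions / workers)
    return batches * hours_per_session


def analyze_session(session_id):
    """Hypothetical stand-in for a long-running analysis job."""
    return f"{session_id}: done"


def run_all(session_ids, workers):
    """Fan the sessions out across a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_session, session_ids))


serial_hours = wall_clock_hours(300, 20, workers=1)
parallel_hours = wall_clock_hours(300, 20, workers=100)
print(serial_hours, parallel_hours)  # 6000 60

results = run_all([f"session-{i}" for i in range(5)], workers=5)
print(results)
```

With those assumed numbers, one machine would grind away for 6,000 hours (250 days), while 100 workers running side by side would finish in 60 hours. The catch is that those 100 workers are only needed for a few days, which is why renting them on demand makes more sense than buying them.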

Flywheel Adopters

Early adopters of Flywheel’s platform include research universities like Stanford, which spends a significant amount of money on compute and storage resources to process ever-increasing amounts of data. The Stanford Center for Cognitive and Neurobiological Imaging (CNI) is a shared facility that provides imaging and compute resources to more than 40 research labs. Stanford needed a cloud platform that could handle increasingly complex computational workloads as well as support HIPAA compliance. That’s why they chose to use Flywheel, and other research institutions have followed. In 2017, Columbia University’s Magnetic Resonance Research Center partnered with Flywheel to connect five New Yawk City biomedical imaging facilities to share data among researchers. Today, nearly 30 top research institutions are using Flywheel’s solution to manage their medical imaging big data to make research happen faster.


Flywheel is in a unique leadership position because their biggest competitor so far has been grant-funded tools that people try to repurpose as platforms. That’s like trying to get a round peg to fit a square hole. Flywheel’s solution has been designed to accept data from all kinds of sources – from the imaging machines to large databases of images stored on servers – and then present that data in a standardized manner so that researchers can access it as needed. It’s solutions like these that will help accelerate the adoption of machine learning for analyzing medical imagery to improve the quality of healthcare for people around the globe.

