CODIS – The World’s Largest DNA Database

April 13, 2016. 3 min read

Last week we wrote a piece on DNA privacy. When you buy a DNA testing service from a provider such as 23andMe, Family Tree DNA, or Ancestry DNA, you agree to privacy rules that dictate what your genetic information can be used for. In some cases, you can opt in to have your genetic information “anonymized” and used for health studies. In all cases, however, the door is left open for your DNA to be made available upon request for various legal reasons. We discussed a case where the FBI “went fishing” in a DNA database to see if it could find any matches and then traumatized some poor individual who had nothing to do with the crime in question.

The knee-jerk response here, of course, is “well, if you haven’t done anything wrong, then you have nothing to worry about.” The problem with that statement is that people don’t necessarily want their DNA profile used so that marketers can try to sell them things. But what if your insurance company gave you a discount for disclosing your DNA profile? What if you could live 25 years longer, but only if you disclosed it?

It seems inevitable that you will eventually be pressured into providing a DNA profile for one reason or another. It also seems likely that there will one day be a single centralized database containing all DNA profiles, through which you grant various parties permission to access your genetic profile for any number of reasons. This is what Illumina wants to do with its Helix venture, which has raised $100 million so far. Helix wants to become the genome database you feel secure storing your data with, from which you then authorize other vendors to use your data to provide insights. “Sequence once, query often” is the business model being built.

This raises the question: who has the biggest DNA database right now? 23andMe has about 1.2 million customers. Ancestry DNA was aiming for 1.3 million by the end of 2015, and we can only assume it has reached that number by now. The Family Tree DNA database has just over 783,000 records. Assuming there is no overlap between these three databases, combining them would give us around 3.28 million DNA profiles, or roughly 1% of the U.S. population. In other words, despite all the hype around DNA testing companies, together they have reached only about 1% of the market in the U.S. alone.
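
To make the arithmetic concrete, here is a minimal Python sketch that adds up the three customer counts above and expresses the total as a share of the U.S. population. The population figure of 320 million is our own rough assumption (the article doesn't say which estimate it used), and the helper names are purely illustrative.

```python
# Customer counts quoted in the article (approximate, early 2016).
CONSUMER_DATABASES = {
    "23andMe": 1_200_000,
    "Ancestry DNA": 1_300_000,
    "Family Tree DNA": 783_000,
}

# Assumption: U.S. population of roughly 320 million (not stated in the article).
US_POPULATION = 320_000_000


def combined_profiles(counts):
    """Sum the databases. This is an upper bound on unique people,
    because anyone who tested with more than one company is counted twice."""
    return sum(counts.values())


def share_of_population(profiles, population):
    """Return the number of profiles as a percentage of the population."""
    return 100 * profiles / population


total = combined_profiles(CONSUMER_DATABASES)
print(f"{total:,} profiles, {share_of_population(total, US_POPULATION):.2f}% of the U.S.")
```

Summing the counts treats every record as a distinct person, so the result is best read as a ceiling: any overlap between the three customer bases would push the true number of unique people lower, which is why the no-overlap assumption matters.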

So are these the biggest DNA databases? No. The U.S. government owns the biggest DNA database by far. We assume most readers of this site have managed to stay out of trouble, so they may never have heard of CODIS (Combined DNA Index System) or NDIS (National DNA Index System). In short, CODIS is the FBI’s program for supporting criminal DNA databases, and a key part of CODIS is the database itself, NDIS. This massive database contains 12.2 million offender profiles, 2.6 million arrestee profiles, and 684,000 forensic profiles, giving a grand total of 15.48 million profiles, or about 4.85% of the U.S. population.
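
A quick back-of-the-envelope check of the NDIS numbers, also comparing them against the 3.28 million combined consumer profiles from the previous paragraph. This sketch assumes a U.S. population of roughly 320 million, which is our assumption rather than a figure from the article; with that denominator the share comes out near 4.84%, close to the 4.85% quoted above, and the exact value shifts slightly with whichever population estimate you use.

```python
# NDIS profile counts quoted in the article (approximate).
NDIS_PROFILES = {
    "offender": 12_200_000,
    "arrestee": 2_600_000,
    "forensic": 684_000,
}

# Combined 23andMe + Ancestry DNA + Family Tree DNA figure (no overlap assumed).
CONSUMER_TOTAL = 3_283_000

# Assumption: U.S. population of roughly 320 million (not stated in the article).
US_POPULATION = 320_000_000

ndis_total = sum(NDIS_PROFILES.values())
ndis_share = 100 * ndis_total / US_POPULATION  # percent of the U.S. population
ratio = ndis_total / CONSUMER_TOTAL            # NDIS size relative to consumer total

print(f"NDIS: {ndis_total:,} profiles, {ndis_share:.2f}% of the U.S.")
print(f"About {ratio:.1f}x the combined consumer databases")
```

Dividing the two totals shows NDIS is roughly 4.7 times the size of the three consumer databases combined.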

So what’s the world’s second-largest DNA database? That belongs to the United Kingdom’s National DNA Database (NDNAD), which holds 4.51 million records with estimated duplicates removed. That means the U.K. has a DNA profile for about 7% of its citizens. One reason this number is so high is that nearly all criminal offenses qualify: a DNA sample can be taken without consent, at the time of arrest, from anyone arrested in connection with any recordable offense, even if they have not been charged.


We came away with a few takeaways. First, we’re a long way from having a global DNA database that contains a significant percentage of the population (hence Illumina’s Helix venture). Second, we already have an awful lot of DNA profiles that could be used for medical research (similar to what 23andMe is trying to do). Why can’t we just “anonymize” the entire CODIS and NDNAD databases and use them both to conduct medical studies? On a side note, we wonder when the FBI will begin using DNA phenotyping to identify suspects (if it isn’t already). As we covered in our previous article on the topic, DNA phenotyping involves taking a DNA sample and constructing a facial profile from it, much as a sketch artist would.