My twins each did a 23andMe DNA test and the results verified that they are identical; however, their ancestry composition is not!
How is that possible if their genomes are identical?
-A curious adult from New Jersey
I can see why you’re confused. The same DNA should give the same ancestry results. And yet they haven’t.
This does not mean these companies are doing shoddy work. They aren’t—they are doing outstanding, cutting edge science that brings DNA testing to many, many people.
It is just that the analysis is complicated enough that it is incredibly difficult to get an exact result. There is some wiggle room.
So don’t take percentages as exact numbers. As we talk about in a previous answer, they can vary by 10 or even 20% pretty easily and sometimes even a bit more. So two people might have the same amount of East European even if a DNA test says one has 20% and the other person has 40%.
But even when the percentages are small, this kind of thing is a bit more unsettling when you get different results from the exact same DNA. These differences mostly come from how the computer algorithm splits up the DNA into thousands of windows, analyzing one window at a time. And how blank spots (or “no calls”) in the data affect how the DNA is interpreted and/or split up into those pieces.
These no calls are an inevitable consequence of any test like the ones these companies run.
23andMe reports that on average they get good reads on more than 98% of the markers they look at which is really very good. But it still means that each test has thousands and thousands of spots that could not be read.
And importantly, the same markers do not come up as no calls in different tests. Each time someone’s DNA is read, you can end up with a different 2% being uninterpretable.
These spots on the DNA that can’t be read in a particular assay can tip the ancestry scales one way or another. What might look English with a smattering of these missed reads, looks German with a different set of no calls.
So in the first case a piece of one identical twins’ DNA might look English while the same or an overlapping piece of DNA will look German with the second identical twin. It wouldn’t take too many differences like this to shift enough DNA to make the two not look identical from an ancestry point of view.
All of this is true even though when looked at with another test, the relatedness one, it is obvious these two people are identical twins. Even with the no calls the companies can see that the two have the same DNA.
These companies use what is called reference DNA to figure out where your DNA came from. This reference DNA is the DNA of people whose families have stayed in a part of the world for a long time. Or who have very detailed family trees.
The companies then compare your DNA to the DNA landmarks they found in these families. The parts that match that group of Germans is German, the parts that match that group of people from the British Isles is English and so on.
Sounds easy enough but most people are not like the reference group. They have lots of different ancestral DNA scattered in chunks throughout their DNA. This is where things can get tricky.
To get around this mixed DNA, the computer program analyzes small sections of the DNA at a time. These “windows” have around 100 or so DNA markers in them which translates to thousands of such windows.
The no calls we talked about earlier can affect it so a window can be interpreted in different ways. If by chance the program interprets it as German for twin 1 and English for twin 2, then the two will appear to be a little bit different.
English or German?
Let’s imagine that we are trying to distinguish whether a phrase is German or English but this is all we can read:
The only full word we have is die. Unfortunately die is a word in both English and German. (In English it means to stop living and in German it is a definite article.)
This same sort of thing can happen with DNA. If the no calls happen in the right spots it can be hard to tell the ancestry of that DNA.
One way the program can deal with this is by looking at the windows around it. For example, if a bunch of windows to the right and left are German, then it will call our uninterpretable window German too. And if there are a lot of English windows around, it will call it English.
Imagine this result:
In this case the program would conclude that the unknown window is German because of what is around it. This is called smoothing.
Now imagine that it looks like this:
With German windows on one side and English on the other, it has to make its call without any helpful context. Let’s say this is Twin 1’s result and that the program says it is English. Here now is Twin 1:
Now let’s say Twin 2’s read gives this:
Even though this has the same number of no calls, 6, we can see it is probably German because Katze is the German word for cat. So Twin 2 would end up being this:
Now the twins have different ancestries with the same DNA. Twin 1 is 4/7 English and 3/7 German over this stretch while Twin 2 is 3/7 English and 4/7 German.
And the differences can become larger if nearby windows are also hard to identify because of no calls. Let’s say that the third window can’t be read because of these no calls:
Because with Twin 2 this window is in the middle of a German stretch, it will be called German. The same may not be true for Twin 1. It could get called English or German.
If the window is called English, we now have an even bigger difference between the twins. And this kind of thing can go on, here and there, across the thousands of windows in our DNA.
One Piece of the Puzzle
What I just described is certainly part of the reason for why the same DNA can end up with different results. But it is by no means the only way it can happen.
Analyzing DNA for ancestry is very complicated for lots of reasons (check out this outstanding blog from 23andMe to get a feel for what they are up against). It is technically very challenging.
Still, if there were a way to reduce the number of no calls it might at least make the results more consistent with the same DNA. The problem is that to reduce the no calls, you’d almost certainly need to spend more money for your test. Potentially a lot more money.
So you have the push and pull of affordability versus absolute consistency. Well, not absolute as that would be impossible but near absolute.
All of this shows why ancestry DNA results are really just one part of your genealogy puzzle. The part of the test that identifies relatives who share DNA can provide more of the pieces to fill in your puzzle. And of course good old fashioned family tree building using a variety of public documents is an important part too.
All three together can give you a more complete picture of your ancestry compared to just one or two of the others.
For those interested, here is the actual data:
By Dr. D. Barry Starr, Stanford University