Ancestry unzipped

One of the incredible benefits of the open source approach to genomics is that non-practitioners have a chance to see how interpretations are built. Sometimes it's a real "warts and all" picture of science, as statistical and historical details come into conflict with each other.

Genomes Unzipped is a group of forward-thinking geneticists and related professionals who have made their 23andMe genotype data public. Soon after the data release, other people went to work on it. Dienekes Pontikos applied an ancestry prediction algorithm, finding that the Genomes Unzipped authors were, no surprise, mostly European -- but two of them were predicted to have a high component of Ashkenazi Jewish ancestry.

Genomes Unzipped participant Joe Pickrell was surprised to discover he might have a high fraction of ancestry tracing back to Ashkenazi Jews. So he did some investigation of his own:

Several hours after we released our data, however, I was pointed to a post where Dienekes Pontikos wrote about the results of running all our data through his ancestry prediction program. While just about everyone was quite confidently predicted to be almost entirely of northwestern European descent, this analysis gave me a point estimate of 20% Ashkenazi Jewish ancestry. Within hours, several people had asked me about this, and I had no real response. So I decided to take a look at the data myself; some basic analyses are below.

The post is a great summary of some basic methods, including the strengths and weaknesses of the assumptions that underlie them.

I have found over the last several years that this "surprised to discover" reaction is very common among people who have ancestry testing or other genotyping done. Sometimes the surprises end up being well supported by other historical evidence, of which the subject may not have been aware. But more often, the "surprising genealogy" is just an artifact of applying erroneous or oversimplified assumptions in the course of the analysis. I think it is tremendously important to write up case studies where the process leading to a result is laid out step by step, and where the sensitivity of the analysis to various assumptions can be probed.