I'm attending a symposium on genetics and genealogy of the African Diaspora this morning. Fatimah Jackson is here giving a very interesting talk about her genetic work in Africa and among African-Americans, and in particular her idea of "ethnogenetic layering" (Jackson 2008), which is basically a strategy for describing the fine-scale makeup of present-day populations by examining their genetic ancestry from different regions of the Old World.
Part of her research has involved characterizing the regional distribution of mtDNA haplotypes within African populations. She shared some newer data with us, but I thought it worth pointing people to an earlier publication by Bert Ely, Jackson and others (2006), which gave rise to some strong insights about the poverty of current sampling of African populations.
The study reports on a sample of 3725 mtDNA sequences (HVS-I) from a diversity of sub-Saharan African populations. That's quite a massive sample of sequences, certainly on the scale that had been available earlier. It is substantially more numerous than
"When a sample of 74 Gullah/Geechee mtDNA sequences were compared with the sub-Saharan database, approximately half of the mtDNAs were identical to two or more mtDNAs in the database and only seven mtDNAs matched mtDNAs from a single ethnic group. The remaining 28 mtDNAs were not identical to any sequence in the expanded database.
Similar results were obtained when the 97 African-American AFDIL mtDNAs were compared with the databases. Approximately half (49) of the mtDNAs were identical to multiple sequences in the original database. As with the Gullah/Geechee sample, fewer than 10% of the sequences matched a sequence from a single ethnic group, and 40% of the sequences did not have any perfect match in the database" (Ely et al. 2006:3).
There are two aspects worth noting in those results. On the one hand, the common haplotypes -- the ones that the African-American samples were likely to have a match to -- were not regionally specific within Africa. They are shared by many ethnic groups, distributed across the continent.
On the other hand, 40% of the African-American sequences have no match among the nearly 4000 sequences taken from continental Africa. That's astounding to me, just from the standpoint of sampling. Most of the common haplotypes will emerge within a relatively small sample, so to find something you haven't already seen, you have to sample disproportionately more -- in fact, exponentially more -- individuals. You can just imagine how many tens or hundreds of thousands of sequences you would have to gather to have an adequate representation of African mtDNA for this purpose -- the purpose of finding matches for a large fraction (say, more than 90 percent) of African-American mtDNA haplotypes that originated in Africa (there is of course a substantial fraction whose recent maternal ancestry originated somewhere else).
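To get a feel for why coverage grows so slowly with database size, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the number of distinct haplotypes, the Zipf-like frequency distribution, and the function names are hypothetical and are not taken from Ely et al. (2006). It draws a random reference database of a given size and estimates what fraction of newly sampled individuals carry a haplotype already present in that database.

```python
import random
from itertools import accumulate

def zipf_cum_weights(n_haplotypes, exponent=1.0):
    """Cumulative weights for a hypothetical Zipf-like haplotype distribution:
    the haplotype of rank r has weight 1 / r**exponent (an assumption)."""
    return list(accumulate(1.0 / (rank ** exponent)
                           for rank in range(1, n_haplotypes + 1)))

def match_fraction(reference_size, n_queries, cum_weights, rng):
    """Fraction of query individuals whose haplotype appears at least once
    in a randomly drawn reference sample of `reference_size` individuals."""
    haplotypes = range(len(cum_weights))
    reference = set(rng.choices(haplotypes, cum_weights=cum_weights,
                                k=reference_size))
    queries = rng.choices(haplotypes, cum_weights=cum_weights, k=n_queries)
    return sum(q in reference for q in queries) / n_queries

if __name__ == "__main__":
    rng = random.Random(1)            # fixed seed so runs are repeatable
    cum = zipf_cum_weights(20_000)    # 20,000 distinct haplotypes (assumed)
    for size in (100, 1_000, 10_000, 100_000):
        print(size, round(match_fraction(size, 2_000, cum, rng), 2))
```

Under a skewed distribution like this one, each tenfold increase in the reference database tends to buy only a modest further gain in the matched fraction, because the new sequences are mostly repeats of common haplotypes. The actual numbers depend entirely on the assumed frequency distribution, so this illustrates the shape of the problem rather than estimating the real sample size needed.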
One of the features of the symposium is a discussion of the relevance of ancestry testing. Jackson is an expert in this field and well-recognized -- she appeared in several of the "African-American Lives" episodes, for example.
With several companies and organizations now offering various kinds of ancestry tests, these have become increasingly affordable. But the results are often confusing; people don't know how to interpret them. Some of that confusion was evident in questions here at the symposium: as part of a year-long discussion group, several local people submitted cheek swabs for ancestry interpretation. The results are often poor, because the sampling of recent populations is inadequate to really answer many questions. Where were today's populations 300 years ago? Have we adequately sampled the variation of present populations?
Research like Jackson's has shown that even widespread and numerous samples provide a real poverty of information about mtDNA diversity. The situation is vastly worse if we turn to autosomal variation, because the samples are smaller and more scattered.
Of course, for many anthropological purposes, the samples we have today are tremendously useful. My work on recent selection, for example, has made leaps and bounds on samples of a few hundred individuals.
But the converse case -- you take a person and ask whether you can diagnose their origin -- requires much larger samples to gain any statistical confidence in the general case. There may be haplotypes that are highly specific in their present geographic distribution -- but then, all of those are rare haplotypes, and you have to be lucky enough to find them within the comparative sample that the organization or company has gathered.
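The "lucky enough" condition can be put in rough numbers. As a back-of-the-envelope sketch, assume the comparative sample is a simple random draw from the source population (real databases are not, so treat this as optimistic). Then a haplotype at population frequency p appears at least once among n sampled individuals with probability 1 - (1 - p)^n. The function names below are hypothetical.

```python
import math

def prob_in_reference(freq, n):
    """Probability that a haplotype at population frequency `freq` appears
    at least once among `n` randomly sampled individuals."""
    return 1.0 - (1.0 - freq) ** n

def sample_size_needed(freq, confidence):
    """Smallest n for which prob_in_reference(freq, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))

if __name__ == "__main__":
    # Reference sample sizes needed for a 95% chance of containing a
    # haplotype at frequencies of 1%, 0.1% and 0.01%.
    for freq in (0.01, 0.001, 0.0001):
        print(freq, sample_size_needed(freq, 0.95))
```

For a 95 percent chance of inclusion, the required sample comes out to roughly 3/p: about 300 individuals for a haplotype at 1 percent frequency, about 3,000 at 0.1 percent, and about 30,000 at 0.01 percent. That is for one haplotype from one source population, before asking the same of every region a person's ancestors might have come from.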
I'm still listening here and some of the later presentations will touch on the issues of genetic ancestry testing more directly. But I thought I would share a quote I really liked, with which Jackson ended her comments:
"I'm not against genetic ancestry testing. It's fun. But in the final analysis, you have to look in the mirror, and you decide who you are."
Ely B, Wilson JL, Jackson F, Jackson BA. 2006. African-American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups. BMC Biology 4:34. doi:10.1186/1741-7007-4-34
Jackson FLC. 2008. Ethnogenetic layering (EL): an alternative to the traditional race model in human variation and health disparity studies. Ann Hum Biol 35:121-144. doi:10.1080/03014460801941752