The Genographic Project

BBC News story

New Scientist story

Information about the project from the Waitt Family Foundation

National Geographic Genographic Project site

As reported in the news stories above, the Genographic Project is a collaborative research undertaking by the National Geographic Society, the Waitt Family Foundation, IBM, and a number of independent research labs around the world. The goal of the project is to sample over 100,000 individuals from diverse global populations in order to achieve a fine-grained understanding of migrations in recent human prehistory.

MSNBC is reporting that the project is funded to the tune of 40 million dollars. For some perspective, this would fund all the research projects under the Physical Anthropology section of NSF for roughly 15 years.

The project leader is Spencer Wells. Wells is best known for his documentary program, "Journey of Man," in which he trekked around the world to illustrate the Y-chromosome evidence for human migrations. The most memorable scene is one in which he corners a bewildered man in a Kazakh village to tell him he has a sequence inferred to be ancestral to a billion Asians. The poor man thought that Wells had come to tell him he was going to die. In a nutshell, this is the effect that anthropological genetic work has come to have on indigenous peoples around the world.

The current project is audacious in its scope. It clearly echoes an earlier proposal, the Human Genome Diversity Project (HGDP), but with even larger samples. If it met all of its goals, the Genographic Project would clarify the course of human movements and population growth since the development of agriculture 10,000 years ago, and possibly even earlier.

Who will do the work?

Here's what Wells says in the FAQ at the Waitt Foundation site:

We assembled a team of top human population geneticists from around the world - 10 principal investigators focusing on indigenous peoples around the world, plus one focusing on ancient DNA, from the USA, Brazil, UK, France, Lebanon, South Africa, Russia, India, China and Australia. They are all experts in their respective fields, very thoughtful scientists, and passionate about the work they do. I'm lucky to be collaborating with them - it's like having the "dream team" of human population genetics.

It is fairly unclear to me just what IBM is going to contribute aside from bioinformatics. I don't know, for example, whether they have someone who can model population movements and generate simulated samples against which to test migration hypotheses. I think it is unlikely that they have anyone modeling natural selection, which is the force most likely to affect the geographic distribution of many human genes, the Y chromosome included.

Should you send in your DNA?

You may be thinking, "It sounds great; I just wish I could participate myself!" Well, if you want to, you can! For only $99.95, you can purchase a Genographic Participation Kit from the National Geographic Store:

With a simple and painless cheek swab you can sample your own DNA. You'll submit the sample through our secure, private, and completely anonymous system, then log on to the project Web site to track your personal results online.
This is not a genealogy test and you won't learn about your great grandparents. You will learn, however, of your deep ancestry, the ancient genetic journeys and physical travels of your distant relatives.

But is it safe? Won't they take your DNA to make tiny clones of you and fetal pig implant organs to support them?

Here is what National Geographic says in their <a href=http://www3.nationalgeographic.com/genographic/faqs_results.html#Q11>FAQ</a> about the DNA samples sent in by people who purchase their "kits":

We will keep your cheek scraping sample only for the Genographic Project. Your sample will not be used for any other purpose without your written permission. The genetic tests we will perform are designed only to research early human origins and movements. The tests do not tell us anything about your health, or about any health problems you (or your family) may have. This is an anthropological study only. Unless you instruct us otherwise, your cells will be destroyed at the conclusion of the Project....During the project, you will have the opportunity to contact Family Tree DNA, the company licensed to perform testing for Genographic Project participants, to request follow up testing if you choose. Unless you do so before the conclusion of our project, your cells will be destroyed and will not be available for follow up testing.

It sounds like a great marketing opportunity, which is not, I think, exactly what anthropologists ought to be promoting.

My advice is: don't send in a swab. That is not to say I distrust the National Geographic Society; in fact, I think this part of the project is pretty innocuous. My main concern is that $100 is a lot to pay for something very likely to tell you what you already know: "Gee, I come from Europe, and my ancestors got there 15,000 years ago from the Near East." Or it may tell you something very likely to be patently false: "Gee, I have a sequence found in Greece and also in one village in northwestern Pakistan. My ancestors must have ridden with Alexander the Great!" Please don't waste your money. It is much more useful to learn about your grandparents.

Of course, the kit does come with the video featuring Spencer Wells. If you're into that kind of thing. He does have a high Q-rating.

If anyone does order one, let me know. I would really like to be able to report on the contents, especially the kind of results they send. I imagine this is only a step removed from those genealogy companies that send you your "family crest" and a story to accompany it, but I would be happy to be proved wrong.

Is this the Human Genome Diversity Project?

The short answer is, "Not exactly, but it comes from some of the same people who brought us that one."

Three things set the project apart, as I understand it. The first is the apparent lack of public funding. This is partly a function of the increased efficiency of genetic research: the work can be done much more cheaply today than would have been possible in 1995. Also, the National Geographic Society has done very well for itself recently by pushing publicity-generating projects like this one. In a way, it is a perfect match, since the project is literally "geographic" and since it involves the possibility of direct public participation. Of course, the lack of public funding also means that the project is not subject to public oversight, which places it beyond the reach of some of the HGDP's critics. To me, this is a matter of some concern. The project's advisory board is chaired by Luca Cavalli-Sforza (Stanford University), who was the main figure behind the HGDP.

Second, there is no guarantee of complete coverage of indigenous peoples. The HGDP had the ostensible goal of sampling intensively across language families, using these and other criteria to identify "ancient" groups worth sampling. Merritt Ruhlen (Stanford University) was one of the principal linguists advocating this approach for the HGDP. He is on the advisory board of this project, so there may be some attempt to do something similar. The New Scientist story says, "Collecting genetic information from relatively isolated populations will be a priority because this will provide the clearest picture of humankind's evolutionary past." Of course, it was this kind of logic that got the HGDP in trouble in the first place. At this point, the project does not explicitly describe how such sampling will be done, so I assume that its sample of indigenous people will mainly include groups who have participated in such research before.

Third, data collection is ostensibly limited to Y-chromosome and mtDNA markers related to migration history. This may also be a consequence of technological change since the mid-1990s. Today, researchers can design "gene chips" to type an individual very rapidly for particular markers of interest, without the effort of obtaining an entire sequence. This is very efficient and low in cost, and I would expect that this is the technology they are using on most of the samples, including all those from the DNA kits.

On the other hand, this methodology throws away much of the interesting data on diversity. Typing known markers instead of sequencing introduces an ascertainment bias. The markers on a chip were chosen because they were already known to vary, so they capture a skewed subset of the variation actually present, with rare variants especially underrepresented. That bias can make it difficult to determine interesting demographic characteristics of the population. For example, it may be harder to tell whether the population expanded at particular times in the past, since expansions leave their signal in an excess of rare variants. These biases may be partially overcome by sampling thousands of individuals, so there is clearly a strategy at work here. But I would be very surprised if a large subset of the individuals were not subjected to sequencing of long genomic regions, particularly those for whom a larger tissue sample is available or a cell line has been produced. I expect there will be microsatellite and SNP data coming out of these samples from other genomic regions in addition to the Y and mtDNA analyses.
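
To make the ascertainment problem concrete, here is a toy simulation. It is only a sketch under stated assumptions, not a description of the project's actual genotyping pipeline. It assumes that sites follow the standard neutral frequency spectrum, in which a derived allele carried by i of n sampled chromosomes occurs with probability proportional to 1/i. It also assumes that a chip includes only sites found to be polymorphic in a small discovery panel. The function names are hypothetical. Comparing the share of rare variants before and after ascertainment shows how chip markers lose exactly the rare variants that an expansion would leave behind.

```python
import random


def neutral_sfs_sample(n, num_sites, rng):
    """Draw a derived-allele count (1..n-1) for each of num_sites segregating
    sites, assuming the standard neutral spectrum: P(count = i) ~ 1/i."""
    counts = range(1, n)
    weights = [1.0 / i for i in counts]
    return rng.choices(counts, weights=weights, k=num_sites)


def ascertain(counts, n, panel_size, rng):
    """Keep only the sites that are polymorphic in a discovery panel of
    panel_size chromosomes drawn without replacement from the n sampled.
    At a site with count c, chromosomes 0..c-1 carry the derived allele."""
    kept = []
    for c in counts:
        panel = rng.sample(range(n), panel_size)
        derived = sum(1 for chrom in panel if chrom < c)
        if 0 < derived < panel_size:
            kept.append(c)
    return kept


def rare_fraction(counts, n, threshold=0.05):
    """Share of sites whose derived-allele frequency is below threshold."""
    return sum(1 for c in counts if c / n < threshold) / len(counts)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200  # chromosomes in the full (hypothetical) sample
    all_sites = neutral_sfs_sample(n, 20000, rng)
    chip_sites = ascertain(all_sites, n, panel_size=4, rng=rng)
    print("rare share, all sites: ", round(rare_fraction(all_sites, n), 3))
    print("rare share, chip sites:", round(rare_fraction(chip_sites, n), 3))
```

With 200 chromosomes and a four-chromosome discovery panel, roughly half of all simulated sites are rare (below 5% frequency), but a much smaller share of the chip sites are. A frequency-spectrum test run on chip data would therefore see far fewer rare variants than actually exist.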

With these caveats, the Genographic Project clearly carries on the legacy of the HGDP. This means we should consider whether the criticisms of the HGDP apply to this project. The most important criticism concerned human rights, in particular the view that the human subjects had not been sufficiently protected. In large part this was for two reasons. First, members of indigenous groups might never have been able to give informed consent without first undergoing specialized training in genetics and medicine. Second, the project therefore depended upon approval from "tribal elders" or other individuals chosen to speak for their groups. Many critics viewed this procedure as fundamentally outside the normal protections of liberal democracies, and such criticisms were never satisfactorily answered.

But this was far from the only criticism of the HGDP; a number of scientific objections questioned the fundamental worth of the project. One concerned the sampling strategy. The idea of capturing DNA from small tribes and linguistic groups that are dwindling in numbers doesn't seem like such a bad one at first glance. But the changes under way in such groups are cultural, not biological. For the most part, although there are exceptions, the people are not becoming extinct, nor are their genes. They are just adopting new lifestyles and joining new groups.

This cultural change certainly complicates the attempt to find patterns of ancient history and migrations. But sampling the dwindling small groups speaking languages that are nearly gone probably won't help. These groups themselves were the product of similar movements and group losses in the past. To be sure, some of these movements are precisely those that the Genographic Project is attempting to recover. But sampling groups rather than locations confounds the effects of cultural and geographic factors leading to human variation. A better sampling strategy would be designed around geographic coordinates instead of linguistic ones.

The issue of sampling strategy is related to the separate issue of analytical method. There was never a clear methodology that satisfied critics that the results would be valid. A sampling strategy that treats groups or language families as the fundamental units of analysis invites cluster-based or tree-based (dendrogram) methods. Since human populations do not fit a tree well, these methods are guaranteed to mislead about relationships. But geography-based methods, such as Cavalli-Sforza's famous principal component (PC) plots of genetic variation over space, also yield misleading results.

Today, most analyses of Y-chromosome and mtDNA variation use "founder analysis." This approach attempts to delineate the earliest movement of people into a region from the most recent common genetic ancestors found in both the region and its presumed source. It is also limited in the information it can generate, and its results are also open to challenge. In particular, the method is sensitive to the age and distribution of discrete markers (the very markers this research is designed to look for). These markers cannot be dated very precisely, and their distributions today may differ from those at the relevant time in the past.
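
The point about dating can be illustrated with a simpler calculation than founder analysis itself. Founder analysis averages mutation counts over many lineages that share parts of their history. Here, instead, mutations on a single lineage are treated as a Poisson process, and we ask how wide an exact 95% confidence interval on its age turns out to be. The rate of one mutation per 20,000 years in the usage example is a hypothetical round number chosen for illustration, not a calibrated rate for any particular marker. The function names are likewise hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect_root(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi], found by bisection.
    Assumes f(lo) and f(hi) have opposite signs."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(k, alpha=0.05):
    """Exact (Garwood) confidence interval for a Poisson mean given an
    observed count k."""
    bracket = 10.0 * (k + 10)  # generous upper end for the root search
    if k == 0:
        lower = 0.0
    else:
        # Lower limit solves P(X >= k) = alpha / 2.
        lower = bisect_root(lambda lam: 1 - poisson_cdf(k - 1, lam) - alpha / 2,
                            0.0, bracket)
    # Upper limit solves P(X <= k) = alpha / 2.
    upper = bisect_root(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, bracket)
    return lower, upper


def age_interval(k, years_per_mutation, alpha=0.05):
    """Point estimate and exact CI, in years, for the age of a lineage that
    carries k mutations, assuming one mutation per years_per_mutation years
    on average (a hypothetical, fixed rate)."""
    lower, upper = poisson_exact_ci(k, alpha)
    return k * years_per_mutation, lower * years_per_mutation, upper * years_per_mutation


if __name__ == "__main__":
    estimate, low, high = age_interval(3, years_per_mutation=20000)
    print(f"estimate {estimate:.0f} years, 95% CI {low:.0f} to {high:.0f} years")
```

At that hypothetical rate, three mutations give a point estimate of 60,000 years, but the exact Poisson interval runs from roughly 12,000 to 175,000 years. Averaging over related lineages narrows this less than averaging over independent ones would, and uncertainty in the mutation rate itself widens it again.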

And of course, there is the overarching assumption of no selection, which for the Y chromosome and mtDNA has become increasingly problematic.

Will companies be profiting from this research?

Apparently, samples from the public kits will be used for no research beyond the Y-chromosome and mtDNA markers useful for a narrow study of migration history.

For the larger samples acquired as part of the "diversity sampling," no such guarantees have been given. Nor should we expect any, because these include thousands of tissue samples that were already taken from people with no conditions attached.

The news stories about the Genographic Project say that commercialization is not the goal. For example, New Scientist <a href="http://www.newscientist.com/article.ns?id=dn7260">reports</a>:

In the 1990s, Luca Cavalli-Sforza at Stanford University in California, US, attempted to set up an even more ambitious project called the Human Genome Diversity Project, to map genetic diversity around the world. But it foundered after opposition from groups representing indigenous peoples, who saw it as an attempt by western companies to profit from their genes.
IBM says the Genographic Project is different as no medical studies will be done, and none of the data will be commercialised. An independent advisory board, including indigenous advocate Tammy Williams of Cape York, Australia, will oversee sampling and research.

I think this is very misleading. For the research to be even minimally useful, it must include markers beyond those on the Y chromosome and mtDNA. Although it has usually been argued otherwise, the fact is that single loci cannot give good information about migration. The evolution of any single locus is stochastic, so the overwhelming majority of the information it might hold about migration depends on how it compares with other loci. But even setting aside the obvious need to look at other regions of the genome, both the Y chromosome and mtDNA are of increasing biomedical interest. Several recent research articles have examined the possibility that normal geographic variants of mtDNA are associated with increased disease risk. So whatever the project's intentions, any public information it produces will be applied in biomedical contexts.
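
The stochasticity argument can be seen in a toy Wright-Fisher simulation. This is a sketch under simplifying assumptions, not a model of the Y chromosome or mtDNA specifically. It assumes neutral loci, a constant population of 100 gene copies, and two populations that split 40 generations ago and exchanged no migrants, with every locus starting at frequency 0.5. Because every locus shares exactly the same population history, any spread in per-locus differentiation is pure drift noise. The function names are hypothetical.

```python
import random


def drift(p, n_genes, generations, rng):
    """Neutral Wright-Fisher drift: each generation, n_genes gene copies are
    drawn binomially from the previous generation's allele frequency p."""
    for _ in range(generations):
        count = sum(1 for _ in range(n_genes) if rng.random() < p)
        p = count / n_genes
    return p


def squared_differences(num_loci, p0, n_genes, generations, seed=0):
    """Squared allele-frequency difference at each of num_loci independent
    loci between two populations that split from one ancestral population
    `generations` ago and never exchanged migrants."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(num_loci):
        p1 = drift(p0, n_genes, generations, rng)
        p2 = drift(p0, n_genes, generations, rng)
        diffs.append((p1 - p2) ** 2)
    return diffs


def expected_squared_difference(p0, n_genes, generations):
    """Neutral expectation: 2 * p0 * (1 - p0) * (1 - (1 - 1/n)^t)."""
    return 2 * p0 * (1 - p0) * (1 - (1 - 1 / n_genes) ** generations)


if __name__ == "__main__":
    diffs = squared_differences(300, p0=0.5, n_genes=100, generations=40)
    print("single-locus range:", round(min(diffs), 4), "to", round(max(diffs), 4))
    print("mean over loci:   ", round(sum(diffs) / len(diffs), 4))
    print("expectation:      ", round(expected_squared_difference(0.5, 100, 40), 4))
```

In this sketch, individual loci range from nearly identical frequencies in the two populations to differences large enough to suggest a much deeper split. The average over a few hundred loci, by contrast, lands close to the neutral expectation of about 0.166. Any single locus is one draw from that spread.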

The real question is whether anyone will profit from it. On this point, a purely private research enterprise can offer nothing but its word, and that of its advisory board. I have no reason to think that these people have any motive to profit directly from biomedical information, and I do not doubt their good intentions. Still, I would rather have some very solid guarantees about the way the information will be used. At this stage, just after the project's announcement, no such guarantees have been made public. The participant with the most to lose is the National Geographic Society, which risks squandering much of the goodwill it has in developing nations if it fails to make explicit how research subjects will be protected.

What unforeseen consequences will this research have?

Of course if I can foresee them, then they are by definition foreseen rather than unforeseen! Nevertheless, there are some consequences that you are not going to hear anyone talking about.

Here's something that won't be reported anywhere but seems fairly obvious. If National Geographic and IBM are working with around ten research centers on a project involving 100,000 genetic samples, then the scale of anthropological genetics has irreversibly changed. Labs that are not now capable of dealing with samples of thousands of individuals fairly quickly are in danger of being shut out of empirical research entirely.

That is not to say that smaller projects are irrelevant. Ancient DNA research will never involve more than a few individuals at a time, and the study of genetic structure within small-scale human societies and primate groups will never involve thousands of samples. But the point is coming, if we have not already reached it, when graduate students in a large lab will be able to tackle these small projects more easily than independent scientists with a small lab. It makes little sense to maintain small labs when economies of scale in DNA sequencing make it much cheaper to obtain more data from larger setups. And as DNA data become cheaper, more reviewers will expect results to be verified by resequencing, so most small labs may end up sending samples out for this reason anyway. Only if technology ultimately provides small, portable solutions that yield sequences in the field will the balance shift to smaller setups. But then one hardly needs to be a molecular specialist to get DNA data, especially if one is using the same computer programs for analysis anyway.

What we will see from the project is probably not greatly different from what we would have seen without it. The project's findings will appear to confirm some arbitrary number of interesting historical population movements. We have already seen this with the Phoenician research project led by Wells, Brian Sykes' research on early Europeans, the "Genghis Khan" sequence, the Cohanim and Lemba connection, and any number of others. Together, they will create an impressive perception of progress toward understanding human history. Individually, each of them will be a flimsy case based on weak evidence.

But they will make for some interesting National Geographic specials.