john hawks weblog

paleoanthropology, genetics and evolution

HGDP

  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map from the paper:

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than other parts of the world, and so the HGDP, and HapMap to a lesser extent, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu of south China, who have around 0.006 more than their neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. Hence, the best case is that the data here simply provide greater sensitivity to small amounts of introgression. The worst case is that the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
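    To make the scale of that argument concrete, here is a minimal sketch of the arithmetic. The panel size and the per-SNP frequency increase are hypothetical placeholders (the paper does not report the number of SNPs in this comparison); only the 0.001 difference comes from the heat map.

```python
def mean_frequency_shift(n_snps, n_enriched, per_snp_increase):
    """Change in the mean allele frequency across n_snps SNPs when
    n_enriched of them each rise in frequency by per_snp_increase."""
    if not 0 <= n_enriched <= n_snps:
        raise ValueError("n_enriched must lie between 0 and n_snps")
    return n_enriched * per_snp_increase / n_snps


def enriched_snps_needed(n_snps, target_shift, per_snp_increase):
    """Number of enriched SNPs required to move the mean by target_shift."""
    return target_shift * n_snps / per_snp_increase


# Hypothetical inputs: a 20,000-SNP comparison set, with each enriched SNP
# raised by 0.2 in frequency. Neither number is from the paper.
N_SNPS = 20_000
PER_SNP_INCREASE = 0.2
OBSERVED_SHIFT = 0.001  # north/south difference read off the heat map

needed = enriched_snps_needed(N_SNPS, OBSERVED_SHIFT, PER_SNP_INCREASE)
print(f"Enriched SNPs needed: {needed:.0f} of {N_SNPS}")
```

    Under those assumed inputs, about 100 SNPs out of 20,000 would be enough. Because a selected haplotype carries linked SNPs along with it, a handful of genomic regions could plausibly supply that many; a smaller comparison set would need even fewer.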

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper provides the suggestion of a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: less than one percent of a half billion people is still a very large number, added to five percent of the indigenous population of New Guinea and Australia, and smaller fractions of other island populations. The total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.


    References

    Synopsis: 
    A new paper contradicts earlier work, by suggesting a widespread Denisovan legacy in south China
  • Spatial variation and near-fixed selected alleles

    Thu, 2009-06-11 14:39 -- John Hawks

    A couple of people have asked me about a new paper in PLoS Genetics by Graham Coop and colleagues, titled, "The role of geography in human adaptation." The paper is open access, and while the details of genetic measures and simulations can be hard to follow, I think it's a great example of the way recent work on selection and human diversity has been structured.

    I'll just expand on a few of the topics in the paper, and discuss how they relate to the previous findings about the number and age of selected variants in human populations.

    Here's the paper's abstract:

    Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes—especially population history, migration, and drift—exert powerful influences over the fate and geographic distribution of selected alleles.

    The paper looks for "nearly fixed" genetic differences between populations, and finds relatively few of them. That's relatively well-known; the FST-based test has been done on fewer populations with similar results (e.g., Williamson et al. 2007; Barreiro et al. 2008). This paper has the HGDP panel, which includes many more populations, and therefore is able to add geographic resolution to these older results. They find that the geographic distribution of near-fixed alleles is clinal; there aren't strong boundaries delimiting the geographic distributions of most apparently selected alleles. That means that the same demographic forces affecting neutral genetic variation have also affected recently selected alleles.
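    For readers who have not worked with FST, the following is a minimal sketch of the textbook (Wright) definition for a single biallelic SNP in two equally weighted populations. It is meant only to show why "nearly fixed" differences correspond to FST values close to 1; the estimator used in the paper may differ, and the example frequencies are made up.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic SNP, given the allele frequency in
    each of two equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


# Made-up frequencies: a modest cline versus a nearly fixed difference.
for p1, p2 in [(0.4, 0.6), (0.05, 0.95), (0.0, 1.0)]:
    print(p1, p2, round(fst_two_populations(p1, p2), 3))
```

    A frequency difference of 0.4 versus 0.6 gives an FST of only a few percent, while a fully fixed difference gives 1. That gap is why very few SNPs reach the extreme tail unless selection has been strong or long-lasting.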

    Is that surprising? As we pointed out in our 2007 paper, the recent demographic history of human populations has included a lot of population growth. This means that the number of adaptive mutations should have increased during the last 10,000--20,000 years. High-FST selected alleles can only reflect selected mutations that are older than this (old enough to reach near fixation in one population), or are extraordinarily strong. A few mutations are exceptionally strong in their selective advantages -- SLC24A5 and lactase persistence seem to be examples. But as long as adaptive mutations are intrinsically rare, very few of them could have occurred in the small populations of 20,000 years ago or earlier, even if many happened in the large populations of the Holocene. So I think the new paper actually reinforces the interpretation of acceleration. The pattern we're seeing today with new mutations just can't be a feature of human evolution before around 20,000 years ago.

    If selection is affected by demographic processes, does that mean that it is "weak"? Clearly, "weak" is a matter of scale. Adaptive genes disperse through a spatially structured population very slowly, even if they confer very large fitness advantages. That means that their dispersal is highly dependent upon demographic conditions, such as the disproportionate growth of some populations or occasional long-distance gene flow. Locally, an allele may rapidly increase under selection, but that effect may have little influence on the evolution of distant populations.

    We see that pattern with genes known to be under strong selection in humans, like the ones that help some people resist malaria. Sickle cell, hemoglobin C and E, alpha- and beta-thalassemia, ovalocytosis, and G6PD deficiency all have restricted geographic ranges that parallel the clinal pattern of neutral genes. There is an important difference: the patterns of these genes diverge in areas where malaria risk changes rapidly with geography (like coastal versus inland areas of Mediterranean Europe), and some of them have wide geographic distributions compared to their young haplotype ages (like sickle cell). But even in the latter cases, most are too rare to elevate the FST of surrounding SNP markers. Malaria adaptations are a tremendous example of the way that demographic conditions limit strong selection.

    Africa versus other populations

    Derived alleles are expected to have lower frequencies on average than ancestral alleles. So if a population has a bias toward higher-frequency derived alleles, that may be evidence against neutral evolution. The paper finds that this bias is greater in non-African populations than within Africa:

    The overall genic enrichment is present in all three population comparisons, and each tail seems to be similarly enriched for high- FST genic SNPs. However, the number of derived alleles in each tail does differ substantially and is biased towards derived alleles outside Africa and especially in east Asia. Thus, the statistical evidence for enrichment of events inside Africa is weaker than for the other two populations (we return to this point later).

    In general, populations outside Africa have a genome-wide bias toward higher frequencies of derived alleles. The causes of that bias aren't clear -- ascertainment may account for some of the bias but cannot account for all of it; it's possible that early demographic events may explain some of the bias but the pattern isn't obvious.

    The FST-based tests of neutrality are most powerful when a new allele has swept several rare mutations with it to near-fixation. Rare mutations tend to be derived ones. So the power of the test depends on how many rare mutations there are to start with, and what their frequencies are in other populations that didn't have the same selected allele.

    It's one of many issues that make finding selection in African populations slightly different from elsewhere. I think that Africans have undergone as much, and very possibly more, selection by new adaptive mutations as other populations. But our 2007 work suggested that the modal age of the selection we ascertain in Africa may be older than in other regions. That would be consistent with demographic history, since Late Pleistocene African populations were larger than others. But it's possible that genome-wide features like faster LD decay, higher heterozygosity, and more ancestral versus derived variants may also influence our estimates of the timing and number of selected alleles in Africa.

    Polygenic adaptation

    Toward the end of the paper, the authors discuss the pattern of local adaptation in a more general sense. Why should there be relatively few near-fixed genetic differences between populations, if human ecological changes suggest that local adaptation should have been a powerful force in our recent evolution? One possibility is acceleration -- most of the variants are too recent to have reached near-fixation in any single population.

    But the authors mention another possible influence that we've also been thinking about: epistatic interactions among new variants. For example, lots of skin pigmentation loci are known to have been under recent selection, but only a couple of them have reached near-fixation in any population. The rest are at lower frequencies. Since these alleles all affect the same phenotype, they're subject to diminishing returns. As one lighter-pigment allele becomes common, it reduces the strength of selection on the others. The population doesn't have to fix for any of them; in fact, selection probably cannot drive more than one or two up to fixation since the rest of them compete with each other.

    Over the very long term, this situation would be sorted out. A handful of loci that optimize skin pigmentation might ultimately go to high frequencies or fixation; for some alleles the costs may exceed the benefits and they will disappear. Others, relatively neutral to each other, may fix by drift. But the "very long term" is a span of hundreds of thousands of generations. Here we're talking about a few hundred generations at most. So human populations aren't anywhere near an optimum; they're in a transient where epistatic interactions may be quite important.

    Greg Cochran and I have been discussing this idea for some time. We call it the "Stooge effect". Think of the Three Stooges all trying to run through a door at the same time and getting stuck in the middle. That's what these genes are doing -- all of them are competing to respond to selection, but each is slowed by the presence of the others.
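    The diminishing-returns argument can be illustrated with a toy model. This is only a sketch under strong assumptions that are mine, not taken from the paper or from any published model: haploid individuals, two unlinked loci in linkage equilibrium, no drift, and complete redundancy, so that carrying the favored allele at either locus gives the full advantage s and carrying both gives nothing extra.

```python
def next_freq(p_focal, p_other, s):
    """One generation of selection at a focal locus when the benefit s is
    obtained from the favored allele at EITHER locus (full redundancy)."""
    w_favored = 1.0 + s                # carrier already has the benefit
    w_alternative = 1.0 + s * p_other  # non-carrier benefits only via the other locus
    w_mean = p_focal * w_favored + (1 - p_focal) * w_alternative
    return p_focal * w_favored / w_mean


def simulate(p0, s, generations, two_loci):
    """Frequency of the favored allele at locus 1 after the given number of
    generations, with or without a competing allele at locus 2."""
    p1 = p0
    p2 = p0 if two_loci else 0.0
    for _ in range(generations):
        # Update both loci simultaneously from the previous generation.
        p1, p2 = next_freq(p1, p2, s), next_freq(p2, p1, s)
    return p1


# Illustrative values only: 5 percent advantage, 400 generations.
alone = simulate(p0=0.01, s=0.05, generations=400, two_loci=False)
together = simulate(p0=0.01, s=0.05, generations=400, two_loci=True)
print(f"alone: {alone:.3f}  with a competitor: {together:.3f}")
```

    In this sketch the selective advantage of each allele is s times (1 minus the frequency of the other), so each one slows as its competitor rises. A lone allele is essentially fixed after 400 generations, while two redundant alleles are both still polymorphic.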

    It's not a new idea -- Frank Livingstone used to talk about this general concept with different malaria adaptations. What's new is the increasing evidence that humans are really in a transient with a lot of genes out of equilibrium. It's very possible that for some phenotypes, standing variation has been an epistatic block on the selection of new mutations. For others, the emergence of some new mutations has limited the trajectory of selection on others.

    Conclusion

    All in all, I think this paper is a nice contribution to our understanding of the pattern and rate of recent positive selection in human populations. Certainly, the HGDP sample will continue to be a very informative addition to our understanding of spatial dynamics in ancient humans. The addition of the new HapMap v.3 samples may be even more important, because these represent further regions with roughly the same discovery power as the initial three HapMap samples. And of course, we have the 1000 Genomes sample coming up, adding significant potential for discovering rarer selected variants.

    References:

    Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, et al. 2009. The Role of Geography in Human Adaptation. PLoS Genet 5(6): e1000500. doi:10.1371/journal.pgen.1000500

  • Overstating the obvious

    Tue, 2009-03-24 01:41 -- John Hawks

    I'm reading this interesting paper by Joseph Pickrell and colleagues, titled, "Signals of recent positive selection in a worldwide sample of human populations". The paper recounts the results of a selection scan in the Human Genome Diversity panel, which was reported in two publications last year. This is an interesting sample because it includes individuals from 53 population samples around the world.

    I was waiting to present any observations about selection from the HGDP set until Pritchard's lab had published on them, since the initial publications had mentioned that this analysis was forthcoming. Now that it's appeared, I'll be pointing to a lot of these data in upcoming posts.

    So I was reading with great interest. Then I found this statement:

    Reports of ubiquitous strong (s = 1-5%) positive selection in the human genome (Hawks et al. 2007) may be considerably overstated (8).

    I'm a little concerned that someone reading that might think that Pickrell and colleagues had actually tested our hypothesis about the number of recent strongly selected alleles. I'm also uncertain about the word, "ubiquitous", which means "everywhere." I mean, does that really sound like the kind of word I would use? It's just begging for trouble. It's like saying there's "ubiquitous" evidence of Neandertal contribution to the later European gene pool. Even if I thought it was true, I wouldn't put it in a paper!

    We reported that roughly seven percent of genes appeared to be selected. Pickrell and colleagues list a rather large number of candidate loci for selection, and don't give any estimate or test of the number genome-wide. I think one might be able to count the regions listed in the data supplement for an estimate of what they thought was important enough to list, but I can't get the supplement yet. Since these candidate loci require 16 supplementary figures to list, maybe there are a lot of them. They do list a subset of more than 110 in the paper itself.

    So what's the basis for saying we overstated anything? They suggest one reason for caution about the interpretation of candidate loci for selection:

    We find that putatively selected haplotypes tend to be shared among geographically close populations. In principle, this could be due to issues of statistical power: broad geographical groupings share a demographic history and thus have similar power profiles. However, strongly selected loci are expected to show geographical patterns largely independent of demography—depending on the relevant selection pressures, they can be highly geographically restricted despite moderate levels of migration, or spread rapidly throughout a species even in the presence of little migration (Nagylaki 1975; Morjan and Rieseberg 2004) (8).

    But wait a minute! If a gene were selected strongly and still polymorphic in human populations, it shouldn't be very old. So it can't have spread rapidly throughout the human species even in the presence of little migration. There hasn't been any time for this kind of spread.

    To give a little mathematical perspective, one common way of modeling the dispersal of an advantageous gene is the Fisher diffusion wave model. In a Fisher wave, the gene grows logistically at any single point in space, and the allele frequencies form a wave of constant shape that travels through space at a constant velocity. That velocity in a population uniform across 2-dimensional space is σ times the square root of s, where s is the selection coefficient and σ the root mean square dispersal distance -- basically, the average distance a person moves between his birth and the birth of his children.

    If we want to know about dispersal of selected genes in early agriculturalists, we will need to know how far they move -- that's generally less than 10 km on average. So a gene selected strongly with a 5 percent advantage should move around 2.2 km/generation. Over the 400 generations since the beginning of agriculture, we'd expect a new allele to have dispersed across an area with a radius of less than 1000 km.
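    The numbers above can be checked with a few lines of code. This sketch uses the velocity exactly as stated in this post, σ times the square root of s, with the upper-bound dispersal distance of 10 km; the function names are just labels for the illustration. Some textbook parameterizations of the Fisher wave include an extra constant factor, so treat the result as an order-of-magnitude figure.

```python
import math


def fisher_wave_speed(sigma_km, s):
    """Wave speed in km per generation, using the post's formula sigma * sqrt(s)."""
    return sigma_km * math.sqrt(s)


def spread_radius(sigma_km, s, generations):
    """Distance the wave front travels in the given number of generations."""
    return fisher_wave_speed(sigma_km, s) * generations


SIGMA_KM = 10.0    # root mean square dispersal distance (upper bound in the text)
S = 0.05           # 5 percent selective advantage
GENERATIONS = 400  # roughly the time since the beginning of agriculture

speed = fisher_wave_speed(SIGMA_KM, S)
radius = spread_radius(SIGMA_KM, S, GENERATIONS)
print(f"{speed:.2f} km/generation, {radius:.0f} km in {GENERATIONS} generations")
```

    With these inputs the speed comes out a little over 2.2 km per generation and the radius just under 900 km, which matches the "less than 1000 km" figure. Because the speed scales with the square root of s, even quadrupling the selective advantage only doubles the distance covered.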

    So in other words, it's just implausible that a selected allele would have a geographic distribution very different from drift, at least under the Fisher wave model. But obviously, some alleles have gone a lot farther than 1000 km in the last 10,000 years. Humans don't disperse strictly according to a Gaussian distribution, as assumed by the Fisher model; they sometimes disperse long distances. This can have a large impact on the spread of an advantageous allele. But it is an irregular phenomenon -- a stochastic event.

    Let's consider the results a bit further. Here's a passage from page 1:

    We find extensive sharing of putative selection signals between genetically similar populations, and limited sharing between genetically distant ones. In particular, Europe, the Middle East, and Central Asia show strikingly similar patterns of putative selection signals.

    Which is exactly what we would predict from the history of these populations. Most signals of selection in Europe are Neolithic in date. The Neolithic was not only a time of massive population growth, but also the time of greatest mismatch between the human population and its novel agricultural environment. The dispersal of Neolithic lifeways from West Asia into Europe, and the recurrent incursions of Central Asian languages westward across the steppe into Europe and southward into the Indian subcontinent are the major features of the last 10,000 years of history in those regions. Don't we expect them to share a lot of selection? And if it took the massive migrations and interactions in those regions to generate this shared pattern of selection, shouldn't we expect other regions of the world, which lacked as extensive long-distance movements, to share fewer?

    In this case, the critical information for evaluating the evidence is historical and archaeological. We can't just say that the candidate loci for selection have a similar geographic distribution to those that aren't selected. We need to evaluate the likelihood that they would have some other distribution. That likelihood is very low for most instances of selection, but may be high for a fraction of cases, or for some regions where long-distance dispersal was a more important aspect of population history.

    So if we have a locus that is inconsistent with drift on the basis of linkage, we can reject drift. What if the geographic distribution is still consistent with drift? Should we doubt the linkage analysis? I don't see why -- basic biogeography says that most recently selected genes should have similar geographic distributions to drift.

    References:

    Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res (early online) doi: 10.1101/gr.087577.108
