john hawks weblog

paleoanthropology, genetics and evolution

spatial dynamics

  • Spatial dispersal, parallel adaptation, and the "Stooge effect"

    Thu, 2010-10-14 00:06 -- John Hawks

    Peter Ralph and Graham Coop have an interesting paper in the current Genetics, titled, "Parallel Adaptation: One or Many Waves of Advance of an Advantageous Allele?" [1]

    Fisher [2] famously considered the case of an advantageous allele spreading through a spatially distributed population, showing that its spread takes the form of a "wave of advance". This work was the foundation for much later progress in understanding the spatial dynamics of organisms.

    As I discussed in 2008 ("Overstating the obvious"), one of the consequences of the Fisher wave model for human evolution is that advantageous alleles will spread very slowly through the population. During the course of the Holocene, a strongly selected mutation might move only across a radius of a thousand or so kilometers. That provides one explanation for why new advantageous alleles haven't spread very far beyond their points of origin -- they just haven't had time yet.
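    Here is a back-of-the-envelope sketch of that estimate. The classical Fisher wave speed is c = 2√(sD). With a diffusion coefficient D = σ²/2, where σ is the per-generation dispersal distance, this becomes c = σ√(2s) km per generation. The 25-year generation, the 10,000-year Holocene span, and the σ and s values are illustrative assumptions, not figures from the post.

```python
import math

def fisher_wave_speed(sigma_km: float, s: float) -> float:
    """Asymptotic Fisher wave speed in km per generation.

    Uses c = 2*sqrt(s*D) with diffusion coefficient D = sigma**2 / 2,
    which simplifies to c = sigma * sqrt(2*s).
    """
    return sigma_km * math.sqrt(2.0 * s)

def holocene_radius_km(sigma_km: float, s: float,
                       years: float = 10_000, generation_years: float = 25) -> float:
    """Distance covered by a wave of advance over the given span (assumed values)."""
    generations = years / generation_years
    return fisher_wave_speed(sigma_km, s) * generations

for sigma in (10.0, 20.0):      # assumed per-generation dispersal distances, km
    for s in (0.01, 0.05):      # assumed selection coefficients
        print(f"sigma={sigma:>4} km, s={s:.2f}: ~{holocene_radius_km(sigma, s):,.0f} km")
```

    With these inputs the radius runs from roughly 570 km to about 2,500 km, which brackets the "thousand or so kilometers" in the text.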

    Another reason why an allele might not have spread widely is interference from other alleles with similar effects. I mentioned this process last year ("Spatial variation and near-fixed selected alleles"):

    Greg Cochran and I have been discussing this idea for some time. We call it the "Stooge effect". Think of the Three Stooges all trying to run through a door at the same time and getting stuck in the middle. That's what these genes are doing -- all of them are competing to respond to selection, but each is slowed by the presence of the others.

    Ralph and Coop have cleverly combined the "Stooge effect" phenomenon with spatial dispersal. They suppose a case in which two separate advantageous mutations arise in different geographic locations, each affecting the same trait. Each begins to spread independently as a Fisher wave of advance. What happens when they meet?

    As they show, the dynamics in this case give rise to a static equilibrium -- once the "waves of advance" meet, they stop moving, forming a stable boundary. A new favorable mutation makes headway only so long as it has no equally favorable mutation to compete against.

    I like the way they used both analytical approaches and simulations to come to this outcome. The appearance of stable boundaries in a reaction-diffusion system has long been known (demonstrated first by Alan Turing, actually!). But to my knowledge, no one has considered this specific case from an analytical perspective.

    The Fisher equation is not all that simple for most students to work with. If you become familiar with the equation, you will notice the key aspect is that it has two separate components -- a logistic (or reaction) component representing the increase in frequency at a single point in space, and a diffusion component representing the dispersal across space.
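    In one dimension the equation can be written ∂p/∂t = D ∂²p/∂x² + s p(1 − p). The s p(1 − p) term is the logistic (reaction) part and the D ∂²p/∂x² term is the diffusion part. The minimal explicit finite-difference sketch below assumes NumPy, zero-flux boundaries, and arbitrary units with D = s = 1. It is not anyone's published code, but it shows the front settling toward the classical Fisher-KPP speed 2√(sD).

```python
import numpy as np

def simulate_fisher(length=400.0, dx=0.5, dt=0.01, t_end=100.0, D=1.0, s=1.0):
    """Explicit Euler scheme for dp/dt = D * p_xx + s * p * (1 - p).

    Returns the grid and a list of (time, front position) pairs, where the
    front is the rightmost grid point with p >= 0.5.
    """
    assert dt <= dx * dx / (2 * D), "explicit scheme would be unstable"
    x = np.arange(0.0, length, dx)
    p = np.where(x < 10.0, 1.0, 0.0)           # allele established at the left edge
    record_every = int(round(10.0 / dt))       # record the front every 10 time units
    fronts = []
    for step in range(1, int(round(t_end / dt)) + 1):
        padded = np.pad(p, 1, mode="edge")     # zero-flux (reflecting) boundaries
        laplacian = (padded[2:] - 2 * p + padded[:-2]) / dx**2
        p = p + dt * (D * laplacian + s * p * (1 - p))   # diffusion + logistic growth
        if step % record_every == 0:
            fronts.append((step * dt, x[p >= 0.5].max()))
    return x, fronts

_, fronts = simulate_fisher()
(t1, x1), (t2, x2) = fronts[4], fronts[9]      # t = 50 and t = 100
print(f"measured speed ~ {(x2 - x1) / (t2 - t1):.2f}; theory 2*sqrt(s*D) = 2.00")
```

    The measured speed comes out slightly below 2. The front approaches its asymptotic speed only slowly, and the grid is coarse. Shrinking dx and dt and running longer moves the estimate closer to the theoretical value.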

    The muscle of the dispersal process comes from the logistic component. Without the intrinsic growth of the selected allele, the dispersal of individuals across the wavefront would not carry many copies of the selected allele into new geographic areas. If the local selective advantage disappears, the wave of advance rapidly stalls. A static equilibrium arises, with the frequency of the selected allele forming a cline that tracks the local selection pressure.

    Ralph and Coop's model approximates this case, in a dynamical sense. Each new selected mutation forms an increasing zone in which the selective advantage of other mutations is zero. When those other mutations encounter this zone, they form a stable cline. The cline is stable in the short term, but the diffusion component still disperses copies of an allele; they just lack the muscle to continue their deterministic expansion.
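    The sketch below is a toy version of the stalled-wave idea. It uses a hypothetical parameterization and is not Ralph and Coop's model or code. Two alleles, A and B, each carry the same advantage s over the ancestral type and no advantage over each other. Under haploid selection in continuous time, dp_A/dt = s·p_A·w and dp_B/dt = s·p_B·w, where w = 1 − p_A − p_B is the ancestral frequency, and each allele also diffuses. Once w is used up, nothing but diffusion acts on the A/B boundary.

```python
import numpy as np

def laplacian(p, dx):
    """Second difference with zero-flux (reflecting) boundaries."""
    padded = np.pad(p, 1, mode="edge")
    return (padded[2:] - 2 * p + padded[:-2]) / dx**2

def two_waves(length=200.0, dx=0.5, dt=0.01, t_end=150.0, D=1.0, s=1.0):
    """Alleles A and B, each with advantage s over the ancestral type only,
    start at opposite ends of a 1-D range (haploid selection, continuous time)."""
    assert dt <= dx * dx / (2 * D), "explicit scheme would be unstable"
    x = np.arange(0.0, length, dx)
    pA = np.where(x < 5.0, 1.0, 0.0)
    pB = np.where(x > length - 5.5, 1.0, 0.0)   # mirror image of A's starting patch
    for _ in range(int(round(t_end / dt))):
        w = 1.0 - pA - pB                        # ancestral-type frequency
        pA, pB = (pA + dt * (D * laplacian(pA, dx) + s * pA * w),
                  pB + dt * (D * laplacian(pB, dx) + s * pB * w))
    return x, pA, pB

x, pA, pB = two_waves()
boundary = x[np.argmin(np.abs(pA - 0.5))]
print(f"A/B boundary near x = {boundary:.1f} (range 0-200)")
print(f"largest remaining ancestral frequency: {np.max(1 - pA - pB):.1e}")
```

    Running with a larger t_end leaves the boundary where the waves met but gradually widens it. Once the ancestral type is gone, only neutral diffusion moves copies across the line.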

    The most interesting simulations by Ralph and Coop show the two-dimensional case, in which the stable boundaries emerge in a "tessellation" pattern.

    Figure 6 from Ralph and Coop (2010), showing "tessellations" in 2-D simulations of waves of advance.

    The lower three panels in the figure show the stability of the boundaries between the selected alleles. The alleles proceed to fixation locally, but their dispersal stops where they come into contact with other adaptive alleles. Over the very long term, the population will mix -- the diffusion process will slowly carry all these alleles throughout the species' range. Look at the process after a million generations and the entire zone will be gray. But this dispersal occurs at the neutral rate, with the diffusion term as the only force driving it.
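    A rough sketch shows why mixing after contact is so slow. A wave of advance crosses a distance L in about L/c generations, with c = σ√(2s). Neutral diffusion needs on the order of (L/σ)² generations for typical displacements to reach L. The values of L, σ, and s below are assumed for illustration only.

```python
import math

def wave_generations(distance_km, sigma_km, s):
    """Generations for a Fisher wave moving at sigma * sqrt(2 s) km/generation."""
    return distance_km / (sigma_km * math.sqrt(2 * s))

def neutral_generations(distance_km, sigma_km):
    """Order of magnitude for neutral spread: solve sigma * sqrt(t) = distance for t."""
    return (distance_km / sigma_km) ** 2

L, sigma, s = 1000.0, 10.0, 0.05   # assumed illustrative values
print(f"wave of advance: ~{wave_generations(L, sigma, s):,.0f} generations")
print(f"neutral spread:  ~{neutral_generations(L, sigma):,.0f} generations")
```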

    What about humans?

    My graduate student Zach Throckmorton and I have been working in this area for a while now. One of the things that impresses us is the way that much more interesting dynamics can emerge when you alter the assumptions. I learned some of this stuff by talking to Frank Livingstone, who gave a lot of thought to these issues of spatial dispersal and selection as applied to malaria resistance alleles.

    In particular, Frank thought about the case where one allele has a slightly larger advantage than another. In some contexts, this allows the "better" allele to overtake and swamp the expansion of the "weaker" (but nonetheless adaptive) one. In others, the two come to a near standstill, one displacing the other only very gradually. Much depends on the timing of the two mutations and the local conditions controlling their initial dispersal.

    Ralph and Coop briefly consider this case in their paper, noting that the difference in fitness advantage of two alleles will allow one to advance into the range of the other, albeit at a slower rate. In humans, we may be seeing a smaller subset of cases, where one or more of the alleles have not yet established a wavefront. In these cases, the arrival of another wave can disrupt the spatial pattern of the rarer allele. The diploid case gives rise to the possibility of more complex epistases. Well-defined boundaries between selected alleles are rare, and where they occur (as may be the case with HbC and HbS in Africa), many have focused on negative epistasis as an explanation.
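    A simplified model can illustrate why a fitness difference still lets one wave advance, only more slowly. This is an assumption-laden sketch, not the paper's derivation. Use haploid selection with fitnesses 1 + s₁ for the stronger allele A, 1 + s₂ for the weaker B, and 1 for the ancestral type. Allele A then changes at rate p_A(s₁ − s̄), where s̄ is the mean advantage. Where the ancestral type is absent, this reduces to a logistic term with rate s₁ − s₂. A's wave into B's range should therefore move at about 2√((s₁ − s₂)D) rather than 2√(s₁D).

```python
import math

def dpA_dt(pA, pB, s1, s2):
    """Selection term for allele A under haploid selection (diffusion omitted)."""
    mean_adv = s1 * pA + s2 * pB
    return pA * (s1 - mean_adv)

def front_speed(s, D):
    """Fisher wave speed 2*sqrt(s*D) for a logistic growth rate s."""
    return 2.0 * math.sqrt(s * D)

s1, s2, D = 0.05, 0.04, 50.0   # assumed; D = sigma**2 / 2 with sigma = 10 km
for pA in (0.1, 0.5, 0.9):
    pB = 1.0 - pA              # ancestral type absent: only A and B remain
    assert math.isclose(dpA_dt(pA, pB, s1, s2), (s1 - s2) * pA * (1 - pA))

print(f"A into ancestral range: {front_speed(s1, D):.2f} km/generation")
print(f"A into B's range:       {front_speed(s1 - s2, D):.2f} km/generation")
```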

    Also, alleles are unlikely to substitute perfectly for each other. In many cases, they may work synergistically -- individuals carrying two selected alleles that affect the same function may outperform those carrying only one such allele. At some point, new selected mutations may start to have diminishing returns, even on a trait like skin pigmentation where dozens of alleles may have been selected in widespread human populations. So the current distribution may to some extent be "frozen", but by a more complicated dynamic than the simple intersection of waves of advance.

    As Coop and colleagues showed last year [3], and as we discussed in 2007 [4], only a few genes have approached local fixation in recent human evolution. The current spatial pattern of recently selected alleles doesn't look like a tessellation with many alleles near local fixation. Over most of the Old World, it looks like populations have a very large number of very new alleles, far from fixation, with few above 70 percent in frequency.

    So the specific scenario in this paper by itself probably does not explain the overall empirical pattern in humans. But if we consider the current pattern as a transient, approximating the early stages of dispersal for many selected alleles, we may not be terribly far off the mark.

    Mutation-limited evolution

    This is a long, dense paper, and there's a lot in it. One further aspect of the paper that I think is essential is the way that Ralph and Coop reiterate the basic point that more people means more mutations. In their case, they focus on population density over space (which, multiplied by the area of the range, gives population number) as a constraint on the number of possible adaptive mutations. They apply this idea as a hypothesis to account for parallel adaptations that may have emerged in recent human evolution.

    Multiple mutational origins are likely if the characteristic length is shorter than the physical dimensions of the region. Eurasia measures >8000 km across, and so Table 1 suggests that multiple origins at a single base pair are very unlikely at the lower population density. On the other hand, if the mutational target is large, then multiple origins are likely at low densities, while at high densities independent origins are ubiquitous. The complementary cases of (ρ = 2, μ = 10⁻⁸) and (ρ = 0.002, μ = 10⁻⁵) give identical characteristic lengths of 3000 km, although the timescale on which the mutations spread differs. Thus for these two parameter combinations we can expect a few mutations to dominate within continents and for multiple mutations to be common in a population spread across an area the size of Eurasia. Obviously these calculations are very crude, as population densities vary through space and time, and dispersal across continents is not simply a function of geographic distance and individual dispersal. Nevertheless, these calculations suggest that it is plausible that for adaptive traits with reasonable mutational targets (e.g., a change anywhere within a gene or pathway) even low population densities can lead to parallel adaptation across an area the size of Eurasia, and higher densities almost certainly will.
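    The two "complementary" parameter sets in the quoted passage have the same product ρμ. That product is the rate at which new copies of the mutation arise per unit area per generation, ignoring ploidy, and it is presumably why the two sets yield the same characteristic length. The quick check below is a sketch that uses only the numbers quoted above. It confirms the shared product and compares the 3000 km length to the >8000 km width of Eurasia.

```python
import math

EURASIA_WIDTH_KM = 8000   # from the quoted passage (">8000 km")
CHAR_LENGTH_KM = 3000     # characteristic length quoted for both cases

cases = {
    "rho=2,     mu=1e-8": (2.0, 1e-8),
    "rho=0.002, mu=1e-5": (0.002, 1e-5),
}
products = {name: rho * mu for name, (rho, mu) in cases.items()}
for name, prod in products.items():
    print(f"{name}: rho*mu = {prod:.1e} new mutations per km^2 per generation")

# Multiple origins are expected when the characteristic length is shorter than the region.
print("multiple origins expected across Eurasia:", CHAR_LENGTH_KM < EURASIA_WIDTH_KM)
```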

    We note that as human population densities have increased dramatically over time, so too has the probability of parallel adaptation. It is interesting therefore to note that a number of recent human adaptations (e.g., sickle cell alleles) involve repeated changes at very small mutational targets in relatively small geographic areas, while older adaptations from single changes (e.g., skin pigmentation) are more broadly spread.

    They are describing a scenario in which small human populations would have been mutation-limited -- that is, the number of new mutations is small, making it unlikely that adaptive mutations will happen in any given generation. In such populations, the rate of adaptation is limited by the availability of new mutations. At the extreme -- in the very small effective sizes of Pleistocene human populations -- the rate of adaptation may be extremely slow and regional populations may come to differ at many weakly selected loci, which spread very slowly.

    As the population grows, strongly adaptive mutations become more and more likely to happen somewhere in the species' range. Yet they are still relatively rare -- meaning that they have an opportunity to spread fairly far before encountering another equally strongly selected mutation affecting the same trait.

    This process can give rise to very large differences on a continental scale, even when the selection pressures in different regions do not differ. In humans, the dispersal of selected alleles across space may have been significantly accelerated by actual dispersals of populations. It is not a mere coincidence that very widespread alleles in Eurasia also tend to be much older than 20,000 years -- long-distance dispersals prior to that time had a higher chance of leaving a lasting influence on subsequent populations.

    But as the population gets bigger and bigger, parallel mutations are more and more likely to happen. As Ralph and Coop point out, at the extreme of large population size and likely mutations, you shouldn't see any new mutations emerging and spreading over very large areas. Any of these mutations would be very likely to encounter other new mutations that do the same thing.

    Is this likely in humans? Clearly some mutations have happened recurrently. Making a broken gene is easy -- there's a large mutational target, since a large fraction of nonsynonymous substitutions might do the job. So if there's a net selective advantage to breaking a gene, we ought to see that happen recurrently in human populations.

    In contrast, if the mutational target is very small, then mutations will still be rare even in a very large population. If only one base change can have an adaptive effect, that precise change will happen less than once in 10⁹ births (remember that not just any mutation at a site, but some particular mutation is what we may need). If a rare duplication or gene conversion is the necessary change, then it may be much rarer.
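    The mutation-limited argument can be put into numbers with two standard approximations. First, a diploid population of N individuals produces about 2Nμ new copies of a given change per generation. Second, each new copy escapes drift with probability about 2s, which is Haldane's approximation for a beneficial mutation with heterozygous advantage s. Multiplying by the mutational target, the number of distinct changes that would work equally well, gives the rate of successful origins. The values of N, μ, s, and target size below are hypothetical and were chosen only to show the scaling.

```python
def established_origins_per_generation(N, mu_per_change, s, target_size=1):
    """Expected number of new adaptive mutations per generation that escape drift.

    N: diploid population size (assumed); mu_per_change: rate per gamete of one
    specific change (assumed); s: heterozygous advantage; target_size: number of
    distinct changes with the same effect (hypothetical).
    """
    supply = 2 * N * mu_per_change * target_size   # new copies per generation
    p_establish = 2 * s                            # Haldane's approximation, small s
    return supply * p_establish

mu = 3e-9       # hypothetical rate for one particular base change
s = 0.01        # hypothetical heterozygous advantage
for N in (10_000, 1_000_000):
    for target in (1, 300):     # single base pair vs. "break the gene" (hypothetical size)
        rate = established_origins_per_generation(N, mu, s, target)
        print(f"N={N:>9,}, target={target:>3}: wait ~{1 / rate:,.0f} generations")
```

    Both a larger population and a larger target shorten the wait in direct proportion. This is why "breaking a gene" can recur, while a single specific base change stays rare even in big populations.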

    Looking across the last few million years, when human population numbers were much smaller than in the Holocene, we can be pretty sure that some aspects of our evolution were mutation-limited. The changes that took hold in our ancestors were the ones that happened, and that survived the winnowing of genetic drift. Many changes that would have been adaptive didn't happen in our ancestors. They just weren't lucky enough.

    But some of those changes would still be adaptive now, if we could get them. And we have had much larger numbers in the last 10,000 years. Homo erectus needed these mutations, but we only now are seeing them selected in the human population.

    Malaria adaptation

    Hemoglobinopathies are among the cases of easy mutations -- where breaking a gene is adaptive. It's not just any broken version of alpha- or beta-globin that does the job, though. The hemoglobin needs to be impaired in certain ways to impede the parasites while maintaining blood function. This provides many of the classic cases of human adaptation, and Ralph and Coop turn to this system for examples of parallel adaptation:

    The sickle cell allele HbS at the β-globin gene in humans provides a particularly interesting case of putative parallel adaptation. The HbS allele (β6 Glu-Val) has been driven to intermediate frequencies by selection within the past 10,000 years due to increased resistance to malaria of heterozygotes for the allele (HALDANE 1949; ALLISON 1954; CURRAT et al. 2002; KWIATKOWSKI 2005). The HbS allele is present on at least four major distinct haplotypes in Africa, each at intermediate frequency within a different geographic region; the haplotypes are named after the population sample where they were first discovered (Central African Republic, Senegal, Benin, and Cameroon). This is consistent with multiple origins of this single-base-pair change. Note that a distinct, malaria resistance allele, HbC (β6 Glu-Lys), has also arisen in Africa at the same codon as the HbS allele (TRABUCHET et al. 1991; AGARWAL et al. 2000; WOOD et al. 2005a), increasing our confidence that the mutational input was high enough to allow multiple types to arise. However, FLINT et al. (1998) thought the hypothesis of multiple new mutations arising at a single base pair was extremely unlikely and proposed that it was more likely that gene conversion had spread a single mutation across multiple haplotypes.

    The theory we have developed can be used to assess the plausibility of the multiple mutational origins of the sickle cell allele, by exhibiting parameter combinations that yield characteristic lengths consistent with the separation of the sample locations. [Recall that the wave of advance, and thus also our model, works in the case of heterozygote advantage (ARONSON and WEINBERGER 1975).] The different HbS haplotypes co-occur within a few thousand kilometers of each other (see Table 5 of FLINT et al. 1998) (noting that these locations are unlikely to reflect the geographic mutational origins, and mutations will have been spread by large population movements). As the HbS changes occur at a single base pair, the mutation rate would have been 10⁻⁸, and we take an s = 0.05 (as in CURRAT et al. 2002). If human dispersal at that time was well approximated by a Gaussian kernel with σ = 100 km, then a characteristic length of 1000 km would require an effective density of individuals of ρ = 25 km⁻², while if σ = 10 km, then we would require only ρ = 2.5 km⁻². This latter set of parameters does not seem unrealistic, considering our knowledge of population density and dispersal parameters, so our model suggests that the hypothesis of multiple origins is not unreasonable.
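    The two numerical cases in this passage imply a simple proportionality. Holding μ = 10⁻⁸, s = 0.05, and a 1000 km characteristic length fixed, the required density scales linearly with σ: 25 km⁻² at σ = 100 km and 2.5 km⁻² at σ = 10 km. The sketch below rescales from that quoted reference point. It also adds a dependence on the target length as χ⁻³ (a one-over-(d + 1) power in two dimensions). That part is an assumption for illustration, not something stated in the excerpt.

```python
# Reference point quoted in the passage (mu = 1e-8 and s = 0.05 held fixed throughout).
REF_SIGMA_KM = 100.0
REF_RHO_PER_KM2 = 25.0
REF_CHI_KM = 1000.0

def required_density(sigma_km, chi_km=REF_CHI_KM):
    """Effective density (per km^2) giving characteristic length chi_km.

    Assumes rho is proportional to sigma / chi**3. Only the linear sigma
    dependence is confirmed by the two quoted cases; the chi**3 part is an
    assumption.
    """
    return (REF_RHO_PER_KM2 * (sigma_km / REF_SIGMA_KM)
            * (REF_CHI_KM / chi_km) ** 3)

print(required_density(100.0))          # 25.0, the quoted case
print(required_density(10.0))           # ~2.5, matching the second quoted case
print(required_density(10.0, 2000.0))   # assumed chi^-3 scaling: 8x lower
```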

    I think they've got the basic idea correct here, but there are some additional details to consider. The distribution of HbE is not quite so easy to understand if parallel mutations are really so likely, and of course there is the negative epistasis of different alleles (and the thalassemias) which impacts their dispersal ability when they become moderately common. The dynamic may be of similar form to the one described here, but boundaries between alleles may be reinforced by the fitness costs of carrying multiple ones.

    This situation raises the issue of path dependence. Some mutations have "first mover" advantages. Once they are common, other adaptive mutations may still occur -- even mutations that are better from the standpoint of fitness -- but be lost or grow very slowly because their net fitness advantage over the common mutant is slight. Where HbE is common, new HbS alleles are unlikely to invade quickly. Where HbS is common, new HbE mutants are similarly unlikely to invade -- even though HbE has a higher fitness.
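    The crude sketch below illustrates the first-mover effect. It treats each replacement as simple logistic (genic) selection and ignores the heterozygote advantage and epistasis that actually govern the hemoglobin variants. Under that simplification, the time for a variant to rise from frequency p₀ to p₁ scales as 1/s. A variant whose net advantage over an established resident is ten times smaller therefore takes ten times longer to become common. The selective advantages used here are hypothetical.

```python
import math

def logistic_time(s, p0, p1):
    """Generations for dp/dt = s*p*(1-p) to carry p from p0 to p1."""
    return math.log(p1 * (1 - p0) / (p0 * (1 - p1))) / s

p0, p1 = 0.01, 0.5
t_vs_ancestral = logistic_time(0.05, p0, p1)   # assumed advantage over the unprotected type
t_vs_resident = logistic_time(0.005, p0, p1)   # assumed slight net edge over a common resident variant
print(f"rise against ancestral type:   ~{t_vs_ancestral:.0f} generations")
print(f"rise against resident variant: ~{t_vs_resident:.0f} generations")
```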

    Network effects among genes may also dominate the spatial dynamics. HbS spread most widely in populations that were already Duffy null, and in which G6PD deficiency was rapidly increasing. The first conditioned the parasite environment: P. vivax was at a strong disadvantage in Duffy null populations, and P. falciparum made up most of the parasite load. G6PD deficiency should have affected the relative advantage of HbS increasingly as it became more common. Those are two loci among many that alter malaria dynamics in Africa compared to South and Southeast Asia.

    Conclusions

    There is much more to say about this paper -- it's 22 journal pages. But I think I've given an impression of what's there and how the ideas may impact our interpretation of recent human evolution. Many of the central concepts were presaged by earlier work in 2007 and 2008, as reviewed here on the blog. The new analytical and simulation work, I really like.

    Hopefully we can get out some shorter papers that will focus on aspects of these problems as applied to humans. A message that comes across very clearly in our work and this new paper is that different time periods in our evolutionary history must have had very different selection dynamics. Pleistocene humans were not only in a different ecology than us, they experienced a radically lower potential for adaptation.


    References

  • The other story about the mammoth DNA

    Wed, 2010-03-24 00:04 -- John Hawks

    I got to writing about a story a couple of years ago, and then stalled out. That happens every so often -- remember, most of my research-related entries are my own notes. You can only imagine how many half-written posts I have, but the AI on my computer has gotten better and better at archiving them.

    In this case, the topic of the half-written post has lately grown in relevance, so I've revisited it. In the summer of 2008, Thomas Gilbert and (many) colleagues reported on a phylogenetic analysis of 18 mtDNA genomes from extinct woolly mammoths.

    That's pretty cool, by the way. We now know a lot more about woolly mammoth mtDNA variation than we knew about human mtDNA variation in 1980.

    The mammoth mtDNA is an example of something slightly different from the usual phylogeography -- it adds the dimension of time. Call it phylotemporogeography, if you like. The best comparison? Neandertals -- a group for which the number of mtDNA sequences is very similar, over a similarly wide Palearctic geographic range. I wrote about Neandertal phylogeography last year ("Neandertal races?"), and the topic will surely return sometime this year.

    Different mammoth mtDNA clades, which originated more than a million years ago, apparently became extinct at different times. The paper divided the mammoth mtDNA variation into two clades, which diverged approximately 1.7 million years ago. These two clades have different geographic distributions. One, which the authors termed "clade I", was broadly distributed across Siberia and Beringia. The other, "clade II", appears to have been restricted to one area of Arctic Siberia, between the Taymyr Peninsula and the Lena River. Each of these clades has highly restricted diversity; taking all the mammoth mtDNA sequences together, they are roughly as diverse as the within-subspecies diversity in living elephants. So the deep branch dividing the two clades accounts for much of what little diversity mammoths had.

    The interesting thing is that the two clades also have different temporal distributions, based on the radiocarbon dates associated with the remains. The geographically restricted clade II is systematically earlier. The time distributions overlap somewhat, but there is no clade II mtDNA after 30,000 years ago, while clade I lasts up to the extinction of the mammoths in the early Holocene.

    First question: why the deep branch? The simple answer is probably that it's just one of those things. It's difficult to weigh the importance of different parts of the geographic range of mammoths, so I hesitate to guess whether the relatively smaller region of clade II mammoths is "peripheral". It's not at a geographic extreme, but it's hard to judge the migration potential among these regions.

    The region occupied by a minor clade doesn't have to be peripheral or geographically isolated. The oldest branch point in a mtDNA tree is unlikely to be evenly balanced, and given that one clade is likely to be less numerous than the other, it is also likely to be geographically restricted. For all we know, the spatial distribution found among these mammoth mtDNAs is perfectly consistent with neutrality.
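    The point about unbalanced root splits can be checked against a standard neutral result. Under the coalescent, the number of sampled lineages descending from one randomly chosen side of the root is uniformly distributed on 1, …, n − 1. For the 18 genomes in the study, the sketch computes the chance that the smaller clade holds at most 4 of them, roughly a quarter of the sample. It does this both exactly and by simulating the equivalent Pólya-urn (random lineage-splitting) process. The calculation ignores the fact that the mammoth samples differ in age, so it is only a rough guide.

```python
import random
from fractions import Fraction

def exact_prob_minor_at_most(n, k):
    """P(smaller root clade has <= k of n leaves), with the root split uniform on 1..n-1."""
    hits = sum(1 for a in range(1, n) if min(a, n - a) <= k)
    return Fraction(hits, n - 1)

def simulate_prob_minor_at_most(n, k, reps=100_000, seed=1):
    """Grow the tree forward: at each step a uniformly chosen lineage splits.

    This Polya-urn scheme reproduces the coalescent's root-split distribution.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        left, right = 1, 1                      # the two lineages below the root
        for _ in range(n - 2):
            if rng.random() < left / (left + right):
                left += 1
            else:
                right += 1
        hits += min(left, right) <= k
    return hits / reps

n, k = 18, 4
print(f"exact:     {float(exact_prob_minor_at_most(n, k)):.3f}")
print(f"simulated: {simulate_prob_minor_at_most(n, k):.3f}")
```

    Nearly half of neutral trees of this size would have a minor root clade this small or smaller, so a lopsided deep split needs no special explanation.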

    Moreover, given the disappearance of clade II after 30,000 years ago, there aren't very many clade I sequences contemporary with clade II. We don't really know that they weren't evenly balanced at that time -- nor do we know what mtDNA clades may have been present in the broader range of mammoths across Europe and Beringia (although subsequent papers may have given some information on this).

    Second question: why the replacement of one clade by another? The authors first considered whether the mammoth mtDNA might have undergone a selective sweep:

    All of the observed substitutions appear to be between closely related amino acids. For those proteins having a close homolog with an experimentally determined structure (namely, COX1, COX2, COX3, and Cytb), we also modeled the structure of the mammoth proteins. All substitutions appear in regions on the surface or in loop regions that neither seem essential for proper folding nor would be expected to alter protein function in any obvious way. Therefore, the evidence from the modeled structures suggest [sic] that it is unlikely that the nonsynonymous differences found in the mitochondrial genomes of the two mammoth clades have resulted in any physiological disparities, and thus a selective advantage for clade I based on mtDNA sequence differences alone is not expected (Gilbert et al. 2008:8331).

    I think the authors have done as much analysis of this question as possible, given the available data, but I still think this is very weak evidence against selection as an explanation for the clade II extinction. After all, positively selected mtDNA variants are unlikely to change function in a major way -- big changes being much more likely to be bad under the usual Fisher model of adaptation.
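    The reasoning from Fisher's geometric model can be illustrated with a short simulation. Place a phenotype at distance d from an optimum in n dimensions. Apply mutations of fixed size r in random directions, and count how often the mutant lands closer to the optimum. Small mutations turn out to be beneficial about half the time, while large ones almost never are. The values of n, d, r, and the number of replicates are arbitrary illustrative choices.

```python
import numpy as np

def prob_beneficial(r, n=20, d=1.0, reps=50_000, seed=0):
    """Fraction of random mutations of size r that move a phenotype closer to the optimum.

    The optimum sits at the origin; the current phenotype is at distance d.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros(n)
    z[0] = d
    directions = rng.standard_normal((reps, n))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # uniform on the sphere
    mutants = z + r * directions
    return float(np.mean(np.linalg.norm(mutants, axis=1) < d))

for r in (0.01, 0.1, 0.5, 1.0):
    print(f"mutation size {r}: P(beneficial) ~ {prob_beneficial(r):.3f}")
```

    The simulated fractions should track Fisher's large-n approximation, 1 − Φ(r√n / 2d), which falls from near one half toward zero as r grows.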

    At any rate, the alternative hypothesis is local extinction, taking a geographically-localized clade with it.

    A more likely alternative is that the loss of clade II is a consequence of its restricted geographical distribution, because taxa with small ranges are generally more prone to extinction compared with widespread taxa. It is therefore conceivable that clade II was lost because of a demographic bottleneck resulting in genetic drift or a local population extinction.

    This seems contradictory. Given that there are no noticeable phenotypic differences between these clades, and that mtDNA clades I and II coexisted in the Lena-Kolyma region, a purely local demographic bottleneck doesn't make much sense. Now, there are alternatives that retain mtDNA neutrality -- for example, a demographic replacement of the Arctic Siberian mammoths by populations expanding from elsewhere (either east or south). This might have been driven by selection involving other aspects of physiology, enhanced by climate forcing. For instance, a long-lasting locally adapted population might give way to a more generalized form due to climate oscillations.

    Bottom line: mammoths were a dynamic population, capable of high mobility and rapid clade replacements on the scale of tens of thousands of years. And the Late Pleistocene was a time of high population turnover even across what should have been ideal mammoth habitat. That dynamism is not unusual for large, long-lived mammals, and is something we should be looking for in the DNA phylogeography of Late Pleistocene hominins.

    References:

    Gilbert MTP and 32 others. 2008. Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial sequences. Proc Nat Acad Sci USA 105:8327-8332. doi:10.1073/pnas.0802315105

  • Quote: Fisher on the limits of diffusion

    Thu, 2009-09-17 22:16 -- John Hawks

    R. A. Fisher and Sewall Wright introduced diffusion approximation methods into genetics; Fisher (1937) was the first to consider spatial dispersal using a reaction-diffusion model. I found this quote a useful expression of his acknowledgment of the limits of the model:

    The use of the analogy of physical diffusion will only be satisfactory when the distances of dispersion in a single generation are small compared with the length of the wave. In reality diffusion is a complex process, compounded often of the diffusion of gametes, and that of larvae, in addition to adult forms; a more exact treatment than that supplied by a simple coefficient would involve the interaction of these components, and the stages at which the selective advantage was enjoyed. So far as it is applicable, the analogy of physical diffusion, therefore, greatly simplifies the problem (355-356).

    The paper has no references.
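    Fisher's condition can be made roughly concrete. For a logistic wave with diffusion coefficient D = σ²/2, the width of the front scales as √(D/s) = σ/√(2s). The ratio of per-generation dispersal σ to wave width is therefore about √(2s), independent of σ. This order-of-magnitude sketch assumes that scaling; the exact width depends on how one defines it. It suggests the approximation is comfortable under weak selection and strains as s grows.

```python
import math

def dispersal_to_width_ratio(s):
    """sigma / wave width, using width ~ sqrt(D/s) with D = sigma**2 / 2 (so sigma cancels)."""
    return math.sqrt(2 * s)

for s in (0.001, 0.01, 0.05, 0.2):
    print(f"s = {s}: sigma / width ~ {dispersal_to_width_ratio(s):.2f}")
```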

  • Spatial variation and near-fixed selected alleles

    Thu, 2009-06-11 14:39 -- John Hawks

    A couple of people have asked me about a new paper in PLoS Genetics by Graham Coop and colleagues, titled, "The role of geography in human adaptation." The paper is open access, and while the details of genetic measures and simulations can be hard to follow, I think it's a great example of the way recent work on selection and human diversity has been structured.

    I'll just expand on a few of the topics in the paper, and discuss how they relate to the previous findings about the number and age of selected variants in human populations.

    Here's the paper's abstract:

    Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes—especially population history, migration, and drift—exert powerful influences over the fate and geographic distribution of selected alleles.

    The paper looks for "nearly fixed" genetic differences between populations, and finds relatively few of them. That finding is already fairly well known; FST-based tests on fewer populations have given similar results (e.g., Williamson et al. 2007; Barreiro et al. 2008). This paper uses the HGDP panel, which includes many more populations, and is therefore able to add geographic resolution to these older results. They find that the geographic distribution of near-fixed alleles is clinal; there aren't strong boundaries delimiting the geographic distributions of most apparently selected alleles. That means that the same demographic forces affecting neutral genetic variation have also affected recently selected alleles.
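    For readers unfamiliar with FST, here is a minimal two-population version that shows why near-fixed differences stand out. It uses the simple heterozygosity-based definition F_ST = (H_T − H_S)/H_T for a single biallelic site. That definition is an assumption for illustration; the paper's own estimator may differ.

```python
def fst(p1, p2):
    """Biallelic F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

for p1, p2 in [(0.5, 0.5), (0.6, 0.4), (0.9, 0.2), (1.0, 0.0)]:
    print(f"p1={p1}, p2={p2}: F_ST = {fst(p1, p2):.3f}")
```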

    Is that surprising? As we pointed out in our 2007 paper, the recent demographic history of human populations has included a lot of population growth. This means that the number of adaptive mutations should have increased during the last 10,000--20,000 years. High-FST selected alleles can only reflect selected mutations that are older than this (old enough to reach near fixation in one population), or are extraordinarily strong. A few mutations are exceptionally strong in their selective advantages -- SLC24A5 and lactase persistence seem to be examples. But as long as adaptive mutations are intrinsically rare, very few of them could have occurred in the small populations of 20,000 years ago or earlier, even if many happened in the large populations of the Holocene. So I think the new paper actually reinforces the interpretation of acceleration. The pattern we're seeing today with new mutations just can't be a feature of human evolution before around 20,000 years ago.
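    The "old enough or extraordinarily strong" argument follows from how long a sweep takes. Treat the sweep as deterministic logistic growth from a single copy, 1/(2N), to 1 − 1/(2N). That takes about (2/s)·ln(2N − 1) generations. The population size, generation time, and selection coefficients below are illustrative assumptions.

```python
import math

def sweep_generations(s, N):
    """Logistic time from 1/(2N) to 1 - 1/(2N) under dp/dt = s*p*(1-p)."""
    return 2 * math.log(2 * N - 1) / s

N, generation_years = 10_000, 25        # assumed
for s in (0.005, 0.01, 0.05):           # assumed selection coefficients
    g = sweep_generations(s, N)
    print(f"s = {s}: ~{g:,.0f} generations (~{g * generation_years:,.0f} years)")
```

    Under these assumptions, only an allele with s around 0.05 or more gets close to fixation within roughly 10,000 years. Weaker alleles that arose in the Holocene would still be on the way up.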

    If selection is affected by demographic processes, does that mean that it is "weak"? Clearly, "weak" is a matter of scale. Adaptive genes disperse through a spatially structured population very slowly, even if they confer very large fitness advantages. That means that their dispersal is highly dependent upon demographic conditions, such as the disproportionate growth of some populations or occasional long-distance gene flow. Locally, an allele may rapidly increase under selection, but that effect may have little influence on the evolution of distant populations.

    We see that pattern with genes known to be under strong selection in humans, like the ones that help some people resist malaria. Sickle cell, hemoglobin C and E, alpha- and beta-thalassemia, ovalocytosis, and G6PD deficiency all have restricted geographic ranges that parallel the clinal pattern of neutral genes. There is an important difference: the patterns of these genes diverge in areas where malaria risk changes rapidly with geography (like coastal versus inland areas of Mediterranean Europe), and some of them have wide geographic distributions compared to their young haplotype ages (like sickle cell). But even in the latter cases, most are too rare to elevate the FST of surrounding SNP markers. Malaria adaptations are a tremendous example of the way that demographic conditions limit strong selection.

    Africa versus other populations

    Derived alleles are expected to have lower frequencies on average than ancestral alleles. So if a population has a bias toward higher-frequency derived alleles, that may be evidence against neutral evolution. The paper finds that this bias is greater in non-African populations than within Africa:

    The overall genic enrichment is present in all three population comparisons, and each tail seems to be similarly enriched for high-FST genic SNPs. However, the number of derived alleles in each tail does differ substantially and is biased towards derived alleles outside Africa and especially in east Asia. Thus, the statistical evidence for enrichment of events inside Africa is weaker than for the other two populations (we return to this point later).

    In general, populations outside Africa have a genome-wide bias toward higher frequencies of derived alleles. The causes of that bias aren't clear. Ascertainment may account for some of it but cannot account for all of it, and early demographic events may explain some more, but the pattern isn't obvious.

    The FST-based tests of neutrality are most powerful when a new allele has swept several rare mutations with it to near-fixation. Rare mutations tend to be derived ones. So the power of the test depends on how many rare mutations there are to start with, and what their frequencies are in other populations that didn't have the same selected allele.

    It's one of many issues that make finding selection in African populations slightly different from elsewhere. I think that Africans have undergone as much, and very possibly more, selection by new adaptive mutations as other populations. But our 2007 work suggested that the modal age of the selection we ascertain in Africa may be older than in other regions. That would be consistent with demographic history, since Late Pleistocene African populations were larger than others. But it's possible that genome-wide features like faster LD decay, higher heterozygosity, and a higher proportion of ancestral relative to derived variants may also influence our estimates of the timing and number of selected alleles in Africa.

    Polygenic adaptation

    Toward the end of the paper, the authors discuss the pattern of local adaptation in a more general sense. Why should there be relatively few near-fixed genetic differences between populations, if human ecological changes suggest that local adaptation should have been a powerful force in our recent evolution? One possibility is acceleration -- most of the variants are too recent to have reached near-fixation in any single population.

    But the authors mention another possible influence that we've also been thinking about: epistatic interactions among new variants. For example, lots of skin pigmentation loci are known to have been under recent selection, but only a couple of them have reached near-fixation in any population. The rest are at lower frequencies. Since these alleles all affect the same phenotype, they're subject to diminishing returns. As one lighter-pigment allele becomes common, it reduces the strength of selection on the others. The population doesn't have to fix for any of them; in fact, selection probably cannot drive more than one or two up to fixation since the rest of them compete with each other.
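    The diminishing-returns argument can be illustrated with a deliberately extreme toy model. This model is an assumption for illustration, not the one used in the paper. It has two haploid loci, each with a derived "lighter" allele. Carrying either derived allele gives fitness 1 + s, and carrying both gives no further benefit. Assuming the loci stay in linkage equilibrium, the marginal advantage of one allele is s(1 - p)/(1 + sp), where p is the frequency of the other allele. That advantage shrinks as the other allele spreads.

```python
def effective_s(s: float, p_other: float) -> float:
    """Marginal advantage of a derived allele over its ancestral allele when the
    other locus already confers the full benefit at frequency p_other
    (complete redundancy: one derived allele is as good as two)."""
    return (1 + s) / (1 + s * p_other) - 1


def two_locus(p_a: float, p_b: float, s: float,
              generations: int) -> list[tuple[float, float]]:
    """Deterministic haploid selection at two redundant loci, assuming linkage
    equilibrium each generation. Carriers of either derived allele have
    fitness 1 + s; carriers of neither have fitness 1."""
    trajectory = [(p_a, p_b)]
    for _ in range(generations):
        # Mean fitness: only individuals lacking both derived alleles have fitness 1.
        w_bar = 1 + s * (1 - (1 - p_a) * (1 - p_b))
        # Every carrier of a derived allele has fitness 1 + s.
        p_a, p_b = p_a * (1 + s) / w_bar, p_b * (1 + s) / w_bar
        trajectory.append((p_a, p_b))
    return trajectory


def one_locus(p: float, s: float, generations: int) -> list[float]:
    """The same allele spreading with no redundant competitor."""
    trajectory = [p]
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    s, gens = 0.05, 300  # hypothetical values
    paired = two_locus(0.01, 0.01, s, gens)
    alone = one_locus(0.01, s, gens)
    for t in (0, 100, 200, 300):
        print(f"gen {t}: with competitor {paired[t][1]:.3f}, alone {alone[t]:.3f}")
```

    Both alleles still rise, but the allele with a competitor always lags behind the solitary one. Once either allele is common, the other's marginal advantage is close to zero, so the approach to fixation becomes very slow.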

    Over the very long term, this situation would be sorted out. A handful of loci that optimize skin pigmentation might ultimately go to high frequencies or fixation; for some alleles the costs may exceed the benefits, and they will disappear. Others, relatively neutral with respect to each other, may fix by drift. But the "very long term" is a span of hundreds of thousands of generations. Here we're talking about a few hundred generations at most. So human populations aren't anywhere near an optimum; they're in a transient where epistatic interactions may be quite important.

    Greg Cochran and I have been discussing this idea for some time. We call it the "Stooge effect". Think of the Three Stooges all trying to run through a door at the same time and getting stuck in the middle. That's what these genes are doing -- all of them are competing to respond to selection, but each is slowed by the presence of the others.

    It's not a new idea -- Frank Livingstone used to talk about this general concept with different malaria adaptations. What's new is the increasing evidence that humans are really in a transient with a lot of genes out of equilibrium. It's very possible that for some phenotypes, standing variation has been an epistatic block on the selection of new mutations. For others, the emergence of some new mutations has limited the trajectory of selection on others.

    Conclusion

    All in all, I think this paper is a nice contribution to our understanding of the pattern and rate of recent positive selection in human populations. Certainly, the HGDP sample will continue to be a very informative addition to our understanding of spatial dynamics in ancient humans. The addition of the new HapMap v.3 samples may be even more important, because these represent further regions with roughly the same discovery power as the initial three HapMap samples. And of course, we have the 1000 Genomes sample coming up, adding significant potential for discovering rarer selected variants.

    References:

    Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, et al. 2009. The Role of Geography in Human Adaptation. PLoS Genet 5(6): e1000500. doi:10.1371/journal.pgen.1000500

  • Upcoming appearances

    Sun, 2009-02-08 13:55 -- John Hawks

    I will be giving two public lectures out of town later this week.

    The biggest is this Thursday evening, February 12, when I will be giving the Darwin Day lecture at the University of Wisconsin-Whitewater. This is a really great venue, and I'm really looking forward to it! So I'm bringing out all the good stuff:

    Neandertals, Darwin and the Sicilian Mafia: What do they have in common?

    If you're in the SE Wisconsin area, the lecture is Thursday 2/12, at 7:00 pm, in the Young Auditorium at UWW.

    Earlier in the week, on Wednesday, I'll be giving a lecture in the Human Genetics department at the University of Chicago. This talk will cover some of my current research on recent selection in humans, as well as the connections between our evolutionary history and documented written history. The title is:

    Spatial dynamics of positive selection, language dispersals, and human history

    If you're familiar with UC, you're ahead of me in finding the place. The talk will be Wednesday 2/11, at 4:00 pm in CLSC 101.

  • Surfing and recent selection

    Sat, 2009-01-10 19:45 -- John Hawks

    Genetic Future and Gene Expression have commented today on the relative roles of selection and demography in shaping the genetic differences between populations. They are reacting to a paper by Hofer and colleagues (2009) that examined the differences in frequency among human populations for a number of genetic markers, including STR (microsatellite), SNP and insertion-deletion mutations.

    That paper's abstract:

    Several studies have found strikingly different allele frequencies between continents. This has been mainly interpreted as being due to local adaptation. However, demographic factors can generate similar patterns. Namely, allelic surfing during a population range expansion may increase the frequency of alleles in newly colonised areas. In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to which extent allele frequency differs among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.

    OK, so that abstract concludes that demography (including population bottlenecks and geographic dispersals) is a better explanation for the genome-wide pattern of interpopulation frequency differences than selection.

    I agree completely.

    When I teach Anthropology 105, our introduction to biological anthropology, I always force my students to learn how to calculate Wright's FST. They really don't like it. They think it's cruel and unusual punishment to have to do math in an anthropology course.

    Well, if they're going to take my courses, they'll have to get used to it. Because with me, it's all about the math.

    So, let's consider FST. The statistic represents the reduction in heterozygosity in subpopulations due to isolation, compared to the expectation under panmixia. The expression is:

    $$F_{ST} = \frac{H_T - H_S}{H_T}$$

    where $H_S$ is the average heterozygosity of the subpopulations, and $H_T$ is the expected heterozygosity of the total population, given the allele frequencies.

    I always use a two-allele locus as an example in class, and I always choose a case in which the frequency of an allele in one subpopulation is 70 percent, and the frequency of the same allele in the other subpopulation is 30 percent. Big difference in frequencies -- the frequency is 40 percentage points higher in one population than in the other. In fact, that frequency difference is well within the range considered "extreme" in the current paper by Hofer and colleagues.

    Well, if the subpopulations are the same size, the average allele frequency is 50 percent. So the expected heterozygosity of the total population is 0.5 (that's 2pq, where p and q are the frequencies of the two alleles). Each subpopulation has heterozygosity 2 × 0.7 × 0.3 = 0.42, so their average is also 0.42. Applying the formula above, (0.5 - 0.42) / 0.5 gives an FST of 0.16.
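    The classroom arithmetic can be checked with a few lines of code. This sketch assumes two equal-sized subpopulations and a two-allele locus, as in the example, and uses expected heterozygosity 2pq throughout.

```python
def expected_het(p: float) -> float:
    """Expected heterozygosity 2pq at a two-allele locus."""
    return 2 * p * (1 - p)


def fst_two_subpops(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for two equal-sized subpopulations."""
    h_s = (expected_het(p1) + expected_het(p2)) / 2  # mean within-subpopulation heterozygosity
    h_t = expected_het((p1 + p2) / 2)                 # heterozygosity at the pooled frequency
    if h_t == 0:
        raise ValueError("locus is fixed for the same allele in both subpopulations")
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(round(fst_two_subpops(0.7, 0.3), 2))  # 0.16
```

    Trying other pairs of frequencies shows how quickly the statistic climbs. For example, the same code gives 0.64 for 90 versus 10 percent.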

    Now, the average FST among human continental populations is between 0.1 and 0.15. A value of 0.16 for a single gene should not be in the least bit unusual. Under neutrality, there ought to be lots and lots of gene loci that show allele frequency differences this great or greater. And indeed, Hofer and colleagues find a large set of such loci -- something like one out of 10, which actually seems a bit low to me.
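    The claim that neutral drift alone produces many loci at or above 0.16 can be checked by simulation. The sketch below is a hypothetical setup, not the Hofer et al. analysis. Two isolated populations drift apart by binomial (Wright-Fisher) sampling from a shared ancestral frequency. The population size, number of generations and range of ancestral frequencies are arbitrary choices. By a rough approximation, the defaults put the average near the continental range quoted above; the printed mean shows where it actually lands.

```python
import random


def expected_het(p: float) -> float:
    return 2 * p * (1 - p)


def fst_two_subpops(p1: float, p2: float) -> float:
    h_s = (expected_het(p1) + expected_het(p2)) / 2
    h_t = expected_het((p1 + p2) / 2)
    return (h_t - h_s) / h_t


def drift(p: float, n_chrom: int, generations: int, rng: random.Random) -> float:
    """Neutral Wright-Fisher drift: each generation, n_chrom gene copies are
    drawn independently at the current allele frequency."""
    for _ in range(generations):
        if p in (0.0, 1.0):
            break  # allele lost or fixed
        p = sum(rng.random() < p for _ in range(n_chrom)) / n_chrom
    return p


def neutral_fst(loci: int = 200, n_chrom: int = 100, generations: int = 25,
                threshold: float = 0.16, seed: int = 1) -> tuple[float, float]:
    """Return (mean per-locus F_ST, share of loci with F_ST >= threshold)."""
    rng = random.Random(seed)
    values = []
    for _ in range(loci):
        p0 = rng.uniform(0.1, 0.9)  # hypothetical ancestral frequency
        p1 = drift(p0, n_chrom, generations, rng)
        p2 = drift(p0, n_chrom, generations, rng)
        if expected_het((p1 + p2) / 2) == 0:
            continue  # both populations fixed for the same allele: F_ST undefined
        values.append(fst_two_subpops(p1, p2))
    mean_fst = sum(values) / len(values)
    share = sum(v >= threshold for v in values) / len(values)
    return mean_fst, share


if __name__ == "__main__":
    mean_fst, share = neutral_fst()
    print(f"mean F_ST {mean_fst:.3f}; share of loci >= 0.16: {share:.2f}")
```

    Single-locus FST under pure drift is highly variable, so a sizable tail of loci exceeds the average even with no selection at all. Changing `seed`, `n_chrom` or `generations` shifts the numbers but should not change that qualitative pattern.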

    Other surveys that have tried to test the neutral hypothesis have considered a much narrower range of frequencies -- essentially, genes in which an allele is 80 percent or higher in one population and rare or absent in others. This study included much smaller allele frequency differences in its definition of "extreme" and thereby found that a very high fraction of sites had such differences.

    For the broader meaning of "extreme" used in this paper, which under neutrality would include one out of every 10 loci, it is no surprise that most would look, well, neutral. There are so many neutral loci fitting these characteristics that they completely swamp out any statistical expectation of selection. There might be a handful of selected sites among the high-FST loci in the paper (and the authors identify a few candidates from other studies), but most must be neutral. The study tests the adequacy of the neutral hypothesis to explain genes with relatively low FST, and finds that population differences at that level have not been driven primarily by selection.

    I'm not sure why the authors didn't include the prosaic mathematical prediction of neutrality in their paper. It seems to me that the results were foreordained by theory.

    Still, several of the observations in the paper are interesting. In particular, the excess of STR alleles outside of Africa that have increased in frequency is a sign of a long-term demographic bias toward population growth outside of Africa. I have heard that observation from other research groups in other contexts, but this is the first paper I can think of that reported it clearly. The "allele surfing" explanation is a very credible one for that observation -- essentially, a geographically dispersed founder effect.

    The end of the discussion includes a statement about positive selection:

    While we find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci, it may certainly have acted on fewer genes than previously believed, and our current results do not allow us to discriminate between the effects of demography and selection for an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).

    I find nothing to disagree with here. Any individual instance of positive selection should be tested with reference to phenotypic effects, and collectively, most of the genome's diversity was not shaped by positive selection. Our own research on positive selection (discussed in this post from last year) addresses a relatively small subset of haplotypes across the genome. Even though the number of affected genes is quite large (on the order of several thousand), it did not strongly influence the genome-wide diversity parameters assessed by Hofer and colleagues.

    The limited genome-wide effect of selection, in the face of a large apparent number of selected alleles, is one of the strongest arguments that the rate of positive selection has recently accelerated. If the rate had been high throughout human evolution, we would find a much stronger effect on the genome-wide variation than we in fact observe. The demographic changes proposed by Hofer and colleagues in fact bolster the case for a recent acceleration -- the very demographic changes that might create "allelic surfing" would also tend to generate more positively selected mutations.

    References:

    Hofer T, Ray N, Wegmann D, Excoffier L. 2009. Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann Hum Genet 73:95-108. doi:10.1111/j.1469-1809.2008.00489.x
