john hawks weblog

paleoanthropology, genetics and evolution

bottlenecks

  • Mailbag: Noah's Ark

    Tue, 2011-10-11 23:32 -- John Hawks

    From a reader:

    Hello Dr Hawks, I am a reader of your blog and respect your expertise, so I thought you would be the right person to ask this question. I was debating a creationist about human genetic history. The creationist is a literal believer in Noah's ark, and I was saying to the creationist that one of the reasons we know the story of the global flood is not true is that, if it were, all species including humans would have a bottleneck of two individuals dating to the exact same time. The creationist then cited this article as proof that humans could have been bottlenecked to two or six individuals:

    "However, the global extent of ß[beta]-globin divergence has at first sight some startling demographic implications because the hunter-gatherers who migrated from Africa. Europe and Asia have rather similar haplotype frequencies. Hence, the emigrants must have undergone the major change in haplotype frequency in the interval between leaving Africa and dispersing throughout the rest of the world. Assuming--and this is little more than an informed guess--that this interval was 20,000 years, population-genetics theory tells us that the mean effective size of the ancestral population for all non-Africans throughout this period must have been 600 individuals; or alternatively ;that ;the bottleneck was 6 individuals for 200 years, or even a signle couple for 60 years. (The expected time for the loss of a neutral gene present in thepopulation at frequency p is E(T) = -4N plnp/1-p, where N is the population size. We assume a generation interval of 20 years and that the 4 common haplotypes were present at equal frequencies in the ancestral African population.) If this is the case, much of mankind was an endangered species during an imporant part of its evolution." ~ J.S. Jones and S. Rouhani, "How Small was the Bottleneck?" Nature, 319, Feb. 6, 1986, p. 450

    What is this article actually saying? Is it saying that it really is possible for every human alive today to have sprung from only 2 or 6 people? Because that contradicts everything I've read that says genetics shows that our population could never have been bottlenecked below at least a few thousand individuals. Can you explain it to me? Kind regards

    A single gene can never provide evidence of such a bottleneck; that requires every gene in the genome to show a consistent pattern. In this case, the most obvious genes to examine are those with the *most* variation. For example, the human HLA genes have hundreds of allelic variants in human populations that have existed for thousands of years. Each of these genes (including HLA*A, HLA*B, HLA*C, DRB1, DRB2, DQB) has old variations; the oldest alleles have been retained from our common ancestors with chimpanzees and gorillas. These could never have been retained for so long if we had undergone a bottleneck to two or a few individuals.

    It is true that human genetic variation is low relative to some other mammals, but it is not indicative of a bottleneck to a handful of individuals. When geneticists today refer to bottlenecks, they are estimating many hundreds of individuals at the least, and 10,000 individuals as a more likely value.
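    The arithmetic behind the quoted Jones and Rouhani figures can be checked with a short sketch. It implements only the formula given in the quotation, with the quotation's own assumptions (four haplotypes at equal frequency, so p = 0.25, and 20-year generations). The function and constant names are illustrative and not from the paper. Note that this is a single-locus calculation, which is exactly why it cannot establish a bottleneck on its own.

```python
import math

def expected_loss_time(n, p):
    """Expected generations until loss of a neutral allele at frequency p.

    Implements E(T) = -4N p ln(p) / (1 - p), as stated in the quotation.
    """
    return -4.0 * n * p * math.log(p) / (1.0 - p)

GENERATION_YEARS = 20   # generation interval assumed in the quotation
P = 0.25                # four common haplotypes at equal frequency

for n in (600, 6, 2):   # 600 individuals, 6 individuals, a single couple
    generations = expected_loss_time(n, P)
    years = generations * GENERATION_YEARS
    print(f"N = {n}: about {generations:.1f} generations, {years:.0f} years")
```

    Under these assumptions the three cases come out at roughly 22,000, 220 and 74 years, in the same range as the 20,000, 200 and 60 years given in the quotation.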

  • Toba "cut down to size"

    Wed, 2010-12-01 15:29 -- John Hawks

    Thanks to a reader:

    Science last week carried a news article by Naomi Lubick, describing a new model for the climatic effects of the Toba volcanic eruption, around 74,000 years ago.

    The simulation revealed that Toba's impact was not as extreme as some scientists believed. Temperatures dipped only 3˚ to 5˚C across the globe, for example. The model also showed that the high concentrations of sulfur particles were short-lived; they settled out of the stratosphere—where they can have the largest cooling effect—within 2 to 3 years, the team reports online this month in Geophysical Research Letters. Extreme temperature changes in Africa and India lasted only a year or two, with a temperature decrease of at most 10˚C in the first year after the eruption, followed by 5˚C the second year. Overall, Toba didn't wipe out flora and fauna, Timmreck says, but it would have made life harder for a few years.

    The issue comes down to the assumptions they have to make when they scale up the measured effects of recent volcanic eruptions such as Mt. Pinatubo, Philippines. The new model is argued to be consistent with ice core data about atmospheric sulfate concentrations after the eruption.

    I think these climate models continue to shift too much to really interpret the importance for ancient human populations. A global reduction in temperature and biosphere productivity is not going to be happy times for most Pleistocene hunter-gatherers. But the kind of extreme, prolonged population contraction seems like it must require a rather more severe event, seriously forcing global climates out of their

    I've been a very consistent Toba skeptic, because a global catastrophic event in the Late Pleistocene really is not required to explain the present pattern of human genetic diversity. But with a little clever science, it might become possible to look for more temporary effects, or those limited to a few regions of the world. What's necessary is to bring the expectations into the same range of realistic alternatives.

    In that view, a more precise climate model that may show a shorter and smaller range of climate effects may be very useful.

  • Orangutan dynamics of Borneo

    Wed, 2010-11-24 01:46 -- John Hawks

    Bornean and Sumatran orangutans are the most highly divergent subspecies within any of the living species of great apes. The two are farther apart even than chimpanzees and bonobos, which are good biological species. The time of the Bornean-Sumatran orangutan divergence as estimated from mtDNA is around 3.5 million years ago.

    This is old enough that many primatologists consider the two populations as separate biological species. The species distinction is supported by some aspects of morphology, but as yet we have no good nuclear DNA information about the extent of divergence. In chimpanzees, nuclear genetic comparisons suggest a relatively recent founding of one subspecies and recurrent gene flow between the others, despite high mtDNA divergence between the subspecies. So information from across the genomes of Bornean and Sumatran orangutans may be necessary to substantiate the hypothesis of long isolation suggested by mtDNA.

    Within Borneo, different local populations of orangutans have strong genetic differentiation, with few shared mtDNA haplotypes among them. A new study by Natasha Arora and colleagues [1] has provided further detail about these relationships within Borneo. Based on earlier work, they expected to find high population differentiation within Borneo, and that is what they found:

    [O]ur analyses revealed high and significant mitochondrial differentiation, with populations within currently recognized subspecies generally displaying as much differentiation as those between subspecies. Of notable interest is the great extent of subdivision and lack of reciprocal monophyly for the morphologically recognized subspecies P. p. morio and P. p. wurmbii. MtDNA haplotype sharing is uncommon and for populations separated by rivers occurs only in two instances: (i) for SA and GP and (ii) for the northern and southern populations across the Kinabatangan river. In both cases, very recent common ancestry could explain the incomplete mtDNA lineage sorting. For North Kinabatangan (NK) and SK, Jalil et al. (27) proposed an expansion from a recent common refugium further west in Mount Kinabalu, as posited for other Bornean species (46, 47, 49). DV, with its low haplotype diversity, might also be the result of a recent range expansion. GP is located proximally to the Bangka–Belitung–Karimata–Schwaner divide, from where orangutans are presumed to have dispersed to the rest of Borneo (12) and where we might expect a rich haplotype diversity. However, the presence of only one mtDNA haplotype shared with populations further east suggests that the current population in GP is recent and/or underwent a severe recent bottleneck. This and other local bottlenecks make it impossible to reconstruct a colonization of Borneo through the southwestern “choke point” (52).

    They were able to confirm the relatively strong differentiation of Bornean populations by examining nuclear microsatellites. These do not give a great indication of the time period over which the populations may have developed their differentiation, but the microsatellites do document the relative lack of allele sharing between the populations, attesting a history of low gene flow in the recent past. The populations they identify as strongly differentiated do not correspond entirely with the subspecies recognized along morphological lines, but there are strongly differentiated populations here.

    The "news" aspect of the paper is the one unexpected observation: the mtDNA ancestor of Bornean orangutans lived relatively recently, only around 176,000 years ago (with a range of error stretching from 72,000 to 320,000 years ago). The data in the study do not allow us to distinguish whether this was a time when the Bornean population may have been founded, or whether instead the mtDNA lineage spread through pre-existing populations. The authors pursue the hypothesis that Bornean orangutans were limited to a refugium sometime during the early Late Pleistocene:

    Assuming that orangutans arrived in Borneo around the same time as gibbons and macaques, the recent coalescence of Bornean orangutans could be explained by a bottleneck through a severe rainforest contraction. Such a bottleneck would have had a more dramatic impact on the mtDNA structure of orangutans compared with other species as a result of their low densities and slow life histories (18) as well as habitat requirements.

    The comparison with gibbons and macaques is necessary because both have substantially deeper mtDNA coalescence times within their Bornean populations. If the forest had been substantially reduced to a small area where orangutans could survive, we might expect the other primates to reflect this event -- and they don't. Nevertheless, a grab-bag of climate change scenarios appears next:

    Geomorphological and palynological data indicate the presence of dryer, more open vegetation in southern and western Borneo during the last glaciation (2, 41), and by extrapolation also during other glaciations (but c.f. refs. 42, 43). Climate change was especially severe during an extended cold period within the penultimate glaciation between 130 and 190 ka (44, 45), which occurred approximately at the time of mean coalescence of Bornean mtDNA haplotypes. More recently, the last Toba eruption approximately 74 ka resulted in a short, albeit significant, decrease in regional temperatures, ensued by a 1,800-y cold stadial (9, 10). Our data do not provide clear signals to make conclusive statements about potential Toba effects. Nonetheless, the coldest period of the penultimate glaciation (44, 45) was more prolonged than the cold period following the last Toba eruption, suggesting more severe effects of the former on the extent of rainforest across Sundaland. In any event, suitable rainforest habitat for orangutans should have existed in certain regions in Borneo where a refugium population survived the dry glacial conditions.

    A coalescence time of 176,000 years ago does not point to a short-duration bottleneck that began 74,000 years ago. If orangutans in the Middle Pleistocene of Borneo had high genetic differentiation, a crash would have to have been very severe -- eliminating all but one small regional population -- to have effected the present distribution. Still, the great uncertainty in the actual coalescence time leaves open many possibilities, and the refugium hypothesis in the general case is worth testing, even if the Toba eruption in particular cannot explain the data.

    Given the uncertainty about the habitat structure of the now-submerged areas of Sunda, we may also want to consider the hypothesis that the present orangutans arrived recently on Borneo from mainland Southeast Asia. Even if orangutans had lived on Borneo during the Middle Pleistocene, they may not have been the current orangutans. Or even better, they may have been Neanderorangs -- an initial population that was genetically swamped by migrants arriving from elsewhere. The deep Sumatra-Borneo divergence means that the Bornean population was probably not recently derived from Sumatra, but that's a very restricted source compared to the Late Pleistocene distribution of orangutans across mainland and island East and Southeast Asia.

    Some other animals walked from Sumatra to Borneo repeatedly during the Pleistocene, including humans. In the human case, we know that a large fraction of the genetic ancestry of Bornean and Javan people was derived from Asia within the last 100,000 years -- in other words, Late Pleistocene gene flow. The movement of genes may have happened in the context of a dispersal of Asian (or ultimately, African-derived) populations into island Southeast Asia. The paper includes some discussion of other primate species:

    For instance, the south Bornean gibbon Hylobates albibarbis and the Sumatran–Malaysian gibbon Hylobates agilis have a TMRCA of 1.56 Ma (36), and Bornean and Sumatran pig-tailed macaques have one of 3 to 4 Ma (37). By contrast, the Bornean–Sumatran common ancestor of both the silvered langur (39) and clouded leopard (40) is much more recent than that of orangutans, gibbons, and pig-tailed macaques, probably because of a higher flexibility in habitat use.

    The pig-tailed macaque divergence time is more or less the same as the orangutan divergence; the others are more like the time range for human dispersals into island Southeast Asia. We can add to the primates a few other medium-sized mammals; for example, clouded leopards are highly differentiated between Sumatran and Bornean populations, and their mtDNA divergence occurred sometime after 3 million years ago.

    There may be no contradiction between the recent mtDNA common ancestor and the high degree of population structure in Bornean orangutans; the mtDNA could have been selected. We really would want resequencing of a lot more loci in these orangutan populations, for which we may not have to wait too long. Mitochondrial DNA is convenient in many ways, including its greater sensitivity to restricted population size and higher mutation rate. But the intrinsic variance of a single gene system under genetic drift is so high that this disadvantage probably outweighs all advantages for reconstructing population sizes.

    At any rate, the orangutans now provide an additional case where the subspecies-level history of hominoids is more complex than depicted five or six years ago. Uncovering these kinds of dynamics highlights the need for better modeling of demography and dispersal within a geographically widespread species. Isolation-by-distance and long-lasting subspecies are well-defined models, but when they are refuted, we have a lack of well-defined alternatives.


    References

  • Battlestar mitochondria

    Sun, 2010-10-31 00:39 -- John Hawks

    Wired has an interview with the authors of a book titled, The Science of Battlestar Galactica. I wasn't a viewer of the show, so I wasn't aware that the mitochondrial Eve scenario turned out to be a major plot point in the series' finale. Wired chose to excerpt that part of the book.

    The excerpt does a good job differentiating the most recent ancestor of humans from the most recent ancestor in the exclusively maternal line -- the mitochondrial Eve:

    It’s important to emphasize that Mitochondrial Eve and her contemporaries had offspring, and those offspring had other offspring. But throughout the subsequent generations, for one reason or another, the lineages of Eve’s contemporaries all died out. Of all the women alive then (and in our case, that means the entire female population of Galactica and the fleet), only one has offspring alive today. We know her as Hera Agathon.

    This does not necessarily mean that Hera is our Most Recent Common Ancestor (MRCA). Hera populated today’s Earth solely through her daughters and daughters’ daughters. The MRCA is the person who, while no doubt descended from Hera, populated today’s Earth via their daughters and/or sons. By adding males to the mix, the MRCA almost certainly cannot be the same as Mitochondrial Eve. In fact, most researchers today feel that the MRCA lived only about five thousand years ago, 145,000 years after Hera.

    That's a good two-paragraph summary of the issue, though it could use more fleshing out. Unfortunately, the book excerpt goes off on a Toba tangent, discussing the near-extinction of our species as a "real population bottleneck."

    This is a hard part of population genetics to get right: the distinction between effective and census population size, and the relationship between demographic events (like bottlenecks) and heterozygosity. A good description of the science should be appropriately skeptical -- I would expect no less for various "faster than light" drive technologies, which surely are harder to explain than population models. In this case, the reconstruction of population bottlenecks is highly speculative, and there is positive evidence against the Toba scenario having been a catastrophic event on the scale described here.

    Still, I don't have any problem with a science fiction series making use of such a scenario as a plot element. It's the perfect kind of thing for fiction. Beats the heck out of "midichlorians"!

  • New data on Ashkenazi population history

    Thu, 2010-08-26 19:37 -- John Hawks

    Bray and colleagues [1] report on genotyping of 471 people of Ashkenazi Jewish descent. This is one of the largest samples of a single human population, and is therefore very interesting for studies of population history and recent natural selection.

    There's a lot in the paper. One of the key findings in the paper is that the Ashkenazi population doesn't look bottlenecked -- in fact, it looks outbred compared to Europeans generally. The paper also documents a high amount of admixture with non-Ashkenazi Europeans, ranging from 35% to 55%. Figuring out the actual history of the population -- when and where its ancestors lived and how they interacted with other people -- is beyond the scope of this kind of analysis. But I expect that somebody can put together a really compelling historical account using these data.

    I turned quickly to the issue of selection. They are able to substantiate evidence of positive selection on several disease-causing alleles in the Ashkenazi population, including the Tay-Sachs allele. The lack of evidence for bottlenecks or founder effects pretty much takes away the alternative explanation. Yet they were unable to show statistical evidence of selection on some other disease-causing alleles in Ashkenazi populations:

    To explore whether regions of selection in the AJ population included any loci of known Ashkenazi diseases, we examined 21 disease- and cancer-susceptibility loci with known mutations found at higher frequency in the Ashkenazi population. Only 6 of the 21 genes fell in or near (within 500 kb) the top 5% of the AJ iHS windows (Table 2). Among these is the Tay-Sachs disease gene, HEXA, whose selection has been widely debated (4, 5, 14–16) and was found ~400 kb downstream of a window on chromosome 15 identified in the top 1% of the AJ iHS hits. Although none of the SNPs interrogated immediately adjacent to the HEXA locus showed elevated iHS signals, it is possible that the nearby region may contain regulatory elements under selection that affect HEXA expression. Cochran et al. (14) speculated that selection of many of the AJ-prevalent disease loci, especially the lysosomal diseases, conferred an increase in intelligence that was necessary historically for the AJ economic survival. Our data shows evidence of strong selection at or near only six disease loci, including only one out of the four AJ-prevalent lysosomal storage diseases, thus arguing that most AJ disease loci are not under strong positive selection, but rather rose to their current frequency through genetic drift after a bottleneck. However, we cannot exclude the possibility that selection of some AJ disease loci are outside the limits of detection by the extended haplotype tests, which are known to have less power to detect selection of lower frequency alleles (38, 41).

    It seems to me that this passage probably wasn't written by the same author who showed the lack of evidence for founder effects a few pages before. In this case, the confusion probably comes from the fact that the "detection of positive selection" is actually a refutation of the hypothesis of genetic drift. With a larger sample it will be possible to test the hypothesis with greater power.

    Disease-causing alleles are at low frequencies currently, making them unlikely to rise to the top percentages of the statistics. It would be interesting to control for current frequency, but I haven't seen a test that uses frequency information in this way.

    It's quite remarkable to reflect on the idea that positive selection has now been demonstrated on six disease-causing alleles in the Ashkenazi population. Every one of these is a case of overdominance -- where the heterozygote carrying an allele has some selective advantage, while the homozygote carrying two copies has a disorder. I was having a conversation with a very prominent geneticist a few months ago, who claimed that no case of overdominance in humans had ever been demonstrated except sickle cell. Now, that was obviously false even at the time -- as I pointed out, the many hemoglobinopathies are fairly clear examples. But we've come an awfully long way.

    From data like these, we're going to learn a huge amount about low-frequency selected alleles. The Tay-Sachs-causing allele is one of the most common recessive lethal genes in any human population, but like all genes subject to strong selection in homozygotes, it remains rare. Finding selection on these kinds of alleles is very hard unless sample sizes increase to several hundred individuals. Here we are seeing evidence of selection in historic populations -- within the last 2000 years. More will be coming.


    References

    1. Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. Proceedings of the National Academy of Sciences of the United States of America [Internet]. 2010;107:16222–16227. Available from: http://dx.doi.org/10.1073/pnas.1004381107
  • Passing on your fertility to your kids

    Fri, 2010-05-14 10:37 -- John Hawks

    From the NY Times earlier this spring, a profile of a New York woman with an exceptional legacy:

    WHEN Yitta Schwartz died last month at 93, she left behind 15 children, more than 200 grandchildren and so many great- and great-great-grandchildren that, by her family’s count, she could claim perhaps 2,000 living descendants.

    The story talks about her history and how she came to have such a large family. By itself, having 15 children would be unremarkable except that the children and grandchildren themselves all went on to have large families ("Like many Hasidim, Mrs. Schwartz considered bearing children as her tribute to God."). After a couple of generations, it adds up to a lot of descendants.

    I don't think the story is all that unique. Within the United States there are many communities, like the Hutterites, Old Order Amish, and Hasidic Jews, where large family sizes are the norm. Probably hundreds of women on earth can claim more than a thousand living descendants, and thousands more have only to wait until they are old enough, while their children and grandchildren's families continue to grow.

    You can get there by having 10 children, each of whom has 10, with each grandchild having 10 -- that adds up to 1110, giving some extra for different generation times and losses. Of course, it's a trick to live long enough to see the 1000 great-grandchildren, but the early ones should already have given you a fraction of your 10000 great-great-grandchildren.
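    The running total is easy to tabulate. A minimal sketch, assuming every descendant has exactly the same number of children (a deliberate simplification; the function name is illustrative):

```python
def descendants_by_generation(children_per_person, generations):
    """Number of descendants in each generation, assuming every person
    has exactly children_per_person children (an idealized assumption)."""
    return [children_per_person ** g for g in range(1, generations + 1)]

# Children, grandchildren, great-grandchildren for families of 10.
counts = descendants_by_generation(10, 3)
print(counts)        # per-generation counts
print(sum(counts))   # total living descendants if all survive
```

    With 10 children per person, the three generations contribute 10, 100 and 1000 descendants, for the total of 1110.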

    What's surprising here? Not the family sizes themselves -- big families are common in most human populations. The high offspring numbers are not as apparent in populations that have high juvenile and infant mortality, but many pregnancies were the norm prior to the industrial transition.

    No, what's surprising about huge numbers of living descendants is the correlation between generations. In these cases, the correlation is driven by religion and various social proscriptions related to religious observance.

    I often talk about models and real human population structures in my classes. One obviously unrealistic aspect of the Wright-Fisher population model is its reproductive variance. In the Wright-Fisher model, reproductive variance is binomial -- every gene in an offspring population is equally likely to descend from each gene in the parental generation. In the model, it is possible -- albeit extraordinarily unlikely -- for a single parent to give rise to the entire offspring generation. That just can't happen in a real population, certainly not in humans. The effect of that unrealistic assumption of the model is not great, however, because even in the model the chances of having more than 10 offspring, while possible in theory, are negligible. If anything, the Wright-Fisher model is too conservative about the variance of offspring number -- real human populations have a non-negligible fraction of women who have 10 or more live births.
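    To put a rough number on "negligible", here is a minimal sketch of the binomial tail. It assumes a constant population of N = 1000 diploid individuals, so that the 2N gene copies of the next generation each pick a given parent with probability 1/N, for an average of two per parent. Those settings and the function name are my assumptions for illustration.

```python
import math

def prob_more_than(k, n_trials, p_success):
    """P(X > k) for X ~ Binomial(n_trials, p_success).

    Computed as one minus the exact sum of the first k + 1 terms.
    """
    at_most_k = sum(
        math.comb(n_trials, i) * p_success**i * (1 - p_success)**(n_trials - i)
        for i in range(k + 1)
    )
    return 1.0 - at_most_k

N = 1000  # hypothetical number of diploid individuals
tail = prob_more_than(10, 2 * N, 1.0 / N)
print(f"P(more than 10 offspring copies) = {tail:.2e}")
```

    Under these assumptions the probability is on the order of one in a hundred thousand, far below the share of women with 10 or more live births in many real populations.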

    I get more concerned about other deficiencies of simple models, which are sometimes harder to deal with. One of those is the correlation of offspring number between generations. If there is even a slight correlation, women tending to have more children because they came from larger families, it has a major effect on the amount of inbreeding in the population.

    You can think about it genealogically. Suppose you live in a small town with a few big families. The chances that you yourself were born into one of those big families is small. But if today's big families tended to come from yesterday's big families, with each generation we go back in time, it becomes more and more likely that one of your ancestors came from one of those big families. Still looking backward in time, your genealogy becomes captured by those big families, branch by branch. Since there are few big families in the town, once two or more lines of your ancestry trace to them, those lines will rapidly share a common ancestor. That's inbreeding, from the perspective of your genealogy.

    In small towns, that process isn't inevitable because people move in from elsewhere. Most of the lines of your genealogy will probably come from other towns within a few generations. But if we consider the human species as a small town, well, there's nowhere else to move in from. If the population structure of our species has included a strong correlation of offspring number between generations, it will have massively reduced our genetic variation.

    Since we have low genetic variation as a species, you can see why this is potentially interesting.

    Masatoshi Nei and Motoi Murata back in 1966 worked out a relation between intergenerational correlation in offspring number and effective population size. That's before the days of computer models, for you simulation jocks out there. The "effective" size of a population, as I've noted here many times, is the one parameter of a Wright-Fisher model, as estimated from the genetic variation within a population. It's a statement about how inbred the population looks, assuming that its evolution followed a random-mating model throughout its history. Now, that model is wrong in pretty much every interesting case, and so there are various mathematical transformations that attempt to account for the effects of different mating structures.

    In the case of intergenerational correlation of offspring number, Nei and Murata derived an expression to predict the reduction of effective size to be expected from this correlation, assuming a model in which the variance in offspring number is distributed in a certain way. The solution isn't general -- if offspring number were distributed in some other way, the effect of the same measured correlation may be quite different. And in their model, they were concerned with the case where the correlation of offspring number is influenced by genes that determine fitness -- in other words, genes under selection in the population. So it's not a complete answer, but it's a start.

    Nei and Murata cited empirical data from several earlier studies that showed a correlation of 0.20 to 0.40 between generations of human offspring number. Under the assumption of their model, a correlation of 0.30 would cause a reduction of the effective size by roughly half.

    That's a big effect. We already expect a reduction of effective size compared to the census count of a human population, because human populations include many non-reproductive individuals -- kids and postreproductive adults make up half to two-thirds of small-scale foragers. If big families have an additional effect of half, it means that the effective size of the population starts out at a fourth to a sixth the census count. So an effective size of 10,000 really means 40,000 to 60,000 people on the ground.
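    The conversion is just two multiplicative reductions applied in reverse. A minimal sketch, treating the two effects as independent factors (a simplifying assumption; the function and variable names are illustrative):

```python
def census_from_effective(n_effective, breeding_fraction, family_size_factor):
    """Back out a census count from an effective size.

    breeding_fraction: share of the population that is reproductive adults.
    family_size_factor: further reduction from correlated family sizes.
    """
    return n_effective / (breeding_fraction * family_size_factor)

NE = 10_000
FAMILY_FACTOR = 0.5  # effective size reduced "by roughly half"

low = census_from_effective(NE, 1 / 2, FAMILY_FACTOR)   # half non-reproductive
high = census_from_effective(NE, 1 / 3, FAMILY_FACTOR)  # two-thirds non-reproductive
print(round(low), round(high))
```

    These two cases give the 40,000 and 60,000 quoted above.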

    Still low, but as one factor among many it may be very important -- and possibly the distribution of variance caused a further decline. It's well worth investigating.

    A correlation of offspring number between generations can be caused by many ecological or cultural factors. Nei and Murata (1966) had considered the case where fitness itself is inherited, because of the presence of selected genes. But in humans, a more pervasive force is cultural inheritance. This factor was discussed in 1976 by the demographer Samuel Preston, attending to the importance of cultural preferences in contemporary populations:

    Since children of each generation are drawn disproportionately from families of women with high fertility achievements in the past, it may be expected that a pronatalist selective bias operates each generation with respect to the transmission of "tastes" for children. It has also been suggested that personality traits which may affect fertility achievement, such as the ability to defer gratification, may be transferred to some extent between parent and child (Kantner and Potter, 1954). It is also reasonable to suggest that biological fecundability is partially inherited. The positive correlation between the social classes of parent and child implies that economic constraints impinging on the childbearing process tend to be similar for the two generations (Preston 1976:110).

    In small-scale societies, these forces are somewhat different. But I wouldn't expect them to be less -- indeed, the social competition between families is probably more intense. The entire "Machiavellian intelligence" model of cognitive evolution implies that these kin-level effects were pervasive throughout human evolution over the past 2 million years or more. A strong cultural inheritance of fitness is really necessary for selection on genes that influence prosocial kin-related behaviors.

    How intense? Seems like a good question to investigate, as it may have a lot of importance to understanding genetic variation in our ancestors -- including our common ancestors with the Neandertals, whose genetic variation was limited just as much as our own.

    On the subject of effective population size, I'll be posting next week about chimpanzees and bonobos. More genetically variable than us? Well, some of them...

    References:

    Preston SH. 1976. Family sizes of children and family sizes of women. Demography 13:105-114.

    Nei M, Murata M. 1966. Effective population size when fertility is inherited. Genet Res 8:257-260.

  • Double the bottlenecks

    Fri, 2009-10-09 20:49 -- John Hawks

    Amos and Hoffman (2009) describe a study of microsatellite (STR) data taken from 53 populations -- the HGDP dataset. They suggest that the worldwide diversity of STR loci is consistent with a double-bottleneck population history: An initial bottleneck accompanying a dispersal of humans from Africa some 50,000 years ago, followed by a second bottleneck as people moved into Beringia much later. Despite the cline of diversity across Eurasia leading to lower diversity farther from Africa, they do not see any evidence for successive (sequential) bottlenecks across this region.

    In general, I think that people are going farther than the data on this question of human migrations. Even considering only 53 population samples, there are thousands of ways that they might have been connected to each other over time. Rarely does anybody test simple null hypotheses, like isolation by distance. Often they mix two or more distinct scenarios in an attempt to find a closer fit to data. The problem is that two or more distinct scenarios will inevitably provide a closer fit, merely by virtue of adding parameters. The question is whether the fit is significantly better.
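
    One common way to ask whether extra parameters earn their keep is an information criterion such as AIC, which charges a fixed penalty per free parameter. The sketch below is only an illustration of that idea: the model names, log-likelihoods and parameter counts are invented for the example, and nothing here is taken from Amos and Hoffman's analysis.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L. Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: (maximized log-likelihood, number of free parameters).
# These numbers are made up purely to show the bookkeeping.
models = {
    "isolation by distance": (-1052.0, 2),
    "one bottleneck": (-1050.5, 4),
    "two bottlenecks": (-1049.8, 6),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    lnl, k = models[name]
    print(f"{name}: lnL={lnl:.1f}, k={k}, AIC={score:.1f}")
print("preferred by AIC:", best)
```

    With these invented numbers the likelihood improves at every step, as it must when parameters are added, yet the simplest model is still preferred once the parameters are paid for.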

    I find some things to like in this paper. They treat STR data in a better way than many other studies, and I think they've done the right thing in examining the question of heterozygosity versus allele number. That statistic is worth more consideration.

    From their introduction:

    The question of how many bottlenecks account for the distribution of modern human diversity has been relatively little studied (Rogers & Harpending 1992) and yields conflicting results. First, simulations indicate that the observed pattern is consistent with a linear stepping-stone model featuring a long series of founder events (Ramachandran et al. 2005; Liu et al. 2006). However, this does not preclude equally good fits based on other models. Equally, at the other extreme, large steps in single nucleotide polymorphism diversity between adjacent populations have been used to argue for two dominant bottlenecks, one ‘out of Africa’ and one around the Bering land bridge where humans crossed into the Americas (Hellenthal et al. 2008). The latter event is supported by both mitochondrial data (Wallace et al. 1985; Fagundes et al. 2008) and data from a few nuclear markers (Hey 2005). However, mitochondrial sequences only inform on female lineages, while the adjacent population approach is least reliable in regions like the Bering Strait where population samples are extremely sparse.

    They make a good point here: that "other models" may produce "equally good fits." That is a routine problem in "modern human origins" research -- alternative models are rarely evaluated, and people almost never take a null hypothesis testing approach.

    I've always been hesitant to give much credence to demographic studies based on microsatellites. The evidence for mutation-drift disequilibrium in a stepwise mutation model is an unusual pattern of variance among the individual length variances of STR loci. That's a complicated statistic, and it responds poorly to deviations from the pure stepwise mutation model. In particular, any constraints on allele length will eliminate outliers in allele length variance, making the population look like it went through a bottleneck of some kind.

    The current study looks for disequilibrium in the relation of heterozygosity to allele number -- the logic being that a population crash will eliminate rare alleles but not common ones, leaving heterozygosity nearly the same but cutting allele number substantially. They also are explicit about the problems of the stepwise mutation model:
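
    The logic is easy to check with a toy calculation. This is a minimal sketch under stated assumptions: a single hypothetical locus with invented allele frequencies, and a crash modeled as one random draw of gene copies from the pre-crash population. It is not the test used in the paper, and the function names are mine.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: chance that two random gene copies differ."""
    return 1.0 - sum(p * p for p in freqs)

def expected_alleles_kept(freqs, n_copies):
    """Expected number of distinct alleles in a random draw of n_copies gene copies."""
    return sum(1.0 - (1.0 - p) ** n_copies for p in freqs)

# Hypothetical locus: three common alleles plus ten rare ones at 1% each.
freqs = [0.4, 0.3, 0.2] + [0.01] * 10
founders = 20  # gene copies surviving the crash (10 diploid people); an assumption

h_before = heterozygosity(freqs)
# One round of sampling n copies shrinks expected heterozygosity by (1 - 1/n).
h_after = h_before * (1.0 - 1.0 / founders)
k_before = len(freqs)
k_after = expected_alleles_kept(freqs, founders)

print(f"heterozygosity: {h_before:.3f} -> {h_after:.3f}")
print(f"allele number:  {k_before} -> {k_after:.1f}")
```

    Under these invented numbers heterozygosity drops by 5 percent, while the expected allele count falls from 13 to a little under 5.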

    One problem with the Bottleneck test is that microsatellites do not follow a strict SMM. Known deviations include mutation biases favouring expansion or contraction (Xu et al. 2000), interruption mutations within the repeat tract that slow the rate of slippage (Jin et al. 1996; Kruglyak et al. 1998), occasional larger ‘jump’ mutations of several repeat units (Di Rienzo et al. 1994; Schlötterer et al. 1998) and some form of upper length boundary that prevents indefinite expansion (Amos & Clarke 2008).

    They go on to argue that their test is less susceptible to deviations in the mutation model. That claim deserves further evaluation. The information they provide about the pattern of variation of different classes of STR loci is useful, as it points to ways that the mutation model may have influenced the appearance of a bottleneck. But they find "strong and consistent evidence of a bottleneck at the lowest variability loci." Given a correlation between diversity and strength of evidence of a bottleneck, and remembering that the signature of a bottleneck in their test is high diversity per allele, I would want to look for some mechanical explanation for the correlation.

    A (possibly additional) problem: The number of rare alleles within any single subpopulation is rapidly increased by migration from other subpopulations. All you need is a single migrant to bring in a new allele from somewhere else, and you've got another allele. Now, obviously this allele may not be sampled in any given dataset, so you have to account for sampling. But the point remains: it doesn't take much migration to increase allele numbers.

    Migration after any bottlenecks should, then, hide the evidence for them. It's not hard to imagine scenarios equally consistent with the data. For example, if every different population underwent a single bottleneck simultaneously, they would be unlikely to lose the same rare alleles, but would retain nearly the same heterozygosity. As migration resumed between them after the bottleneck, the rare alleles would be replenished in a way that reflects the subsequent migration rate and population size (the same number of migrants has a faster effect on allele frequencies in a smaller population).

    Or, if there were any interaction between a single bottlenecked ("founder") population and other pre-existing populations, it would tend to reduce the sign of a bottleneck. It seems plausible that the data in this paper might be explained by such interactions in South or East Asia, providing them with a store of rare alleles that didn't make it to the Near East.

    This is why it's useful to start simple. The simplest model in this case would include migration and expansion (which we know from non-genetic evidence happened) and no bottlenecks. Plumb these parameters to see if any acceptable fits turn up. They do with the question of the heterozygosity cline alone -- a simple trend of directionally biased migration is sufficient for that. Looking at the allele number together with the heterozygosity may reject that model, in which case you'd want to pick out the next simplest. In that sense, mtDNA may actually be giving more information than the hundreds of STR loci, because with mtDNA clades there is a way to estimate haplotype ages -- meaning that the demographic hypothesis must give rise to those haplotype ages in addition to the geographic dispersion of haplotypes and within-subpopulation diversity.

    References:

    Amos W, Hoffman JI. 2009. Evidence that two main bottleneck events shaped modern human genetic diversity. Proc R Soc Lond B (online) doi:10.1098/rspb.2009.1473

  • More on the X variation conundrum

    Sun, 2009-05-17 13:30 -- John Hawks

    Last winter I noted the contradiction between two papers that each attempted to explain variation on the X chromosome compared to the autosomes. They had come to opposite conclusions, based on discrepancies in their data. I noticed that they had used different methods of determining mutation rates for X chromosome loci:

    So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough -- rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

    Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.

    I had sort of forgotten about this (which drove me crazy at the time), but another question led me to revisit it late this week. In the intervening time, I see that Carlos Bustamante and Sohini Ramachandran (2009) happened across the same explanation that I had offered:

    It appears that the rest of the discrepancy is explained by different normalizations for background mutation rate differences between the X chromosome and autosomes (Hammer et al. [10] used human-orangutan divergence and Keinan et al. [9] used human-macaque divergence).

    So you read it here first. Which I suppose means that I should submit letters to journals more often. I don't because it seems to me that all I'm doing is reading and trying to understand papers, which sometimes takes more work than it should. On the other hand, I wonder how many people are really putting much effort into their reading...

    Meanwhile, Bustamante and Ramachandran add an additional explanation -- the different means of ascertainment, since Mike Hammer's group used resequencing to find variation, while Keinan and colleagues (2008) had used HapMap SNPs under a specific ascertainment model. They end their short piece by pointing out the value of further resequencing data:

    In order to address continuing questions on the nature of sex-biased processes, full genome sequencing of large numbers of individuals sampled from diverse populations will be needed. The upcoming 1,000 Genomes Project (http://www.1000genomes.org/), for example, will provide orders of magnitude more data for these types of analyses. We share the enthusiasm of the population genetics community that this will bring the potential for resolving continuing questions regarding how human history and cultural practices have shaped global patterns of genomic diversity.

    Ascertainment is a serious issue with the existing SNP data, because different SNPs were ascertained in different, non-commensurable ways. That's how I was led into reconsidering this issue this week: another set of data seems to have features that are partially explained by ascertainment, but partially not. It's hard to use existing data for some kinds of population genetics analysis, although others are less affected by ascertainment biases.

    So the 1000 Genomes effort will make some kinds of analyses simpler to accomplish. I suppose if ascertainment becomes less of a problem, we may see people focus more effort into understanding non-genetic sources of information, too!

    References:

    Bustamante CD, Ramachandran S. 2009. Evaluating signatures of sex-specific processes in the human genome. Nat Genet 41:8-10. doi:10.1038/ng0109-8

  • Plumbing for bottlenecks

    Fri, 2009-03-20 16:43 -- John Hawks

    My series on mutual information and tests of selection (which began with "Information theory: a short introduction") is at a branching point. One of the critical factors determining the power of such tests is the ancient rate of genetic drift. So it's important to come to some understanding of the archaeological record and our best estimates of ancient demography, so that we can independently test the hypothesis that genetic drift was very strong in recent human evolution. That's a long project, potentially the topic of several review papers. Since nobody else has put together these data in a useful way for population genetics, I'm going to do it in one place. What you see in this series are my notes about this project. Being notes, they are not complete, but they may occasionally be better than any other sources. Where it's appropriate, I'll spin off the results for review and publication, and point to them here.

    Many geneticists believe that there were massive population bottlenecks within the last 30,000 years, citing both genetic and archaeological evidence in support of this proposition. Some claim that there have been significant population bottlenecks in the last 5000 years.

    Some archaeologists agree. However, I think this is one of those Inigo Montoya cases: "That word, I do not think it means what you think it means." Archaeology and genetics have completely different interpretations of the words, "bottleneck," "contraction," and "expansion." The result has been a lot of confusion about the relation of archaeological and genetic estimates of population size.

    A population bottleneck impacts genetics by increasing the rate of inbreeding. This takes time to change gene frequencies, and does so in inverse proportion to population size. It may seem surprising that a truly massive die-off, on the scale of the Black Death, will have no measurable genetic impact. But cutting a population of millions down by half just doesn't impact gene frequencies. That is, unless you are looking at genes that helped people to survive the plague, in which case you're looking at natural selection, not a bottleneck.

    A significant genetic bottleneck is not just any population contraction -- it's an event in which the population is cut by a large fraction for a long time. In paleontological terms, we're usually considering cases where the ratio of the number of individuals and the number of generations is near one. In other words, if you cut the population down to a thousand individuals, and keep it there for a thousand generations, you're going to have a large genetic impact. Likewise, you can have a significant bottleneck that's ten generations long, but you need to cut the population down to around ten people.

    You can do a bit better measuring inbreeding by looking at lots and lots of people to study very rare alleles, like a rare genetic disease in a founder population. There, you may spot changes that unfolded in ten generations, even in a relatively large population of a hundred people. Increasingly, as we develop larger and larger datasets of gene variations, we will add power to detect such events in human prehistory.
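
    The arithmetic behind these rules of thumb is easy to sketch. Assuming an idealized population of constant size N (a simplification, with N standing in for effective size, and with a function name of my own choosing), the share of heterozygosity lost to drift after t generations is F = 1 - (1 - 1/(2N))^t. The scenarios below are the ones mentioned above, plus a stand-in for a large population that stays in the millions.

```python
def inbreeding(n_individuals, generations):
    """Fraction of heterozygosity lost to drift in an idealized population
    of constant size N after t generations: F = 1 - (1 - 1/(2N))**t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_individuals)) ** generations

# (population size, number of generations); all values are illustrative.
scenarios = {
    "1,000 people for 1,000 generations": (1_000, 1_000),
    "10 people for 10 generations": (10, 10),
    "100 people for 10 generations": (100, 10),
    "1,000,000 people for 10 generations": (1_000_000, 10),
}

results = {label: inbreeding(n, t) for label, (n, t) in scenarios.items()}
for label, f in results.items():
    print(f"{label}: F = {f:.6f}")
```

    The first two scenarios each lose roughly 40 percent of heterozygosity, a hundred people for ten generations lose about 5 percent, and a population of a million loses a few parts per million over the same ten generations. That is why halving a population of millions leaves so little trace.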

    In archaeology, a significant event is one in which fewer sites were occupied by ancient people in a well-studied region. The length of such a contraction depends on the sampling intensity and dating methods available -- it might be a hundred years or many thousands. Likewise, the magnitude of population contraction will be uncertain -- you can get an accurate estimate, but with substantial sampling error. As in genetics, there are other possible explanations for an apparent contraction. We might lack geological exposures of the right age, or people may simply have moved from formerly favored locations to new ones. Worse, it might just be that archaeologists haven't looked hard enough at a given time interval.

    Archaeology is necessarily imprecise about the census population that existed at any given time. So is genetics. Both have their strengths and weaknesses. We want these different areas of evidence to bear on the same prehistoric events.

    Too often, instead of testing hypotheses, people just line up chronologies and look for matches. A geologist may claim that African paleoclimate is important because it may explain "modern human origins." An archaeologist may claim that a hiatus at a site is consistent with "genetic bottlenecks." And the geneticist may claim that inbreeding in a modern-day genetic sample dates to a period of time corresponding to the replacement of one tool industry by another.

    Any of these might be a valid hypothesis, but we need to take it further, to actually provide some tests. I believe we can do better now, because of the growing amount of genetic information. But we're going to have to do away with the facile idea that we're looking for massive bottlenecks. We need to introduce a recognition of the role of selection in human genetic variation, and we need to start addressing the archaeological record as it really exists.

    That's a foreword to what follows. I'm going through regions of the world at different time intervals, to discuss what we know about population size from the archaeological record.

    Next: No Late Pleistocene bottleneck in southern Africa

  • Data supplements driving me crazy

    Mon, 2008-12-22 23:58 -- John Hawks

    I'm about to pull out my hair reading "supplementary information" for papers.

    Two recent papers (by Mike Hammer's group and David Reich's group) attempt estimates of the diversity level of the X chromosome versus the autosomes. As discussed on Gene Expression this week, the two papers came to completely opposite results.

    In the olden days, ten years ago, I would simply put the two papers side by side and find the discrepancies. But nooooo, we can't do that any more. Now, all the relevant parameters from one of the papers (you guessed it, the one published by the Nature Publishing Group) are hidden away in a supplement.

    You'd think that might not be so bad, since I have the supplement. But I have to keep tracking the cross references to the paper to find out where the methods apply. It's a pain in the neck. Nobody else ever seems to complain. But that's because they simply don't read the papers! AAARGGGH!

    So what's the discrepancy in this case? I'm still working through these darned things.

    My first impression is that the two papers use different methods to estimate the mutation rate on the X chromosome. It was Reich's group, after all, who claimed that the human-chimp divergence was followed by extended hybridization, a process that took over 4 million years in their estimation. The evidence was the X chromosome.

    So, for their current paper, Keinan and colleagues (2008) try to correct for the recent divergence of human and chimpanzee X chromosomes. Simple enough -- rescale all X chromosome mutation events by some ratio proportional to the human-chimp divergence discrepancies. In this case, they attempt to rescale to the human-macaque divergence. Since that divergence happened in the Oligocene, the discrepancies among chromosomes should be slight compared to the overall divergence. I'd feel better if they actually tested this idea.

    Meanwhile, Mike Hammer and colleagues scaled X chromosome diversity to the human-orangutan divergence. They claimed that this gave the same results as the human-chimpanzee divergence. Which, if true, would obviously give a different outcome than the procedure followed by Keinan and colleagues, which was predicated on the idea that the human-chimpanzee X divergence is the wrong number to use.

    The human-chimpanzee divergence discrepancy, if it exists to the extent claimed by Patterson et al. (2006), is probably enough to explain the discrepancies in the results of these two papers, and clearly in the correct direction. By assuming a low divergence date for the human-chimp X chromosome comparison, Keinan et al. have assumed a low mutation rate for the X. That means that the X variation in humans represents relatively less time, and therefore lower genealogical diversity and a lower effective size, than estimated by Hammer et al.
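
    To see why the choice of outgroup matters at all, here is a toy calculation. It assumes the usual trick of dividing diversity by divergence to an outgroup as a proxy for mutation rate. Every number in it is invented for illustration, the outgroups are deliberately unnamed, and none of it comes from either paper.

```python
def normalized_x_to_autosome(pi_x, pi_a, div_x, div_a):
    """X/autosome diversity ratio after dividing each diversity value
    by its divergence to an outgroup (a stand-in for mutation rate)."""
    return (pi_x / div_x) / (pi_a / div_a)

# Hypothetical human diversity values, identical in both comparisons.
pi_x, pi_a = 0.00060, 0.00090

# Hypothetical divergences to two outgroups. In outgroup A the X is
# unusually similar to the human X relative to the autosomes; in
# outgroup B the X and autosomes are closer to proportional.
ratio_a = normalized_x_to_autosome(pi_x, pi_a, div_x=0.0090, div_a=0.0120)
ratio_b = normalized_x_to_autosome(pi_x, pi_a, div_x=0.0500, div_a=0.0600)

print(f"normalized X/A using outgroup A: {ratio_a:.3f}")
print(f"normalized X/A using outgroup B: {ratio_b:.3f}")
```

    The same diversity data give a ratio of about 0.89 with one outgroup and 0.80 with the other, so two groups can land on opposite sides of the expected three-quarters without disagreeing about a single human polymorphism.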

    But I don't think that's the end of the story. In fact, I think there are quite a few strange aspects of the results of both papers. Even though both papers explain their results in terms of demography, I don't think that avenue is very promising. The kinds of demographic changes that happened in the Late Pleistocene just don't look very much like those coming out of these papers. More on that later...

    What the Keinan et al. paper is showing is some substantial differences in the derived/ancestral ratio between populations, and large discrepancies in X diversity across different regions of the X. Large discrepancies would be expected between small regions due to the intrinsic variability of the coalescent process. But these large discrepancies exist between regions 3 centimorgans in length -- large enough regions that there ought to be less dispersion among them. The Asian and European samples have a strong deficit of derived alleles at frequencies lower than 30 percent, but the African sample has a slight excess.

    We'll apply some more simpleminded analysis to these data and see if anything interesting pops out. As they say, garbage in, garbage out -- but when the garbage consistently looks like banana peels, you can guess there's a monkey somewhere.

    UPDATE (2008/12/21): More craziness -- this article from New Scientist includes a quote from David Reich:

    However, the chance of finding archaeological evidence for these migrants is slim. "You're looking for a population that was there only a short period of time, perhaps only 10 generations, so the physical impact of that population in that environment wouldn't be enough to detect," Reich says.

    Surely he's not talking about a bottleneck 10 generations long, which by the estimate in the paper would mean an effective size of around 50 individuals. Surely not. No. It's just a quote in an article.

    Oh, heck. I think the point of all these recent papers that use "inbreeding ratio" instead of effective size and time as bottleneck parameters is to hide these kinds of crazy numbers from peer review. We've got people out there who are talking about biblical models of human migration, like Noah-and-the-Flood level bottlenecks.

    And archaeology makes no difference. All those archaeological sites you've got? Well, they're not the ones who founded the world's population. Our actual ancestors made no impact on the environment that we can detect today. They were invisible.

    And hey, if results contradict each other? No worries. It's not like this is a refutationist science, after all:

    Their analysis also challenges a study published earlier this year, which found that all humans descend from fewer numbers of males than females. The researchers suggested that polygyny, where few men procreate with many women, accounts for this result.

    "It's possible, in principle, that both are true in some level," says Reich.

    Polygyny that occurred over the last million years of human evolution could have left an imprint in our genomes, says Michael Hammer, a geneticist at the University of Arizona, who led that study.

    Reich and Keinan, on the other hand, focused their analysis on the period when anatomically modern humans left Africa.

    "We'll have to figure out this issue in future work," Reich says.

    GAAAAAAAAHHHHHH! And you thought I was silly to be driven crazy by these papers! "It's possible, in principle, that both are true in some level."

    Pfui.
