john hawks weblog

paleoanthropology, genetics and evolution

Asia

  • Neandertal introgression, 1000 Genomes style

    Sat, 2011-12-10 18:16 -- John Hawks

    For our project to understand pigmentation genetics in archaic humans, we had to find a good comparative sample of sequence data from recent humans. The original publication on the draft Neandertal genomes compared them to five low-coverage genomes from different Old World populations, along with the publicly available genomes from Craig Venter and others [1]. The first publication on the Denisova genome added an additional handful of genomes to these comparisons [2].

    Some genomes in this handful from living people are more similar to the Neandertal and Denisova genomes than others. That simple fact is the proof that some living people have Neandertal and Denisovan ancestors.

    But until now, the comparison has been limited to a very small number of human genomes. That became a focus for critics of the Neandertal and Denisovan results. How could three or four genome sequences possibly provide an adequate representation of human variability? We could imagine scenarios in which the similarities between Neandertal and humans could be explained by some unsampled population, for example, northeast Africans [3]. Denisova does not present the same problem, because African population structure cannot possibly explain its resemblance to populations in Wallacea, Australia, and Oceania [2] [4]. But to compare either of these genomes, we should seek a broader sampling of genomes from living people.

    As I wrote yesterday, my students and I have been working to understand pigmentation genetics of the archaic human genomes ("Pigmentation of archaic humans: introduction"). I've emphasized the need to break the analysis into small steps. For this question, we need to examine whether the pattern of introgression around pigmentation genes is characteristic of the genome as a whole. If genes involved in pigmentation have systematically higher or lower levels of Neandertal ancestry, that will tell us a lot about the evolutionary history of pigmentation in recent and archaic humans. For this, we need a good comparative sample, and the 1000 Genomes Project provides the best sample available.

    The first step in assessing the pattern of introgression for pigmentation genes is to characterize the pattern of introgression across the whole genome.

    Yes, a whole-genome introgression analysis sounds awfully big for my "small steps" concept. But actually this is simpler than it might sound. Here's a teaser:

    The figures in this post are not from a whole-genome analysis; they include data from eight chromosomes that we prioritized because of our pigmentation analysis. I am licensing all of them under a Creative Commons ShareAlike license so that anyone can use them anywhere.

    UPDATE (2011-12-10): I finished the whole genome analysis and am updating this post and figures accordingly. The results are the same throughout, with the exception of the Europe-East Asia comparison, which now shows these populations to be significantly different across the genome as a whole. I have partially updated the figures and will finish these later today.

    The value of sequences

    The 1000 Genomes Project data have been updated several times in the last year, as both sequencing and analysis of the genomes have progressed (more information on 1000 Genomes Project website). We downloaded a release of SNP genotype calls from 1094 individuals, based on the low-coverage (average 4x) sequencing that has been carried out on the sample.

    A SNP (single nucleotide polymorphism) is a nucleotide site with at least two alleles present in the global human sample. These sites represent only one kind of genetic variation in today's populations. Many of the differences between people's genes are caused by insertions, duplications, deletions, transpositions, or inversions. But those kinds of polymorphisms can be challenging to study in low-coverage genomes, and we already understand quite a lot about SNPs in human populations from the earlier HapMap project [5] [6]. The HapMap provided the data underlying our 2007 paper on the acceleration of recent human evolution ("Why human evolution accelerated") [7].

    The drawback of earlier SNP variation projects is that they examined only a subset of SNP variation in a sample of people. To design a microchip that could provide a million or more SNP genotypes from a saliva sample, somebody first had to discover where in the genome SNPs could be found. So they took small samples of people, sometimes only a single person's two copies of the genome, and sequenced. Adding together SNPs found by several methods, they could get a representation of SNP variation across the whole genome in a population. But this process introduced a bias: the SNPs were ascertained in a sample that inevitably could not represent humans in other samples with the same accuracy. Initially, SNP samples were heavily biased toward people of European ancestry (upon whom most genetic work was originally done), and the HapMap project went to great efforts to increase the representation of other populations. But even with the best possible ascertainment, interpreting SNP variation requires us to jump through some theoretical hoops.

    Sequence data make life much easier for the population geneticist. Seriously, working on this stuff on the whiteboard is fun instead of a constant nightmare of sampling biases and spaces between markers. I have a bias myself, in that I find recombination hard to deal with. I love reticulation among populations, but I'd rather work with genealogies that look like proper trees instead of a liana-strewn mess. So looking at sequence data over short intervals makes me happy. Not as happy as beer aged in bourbon barrels, but happy.

    The 1000 Genomes Project SNP files represent every SNP mutation observed in the sample. In other words, these are sequence data, just with all the fixed (and therefore redundant) sites removed. Even so, these sequence data are not perfect. Low coverage means that some rare mutations in the sampled individuals will go unreported. We aren't typically interested in singleton mutations in the sample, except that missing them will introduce a bias into our estimates of the time that common ancestors lived. Next-gen sequence reads are usually fairly riddled with errors. High coverage allows these errors to be removed with some confidence, but low-coverage genomes risk throwing out real SNPs along with the spurious ones. The publicly available files represent some analytical steps that we do not control here, so we have to work with the understanding that the data are not perfect.

    The 1000 Genomes SNP files have had a phasing algorithm applied to them, which attempts to assign genotypes to chromosomes. In essence, phasing tries to figure out whether adjacent SNP alleles belong to the same copy or to different copies of the same chromosome. The details of this phasing are not yet apparent, and for many reasons I am cautious about using phased data. The inference is often inaccurate for rare mutations, and the whole process tends to sneak assumptions about population history into the resulting dataset. I hate being forced to live with someone else's assumptions about human population history, and I typically try to avoid needing phased data. In this case, it looks like the data over short intervals are as accurate as they can be, given the limitations on coverage and sampling. We have moved forward by applying methods that make a bare minimum of assumptions.

    Counting derived SNP alleles

    David Reich and colleagues came up with an appealingly simple test of introgression, which they applied to both the Neandertal and Denisovan genomes. Eric Durand, Reich, Nick Patterson and Monty Slatkin formally described the statistic this year, calling it the D-statistic [8]. Informally, this has become known as the ABBA-BABA test, after their labels for the discordant genealogies that the test compares. By and large, across the genome, humans living today share many more new mutations with each other than they do with an archaic human like a Neandertal. But sometimes two genomes are different from each other, and one of them shares a new mutation with the Neandertal.

    A human might share a mutation with a Neandertal because it actually isn't very new, and both inherited the mutation from some much more ancient population of humans. This scenario is called "incomplete lineage sorting", because humans today have multiple gene lineages that existed within some very ancient population, instead of these having been "sorted" cleanly into the different human and Neandertal populations. Incomplete lineage sorting does happen a lot between humans, Neandertals, and Denisovans. ILS is the normal mode of variation among recent human populations, who trace their genealogical histories back much further than the earliest "modern" humans. So if one human has a Neandertal allele, and another human has a different allele, it's probably no big deal. They both just inherited gene variants that already existed in our distant common ancestors.

    You can probably see already that if we had a way to estimate the age of an allele, we could tell whether incomplete lineage sorting is a credible explanation for any particular site. I'll leave that point for another post.

    In the meantime, if we pretend that we know nothing at all about the ages of alleles, we must find some other way to tell whether incomplete lineage sorting can explain Neandertal similarities. Reich and colleagues recognized that incomplete lineage sorting from ancient pre-Neandertal ancestors ought to be distributed equally among living people. If we look at every site in the genome where we have data from Neandertals, we should find that one living human genome should look like the Neandertal just as often as another.

    This insight led to their test. Take a pair of humans, count the number of times sequence A is like the Neandertal and sequence B is like a chimpanzee, and then do the inverse — B then A. ABBA-BABA.

    Why a chimpanzee? In most cases the chimpanzee allele will represent the ancestral state for humans. Living people can inherit ancestral alleles from Neandertals as well as derived ones, but the derived ones tend to be rarer and younger within human populations. If one living genome shares an ancestral allele with the Neandertal genome, we don't need incomplete lineage sorting or introgression to explain the pattern. For all we know, such a mutation originated after Neandertals were already gone. So we need to pay attention to the derived mutations, ones that are present in Neandertals but not in chimpanzees. Do a count of these across the genome, and if you find a living genome with significantly more than another, you've found evidence for introgression.
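
    The counting logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used by Green, Reich and colleagues: the function name `abba_baba`, the one-character-per-site allele encoding, and the toy sequences are all hypothetical. The chimpanzee allele is treated as ancestral, and sites where the Neandertal carries the ancestral allele are skipped as uninformative.

```python
def abba_baba(human_a, human_b, neandertal, chimp):
    """Count discordant derived-allele sites for a pair of human sequences.

    Each argument holds one allele per aligned site (same length for all);
    None marks missing data. Returns (n_abba, n_baba, d), where
    d = (n_abba - n_baba) / (n_abba + n_baba).
    """
    if not (len(human_a) == len(human_b) == len(neandertal) == len(chimp)):
        raise ValueError("all sequences must cover the same sites")
    n_abba = 0  # human_b shares the Neandertal derived allele, human_a is ancestral
    n_baba = 0  # human_a shares the Neandertal derived allele, human_b is ancestral
    for a, b, n, c in zip(human_a, human_b, neandertal, chimp):
        if None in (a, b, n, c):
            continue  # skip sites without data in all four sequences
        if n == c:
            continue  # Neandertal carries the ancestral allele: uninformative
        if a == c and b == n:
            n_abba += 1
        elif a == n and b == c:
            n_baba += 1
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else 0.0
    return n_abba, n_baba, d


# Toy alignment of six sites (made-up alleles, for illustration only).
print(abba_baba("AAGAGA", "GAGGAA", "GAGGGA", "AAAAAA"))
```

    A positive value of d in this sketch means that the second human sequence shares more derived alleles with the Neandertal than the first does. The counts alone do not give a significance level; that needs a standard error estimated across the genome, which this sketch leaves out.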

    Ed Green, David Reich and colleagues [1] [2] did a comparison of every possible pair of genomes in their modern human sample. These sequence data were gappy, so that sequence A might share different coverage with B than with sequence C. So it was necessary to consider each pair separately, counting all the sites where both human sequences and the Neandertal and chimpanzee sequences had data.

    The 1000 Genomes Project sample reports genotypes for every SNP for every sampled individual. So in principle, every pair of sequences should have data for every one of these sites. Again, we have to be cautious about the nature of the sequencing, attending to the possibility of systematic biases due to low coverage. But we really don't have to take the time-consuming step of comparing every possible pair of the 2188 resulting haploid genomes. We can just find the derived SNP alleles that are present in Neandertals and count how many of them are in each of the human sequences. If one sequence has significantly more Neandertal derived alleles than another, it had to get them somehow.
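
    The shortcut in the paragraph above, counting Neandertal derived alleles once per haploid sequence instead of comparing every pair, might look something like the following. This is a sketch under assumptions: the function name `count_neandertal_derived`, the dictionary of named haplotypes, and the toy strings are hypothetical and do not reflect the actual 1000 Genomes file format.

```python
def count_neandertal_derived(haplotypes, neandertal, ancestral):
    """Count, per haplotype, the sites carrying a Neandertal derived allele.

    haplotypes: dict mapping a haplotype name to its alleles, one per site.
    neandertal, ancestral: alleles at the same sites (None = missing data).
    A site is informative when the Neandertal allele differs from the
    ancestral (chimpanzee) allele.
    """
    informative = [
        i
        for i, (n, anc) in enumerate(zip(neandertal, ancestral))
        if n is not None and anc is not None and n != anc
    ]
    return {
        name: sum(1 for i in informative if seq[i] == neandertal[i])
        for name, seq in haplotypes.items()
    }


# Made-up example: three haplotypes over five sites.
toy = {"hap1": "GAAGA", "hap2": "AAAAA", "hap3": "GAGGA"}
print(count_neandertal_derived(toy, neandertal="GAGGA", ancestral="AAAAA"))
```

    Because every haplotype is scored against the same set of informative sites, the resulting counts are directly comparable and can be binned into a histogram by population.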

    That magic three percent

    The figure at the top of the post represents that count. Every individual in the 1000 Genomes Project dataset has two copies of the autosomal genome. Separating these two copies of the genome (basically arbitrarily) and counting up the shared derived features between each of those copies and the genome of Vindija 33.16, we obtain the histogram. Here it is again:

    The African genomes in the 1000 Genomes sample include Yoruba from Nigeria and Luhya from Kenya. The Asian populations sampled are Japanese and Chinese, including people of Han Chinese ethnicity in Beijing and southern China. The European ancestry samples include the CEU sample from Utah, as well as British, Tuscan, Spanish and Finn samples.

    The histogram shows that Asian and European genomes have significantly more Neandertal derived SNP alleles than do the African genomes. The averages for the Asian and European samples are around 3% higher than the average for the African samples. Whatever gave Africans some degree of similarity to Neandertals, non-Africans seem to have gotten around 3% more of it.

    Green and colleagues [1] assumed conservatively that Africans share derived SNP alleles with Neandertals only because of incomplete lineage sorting from the human-Neandertal ancestral population. This fraction should be the same in all human populations, under the assumption that Africans were mostly isolated from Neandertals for some period of time. The 3% Neandertal bonus outside Africa should then represent introgression from Neandertals into recent populations outside Africa.

    Both previous studies noted that genomes outside Africa are not significantly different in the fraction of derived SNP alleles shared with Neandertals. A genome from China and a genome from France carried the same fraction of shared derived SNP alleles with Neandertals. Here, we've confirmed that basic identity in the level of introgression in these populations.

    I have told several people now that I find the distributions in China and Europe spookily similar. On parts of the genome, the two distributions have means that are not significantly different. Indeed, I worked for a week with an analysis of eight chromosomes, in which the East Asian and European means were fewer than 100 SNP alleles apart. Even across the whole genome, Europeans average only 700 derived SNP alleles more than the East Asian sample. This small difference (a bit more than a tenth of a percent) is strongly significant on these sample sizes. A t-test yields a p-value of 1.1 × 10^-26 on the difference in means. Even so, the distributions of these two populations overlap across most of their ranges.
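
    For readers who want to see how a difference in means of this kind gets tested, here is a minimal sketch using only the Python standard library. The post does not say which variant of the t-test was used, so this assumes Welch's unequal-variance form, and it approximates the two-sided p-value with a normal distribution, which is reasonable only for samples of hundreds of haplotypes. The function name `welch_t` and the counts below are made-up placeholders, not the data behind the figures.

```python
import math
from statistics import mean, variance


def welch_t(sample_x, sample_y):
    """Return (t, p) for the difference in means of two samples.

    t is Welch's statistic (unequal variances). p is a two-sided p-value
    from a normal approximation, an assumption that suits large samples.
    """
    nx, ny = len(sample_x), len(sample_y)
    standard_error = math.sqrt(variance(sample_x) / nx + variance(sample_y) / ny)
    t = (mean(sample_x) - mean(sample_y)) / standard_error
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Placeholder derived-allele counts per haplotype (not real data).
europe = [10, 12, 11, 13, 12]
east_asia = [8, 9, 7, 9, 8]
t_stat, p_value = welch_t(europe, east_asia)
print(t_stat, p_value)
```

    With large samples, even a difference that is tiny relative to the spread within each population produces a very small p-value, which is why heavily overlapping distributions can still differ significantly in their means.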

    Seeing these hundreds of genomes arrayed on a histogram provides much more information than we had from a handful of genomes. It is remarkable how much dispersion there is among genomes from a single population. Although the means of these two samples are nearly the same, you can see that each of them has a large range of variation in the shared derived SNP alleles with Neandertals. This variation means that people within a single population have very different proportions of Neandertal ancestry.

    This is not a graph of people, but a separation of the two copies of SNP alleles carried by these people. That separation is phased at short scales but arbitrary on the scale of a whole chromosome, so the histogram likely understates the variance among single genomes while it overestimates to some extent the variation among people with their diploid genomes. Still, it looks likely from these comparisons that some people in Europe carry more than a percent higher Neandertal ancestry than the average, and some carry a percent less. We can use statistical methods to test this hypothesis directly as applied to individuals in the sample.

    Neandertal genes in recently admixed populations

    A sample of hundreds of people allows us to demonstrate significant differences among the genomes of different populations. Some of the 1000 Genomes Project samples are from populations that represent historically recent admixture of people who trace their ancestry to different parts of the world.

    For example, the "ASW" population sample includes African-American people who live in the Southwest United States. We know from many other genetic studies that African-Americans vary in the fraction of ancestry they derive from Europeans and from Africans. The average fraction of European ancestry varies among African-Americans who live in different parts of the U.S., from as low as 3% to as high as 20% or more in some parts of the country. The proportion among individuals varies even more. So when we consider the ASW sample, we should expect to see a lot of variation in the number of shared derived SNP alleles with Neandertals, with a mean higher than African populations.

    Which is exactly what we do see:

    The ASW sample overlaps substantially with the Yoruba sample from West Africa (Nigeria) and slightly with the CEU sample, which includes people of European ancestry in Utah. The total in the ASW genomes is more variable than either the Yoruba or CEU population samples. If the higher mean in the ASW genomes reflects European ancestry from a population like CEU, the proportion of European ancestry would be around 17% for that sample of people. It would be hard to tell from these numbers alone how much of the variation in ASW is attributable to variation in ancestry fraction, and how much is expected within a population of homogeneous ancestry. As we'll see in some other populations, there are some appreciable differences among populations within a given region, and ancestry differences may add to the variation among individuals within populations.
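
    The 17% figure follows from treating the admixed mean as a weighted average of the two source means. A minimal sketch of that interpolation, assuming simple linear mixing between two sources, is below. The function name `ancestry_fraction` is hypothetical, and the three means are placeholder numbers chosen only so that the arithmetic lands on 17%; they are not the counts from the analysis.

```python
def ancestry_fraction(mean_admixed, mean_source_a, mean_source_b):
    """Estimate the fraction of ancestry from source B under linear mixing.

    Assumes mean_admixed = (1 - f) * mean_source_a + f * mean_source_b
    and solves for f.
    """
    span = mean_source_b - mean_source_a
    if span == 0:
        raise ValueError("source means are identical; the fraction is undefined")
    return (mean_admixed - mean_source_a) / span


# Placeholder means of shared derived alleles (hypothetical, not real data).
yri_mean = 100_000
ceu_mean = 103_000
asw_mean = 100_510
print(ancestry_fraction(asw_mean, yri_mean, ceu_mean))
```

    The same interpolation applies to any sample assumed to descend from two source populations, with the caveat noted in the text that a mean says nothing about how the fraction varies among individuals.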

    We see a similar pattern when we look at the Puerto Rican sample. Individuals in this sample have some ancestry from European, Native American and African ancestors. The comparisons by Reich and colleagues [2] and Green and colleagues [1] suggested that Native American populations have the same fraction of Neandertal ancestry as other people outside Africa. In the comparison with YRI and CEU samples, Puerto Rican (PUR) genomes are intermediate, with a mean suggesting around 15% ancestry from the West African population.

    The two outlier points in the Puerto Rican sample are the two genome copies from one individual, who we would hypothesize had much higher African ancestry than the average in the sample.

    Next...

    This post has taken me much longer than I expected to get to the point of talking about variation among samples within continental regions. It turns out that, despite the similarity of European and East Asian samples in their averages, there are substantial differences between samples within each of these regions.

    For example, here's a comparison of north and south Chinese samples:

    People of Han Chinese ethnicity sampled in Beijing appear to have on average a half percent more Neandertal ancestry than people of the same ethnicity sampled in southern China. I found these kinds of differences almost everywhere I looked within regions. More later...


    References

    1. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. 2010. A Draft Sequence of the Neandertal Genome. Science [Internet] 328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    2. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature [Internet] 468:1053–1060. Available from: http://dx.doi.org/10.1038/nature09710
    3. Hodgson JA, Bergey CM, and Disotell TR. 2010. Neandertal genome: the ins and outs of African genetic diversity. Current biology : CB 20:R517-9.
    4. Reich D, Patterson N, Kircher M, Delfin F, Nandineni MR, Pugach I, Ko AM-S, Ko Y-C, Jinam TA, Phipps ME, et al. 2011. Denisova admixture and the first modern human dispersals into southeast Asia and oceania. American journal of human genetics 89:516-28.
    5. The International HapMap Consortium. 2005. A Haplotype Map of the Human Genome. Nature [Internet] 437:1299–1320. Available from: http://dx.doi.org/10.1038/nature04226
    6. McVean G, Spencer CCA, and Chaix R. 2005. Perspectives on human genetic variation from the HapMap Project. PLoS genetics 1:e54.
    7. Hawks J, Wang ET, Cochran G, Harpending HC, and Moyzis RK. 2007. Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 104:20753–20758. Available from: http://dx.doi.org/10.1073/pnas.0707650104
    8. Durand EY, Patterson N, Reich D, and Slatkin M. 2011. Testing for ancient admixture between closely related populations. Molecular biology and evolution [Internet]. Available from: http://dx.doi.org/10.1093/molbev/msr048
    Synopsis: 
    We're quantifying the amount of Neandertal ancestry in whole genome data from living people.
  • Asian Homo erectus

    Mon, 2011-11-07 23:59 -- John Hawks
    Synopsis: 
    Examining a sample of crania from the Early and Middle Pleistocene of Asia and Indonesia

    Homo erectus entered Asia as early as 1.8 million years ago. One of the earliest specimens of the species is the Modjokerto skull, from Java. The spread of this species across the tropical Old World was a major event in our evolution. After Homo erectus reached East and Southeast Asia, it had a long history — up to 200,000 years ago or even more recently.

    This station has several representatives of this Asian dispersal of early humans.

    • Trinil 2, Java, 1.2 million years old.
    • Sangiran 2, Java, 1.0 million years old.
    • Zhoukoudian L2, China, 700,000 years old.
    • Zhoukoudian L1, China, 700,000 years old.
    • Ngandong 10, Java, 200,000 years old.
    • Ngandong 8, Java, 200,000 years old.
    • Ngandong 4, Java, 200,000 years old.

    What to do: Overall, these fossils are very similar. However, they come from a wide range of times. Make an attempt to seriate the fossils by cranial size. List the results of your seriation. Does it correlate with time?
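
    One way to answer "does it correlate with time?" is a rank correlation between your seriation and the ages listed above. The sketch below computes Spearman's rank correlation with tied values given their average rank, since several of the fossils share an age. The function names are hypothetical, the ages come from the list above (in thousands of years), and the `size_rank` values are a made-up seriation outcome standing in for whatever order you obtain; they are not measurements of the casts.

```python
import math


def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / math.sqrt(sxx * syy)


# Ages in thousands of years, in the order the fossils are listed above.
age_ka = [1200, 1000, 700, 700, 200, 200, 200]
# Hypothetical seriation by cranial size (1 = smallest); replace with your own.
size_rank = [1, 2, 3, 5, 4, 6, 7]
print(spearman(age_ka, size_rank))
```

    A value near -1 would mean that older fossils tend to be smaller, a value near +1 the reverse, and a value near 0 no consistent trend. With only seven specimens, treat the result as descriptive.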

    Try seriating the skulls according to the form of their frontal bone or supraorbital torus. This feature differs between fossil specimens from Java and China. Does your seriation indicate this difference in geography?

  • Denisovan DNA in the islands, and an Australian genome

    Thu, 2011-09-22 18:09 -- John Hawks

    David Reich and colleagues today report on the persistence of Denisova-like ancestry in island Southeast Asia and Australia (citation not yet available). Meanwhile, Morten Rasmussen and colleagues (citation not yet available) report on the whole-genome sequencing of hair from an Aboriginal Australian who lived some 100 years ago.

    The most obvious story: These data utterly destroy the hypothesis of a single out-of-Africa colonization of Southeast Asia by modern humans. Many human geneticists have argued our present pattern of diversity originated in a wave of successive founder effects coming from a single recent African origin. They were wrong.

    Instead, we can turn to a complex model with successive dispersals and episodes of population mixture. This is not a static model of isolation-by-distance; it is a dynamic model in which populations grow and spread across large spans of the Old World, again and again and again. By my count, at least three massive episodes of population dispersal and mixture are necessary in Reich and colleagues' model. A picture of their admixture hypothesis:

    Denisova admixture model from Reich et al. 2011

    This model depicts (a) an early divergence of an African (represented by Yoruba) and Asian/Australasian populations. These mix with first Neandertals and then (for the Australian/New Guinea/Mamanwa populations) with Denisova-like people. Later (b), after the initial habitation of the Philippines by the ancestors of Mamanwa, a population like Andamanese Onge pushes into the islands, mixing with the ancestors of New Guinea and Australian populations. Later still (c), a population ancestral to today's Chinese people mixes with Philippines and other Southeast Asian people.

    As complicated as it looks, even this model must be a vast oversimplification. I don't like or attribute much belief to mixture models like this, as they assume too much about relative population sizes and the timing of mixture. Many recent hunting and gathering populations of Southeast Asia are not included in the current samples, and the Chinese sample is itself the result of very recent demographic events, covering what once may have been a wider diversity of peoples. Depicting Australian and New Guinean populations as monolithic is an artifact of the small sample; these places themselves housed a tremendous diversity of peoples. Nevertheless, the true model won't be simpler than this one; it will involve many more events that the data cannot yet resolve.

    Hints of that complexity emerge from the Aboriginal Australian whole genome. Rasmussen and colleagues show that this individual shares some ancestry with East Asian peoples, but on the whole populations in Europe and East Asia are much more genetically similar to each other than to this genome. The picture from the whole genome is essentially the same as that drawn by the SNP comparisons by Reich and colleagues, but with the potential (in the long run) to actually trace the histories of individual genes. And I think the gene-by-gene account of history will be important, because we already have some evidence that a few Denisovan genes do persist in mainland Asia, even though most are gone.

    To explain why, we can look at the proportion of Denisovan ancestry in different populations as depicted in a map by Reich and colleagues. The pie charts are confusing here, because they report the fraction of ancestry from Denisovans in each population relative to the 5% estimate for New Guinea. So Australians also have 5% in this figure, Timorese have around 2.5%, and Bougainville has more than 4%.

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    At the same time, the mixture model presents an important idea: Once there were people in Southeast Asia who had much more Denisovan ancestry than any populations still remaining today. Both Australian/New Guinea populations and Philippine populations like the Mamanwa have subsequently mixed with new immigrants who lacked any sign of Denisovan ancestry. Prior to this later mixture, the ancestors of those populations must have been more Denisovan -- Reich and colleagues estimate 7%. This is the first evidence that ancestry from archaic people of Eurasia was diluted to a lower value by later population movements. If the population mixture originally happened somewhere in mainland Asia, any traces of Denisovan ancestry in those areas have been diluted almost to nonexistence. But the persistence of some genes would be predicted if natural selection were maintaining them in the face of demographic pressure from elsewhere.

    About the Australian genome, there will be much more interesting analyses to come, I expect. As whole-genome data come to represent more of the variation within human populations, we get a larger store of information about how we came to be variable. Variation traces not only to population movements and demography, but also to natural selection. Australia's population history has been very different from many populations of the Old World, and this genome should give us new perspective on the effects of that demographic history.

    Synopsis: 
    The hypothesis of a single out-of-Africa dispersal is rejected by new data about Denisovan mixture and whole-genome sequencing of an Aboriginal Australian.
  • Agriculture, population expansion and mtDNA variation

    Mon, 2011-05-23 11:50 -- John Hawks

    Earlier this spring, I wrote about a paper by Brenna Henn and colleagues that presented new data on SNP variation in recent African hunter-gatherer populations [1] ("Population structure within Africa: has 'modern human origins' become a non sequitur?").

    Another paper that came out this spring from the same research group is also very interesting. Christopher Gignoux, Henn and Joanna Mountain [2] examined the evidence for Holocene population growth in Europe, Africa and Southeast Asia, from within-haplogroup variability of mtDNA haplogroups. The idea is that earlier samples were not finely resolved enough to examine events of the last few thousand years, either because they included only small sequences (e.g., control region) with limited variation, or because they included whole mtDNA genomes with too few individuals to look at within-haplogroup coalescents. So here they add more individuals. It is still a small number (425 total) and so I expect that we will see better ones in the next few years.

    The results are nonetheless useful because they provide some nice matches for the archaeology of early agriculture. For example, in Africa:

    We find two periods of population expansion within our sample of lineages originating during the Holocene in western Africa. Although the majority of coalescent events occur during the Holocene, a number of lineages from this sample also coalesce during the Upper Paleolithic. The earliest growth begins at ≈38,000 ya (CI: 33,500–45,000 ya) (Table 1 and Fig. S1) and the second period begins at ≈4,600 ya (CI: 3,000–10,000 ya) (Table 1 and Fig. 1B). The correspondence between the timing of genetic evidence for a sharp increase in population size at 4,600 ya in our Holocene sample of sub-Saharan Africans and the archaeological evidence for origins of agriculture in western Africa is quite close (Fig. 1B and Table 1). In contrast, our southern African Upper Paleolithic sample representative of hunter-gatherers shows no growth over the past 20,000 y. We suggest Bantu-speaking farmers and other pastoralist groups migrated throughout southern Africa 2,000 ya (27) without impacting southern African mtDNA lineages (Fig. 1B).

    We can't really understand the pattern of genetic variation within Africa without understanding when the population grew. In Africa, Middle Stone Age genetic variation must have been more extensive than that in other regions of the world. But the survival of that MSA variation to the present day depends on the demography of populations over the past 50,000 years. In a growing population, fewer lineages will be lost by random genetic drift. So if Gignoux, Henn and Mountain are right about the growth of West African populations by 35,000 years ago, we might expect that region to preserve some extensive variation from MSA times. That might explain why that population preserves very deep Y chromosome lineages [3]. Regarding only mtDNA, one might conclude that a historical paucity of migration between hunter-gatherer and agricultural groups would be the most important reason why MSA variation remains in the present-day African population. This has been the explanation for survival of deep mtDNA lineages in southern Africa, for example. The Y chromosome result and the current paper remind us that population growth can also preserve variation from earlier time periods.

    I think this proposal of African population history matches very well the model that we assumed in our acceleration paper [4], which we based on the archaeological record. We suggested early population growth in Africa by 35,000 years ago followed by an agricultural expansion after 5000 years ago. The evidence for relatively late agricultural intensification, within the last 4000-5000 years in sub-Saharan Africa, is very clear archaeologically. Less clear: How big was the earlier, pre-agricultural human population? The LSA might correspond to a demographic intensification, generally after 45,000 years ago. Genetics has certainly seemed to support such a view, and we found it consistent with the evidence that positive selection had increased in rate much earlier in Africa than in other regions. Still, the more detailed study by Gignoux and colleagues helps to clarify this picture.

    The results also show agricultural population growth to have been late in Southeast Asia.

    Direct archaeological evidence for rice agriculture in southeastern Asia dates to only ≈4,400 ya in Thailand (28). Agriculture spread throughout Island Southeast Asia, with evidence of rice in Taiwan again dating to ≈4,400 ya. Our Southeastern Asian Holocene population size curve indicates expansion beginning ≈4,700 ya (CI: 3,000–5,700 ya) (Fig. 1C and Table 1).

    Again, useful. I think we need to exert some effort making sure that the initial dispersal of people into South/Southeast Asia can be differentiated from the post-agricultural history. But assuming that Gignoux and colleagues are correct, it makes sense in an overall picture of slowly adapting early crops to tropical climate regimes, or replacing early domesticates with different ones in those areas.

    I am less sanguine about their results for Europe. They show a gradual period of growth associated in time with the Younger Dryas (around 12,000 years ago), which could make sense in the archaeology. But I am not convinced that the "European" haplogroups here are really European to that time depth. We know that the Neolithic and post-Neolithic saw some large-scale shifts in the frequencies of mtDNA haplogroups in Central and Western Europe. Some Upper Paleolithic Europeans probably contributed mtDNA to this later population, but I have no confidence that the proportion was great enough to accurately infer the demography of that pre-Neolithic population. (This is also a problem with the current paper in Current Anthropology by Peter Rowley-Conwy. I'll discuss this sometime soon.)

    The next frontier in reconstructing the population history of Europe will be ancient DNA. A good sample of Neolithic and pre-Neolithic whole mtDNA genomes would settle this question and allow inferences about the kind of demographic recovery Europe underwent after the Last Glacial Maximum.

    An open question is to what extent the other populations have similar problems. The European population of today reflects West Asian population dynamics 10,000 years ago. The East African population today reflects West African population dynamics from before the Bantu expansion, possibly to a similar extent. The population of Southeast Asia reflects the population dynamics of early rice agriculturalists in South China. And so on.

    Adding large-scale migration and partial population replacement to this kind of demographic analysis is not easy, but it will be essential if we want a better picture of how agriculture affected human populations. Considering these problems, I think it's easy to see why I started working on Holocene population dynamics. Evidence about Late Pleistocene populations, like MSA Africans and Neandertals, still lies within our genomes. But we see it through a lens. Holocene population dynamics -- movements and population growth -- distort that lens. If we don't account for those Holocene dynamics, we will conclude wrongly about the earlier dynamics.

    I like this a lot, because this is what anthropology is really good for. We can bring a lot of archaeological and historical knowledge to bear on the question of post-agricultural population dynamics. But it's a deep, deep field with a lot of specialized literature.


    References

    Synopsis: 
    A study of mtDNA variation attempts to find the times and magnitudes of population expansions in early agriculturalists.
  • Older and younger Acheulean in India

    Sun, 2011-03-27 00:37 -- John Hawks

    Shanti Pappu and colleagues [1] report on date estimates resulting from new excavations at the old site of Attirampakkam, India. The news element is that they date an Acheulean occurrence to as old as 1.5-1.6 million years ago. At the oldest, these dates would make the Acheulean in India equal in age to the earliest occurrences in Africa.

    The dates depend on the decay of cosmogenic nuclides in the artifacts themselves. This is a kind of exposure dating -- as the artifacts are exposed to cosmic rays at the Earth's surface, they build up radioactive isotopes of beryllium and aluminum (10Be and 26Al), which have half-lives of 1.39 million and 717,000 years, respectively. When they are buried deep underground, their exposure to cosmic rays stops, and the radioactive isotopes can only decay. Because 26Al decays faster than 10Be, the ratio of the two isotopes in the sample then reflects the time since deep burial. But like other exposure methods, in practice this depends on a model of exposure time, burial speed, and radioactivity within the soil, which lends substantial uncertainty to the dates. The lower bound of the 95% confidence interval of each of the date estimates reported in the paper is still over a million years, leading to the minimal conclusion that the site is that age or older.
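
    The simplest version of that burial calculation can be sketched in a few lines. This is a minimal illustration, not the model Pappu and colleagues used: it assumes the sample reached a surface 26Al/10Be ratio of about 6.75 before burial (a commonly quoted production ratio for quartz, assumed here rather than taken from the paper) and that burial was instantaneous and complete. The function names are hypothetical; the half-lives are the ones given above.

```python
import math

HALF_LIFE_BE10 = 1.39e6   # years, as quoted in the post
HALF_LIFE_AL26 = 7.17e5   # years, as quoted in the post
PRODUCTION_RATIO = 6.75   # ASSUMED surface 26Al/10Be ratio, not from the paper


def decay_constant(half_life):
    """Convert a half-life in years to a decay constant per year."""
    return math.log(2) / half_life


# 26Al decays faster than 10Be, so the ratio falls at the difference of the rates.
RATIO_DECAY = decay_constant(HALF_LIFE_AL26) - decay_constant(HALF_LIFE_BE10)


def ratio_after_burial(years, initial_ratio=PRODUCTION_RATIO):
    """26Al/10Be ratio expected after `years` of complete shielding."""
    return initial_ratio * math.exp(-RATIO_DECAY * years)


def burial_age(measured_ratio, initial_ratio=PRODUCTION_RATIO):
    """Years since burial implied by a measured 26Al/10Be ratio."""
    return math.log(initial_ratio / measured_ratio) / RATIO_DECAY


if __name__ == "__main__":
    for age in (0.5e6, 1.0e6, 1.5e6):
        print(f"{age:>9.0f} y buried -> ratio {ratio_after_burial(age):.2f}")
```

    Under these assumptions the ratio halves roughly every 1.48 million years. Everything the sketch leaves out (pre-burial exposure history, slow or partial burial, production at depth) is what lends the real dates their substantial uncertainty.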

    Robin Dennell has written an accompanying short essay that gives a broader view of the Acheulian in South Asia [2]. The essay includes a great paragraph summarizing the now-obsolete idea that the Acheulean reached India only a half million years ago:

    How does this new evidence affect our understanding of the South Asian Acheulian? Previously, the general consensus was that the Indian Acheulian was less than 0.6 to 0.5 Ma (5) and was thus much younger than that in the Levant (eastern Mediterranean). There, the earliest dates of 1.4 Ma, from ‘Ubeidiya in Israel, probably indicate a dispersal of hominins from Africa (6). A second influx of African immigrants is indicated by the discovery of African types of cleavers and hand axes at Gesher Benot Ya'aqov (GBY), in Israel, dated to 0.78 Ma (7). This evidence implied that the Acheulian dispersed eastward toward South Asia only several hundred millennia after it first appeared in the Levant. It also implied that the spread of Acheulian bifacial technologies into South Asia was broadly contemporaneous with its first appearance in Europe, where the earliest sites date from ∼0.5 to 0.6 Ma (8). Some have attributed this expansion of the Acheulian into South Asia and Europe to Homo heidelbergensis. This Middle Pleistocene type of hominin is known mostly from Europe, where it was first defined, but is also recognized by some (but not all) researchers at African sites such as Bodo, Ethiopia, and Kabwe, Zambia, and even at some sites in China (9).

    The "Homo heidelbergensis" model is in such utter disarray right now, I'm not sure many paleoanthropologists have realized the full extent of the problems. You should know that I don't believe in Homo heidelbergensis, never have. A couple of months ago, I was discussing some of the issues about mutation rate estimation with a very prominent geneticist, and the conversation turned to Homo heidelbergensis. What a shock the Denisova sequence should have been to those itching to see a H. heidelbergensis incursion into Asia!

    Notice, however, the intrinsic nuttiness of archaeological interpretation. Oh, we have the first evidence for Acheulean in India around 600,000 years ago? Well, that's around the same age as the Bodo fossil from Ethiopia! What a coincidence! Maybe this new kind of hominin expanded from Africa and carried the Acheulean to India! And Sima de los Huesos is around 600,000 years old, too -- and there's a handaxe in the pit! My gosh, we need a name for those hominins!

    Well, the nice thing about a hypothesis built on mere coincidence is that it only takes one observation to falsify it. Million-year-old handaxes in India ought to do it, and how. That's the message of Dennell's essay, and the subtext of the paper by Pappu and colleagues. What I find interesting is the extent to which the fact was hinted at by earlier discoveries in South Asia but hampered by weaknesses in stratigraphic control and dating. From Pappu and colleagues:

    Sparse radiometric ages from sites in India have situated the Acheulian within the Middle Pleistocene, with a few dates suggesting an early Middle to Early Pleistocene age. However, these ages often exceed the limits of confidence of the methods used (2). They include an electron spin resonance (ESR) mean age of 1.27 ± 0.17 Ma, assuming linear U uptake, on two herbivore teeth from Isampur (23); an ESR age of ~0.8 Ma (lacking uncertainty envelopes) on calcrete from the Amarpura formation, Rajasthan (24), which has been correlated with the Acheulian site of Singi Talav (4); dates ranging from ~1.4 to 0.67 Ma for the tephra at Bori (Kukdi river) (25); and paleomagnetic measurements with evidence of reversals at the sites of Bori, Morgaon, Gandhigram, Andora, and Nevasa (26). However, the reliability of these ages has, in each case, been questioned on various grounds (5, 27, 28). Likewise, the age and stratigraphic position of artifacts and faunal remains from the Early Pleistocene Dhansi formation along the river Narmada are yet to be firmly established (29). Based on data from controlled excavations and two independent dating methods, our ages from Attirampakkam show that the Acheulian in India is older than previously thought. Evidence from other sites in South Asia should be reconsidered and redated.

    Much evidence already exists in the South Asian Acheulean that could be more accessible. The Acheulean in the region has been a long block of undifferentiated time, despite some very well-resolved sites. In addition to this much older dating for early Acheulean, India also has some of the youngest Acheulean assemblages anywhere -- for example, Haslam and colleagues [3] earlier this month reported on an Acheulean assemblage from around 130,000 years ago in northeastern India. That's long after the large biface tradition begins to give way to Middle Paleolithic and MSA toolkits in Europe and Africa.

    On the topic of Denisova, Haslam and colleagues were writing before that genome was reported. But they did know about the Neandertal genetic results, including the evidence of Neandertal ancestry within India. Nevertheless, they assert a scenario in which the makers of earlier and later Acheulean in South Asia are the same biological population, without substantial gene flow from regions to the west, including the Neandertals.

    Recent reports of the draft Neanderthal genome suggest that Neanderthals and H. sapiens likely did interbreed successfully soon after the latter had left Africa (Green et al., 2010), with the probable location of such contact to the west of India, in the Middle East. The southern limit of the Neanderthal range is unknown (Dennell and Roebroeks, 2005), but we emphasise that the continuity seen in the Middle Pleistocene South Asian technological record suggests that taxa derived from earlier hominin dispersals, and not Neanderthals, were the creators of the Indian Late Acheulean. Greater biological separation between dispersing humans and resident Indian hominins may have precluded viable genetic mixing (although see Liu et al., 2010 for an alternate view from East Asia), while similarities in certain technological strategies may have rendered cultural exchange a somewhat more likely occurrence.

    Well, the Denisovans didn't have to live in India when the ancestors of Melanesians ran across them and intermarried. But Denisova and the Neandertal genomes now make it very likely that the inhabitants of South Asia were one or the other. And even if South Asians were yet a third group, as yet unattested from genomes, it is no longer credible to suppose that they were isolated from Europe or Africa for a million years previous. The tools just don't have that much to do with the populations.


    References

    Synopsis: 
    Long known from India, new papers are adding detail to the temporal extent of the Acheulean.
  • Orangutan dynamics of Borneo

    Wed, 2010-11-24 01:46 -- John Hawks

    Bornean and Sumatran orangutans are the most highly divergent subspecies within any of the living species of great apes. The two are farther apart even than chimpanzees and bonobos, which are good biological species. The time of the Bornean-Sumatran orangutan divergence as estimated from mtDNA is around 3.5 million years ago.

    This is old enough that many primatologists consider the two populations as separate biological species. The species distinction is supported by some aspects of morphology, but as yet we have no good nuclear DNA information about the extent of divergence. In chimpanzees, nuclear genetic comparisons suggest a relatively recent founding of one subspecies and recurrent gene flow between the others, despite high mtDNA divergence between the subspecies. So information from across the genomes of Bornean and Sumatran orangutans may be necessary to substantiate the hypothesis of long isolation suggested by mtDNA.

    Within Borneo, different local populations of orangutans have strong genetic differentiation, with few shared mtDNA haplotypes among them. A new study by Natasha Arora and colleagues [1] has provided further detail about these relationships within Borneo. Based on earlier work, they expected to find high population differentiation within Borneo, and that is what they found:

    [O]ur analyses revealed high and significant mitochondrial differentiation, with populations within currently recognized subspecies generally displaying as much differentiation as those between subspecies. Of notable interest is the great extent of subdivision and lack of reciprocal monophyly for the morphologically recognized subspecies P. p. morio and P. p. wurmbii. MtDNA haplotype sharing is uncommon and for populations separated by rivers occurs only in two instances: (i) for SA and GP and (ii) for the northern and southern populations across the Kinabatangan river. In both cases, very recent common ancestry could explain the incomplete mtDNA lineage sorting. For North Kinabatangan (NK) and SK, Jalil et al. (27) proposed an expansion from a recent common refugium further west in Mount Kinabalu, as posited for other Bornean species (46, 47, 49). DV, with its low haplotype diversity, might also be the result of a recent range expansion. GP is located proximally to the Bangka–Belitung–Karimata–Schwaner divide, from where orangutans are presumed to have dispersed to the rest of Borneo (12) and where we might expect a rich haplotype diversity. However, the presence of only one mtDNA haplotype shared with populations further east suggests that the current population in GP is recent and/or underwent a severe recent bottleneck. This and other local bottlenecks make it impossible to reconstruct a colonization of Borneo through the southwestern “choke point” (52).

    They were able to confirm the relatively strong differentiation of Bornean populations by examining nuclear microsatellites. These do not give a great indication of the time period over which the populations may have developed their differentiation, but the microsatellites do document the relative lack of allele sharing between the populations, attesting a history of low gene flow in the recent past. The populations they identify as strongly differentiated do not correspond entirely with the subspecies recognized along morphological lines, but there are strongly differentiated populations here.

    The "news" aspect of the paper is the one unexpected observation: the mtDNA ancestor of Bornean orangutans lived relatively recently, only around 176,000 years ago (with a range of error stretching from 72,000 to 320,000 years ago). The data in the study do not allow us to distinguish whether this was a time when the Bornean population may have been founded, or whether instead the mtDNA lineage spread through pre-existing populations. The authors pursue the hypothesis that Bornean orangutans were limited to a refugium sometime during the early Late Pleistocene:

    Assuming that orangutans arrived in Borneo around the same time as gibbons and macaques, the recent coalescence of Bornean orangutans could be explained by a bottleneck through a severe rainforest contraction. Such a bottleneck would have had a more dramatic impact on the mtDNA structure of orangutans compared with other species as a result of their low densities and slow life histories (18) as well as habitat requirements.

    The comparison with gibbons and macaques is necessary because both have substantially deeper mtDNA coalescence times within their Bornean populations. If the forest had been substantially reduced to a small area where orangutans could survive, we might expect the other primates to reflect this event -- and they don't. Nevertheless, a grab-bag of climate change scenarios appears next:

    Geomorphological and palynological data indicate the presence of dryer, more open vegetation in southern and western Borneo during the last glaciation (2, 41), and by extrapolation also during other glaciations (but c.f. refs. 42, 43). Climate change was especially severe during an extended cold period within the penultimate glaciation between 130 and 190 ka (44, 45), which occurred approximately at the time of mean coalescence of Bornean mtDNA haplotypes. More recently, the last Toba eruption approximately 74 ka resulted in a short, albeit significant, decrease in regional temperatures, ensued by a 1,800-y cold stadial (9, 10). Our data do not provide clear signals to make conclusive statements about potential Toba effects. Nonetheless, the coldest period of the penultimate glaciation (44, 45) was more prolonged than the cold period following the last Toba eruption, suggesting more severe effects of the former on the extent of rainforest across Sundaland. In any event, suitable rainforest habitat for orangutans should have existed in certain regions in Borneo where a refugium population survived the dry glacial conditions.

    A coalescence time of 176,000 years ago does not point to a short-duration bottleneck that began 74,000 years ago. If orangutans in the Middle Pleistocene of Borneo had high genetic differentiation, a crash would have to have been very severe -- eliminating all but one small regional population -- to have effected the present distribution. Still, the great uncertainty in the actual coalescence time leaves open many possibilities, and the refugium hypothesis in the general case is worth testing, even if the Toba eruption in particular cannot explain the data.

    Given the uncertainty about the habitat structure of the now-submerged areas of Sunda, we may also want to consider the hypothesis that the present orangutans arrived recently on Borneo from mainland Southeast Asia. Even if orangutans had lived on Borneo during the Middle Pleistocene, they may not have been the current orangutans. Or even better, they may have been Neanderorangs -- an initial population that was genetically swamped by migrants arriving from elsewhere. The deep Sumatra-Borneo divergence means that the Bornean population was probably not recently derived from Sumatra, but that's a very restricted source compared to the Late Pleistocene distribution of orangutans across mainland and island East and Southeast Asia.

    Some other animals walked from Sumatra to Borneo repeatedly during the Pleistocene, including humans. In the human case, we know that a large fraction of the genetic ancestry of Bornean and Javan people was derived from Asia within the last 100,000 years -- in other words, Late Pleistocene gene flow. The movement of genes may have happened in the context of a dispersal of Asian (or ultimately, African-derived) populations into island Southeast Asia. The paper includes some discussion of other primate species:

    For instance, the south Bornean gibbon Hylobates albibarbis and the Sumatran–Malaysian gibbon Hylobates agilis have a TMRCA of 1.56 Ma (36), and Bornean and Sumatran pig-tailed macaques have one of 3 to 4 Ma (37). By contrast, the Bornean–Sumatran common ancestor of both the silvered langur (39) and clouded leopard (40) is much more recent than that of orangutans, gibbons, and pig-tailed macaques, probably because of a higher flexibility in habitat use.

    The pig-tailed macaque divergence time is more or less the same as the orangutan divergence; the others are more like the time range for human dispersals into island Southeast Asia. We can add to the primates a few other medium-sized mammals; for example, clouded leopards are highly differentiated between Sumatran and Bornean populations, and their mtDNA divergence occurred sometime after 3 million years ago.

    There may be no contradiction between the recent mtDNA common ancestor and the high degree of population structure in Bornean orangutans; the mtDNA could have been selected. We really would want resequencing of a lot more loci in these orangutan populations, for which we may not have to wait too long. Mitochondrial DNA is convenient in many ways, including its greater sensitivity to restricted population size and higher mutation rate. But the intrinsic variance of a single gene system under genetic drift is so high that this disadvantage probably outweighs all advantages for reconstructing population sizes.
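
    How high is that intrinsic variance? A rough sense comes from simulating the time to the most recent common ancestor (TMRCA) of a single non-recombining locus. The sketch below assumes the standard neutral coalescent with a constant female effective size and no population structure, which is certainly too simple for Bornean orangutans; the sample size, population size, and function names are illustrative choices, not values from Arora and colleagues.

```python
import random
import statistics


def simulate_tmrca(n_lineages, n_females, rng):
    """One draw of the TMRCA (in generations) for a maternally inherited locus.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/(2N) per generation, N = female effective size.
    """
    total = 0.0
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / (2.0 * n_females)
        total += rng.expovariate(rate)
    return total


def expected_tmrca(n_lineages, n_females):
    """Coalescent expectation of the TMRCA: 2N(1 - 1/n) generations."""
    return 2.0 * n_females * (1.0 - 1.0 / n_lineages)


def summarize(n_lineages, n_females, replicates, seed):
    """Mean and standard deviation of the TMRCA across independent replicates."""
    rng = random.Random(seed)
    draws = [simulate_tmrca(n_lineages, n_females, rng) for _ in range(replicates)]
    return statistics.mean(draws), statistics.stdev(draws)


if __name__ == "__main__":
    mean, sd = summarize(n_lineages=20, n_females=5000, replicates=20000, seed=1)
    print(f"expected {expected_tmrca(20, 5000):.0f}, simulated mean {mean:.0f}, sd {sd:.0f}")
```

    Even in this idealized case the standard deviation is more than half the mean, so a single mtDNA coalescence time constrains population size only loosely. Averaging over many independent nuclear loci is what shrinks that error.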

    At any rate, the orangutans now provide an additional case where the subspecies-level history of hominoids is more complex than depicted five or six years ago. Uncovering these kinds of dynamics highlights the need for better modeling of demography and dispersal within a geographically widespread species. Isolation-by-distance and long-lasting subspecies are well-defined models, but when they are refuted, we have a lack of well-defined alternatives.


    References

  • Mailbag: The Neandertal fraction

    Tue, 2010-09-07 15:22 -- John Hawks

    Re: Neandertal DNA

    I have a question about your "Neandertals Live!" entry written on May 8, 2010.

    When you say that living non-African populations (ancestry) derive
    1-4% of their genomes from Neandertals, does this mean all living
    individuals of non-African descent have some genomic contribution from
    Neandertals? In other words, could one say if you or myself
    specifically have some kind of Neandertal DNA contribution? Or, does
    the 1-4% only refer to certain populations outside of Africa, while
    nothing can be said about individual non-Africans?

    For example, would having Neandertal genes be analogous to certain
    populations, like certain ethnicities, having a particular founder
    mutation on a haplotype, like sickle-cell anemia in people of African
    descent? In other words, some living groups of individuals have them,
    but not all living individuals have them?

    The comparison results from the greater similarity of European (and other non-African) people to the Neandertal sequence, compared to African people. It takes 1-4% genetic contribution to explain this similarity.

    That's an unusual comparison, and it leads to unusual limitations. The number is genome-wide and we don't know (yet) whether some parts of the genome are more consistently Neandertal than others. We also don't know (yet) whether Africans have no Neandertal at all, or just 1-4% less than non-Africans.

    We know nothing at all about individuals (at this moment) although I expect we'll be able to say something about the heterogeneity of Neandertal contribution fairly soon.

    I expect that some genes will have a very common Neandertal-derived haplotype outside of Africa because of selection, and that these will account for a predominant fraction of the admixture. But I can't say we know this yet empirically.

  • Ngandong interview

    Fri, 2010-07-30 10:58 -- John Hawks

    Nature News has run a nice interview with Russell Ciochon about the new excavations at Ngandong, Java.

    We've been excavating for 24 days without a break. The days blur together and we often lose track of time. There is a routine to systematic palaeoanthropological excavation: opening an excavation pit, digging down to the bone bed, carefully mapping the strata as we proceed, exposing the fossils, assigning the fossil a number, charting its xyz coordinates, removing the fossil, and then sampling the strata for geological analysis and dating.

  • More on Tibet, demography and selection

    Tue, 2010-07-06 12:30 -- John Hawks

    My post about the Tibetan high altitude selection story last Friday summarized the research and included some criticism of the demographic model applied in the paper by Yi and colleagues. This weekend, I had some correspondence from study coauthor Rasmus Nielsen.

    Nielsen was kind enough to provide a lot of information about how they arrived at their demographic model. Also, his comments are of substantial interest as a perspective on science journalism. I have posted them in their entirety, and have added my own perspective below them. Click through to read on:

    Nielsen:

    I read your blog on the EPAS1 gene. You write that my answers to Nicholas Wade in the NYT article are lame. I couldn't agree more. Reading the quotes Wade put together from a long phone interview and two replies to follow-up requests by email for further information - I could get quite convinced about my lameness myself. Let me give you our side of the story:

    (1) Regarding effective population size estimation: we fit several different demographic models to the data. The best fitting one according to the Akaike information criterion was chosen in the paper to use for the coalescence simulations. But notice that we made no strong claims about population sizes in the paper. They appear in the supplementary information to ensure that other people could reproduce our study. The main objective for fitting a demographic model was to allow us to perform coalescence simulations under a model that fit the data well. The model described in the paper fits the data very well and was the best fitting model we could find. As such - it was our best option for how to calculate p-values - and was certainly, in our opinion, better than providing no p-values, or use p-values based on some simpler model that did not fit the data. Had we used another model with different values of Ne, we would have obtained less accurate p-values.

    However, we did not interpret the effective population size estimates strongly - mostly because we do not believe they have very much to do with census population sizes. I would argue that this is true for both this study and other similar studies on other populations. Estimated effective population sizes are not only a function of changes in population size, natural selection, male/female ratios and variance in offspring number. They also rely on the structure of the populations. A population organized into many small sub-populations might have an Ne that is substantially larger than N, while a population without sub-structure might have a much smaller Ne than the census size if there has been fluctuations in the population size or higher variance in offspring number than that expected from a Poisson. Therefore, it is wrong to interpret estimates of Ne as estimates of actual number of individuals - or to believe that there is some simple general relationship between effective population size and true number of individuals. For this reason, we did purposefully not provide an interpretation of the estimates of Ne in terms of actual values of N and I feel that our work is not being represented accurately by arguing that we obtained estimates of the number of Han individuals or Tibetans living 3000 years ago. That does not mean that we cannot try to understand why we get such a small Ne for Hans 3000 years ago and such a large estimate for Tibetans. The most likely explanation for the Hans is that there have been other bottlenecks that we have not modeled - before or after. If we estimate Ne for Europeans today using a model that does not take all the bottlenecks into account, we get estimates of about 5-15,000 individuals. I don't think anybody would claim that there are only 5-15,000 Europeans alive today. Similarly, our estimate for Ne for the Hans 3000 years ago is in the hundreds presumably because there were some previous bottlenecks that we have not modeled.
    Ancestral bottlenecks can be extremely hard to date from frequency spectrum data - and you end up getting the same likelihood for a long time period with small population sizes and a short time period with extremely small population sizes. There have been several published papers making this point, the first one I believe to be Adams and Hudson. 2004. Genetics 168:1699-171. Changing our model to having a larger population size 3000 years ago but with an appropriately modeled preceding bottleneck would produce more or less the same p-values - because it would produce the same expected frequency spectrum (or at least something very similar).

    Regarding the large Tibetan population size, it may likely be affected by population structure within Tibet and/or by admixture with other individuals. Both of these factors would inflate the estimate of Ne. We did try some other models - but ended choosing this particular model because it fit the data the best. It seemed, therefore, most appropriate for the coalescence simulations. Again, I want to emphasize that we did not attempt to estimate number of individuals living in particular places during particular times - we were interested in finding a model which fit the distribution of allele frequencies well so that we at least could make some attempt at estimating relevant p-values. We never claimed that there were just a few hundred Han individuals alive 3000 years ago - in the same way that we are not arguing that there are only 5-15,000 Europeans alive today.

    (2) Regarding the divergence time: none of the models we fitted could explain the data with a divergence time much larger than 3000 years. If you look at the figure in the paper, you can see that there is an extremely strong correlation between the allele frequencies in Hans and in Tibetans. This is very difficult to explain with a long divergence time of genetically separated populations. To maintain such a strong correlation for a large amount of time, the Tibetan population (and the Han population) would have to be enormously large - and this is incompatible with the observed levels of variation in the population. We could not find a model that fit the data and which included a large divergence time no matter what we did. But there are of course many factors going into these estimates - including a calibration of number of mutations with the chimp, a number of demographic assumptions, and assumptions regarding generation times. If we are making errors on these assumptions - the estimates could change in one way or another. For that reason I feel it is most conservative to avoid arguing that our analysis definitely rejects that the divergence time could be 6000 years. The main objective of the paper was after all to investigate the evolution of altitude adaptation. The demographic analysis was there mostly to allow us to do the coalescence simulations - but we also used them to make the argument that this selection has occurred quite recently - and not say 10k or 20k years ago. It is quite clear from the data that such long divergence times cannot be supported by the data.

    This being said, we of course want to know if this short genetic divergence time is compatible with other evidence. I would argue that it is. There have been several migrations into Tibet. It is entirely compatible with the archaeological record that individuals living in Tibet today are genetically mostly descendants of migrants arriving around 3000 years ago even though the first migrants appeared much earlier. In terms of the selection - and when it has been acting - we want to determine when selection acted to increase the frequency of EPAS1 mutations in the ancestry of the individuals living in Tibet today. If they are genetically descendants of individuals migrating into Tibet just a few thousand years ago - then this is the relevant data for describing when selection has been acting on the EPAS1 mutations. As an aside I should also say that this has nothing to do with when the mutation(s) arose. Selection has in this case most likely been acting on standing variation.

    You argue in your blog that more could be done with this data in terms of demography. We agree. The paper was about altitude adaptation, not demography. We are still working on the data and are hoping to produce a follow-up paper on the demographic analyses. We weren't sure how much interest there would be in the results - but the interest from you and other people in this is certainly a motivation to keep working on it as hard as possible.

    I hope you will post this reply on your blog and comment on it. If you do so - I would ask that you post it in its entirety. I learned a lot from the interview with Wade. I certainly now understand why politicians keep giving the same 2-line reply over and over again to journalists asking them questions. If a journalist talks sufficiently long with an interviewee - it will be possible for them to find some sentences that they can put together in some way to make the interviewee look foolish - if that's what they want to do.

    Me:

    Thanks so much for writing with this! I will of course post your comments, and I appreciate very much the time you spent detailing the work, especially on a holiday weekend.

    What you've written here basically agrees with my take on the text of your paper; the demographic model is useful as a test because it is conservative, it is not an attempt at population history. I've reviewed effective size at some length [readers can find a review that I wrote, and I can forward reprints on request]. As you write, this study does not differ substantially from many others in the use of effective size estimates.

    As an anthropologist I am very concerned at the proliferation of population models that are nonsensical from a demographic standpoint. Yes, the p-value will be much the same for EPAS1, but the model is hugely conservative with respect to anything with less extreme differentiation. Other studies are essentially alike; lowball demographic numbers are useful in their conservatism but give an incorrect view about the relation of demography and selection.

    Besides, you have to consider the mechanism by which the best-fit model has come to be so extreme. As you note, the effective size estimated under the assumption of neutrality actually will reflect the non-neutral dynamics across the exome. The HapMap doesn't give rise to anything like the model of an extreme and recent bottleneck that the exome data yield, yet of course both these genome-wide sets must have undergone the same demography. The difference is that the exome is limited to the coding fraction of the genome, pointing to selection on some (probably large) fraction of coding loci. The small effective size within the last 3000 years is mathematically equivalent to a statement that the data include genealogies with many coalescences in those 3000 years. Again, this doesn't happen in a population of hundreds of thousands of individuals unless there was rapid selection.
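
    The link between effective size and recent coalescences can be made concrete with a rough sketch. It assumes an idealized, constant-size diploid Wright-Fisher population, in which two lineages coalesce in any given generation with probability 1/(2Ne), and a 25-year generation time. Neither assumption comes from the paper, and the function name and the Ne values are only illustrative.

```python
def prob_pair_coalesces(ne, generations):
    """Probability that two lineages find a common ancestor within
    `generations`, in an idealized constant-size diploid population
    of effective size `ne` (per-generation probability 1 / (2 * ne))."""
    per_generation = 1.0 / (2.0 * ne)
    return 1.0 - (1.0 - per_generation) ** generations


YEARS = 3000
GENERATION_TIME = 25  # assumed years per generation, not from the paper
GENERATIONS = YEARS // GENERATION_TIME

# Illustrative effective sizes: hundreds, thousands, hundreds of thousands.
for ne in (500, 5_000, 500_000):
    p = prob_pair_coalesces(ne, GENERATIONS)
    print(f"Ne = {ne:>7,}: P(pair coalesces within {YEARS} years) = {p:.5f}")
```

    Under these assumptions, an appreciable fraction of lineage pairs coalesce within 3000 years only when Ne is in the hundreds; with Ne in the hundreds of thousands the probability is negligible. A sample full of such recent coalescences therefore fits a very small neutral Ne, or else points to something other than neutral drift, such as selection.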

    So it seems to me that the data must reflect the high incidence of recent selection within mainland China. This is exactly what we expect based on the real demography of massive population growth across the same interval and adaptation to post-agricultural ecologies. Although the headline of the paper is about high altitude adaptation in Tibet, the real story is the massive selection in China of other genes.

    If this is correct, then I think there is much promising work to do by using real demographic estimates. Deriving the demographic model from the data themselves is really just throwing away useful information that is abundantly documented archaeologically and historically.

