john hawks weblog

paleoanthropology, genetics and evolution

admixture

  • ASHG notes on Gene Expression

    Sun, 2012-11-11 17:13 -- John Hawks

    Razib Khan has started writing up his notes on this week's conference of the American Society of Human Genetics: "Reflections on the evolution at ASHG 2012". He includes some reactions on the presentations in human population history, which will be well worth following. There's an exciting agenda of discovery underlying many of the current projects.

    Khan mentions the work on Neandertal genetics at the meeting:

    Sriram Sankararaman had a poster on Neandertal admixture in modern human lineages. In the broad outlines the Reich lab and the Wall lab seem to agree (along with others, like Melinda Yang in the Slatkin lab). We’re seeing the convergence of a new orthodoxy/paradigm.

    I agree that a new paradigm is being written, but I don't expect it to rise to an orthodoxy. At the moment, there is an obvious path forward with extensions of standard tools and new data, and that is what constitutes the active research paradigm. I think of this as a path of least parameters. But so far nobody writing outside our group has published any serious effort to match genetic results with archaeological evidence.

    Thus far, some of the reactions by established players in archaeology can be described as falling in Pauli's "not even wrong" category. Paleogenomes just shocked the systems of some people who should really have hedged their bets on modern human origins. But modern human origins are no longer the interesting issue. Genetics has moved the ratchet forward, and there is no going back to the simple paradigm.

    Now we have to grapple with a complex population history. That history was multilayered, with many more than one or two waves of significant admixture leading to the samples at hand. The great promise is that genetics will at last allow us to test a lot of anthropological assumptions about human hunter-gatherer population dynamics. But the theoretical challenge is that admixture estimates from genetics are conditioned on extremely simple population models that are really far from the ways we know humans have interacted in the past.

    On that note, I will point to my current paper, which has just gone online in the Journal of Anthropological Sciences: "Dynamics of genetic and morphological variability within Neandertals". As I put the paper together, I began to appreciate the difficulty of describing each of these different sources of data -- genetic, morphological and archaeological -- for specialists in the other areas. I will post on some of my favorite parts of the paper later in the week.

  • The North African Neandertal descendants

    Thu, 2012-10-18 16:25 -- John Hawks

    A new paper by Federico Sánchez-Quinto and colleagues reports on comparisons of North African population samples with the Neandertal DNA project data [1]. The paper shows that North African populations also carry a substantial trace of Neandertal ancestry, like living populations outside of Africa, much more than populations of sub-Saharan Africa.

    One of the main findings derived from the analysis of the Neandertal genome was the evidence for admixture between Neandertals and non-African modern humans. An alternative scenario is that the ancestral population of non-Africans was closer to Neandertals than to Africans because of ancient population substructure. Thus, the study of North African populations is crucial for testing both hypotheses. We analyzed a total of 780,000 SNPs in 125 individuals representing seven different North African locations and searched for their ancestral/derived state in comparison to different human populations and Neandertals. We found that North African populations have a significant excess of derived alleles shared with Neandertals, when compared to sub-Saharan Africans. This excess is similar to that found in non-African humans, a fact that can be interpreted as a sign of Neandertal admixture. Furthermore, the Neandertal's genetic signal is higher in populations with a local, pre-Neolithic North African ancestry. Therefore, the detected ancient admixture is not due to recent Near Eastern or European migrations. Sub-Saharan populations are the only ones not affected by the admixture event with Neandertals.

    The interesting aspect of the paper is that the authors attempted to separate the ancestry of North African samples into a pre-Neolithic indigenous African component, and a residual component that represents more recent gene flow into North Africa, from all sources. The historic movement into North Africa has been fairly cosmopolitan, involving sub-Saharan Africans, Arabs, Medieval Europeans, Romans, Carthaginians and many other peoples. Sánchez-Quinto and colleagues used the ADMIXTURE program to try to sort out a pre-Neolithic indigenous component and analyze that specifically for Neandertal similarity.

    Unsurprisingly, the fraction of estimated sub-Saharan African ancestry in each population sample was inversely correlated with the estimated Neandertal ancestry. That is, the more a population looks like sub-Saharan Africans, the less Neandertal it has.

    Here's what's surprising: When they sorted out parts of the genome in Tunisians that ADMIXTURE determines to be most likely from pre-Neolithic North Africans, they found these parts of the genome had more Neandertal ancestry than typical of the CEU sample of northern European ancestry. Is it possible that ancient North Africans had more Neandertal similarity than today's Europeans?

    Sánchez-Quinto and colleagues suggest that the Neandertal ancestry in this population came in Upper Paleolithic times from the Near East. That is possible, or some of the Neandertal similarity may reflect ancient African population structure. Really I think we will have to do a finer analysis of chromosome blocks to examine the subset of shared Neandertal derived alleles that reflect introgression versus incomplete sorting from the ancestral African population. It will be very interesting to examine more closely the mixture of population history within Egypt, through which most Near Eastern pre-Neolithic population movement must have come.

    The authors note that the distribution of Neandertal similarity outside Africa increases with distance from Africa.

    A previous study [26] observed that the similarity to Neandertals increases with distance from Africa and suggested this could be explained by SNP ascertainment bias plus a strong genetic drift in East Asian populations. Nonetheless more complex, population-biased, ascertainment schemes might have additional effects (i.e bottlenecks), but these are not expected to significantly increase the rate of false positives in admixture tests [31]. The Tunisian population has been reported to be a genetic isolate [17] so it is plausible that part of the signal detected is actually due to genetic drift. However, this should not affect the other North African groups in our study. Finally, given that SNP arrays are based on common alleles and probably the relevant admixture information is encoded within the rare and very rare alleles, the potential bias, if anything, will underestimate ancient hominid admixture signals, as shown in previous studies [2],[3].

    This pattern was also observed by Meyer and colleagues earlier this year [2], and I discussed it in my post on that paper ("Denisova at high coverage"). Both papers note that ascertainment bias may contribute to this pattern. I added that Meyer and colleagues had assumed that genes found in sub-Saharan African populations could not have come from Neandertals, which greatly biased their estimates against Europe and West Asia, considering historical and prehistoric gene flow across the Sahara and along the Indian Ocean coast. So I'm not yet accepting the relative numbers of Neandertal ancestry from different populations, as we don't know that they have all come from consistent assumptions. In particular, an elevated amount of Neandertal ancestry in China -- this paper puts it at almost double the amount in northern Europeans -- is unlikely. There is no pattern of bottlenecks that can give rise to that excess without additional population mixture, and it is hard to see where such population mixture would have happened without also affecting the ancestors of Europeans. Instead, we have some work to do in reducing the biases on these comparisons.


    References

    Synopsis: 
    A study of North African genetic variation shows that Neandertal genes were widespread in the area before the Neolithic.
  • Denisova at high coverage

    Thu, 2012-08-30 15:25 -- John Hawks

    Science today has released the new paper on the Denisova high-coverage genome by Matthias Meyer and colleagues from Svante Pääbo's group [1]. There is a lot of material in the supplements of the new paper, and it will take some time to work through implications.

    The basics are quite simple: The paper confirms the initial interpretation of the genome by David Reich and colleagues [2] in most respects. The mixture with a whole-genome sample from Papua New Guinea is estimated at 6% Denisovan ancestry. Confirming the later paper by Reich and colleagues [3], the new analysis finds no significant evidence of Denisovan ancestry in a mainland south Chinese (Dai) individual, and can exclude it down to a very small fraction:

    However, in contrast to a recent study proposing more allele sharing between Denisova and populations from southern China, such as the Dai, than with populations from northern China, such as the Han (17), we find less Denisovan allele sharing with the Dai than with the Han (although non-significantly so, Z = –0.9) (Fig. 4B) (table S25). Further analysis shows that if Denisovans contributed any DNA to the Dai, it represents less than 0.1% of their genomes today (table S26).

    That is a mystery to be explained. How did Asians end up lacking any evidence of Denisovan ancestry, when the peoples of Sahul (Australia and New Guinea) have six percent? It's nutty! The early modern humans who were the ancestors of present Sahulian peoples surely came from Asia, and they surely mixed with Denisovans there somewhere, right? But today there's no sign that present Asian peoples descended from those early Asian peoples.

    We must, I think, conclude that there was at least one, and possibly several, episodes of massive population movement across South and Southeast Asia.

    I have recently completed a review of the analogous problem for Neandertals in Europe -- late and early Neandertals themselves appear to have been a dynamic population. I'm now working on a review of the situation in Southeast Asia. We may fundamentally have to look at the archaeological record in a new, and much more dynamic, way than has been the case.

    Neandertal gene flow

    To me at the moment, this is the most interesting paragraph of the new paper:

    Interestingly, we find that Denisovans share more alleles with the three populations from eastern Asia and South America (Dai, Han, and Karitiana) than with the two European populations (French and Sardinian) (Z = 5.3). However, this does not appear to be due to Denisovan gene flow into the ancestors of present-day Asians, since the excess archaic material is more closely related to Neandertals than to Denisovans (table S27). We estimate that the proportion of Neandertal ancestry in Europe is 24% lower than in eastern Asia and South America (95% C.I. 12–36%). One possible explanation is that there were at least two independent Neandertal gene flow events into modern humans (18). An alternative explanation is a single Neandertal gene flow event followed by dilution of the Neandertal proportion in the ancestors of Europeans due to later migration out of Africa. However, this would require about 24% of the present-day European gene pool to be derived from African migrations subsequent to the Neandertal admixture.

    This is a very interesting result, partially because it is the opposite of what we are finding. As I explained earlier this year, we are finding Europeans to share more Neandertal alleles than Asians do. The difference in our results has been much smaller than 24%; really only an increase of less than 0.5% on the whole genome, or maybe 10% relative to the overall amount in Europe (which is on the order of 3%).
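
    The arithmetic behind these percentages can be sketched in a few lines. This is only an illustration of the single-pulse dilution scenario in the quoted passage, not the method of Meyer and colleagues. The function names are hypothetical, and the 3% baseline is simply the order-of-magnitude figure mentioned above, applied to eastern Asia as an assumption.

```python
def diluted_ancestry(archaic_fraction, unadmixed_influx):
    """Archaic fraction left after a share of the gene pool is replaced
    by migrants assumed to carry no archaic ancestry."""
    return archaic_fraction * (1.0 - unadmixed_influx)


def relative_difference(lower, higher):
    """Shortfall of `lower` compared with `higher`, as a fraction of `higher`."""
    return (higher - lower) / higher


# Assumption for illustration only: eastern Asia at 3% Neandertal ancestry.
asia = 0.03
# Dilution scenario from the quoted passage: 24% later influx into Europe.
europe = diluted_ancestry(asia, 0.24)
print(f"Europe after dilution: {europe:.4f}")
print(f"Relative shortfall:    {relative_difference(europe, asia):.2f}")

# The opposite pattern described in the text: Asia about 10% lower than
# Europe in relative terms, with Europe on the order of 3%.
europe_alt = 0.03
asia_alt = europe_alt * (1.0 - 0.10)
print(f"Absolute gap:          {europe_alt - asia_alt:.4f}")
```

    Under these assumed numbers, a 24% relative gap moves the genome-wide fraction by well under one percentage point, and a 10% relative gap moves it by about 0.3 percentage points. Both contrasts are small in absolute terms, which is part of why the comparison is so sensitive to filtering choices.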

    My initial reaction to this difference is that it reflects the sharing of Neandertal genes in Africa. Meyer and colleagues filtered out alleles found in Africa, as a way of decreasing the effect of incomplete lineage sorting compared to introgression in their comparison. But if Africans have some gene flow from Neandertals, eliminating alleles found in Africans will create a bias in the comparison. If (as we think) some African populations have Neandertal gene flow, that probably came from West Asia or southern Europe. So as long as the present European and Asian (and Native American) samples have undergone a history of genetic drift, or if (as mentioned in the quote) they mixed with slightly different Neandertal populations, this bias will tend to make Asians look more Neandertal and Europeans less so.

    Anyway, this demands further investigation. The Denisova genome makes a more compelling outgroup for these kinds of comparisons, because it is much closer to us than chimpanzees are. But it isn't really an outgroup because it shares alleles by descent with Neandertals. So it takes some clever genetics to compare the distributions of derived alleles in these genomes in terms of introgression versus incomplete lineage sorting.

    Denisovan demography

    It has become possible to make some good estimates of demographic history using only a single diploid genome, using a technique developed by Li and Durbin [4]. Meyer and colleagues applied this technique to the Denisova genome, finding that its genetic history contrasts with that of living human populations:

    To estimate how Denisovan and modern human population sizes have changed over time we applied a Markovian coalescent model (22) to all genomes analyzed. This shows that present-day human genomes share similar population size changes, in particular a more than two-fold increase in size before 125,000–250,000 years ago (depending on the mutation rates assumed (23), Fig. 5B). Denisovans, in contrast, show a drastic decline in size at the time when the modern human population began to expand.

    There is not yet enough data from Neandertal genomes to apply the same method, but to the extent that we understand their diversity, they show a similar picture. These archaic humans in Eurasia had much, much smaller effective population sizes than the ancient population of Africa. That's not surprising, given what we understand about ancient hunter-gatherer population dynamics.

    What may be a bit more surprising is the geography. We know that Neandertals of Europe and Central Asia lived in an environment that was relatively marginal for their technology and subsistence pattern. The Denisovan population could well have lived in parts of South or Southeast Asia -- subtropical and tropical areas comparable to Africa in their ecological diversity and resource richness.

    We might have imagined that the Denisovan population would be more diverse than Neandertals -- that it might have been comparable in diversity to part of Africa, if not the entirety of Africa. The genome is inconsistent with that picture.

    How can we explain the apparent contrast?

    1. Maybe Denisovans didn't live in South or Southeast Asia at all. If not, that demands that we explain how Australians got their genes.

    2. Maybe the population was geographically extensive and diverse, but the genome from Denisova Cave doesn't represent it well. If so, we might discover that Sahulians actually have even more ancestry from this group. Alternatively, we might find that the early history of the population was widely shared, but the recent history diverged between Siberian and other branches of the Denisovan-inhabited region.

    3. Maybe African diversity emerged from a much more complex series of interactions than we now appreciate. The demographic model of Li and Durbin doesn't encompass admixture, just the probability of gene coalescence across time. We have recently begun to appreciate the reality of ancient African population structure. If those initial African populations were more divergent from each other than Neandertals and Denisovans, their later mixture would give rise to a picture of early population expansion, even if each of them had relatively low (Denisovan-like) diversity.
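
    The third possibility can be made concrete with a textbook two-population coalescent calculation. This is a rough sketch, not Li and Durbin's method: it assumes two source populations of equal, constant effective size that split from one ancestral population and later mix. All parameter values are hypothetical, in generations and diploid individuals, and are not estimates for any real population.

```python
import math


def within_time(n_e, split, n_anc):
    """Expected coalescence time for two lineages from the same source.
    They coalesce at rate 1/(2*n_e) until the split; if they have not
    coalesced by then, they wait a further 2*n_anc on average."""
    uncoalesced = math.exp(-split / (2.0 * n_e))
    return 2.0 * n_e * (1.0 - uncoalesced) + uncoalesced * 2.0 * n_anc


def between_time(split, n_anc):
    """Expected coalescence time for lineages from different sources:
    they cannot coalesce until both reach the ancestral population."""
    return split + 2.0 * n_anc


def mixed_time(frac, n_e, split, n_anc):
    """Expected coalescence time for two lineages sampled right after the
    sources mix, with a fraction `frac` drawn from the first source."""
    same_source = frac ** 2 + (1.0 - frac) ** 2
    return (same_source * within_time(n_e, split, n_anc)
            + (1.0 - same_source) * between_time(split, n_anc))


# Hypothetical values: small sources, a deep split, equal mixture.
n_e, split, n_anc = 2500, 20000, 2500
single = within_time(n_e, split, n_anc)
mixed = mixed_time(0.5, n_e, split, n_anc)
print(f"one source alone: {single:.0f} generations")
print(f"equal mixture:    {mixed:.0f} generations")
print(f"ratio:            {mixed / single:.1f}")
```

    With a constant mutation rate, pairwise diversity is proportional to the expected coalescence time. In this toy case the mixed population looks as though it had three times the effective size of either source, even though neither source was ever large.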

    This picture is already complicated. It will get more so. We have a long way to go before the archaeology of MSA and Middle Paleolithic peoples will be reconciled with these genetic models.

    The "modern human" catalog

    I think it's tremendously interesting that the authors have compiled a list of gene variants shared by living humans that are absent from this high-coverage archaic human genome. It's a first step to identifying networks of genes that have been subject to recent evolutionary change in human ancestors.

    That being said, the list of genes itself doesn't lend itself to concrete conclusions:

    One way to identify changes that may have functional consequences is to focus on sites that are highly conserved among primates and that have changed on the modern human lineage after separation from Denisovan ancestors. We note that among the 23 most conserved positions affected by amino acid changes (primate conservation score ≥ 0.95), eight affect genes that are associated with brain function or nervous system development (NOVA1, SLITRK1, KATNA1, LUZP1, ARHGAP32, ADSL, HTR2B, CNTNAP2). Four of these are involved in axonal and dendritic growth (SLITRK1, KATNA1) and synaptic transmission (ARHGAP32, HTR2B) and two have been implicated in autism (ADSL, CNTNAP2). CNTNAP2 is also associated with susceptibility to language disorders (27) and is particularly noteworthy as it is one of the few genes known to be regulated by FOXP2, a transcription factor involved in language and speech development as well as synaptic plasticity (28). It is thus tempting to speculate that crucial aspects of synaptic transmission may have changed in modern humans.

    Interesting. I can imagine a Ph.D. dissertation looking into the function of each of those genes. It is surely true that in the last 300,000 years, human brains have been evolving. But why these genes as opposed to others? And how many regulatory changes (as opposed to amino acid changes) may have been further involved?

    Maybe even more interesting: How many times will the human alleles be found in some other Denisovan (or Neandertal) genomes, and how often will the "archaic" allele be found in anyone living now?

    A limited series of comparisons is too small to exclude that the range of variation will overlap, as fossil analysts have known for a long time. So we will need to work on extending our knowledge of the range of variation within living people, by increasing the sample of genomes representing populations around the world, particularly in Africa.

    The technology

    Of course, the most exciting thing about the new paper is the proof of concept for future high-coverage archaic genomes. The lab was able to generate the high-coverage sequence using its existing samples, by sequencing single-strand DNA instead of requiring double-strand DNA. This is a massive advantage when working with ancient DNA, because damage to the sequence often prevents double-stranded DNA from being amplified.

    The paper makes explicit that the Denisova phalanx simply has better endogenous DNA preservation than any other specimen known. That being said, the new sequencing method has greatly increased the sequence yield from the sample:

    We applied this method to aliquots of the two DNA extracts (as well as side fractions) that were previously generated from the 40 mg of bone that comprised the entire inner part of the phalanx (2, 8). Comparisons of these newly generated libraries to the two libraries generated in the previous study (2) show at least a 6-fold and 22-fold increase in the recovery of library molecules (8), which is particularly pronounced for longer molecules (fig. S4).

    It would be too soon to say that a similar increase in yield will happen for other specimens, but obviously, this may bring higher coverage into reach for several specimens that are currently only sequenced at very low coverage, including the Vindija, Mezmaiskaya, and El Sidrón Neandertals. We will have to wait and see how the new technique affects ancient DNA recovery going forward.

    I keep telling people that I think it's exciting that research into human evolution is now pushing technology forward. It has often been that paleoanthropology uses technological advances in other fields. But with ancient DNA, we really see an organic growth of technology along with research questions about our evolution. In our work on the ancient genomes, we're making some progress pushing forward knowledge about human biology by understanding human evolution. Evolution really is the fundamental principle of biology, but using evolution to learn about biology sometimes requires traveling through time. Ancient DNA gives us a time machine bringing new insights into reach.


    References

    Synopsis: 
    A technological advance in library preparation gives rise to much better knowledge of the ancient Denisovans
  • Mailbag: Spuds and mutts

    Wed, 2011-11-09 00:28 -- John Hawks

    Re: "How widespread is Denisovan ancestry today?" and "Potato sack race":

    Question about Denisovan DNA. Once introduced into a population, beginning many millennia ago, what keeps it from being in the DNA of everybody in the area? I exclude new arrivals, but what kept the Denisovan DNA from being spread to the homeland of the new arrivals what with the traveling salesmen, the refugees from tribal pushing and shoving, armies marching, cross marching and countermarching? It isn't as if Denisovan genes cause assortative mating by making the possessor either a hell of a catch or a last-man-on-earth scenario. Is it? Selective survival against diseases that come and go, while not so good in between, a la sickle cell? Is the blender model of human reproduction faulty somehow?

    As to potatoes, I'd heard that one advantage is that armies, used to pasturing their horses in the grain of the enemy's peasants' fields, had to move on more quickly when the supply officers gave up trying to get their foraging parties to dig potatoes.

    If, as Keegan hypothesizes, the ration was one pound of meat and two of bread (requiring two pounds of firewood) per man per day, an army of 30,000 ate out a location pretty quickly. If spuds were the local staple, they'd have to move. You just can't feed 30,000 guests who arrived unannounced by digging potatoes. Not fast enough. Do horses like potatoes? So, the army moves on--win--and the peasants get out the potato forks and do okay, more or less. Win.

    Re: potatoes -- I think you've pointed to an important factor -- also, they can't be burned when the army retreats. The sheer productivity of tubers really does outweigh the available grain crops in Northern Europe.

    Re: Denisovan DNA -- The genes should have diffused into other populations, all things being equal. That they did not do so is a pretty strong indication that SE Asia today shares little genetically with SE Asia 30,000 years ago. There must have been a massive influx of people who lacked Denisovan ancestry, well after the initial mixture with Denisovans happened and Denisovans themselves left the scene.

  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map from the paper:

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than other parts of the world, and so the HGDP, and HapMap to a lesser extent, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.
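
    A toy example shows how this plays out. It is a sketch under stated assumptions, not a model of any real genotyping chip: the `Variant` record, the 5% discovery threshold, and all of the frequencies are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    freq_discovery: float  # frequency in the panel used to design the chip
    freq_target: float     # frequency in the population being studied
    archaic_match: bool    # derived allele also seen in the archaic genome


def on_chip(variant, min_discovery_freq=0.05):
    """A variant is typed only if it was common enough to be found in the
    discovery panel (hypothetical threshold)."""
    return variant.freq_discovery >= min_discovery_freq


def archaic_sharing(variants):
    """Count variants that occur in the target population and carry an
    allele matching the archaic genome."""
    return sum(1 for v in variants if v.archaic_match and v.freq_target > 0)


# Invented data: most archaic-matching alleles are rare in the target
# population and absent or nearly absent from the discovery panel.
variants = [
    Variant("v1", 0.30, 0.25, False),
    Variant("v2", 0.10, 0.12, True),
    Variant("v3", 0.00, 0.02, True),
    Variant("v4", 0.00, 0.01, True),
    Variant("v5", 0.20, 0.00, False),
    Variant("v6", 0.01, 0.03, True),
]

chip = [v for v in variants if on_chip(v)]
print("archaic-matching variants, whole genome:", archaic_sharing(variants))
print("archaic-matching variants, chip only:   ", archaic_sharing(chip))
```

    In this invented dataset the chip retains only one of the four archaic-matching variants that a whole-genome comparison would see, because the other three never reached the discovery threshold.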

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu in south China, who have around 0.006 more than their neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. Hence, the best case is that the data here simply provide greater sensitivity to small amounts of introgression. The worst case is that the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
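
    A back-of-the-envelope calculation shows why a handful of loci could be enough. The number of SNPs behind the map is not reported, so the 5,000 used here is purely hypothetical, as is the assumed 0.25 frequency excess at each enriched locus. Only the 0.001 difference in mean frequency comes from the map.

```python
import math


def mean_shift(n_snps, n_enriched, excess_per_locus):
    """Change in mean allele frequency across n_snps sites when n_enriched
    of them are elevated by excess_per_locus and the rest are unchanged."""
    return n_enriched * excess_per_locus / n_snps


def loci_needed(n_snps, target_shift, excess_per_locus):
    """Smallest number of enriched loci that produces target_shift.
    A small tolerance guards against floating-point overshoot."""
    return math.ceil(target_shift * n_snps / excess_per_locus - 1e-9)


n_snps = 5000   # hypothetical: the paper does not report this number
excess = 0.25   # hypothetical frequency excess at each enriched locus
needed = loci_needed(n_snps, 0.001, excess)
print("enriched loci needed:", needed)
print("resulting mean shift:", mean_shift(n_snps, needed, excess))
```

    Under these assumptions, 20 enriched loci out of 5,000 would reproduce the whole north-south difference. A smaller SNP set or a stronger enrichment would need fewer still.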

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.
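
    For readers unfamiliar with the D statistic mentioned in the quoted passage, here is a minimal sketch of the usual frequency-based ABBA-BABA form. It is not Skoglund and Jakobsson's implementation. The ordering of populations (P1 and P2 as the two human samples, P3 as the archaic genome), the assumption that the outgroup is fixed for the ancestral allele, and the toy frequencies are all illustrative.

```python
def d_statistic(sites):
    """ABBA-BABA D statistic from derived-allele frequencies.

    `sites` is an iterable of (p1, p2, p3) tuples, one per SNP, giving the
    derived-allele frequency in populations P1, P2 and P3. The outgroup is
    assumed to carry only the ancestral allele. A positive D means P2
    shares more derived alleles with P3 than P1 does.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2, p3 in sites:
        abba = (1.0 - p1) * p2 * p3   # P2 and P3 derived, P1 ancestral
        baba = p1 * (1.0 - p2) * p3   # P1 and P3 derived, P2 ancestral
        numerator += abba - baba
        denominator += abba + baba
    if denominator == 0.0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Toy data: P3 is a single archaic genome carrying the derived allele.
toy_sites = [
    (0.0, 1.0, 1.0),
    (1.0, 0.0, 1.0),
    (0.0, 1.0, 1.0),
    (0.5, 0.5, 1.0),
]
d_value = d_statistic(toy_sites)
print(f"D = {d_value:.3f}")
```

    In practice the value of D says little on its own. Its significance is normally judged with a Z-score from a block jackknife across the genome, and as the quoted passage notes, its magnitude does not translate directly into an ancestry fraction when the SNPs are subject to ascertainment bias and drift.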

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper provides the suggestion of a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: less than one percent of a half billion people is still a very large number, added to five percent of the indigenous population of New Guinea and Australia, and smaller fractions of other island populations. The total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.


    References

    Synopsis: 
    A new paper contradicts earlier work by suggesting a widespread Denisovan legacy in south China
  • Population structure within Africa: has "modern human origins" become a non sequitur?

    Tue, 2011-03-15 16:33 -- John Hawks

    When I wrote about the Denisova genome late last year, I claimed that "A large-scale reorganization of the science of human origins is upon us."

    I'm glad I had the sense to write that. A lot of people have pointed back to that quote over the last few months. Still, I know that the full implications of the Denisova and Neandertal genomes haven't really sunk in. "Large-scale reorganization" takes time.

    A new paper by Brenna Henn and colleagues in PNAS [1] shows how the shifting landscape has caught many geneticists off their footing. Submitted before the Denisova genome, but long after the Neandertal, the paper is titled, "Hunter-gatherer genomic diversity suggests a southern African origin for modern humans". In today's landscape, with only one instance of the word "Neanderthal" in the paper, the conclusions are obviously incomplete.

    The "southern African origins" conclusion of the paper comes out of a simple analysis that assumes that the best-fit maximum for genetic diversity (as assessed by linkage) is the most likely point of origin of the population. That would be true if the African population emerged by a series of founder effects from a single small ancestral population -- the "serial founder effect" model that I have criticized here before. But of course in 2011, we know that model is false, because it is predicated on a lack of ancient mixture with Neandertals or other populations. If the serial founder model can't work outside Africa, it certainly can't work inside Africa, where populations were larger and regionally diversified by the beginning of the Late Pleistocene. Without that false assumption, the "southern African origin" evaporates. The primary observation, a cline of linkage disequilibrium within sub-Saharan Africa, can be explained with reference to mixture of populations without assuming an origin and expansion from one geographic location.

    I don't want to criticize overmuch. Many ongoing research projects are casualties of our new knowledge of ancient genomics, and we'll see more papers like this before the fallout has settled. Simplistic founder models, acceptable only a year ago when these projects were conceived, are now unquestionably false. Ancient population mixture is the order of the day, and we don't have any simple, plug-in-the-data models to apply to data like these.

    Instead, I want to consider the power of the data in this article to answer some fundamental questions about African population history. Henn and colleagues report on SNP genotyping of several Bushman groups from southern Africa and Sandawe and Hadza people from eastern Africa. These data are on the 550k SNP platform that was used by 23andMe before the recent increase to 1M SNPs. That means the data are comparable to many other studies. They are not, however, entirely comparable with other samples of African genetic variation, so the authors cut the total number of SNPs down to the 55,000 that overlap among all the genotyping platforms used in their analysis. For this reason, the paper presents a genome-wide set of 55,000 SNPs across many African populations.

    It's far from the perfect sample. I expect we'll be able to do much more with the full 550k dataset from the hunter-gatherer populations. The data have been made publicly available for download, and here we're already starting to investigate them.

    Within the current paper there is a very useful analysis of the broader dataset using the ADMIXTURE software. ADMIXTURE assumes that the current samples represent a mixture of ancient populations that were more distinct than today's. I went through this algorithm with my students in class Wednesday and Friday, which I'm sure was an intimidating process to most of them. The math is not too conceptually daunting; it's just hard to conceptualize how all the possible interactions relate to gene frequencies when you are assuming more than a few putative ancestral populations. Razib Khan gives an impressive step-by-step guide to performing an ADMIXTURE analysis, including some of these samples.
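
    To make the mixture assumption concrete, here is a minimal sketch of the kind of likelihood that an admixture model evaluates. Each individual's expected allele frequency at a SNP is an ancestry-weighted average of the frequencies in K putative ancestral populations, and the observed genotype is treated as two binomial draws from that frequency. The function name, the toy matrices `Q` and `F`, and the data are illustrative assumptions of mine, not anything taken from the paper or from the ADMIXTURE source code, and the real software fits `Q` and `F` by numerical optimization rather than taking them as given.

```python
import math


def admixture_log_likelihood(genotypes, Q, F):
    """Log-likelihood of genotypes under a simple admixture model.

    genotypes[i][j]: copies (0, 1 or 2) of the reference allele carried by
                     individual i at SNP j
    Q[i][k]:         fraction of individual i's ancestry from ancestral
                     population k (each row sums to 1)
    F[k][j]:         reference allele frequency at SNP j in ancestral
                     population k (assumed strictly between 0 and 1)
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Expected allele frequency: ancestry-weighted mix of the K sources
            p = sum(Q[i][k] * F[k][j] for k in range(len(F)))
            # Binomial(2, p) log-probability of seeing g copies
            total += (math.log(math.comb(2, g))
                      + g * math.log(p)
                      + (2 - g) * math.log(1 - p))
    return total


# Toy data: two ancestral populations, two SNPs, three individuals
F = [[0.9, 0.1],
     [0.1, 0.9]]
genotypes = [[2, 0], [0, 2], [1, 1]]

Q_sensible = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Q_swapped = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

ll_sensible = admixture_log_likelihood(genotypes, Q_sensible, F)
ll_swapped = admixture_log_likelihood(genotypes, Q_swapped, F)
```

    In the toy data the sensible ancestry assignment gets a higher log-likelihood than the swapped one. The conceptual difficulty I mentioned shows up as soon as K grows: every entry of `Q` and `F` is a free parameter, and many combinations fit about equally well.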

    I'm not in love with this analytical method -- there's no reality check on its assumptions. But its output can be informative about many aspects of population structure. Here are some first approximations:

    1. The genetic diversification of African populations was once much greater than today. Razib Khan points out the homogenizing effect that agricultural populations have had on the African continent, particularly during and after the Bantu expansion. I think the current data suggest that earlier processes involving LSA hunter-gatherers also tended to homogenize populations.

    For example, when eight initial clusters are assumed, the ADMIXTURE analysis constructs them in a way that most of the ancestors of today's Bushmen were in a population with a high degree of genetic divergence from the other seven ancestral populations. The FST between the Bushman ancestral population and others ranges from 0.1 (for forest pygmies) to a high of 0.25 (from Europeans). That estimate is nearly double the equivalent statistic in today's populations.

    Again, we don't have to believe the assumptions underlying the ADMIXTURE algorithm, but it does highlight the basic partitioning of diversity in the African population. Today there is high diversity within African population samples, and some of that diversity can be traced back to populations of 100,000 years ago or more. Some of the diversity that once existed among these populations has now been spread within them instead. The populations got genetically closer over time.

    A model of successive population expansions, bringing ancient populations genetically closer and closer together, is also what we may see in other places. As we have learned more about the mtDNA of ancient Europeans, it has become clear that successive expansions and migrations of people into Europe have radically reshaped the gene pool.

    2. Click languages have no genealogical unity. Over the years, many linguists and anthropologists have proposed that Hadza, Sandawe, and Bushmen are closely related to each other, despite their geographic distance, because they all speak languages that use click sounds. No historical linguist has ever successfully demonstrated a system of sound changes or detailed correspondences among these languages, but people promoting the hypothesis seem immune to these kinds of facts.

    The genetics show a very clear and ancient differentiation of these hunter-gatherer peoples. In the ADMIXTURE analysis, some of the largest genetic distances are among these peoples. By itself, that may not be surprising; these are the populations that have most evaded the homogenization that followed the spread of farming. The Hadza themselves are strikingly distinctive, and their genetics may reflect a history of small population size during the last several hundred years. The potential for genetic drift in this population was very high. Still, the genetic relations are just the opposite of what would be expected if speakers of these click languages had shared a common origin.

    Seems to me that this could have been the lede of the paper, if it had been written differently. A bit more exploration of the hunter-gatherer data (probably incorporating some haplotype-level analysis to give a better estimate of the ages of events) would demonstrate this point very well.

    3. By the time we find "modern" humans in West Asia, the African population had long since diversified into regional populations. This is not news; the mtDNA evidence has suggested for several years that southern Africa and the remainder of sub-Saharan Africa were already regionally differentiated before 120,000 years ago. There have also been hints of this diversification from whole-genome evidence (including the supplement of the Neandertal genome paper last year). Here we have a clear indication that the regionality extends to every African hunter-gatherer population.

    4. Hunter-gatherers have relatively little evidence for recent positive selection. The supplementary data of the current paper includes a short discussion of selection and a list of candidate loci in the hunter-gatherer samples. There is relatively little overlap in candidate regions for selection among these samples. Different genes have been selected in different populations, and not all that many of them. This is not surprising if the selection is relatively new -- the last 20,000 years or maybe more, given the distances and amount of historical population structure estimated for the data. It's also consistent with the demography of these populations. It will be interesting to check, but I would speculate that the signature of selection will on average appear older in these samples than in populations that have historically been agriculturalists.

    5. Where's the Aterian? North Africa is relatively depauperate in variation in the large combined dataset. That may stem mostly from Holocene events, including the spread of West Asian populations across North Africa. But the low variation there doesn't readily fit the idea that an out-of-Africa dispersal of genes came from a North African source. I don't think the observations in the paper (centered around linkage disequilibrium with a very low SNP count) are enough to settle anything about this question, but I'd be nervous if I were busy trying to make the Aterian seem important to the modern human origins issue.
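
    For readers who want to see what an FST value like the ones quoted under point 1 actually measures, here is a minimal sketch for two populations at biallelic SNPs. It uses the textbook definition, the share of total heterozygosity that is not found within populations, combined across SNPs as a ratio of sums. This is an assumption of convenience on my part: it is not necessarily the estimator that ADMIXTURE reports, and the function name and frequencies are made up for illustration.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Wright's FST for two equally weighted populations.

    freqs_a, freqs_b: allele frequencies at the same biallelic SNPs.
    Returns (H_T - H_S) / H_T, summed over SNPs before dividing.
    """
    between = 0.0
    total = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        # Expected heterozygosity if the two populations were pooled
        h_total = 2 * p_bar * (1 - p_bar)
        # Average expected heterozygosity within each population
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        between += h_total - h_within
        total += h_total
    return between / total if total > 0 else 0.0


identical = fst_two_populations([0.3, 0.6], [0.3, 0.6])
fixed_differences = fst_two_populations([1.0, 0.0], [0.0, 1.0])
moderate = fst_two_populations([0.2, 0.5], [0.6, 0.5])
```

    Identical frequencies give zero and fixed differences give one; real human comparisons fall in between, which is why a value of 0.25 between inferred ancestral populations counts as a lot.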

    Bottom line

    As interesting as these assertions look, I don't think that a lot of African prehistory is about to be rewritten. Obviously, geneticists need to get serious about reading some African archaeology. We already know that African regional populations were large and diverse during the Middle Stone Age, and that's a very good fit to the kind of genetic diversity we are seeing in these samples.

    The barrier is Holocene population history. Agricultural populations grew, spread, mixed with and absorbed hunter-gatherers, and what we are left with are the shattered remnants of ancient African population structure. Linkage may be the most powerful way we have to consider historical hypotheses using these SNP data, but if we're going to rely on it we have to control for recent demography and selection.

    And of course, it will be interesting to see a model that can integrate both Neandertal-African and within-African population histories. I don't really have a bang-up finish for this post, because there is immediately more work to be done with these data.


    References

  • Ozzy Osbourne, archaic human

    Mon, 2010-10-25 16:28 -- John Hawks

    Via a reader: The Daily Mail really aims for the lowest common denominator of genetics: "We've all suspected, now it's official: Ozzy Osbourne IS a Neanderthal"

    He claims his ‘superhuman’ genes have kept him healthy despite a lifetime of rock ’n’ roll excess.

    And now it seems science may back up Ozzy Osbourne’s theory that he has a particularly hardy family tree.
    Researchers studying his DNA have found that the singer is the descendant of a Neanderthal man.

    This is almost an entry in the Neandertal anti-defamation series. What holds it back is the clear involvement of some shady genetics company. Get this:

    The researchers also examined the gene the body uses to break down alcohol and discovered an ‘unusual variant’ which could have helped Osbourne survive during the years when he drank up to four bottles of Cognac a day.

    ‘Given the swimming pools of booze I’d guzzled over the years – not to mention all the [drugs] – there’s really no plausible medical reason why I should be alive,’ he told The Sunday Times.

    What a crock! I mean it's one thing to tell people their genomes have Neandertal markers. I mean, that's a crock, too, since we have no clear marker list yet. But at least it's a harmless entertainment-only kind of a crock.

    Now, when you tell an alcoholic that he has an "unusual variant" that "could have helped" metabolize alcohol better -- that's an altogether deeper level of crockery.

    I know, it's like the "Weekly World News", but cheez Louise, what a crock!

  • Mailbag: The Neandertal fraction

    Tue, 2010-09-07 15:22 -- John Hawks

    Re: Neandertal DNA

    I have a question about your "Neandertals Live!" entry written on May 8, 2010.

    When you say that living non-African populations (ancestry) derive
    1-4% of their genomes from Neandertals, does this mean all living
    individuals of non-African descent have some genomic contribution from
    Neandertals? In other words, could one say if you or myself
    specifically have some kind of Neandertal DNA contribution? Or, does
    the 1-4% only refer to certain populations outside of Africa, while
    nothing can be said about individual non-Africans?

    For example, would having Neandertal genes be analogous to certain
    populations, like certain ethnicities, having a particular founder
    mutation on a haplotype, like sickle-cell anemia in people of African
    descent? In other words, some living groups of individuals have them,
    but not all living individuals have them?

    The comparison results from the greater similarity of European (and other non-African) people to the Neandertal sequence, compared to African people. It takes 1-4% genetic contribution to explain this similarity.

    That's an unusual comparison, and it leads to unusual limitations. The number is genome-wide and we don't know (yet) whether some parts of the genome are more consistently Neandertal than others. We also don't know (yet) whether Africans have no Neandertal at all, or just 1-4% less than non-Africans.

    We know nothing at all about individuals (at this moment) although I expect we'll be able to say something about the heterogeneity of Neandertal contribution fairly soon.

    I expect that some genes will have a very common Neandertal-derived haplotype outside of Africa because of selection, and that these will account for a predominant fraction of the admixture. But I can't say we know this yet empirically.

  • NEANDERTALS LIVE!

    Thu, 2010-05-06 12:53 -- John Hawks

    I, for one, welcome my Neandertal ancestry.

    It may not sound like a lot -- between 1 and 4 percent. But that's the equivalent of one great-great-great grandparent's DNA contribution. In the case of the Neandertal contribution, more than 1500 generations ago, it's an enduring legacy of an ancient group of people, spread across many lines of the genealogies of living people. Beyond their genealogical interest, Neandertal genes might have made a big difference to our evolutionary potential.
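
    The grandparent comparison is simple halving: each generation back, the expected share of your genome from any one ancestor is cut in two. A great-great-great grandparent sits five generations back, so the expected share is one part in 32. A quick sketch, with a function name that is just my own label:

```python
def expected_ancestor_fraction(generations_back):
    """Expected share of the genome inherited from one ancestor."""
    return 0.5 ** generations_back


# parent = 1, grandparent = 2, ..., great-great-great grandparent = 5
ggg_grandparent_share = expected_ancestor_fraction(5)
```

    That comes to a bit over three percent, inside the 1 to 4 percent range and toward its upper end.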

    In case you wonder what the heck I'm talking about, here's the story: Two new papers in Science describe the full draft sequence of the Neandertal genome, and perform additional analyses to understand the pattern of adaptive evolution in the population ancestral to living people.

    Richard Green and colleagues report on the genome, demonstrating very convincingly that present-day people have Neandertal ancestors. It is not entirely obvious when and where the gene flow between Neandertals and other ancient populations happened -- whether it was associated with the dispersal of most of our ancestry from Africa, or whether it may have been earlier. The gene flow was not limited to Europe, and evidence for Neandertal ancestry occurs in East Asian and Australasian populations.

    The paper is full of other good stuff, including some evidence about which gene regions changed under selection in the ancestral human population.

    Meanwhile, the second paper by Burbano and colleagues applies new microarray techniques to assess how much of the human legacy of amino acid changes has arisen in the latest, post-Neandertal period of our evolution.

    So there's a lot about the pattern of evolution and gene flow leading to living people, and a lot about adaptive and functional evolution. That makes a lot for me to cover -- and while I have the papers a little early, time is short. Let's see how much I can help clarify what's in this new research.

    If you had to sum up in a few words, what does this mean for paleoanthropology?

    These scientists have given an immense gift to humanity.

    I've been comparing it to the pictures of Earth that came back from Apollo 8. The Neandertal genome gives us a picture of ourselves, from the outside looking in. We can see, and now learn about, the essential genetic changes that make us human -- the things that made our emergence as a global species possible.

    And in doing so, they've taken a forgotten group of people -- whom even most anthropologists had given up on -- and they've restored them to their rightful place in our heritage.

    Beyond that, they've taken all of their data and deposited it in a public database, so that the rest of us can inspect them, replicate results, and learn new things from them. High school kids can download this stuff and do science fair projects on Neandertal genomics.

    This is what anthropology ought to be.

    What did they sequence?

    The Max Planck group obtained most of their genomic sequence from three specimens from Vindija -- Vi33.16, Vi33.25, and Vi33.26. These are all postcranial fragments with minimal anatomical information. Green and colleagues were able to establish that the three bones represent different women, and that Vi33.16 and Vi33.26 may represent maternal relatives.

    From these skeletons they got 5.3 billion bases of sequence. All this from an amount of bone powder about equal in mass to an aspirin pill.

    Amazing. I mean, I know the folks at Max Planck are reading this. It's inspiring to see what they've been able to do. These are three pieces of barely diagnostic hominin bone, and they've obtained literally hundreds of times more information than we have ever gotten from the fossil record of Neandertals.

    I'll describe the analyses of genetic similarity with humans in more detail below. As a brief summary, of those positions where the human genome differs from chimpanzees, Neandertals have the chimpanzee version around 12.7 percent of the time -- meaning that across the genome, a Neandertal and a human will share a genetic ancestor an average of around 800,000 years ago. This is a couple hundred thousand years higher than the same number if we compare two humans to each other. The higher age of genetic common ancestors reflects partial isolation between the Neandertal population and the African populations that gave rise to most of our current genetic variation.
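
    The step from 12.7 percent to a date is a molecular clock calculation. If mutations accumulate at a steady rate, the share of human-lineage changes that the Neandertal lacks is the share of human-chimpanzee divergence time that postdates the human-Neandertal genetic ancestor. The sketch below assumes a human-chimpanzee divergence of 6.5 million years. That figure is my assumption for illustration and is not stated in this post, and a different calibration moves the answer proportionally.

```python
# Assumed calibration, not given in the post: human-chimpanzee divergence
HUMAN_CHIMP_DIVERGENCE_YEARS = 6.5e6

# From the text: fraction of human-lineage changes where the Neandertal
# still carries the chimpanzee (ancestral) version
NEANDERTAL_ANCESTRAL_FRACTION = 0.127


def average_genetic_ancestor_age(ancestral_fraction, divergence_years):
    """Average age of shared genetic ancestors under a constant clock."""
    return ancestral_fraction * divergence_years


neandertal_human_age = average_genetic_ancestor_age(
    NEANDERTAL_ANCESTRAL_FRACTION, HUMAN_CHIMP_DIVERGENCE_YEARS
)
```

    Under that calibration the result lands a little above 800,000 years, in line with the figure quoted above.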

    The team were able to identify 111 candidate duplications, almost all of which have some evidence of copy number variation in humans or other primates. They tentatively show that Neandertals have a bit more copy number variation than present-day humans, and identify a few loci with substantially higher copy numbers in one group or the other.

    A substantial part of the paper is dedicated to finding evidence of positive selection on the human lineage after the emergence of Neandertals. The idea is to look for fixed selective sweeps -- regions where humans are likely to have SNPs absent in Neandertals and a relatively shallow gene tree. They identify 212 regions like this -- as I discuss below, a surprisingly low number.

    The second paper, by Hernán Burbano and colleagues, describes the application of a targeted microarray to probe Neandertal genetic samples for protein-coding variants that separate humans from chimpanzees. They identify 88 amino acid substitutions that seem fixed in the known sample of living humans, but not present in the Neandertal sequence. Those 88 are not necessarily all functionally important, although this list will include a number of "structural" genetic changes that make a difference to proteins expressed worldwide today. There is much to come in analyzing the categories and genes represented in both lists, which may tell us very interesting things about our Late Pleistocene evolution.

    What is the evidence for interbreeding?

    From their initial work sequencing the nuclear genome in Neandertals, the Max Planck group has followed a clever strategy: Don't look at the Neandertal sequence to see what humans share, look at human variation to see which version the Neandertal sequence has.

    The strategy is smart because it helps to obviate some major problems with ancient DNA -- you don't have all the parts, and the parts you do have probably contain a lot of sequencing errors of various kinds. By looking first at sites that vary within humans (or, in some comparisons, between humans and chimpanzees), we can focus on a very simple question -- did the Neandertal have one version, or the other?

    Applied to human variation today, there are several ways we might use a Neandertal genome to test the hypothesis of no interbreeding. Green and colleagues focus on two complementary approaches.

    1. If Neandertals contributed no genes to living populations, then they should be equally related to all living people, no matter where in the world those people live.

    Green and colleagues show that the Neandertal genome is closer to some humans than others. People whose ancestry lies outside Africa are significantly more like Neandertals than are people who live in Africa today. In this study, the authors include whole genomes from people in France, China and Papua New Guinea outside Africa, and Yoruba and San inside Africa. The Africans are not as close to the Neandertal as any of the non-Africans.

    That doesn't mean that non-Africans derive most of their genes from Neandertals -- in fact, as I describe below, the proportion is quite small. Living people are more like each other -- even non-Africans and Africans -- than any of them are like Neandertals.

    The point is that despite this great similarity of living people, we have genetic variants that we share with the Neandertal genome, and that proportion is a lot higher outside Africa than inside it. The natural conclusion is that Neandertals contributed more genes to non-Africans than to Africans.

    One thing is for sure: You can't explain this observation under the hypothesis that a small, African population expanded out of Africa without interbreeding with Neandertals along the way.

    2. Look at the genes most likely to represent ancient population structure, the ones with deep roots outside Africa.

    This is an idea that we came up with to look for genes in living humans that might have come in from Neandertals or other ancient populations (for example, we described it in our 2008 review). Look for the parts of the genome with the deepest genealogical roots outside of Africa. Those are candidates for Neandertal gene flow -- a high chance that one of the two sides of that deep root was present outside of Africa for hundreds of thousands of years.

    Green and colleagues took this idea to the next level. They found parts of the genome where non-Africans have a deep root and Africans don't. Then they looked at the Neandertal sequence. Out of the 12 regions they identified with deep roots outside Africa, they found that the Neandertals had the deep, non-African specific version in 10 of those.

    I mean, there's really not any other way you can explain this. We got those genes from Neandertals. Every one of those loci is a region where some people have a Neandertal-derived allele, and others don't. Those particular 10 loci are a small fraction of the overall Neandertal-derived element of our heritage -- because they used Perlegen SNPs to find them, they ended up with regions that are fairly long (100 kb or more in length). Those are probably all really interesting, but there will be more of them when we can reliably identify smaller segments with deep genealogies.
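
    The first approach, testing whether the Neandertal is equally related to two living people, can be sketched as a count of site patterns. Take two humans, the Neandertal, and the chimpanzee as the stand-in for the ancestral state. At sites where the Neandertal carries a derived allele, count how often only the second human shares it (ABBA) versus only the first (BABA). With no gene flow the two counts should be about equal. The code below is a bare-bones illustration with made-up sequences and my own sign convention; it ignores sequencing error, missing data, and the block resampling needed to judge significance.

```python
def d_statistic(h1, h2, archaic, outgroup):
    """ABBA-BABA statistic over aligned sequences of equal length.

    Positive values mean h2 shares more derived alleles with the
    archaic genome than h1 does; values near zero mean no excess.
    """
    abba = 0
    baba = 0
    for a1, a2, n, c in zip(h1, h2, archaic, outgroup):
        if n == c:
            continue  # archaic carries the ancestral allele: uninformative
        if a1 == c and a2 == n:
            abba += 1  # only h2 shares the derived allele with the archaic
        elif a1 == n and a2 == c:
            baba += 1  # only h1 shares the derived allele with the archaic
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


# Invented six-site alignment, for illustration only
chimp = "AAAAAA"
neandertal = "GGGGAA"
african = "AAGAAA"
non_african = "GGAGAA"

d_value = d_statistic(african, non_african, neandertal, chimp)
d_swapped = d_statistic(non_african, african, neandertal, chimp)
```

    In the toy alignment the non-African shares three derived sites with the Neandertal and the African shares one, so the statistic is positive, and swapping the two humans flips its sign.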

    Could the results have been caused by contamination?

    Green and colleagues are utterly convincing about the level of contamination in their sequence. They have employed several independent checks, all of which arrive at the same conclusion: The modern human contamination in almost all their comparisons is limited to significantly less than one percent -- and for autosomal sequence they can give a tight estimate of 0.7 percent contaminating sequence.

    The methods that Green and colleagues used to test for a Neandertal contribution to non-African populations are not likely to be strongly influenced by contamination. The probe for deep roots in particular is extremely unlikely to be influenced by contamination in the Neandertal sequence.

    The very low contamination rate, and methods that should be robust to some contamination, mean that we can be very confident in their result.

    How much Neandertal ancestry do we have?

    The Neandertal contribution does not make up a major proportion of any population, even outside of Africa. Green and colleagues apply a population model that involves isolation between ancestral Neandertal and African populations, a dispersal from Africa into Eurasia, and subsequent mixture with the Neandertals. Under this model, the estimated fraction of Neandertal ancestry for non-African populations today is between 1 and 4 percent.

    Now, let's put on our skeptics' hats. Is this the right model?

    If Neandertal and African populations had not been isolated, then the amount of mixture after an out-of-Africa dispersal would be lower. On the other hand, the dispersing African population would already be part Neandertal, because of genetic mixture. The proportion of ancestry from ancestral Neandertals would be around the same amount, it would just be distributed across a longer time.

    They did not examine the question of how much of the genome came in from Neandertals because of selection. The estimate they have, between 1 and 4 percent, is so high that this is not just a few genes introgressing in from Neandertals -- it is a big fraction of the neutral, non-coding part of the genome. So selection doesn't explain the similarity, nor can parallelism -- the similarity is genome-wide, not just coding or functional changes, and not as far as we know clustered into regions that might have hitchhiked with adaptive alleles.

    But there's clearly a lot more to do, characterizing the functional implications of some regions, testing for selection, and finding Neandertal variants that might have reached very high frequencies in later populations. To the extent that selection has influenced the pattern, it will also throw off the simple population model. But it doesn't throw off the fraction of Neandertal ancestry -- if it's three percent, it doesn't matter whether it was selected or neutral, it's still three percent.

    So the bottom line is, the fraction is going to be about right, regardless of the mechanism by which the genetic mixture happened.

    Can we please take off our skeptics' hats? They're getting in the way of my Neandertal victory dance.

    No. All the cool paleoanthropologists wear hats.

    What about population structure within Africa? Could that explain the apparent Neandertal contribution?

    We've known about the occasional deep-rooted genealogies outside Africa for a long time (and Jeff Wall's work, as an example among others, has explained that pattern as archaic human mixture into non-Africans). They've been talking about something like five percent of the human genome coming from admixture with ancient groups outside of Africa. So this shouldn't come as a shock.

    Until now, though, it has been possible for some people to wave these results away. We didn't really know that any of those deep roots were in archaic humans, and after all, who's to say that they aren't variants that originated in Africa and have since been lost there, or that we haven't found them yet? African variation is great, and if you imagine that some variation might have once existed in northeastern Africa and was subsequently lost within African populations, that might look like admixture with archaic humans outside of Africa.

    This line of argument is now special pleading. Why would we posit a cryptic mystery population in Africa, which happens to look genetically identical to Neandertals, but has subsequently disappeared? A big fraction of deep genealogies outside Africa really are in Neandertals. By far the simplest explanation is that today's non-Africans got them from ancient non-Africans. This is no surprise -- that's where the data have been pointing now for five years.

    Yet Africans are a lot more diverse than other populations, and this diversity itself does reflect the dynamics of the ancient African population. The Neandertals aren't so different from that pattern that now still exists within Africa -- they're extending the notion that "modern" is something that's been evolving for a long time. I expect we'll be able to come to a better understanding of ancient population interactions within Africa, by understanding the parts of the genome that have come from Neandertals outside of Africa.

    Could the gene flow be due to ancient interactions between West Asia and Africa?

    Green and colleagues suggest that at most a few genes from modern humans ended up in Neandertals.

    That is, although they find lots of evidence of old-looking genes in us that are shared with the Neandertal genome, they find few cases of new-looking genes in us that are shared with that genome.

    That might suggest several things about interactions between Africa and West Asia and Europe during the Middle to Late Pleistocene. For example, if there had been high gene flow from Africa into West Asia after the first appearance of a distinct Neandertal population, maybe 200,000 to 400,000 years ago, we might expect to find some new-looking genes in humans that Neandertals also got.

    On the other hand, the data are from European Neandertals, who are at the end of a fairly long chain of populations from Northeast Africa. If gene flow had been ongoing into the Levant or further into West Asia during the last 200,000 years, it's not obvious how many of these genes would have made it into Europe. The rapid mitochondrial DNA coalescence of Neandertals does suggest substantial mobility in the population across Central Asia to Western Europe. But maybe that apparent dynamism had a boost from mtDNA selection.

    So just on the data, I don't think we know yet whether this is gene flow in the Levant 200,000 or 100,000 years ago, or whether it's genes coming from West Asian Neandertals into dispersing Africans after 100,000 years ago. I expect all are likely. I have some ideas how to test some of these things, and we will get started immediately.

    The lack of apparent mixture of "modern" genes into Neandertals -- what does it mean?

    It means that a model of one-way gene flow from Neandertals into us can explain the pattern of genetic similarity.

    The authors explain this as a function of population expansion. The expanding population (us) picks up some Neandertal genes that expand in numbers, while the contracting population (Neandertals) doesn't have a chance to pick up as many genes because it is declining in numbers. That model seems plausible, particularly in comparison with historical cases of population contact.

    On the other hand, the three Neandertals from which most of the genome sequence was derived all date to before 40,000 years ago. There weren't any modern humans around for them to have interacted with around Vindija at that time. So should we be surprised that they don't have genes of modern humans?

    A more interesting question was posed to me by a very sharp journalist: What would we expect the result to have been if they had sequenced a Near Eastern Neandertal, like Amud, for example?

    The answer seems obvious -- the admixture fraction should have been higher. That population, which is the most likely to have been the source of mixture, must have been somewhat genetically different from the European Neandertals. Any extent of genetic differentiation between them would make the European Neandertals look less like non-Africans today than the Near Eastern ones.

    I'll have more to say about these Near Eastern Neandertals in the next few days.

    But wait a minute. I thought the mitochondrial DNA proved that Neandertals are extinct!

    Selection. Selection. Selection.

    I've been saying it for years. I've published it. Will you learn to listen to me, already?

    The mtDNA of Neandertals is gone because it conferred some disadvantage. There are many reasons to suspect this -- the Neandertal variation is itself apparently recently derived; the human variation is clearly in disequilibrium, especially outside Africa; the mtDNA genes affect functions that differ greatly in Neandertal and recent populations, including energetics, longevity, and brain; there are clear signs of mtDNA selection in many recent human populations.

    Mitochondrial DNA is useful for a lot of reasons, but nobody should ever have relied on it alone as evidence of Neandertal population dynamics.

    Is it really true that there is no variation in Neandertal ancestry outside Africa?

    The comparisons in the paper are highly convincing because of the sheer amount of sequence taken from the sampled individuals. A single gene locus from an individual may be unrepresentative of the person's population, but averaged across the whole genome, the difference between two people from distant populations is very, very close to the difference between the two populations.

    But they sampled very few individuals. So we are left with a question -- do we really know we've sampled variation outside Africa enough to make regional estimates of Neandertal gene flow?

    I think we could do better with more genomes. For example, when it comes to finding deep genealogies, we need to be able to find shorter regions than the ones used by Green and colleagues. That will expand the sample of candidate loci, and will catch some Neandertal-derived genes that we're missing now. Moreover, if gene flow was really around 1-4 percent, many SNPs that came in from Neandertals will be rare enough to be missing from the big SNP genotyping samples. We may find some variants with whole-genome sequencing on larger samples that will be worth examining.

    But most important, we'll be able to develop strategies based on this success to find ancient population structure involving groups where we don't yet have the DNA -- like populations of South and East Asia. Some of those may give us the chance to test those methods soon, as for the Denisova individual.

    Is this multiregional evolution, or just out-of-Africa with some leakage of earlier Eurasian genes?

    Out-of-Africa movement was a major mechanism of recent human evolution. The genetic ancestry of living people is multiregional.

    I see no contradiction between those statements. From now on, we are all multiregionalists trying to explain the out-of-Africa pattern.

    There was clearly a dispersal of African genes into the rest of the world during the Late Pleistocene, sometime between 50,000 and 100,000 years ago. Living people everywhere on Earth derive more than 90 percent of their genes from African populations who lived 100,000 years ago. That much is plain.

    (Why did I not write "more than 96 percent?" See below.)

    These genetic observations require some kind of out-of-Africa event. This event was not limited to a few genes, and selection of a few genes even with substantial hitchhiking of surrounding genome cannot account for the pattern. There must have been some kind of demographic expansion including African-derived populations and preferentially excluding the genes of Eurasian populations like the Neandertals. Selection on a gene network might have mediated the expansion, as suggested by Eswaran (2002). Or the expansion might have been culturally or technologically mediated, as many other people have suggested.

    Those are hypotheses about mechanisms. How did it come to be that living people trace the overwhelming majority of their ancestry to Africa within the last 100,000 years? These explanations may answer that question.

    The present study shows that Neandertals were at a minimum partially isolated from their contemporaries in Africa, and that the genetic divergence between those populations was larger than the genetic differences between European, Asian, and African populations today.

    Yet those Neandertals are among our ancestors. Late Pleistocene humans had multiregional origins, and the evolution of the Neandertals was itself a case of relatively recent population dispersal from Africa or West Asia. Human and Neandertal genes mostly derive from common genetic ancestors between 400,000 and a million years ago -- much, much later than the initial habitation of Eurasia 1.8 million years ago.

    But 1-4 percent is so minor, can it be an important part of our evolution?

    There are three things you have to ask about the fraction of Neandertal ancestry.

    1. How much gene flow would it take to guarantee that anything adaptive in the Neandertal population survived into later people?

    The answer to that question is simple -- it takes a few dozen matings to get most adaptive genes into our population. If there was a lot of interference with the genetic background, it might take more -- just to make sure that the advantageous alleles had a chance to be de-linked from the genetic background.

    If Neandertals are one percent of the ancestry of non-Africans, we can be very sure that any gene in a Neandertal that had adaptive value in the later population is here now. That means they were important in an evolutionary sense.

    2. What fraction of the human population 50,000 years ago were Neandertals?

    This is very important -- when it comes to neutral genetic loci, the essential question is how much the Neandertals may be underrepresented today relative to their numbers in the past. Is three percent too low? It seems very unlikely that the fraction of Neandertals compared to the rest of humans was as high as 10 percent -- we know that Africa already had a large population 50,000 years ago, and everything we know about Neandertals suggests a very low population density, an effective size much smaller than 10,000 individuals. Were five percent of the people on Earth 50,000 years ago Neandertals?

    We don't really know the answers, but now we have a chance to test hypotheses about ancient population size and expansion in Neandertals. My point at the moment is only this: If today Neandertal genes make up only one percent of the gene pool of the 5 billion people outside Africa, that's the genetic equivalent of 50 million Neandertals.

    In relative terms, their contribution to our population may be a reduction from their fraction of the Late Pleistocene population. Not that great a reduction, not a massive crash to zero. A reduction in the wake of the out-of-Africa movement, possibly from five percent to three.

    You might think the answer to this is obviously zero. But in genetic terms, we can ask, how many times has the average Neandertal-derived gene been replicated in our present gene pool? Those aren't Neandertal individuals -- that is, a forensic anthropologist wouldn't classify them as Neandertals. They're the genetic equivalent.

    The answer to this is also simple: In absolute terms, the Neandertals are here around us, yawping from the rooftops.

    There are more than five billion people living outside of Africa today. If they are one percent Neandertal, that's the genetic equivalent of fifty million Neandertals walking the Earth around us.

    Does that sound minor? If I told you that your average gene would be replicated into fifty million copies in the future, would you be satisfied? Maybe your ambition is greater, but I think the Neandertals have done very well for themselves.
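
    The "genetic equivalent" figure is simple multiplication, and it is easy to see how it scales across the 1-4 percent range. A minimal sketch, assuming the round population figure used above (the function and constant names are only illustrative):

```python
# Back-of-envelope "genetic equivalent" count, using the round numbers in the text.

def genetic_equivalents(population: int, ancestry_fraction: float) -> float:
    """Whole-genome equivalents contributed by a source population."""
    return population * ancestry_fraction


# "More than five billion" people outside Africa; treated here as exactly 5 billion.
NON_AFRICAN_POPULATION = 5_000_000_000

for fraction in (0.01, 0.04):  # the low and high ends of the 1-4 percent estimate
    equivalents = genetic_equivalents(NON_AFRICAN_POPULATION, fraction)
    print(f"{fraction:.0%} ancestry: about {equivalents:,.0f} genome equivalents")
```

    These are genome equivalents, not individuals: the same total is spread thinly across everyone outside Africa.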

    Does this mean that Neandertals belong in our species, Homo sapiens?

    Yes.

    Interbreeding with fertile offspring in nature. That's the biological species concept.

    Now, some paleontologists might still disagree -- maintaining that species are units that can be distinguished morphologically, or by one or more derived features, or any number of other definitions. That's fine with me, as long as they're clear. But understand: It does define all non-Africans today as an interspecific hybrid population.

    So maybe they want to rethink that one?

    If Eurasians got less than 4 percent from Neandertals, doesn't that mean that they got more than 96 percent from Africa?

    I look at the 1-4 percent estimate as a minimum, for several reasons. As I'll note below, this estimate mainly refers to the excess Neandertal ancestry outside Africa, which means there may be some additional amount that both recent African and non-African populations share.

    But more important, Neandertals weren't the only people living in Eurasia 100,000 years ago. China didn't have Neandertals, nor did Southeast Asia and Java. India was full of hominins, which might or might not have shared substantial genetic similarity with Neandertals. They're close enough to the known Neandertal range to speculate that they may have been close, but the only available fossil, the Middle Pleistocene Narmada skull, is not very informative. Any of these populations might have been genetically different from Neandertals, and might have also contributed genes to present-day human populations -- genes that wouldn't show up by scanning the Neandertal genome.

    The recent genetic sequencing of the Denisova pinky (a.k.a. the X-woman) from the Altai Mountains reminds us that these populations outside of Africa may have been quite a bit closer to us, genetically, than we might have expected from the 1.8-million-year record of humans outside Africa. These populations were dynamic in ways that many paleoanthropologists haven't yet appreciated.

    Do living Africans have Neandertal ancestry, too?

    I think that the present study doesn't have the power to answer this question, at least with the design that the authors used. The fact that living Africans are less genetically similar to the Neandertals is extremely important evidence of the Neandertals' genetic contribution to populations outside Africa. But it doesn't bear on how much back-migration into Africa may have happened.

    We know that the answer is nonzero, because Africa has received immigrants from other parts of the world during historic times. The same genetic patterns that reflect population contacts up and down the East African coast, and across the Sahara into West Africa, show the possible conduits for the flow of Neandertal-derived genes into African populations.

    But how much genetic dispersal into Africa happened in LSA or late MSA times? Mitochondrial and Y chromosome distributions in Northeast Africa suggest there has been some. Nevertheless, Africa would have been a very difficult place to return to for humans who had begun adapting to different ecological and disease environments.

    I think that some Neandertal genes might have made it back into Africa, even in ancient times, but I wouldn't be surprised if that number was small.

    The big shoe left to drop is the extent of population differentiation within Africa during MSA times. So far we've seen hints that these populations might have been nearly as differentiated from each other as they were from Neandertals, with substantial gene flow homogenizing them in the last 30,000 years. This paper includes an additional Bushman genome, after the four published earlier this year. Comparing that new genome to the Neandertals, its modal difference from the human reference (Hg18) genome is between the other humans and the Neandertal. Not quite halfway between, but nearly so. There's a lot of genomic variation within Africa, and exploring the population history that explains that variation may turn up some surprises.

    What about recent selection?

    One of the really exciting aspects of this work is that both Green and colleagues and Burbano and colleagues look for things that all humans today share but Neandertals lack.

    You might call these "the genes that make us modern," although functionally we have little idea what any of them do.

    Both papers show one thing that is extremely interesting: There aren't very many such genetic changes.

    Burbano and colleagues put together a microarray including all the amino acid changes inferred to have happened on the human lineage. They used this to genotype the Neandertal DNA, and show that out of more than 10,000 amino acid changes that happened in human evolution, only 88 of them are shared by humans today but not present in the Neandertals.

    That's amazingly few.

    Green and colleagues did a similar exercise, except they went looking for "selective sweeps" in the ancestors of today's humans. These are regions of the genome that have an unusually low amount of incomplete lineage sorting with Neandertals, and therefore represent shallow genealogies for all living people. They identify 212 regions that seem to be new selected genes present in humans and not in Neandertals. This number is probably fairly close to the real number of selected changes in the ancestry of modern humans, because it includes non-coding changes that might have been selected.

    Again, that's really a small number. We have roughly 200,000-300,000 years for these to have occurred on the human lineage -- after the inferred population divergence with Neandertals, but early enough that one of these selected genes could reach fixation in the expanding and dispersing human population. That makes roughly one selected substitution per 1000 years.

    Which is more or less the rate that we infer by comparing humans and chimpanzees. What this means is simple: The origin of modern humans was nothing special, in adaptive terms. To the extent that we can see adaptive genetic changes, they happened at the basic long-term rate that they happened during the rest of our evolution.
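
    The rate quoted above comes from dividing the time window by the number of candidate sweep regions. A rough sketch of that arithmetic, assuming each of the 212 regions counts as one selected substitution (names are illustrative only):

```python
# Rough rate of selected substitutions implied by the numbers in the text.

def years_per_substitution(span_years: float, n_regions: int) -> float:
    """Average number of years per selected substitution over a time window."""
    return span_years / n_regions


N_SWEEP_REGIONS = 212  # candidate sweep regions reported by Green and colleagues

for span in (200_000, 300_000):  # the rough window given in the text, in years
    rate = years_per_substitution(span, N_SWEEP_REGIONS)
    print(f"{span:,} years / {N_SWEEP_REGIONS} regions = one per ~{rate:,.0f} years")
```

    Either end of the window gives a figure on the order of one per thousand years.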

    Now from my perspective, this means something even more interesting. In our earlier work, we inferred a recent acceleration of human evolution from living human populations. That is a measure of the number of new selected mutations that have arisen very recently, within the last 40,000 years. And most of those happened within the past 10,000 years.

    In that short time period, more than a couple thousand selected changes arose in the different human populations we surveyed. We demonstrated that this was a genuine acceleration, because it is much higher than the rate that could have occurred across human evolution, from the human-chimpanzee ancestor.

    What we now know is that this is a genuine acceleration compared to the evolution of modern humans, within the last couple hundred thousand years.

    Our recent evolution, after the dispersal of human populations across the world, was much faster than the evolution of Late Pleistocene populations. In adaptive terms, it is really true -- we today are more different from early "modern" humans than they were from Neandertals. Possibly many times more different.

    More?

    That's what I have time for now, if I want to get this posted. There is much, much more to say on the topic, and you can bet it will be all Neandertals all the time here for the foreseeable future.

    References:

    Green RE and many others. 2010. A draft sequence of the Neandertal genome. Science (in press) doi:10.1126/science.1188021

    Burbano HA and many others. 2010. Targeted investigation of the Neandertal genome by array-based sequence capture. Science (in press) doi:10.1126/science.1188046

  • SNPtastic India

    Wed, 2009-09-23 14:49 -- John Hawks

    The cover story in Nature this week is a paper about the population history of India, from David Reich's lab. It's an important contribution to our knowledge of human genetic variation, and provides a very interesting set of data for further investigation of modern human origins, the dispersal of agriculture into the subcontinent, and the history of more recent Indian populations.

    Here's the abstract:

    India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the 'Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the 'Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.

    The number of individuals is not huge for the purposes of population genetic analysis -- only 132 people from 25 groups -- but it is very significant in terms of recent samples. By comparison, it is around double the number of effective individuals in any of the HapMap v.1 populations, genotyped at more than 560,000 SNPs.

    The results of the study bear on basic population genetic issues, including the degree of endogamy, the pattern of regional differentiation, and the likelihood of discovering new recessive genetic disorders by additional sampling. Some notes:

    Population mixture. The authors propose that today's groups descend in varying proportions from two ancient (and no longer existing) populations, which they call "ancestral North Indian" and "ancestral South Indian".

    I'm always skeptical of mixture models, especially when the putative source populations no longer exist. There are just too many ways that structured migration or dispersal can lead to the appearance of mixture. People once thought of "Alpines" as a mixture of pure Nordic and Mediterranean elements, after all -- and that was just because their heads were mesocephalic.

    Still, with a half-million SNPs, it's possible to do a better job testing the hypothesis of mixture versus structured migration. The authors in this paper didn't -- they applied a simplified "3 Population Test" that compares the empirical allele frequencies to proportions expected under only two scenarios: simple mixture or complete isolation. It seems to me that the null should be simple isolation by distance, which would give the same result as "mixture" according to their test. If you really want to look for population mixture, you need to involve the dimension of time, for example, by demonstrating the antiquity of haplotypes that have mixed together.

    So I don't accept this ancestral division, certainly not at face value. It does seem plausible that West Asian (and thereby European-related) genes have introgressed into India over time, perhaps in association with the growth of high-density agricultural populations. Maybe some of this gene flow occurred under the influence of positive selection, but processes of elite dominance and differential growth may have been sufficient.
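
    For readers who haven't met this kind of three-population comparison, here is a minimal sketch of one common formulation, the f3 statistic as it is usually defined in later literature. Whether it matches the exact estimator used in the paper is an assumption. The allele frequencies are invented for illustration, and the sketch leaves out the bias correction and standard errors a real analysis needs.

```python
from statistics import mean

# f3(C; A, B): the average over SNPs of (c - a) * (c - b), where a, b, c are
# allele frequencies in populations A, B and C. A clearly negative value is
# usually read as evidence that C has ancestry related to both A and B.

def f3(target, source_a, source_b):
    """Mean over SNPs of (c - a) * (c - b), with c the target allele frequency."""
    return mean((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))


# Hypothetical allele frequencies at five SNPs (invented, not from the paper).
pop_a = [0.10, 0.80, 0.35, 0.60, 0.05]
pop_b = [0.70, 0.20, 0.55, 0.10, 0.90]

# A population whose frequencies are an even blend of A and B.
blended = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

print("f3(blended; A, B) =", f3(blended, pop_a, pop_b))  # negative for a blend
print("f3(A; blended, B) =", f3(pop_a, blended, pop_b))  # not negative here
```

    The sketch shows only the arithmetic of the statistic. It says nothing about which demographic history produced the intermediate frequencies in the first place.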

    Regional differences. The results show a greater degree of regional genetic differentiation in India than has been found for continental Europe. Still, with an FST of only 0.01, we're not talking about major population splits here. With that number, the subcontinent is closer to panmixia than one might expect for a region its size. The authors suggest that founder effects explain the regional differentiation:

    We propose that the high FST among Indian groups could be explained if many groups were founded by a few individuals, followed by limited gene flow. This hypothesis predicts that within groups, pairs of individuals will tend to have substantial stretches of the genome in which they share at least one allele at each SNP. We find signals of excess allele sharing in many groups (Supplementary Fig. 2), which as expected tend to occur in the groups that have the highest FST values from all others (P = 0.002 for a correlation). To estimate the age of founder events, we measured the genetic distance scale over which allele-sharing decays, and verified the robustness of our procedure by simulation (Supplementary Fig. 3). Six Indo-European- and Dravidian-speaking groups have evidence of founder events dating to more than 30 generations ago (Supplementary Fig. 2), including the Vysya at more than 100 generations ago (Fig. 2). Strong endogamy must have applied since then (average gene flow less than 1 in 30 per generation) to prevent the genetic signatures of founder events from being erased by gene flow.

    I don't think that explanation works. With those times in generations, we're talking about events within the last 600-2000 years. Since all these calculations are done on the whole dataset assuming complete neutrality, I think we should look more closely at the distribution of LD across loci. It seems likely that some of the high-LD loci that appear to point to founder effects will actually be found to be selected.
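
    The conversion from generations to years depends on the generation time you pick. The 600-2000 year range above corresponds to about 20 years per generation. That value is inferred from the numbers, not stated in the paper's quoted text, so the sketch below also shows longer generation times for comparison.

```python
# Convert founder-event ages from generations to years under different
# assumed generation times (the 20-year value is inferred, not sourced).

def generations_to_years(generations: int, years_per_generation: int) -> int:
    """Age in years for an event a given number of generations ago."""
    return generations * years_per_generation


FOUNDER_EVENT_GENERATIONS = (30, 100)  # the two figures quoted from the paper

for years_per_generation in (20, 25, 30):
    low, high = (generations_to_years(g, years_per_generation)
                 for g in FOUNDER_EVENT_GENERATIONS)
    print(f"{years_per_generation} years/generation: {low}-{high} years")
```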

    Relationships of Indian to non-Indian populations. One of the real problems of assuming a tree with no migration is that it leads to statements like this:

    [T]he ANI [ancestral North Indian] and CEU [HapMap European sample] form a clade, and further analysis shows that the Adygei, a Caucasian group, are an outgroup (Supplementary Note 4). Many Indian and European groups speak Indo-European languages, whereas the Adygei speak a Northwest Caucasian language. It is tempting to assume that the population ancestral to ANI and CEU spoke 'Proto-Indo-European', which has been reconstructed as ancestral to both Sanskrit and European languages, although we cannot be certain without a date for ANI–ASI mixture.

    Some of the common ancestors of some living Europeans and some Indians were probably speakers of proto-Indo-European. But we can easily refute the hypothesis that all of the common ancestors were -- some of those common ancestors lived more than 40,000 years ago, as is well-known from the mtDNA chronology. The tree model with complete isolation does not explain the data. So as simple as it is -- and as well-used by Cavalli-Sforza and others -- it would be better to use a more accurate model.

    UPDATE (2009-09-24): Gene Expression has a full review of the paper.

    UPDATE (2009-09-27): Very interesting angle by Suvrat Kher at Reporting on a Revolution:

    The Indian Press has made a hash of the finding....

    But I can't blame the press entirely. The scientists who gave interviews to the press didn't mention this. They wimped out on reporting this potential inflammatory and politically incorrect finding. This is just poor and irresponsible science outreach on part of the scientists. How can you ignore a finding that is staring out at you from the very paper you are talking about? The press may be guilty of not digging in but it was just reporting what the scientists told them.

    References:

    Reich D, Thangaraj K, Patterson N, Price AL, Singh L. 2009. Reconstructing Indian population history. Nature 461:489-494. doi:10.1038/nature08365
