john hawks weblog

paleoanthropology, genetics and evolution

introgression

  • Which population in the 1000 Genomes Project samples has the most Neandertal similarity?

    Wed, 2012-02-08 01:14 -- John Hawks

    Last December I began writing about an analysis of introgression in the 1000 Genomes Project samples ("Neandertal introgression, 1000 Genomes style"). I left everybody in a bit of suspense, partly because my writing computer was unexpectedly replaced before winter vacation, and partly because of my extensive travel in January.

    I'm catching up this week before I go to Ann Arbor, Michigan next week for a talk and visit with many friends. It's a good time to give readers some status updates on the analyses because the release of the high-coverage Denisova genome today will allow us to do some very deep checks on some of the comparisons we've carried out.

    Picking up where I left off, in the last post I emphasized that the individual genomes represented in the 1000 Genomes Project samples in Europe and East Asia have a surplus of derived SNP alleles that they share with the Vindija Vi33.16 genome. That surplus compared to genomes in the African population samples represents the evidence for Neandertal ancestry in those populations.
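    As a rough illustration of what "a surplus of derived SNP alleles" means in practice, here is a minimal sketch of the tally for one person. It is not the pipeline used for these figures. The function names, the 0/1/2 coding of derived allele copies, and the toy numbers are all assumptions made for the example.

```python
from typing import Sequence


def shared_derived_count(derived_copies: Sequence[int],
                         archaic_derived: Sequence[bool]) -> int:
    """Count the derived allele copies a person carries at sites where the
    archaic genome also carries the derived allele.

    derived_copies[i]  -- 0, 1 or 2 derived copies at site i (hypothetical coding)
    archaic_derived[i] -- True if the archaic genome is derived at site i
    """
    if len(derived_copies) != len(archaic_derived):
        raise ValueError("both inputs must cover the same sites")
    return sum(c for c, a in zip(derived_copies, archaic_derived) if a)


def relative_surplus(individual_count: float, african_mean: float) -> float:
    """Surplus of shared derived alleles relative to an African baseline."""
    return (individual_count - african_mean) / african_mean


# Toy data: five sites, archaic genome derived at four of them.
person = [0, 1, 2, 0, 1]
archaic = [True, True, False, True, True]
print(shared_derived_count(person, archaic))  # 2
```

    The comparison between populations then comes down to how each person's count sits relative to the average in the African samples.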

    Comparison of shared Neandertal derived variants in African, Chinese and European samples

    Admixed populations, including African-Americans and Puerto Ricans, shared Neandertal derived SNP alleles in the fraction expected for their African and non-African fractions of ancestry.
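    The expectation for an admixed sample is a simple weighted average. The sketch below assumes a single African ancestry fraction per sample and uses made-up sharing values; the function name and all of the numbers are hypothetical.

```python
def expected_sharing(african_ancestry: float,
                     african_mean: float,
                     non_african_mean: float) -> float:
    """Expected Neandertal sharing for an admixed sample, weighting each
    source population's mean by its share of ancestry."""
    if not 0.0 <= african_ancestry <= 1.0:
        raise ValueError("ancestry fraction must lie between 0 and 1")
    return (african_ancestry * african_mean
            + (1.0 - african_ancestry) * non_african_mean)


# Hypothetical values: 80% African ancestry, sharing scores of 100 and 110.
print(round(expected_sharing(0.8, 100.0, 110.0), 6))  # 102.0
```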

    Comparison of shared Neandertal derived variants in ASW, YRI and CEU samples

    As I also pointed out, the population samples in Europe and East Asia are not identical in the number of these shared derived variants. The difference between individuals can be caused by differences in the fraction of their genealogy that traces to Neandertals. The difference may also be caused by other aspects of the individuals' genealogy, if for example some aspect of population history has led to discrepancies in the fraction of ancient variations these people share with a Neandertal genome by incomplete lineage sorting.

    Here is the comparison of East Asian samples (Japanese, Han Chinese in Beijing, and Han Chinese originating in South China) and European samples (Tuscan, British, Finnish and CEU samples, along with a handful of Spanish):

    Comparison of shared Neandertal derived variants in East Asian and European 1000 Genomes Project samples

    The Europeans average a bit more Neandertal than Asians. The within-population differences between individuals are large, and constitute noise as far as our comparisons between populations are concerned. At present, we can take as a hypothesis that Europeans have more Neandertal ancestry than Asians. If this is true, we can further guess that Europeans may have mixed with Neandertals as they moved into Europe, constituting a second process of population mixture beyond that shared by European and Asian ancestors.
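    Large differences between individuals do not rule out detecting a small difference between population averages, because the uncertainty of a mean shrinks as the sample grows. The sketch below uses Welch's t statistic to show the idea. It is not necessarily the test behind the comparisons here, and the function name and inputs are assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Welch's t statistic: difference in means divided by its standard error.

    Each sample is a list of per-individual sharing scores. The standard
    error uses each sample's own variance, so the two populations need not
    be equally variable.
    """
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se
```

    With a few hundred individuals per region, the standard error in the denominator becomes small, so even a modest difference in means can stand out from the individual-level noise.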

    As we look more closely at the particular gene regions shared between each individual and the Neandertal, we will be able to consider the approximate time that they shared an ancestor for each gene region. That will allow us to distinguish incomplete lineage sorting (ILS) from introgression, although the two will overlap to some extent. We will rely on that test to examine hypotheses about the time and place of population mixture.

    The difference between Europeans and Asians when we lump all the samples together is not as interesting as the differences we can see among the samples within each of those regions. For example, here are British people compared to Tuscans:

    Comparison of shared Neandertal derived variants in British and Tuscan samples

    The Tuscans have the highest level of Neandertal similarity of any of the 1000 Genomes Project samples. They have around a half-percent more Neandertal similarity than Brits or Finns in these samples. The CEU sample is slightly elevated compared to Brits and Finns as well.

    It is tempting to interpret these differences as a north-south cline in Neandertal ancestry. I wouldn't jump too quickly on this idea, because Holocene population movements in Europe are now known to have covered up or erased a substantial fraction of the Upper Paleolithic gene pool. If we have a bonus of extra Neandertal ancestry in southern Europe, we need to explain how that cline persisted across subsequent history. Still, the difference is statistically very strong and deserves some explanation.

    Likewise, the populations within East Asia have some differences in Neandertal similarity. Here is the comparison of Han Chinese, with the Beijing versus South China origins separated out:

    Comparison of shared Neandertal derived variants in CHB and CHS samples

    North China has a bit more Neandertal, on average, than South China according to these samples. These are all identified as ethnic Han Chinese, so I expect that the comparison would be much more interesting if some minority populations had been examined. The "cline" here seems opposite in direction compared to the European case. I can add that the Japanese sample is largely intermediate between the CHB and CHS, with an average closer to the Beijing sample.

    If there was one thing that surprised me in the comparisons, it was this:

    Comparison of shared Neandertal derived variants in Luhya and Yoruba samples

    Yoruba have substantially more Neandertal similarity than Luhya. This may seem counter-intuitive, because the geographic location of Luhya in East Africa might seem better placed for Neandertal similarity to appear, whether through ancient population structure and ILS or through recent gene flow or backmigration into Africa of Neandertal descendants.

    Instead, it looks like the Yoruba are the recipients of Neandertal genes, whether by means of ancient population structure or introgression and recent trans-Saharan gene flow. I personally think both factors are involved, but again their relative importance will be determined by comparing individual gene regions.

    In this vein, it is useful to outline the hypothesis of differential ILS within African samples. We now know from examination of genetic variation within Africa today that some of today's diversity can be traced to ancient population structure in Middle Pleistocene African populations. For example, Neandertals could be more closely related to some African populations than others today because Neandertals actually exchanged genes with some ancient African populations. Or Neandertals might have sprung from one African population among many who lived 250,000 years ago. If some of these ancient populations persisted and contributed genes to different present-day African populations, those populations would share different fractions of genes with a Neandertal genome.

    I expect we will learn a substantial amount about African population structure in the MSA by using these Neandertal-similar regions of the genome. It's like having a probe that can trace the movement of people across Africa more than 100,000 years ago. As we combine the archaic genome data with our growing picture of diverse lineages in Africa today, we may discover ancient populations that are not apparent archaeologically. Again, genetics is giving us a totally new picture of the diversity and population dynamics of ancient people.

    Next: Which Neandertal-derived variants are shared between regions, and which are unique to one region? I touched on this question last spring by using genotype data. Now, we have sequences capable of telling us much more.

    Synopsis: 
    Europe has a touch more Neandertal than East Asia; Tuscans have more than any other European sample
  • Denisova in the news

    Mon, 2012-01-30 23:45 -- John Hawks

    Hey, I'm in the New York Times today!

    "DNA Turning Human Story Into a Tell-All"

    It's a story about the Denisova genome and its possible relationships to recent human populations. We have been concentrating here on the Neandertals for the last few months, but I did get some analyses run on Denisova last week (thanks in large part to my grad student, Aaron Sams, who lifted over the genome from the old to new genetic map coordinates). I'll share some of those results soon.

  • Looking over a Neandertal's shoulder

    Sat, 2012-01-07 18:04 -- John Hawks

    A study by Fabio Di Vincenzo, Steven Churchill and Giorgio Manzi has fallen into the early drawer of the Journal of Human Evolution: "The Vindija Neanderthal scapular glenoid fossa: Comparative shape analysis suggests evo-devo changes among Neanderthals" [1]. The authors do a very nice job taking a long-studied anatomical feature and reframing its variation within a new context. Reading through its discussion, I find much to like in the way Di Vincenzo and colleagues deal with the variation of late Neandertals and integrate the concept of introgressive gene flow among Late Pleistocene populations.

    The glenoid fossa is the part of the scapula that articulates with the head of the humerus. It's the base of the "socket" in the ball-and-socket joint of the shoulder -- indeed, "glenoid" comes from the Greek word for "socket". Roughly shaped like a rounded teardrop, the glenoid is narrower in early hominins and relatively broad in recent people. Neandertals have an intermediate form compared to earlier and later humans.

    Figure 1 from Di Vincenzo et al. 2012, showing glenoid fossa of Vi-209

    Figure 1 from Di Vincenzo et al [1]. Original caption: "The scapular fragment VI-209 and its stratigraphic position (arrow) within the Mousterian layers of complex G of Vindija cave (left) according to Malez et al. (1980). On the right, the configuration of the 60 semi-landmarks used in the analysis is superimposed on the SGF profile. Sliding points are filled. The stratigraphic column is from Janković et al. (2006). Photograph by Milford H. Wolpoff."

    The main point of the study is that the Vindija glenoid specimen, Vi-209, has a more humanlike form than other Neandertals. Another conclusion based on the comparative sample is that the sample of glenoids from late Neandertals is intermediate between early Neandertals and recent people. Likewise, Upper Paleolithic and Mesolithic-era European specimens are intermediate between late Neandertals and recent people. Here's a graph with the first and second principal components of the variation; I've highlighted these groups.

    Figure 3a from Di Vincenzo et al. 2012

    Figure 2a from Di Vincenzo et al. [1]. Altered to include sample names: Krapina, "Classic" and West Asian Neandertals, Vi-209, and Upper Paleolithic/Mesolithic. X-axis is the first principal component of variation based on analysis of the whole sample, Y-axis the second principal component.

    The first principal component basically depends on the relative breadth of the glenoid fossa, with living people being much broader and Australopithecus (represented by Sterkfontein Sts 7 and Malapa MH2) being much narrower relative to the overall size of the fossa. The authors tested and rejected the hypothesis that the apparent trend could be a simple effect of size. This test was carried out relative to glenoid size, and since Australopithecus had relatively large shoulders compared to Homo, size does not vary much across the hominin sample. It would be useful to consider whether body size might matter, but body size would not by itself explain the relations of the later members of the genus Homo.

    The authors emphasize that the data are consistent with a single evolutionary trend within the genus Homo, so that the Neandertal-human difference should be interpreted within the context of this broader pattern. They propose a specific developmental hypothesis.

    Therefore, it seems reasonable that heterochronic factors related to the prolonged developmental pattern of our species (Smith et al., 2007a), which contrasts with the faster growth rates of Neanderthals and other ‘archaic’ hominins (Smith et al., 2007b; but see; Guatelli-Steinberg et al., 2005), led to longer periods of bone deposition along the inferior-lateral edge of the SGF [scapular glenoid fossa]. This could explain the observed variation along PC1 (and/or CV1) for different morphs of the genus Homo, reaching in H. sapiens the greatest extent in width of the SGF and, particularly, of its scapular portion. This is also consistent with the observation by Churchill and Trinkaus (1990) that much of the variability of the glenoid surface is a function of size variation of the joint itself, which can be viewed as forming a single functional matrix sensu Moss and Young (1960). Thus, the overall reduction in developmental rates in the genus Homo (relative to those of other hominoids) across the Pleistocene may account for the general evolutionary trend in SGF shape seen in the fossils, with more marked changes in developmental rates between archaic (including Neanderthals) and early modern humans, producing somewhat more dramatic differences between these groups in joint shape. Green et al. (2010) suggest that some of the differences between Neanderthals and modern humans in shoulder and thoracic morphology (particularly those related to clavicular length) are attributable to differences in the RUNX2/CBFA1 gene. The temporal pattern observed here would suggest that, with respect to SGF shape at least, that some differences are due to overall differences in developmental schedules (rather than specific differences in genes controlling development of the shoulder, such as RUNX2/CBFA1 or HoxC6).

    By suggesting at least one actual genetic substitution in recent humans, they lend some plausibility to the idea. I am more hesitant to accept the assumption that Neandertals had faster developmental schedules than recent people, although it could be true. This specific assumption is not necessary to support the idea of heterochronic change in the glenoid, which could be caused by much more focused developmental processes. If glenoid shape reflects heterochronic developmental changes, the data suggest that those changes were ongoing in global populations during the Holocene. Indeed, the difference between recent people in the study and Upper Paleolithic Europeans is as great as the difference between late Neandertals and Upper Paleolithic Europeans. The study's recent human sample covers a broad geographic distribution but is relatively small in numbers; a fuller comparison of recent people might uncover a more interesting pattern of change.

    The scapula has long figured in discussions of Neandertal genetic persistence. Neandertal scapulae often have a sulcus (groove) on the dorsal (back) aspect of the axillary border, and this feature is also found in a high fraction of early Upper Paleolithic skeletons [2]. The axillary border morphology probably has no functional or developmental correlation with the glenoid morphology, so these features are best viewed as separate issues. I mention the axillary border only because of one significant commonality with the glenoid as considered here: We don't know how much variation in the trait may be explained by environment. Maybe the way an individual uses her arms when growing will affect the form of the scapula? With the axillary border, this question has occupied many researchers who tried to determine why some humans resemble some Neandertals and vice versa [3]. The current consensus is that a dorsal axillary sulcus probably reflects early developmental processes that are substantially influenced by genetics instead of shoulder activity pattern, but the consensus is not without detractors.

    In this study, the authors consider the role of introgressive gene flow among Pleistocene populations as a way to maintain the apparently continuous trend:

    The morphology of the SGF [scapular glenoid fossa] is unlikely to be under the genetic control of a single locus. Thus, it is more likely that regulatory genes controlling developmental rates overall produce pleiotropic effects throughout the skeleton. The introduction of these and other (non-regulatory) alleles into the Neanderthal populations of the Near East, and their movement by gene flow across Neanderthal demes into southern Europe (well in advance of the actual in-migration of modern humans) could account for mosaic morphology seen in the Vindija G3 Neanderthals, including the Vi-209 scapula. Introgression and subsequent gene flow would not be expected to have affected early Neanderthal populations (those predating the admixture), nor late Neanderthal populations from western (trans-Alpine) Europe, because they were separated by geographic barriers (Fabre et al., 2009 and Degioanni et al., 2011), and/or protected from gene flow by distance (as hypothesized by Voisin, 2006).

    There is as yet no evidence that the Vindija Neandertal genomes have genetic introgression from the African populations from which present non-Africans derive most of their genetic heritage. Green and colleagues [4] tested explicitly for this kind of gene flow, from "modern" into Neandertal populations and found none.

    And yet, the latest Neandertals are consistently similar to recent people in ways that earlier Neandertals were not. The glenoid fossa of Vi-209 is not an isolated case; it joins many other characteristics in this sample (as noted in the quote above) and other Neandertal samples after 45,000 years ago.

    Frankly, I expect that the admixture estimates presented thus far will prove to be wrong. I could be wrong in this expectation, but there are many assumptions underlying genetic analyses of admixture, and it's easy for an incorrect assumption to give rise to an incorrect conclusion. I take the morphological evidence very seriously as a possible "reality-check" about the validity of genetic comparisons. After all, the morphological comparisons predicted introgression from Neandertals in the first place...

    Another reaction to the study by Zachary Cofran: "Evo-devo of the human shoulder?"

    Fabio Di Vincenzo and colleagues analyzed the shape of the outline of the glenoid fossa on the scapula (not to be confused with the glenoid on your skull), from Australopithecus africanus to present day humans. The glenoid fossa is essentially the socket in the ball-and-socket joint of your shoulder. The authors found that there is pretty much a single trend of glenoid shape change from Australopithecus through the evolution of the genus Homo: from the fairly narrow joint in Australopithecus africanus and A. sediba, to the relatively wide joint in recent humans. The overall size and shape of the joint influences/reflects shoulder mobility, so presumably this shape change hints that more front-to-back arm motions became more important through the course of human evolution (authors suggest throwing in humans from the Late Pleistocene onward).

    I think Cofran takes this in an interesting direction with respect to his own dissertation work on development in earlier hominins.


    References

    1. Di Vincenzo F, Churchill SE, and Manzi G. 2011. The Vindija Neanderthal scapular glenoid fossa: Comparative shape analysis suggests evo-devo changes among Neanderthals. Journal of Human Evolution.
    2. Frayer DW. 1992. The persistence of Neandertal features in post-Neandertal Europeans. In: Bräuer G, Smith FH, editors. Continuity or Replacement? Controversies in Homo sapiens Evolution. Rotterdam. p 179–188.
    3. Trinkaus E. 2008. Kiik-Koba 2 and Neandertal axillary border ontogeny. Anthropological Science 116:231–236.
    4. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. 2010. A Draft Sequence of the Neandertal Genome. Science [Internet] 328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    Synopsis: 
    A study of the glenoid fossa finds a pattern across the genus Homo, and similarities between a Vindija specimen and more recent humans
  • A quick look at your Neandertal fraction

    Fri, 2011-12-16 15:13 -- John Hawks

    The 23andMe blog, the Spittoon, has a description of their new technique to use 23andMe SNPs to estimate any customer's fraction of Neandertal: "Find your inner Neanderthal".

    The result is a rough-and-ready numerical estimate of your Neandertal ancestry fraction. For me it's 2.5 percent. Gretchen is 3 percent, and she's been lording it over me all day.

    The estimate is the work of Eric Durand, who broke ground on the D-statistic method for finding introgression from archaic genomes [1]. He has made public a short white paper describing the application.

    So far, all estimates of Neandertal (or other archaic human) ancestry have come from the proportion of a genome (or genotypes from a genome) that are shared and derived with Neandertals. That includes the results I've been posting here for the 1000 Genomes Project samples this week.

    The next step is to uncover exactly which parts of a person's genome have come from Neandertal ancestors. To discover this, we have to further determine which shared alleles come from recent introgression as opposed to ancient incomplete lineage sorting. We have been working very hard on that problem here, as you'll see, and it has been an important aspect of our work in pigmentation genes in the archaic genomes.

    If you have been considering getting your genotypes from 23andMe, it has become a very good time to do this. The overall fraction of your DNA derived from Neandertals is only the beginning. Soon we'll be able to specify which parts, and in a few cases we'll have a good guess as to what difference it makes. If you want to participate in this research, I'm hoping to gather as many interested people as I can -- so keep your eyes here over the next month.

    And if you are interested in having your genotypes done, feel free to use my link to the 23andMe promotion. I've been very happy with their way of presenting the genotypes and their updates, and know many other people who have also found it interesting. As I wrote a couple of years ago, it's not something to spend your food money on, but it does have an entertainment value. And the potential to be an active research participant.


  • Mailbag: Did Neandertals have the derived MCPH1 allele?

    Thu, 2011-12-15 08:38 -- John Hawks

    Re: "Introgression and microcephalin FAQ"

    Hi Dr. Hawks,

    I just ran across your introgression and microcephalin FAQ on your blog, and I wanted to ask you one quick question. Now that we have a draft sequence of the Neanderthal genome, has anyone yet looked to confirm that one of the modern human microcephalin alleles was bestowed upon us by admixture with Neanderthals?

    Thanks in advance!

    Thanks for writing!

    Lari and colleagues published on this last year (doi:10.1371/journal.pone.0010648) [1]. They didn't find the derived (presumed introgressed) allele in Monti Lessini 1, and so far we have no sign of it in the Vindija genomes, either. The other encouraging gene region was an inversion including the MAPT gene; this also has not yet been found in a Neandertal.

    So now we have tons of evidence of introgression, but none of the genes that we thought were strong cases before the ancient DNA. That doesn't rule out that we'll find these other cases in some ancient specimen, but in the meantime we're working on what we have.


  • Mailbag: Neandertal derived SNP alleles

    Tue, 2011-12-13 09:48 -- John Hawks

    Re: Neandertal introgression, 1000 Genomes style:

    Long-time reader of your blog, non-paleo/anthro/genetics person, here. But please read on:

    Just a couple of brief questions.

    (i) It seems that it would make sense to look at pairwise comparisons (of shared derived Neanderthal SNP alleles) both within a population (e.g., Asians, or CEU) and between them, and build a histogram of how often they overlap.

    (ii) Then one could remove from the data set all such African shared SNPs - assuming that most of them are incomplete lineage sorting but that Africa had the initial superset of alleles before ooA (I know some are likely West Asian or European admixture, reducing the data set slightly more than necessary), and repeat (i) and similar diagnostics. Is the typical unmodified genome chunk length around such sites much longer than in (i) - can one date this? Can one now better quantify the actual admixture percentage outside of Africa?

    Wouldn't such a procedure give more insight about how Neanderthal introgression is distributed, when it occurred, and perhaps where it occurred?

    I am sure you are already working on similar ideas - just wanted to know if you agree that these may be low-hanging fruit to pursue.

    Thanks!

    Hi -- thanks for writing!

    I started with exactly the approach you describe, when we were working exclusively with SNP data in the spring. For example:

    http://johnhawks.net/weblog/reviews/neandertals/neandertal_dna/europe-ch...

    We were using linked haplotypes rather than single SNPs but the filtering process was the same.
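    For readers who want to picture the filtering step, here is a minimal sketch of the single-SNP version the question describes: drop every shared site that also appears in African samples, then measure how much each pair of individuals overlaps. Our own work used linked haplotypes rather than single sites, so this is an illustration only. The function names, the site identifiers, and the use of a Jaccard-style overlap are assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def filter_african(shared_sites: Dict[str, Set[int]],
                   african_sites: Set[int]) -> Dict[str, Set[int]]:
    """Remove sites whose Neandertal-shared derived allele is also seen in
    African samples (treated here as likely incomplete lineage sorting)."""
    return {name: sites - african_sites for name, sites in shared_sites.items()}


def pairwise_overlap(shared_sites: Dict[str, Set[int]]) -> Dict[Tuple[str, str], float]:
    """Fraction of sites shared by each pair: |A & B| / |A | B|."""
    overlaps = {}
    for a, b in combinations(sorted(shared_sites), 2):
        union = shared_sites[a] | shared_sites[b]
        common = shared_sites[a] & shared_sites[b]
        overlaps[(a, b)] = len(common) / len(union) if union else 0.0
    return overlaps


# Toy example with hypothetical site identifiers.
people = {"eur1": {1, 2, 3, 4}, "eur2": {2, 3, 5}, "asn1": {3, 4, 6}}
filtered = filter_african(people, african_sites={3})
print(pairwise_overlap(filtered)[("eur1", "eur2")])  # 0.25
```

    A histogram of these pairwise values, within and between populations, is the diagnostic the question proposes.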

    Now I am hopeful that we will have decent age estimates for the introgressing SNPs from a different technique. I would rather find these ages independently of filtering by geographic location, because having this information will greatly simplify testing models of ancient population dynamics. If we succeed at this, we will also have a test of selection based on the same allele ages.

    I am continuing to update and you'll see these results not long after we get them!

  • Neandertal introgression, 1000 Genomes style

    Sat, 2011-12-10 18:16 -- John Hawks

    For our project to understand pigmentation genetics in archaic humans, we had to find a good comparative sample of sequence data from recent humans. The original publication on the draft Neandertal genomes compared them to five low-coverage genomes from different Old World populations, along with the publicly available genomes from Craig Venter and others [1]. The first publication on the Denisova genome added an additional handful of genomes to these comparisons [2].

    Some of the genomes in this handful from living people are more similar to the Neandertal and Denisova genomes than others. That simple fact is the proof that some living people have Neandertal and Denisovan ancestors.

    But until now, the comparison has been limited to a very small number of human genomes. That became a focus for critics of the Neandertal and Denisovan results. How could three or four genome sequences possibly provide an adequate representation of human variability? We could imagine scenarios in which the similarities between Neandertal and humans could be explained by some unsampled population, for example, northeast Africans [3]. Denisova does not present the same problem, because African population structure cannot possibly explain its resemblance to populations in Wallacea, Australia, and Oceania [2] [4]. But to compare either of these genomes, we should seek a broader sampling of genomes from living people.

    As I wrote yesterday, my students and I have been working to understand pigmentation genetics of the archaic human genomes ("Pigmentation of archaic humans: introduction"). I've emphasized the need to break the analysis into small steps. For this question, we need to examine whether the pattern of introgression around pigmentation genes is characteristic of the genome as a whole. If genes involved in pigmentation have systematically higher or lower levels of Neandertal ancestry, that will tell us a lot about the evolutionary history of pigmentation in recent and archaic humans. For this, we need a good comparative sample, and the 1000 Genomes Project provides the best sample available.

    The first step in assessing the pattern of introgression for pigmentation genes is to characterize the pattern of introgression across the whole genome.

    Yes, a whole-genome introgression analysis sounds awfully big for my "small steps" concept. But actually this is simpler than it might sound. Here's a teaser:

    The figures in this post are not from a whole-genome analysis; they include data from eight chromosomes that we prioritized because of our pigmentation analysis. I am licensing all of them under a Creative Commons ShareAlike license so that anyone can use them anywhere.

    UPDATE (2011-12-10): I finished the whole genome analysis and am updating this post and figures accordingly. The results are the same throughout, with the exception of the Europe-East Asia comparison, which now shows these populations to be significantly different across the genome as a whole. I have partially updated the figures and will finish these later today.

    The value of sequences

    The 1000 Genomes Project data have been updated several times in the last year, as both sequencing and analysis of the genomes have progressed (more information on the 1000 Genomes Project website). We downloaded a release of SNP genotype calls from 1094 individuals, based on the low-coverage (average 4x) sequencing that has been carried out on the sample.

    A SNP (single nucleotide polymorphism) is a nucleotide site with at least two alleles present in the global human sample. These sites represent only one kind of genetic variation in today's populations. Many of the differences between people's genes are caused by insertions, duplications, deletions, transpositions, or inversions. But those kinds of polymorphisms can be challenging to study in low-coverage genomes, and we already understand quite a lot about SNPs in human populations from the earlier HapMap project [5] [6]. The HapMap provided the data underlying our 2007 paper on the acceleration of recent human evolution ("Why human evolution accelerated") [7].

    The drawback of earlier SNP variation projects is that they examined only a subset of SNP variation in a sample of people. To design a microchip that could provide a million or more SNP genotypes from a saliva sample, somebody first had to discover where in the genome SNPs could be found. So they took small samples of people, sometimes only a single person's two copies of the genome, and sequenced. Adding together SNPs found by several methods, they could get a representation of SNP variation across the whole genome in a population. But this process introduced a bias: the SNPs were ascertained in a sample that inevitably could not represent humans in other samples with the same accuracy. Initially, SNP samples were heavily biased toward people of European ancestry (upon whom most genetic work was originally done), and the HapMap project went to great efforts to increase the representation of other populations. But even with the best possible ascertainment, interpreting SNP variation requires us to jump through some theoretical hoops.

    Sequence data make life much easier for the population geneticist. Seriously, working on this stuff on the whiteboard is fun instead of a constant nightmare of sampling biases and spaces between markers. I have a bias myself, in that I find recombination hard to deal with. I love reticulation among populations, but I'd rather work with genealogies that look like proper trees instead of a liana-strewn mess. So looking at sequence data over short intervals makes me happy. Not as happy as beer aged in bourbon barrels, but happy.

    The 1000 Genomes Project SNP files represent every SNP mutation observed in the sample. In other words, these are sequence data, just with all the fixed (and therefore redundant) sites removed. Even so, these sequence data are not perfect. Low coverage means that some rare mutations in the sampled individuals will go unreported. We aren't typically interested in singleton mutations in the sample, except that missing them will introduce a bias upon our estimates of the time that common ancestors lived. Next-gen sequence reads are usually fairly riddled with errors. High coverage allows these errors to be removed with some confidence, but low-coverage genomes risk throwing out real SNPs along with the spurious ones. The publicly available files represent some analytical steps that we do not here control, so we have to work with the understanding that the data are not perfect.
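    To get a feel for why low coverage loses rare variants, consider a single heterozygous site in one person. The sketch below assumes read depth follows a Poisson distribution, each read is equally likely to come from either chromosome copy, and there are no sequencing errors. It also ignores any pooling of information across individuals, so it is a back-of-the-envelope illustration and not a model of the project's calling pipeline.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k reads when the mean depth is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def prob_alt_allele_unseen(mean_depth: float, max_depth: int = 60) -> float:
    """Chance that no read at a heterozygous site carries the alternate
    allele, averaged over Poisson-distributed depth (truncated at max_depth)."""
    return sum(poisson_pmf(n, mean_depth) * 0.5 ** n
               for n in range(max_depth + 1))


print(round(prob_alt_allele_unseen(4.0), 3))  # 0.135
```

    Under these assumptions the answer works out to exp(-depth / 2), so roughly one heterozygous site in seven would show no read from the variant allele at 4x in a single individual, while at 30x the chance is negligible.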

    The 1000 Genomes SNP files have had a phasing algorithm applied to them, which attempts to assign genotypes to chromosomes. In essence, phasing tries to figure out whether adjacent SNP alleles belong to the same copy or to different copies of the same chromosome. The details of this phasing are not yet apparent, and for many reasons I am cautious about using phased data. The inference is often inaccurate for rare mutations, and the whole process tends to sneak assumptions about population history into the resulting dataset. I hate being forced to live with someone else's assumptions about human population history, and I typically try to avoid needing phased data. In this case, it looks like the data over short intervals are as accurate as they can be, given the limitations on coverage and sampling. We have moved forward by applying methods that make a bare minimum of assumptions.

    Counting derived SNP alleles

    David Reich and colleagues came up with an appealingly simple test of introgression, which they applied to both the Neandertal and Denisovan genomes. Eric Durand, Reich, Nick Patterson and Monty Slatkin described the method formally this year [8]; they call it the D-statistic. Informally, this has become known as the ABBA-BABA test, after their labels for the discordant genealogies that the test compares. By and large, across the genome, humans living today share many more new mutations with each other than they do with an archaic human like a Neandertal. But sometimes two genomes are different from each other, and one of them shares a new mutation with the Neandertal.

    A human might share a mutation with a Neandertal because it actually isn't very new, and both inherited the mutation from some much more ancient population of humans. This scenario is called "incomplete lineage sorting", because humans today have multiple gene lineages that existed within some very ancient population, instead of these having been "sorted" cleanly into the different human and Neandertal populations. Incomplete lineage sorting does happen a lot between humans, Neandertals, and Denisovans. ILS is the normal mode of variation among recent human populations, who trace their genealogical histories back much further than the earliest "modern" humans. So if one human has a Neandertal allele, and another human has a different allele, it's probably no big deal. They both just inherited gene variants that already existed in our distant common ancestors.

    You can probably see already that if we had a way to estimate the age of an allele, we could tell whether incomplete lineage sorting is a credible explanation for any particular site. I'll leave that point for another post.

    In the meantime, if we pretend that we know nothing at all about the ages of alleles, we must find some other way to tell whether incomplete lineage sorting can explain Neandertal similarities. Reich and colleagues recognized that incomplete lineage sorting from ancient pre-Neandertal ancestors ought to be distributed equally among living people. If we look at every site in the genome where we have data from Neandertals, we should find that one living human genome should look like the Neandertal just as often as another.

    This insight led to their test. Take a pair of humans, count the number of times sequence A is like the Neandertal and sequence B is like a chimpanzee, and then do the inverse — B then A. ABBA-BABA.

    Why a chimpanzee? In most cases the chimpanzee allele will represent the ancestral state for humans. Living people can inherit ancestral alleles from Neandertals as well as derived ones, but the derived ones tend to be rarer and younger within human populations. If one living genome shares an ancestral allele with the Neandertal genome, we don't need incomplete lineage sorting or introgression to explain the pattern. For all we know, the derived mutation carried by the other living genome originated after Neandertals were already gone. So we need to pay attention to the derived mutations, ones that are present in Neandertals but not in chimpanzees. Do a count of these across the genome, and if you find a living genome with significantly more than another, you've found evidence for introgression.
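    The counting described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used by Green, Reich and colleagues: the function name `d_statistic`, the one-character-per-site encoding, the "N" code for missing data and the toy sequences are all assumptions made for the example. It uses the normalized form D = (ABBA - BABA) / (ABBA + BABA) and skips any site where one of the four sequences lacks data.

```python
def d_statistic(human_a, human_b, neandertal, chimp, missing="N"):
    """Return (abba, baba, d) for four aligned, equal-length sequences.

    The chimpanzee base stands in for the ancestral state, so only sites
    where the Neandertal differs from the chimpanzee are informative.
    """
    abba = baba = 0
    for a, b, n, c in zip(human_a, human_b, neandertal, chimp):
        if missing in (a, b, n, c):
            continue  # require data in all four sequences
        if n == c:
            continue  # Neandertal carries the ancestral allele
        if a == c and b == n:
            abba += 1  # human B shares the derived allele, human A does not
        elif a == n and b == c:
            baba += 1  # human A shares the derived allele, human B does not
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return abba, baba, d


# Toy sequences, invented purely to exercise the function.
toy_a = "ACGTTGGN"
toy_b = "ACCTAAGC"
toy_neandertal = "ACCTAGGC"
toy_chimp = "ACGTTAAC"
abba, baba, d = d_statistic(toy_a, toy_b, toy_neandertal, toy_chimp)
```

    A positive D means human B matches the Neandertal derived allele more often than human A does. Sites where both humans carry the derived allele, or neither does, add nothing to either count.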

    Ed Green, David Reich and colleagues [1] [2] did a comparison of every possible pair of genomes in their modern human sample. These sequence data were gappy, so that sequence A might share different coverage with B than with sequence C. So it was necessary to consider each pair separately, counting all the sites where both human sequences and the Neandertal and chimpanzee sequences had data.

    The 1000 Genomes Project sample reports genotypes for every SNP for every sampled individual. So in principle, every pair of sequences should have data for every one of these sites. Again, we have to be cautious about the nature of the sequencing, attending to the possibility of systematic biases due to low coverage. But we really don't have to take the time-consuming step of comparing every possible pair of the 2188 resulting haploid genomes. We can just find the derived SNP alleles that are present in Neandertals and count how many of them are in each of the human sequences. If one sequence has significantly more Neandertal derived alleles than another, it had to get them somehow.
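    Here is a rough sketch of that per-sequence count. It is not the code behind the figures in this post. It assumes each site has already been polarized against the chimpanzee, so that "0" means ancestral and "1" means derived, and that genotypes arrive as phased strings such as "0|1". The function name, the input layout and the sample names are hypothetical. Real 1000 Genomes files code alleles as reference versus alternate, so an extra polarizing step would be needed before anything like this could run on them.

```python
from collections import Counter


def count_shared_derived(sites):
    """Count, per haploid copy, the derived alleles shared with the Neandertal.

    `sites` is an iterable of (neandertal_is_derived, genotypes) pairs.
    `genotypes` maps a sample name to a phased string such as "0|1"
    (hypothetical encoding: 0 = ancestral, 1 = derived).
    """
    counts = Counter()
    for neandertal_is_derived, genotypes in sites:
        if not neandertal_is_derived:
            continue  # keep only sites where the Neandertal allele is derived
        for sample, phased in genotypes.items():
            for copy_index, allele in enumerate(phased.split("|")):
                # Adding 0 still creates the key, so copies with no
                # shared derived alleles appear in the result.
                counts[(sample, copy_index)] += 1 if allele == "1" else 0
    return counts


# Three made-up sites and two made-up samples.
toy_sites = [
    (True, {"sample_1": "0|1", "sample_2": "0|0"}),
    (False, {"sample_1": "1|1", "sample_2": "1|0"}),
    (True, {"sample_1": "1|1", "sample_2": "0|1"}),
]
toy_counts = count_shared_derived(toy_sites)
```

    Each key is a (sample, copy) pair, so one person contributes two entries. A histogram of these per-copy totals across a population is the kind of plot discussed in the next section.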

    That magic three percent

    The figure at the top of the post represents that count. Every individual in the 1000 Genomes Project dataset has two copies of the autosomal genome. Separating these two copies of the genome (basically arbitrarily) and counting up the shared derived features between each of those copies and the genome of Vindija 33.16, we obtain the histogram. Here it is again:

    The African genomes in the 1000 Genomes sample include Yoruba from Nigeria and Luhya from Kenya. The Asian populations sampled are Japanese and Chinese, including people of Han Chinese ethnicity in Beijing and southern China. The European ancestry samples include the CEU sample from Utah, as well as British, Tuscan, Spanish and Finn samples.

    The histogram shows that Asian and European genomes have significantly more Neandertal derived SNP alleles than do the African genomes. The averages for the Asian and European samples are around 3% higher than the average for the African samples. Whatever gave Africans some degree of similarity to Neandertals, non-Africans seem to have gotten around 3% more of it.

    Green and colleagues [1] assumed conservatively that Africans share derived SNP alleles with Neandertals only because of incomplete lineage sorting from the human-Neandertal ancestral population. This fraction should be the same in all human populations, under the assumption that Africans were mostly isolated from Neandertals for some period of time. The 3% Neandertal bonus outside Africa should then represent introgression from Neandertals into recent populations outside Africa.

    Both previous studies noted that genomes outside Africa are not significantly different in the fraction of derived SNP alleles shared with Neandertals. A genome from China and a genome from France carried the same fraction of shared derived SNP alleles with Neandertals. Here, we've confirmed that basic identity in the level of introgression in these populations.

    I have told several people now that I find the distributions in China and Europe spookily similar. On parts of the genome, the two distributions have means that are not significantly different. Indeed, I worked for a week with an analysis of eight chromosomes, in which the East Asian and European means were fewer than 100 SNP alleles apart. Even across the whole genome, Europeans average only 700 derived SNP alleles more than the East Asian sample. This small difference (a bit more than a tenth of a percent) is strongly significant with these sample sizes. A t-test yields a p-value of 1.1 × 10^-26 on the difference in means. Even so, the distributions of these two populations overlap across most of their ranges.
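    For readers who want to try this kind of comparison on their own per-haplotype counts, a minimal sketch follows. The post does not say which form of t-test was used, so the choice of Welch's unequal-variance statistic here is an assumption, as is the function name. The p-value comes from a normal approximation to the t distribution, which is reasonable only because each sample contains hundreds of haplotypes.

```python
import math
from statistics import mean, variance


def welch_t(sample_x, sample_y):
    """Return Welch's t statistic and an approximate two-sided p-value.

    The p-value uses the standard normal in place of the t distribution,
    an approximation that is adequate for large samples only.
    """
    nx, ny = len(sample_x), len(sample_y)
    # Standard error of the difference in means, allowing unequal variances.
    se = math.sqrt(variance(sample_x) / nx + variance(sample_y) / ny)
    t = (mean(sample_x) - mean(sample_y)) / se
    # Two-sided tail probability of a standard normal at |t|.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

    Called on two lists of shared-derived-allele counts, one per population, it returns the statistic and the approximate p-value. With samples this large, even a difference of a tenth of a percent in the means can produce an extremely small p-value while the two distributions still overlap heavily.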

    Seeing these hundreds of genomes arrayed on a histogram provides much more information than we had from a handful of genomes. It is remarkable how much dispersion there is among genomes from a single population. Although the means of these two samples are nearly the same, you can see that each of them has a large range of variation in the shared derived SNP alleles with Neandertals. This variation means that people within a single population have very different proportions of Neandertal ancestry.

    This is not a graph of people, but a separation of the two copies of SNP alleles carried by these people. That separation is phased at short scales but arbitrary on the scale of a whole chromosome, so the histogram likely understates the variance among single genomes while it overestimates to some extent the variation among people with their diploid genomes. Still, it looks likely from these comparisons that some people in Europe carry more than a percent higher Neandertal ancestry than the average, and some carry a percent less. We can use statistical methods to test this hypothesis directly as applied to individuals in the sample.

    Neandertal genes in recently admixed populations

    A sample of hundreds of people allows us to demonstrate significant differences among the genomes of different populations. Some of the 1000 Genomes Project samples are from populations that represent historically recent admixture of people who trace their ancestry to different parts of the world.

    For example, the "ASW" population sample includes African-American people who live in the Southwest United States. We know from many other genetic studies that African-Americans vary in the fraction of ancestry they derive from Europeans and from Africans. The average proportions vary among African-Americans who live in different parts of the U.S., with European ancestry averaging as low as 3% in some parts of the country and as high as 20% or more in others. The proportion among individuals varies even more. So when we consider the ASW sample, we should expect to see a lot of variation in the number of shared derived SNP alleles with Neandertals, with a mean higher than African populations.

    Which is exactly what we do see:

    The ASW sample overlaps substantially with the Yoruba sample from West Africa (Nigeria) and slightly with the CEU sample, which includes people of European ancestry in Utah. The total in the ASW genomes is more variable than either the Yoruba or CEU population samples. If the higher mean in the ASW genomes reflects European ancestry from a population like CEU, the proportion of European ancestry would be around 17% for that sample of people. It would be hard to tell from these numbers alone how much of the variation in ASW is attributable to variation in ancestry fraction, and how much is expected within a population of homogeneous ancestry. As we'll see in some other populations, there are some appreciable differences among populations within a given region, and ancestry differences may add to the variation among individuals within populations.
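    The arithmetic behind an estimate like that 17% can be sketched under a simple linear mixing model: if the admixed sample's mean count is a weighted average of the two source means, the weight on the second source is the admixed mean's distance from the first source divided by the distance between the two sources. The function name and the model itself are assumptions made for illustration, and the numbers in the example call are placeholders, not the counts from these samples.

```python
def ancestry_fraction(mean_admixed, mean_source_a, mean_source_b):
    """Estimate the fraction of ancestry from source B.

    Assumes mean_admixed = (1 - f) * mean_source_a + f * mean_source_b
    and solves for f.
    """
    if mean_source_a == mean_source_b:
        raise ValueError("source means must differ to estimate a fraction")
    return (mean_admixed - mean_source_a) / (mean_source_b - mean_source_a)


# Placeholder means: source A = 100, source B = 200, admixed sample = 117.
example_fraction = ancestry_fraction(117, 100, 200)
```

    With a YRI-like sample as source A and a CEU-like sample as source B, the result would be read as the European fraction. Swapping the two sources gives the complementary African fraction.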

    We see a similar pattern when we look at the Puerto Rican sample. Individuals in this sample have some ancestry from European, Native American and African ancestors. The comparisons by Reich and colleagues [2] and Green and colleagues [1] suggested that Native American populations have the same fraction of Neandertal ancestry as other people outside Africa. In the comparison with YRI and CEU samples, Puerto Rican (PUR) genomes are intermediate, with a mean suggesting around 15% ancestry from the West African population.

    The two outlier points in the Puerto Rican sample are the two genome copies from one individual, who we would hypothesize had much higher African ancestry than the average in the sample.

    Next...

    This post has taken me much longer than I expected to get to the point of talking about variation among samples within continental regions. It turns out that, despite the similarity of European and East Asian samples in their averages, there are substantial differences between samples within each of these regions.

    For example, here's a comparison of north and south Chinese samples:

    People of Han Chinese ethnicity sampled in Beijing appear to have on average a half percent more Neandertal ancestry than people of the same ethnicity sampled in southern China. I found these kinds of differences almost everywhere I looked within regions. More later...


    References

    1. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. 2010. A Draft Sequence of the Neandertal Genome. Science [Internet] 328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    2. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature [Internet] 468:1053–1060. Available from: http://dx.doi.org/10.1038/nature09710
    3. Hodgson JA, Bergey CM, and Disotell TR. 2010. Neandertal genome: the ins and outs of African genetic diversity. Current biology : CB 20:R517-9.
    4. Reich D, Patterson N, Kircher M, Delfin F, Nandineni MR, Pugach I, Ko AM-S, Ko Y-C, Jinam TA, Phipps ME, et al. 2011. Denisova admixture and the first modern human dispersals into southeast Asia and oceania. American journal of human genetics 89:516-28.
    5. The International HapMap Consortium. 2005. A Haplotype Map of the Human Genome. Nature [Internet] 437:1299–1320. Available from: http://dx.doi.org/10.1038/nature04226
    6. McVean G, Spencer CCA, and Chaix R. 2005. Perspectives on human genetic variation from the HapMap Project. PLoS genetics 1:e54.
    7. Hawks J, Wang ET, Cochran G, Harpending HC, and Moyzis RK. 2007. Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 104:20753–20758. Available from: http://dx.doi.org/10.1073/pnas.0707650104
    8. Durand EY, Patterson N, Reich D, and Slatkin M. 2011. Testing for ancient admixture between closely related populations. Molecular biology and evolution [Internet]. Available from: http://dx.doi.org/10.1093/molbev/msr048
    Synopsis: 
    We're quantifying the amount of Neandertal ancestry in whole genome data from living people.
  • Mailbag: Neandertal-human comparisons

    Fri, 2011-12-09 21:38 -- John Hawks

    Re: Neandertal-human comparisons

    Your website states, "of those positions where the human genome differs from chimpanzees, Neandertals have the chimpanzee version around 12.7 percent of the time."

    Since the subject is the comparison with supposed MRCA of humans/chimps, shouldn't the correct statement be, "of those positions where the human genome differs from chimpanzees, Neandertals have the MRCA version around 12.7 percent of the time." ?

    Or therefore, "of those positions where the human genome differs from chimpanzees, Neandertals have the chimpanzee version around 6.35 percent of the time."

    If Neanderthals were something like 2 million base pairs closer to chimpanzee, shouldn't a few thousand of those base pairs be in at least a few modern Eurasians ?

    Hi, thanks for your question!

    Your point is correct that Neandertals do not have chimpanzee ancestors. If we were considering a comparison of all sites in the Neandertal sequence, you would be correct about the proportions. Neandertals would lack some proportion of the mutations that occurred on the modern human's lineage but they would lack every one of the mutations that happened on the chimpanzee lineage -- except for a very small fraction of parallelisms.

    However, the comparison carried out by Green and colleagues was not of the entire genome, but specifically those sites in the genome that underwent mutations on the human lineage. The mutations on the chimpanzee lineage from the MRCA are completely ignored by this comparison.

    The chimpanzee genome therefore stands in for the MRCA in this comparison. Sites at which both chimpanzees and humans have undergone parallel mutations have the potential to confound this comparison, because they are not counted (they are not places where the human and chimpanzee genomes differ). But the proportion of human substitutions that are also chimpanzee substitutions from the MRCA is very small, only around 1 percent of the human sites.

    The fraction of Neandertal ancestry of Eurasians, around 3 percent, is calculated differently: by examining polymorphisms within human populations today and considering the fraction shared by different humans' genomes with Neandertals. Eurasian people have around 3 percent more similarity with Neandertals than present-day Africans.

  • Mailbag: Spuds and mutts

    Wed, 2011-11-09 00:28 -- John Hawks

    Re: "How widespread is Denisovan ancestry today?" and "Potato sack race":

    Question about Denisovan DNA. Once introduced into a population, beginning many millenia ago, what keeps it from being in the DNA of everybody in the area? I exclude new arrivals, but what kept the Denisovan DNA from being spread to the homeland of the new arrivals what with the traveling salesmen, the refugees from tribal pushing and shoving, armies marching, cross marching and countermarching? It isn't as if Denisovan genes cause assortative mating by making the possessor either a hell of a catch or a last-man-on-earth scenario. Is it? Selective survival against diseases that come and go, while not so good in between, a la sickle cell? Is the blender model of human reproduction faulty somehow.

    As to potatoes, I'd heard that one advantage is that armies, used to pasturing their horses in the grain of the enemy's peasants' fields, had to move on more quickly when the supply officers gave up trying to get their foraging parties to dig potatoes.

    If, as Keegan hypothesizes, the ration was one pound of meat and two of bread (requiring two pounds of firewood) per man per day, an army of 30,000 ate out a location pretty quickly. If spuds were the local staple, they'd have to move. You just can't feed 30,000 guests who arrived unannounced by digging potatos. Not fast enough. Do horses like potatos? So, the army moves on--win--and the peasants get out the potato forks and do okay, more or less. Win.

    Re: potatoes -- I think you've pointed to an important factor -- also, they can't be burned when the army retreats. The sheer productivity of tubers really does outweigh the available grain crops in Northern Europe.

    Re: Denisovan DNA -- The genes should have diffused into other populations, all things being equal. That they did not do so is a pretty strong indication that SE Asia today shares little genetically with SE Asia 30,000 years ago. There must have been a massive influx of people who lacked Denisovan ancestry, well after the initial mixture with Denisovans happened and Denisovans themselves left the scene.

  • Braiding Denisovans into our ancestry

    Fri, 2011-11-04 10:39 -- John Hawks

    Dalton Luther reflects on the Denisovan admixture paper [1] that I wrote about earlier this week ("How widespread is Denisovan ancestry today?"), by referring to John Moore's work on ethnogenesis [2].

    Getting back to the original quote about Denisovan legacies, just because the Denisovans aren’t “around” anymore, doesn’t mean they’re not “around.” An ancient population is present even though in a very different form. Using the braided river metaphor, the name Denisovan refers to the contents of a particular stream that mixed back into another stream, which grew larger, amplifying its original contents.

    What seems to be the challenging concept to some geneticists is that some people today have that legacy and others don't. But it's not at all unusual for that to be true of families, kindreds, cultural traits, or even languages. So why should it be unusual for populations?


    References

    1. Skoglund P, and Jakobsson M. 2011. Archaic human ancestry in East Asia. Proceedings of the National Academy of Sciences, U. S. A.
    2. Moore JH. 1994. Putting anthropology back together again: the ethnogenetic critique of cladistic theory. American Anthropologist 96:925–948.
