john hawks weblog

paleoanthropology, genetics and evolution

Neandertal DNA

  • Which population in the 1000 Genomes Project samples has the most Neandertal similarity?

    Wed, 2012-02-08 01:14 -- John Hawks

    Last December I began writing about an analysis of introgression in the 1000 Genomes Project samples ("Neandertal introgression, 1000 Genomes style"). I left everybody in a bit of suspense, partly because my writing computer was unexpectedly replaced before winter vacation, and partly because of my extensive travel in January.

    I'm catching up this week before I go to Ann Arbor, Michigan next week for a talk and visit with many friends. It's a good time to give readers some status updates on the analyses because the release of the high-coverage Denisova genome today will allow us to do some very deep checks on some of the comparisons we've carried out.

    Picking up where I left off, in the last post I emphasized that the individual genomes represented in the 1000 Genomes Project samples in Europe and East Asia have a surplus of derived SNP alleles that they share with the Vindija Vi33.16 genome. That surplus compared to genomes in the African population samples represents the evidence for Neandertal ancestry in those populations.

    Comparison of shared Neandertal derived variants in African, Chinese and European samples

    Admixed populations, including African-Americans and Puerto Ricans, shared Neandertal derived SNP alleles in the proportions expected from their African and non-African ancestry.

    Comparison of shared Neandertal derived variants in ASW, YRI and CEU samples

    As I also pointed out, the population samples in Europe and East Asia are not identical in the number of these shared derived variants. Differences between individuals can be caused by differences in the fraction of their genealogy that traces to Neandertals. They may also be caused by other aspects of the individuals' genealogy: for example, some aspect of population history may have changed the fraction of ancient variants these people share with a Neandertal genome through incomplete lineage sorting.

    Here is the comparison of East Asian samples (Japanese, Han Chinese in Beijing, and Han Chinese originating in South China) and European samples (Tuscans, British, Finn and CEU samples, along with a handful of Spanish):

    Comparison of shared Neandertal derived variants in East Asian and European 1000 Genomes Project samples

    The Europeans average a bit more Neandertal than Asians. The within-population differences between individuals are large, and constitute noise as far as our comparisons between populations are concerned. At present, we can take as a hypothesis that Europeans have more Neandertal ancestry than Asians. If this is true, we can further guess that Europeans may have mixed with Neandertals as they moved into Europe, constituting a second process of population mixture beyond that shared by European and Asian ancestors.

    As we look more closely at the particular gene regions shared between each individual and the Neandertal, we will be able to consider the approximate time that they shared an ancestor for each gene region. That will allow us to distinguish incomplete lineage sorting (ILS) from introgression, although the two will overlap to some extent. We will rely on that test to examine hypotheses about the time and place of population mixture.

    The difference between Europeans and Asians when we lump all the samples together is not as interesting as the differences we can see among the samples within each of those regions. For example, here are British people compared to Tuscans:

    Comparison of shared Neandertal derived variants in British and Tuscan samples

    The Tuscans have the highest level of Neandertal similarity of any of the 1000 Genomes Project samples. They have around a half-percent more Neandertal similarity than Brits or Finns in these samples. The CEU sample is slightly elevated compared to Brits and Finns as well.

    It is tempting to interpret these differences as a north-south cline in Neandertal ancestry. I wouldn't jump too quickly on this idea, because Holocene population movements in Europe are now known to have covered up or erased a substantial fraction of the Upper Paleolithic gene pool. If we have a bonus of extra Neandertal ancestry in southern Europe, we need to explain how that cline persisted across subsequent history. Still, the difference is statistically very strong and deserves some explanation.

    Likewise, the populations within East Asia have some differences in Neandertal similarity. Here is the comparison of Han Chinese, with the Beijing versus South China origins separated out:

    Comparison of shared Neandertal derived variants in CHB and CHS samples

    North China has a bit more Neandertal, on average, than South China according to these samples. These are all identified as ethnic Han Chinese, so I expect that the comparison would be much more interesting if some minority populations had been examined. The "cline" here seems opposite in direction compared to the European case. I can add that the Japanese sample is largely intermediate between the CHB and CHS, with an average closer to the Beijing sample.

    If there was one thing that surprised me in the comparisons, it was this:

    Comparison of shared Neandertal derived variants in Luhya and Yoruba samples

    Yoruba have substantially more Neandertal similarity than Luhya. This may seem counter-intuitive, because the geographic location of Luhya in East Africa might seem better placed for Neandertal similarity to appear, whether through ancient population structure and ILS or through recent gene flow or backmigration into Africa of Neandertal descendants.

    Instead, it looks like the Yoruba are the recipients of Neandertal genes, whether by means of ancient population structure or introgression and recent trans-Saharan gene flow. I personally think both factors are involved, but again their relative importance will be determined by comparing individual gene regions.

    In this vein, it is useful to outline the hypothesis of differential ILS within African samples. We now know from examination of genetic variation within Africa today that some of today's diversity can be traced to ancient population structure in Middle Pleistocene African populations. For example, Neandertals could be more closely related to some African populations than others today because Neandertals actually exchanged genes with some ancient African populations. Or Neandertals might have sprung from one African population among many who lived 250,000 years ago. If some of these ancient populations persisted and contributed genes to different present-day African populations, those populations would share different fractions of genes with a Neandertal genome.

    I expect we will learn a substantial amount about African population structure in the Middle Stone Age (MSA) by using these Neandertal-similar regions of the genome. It's like having a probe that can trace the movement of people across Africa more than 100,000 years ago. As we combine the archaic genome data with our growing picture of diverse lineages in Africa today, we may discover ancient populations that are not apparent archaeologically. Again, genetics is giving us a totally new picture of the diversity and population dynamics of ancient people.

    Next: Which Neandertal-derived variants are shared between regions, and which are unique to one region? I touched on this question last spring by using genotype data. Now, we have sequences capable of telling us much more.

    Synopsis: 
    Europe has a touch more Neandertal than East Asia; Tuscans have more than any other European sample
  • Denisova APOE status

    Fri, 2012-02-03 23:36 -- John Hawks

    I got thinking this evening about APOE, which includes a very well-known polymorphism of three alleles, where the most ancient (ApoE4) is significantly associated with Alzheimer's disease risk in European population samples. The association is not significant in every genetic background (African American population samples are one exception), so it's not necessarily a case where we could predict the phenotype of an ancient genome from observing the allele. But it is one of the most commonly known disease risk polymorphisms, and I hadn't yet looked up what Neandertals and Denisovans are like.

    There are two constituent SNP loci, rs429358 and rs7412. For each of these loci, the Denisova genome data include one relevant read, and together the two reads indicate the ApoE4 allele. The alignment quality of these reads is poor, and I wouldn't take the result to the bank. A third locus, rs4420638 in the nearby APOC1 gene, is typically linked to APOE status in living people, and four Denisova reads carry the allele that today is usually linked to ApoE4. The Neandertal data have no reads at all for the two key SNPs in APOE, and the single read for the linked SNP in APOC1 likewise carries the allele usually linked to ApoE4.

    None of this is surprising, because ApoE4 is the more ancestral allele. Still, the other common alleles (ApoE2 and ApoE3) are relatively ancient as human polymorphisms go, and could very well have existed in populations contemporary to Neandertals and Denisovans, or in some individuals in those populations. But as it stands, the data suggest that the Denisova genome carried an ApoE4 allele.
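    The two-SNP logic behind an APOE call can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the Denisova reads: the function name is hypothetical, alleles are given as plus-strand bases, and it assumes the alleles at rs429358 and rs7412 come from the same chromosome copy, which a single read at each site cannot actually confirm.

```python
# Conventional APOE isoforms defined by the pair of plus-strand alleles
# (rs429358, rs7412) on one chromosome copy.
APOE_HAPLOTYPES = {
    ("T", "T"): "e2",
    ("T", "C"): "e3",
    ("C", "C"): "e4",
    ("C", "T"): "e1",  # rare combination
}

def apoe_isoform(rs429358: str, rs7412: str) -> str:
    """Return the APOE isoform implied by one haplotype's two alleles.

    Hypothetical helper: assumes both alleles are on the same chromosome copy."""
    key = (rs429358.upper(), rs7412.upper())
    if key not in APOE_HAPLOTYPES:
        raise ValueError(f"unexpected allele pair {key}")
    return APOE_HAPLOTYPES[key]

# One read at each SNP carrying C, as reported for the Denisova data.
print(apoe_isoform("C", "C"))  # e4
```

    The mapping is a plain lookup table because only four allele combinations are possible at two biallelic sites.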

  • A quick look at your Neandertal fraction

    Fri, 2011-12-16 15:13 -- John Hawks

    The 23andMe blog, the Spittoon, has a description of their new technique to use 23andMe SNPs to estimate any customer's fraction of Neandertal: "Find your inner Neanderthal".

    The result is a rough-and-ready numerical estimate of your Neandertal ancestry fraction. For me it's 2.5 percent. Gretchen is 3 percent, and she's been lording it over me all day.

    The estimate is the work of Eric Durand, who broke ground on the D-statistic method for finding introgression from archaic genomes [1]. He has made public a short white paper describing the application.

    So far, all estimates of Neandertal (or other archaic human) ancestry have come from the proportion of derived alleles in a genome (or in genotypes from a genome) that are shared with Neandertals. That includes the results I've been posting here for the 1000 Genomes Project samples this week.

    The next step is to uncover exactly which parts of a person's genome have come from Neandertal ancestors. To discover this, we have to further determine which shared alleles come from recent introgression as opposed to ancient incomplete lineage sorting. We have been working very hard on that problem here, as you'll see, and it has been an important aspect of our work in pigmentation genes in the archaic genomes.

    If you have been considering getting your genotypes from 23andMe, it has become a very good time to do this. The overall fraction of your DNA derived from Neandertals is only the beginning. Soon we'll be able to specify which parts, and in a few cases we'll have a good guess as to what difference it makes. If you want to participate in this research, I'm hoping to gather as many interested people as I can -- so keep your eyes here over the next month.

    And if you are interested in having your genotypes done, feel free to use my link to the 23andMe promotion. I've been very happy with their way of presenting the genotypes and their updates, and know many other people who have also found it interesting. As I wrote a couple of years ago, it's not something to spend your food money on, but it does have an entertainment value. And the potential to be an active research participant.


    References

  • Over coffee

    Fri, 2011-12-16 08:03 -- John Hawks

    G: Guess what Daddy and I learned last night? I'm more Neandertal than he is!

    S: How did you find that out?

    G: Our genes.

    S: That's creepy.

    G: What do you mean, creepy? We think it's awesome!

    S: Awesome... in a creepy way.

  • Mailbag: Did Neandertals have the derived MCPH1 allele?

    Thu, 2011-12-15 08:38 -- John Hawks

    Re: "Introgression and microcephalin FAQ"

    Hi Dr. Hawks,

    I just ran across your introgression and microcephalin FAQ on your blog, and I wanted to ask you one quick question. Now that we have a draft sequence of the Neanderthal genome, has anyone yet looked to confirm that one of the modern human microcephalin alleles was bestowed upon us by admixture with Neanderthals?

    Thanks in advance!

    Thanks for writing!

    Lari and colleagues published on this last year (doi:10.1371/journal.pone.0010648) [1]. They didn't find the derived (presumed introgressed) allele in Monti Lessini 1, and we have no sign of it in the Vindija genomes either. The other encouraging gene region was an inversion including the MAPT gene; it also has not yet been found in a Neandertal.

    So now we have tons of evidence of introgression, but none of it involves the genes that we thought were strong cases before we had ancient DNA. That doesn't rule out finding these other cases in some ancient specimen, but in the meantime we're working on what we have.


    References

  • Mailbag: Neandertal derived SNP alleles

    Tue, 2011-12-13 09:48 -- John Hawks

    Re: Neandertal introgression, 1000 Genomes style:

    Long-time reader of your blog, non-paleo/anthro/genetics person, here. But please read on:

    Just a couple of brief questions.

    (i) It seems that it would make sense to look at pairwise comparisons (of shared derived Neanderthal SNP alleles) both within a population (e.g., Asians, or CEU) and between them, and build a histogram of how often they overlap.

    (ii) Then one could remove from the data set all such African shared SNPs - assuming that most of them are incomplete lineage sorting but that Africa had the initial superset of alleles before ooA (I know some are likely West Asian or European admixture, reducing the data set slightly more than necessary), and repeat (i) and similar diagnostics. Is the typical unmodified genome chunk length around such sites much longer than in (i) - can one date this? Can one now better quantify the actual admixture percentage outside of Africa?

    Wouldn't such a procedure give more insight about how Neanderthal introgression is distributed, when it occurred, and perhaps where it occurred?

    I am sure you are already working on similar ideas - just wanted to know if you agree that these may be low-hanging fruit to pursue.

    Thanks!

    Hi -- thanks for writing!

    I started with exactly the approach you describe, when we were working exclusively with SNP data in the spring. For example:

    http://johnhawks.net/weblog/reviews/neandertals/neandertal_dna/europe-ch...

    We were using linked haplotypes rather than single SNPs but the filtering process was the same.

    Now I am hopeful that we will have decent age estimates for the introgressing SNPs from a different technique. I would rather find these ages independently of filtering by geographic location, because having this information will greatly simplify testing models of ancient population dynamics. If we succeed at this, we will also have a test of selection based on the same allele ages.

    I am continuing to update and you'll see these results not long after we get them!
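    For readers who want to try the filtering idea from the letter, here is a minimal Python sketch. It works on single SNPs rather than the linked haplotypes we used, and the data layout (a set of carried Neandertal-derived SNP IDs for each haploid genome) and the function names are hypothetical. It removes every SNP whose Neandertal-derived allele appears in any African haploid genome, then computes a pairwise overlap (Jaccard index) between the remaining non-African genomes; those values could be binned into the histogram the letter suggests.

```python
from itertools import combinations

def filter_african_shared(carriers, african_ids):
    """Remove every SNP whose Neandertal-derived allele is carried by any
    African haploid genome, and drop the African genomes themselves."""
    african_snps = set().union(*(carriers[h] for h in african_ids))
    return {h: snps - african_snps
            for h, snps in carriers.items() if h not in african_ids}

def pairwise_overlap(carriers):
    """Jaccard overlap of Neandertal-derived SNP sets for every pair of genomes."""
    overlaps = {}
    for a, b in combinations(sorted(carriers), 2):
        union = carriers[a] | carriers[b]
        shared = carriers[a] & carriers[b]
        overlaps[(a, b)] = len(shared) / len(union) if union else 0.0
    return overlaps

# Hypothetical toy data: SNP IDs where each haploid genome carries the
# Neandertal-derived allele.
carriers = {
    "YRI_1": {"s1", "s2"},
    "CEU_1": {"s1", "s3", "s4"},
    "CHB_1": {"s3", "s5"},
}
filtered = filter_african_shared(carriers, {"YRI_1"})
print(filtered)
print(pairwise_overlap(filtered))
```

    Filtering on presence in any African genome is the bluntest version of the idea; as the letter notes, it also discards some alleles that reached Africa through later gene flow.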

  • Neandertal introgression, 1000 Genomes style

    Sat, 2011-12-10 18:16 -- John Hawks

    For our project to understand pigmentation genetics in archaic humans, we had to find a good comparative sample of sequence data from recent humans. The original publication on the draft Neandertal genomes compared them to five low-coverage genomes from different Old World populations, along with the publicly available genomes from Craig Venter and others [1]. The first publication on the Denisova genome added an additional handful of genomes to these comparisons [2].

    Some genomes in this handful from living people are more similar to the Neandertal and Denisova genomes than others. That simple fact is the proof that some living people have Neandertal and Denisovan ancestors.

    But until now, the comparison has been limited to a very small number of human genomes. That became a focus for critics of the Neandertal and Denisovan results. How could three or four genome sequences possibly provide an adequate representation of human variability? We could imagine scenarios in which the similarities between Neandertal and humans could be explained by some unsampled population, for example, northeast Africans [3]. Denisova does not present the same problem, because African population structure cannot possibly explain its resemblance to populations in Wallacea, Australia, and Oceania [2] [4]. But to compare either of these genomes, we should seek a broader sampling of genomes from living people.

    As I wrote yesterday, my students and I have been working to understand pigmentation genetics of the archaic human genomes ("Pigmentation of archaic humans: introduction"). I've emphasized the need to break the analysis into small steps. For this question, we need to examine whether the pattern of introgression around pigmentation genes is characteristic of the genome as a whole. If genes involved in pigmentation have systematically higher or lower levels of Neandertal ancestry, that will tell us a lot about the evolutionary history of pigmentation in recent and archaic humans. For this, we need a good comparative sample, and the 1000 Genomes Project provides the best sample available.

    The first step in assessing the pattern of introgression for pigmentation genes is to characterize the pattern of introgression across the whole genome.

    Yes, a whole-genome introgression analysis sounds awfully big for my "small steps" concept. But actually this is simpler than it might sound. Here's a teaser:

    The figures in this post are not from a whole-genome analysis; they include data from eight chromosomes that we prioritized because of our pigmentation analysis. I am licensing all of them under a Creative Commons ShareAlike license so that anyone can use them anywhere.

    UPDATE (2011-12-10): I finished the whole genome analysis and am updating this post and figures accordingly. The results are the same throughout, with the exception of the Europe-East Asia comparison, which now shows these populations to be significantly different across the genome as a whole. I have partially updated the figures and will finish these later today.

    The value of sequences

    The 1000 Genomes Project data have been updated several times in the last year, as both sequencing and analysis of the genomes have progressed (more information on 1000 Genomes Project website). We downloaded a release of SNP genotype calls from 1094 individuals, based on the low-coverage (average 4x) sequencing that has been carried out on the sample.

    A SNP (single nucleotide polymorphism) is a nucleotide site with at least two alleles present in the global human sample. These sites represent only one kind of genetic variation in today's populations. Many of the differences between people's genes are caused by insertions, duplications, deletions, transpositions, or inversions. But those kinds of polymorphisms can be challenging to study in low-coverage genomes, and we already understand quite a lot about SNPs in human populations from the earlier HapMap project [5] [6]. The HapMap provided the data underlying our 2007 paper on the acceleration of recent human evolution ("Why human evolution accelerated") [7].

    The drawback of earlier SNP variation projects is that they examined only a subset of SNP variation in a sample of people. To design a microchip that could provide a million or more SNP genotypes from a saliva sample, somebody first had to discover where in the genome SNPs could be found. So they took small samples of people, sometimes only a single person's two copies of the genome, and sequenced. Adding together SNPs found by several methods, they could get a representation of SNP variation across the whole genome in a population. But this process introduced a bias: the SNPs were ascertained in a sample that inevitably could not represent humans in other samples with the same accuracy. Initially, SNP samples were heavily biased toward people of European ancestry (upon whom most genetic work was originally done), and the HapMap project went to great efforts to increase the representation of other populations. But even with the best possible ascertainment, interpreting SNP variation requires us to jump through some theoretical hoops.

    Sequence data make life much easier for the population geneticist. Seriously, working on this stuff on the whiteboard is fun instead of a constant nightmare of sampling biases and spaces between markers. I have a bias myself, in that I find recombination hard to deal with. I love reticulation among populations, but I'd rather work with genealogies that look like proper trees instead of a liana-strewn mess. So looking at sequence data over short intervals makes me happy. Not as happy as beer aged in bourbon barrels, but happy.

    The 1000 Genomes Project SNP files represent every SNP mutation observed in the sample. In other words, these are sequence data with all the fixed (and therefore redundant) sites removed. Even so, these sequence data are not perfect. Low coverage means that some rare mutations in the sampled individuals will go unreported. We aren't typically interested in singleton mutations in the sample, except that missing them will bias our estimates of the time when common ancestors lived. Next-gen sequence reads are usually fairly riddled with errors. High coverage allows these errors to be removed with some confidence, but with low-coverage genomes we risk throwing out real SNPs along with the spurious ones. The publicly available files reflect analytical steps that we do not control, so we have to work with the understanding that the data are not perfect.
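    To see why low coverage loses rare variants, here is a small Python calculation under a simple assumed model: read depth at a site is Poisson-distributed around the mean coverage, each read samples either chromosome copy with probability 1/2, and sequencing errors are ignored. The function name is hypothetical. Real calling pipelines pool information across individuals, so this per-individual figure is only a rough illustration.

```python
import math

def prob_alt_observed(mean_coverage: float, min_reads: int = 1) -> float:
    """Probability that a heterozygous site shows the non-reference allele in at
    least `min_reads` reads. Under the Poisson model, reads from the alternate
    copy are themselves Poisson with mean mean_coverage / 2."""
    lam = mean_coverage / 2
    # P(fewer than min_reads alternate reads): sum of Poisson pmf for k < min_reads
    p_fewer = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads))
    return 1 - p_fewer

for depth in (4, 30):
    print(depth, round(prob_alt_observed(depth), 3), round(prob_alt_observed(depth, 2), 3))
```

    At 4x coverage, a heterozygous site shows the alternate allele at least once only about 86 percent of the time, and at least twice only about 59 percent of the time, so a singleton carried by one person is easily missed or discarded as a possible error.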

    The 1000 Genomes SNP files have had a phasing algorithm applied to them, which attempts to assign genotypes to chromosomes. In essence, phasing tries to figure out whether adjacent SNP alleles belong to the same copy or to different copies of the same chromosome. The details of this phasing are not yet apparent, and for many reasons I am cautious about using phased data. The inference is often inaccurate for rare mutations, and the whole process tends to sneak assumptions about population history into the resulting dataset. I hate being forced to live with someone else's assumptions about human population history, and I typically try to avoid needing phased data. In this case, it looks like the data over short intervals are as accurate as they can be, given the limitations on coverage and sampling. We have moved forward by applying methods that make a bare minimum of assumptions.

    Counting derived SNP alleles

    David Reich and colleagues came up with an appealingly simple test of introgression, which they applied to both the Neandertal and Denisovan genomes. Eric Durand, Reich, Nick Patterson and Monty Slatkin formally described the method, which they call the D-statistic, this year [8]. Informally, it has become known as the ABBA-BABA test, after their labels for the discordant genealogies that the test compares. By and large, across the genome, humans living today share many more new mutations with each other than they do with an archaic human like a Neandertal. But sometimes two human genomes differ from each other, and one of them shares a new mutation with the Neandertal.

    A human might share a mutation with a Neandertal because it actually isn't very new, and both inherited the mutation from some much more ancient population of humans. This scenario is called "incomplete lineage sorting", because humans today have multiple gene lineages that existed within some very ancient population, instead of these having been "sorted" cleanly into the different human and Neandertal populations. Incomplete lineage sorting does happen a lot between humans, Neandertals, and Denisovans. ILS is the normal mode of variation among recent human populations, who trace their genealogical histories back much further than the earliest "modern" humans. So if one human has a Neandertal allele, and another human has a different allele, it's probably no big deal. They both just inherited gene variants that already existed in our distant common ancestors.

    You can probably see already that if we had a way to estimate the age of an allele, we could tell whether incomplete lineage sorting is a credible explanation for any particular site. I'll leave that point for another post.

    In the meantime, if we pretend that we know nothing at all about the ages of alleles, we must find some other way to tell whether incomplete lineage sorting can explain Neandertal similarities. Reich and colleagues recognized that incomplete lineage sorting from ancient pre-Neandertal ancestors ought to be distributed equally among living people. If we look at every site in the genome where we have data from Neandertals, we should find that any one living human genome looks like the Neandertal just as often as any other.

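    This insight led to their test. Take a pair of humans and count the sites where sequence A matches the Neandertal while sequence B matches a chimpanzee; then count the inverse, where B matches the Neandertal and A matches the chimpanzee. Those two discordant patterns are the ABBA and BABA of the test's name.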
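    The ABBA-BABA count can be written as a short Python sketch. It uses the form of the D-statistic described by Durand and colleagues [8], D = (ABBA - BABA) / (ABBA + BABA), with the chimpanzee as outgroup, but simplifies it heavily: sequences are aligned strings with no missing data, sites are assumed biallelic, and there is no block-jackknife test of significance. The function name is hypothetical. A positive D means the second human sequence shares more derived alleles with the archaic genome than the first does.

```python
def d_statistic(h1: str, h2: str, archaic: str, outgroup: str):
    """Return (ABBA, BABA, D) for four aligned sequences of equal length.

    ABBA: h1 carries the outgroup (ancestral) allele and h2 carries the archaic
    derived allele. BABA: the reverse. Sites where the archaic genome matches
    the outgroup are skipped, since there is no derived allele to share."""
    if not (len(h1) == len(h2) == len(archaic) == len(outgroup)):
        raise ValueError("sequences must be aligned and equal in length")
    abba = baba = 0
    for a, b, c, o in zip(h1, h2, archaic, outgroup):
        if c == o:
            continue  # archaic genome is ancestral at this site
        if a == o and b == c:
            abba += 1
        elif a == c and b == o:
            baba += 1
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return abba, baba, d

# Toy alignment: sites 0-3 carry a derived G in the archaic genome.
print(d_statistic("AAGAG", "GGAAA", "GGGGA", "AAAAA"))  # (2, 1, 0.3333333333333333)
```

    Under incomplete lineage sorting alone, ABBA and BABA sites should be about equally common, so D should hover near zero; a consistent excess of one pattern is the signal of introgression.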

    Why a chimpanzee? In most cases the chimpanzee allele will represent the ancestral state for humans. Living people can inherit ancestral alleles from Neandertals as well as derived ones, but the derived ones tend to be rarer and younger within human populations. If one living genome shares an ancestral allele with the Neandertal genome, we don't need incomplete lineage sorting or introgression to explain the pattern: for all we know, the derived allele carried by the other genome originated after Neandertals were already gone. So we need to pay attention to the derived mutations, the ones present in Neandertals but not in chimpanzees. Count these across the genome, and if you find a living genome with significantly more than another, you've found evidence for introgression.
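    Picking out the informative sites can be sketched in Python. This is a minimal sketch, not the pipeline used for these analyses: it assumes each SNP record carries the two human alleles, the aligned chimpanzee base as a proxy for the ancestral state, and the allele seen in the Neandertal reads. The record type and function name are hypothetical. Note that the Neandertal-derived allele can be either the reference or the alternate human allele.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

class Site(NamedTuple):
    snp_id: str
    ref: str         # reference allele in the human SNP file
    alt: str         # alternate allele in the human SNP file
    chimp: str       # aligned chimpanzee base, used as the ancestral state
    neandertal: str  # allele observed in the Neandertal reads

def neandertal_derived_sites(sites: Iterable[Site]) -> Iterator[Tuple[str, str]]:
    """Yield (snp_id, derived_allele) for SNPs where the Neandertal carries a
    human allele that differs from the chimpanzee (ancestral) base."""
    for s in sites:
        if s.chimp not in (s.ref, s.alt):
            continue  # chimp matches neither human allele: cannot polarize
        if s.neandertal == s.chimp:
            continue  # Neandertal is ancestral here: uninformative
        if s.neandertal in (s.ref, s.alt):
            yield s.snp_id, s.neandertal

example_sites = [
    Site("rs1", "A", "G", "A", "G"),  # derived allele is the alternate
    Site("rs2", "C", "T", "C", "C"),  # Neandertal ancestral: skipped
    Site("rs3", "G", "A", "A", "G"),  # derived allele is the reference
    Site("rs4", "T", "C", "G", "C"),  # chimp base G is not a human allele
]
print(list(neandertal_derived_sites(example_sites)))  # [('rs1', 'G'), ('rs3', 'G')]
```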

    Ed Green, David Reich and colleagues [1] [2] compared every possible pair of genomes in their modern human sample. These sequence data were gappy, so sequence A might overlap in coverage with sequence B at different sites than with sequence C. It was therefore necessary to consider each pair separately, counting all the sites where both human sequences as well as the Neandertal and chimpanzee sequences had data.

    The 1000 Genomes Project sample reports genotypes for every SNP for every sampled individual. So in principle, every pair of sequences should have data for every one of these sites. Again, we have to be cautious about the nature of the sequencing, attending to the possibility of systematic biases due to low coverage. But we really don't have to take the time-consuming step of comparing every possible pair of the 2188 resulting haploid genomes. We can just find the derived SNP alleles that are present in Neandertals and count how many of them are in each of the human sequences. If one sequence has significantly more Neandertal derived alleles than another, it had to get them somehow.
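    The counting step can be sketched in Python. This is a minimal sketch under stated assumptions: genotypes are phased VCF-style strings such as "0|1", the Neandertal-derived sites and the allele index that is derived at each (0 = reference, 1 = alternate) have already been determined, and the function name, sample IDs and data are hypothetical. Each diploid sample contributes two haploid counts, matching the way the histograms split every individual into two genome copies.

```python
from collections import Counter

def count_per_haplotype(genotypes, derived_index):
    """Count Neandertal-derived alleles on each haploid genome.

    genotypes: {snp_id: {sample: "a|b"}} with phased VCF-style calls
    derived_index: {snp_id: 0 or 1}, the allele index that is Neandertal-derived
    Returns a Counter keyed by (sample, haplotype), haplotype being 0 or 1."""
    counts = Counter()
    for snp_id, calls in genotypes.items():
        derived = derived_index.get(snp_id)
        if derived is None:
            continue  # not a Neandertal-derived site
        for sample, gt in calls.items():
            for hap, allele in enumerate(gt.split("|")):
                if allele == str(derived):
                    counts[(sample, hap)] += 1
    return counts

genotypes = {
    "rs1": {"NA1": "0|1", "NA2": "1|1"},
    "rs3": {"NA1": "0|0", "NA2": "1|0"},
    "rs9": {"NA1": "1|1", "NA2": "1|1"},  # not in derived_index: ignored
}
derived_index = {"rs1": 1, "rs3": 0}
print(sorted(count_per_haplotype(genotypes, derived_index).items()))
```

    Because every SNP is reported for every individual, one pass over the sites is enough; no pairwise comparison of the 2188 haploid genomes is needed.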

    That magic three percent

    The figure at the top of the post represents that count. Every individual in the 1000 Genomes Project dataset has two copies of the autosomal genome. Separating these two copies of the genome (basically arbitrarily) and counting up the shared derived features between each of those copies and the genome of Vindija 33.16, we obtain the histogram. Here it is again:

    The African genomes in the 1000 Genomes sample include Yoruba from Nigeria and Luhya from Kenya. The Asian populations sampled are Japanese and Chinese, including people of Han Chinese ethnicity in Beijing and southern China. The European ancestry samples include the CEU sample from Utah, as well as British, Tuscan, Spanish and Finn samples.

    The histogram shows that Asian and European genomes have significantly more Neandertal derived SNP alleles than do the African genomes. The averages for the Asian and European samples are around 3% higher than the average for the African samples. Whatever gave Africans some degree of similarity to Neandertals, non-Africans seem to have gotten around 3% more of it.

    Green and colleagues [1] assumed conservatively that Africans share derived SNP alleles with Neandertals only because of incomplete lineage sorting from the human-Neandertal ancestral population. This fraction should be the same in all human populations, under the assumption that Africans were mostly isolated from Neandertals for some period of time. The 3% Neandertal bonus outside Africa should then represent introgression from Neandertals into recent populations outside Africa.

    Both previous studies noted that genomes outside Africa are not significantly different in the fraction of derived SNP alleles shared with Neandertals. A genome from China and a genome from France carried the same fraction of shared derived SNP alleles with Neandertals. Here, we've confirmed that basic identity in the level of introgression in these populations.

    I have told several people now that I find the distributions in China and Europe spookily similar. On parts of the genome, the two distributions have means that are not significantly different. Indeed, I worked for a week with an analysis of eight chromosomes, in which the East Asian and European means were fewer than 100 SNP alleles apart. Even across the whole genome, Europeans average only 700 derived SNP alleles more than the East Asian sample. This small difference (a bit more than a tenth of a percent) is strongly significant at these sample sizes: a t-test yields a p-value of 1.1 × 10^-26 for the difference in means. Even so, the distributions of these two populations overlap across most of their ranges.
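    The post does not say which form of t-test produced that p-value. As a hedged illustration only, here is Welch's two-sample t statistic in plain Python, with the two-sided p-value taken from a normal approximation. That approximation is reasonable for samples of hundreds of haploid genomes but not for the tiny toy lists in the example, and the sketch treats each haploid genome as an independent observation, which the two copies from one person strictly are not. The function name and data are hypothetical, and the sketch will not reproduce the reported value without the real counts.

```python
import math
from statistics import mean, variance

def welch_t_test(x, y):
    """Welch's t statistic for a difference in means, with a two-sided p-value
    from the standard normal approximation (adequate only for large samples)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    t = (mean(x) - mean(y)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for a standard normal Z
    return t, p

# Toy counts of shared derived alleles per haploid genome (hypothetical).
europe = [10, 12, 14]
east_asia = [8, 9, 10]
print(welch_t_test(europe, east_asia))
```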

    Seeing these hundreds of genomes arrayed on a histogram provides much more information than we had from a handful of genomes. It is remarkable how much dispersion there is among genomes from a single population. Although the means of these two samples are nearly the same, you can see that each of them has a large range of variation in the shared derived SNP alleles with Neandertals. This variation means that people within a single population have very different proportions of Neandertal ancestry.

    This is not a graph of people, but a separation of the two copies of SNP alleles carried by these people. That separation is phased at short scales but arbitrary on the scale of a whole chromosome, so the histogram likely understates the variance among single genomes while it overestimates to some extent the variation among people with their diploid genomes. Still, it looks likely from these comparisons that some people in Europe carry more than a percent higher Neandertal ancestry than the average, and some carry a percent less. We can use statistical methods to test this hypothesis directly as applied to individuals in the sample.

    Neandertal genes in recently admixed populations

    A sample of hundreds of people allows us to demonstrate significant differences among the genomes of different populations. Some of the 1000 Genomes Project samples are from populations that represent historically recent admixture of people who trace their ancestry to different parts of the world.

    For example, the "ASW" population sample includes African-American people who live in the Southwest United States. We know from many other genetic studies that African-Americans vary in the fraction of ancestry they derive from Europeans and from Africans. The average fraction of European ancestry varies among African-Americans who live in different parts of the U.S., from as low as 3% to 20% or more in some parts of the country. The proportion among individuals varies even more. So when we consider the ASW sample, we should expect to see a lot of variation in the number of shared derived SNP alleles with Neandertals, with a mean higher than African populations.

    Which is exactly what we do see:

    The ASW sample overlaps substantially with the Yoruba sample from West Africa (Nigeria) and slightly with the CEU sample, which includes people of European ancestry in Utah. The count of shared derived alleles in the ASW genomes is more variable than in either the Yoruba or CEU population samples. If the higher mean in the ASW genomes reflects European ancestry from a population like CEU, the proportion of European ancestry would be around 17% for that sample of people. It would be hard to tell from these numbers alone how much of the variation in ASW is attributable to variation in ancestry fraction, and how much is expected within a population of homogeneous ancestry. As we'll see in some other populations, there are appreciable differences among populations within a given region, and ancestry differences may add to the variation among individuals within populations.
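An estimate like the 17% figure comes from placing the ASW mean between the YRI and CEU means. Here is a minimal sketch of that linear interpolation. It assumes the admixed mean is a simple weighted average of the two source means; the three means in the example are hypothetical, chosen only so the arithmetic gives 0.17.

```python
def admixture_fraction(mean_admixed, mean_source_a, mean_source_b):
    """Fraction of ancestry from source B by linear interpolation.

    Assumes the admixed sample's mean count of Neandertal-shared derived
    alleles is a weighted average of the two source means:
        mean_admixed = (1 - f) * mean_source_a + f * mean_source_b
    """
    if mean_source_b == mean_source_a:
        raise ValueError("source means must differ")
    return (mean_admixed - mean_source_a) / (mean_source_b - mean_source_a)


# Hypothetical per-haplotype means, NOT the actual 1000 Genomes values.
yri_mean, ceu_mean, asw_mean = 1000.0, 1600.0, 1102.0
f_european = admixture_fraction(asw_mean, yri_mean, ceu_mean)
print(f"European fraction ~ {f_european:.2f}")
```

Read from the other end, 1 - f is the African fraction. That is one way a figure like the PUR estimate below can be obtained, if its European and Native American components are assumed to share Neandertal alleles at the same rate.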

    We see a similar pattern when we look at the Puerto Rican sample. Individuals in this sample have some ancestry from European, Native American and African ancestors. The comparisons by Reich and colleagues [2] and Green and colleagues [1] suggested that Native American populations have the same fraction of Neandertal ancestry as other people outside Africa. In the comparison with YRI and CEU samples, Puerto Rican (PUR) genomes are intermediate, with a mean suggesting around 15% ancestry from the West African population.

    The two outlier points in the Puerto Rican sample are the two genome copies from one individual, whom we would hypothesize to have much higher African ancestry than the average in the sample.

    Next...

    This post has taken me much longer than I expected to get to the point of talking about variation among samples within continental regions. It turns out that, despite the similarity of European and East Asian samples in their averages, there are substantial differences between samples within each of these regions.

    For example, here's a comparison of north and south Chinese samples:

    People of Han Chinese ethnicity sampled in Beijing appear to have on average half a percent more Neandertal ancestry than people of the same ethnicity sampled in southern China. I found these kinds of differences almost everywhere I looked within regions. More later...
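A north/south difference like this (CHB versus CHS, in 1000 Genomes labels) can be checked with a simple permutation test on per-haplotype values. This is a generic sketch, not the method used in this post. The sample values are simulated placeholders, not the 1000 Genomes measurements.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value), where the difference is
    mean(sample_a) - mean(sample_b).
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign population labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Simulated placeholder counts of Neandertal-shared derived alleles per haplotype.
sim = random.Random(42)
chb = [sim.gauss(1030.0, 40.0) for _ in range(200)]
chs = [sim.gauss(1000.0, 40.0) for _ in range(200)]
diff, p = permutation_test(chb, chs)
print(f"difference = {diff:.1f}, p = {p:.4f}")
```

This sketch permutes haplotypes. Permuting people instead would respect the pairing of each person's two copies, which are not independent.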


    References

    1. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. 2010. A Draft Sequence of the Neandertal Genome. Science 328:710–722. Available from: http://dx.doi.org/10.1126/science.1188021
    2. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PLF, et al. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053–1060. Available from: http://dx.doi.org/10.1038/nature09710
    3. Hodgson JA, Bergey CM, and Disotell TR. 2010. Neandertal genome: the ins and outs of African genetic diversity. Current Biology 20:R517–R519.
    4. Reich D, Patterson N, Kircher M, Delfin F, Nandineni MR, Pugach I, Ko AM-S, Ko Y-C, Jinam TA, Phipps ME, et al. 2011. Denisova admixture and the first modern human dispersals into southeast Asia and oceania. American Journal of Human Genetics 89:516–528.
    5. The International HapMap Consortium. 2005. A Haplotype Map of the Human Genome. Nature 437:1299–1320. Available from: http://dx.doi.org/10.1038/nature04226
    6. McVean G, Spencer CCA, and Chaix R. 2005. Perspectives on human genetic variation from the HapMap Project. PLoS Genetics 1:e54.
    7. Hawks J, Wang ET, Cochran G, Harpending HC, and Moyzis RK. 2007. Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences USA 104:20753–20758. Available from: http://dx.doi.org/10.1073/pnas.0707650104
    8. Durand EY, Patterson N, Reich D, and Slatkin M. 2011. Testing for ancient admixture between closely related populations. Molecular Biology and Evolution. Available from: http://dx.doi.org/10.1093/molbev/msr048
    Synopsis: 
    We're quantifying the amount of Neandertal ancestry in whole genome data from living people.
  • Mailbag: Neandertal-human comparisons

    Fri, 2011-12-09 21:38 -- John Hawks

    Re: Neandertal-human comparisons

    Your website states, "of those positions where the human genome differs from chimpanzees, Neandertals have the chimpanzee version around 12.7 percent of the time."

    Since the subject is the comparison with the supposed MRCA of humans/chimps, shouldn't the correct statement be, "of those positions where the human genome differs from chimpanzees, Neandertals have the MRCA version around 12.7 percent of the time"?

    Or therefore, "of those positions where the human genome differs from chimpanzees, Neandertals have the chimpanzee version around 6.35 percent of the time."

    If Neanderthals were something like 2 million base pairs closer to chimpanzee, shouldn't a few thousand of those base pairs be in at least a few modern Eurasians?

    Hi, thanks for your question!

    Your point is correct that Neandertals do not have chimpanzee ancestors. If we were considering a comparison of all sites in the Neandertal sequence, you would be correct about the proportions. Neandertals would lack some proportion of the mutations that occurred on the modern human lineage, but they would lack every one of the mutations that happened on the chimpanzee lineage, except for a very small fraction of parallel mutations.

    However, the comparison carried out by Green and colleagues was not of the entire genome, but specifically those sites in the genome that underwent mutations on the human lineage. The mutations on the chimpanzee lineage from the MRCA are completely ignored by this comparison.

    The chimpanzee genome therefore stands in for the MRCA in this comparison. Sites at which both chimpanzees and humans have undergone parallel mutations have the potential to confound this comparison, because they are not counted (they are not places where the human and chimpanzee genomes differ). But the proportion of human substitutions that are also chimpanzee substitutions from the MRCA is very small, only around 1 percent of the human sites.
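The counting described above can be sketched in a few lines. This is a simplified illustration, not the pipeline Green and colleagues used; it omits quality filters and other refinements. It takes three aligned sequences (the strings below are made up), skips sites with missing data ('N'), and treats the chimpanzee base as a stand-in for the MRCA state. Every site where human and chimpanzee agree is ignored, including parallel mutations to the same base.

```python
def chimp_allele_fraction(human, chimp, neandertal):
    """Fraction of human-chimp difference sites where the Neandertal carries the chimp base.

    Sequences must be aligned to equal length; 'N' marks missing data.
    Sites where human and chimp agree are not counted at all.
    """
    if not (len(human) == len(chimp) == len(neandertal)):
        raise ValueError("sequences must be aligned to equal length")
    informative = 0
    chimp_like = 0
    for h, c, n in zip(human, chimp, neandertal):
        if "N" in (h, c, n) or h == c:
            continue  # missing data, or not a human-chimp difference
        informative += 1
        if n == c:
            chimp_like += 1
    return chimp_like / informative if informative else float("nan")


# Toy alignment: four human-chimp differences, one of them missing in Neandertal.
human = "ACGTACGTAC"
chimp = "ACGAACCTGG"
neandertal = "ACGTACCTNG"
fraction = chimp_allele_fraction(human, chimp, neandertal)
print(f"{fraction:.3f}")
```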

    The fraction of Neandertal ancestry in Eurasians, around 3 percent, is calculated differently: by examining polymorphisms within human populations today and considering the fraction of them that different humans' genomes share with Neandertals. Eurasian people have around 3 percent more similarity with Neandertals than present-day Africans.
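One standard way to quantify that excess similarity is a D statistic (the ABBA-BABA test) of the kind Green and colleagues used. The sketch below is simplified. Each site is given as a made-up pattern of ancestral ('A', the chimpanzee state) and derived ('B') alleles for an African genome, a Eurasian genome, the Neandertal genome and the chimpanzee, in that order. D measures excess allele sharing; turning it into an ancestry percentage needs further modeling that this sketch does not attempt.

```python
from collections import Counter


def d_statistic(site_patterns):
    """D = (ABBA - BABA) / (ABBA + BABA) over biallelic sites.

    Pattern order is (African, Eurasian, Neandertal, chimpanzee).
    ABBA: the Eurasian shares the derived allele with the Neandertal.
    BABA: the African shares the derived allele with the Neandertal.
    Other patterns are uninformative for this test and are ignored.
    """
    counts = Counter(site_patterns)
    abba, baba = counts["ABBA"], counts["BABA"]
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)


# Made-up counts: 52 ABBA sites, 48 BABA sites, 100 uninformative sites.
patterns = ["ABBA"] * 52 + ["BABA"] * 48 + ["BBBA"] * 100
d = d_statistic(patterns)
print(f"D = {d:.2f}")
```

A positive D means the Eurasian genome shares derived alleles with the Neandertal more often than the African genome does. With no admixture and no population structure, ABBA and BABA sites are expected in equal numbers.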

  • When anthropological and geological facts collide

    Mon, 2011-11-28 01:56 -- John Hawks

    This passage is the first paragraph of the introduction to Franz Weidenreich's monograph, The Skull of Sinanthropus pekinensis [1].

    In my earlier contributions to the study of Early Man I pointed out repeatedly the danger of confusing anthropological facts with geological facts. In determining the character of a given fossil form and its special place in the line of human evolution, only its morphological features should be made the basis of decision; neither the location of the site where it was recovered nor the geological nature of the layer in which it was imbedded [sic] are important. Discrepancies cannot be smoothed out by bringing morphological facts and opposing geological data into closer harmony with artful interpretations or by touching-up reconstructions. It is a generally accepted conception that Man has developed in the course of time by gradual transformation from an ape-like type to the type he presents today. Viewed from this fundamental standpoint, it is logical to assume that the more a form resembles the supposed ancestor the more ancient it will be, or that the more ancient it is the more "primitive" it should be.

    I am concerned with this passage today because of a re-emerging mismatch of evidence from the morphology of Middle Pleistocene humans and the genetics of Neandertals. Some paleoanthropologists have asserted that Europeans of the Middle Pleistocene were the exclusive ancestors of Neandertals. I have in the past written that Middle Pleistocene Europeans were among the ancestors of Neandertals, with sustained gene flow from other populations including Africa [2]. The Sima de los Huesos people, maybe 600,000 years old, resembled the (much) later Neandertals in several aspects of their anatomy, as did other Middle Pleistocene Europeans.

    The genetic differences between living people and the ancient Neandertal genomes appear consistent with the emergence of distinct African and Neandertal populations only within the last 400,000 years or less [3], [4].

    Such a recent date seems a poor match for the morphological evidence of Neandertal ancestry in Europe. I can think of several ways to make these morphological and genetic comparisons concordant with each other, all of which balance some shift in one body of inference against the other. As long as we can't pin down the human mutation rate within a factor of two ("What is the human mutation rate?"), there's a lot of room to make different population models consistent with the genetic data.
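A simple molecular-clock calculation shows why a factor-of-two uncertainty in the mutation rate matters so much. The sketch assumes divergence accumulates on both lineages at rate mu per site per generation, so the time is d / (2·mu) generations. It ignores ancestral polymorphism, which is large for closely related populations. The divergence value, the generation time and the two mutation rates are placeholders chosen only to differ by a factor of two.

```python
def divergence_time_years(per_site_divergence, mu_per_generation, generation_years):
    """Rough molecular-clock time since two lineages split.

    Divergence accrues on both lineages, so generations = d / (2 * mu).
    Ignores ancestral polymorphism, so it overstates population split times.
    """
    generations = per_site_divergence / (2.0 * mu_per_generation)
    return generations * generation_years


d = 0.0005          # hypothetical per-site divergence
generation = 25.0   # hypothetical years per generation
t_slow = divergence_time_years(d, 1.25e-8, generation)  # lower mutation rate
t_fast = divergence_time_years(d, 2.5e-8, generation)   # rate twice as high
print(f"{t_slow:,.0f} years vs {t_fast:,.0f} years")
```

Halving the assumed rate doubles the inferred time, so any date built this way inherits the full factor-of-two uncertainty.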

    This is, in today's language, Weidenreich's point. Morphological data must be interpreted in accordance with evolutionary principles, and if it doesn't fit a temporal scheme, it doesn't fit. Likewise, genetic similarities must be explained in their own evolutionary framework. These two sources of evidence must in the end be consistent with a single history. We will find that consistency not by shoehorning the evidence together, but by interpreting each with the strongest possible skepticism concerning assumptions and models.

    Weidenreich's introduction illustrates two cases. The simpler, from our point of view today, was Piltdown. Many establishment anthropologists, particularly in Britain, had maintained that Piltdown was a morphologically advanced ancestor of modern humans, which had lived early in the geological record of human evolution. Weidenreich had been an early and prominent critic of this idea, because he was convinced that the specimen simply did not fit together with its supposed geological context.

    I cannot believe, even making very liberal allowances for these uncertainties, that such incongruity between morphology and chronology as is found in the case of Piltdown can be completely brought into accord. The only hope of solution in this case would lie in assuming that the human bones were not contemporaneous with the layer in which they were found but were deposited there later. Otherwise, modern man must be much more ancient than we ever imagined, or else Western European man did not pass through evolutionary stages as did the humans of other regions of the earth.

    We now know, of course, that Weidenreich was entirely correct. The apparent geological facts were false, and the "advanced" characters of the specimen were simple reflections of the fact that the skull is a modern human skull.

    The other problem Weidenreich discussed in some detail was the phylogenetic position of the Steinheim skull. Like Piltdown, this specimen had been placed in a Presapiens context by other workers. Steinheim lacks most of the derived characteristics of later Neandertal specimens. Weidenreich, along with many of his contemporaries, accepted its lack of Neandertal features as evidence for affinity with modern humans. In Weidenreich's view, this similarity with modern humans was "anachronistic". Even so, the case did not challenge an evolutionary interpretation, only the assumption that features could evolve from "primitive" to "modern" along a single line. If we admit that Neandertal features were not in all cases "primitive", even if they may resemble superficially the characteristics of some apes, we can accommodate specimens like Steinheim within a population model where both moderns and Neandertals may have derived (and in some cases, secondarily derived) characters that appeared afterward.

    This scenario requires us to straighten out the analysis of the characters themselves, a process for which larger fossil samples are essential. It was to that end that Weidenreich supposed the Sinanthropus sample to be of such great utility. The subtext of the introduction was to point out the kinds of evolutionary problems that a full description of fossil variation could help illuminate. Finding variation in fossil humans does not repudiate the concept that modern humans evolved in stages from primitive ancestors, but it helps to clarify cases where evolution has not been a simple linear progression. In many cases, features that are superficially "primitive" may actually have been secondarily derived in recent humans compared to earlier hominins.

    Along similar lines, I ran across this old post: "Dobzhansky on Weidenreich's species concept", in which Dobzhansky predicts:

    Some modern populations may carry genes that were present in the Neanderthaloids, and other moderns may not carry such genes.



    Synopsis: 
    Weidenreich's introduction to the Sinanthropus cranial monograph illuminates some issues I'm facing with ancient genomes.
  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map from the paper:

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than in other parts of the world. As a result, the SNPs typed in the HGDP, and to a lesser extent in HapMap, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.
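A toy model makes the mechanism concrete. It is a deliberate simplification that assumes a hypothetical discovery panel of 8 chromosomes from one population and ignores how real chips were designed. Rare alleles are seldom seen in a small panel. An allele absent from the discovery population (frequency 0 there) is never ascertained at all, however common it may be elsewhere.

```python
def discovery_probability(freq, panel_chromosomes):
    """Chance that a SNP is ascertained in a small discovery panel.

    Simplified model: a SNP is put on a chip only if its minor allele appears
    at least once among panel_chromosomes chromosomes drawn from the
    discovery population, where that allele has frequency freq.
    """
    return 1.0 - (1.0 - freq) ** panel_chromosomes


panel = 8  # hypothetical discovery panel size (chromosomes)
for freq in (0.0, 0.01, 0.05, 0.30):
    prob = discovery_probability(freq, panel)
    print(f"frequency {freq:.2f}: discovered with probability {prob:.2f}")
```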

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.
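The selection behind this map can be sketched as a filter followed by an average. The record layout, population labels and frequencies below are hypothetical. The sketch only shows the rule: keep SNPs where the Denisova allele differs from both chimpanzee and Neandertal, then average that allele's frequency in each population, counting it as 0 where it is absent.

```python
from statistics import mean


def denisova_specific_mean_frequency(snps, population):
    """Mean frequency in `population` of Denisova alleles absent from chimp and Neandertal.

    Each SNP record holds single-base calls for 'chimp', 'neandertal' and
    'denisova', plus 'freq': {population: {allele: frequency}}.
    """
    selected = []
    for snp in snps:
        allele = snp["denisova"]
        if allele in (snp["chimp"], snp["neandertal"]):
            continue  # not a presumptively derived, Denisova-only allele
        selected.append(snp["freq"][population].get(allele, 0.0))
    return mean(selected) if selected else float("nan")


# Made-up records for two hypothetical populations.
snps = [
    {"chimp": "A", "neandertal": "A", "denisova": "G",
     "freq": {"north": {"G": 0.010, "A": 0.990}, "south": {"G": 0.012, "A": 0.988}}},
    {"chimp": "C", "neandertal": "T", "denisova": "T",  # shared with Neandertal: excluded
     "freq": {"north": {"T": 0.300, "C": 0.700}, "south": {"T": 0.310, "C": 0.690}}},
    {"chimp": "G", "neandertal": "G", "denisova": "A",
     "freq": {"north": {"G": 1.000}, "south": {"A": 0.002, "G": 0.998}}},
]
north = denisova_specific_mean_frequency(snps, "north")
south = denisova_specific_mean_frequency(snps, "south")
print(f"north {north:.4f}, south {south:.4f}")
```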

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu sample in south China, which has a mean frequency around 0.006 higher than its neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. In the best case, the data here simply provide greater sensitivity to small amounts of introgression. In the worst case, the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
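The arithmetic behind that worry is simple. The paper does not report how many SNPs went into the average, so the numbers below are hypothetical. They show that a few strongly enriched loci in a modest SNP set can by themselves shift the mean frequency by about 0.001.

```python
def mean_shift(n_snps, n_enriched, freq_increase):
    """Rise in mean allele frequency when only a few loci are enriched.

    If n_enriched of n_snps loci each rise by freq_increase and the rest
    are unchanged, the average over all loci rises by
    n_enriched * freq_increase / n_snps.
    """
    return n_enriched * freq_increase / n_snps


# Hypothetical: 4 loci raised by 0.5 among 2,000 SNPs.
shift = mean_shift(n_snps=2000, n_enriched=4, freq_increase=0.5)
print(f"{shift:.4f}")
```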

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper suggests a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: even less than one percent of half a billion people adds up to a very large amount of Denisovan ancestry. To that we add five percent of the ancestry of the indigenous populations of New Guinea and Australia, and smaller fractions in other island populations. The total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.
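The point can be put as a back-of-the-envelope sum of population size times ancestry fraction, giving the number of whole-genome equivalents of Denisovan ancestry. Only the south China figures come from the text (an upper bound of one percent of half a billion people). Other groups, such as New Guinea and Australia at about five percent, would be appended as further (size, fraction) pairs once their population sizes are supplied.

```python
def genome_equivalents(groups):
    """Sum of population size times ancestry fraction over (size, fraction) pairs."""
    return sum(size * fraction for size, fraction in groups)


# Upper bound for south China from the text: under 1% of about 500 million people.
groups = [(500_000_000, 0.01)]
total = genome_equivalents(groups)
print(f"about {total:,.0f} genome equivalents")
```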



    Synopsis: 
    A new paper contradicts earlier work by suggesting a widespread Denisovan legacy in south China.

