john hawks weblog

paleoanthropology, genetics and evolution

ancient DNA

  • How widespread is Denisovan ancestry today?

    Tue, 2011-11-01 00:32 -- John Hawks

    Last month, David Reich and colleagues [1] reported on estimates of Denisovan ancestry for island and mainland Asian populations. Their most memorable conclusion was that they could find no substantial sign of Denisovan ancestry anywhere on the Asian mainland, or indeed on any island that had ever been connected by land to Asia.

    The distribution was stark, as illustrated by the map from the paper:

    I wrote about the paper when it was released ("Denisovan DNA in the islands, and an Australian genome"), noting:

    Notice the apparent lack of Denisovan ancestry in anyone who lives anywhere that was once connected by land with mainland Asia. I say "apparent" deliberately: Abi-Rached and colleagues reported last month on the widespread distribution of Denisovan HLA types among today's Asian populations, and those may well be products of Denisovan genes that were later selected. I've already identified a handful of other loci that seem to reflect Denisovan ancestry in mainland Asian people. According to the comparisons by Reich and colleagues, such loci must be exceptions.

    Abi-Rached and colleagues [2] had argued that HLA alleles found in the Denisovan genome are presently common in some parts of Asia, and likely reflect local adaptive introgression. Substantial introgression of a small number of genes would not be enough to create a strong genome-wide appearance of Denisovan ancestry. Still, it was a little odd that the first genes anybody looked closely at would provide strong evidence of introgression.

    Now, Pontus Skoglund and Mattias Jakobsson [3] say that Denisovan ancestry is widespread across China and Southeast Asia.

    That conclusion contradicts Reich and colleagues, so why do the studies come to such different results?

    Skoglund and Jakobsson suggest that they have succeeded in finding introgression where others failed because their model accounts for ascertainment bias in the available datasets. SNP data come from genotyping chips, which have been designed using known polymorphisms. Five years ago, we knew much more about polymorphisms in Europe than other parts of the world, and so the HGDP, and HapMap to a lesser extent, do a good job of sampling rare alleles in Europe but miss many rare alleles in Africa and other populations. This is the ascertainment bias.

    Some of the most obvious signs of introgression today are cases where rare alleles are shared with an archaic genome. If ascertainment bias causes you to miss the rare alleles, you'll miss the introgression.
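
    A little arithmetic makes the point concrete. As a rough sketch, assume a SNP only lands on a genotyping chip if the allele turns up at least once in a discovery panel of a given size. The function name and the panel sizes below are illustrative assumptions, not values from either paper.

```python
def discovery_probability(freq, n_chromosomes):
    """Chance that an allele at population frequency `freq` appears at least
    once among `n_chromosomes` sampled chromosomes (simple binomial sampling)."""
    return 1.0 - (1.0 - freq) ** n_chromosomes


# Hypothetical discovery panels: a large one for a well-studied population
# and a small one for a poorly studied population.
for freq in (0.01, 0.05, 0.20):
    big = discovery_probability(freq, 120)
    small = discovery_probability(freq, 10)
    print(f"freq={freq:.2f}  panel of 120: {big:.2f}  panel of 10: {small:.2f}")
```

    Under these assumptions a common allele is found with either panel, while a rare allele is usually missed by the small panel. That is the sense in which chip data undersample rare alleles outside the best-studied populations.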

    But that explanation isn't really sufficient to explain the differences between these papers. For one thing, Reich and colleagues [1] also worked hard to account for ascertainment biases in their SNP samples. For another, whole genome comparisons between East Asian samples and the Denisova genome have not yielded evidence of Denisovan ancestry, even though whole genomes have no ascertainment bias. The number of whole genomes so far compared is very small, and so the statistical ability to detect introgression is lower, but Skoglund and Jakobsson actually replicate that null result in their current paper.

    Probably most important, it's not clear that Skoglund and Jakobsson's result can actually be explained by rare alleles. Here is Figure 1e from their paper:

    Figure 1e from Skoglund and Jakobsson (2011). Original caption: Interpolated spatial distribution of the frequency of Denisova alleles at SNPs where Denisova is different from chimpanzee and Neandertal. Sample localities are indicated with rectangles.

    This map represents a clever comparison. It is a heat map of the mean local frequency of the subset of alleles that are present in Denisova but absent from chimpanzees and Neandertals. These are presumptively derived alleles relative to the chimpanzee. The SNPs here are all known to vary in human populations, because they are all included in the HGDP sample. So the map does not represent all the Denisova derived mutations in humans today, only a particular subset that is especially likely to be informative.
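
    The site filter behind the map can be sketched in a few lines. This is only my reading of the caption, not the authors' code; the record layout (`chimp`, `neandertal`, `denisova`, `pop_freq`) and the toy sites are made up for illustration.

```python
def mean_denisova_allele_frequency(sites):
    """Average the population frequency of the Denisova allele over sites
    where Denisova differs from both chimpanzee and Neandertal.

    Each site is a dict with single-letter alleles under 'chimp',
    'neandertal' and 'denisova', plus 'pop_freq', a mapping from allele
    to its frequency in one population sample.
    """
    kept = [
        site["pop_freq"].get(site["denisova"], 0.0)
        for site in sites
        if site["denisova"] != site["chimp"]
        and site["denisova"] != site["neandertal"]
    ]
    return sum(kept) / len(kept) if kept else float("nan")


# Toy, invented sites for one population sample
sites = [
    {"chimp": "C", "neandertal": "C", "denisova": "T",
     "pop_freq": {"C": 0.98, "T": 0.02}},
    {"chimp": "G", "neandertal": "G", "denisova": "A",
     "pop_freq": {"G": 1.0}},
    # Denisova matches chimpanzee here, so the site is filtered out
    {"chimp": "A", "neandertal": "A", "denisova": "A",
     "pop_freq": {"A": 0.6, "G": 0.4}},
]
print(mean_denisova_allele_frequency(sites))
```

    Most qualifying sites contribute a frequency of zero in most populations, so the mean is small everywhere and the map is comparing very small numbers.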

    Given that the sites have been picked in a special way, we need to examine carefully how strong the pattern really is. Notice the scale of the heat map. The difference between the orange area in south China and the green area in north China is around 0.001, or a tenth of a percent in mean frequency. The actual values are reported in the online supplement, in Table S3. The exception is the Yizu in south China, who have around 0.006 more than their neighbors. The Yizu sample includes only 10 individuals (9 males, 1 female). The paper does not report the number of SNPs included in this comparison, but it must be a very small set relative to the total, because only a small fraction of human SNPs are known to be derived in Denisova and ancestral in Neandertals.

    With this very small difference in frequencies, I would not rule out the hypothesis that the zone of high Denisova derived frequencies in south China is caused entirely by frequency enrichment of a small number of loci. A handful of genes like the HLA loci observed by Abi-Rached and colleagues might be enough to create this very slight elevation in the average. Hence, the best case is that the data here simply provide greater sensitivity to small amounts of introgression. The worst case is that the pattern may be dominated by the Yizu sample, which is really too small to carry this kind of load.
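
    To see how few sites that could take, here is a back-of-the-envelope sketch. The paper does not report the number of SNPs, so the 10,000 sites and the 0.30 enrichment below are purely hypothetical placeholders.

```python
def mean_shift(n_sites, n_enriched, enrichment):
    """Change in the across-site mean frequency when `n_enriched` of
    `n_sites` sites each rise by `enrichment` and the rest stay put."""
    return n_enriched * enrichment / n_sites


def enriched_sites_needed(n_sites, target_shift, enrichment):
    """Number of enriched sites required to move the mean by `target_shift`."""
    return target_shift * n_sites / enrichment


# Hypothetical numbers: 10,000 informative SNPs, with selected haplotypes
# raised in frequency by 0.30 relative to neighboring populations.
print(enriched_sites_needed(10_000, 0.001, 0.30))
```

    With these made-up inputs, roughly 33 enriched SNPs would produce the whole 0.001 difference. Linked SNPs on a single selected haplotype rise together, so a few dozen SNPs need not mean a few dozen genes.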

    The strongest evidence presented in the paper is a comparison of north and south East Asian regions directly. Although the comparison of south China against other regions of the world (Africa, Europe) does not yield significant evidence of Denisovan similarity in this paper, south China differs from north China in essentially the same way that the Oceanian people do from other regions. And the Oceanian populations (here, Papua New Guinea and Bougainville) differ from other regions because of their Denisovan ancestry. So Skoglund and Jakobsson infer that the north/south comparison reflects Denisovan ancestry as well.

    I think this comparison is sound, and the question is, how much introgression would this pattern require? The paper answers that question in this way:

    Quantitative estimation of the precise fraction of Denisova-related ancestry in Southeast Asian populations based on genotype data are unfortunately sensitive to ascertainment bias and genetic drift, and such estimates will require genome sequence data that are currently unavailable. However, both the PCA results (Fig. 1B) and the approximately six times lower absolute values of the D statistic in tests between Northeast Asians and Southeast Asians compared with tests between Northeast Asians and Oceanians (Table S4) indicate a relatively low fraction of Denisova-related ancestry. Thus, the fraction is likely to be smaller than both the ~5% fraction of Denisova-related ancestry present in Oceanians and the ~2.5% fraction of Neandertal ancestry present in non-Africans (23, 24), perhaps around 1%.

    One percent is an amount that whole genome comparisons at present do not rule out, and I think it's a reasonable guess. I would not have thought we could rule out a one percent contribution from other, non-Denisovan archaic people, for example.
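
    For readers who have not met the D statistic mentioned in the quoted passage, here is a minimal sketch of the usual ABBA-BABA form computed from derived-allele frequencies. I am assuming the standard definition with an outgroup that is ancestral at every site; the sign convention and the exact population configurations in the paper's Table S4 may differ, and the toy frequencies are invented.

```python
def d_statistic(p1, p2, p3):
    """ABBA-BABA D from derived-allele frequencies at a shared list of sites.

    p1, p2: frequencies in the two populations being compared
    p3:     frequency in the archaic genome (0, 0.5 or 1 for one individual)
    The outgroup is assumed to carry the ancestral allele at every site.
    Positive D means p2 shares more derived alleles with the archaic than p1.
    """
    abba = sum((1 - a) * b * c for a, b, c in zip(p1, p2, p3))
    baba = sum(a * (1 - b) * c for a, b, c in zip(p1, p2, p3))
    total = abba + baba
    return (abba - baba) / total if total else 0.0


# Toy frequencies: population 2 carries slightly more archaic-derived alleles
north = [0.0, 0.1, 0.5]
south = [0.2, 0.1, 0.5]
archaic = [1.0, 1.0, 1.0]
print(d_statistic(north, south, archaic))
```

    D is a ratio of counts, so it says which population shares more derived alleles with the archaic genome but does not directly give an ancestry fraction. That is why the authors fall back on comparing the size of D between tests.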

    We aren't very far from a more definitive answer to this question, as the data continue to accumulate every day. What I find interesting is the way that models can generate these 1% differences in ancestry proportions, depending on sampling and the pattern of migration assumed to have happened in the past. Two estimates that differ by less than a percent are not really different. This paper provides the suggestion of a more widespread Denisovan legacy, and I accept that as a possibility.

    I should mention: less than one percent of a half billion people is still a very large number, added to five percent of the indigenous population of New Guinea and Australia, and smaller fractions of other island populations. The total amount of Denisovan legacy present in living people probably exceeds the population of Earth at the time the Denisovans lived.


    References

    Synopsis: 
    A new paper contradicts earlier work by suggesting a widespread Denisovan legacy in south China
  • Ancient genomes review

    Thu, 2011-08-18 12:21 -- John Hawks

    Mark Stoneking and Johannes Krause present a review article in the current Nature Reviews Genetics [1] that gives an overview of the science of ancient genomes.

    I think the article is very good about presenting aspects of ancient genome sequencing and assembly, and the attendant problems and biases. I find myself explaining this stuff a lot and it's useful to have the concise descriptions that Stoneking and Krause provide here. For example, here's a paragraph that describes mapping bias:

    However, there are important limitations to current approaches to ancient genome assembly owing to the short length of ancient DNA fragments and the repetitive nature of large parts of mammalian genomes (which creates ambiguities in sequence read mapping). For example, short fragments can cause mapping bias, as highly divergent short fragments cannot be accurately mapped to a reference genome. Fragments may also map to different locations in different reference genomes depending on the completeness and accuracy of the reference genomes. For example, to calculate divergence times between an ancient hominin genome sequence, modern humans and chimpanzees, it is important to first verify that the ancient DNA sequences map to orthologous positions in both the human and chimpanzee genomes. These issues mean that even at 20-fold coverage (which was the coverage obtained for the Saqqaq genome) not more than 85% of the genome could be reconstructed; full genome sequences from fossil samples can probably never be achieved with current methods.

    The article discusses chemical changes in ancient genomes, methods to detect contamination, and specialized methods such as targeted DNA hybridization capture.

    I'm less happy with the second half of the article, which discusses population genetics. A few computational techniques are very briefly described (for example, unsupervised versus model-based approaches) and Stoneking and Krause give quick synopses of some population genetic inferences reported during the last year.

    I guess where I perceive a difference between the first (sequencing) and second (population genetics) parts of the article is that the sequencing part emphasizes the many problems with analysis and describes approaches to overcome them. It seems as if there's a vibrant discussion of sequencing and biochemistry, giving rise to a fuller account. Meanwhile, the second part, discussing human population history, seems to accept results relatively uncritically. There is very little citation of anthropological or archaeological work, and little indication that the methods of population genetic inference may have weaknesses or assumptions that color their results.

    It's great to see review articles on this topic, and given the broad interest I expect we'll see more of them soon. A flood of ancient genetic data means a lot of new results that need to be summarized. But a summary is really not enough -- we need critical examination of the assumptions underlying population genetic inferences and a discussion of how they accord with what we know from archaeology and paleontology.


    References

    Synopsis: 
    A new review article by Mark Stoneking and Johannes Krause presents some useful information.
  • Did Denisovans have genetic adaptations to high altitude?

    Tue, 2011-06-21 12:26 -- John Hawks

    We don't really know the extent of territory that might have been occupied by the population represented by the Denisova genome. The signs of mixture into the Melanesian/New Guinea population suggest that the Denisova individual shared many genes with people who lived somewhere along the South or Southeast Asian coast. Denisova itself, however, is in the Altai Mountains.

    Last week I wrote some thoughts about the possible introgression of HLA alleles from Denisovans into more recent populations. HLA genes pose many problems for testing this hypothesis -- including the difficulty of identifying the alleles in a low-coverage genome and the high chance of incomplete lineage sorting of ancient alleles in recent populations. In other parts of the genome, it may in principle be much easier to find evidence of introgression.

    If an allele that originated in Denisovans had some advantage in later populations, it might today be found very widely spread across Asian populations, even if the amount of Denisovan ancestry in most of these populations is very small. This was the theme of my paper with Gregory Cochran several years ago [1] ("The inevitability of introgression"). The probability that a single copy of an advantageous allele will survive and increase in the population is roughly 2s, where s is the fitness advantage in a heterozygote carrying the allele. A relatively small number of copies of an allele might have entered a recent human population by introgression from some ancient population, but these few copies would have a high likelihood of surviving and increasing in frequency, possibly toward fixation. HLA alleles could easily be in this category, but the challenges identifying them and high chance of ILS make the hypothesis hard to test.
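
    The 2s rule is easy to play with. The sketch below assumes each introgressed copy survives or is lost independently, which is a simplification, and it uses the 2s approximation as given above (reasonable only for small s in a large population). The function name and the example values are mine.

```python
def survival_probability(s, copies=1):
    """Approximate chance that at least one of `copies` independent copies of
    an allele with heterozygote advantage `s` escapes early loss, using the
    ~2s rule for a single copy."""
    single = min(2.0 * s, 1.0)
    return 1.0 - (1.0 - single) ** copies


# A 1% heterozygote advantage, introduced as 1, 10 or 100 copies
for copies in (1, 10, 100):
    print(copies, round(survival_probability(0.01, copies), 3))
```

    A single copy with a 1% advantage is usually lost, but a hundred copies make survival of the allele more likely than not. That is the sense in which even a small amount of mixture can be enough for an advantageous allele.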

    Another strategy is to identify genes that have been selected in recent populations and see if the linked haplotype shows up in the Denisova genome. Recently, several studies have attempted to identify genes related to high altitude adaptation in Tibetans. At least some Denisovans lived in the mountainous areas of central Asia, and so I'm curious whether they might have some alleles adapted to this environment. The Altai are not nearly as high as the Tibetan plateau (in fact Denisova itself is not much higher than western Kansas), and we don't know how long Denisovan people might have been resident in Central Asia, but if we're looking for selected alleles there are some strong candidates in this category of genes.

    So let's look at some of them. All positions here are mapped to the hg18 human genome assembly.

    Yi and colleagues [2] find a strong frequency difference between China and Tibet for a SNP in EPAS1, at chr2:46441523. The derived allele, G, has a frequency of 87% in their Tibetan sample but only 9% in their Chinese sample (and zero in Denmark). The Denisova genome is represented by two reads at this site, both C, the ancestral allele. We don't necessarily have to accept that this is a functional site, but as the marker most strongly differentiating the high altitude population it would likely be closely linked to any functional variant. So the Denisova allele suggests that this ancient individual lacked whatever functional variant might currently be common in Tibetans for this gene.

    Simonson and colleagues [3] took a different approach, focusing on candidate genes that they argued a priori were likely to be involved in adaptation to hypoxia because of their physiological role. They evaluated these genes for evidence of positive selection in Tibetans, finding several candidate haplotypes for recent adaptive evolution to high altitude.

    For each of five genes, they identified a three-locus "core selection haplotype" that shows signs of selection within Tibet. The purpose of these three-SNP haplotypes was to examine the correlation of haplotypes and phenotypes in a sample of people where physiological data were taken. So they are intended as tags, not as comprehensive and unique identifiers of the candidates at the genetic level. But the three-locus haplotypes are the only ones reported in the supplement to the paper, so that's what I have to compare.

    EGLN1: The three-allele candidate selected haplotype consists of A at chr1:229793717, T at chr1:229667980 and T at chr1:229665156. Denisova apparently has the selected haplotype with A at chr1:229793717 (2/2 reads), T at chr1:229667980 (3/3 reads) and T at chr1:229665156 (1/1 reads). However, it is not obvious whether this is significant. All three alleles on the candidate selected haplotype are the ancestral (present in chimpanzees and gorillas) alleles, which are much more likely to show up in the archaic genomes than derived alleles. These ancestral alleles are also present in several of the whole genomes provided along with the Denisova sequence reads. So it's not clear to me how good a candidate for selection the haplotype really is.

    CYP17A1: Here the three-allele candidate selected haplotype includes G at chr10:104568521, G at chr10:104594906, and C at chr10:104517420. Denisova has C (5/5 reads, ancestral), T (4/4 reads, ancestral), and C (3/3 reads, ancestral). Again, Denisova has the all-ancestral haplotype here, but in this case it is not the selection candidate.

    PTEN: The selected candidate haplotype is G at chr10:89770364, C at chr10:89790851 and C at chr10:89778618. Denisova has G (5/5 reads, ancestral), T (2/2 reads, derived), and C (4/4 reads, ancestral). Not selected.

    I always find it interesting when the Denisova genome has a derived allele at an interesting site -- it is the shared derived alleles between these archaic genomes and living people that constitute evidence of genetic persistence of the archaic people. No single site carries that information (any one allele may be shared by incomplete lineage sorting), but I still like to note them. The Papuan and half the Native American, Sardinian and Mongolian reads share the derived T at chr10:89790851 with Denisova.

    HMOX2: The candidate selected haplotype has C at chr16:4456093, T at chr16:4465266, T at chr16:4442515. Denisova has this candidate selected haplotype: C (3/3 reads, ancestral), T (4/4 reads, ancestral), T (5/5 reads, ancestral). That haplotype may also be in the Cambodian whole genome accompanying the Denisova data, and can't be ruled out for the Mongolian. Again, the all-ancestral haplotype and wider distribution argue against the hypothesis that this haplotype was specifically selected in Tibet.

    PPARA: The core candidate selected haplotype has A at chr22:44827140, C at chr22:44832376 and T at chr22:44842095. Denisova has A (8/8 reads, ancestral), A (5/5 reads, ancestral), and C (2/2 reads, ancestral). Notice again, Denisova has the all-ancestral haplotype. For an ancient sequence, we are finding this is the usual case: human-derived alleles are just rarer in this genome.

    OK, where are we? Out of six genes that are candidates for selection on altitude adaptation in Tibetans, the Denisova genome has two -- at EGLN1 and HMOX2. In both cases, the core selected haplotype consists entirely of ancestral alleles, and so I think they are actually poor evidence of introgression on the surface. I would test them by looking at more SNPs linked to the presumed selected haplotype, hoping to find some derived SNPs shared by the Denisovan genome and the presumed selected haplotypes. Unfortunately, publications do not yet routinely report long haplotypes, so it will take some more digging to test these cases.
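
    The tally above can be reproduced with a small sketch. The alleles are transcribed from the gene-by-gene notes in this post, in the order the positions are listed there; the data structure and function names are just one convenient way to lay them out.

```python
# gene -> (candidate selected haplotype, Denisova read calls)
CALLS = {
    "EPAS1":   ("G",   "C"),    # single SNP from Yi and colleagues
    "EGLN1":   ("ATT", "ATT"),
    "CYP17A1": ("GGC", "CTC"),
    "PTEN":    ("GCC", "GTC"),
    "HMOX2":   ("CTT", "CTT"),
    "PPARA":   ("ACT", "AAC"),
}


def matching_genes(calls):
    """Genes where Denisova carries the candidate allele at every tag SNP."""
    return [gene for gene, (cand, den) in calls.items() if cand == den]


def site_matches(calls):
    """Per-gene count of tag SNPs where Denisova matches the candidate."""
    return {
        gene: sum(c == d for c, d in zip(cand, den))
        for gene, (cand, den) in calls.items()
    }


print(matching_genes(CALLS))
print(site_matches(CALLS))
```

    Only EGLN1 and HMOX2 match at every tag SNP. The partial matches at the other genes are mostly ancestral alleles, which carry little information for the reasons given above.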


    References

    Synopsis: 
    Noodling through the Denisova genome data for signs of candidate altitude adaptations.
  • A problem of fuzzy mammoths

    Sat, 2011-06-04 03:56 -- John Hawks

    Paleogenomics is changing the way we study evolution. In a number of cases, it now allows us to study extinct organisms with the same methods as we study living ones. A study last year in PLoS Biology [1] used genetic evidence from living elephants and from extinct mammoths and mastodons to reconstruct the times that these species diverged.

    Woolly and Columbian mammoths

    Mammoths are back in the news this week because of a paper by Jacob Enk and colleagues [2]. I think this paper represents a very nice collaboration of paleontologists (Dan Fisher, Ross MacPhee) and paleogeneticists (led by Hendrik Poinar's lab). It's refreshing to read a paper that describes not only the way that the DNA was sampled but also the age and morphological attributes of the sampled mammoths. For example:

    This 60+ year old bull is exceptionally well preserved, and exhibits the classic character suite of his species, including low molar lamellar frequency (Figure S1 in Additional file 3), broadly divergent tusk alveoli, a markedly downturned mandibular symphysis, and tremendous body size. We used tusk fragments for the shotgun sequencing, and both tusk and bone samples for PCR and Sanger sequencing.

    Every genetics paper should have descriptions like that. Very nicely done.

    As an anthropologist, I pay a lot of attention to studies of elephants, because they are another long-lived social mammal, in some ways closer to us in population structure and dynamics than most primates. As in the case of hominins, some taxonomists have argued that we should recognize lots of fossil elephant species, while others question that distinctiveness. And just as we are discovering for hominins, the elephants are showing evidence for population mixture among groups once considered to be different species.

    Enk and colleagues sampled the mtDNA from two Columbian mammoths and one woolly mammoth from North America. The Columbian mammoth is seen by pretty much everybody as a separate species (Mammuthus columbi) from woolly mammoths (Mammuthus primigenius), and paleontologists have thought that they diverged 1-2 million years ago. Woolly mammoths were Holarctic animals, with a range that extended from Europe to North America, while Columbian mammoths were limited to the Americas south of the U.S.-Canada border, roughly. Already other researchers have recovered dozens of woolly mammoth sequences, and their phylogenetic relations are well characterized (as shown in the paper). What Enk and colleagues show is that the two Columbian mammoths both have mtDNA sequences that belong to a single, relatively young clade that is present in woolly mammoths in Alaska and Yukon.

    The simplest explanation is that the Columbian and woolly mammoths of North America were exchanging genes.

    The authors also suggest the possibility of incomplete lineage sorting (ILS) -- the retention of a single ancestral clade in two isolated species. This seems unlikely given the topology of the clade within woolly mammoths, but the authors omitted the crucial test: the date of the most recent common ancestor of the mtDNA within the clade. If it's truly younger than a million years, we might easily rule out ILS.

    Forest and savanna elephants

    A lot more information about the variation within living elephantids has appeared within the past year. Looking at them compared to the fossil species, it's pretty clear that taxonomists haven't done well matching taxonomic levels in these groups. Here is a quote from the paper by Rohland and colleagues, who considered the genetic relationships of forest and savanna elephants in Africa.

    We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants.

    Forest and savanna elephants may deserve a species rank, but we might equally say that the mammoth-Asian elephant divergence doesn't merit the genus rank it has historically been given. As reconstructed in the paper, the forest-savanna elephant and Asian elephant-mammoth divergences both fall within ranges from 2.5 to 5.5 million years. Some widely-recognized mammalian genera (e.g., Homo) are younger, but most mammalian divergences in this range of times are recognized below the genus rank. Should mammoths be put into Elephas? That would probably be a better recognition of the adaptive radiation of Eurasian elephants.

    One way to consider the question is by examining the pattern of speciation. With a large number of sampled loci, a far more detailed consideration of speciation can be achieved. This brings us back to a more careful examination of ILS.

    We find a higher rate of inferred [Incomplete Lineage Sorting (ILS)] in forest and savanna elephants than in Asian elephants and mammoths: (FE+SE)/(AL+ML) = 3.1 (P = 4×10^-8 for exceeding unity; Table 2), indicating that there are more lineages where savanna and forest elephants are unrelated back to the African-Eurasian speciation than is the case for Asian elephants and mammoths (Table 2). This could reflect a history in which the savanna-forest population divergence time T_FS is older than the Asian-mammoth divergence time T_AM, a larger population size ancestral to the African than to the Eurasian elephants, or a long period of gene flow between two incipient taxa. (We use upper case “T” to indicate population divergence time and lower case “t” to indicate average genetic divergence time (t ≥ T)).

    "A long period of gene flow" would reflect a very gradual speciation event, which might argue that the two resultant species should be classified in the same genus. Or...it might suggest that the ecological differentiation actually commenced much earlier in time than the modal estimate, with later hybridization. Mammoths and Asian elephants, by contrast, seem to have a cleaner separation even though the genetic relationships are almost equally close.

    We're not quite able to test these alternatives, yet, because only a single individual has been sampled from most of these species. Testing for gene flow really will require larger samples of individuals. In particular, the longer geographic distance between Asian and mammoth samples compared to forest-savanna samples may mean that population structure is hiding within this comparison. I just find it remarkable that genetics has arrived at a point where the pattern of speciation of extinct species is within reach.
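
    Two of the three alternatives in the quoted passage, an older split and a larger ancestral population, show up directly in the textbook coalescent expression for gene tree discordance in a three-taxon tree. The sketch below is that standard neutral result with no gene flow, not the model fitted by Rohland and colleagues, and the generation counts and population sizes are invented for illustration.

```python
import math


def discordance_probability(internal_generations, effective_size):
    """Chance that a gene tree disagrees with a three-taxon species tree
    under the standard neutral coalescent with no gene flow.

    internal_generations: length of the internal branch, in generations
    effective_size:       diploid effective size of the ancestral population
    """
    # Probability the two sister lineages fail to coalesce on the internal branch
    no_coalescence = math.exp(-internal_generations / (2.0 * effective_size))
    # If they fail, two of the three equally likely pairings are discordant
    return (2.0 / 3.0) * no_coalescence


# Invented values: a shorter internal branch (an older split of the sister
# taxa) or a larger ancestral population both raise the discordance rate.
for generations, size in ((100_000, 20_000), (50_000, 20_000), (100_000, 40_000)):
    print(generations, size, round(discordance_probability(generations, size), 3))
```

    An older forest-savanna split shortens the internal branch back to the African-Eurasian speciation, which raises discordance in the same way a larger ancestral population does. The formula says nothing about gene flow, which is why larger samples are needed to separate that third alternative.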

    The paper uses the extinct mammoth and mastodon comparisons as a frame for discussing the diversity and distinctiveness of African forest elephants. This is in a way unfortunate, because the mammoth-centric questions are probably more interesting to most readers. There's still a lot of productive biology to do there. But the status of forest elephants is a useful hook to hang a paper upon. Whether forest elephants should be given the status of a species has been a hot topic in proboscidean evolutionary biology during the past 10 years. Debruyne [3] gave a good historical review of the issues:

    Indeed, when discovered by Matschie in 1900, [forest elephants] were described as either a potential species, or a regional race of Cameroon (Matschie, 1900). Matschie advocated the usefulness of hydrographical basins in order to subdivide African elephants into distinct units. He thus contributed to the profusion of new taxa to be defined by the turn of the 20th century, so that the taxonomy of the African elephant quickly became extravagant, the most meagre morphological evidence being used to acknowledge a new form (Lyddeker, 1907). Up to 22 forms of Loxodonta were described that were finally assigned either to the savannah or the forest elephant—see Laursen and Bekoff (1978) for a review. Morphologists have addressed this question for decades according to their personal taxonomic perspectives. Some have considered that, although displaying a smaller size, smaller round ears—responsible for their designation as “cyclotis”—more toenail structures on both feet, thin down-pointing tusks and a flatter back and forehead, forest elephants belong to the same species—i.e., Loxodonta africana—as savannah elephants with whom they assumed were reproductively compatible (Backhaus, 1958; Carroll, 1988; Cousins, 1996). Many cases of intermediate morphology have supported this view, which had become prevalent (Laursen and Bekoff, 1978). Conversely, the “splitter” attitude led other authors to put forest elephants apart on the basis of the same anatomical distinctiveness (Frade, 1931; Frade, 1933; Allen, 1936; Petter, 1958). More doubtful morphological characters—extent of hair-covering, color of the skin, carriage of head—have been put forward to support this division.

    The problem became complicated upon recovery of genetic information. Most early phylogeography has been done using mtDNA. The deepest mtDNA clade in the African elephants defines two haplogroups, both of which are shared by the forest and savanna populations. Based on large samples of mtDNA alone, the two populations have been recently exchanging genes.

    Early analyses of nuclear microsatellites indicated the opposite pattern, with relatively little allele sharing between the two elephant varieties. I became interested in the question after a paper by Régis Debruyne (a coauthor on the current paper by Enk and colleagues as well). Debruyne emphasized the great gaps in our sampling of geographic variation in African savanna elephants. Providing some additional data, he showed a very deep mtDNA clade in many forest elephants that was also in many savanna elephants. He argued that the widespread evidence of gene flow refutes the hypothesis of different biological species of elephants.

    Rohland and colleagues also addressed the discordance between mtDNA and nuclear genetic variation.

    Our study also infers a strikingly deep population divergence time between forest and savanna elephant, supporting morphological and genetic studies that have classified forest and savanna elephants as distinct species [13],[16]–. The finding of deep nuclear divergence is important in light of findings from mtDNA, which indicate that the F-haplogroup is shared between some forest and savanna elephants, implying a common maternal ancestor within the last half million years [21]. The incongruent patterns between the nuclear genome and mtDNA (“cytonuclear dissociation”) have been hypothesized to be related to the matrilocal behavior of elephantids, whereby males disperse from core social groups (“herds”) but females do not [13],[38]. If forest elephant female herds experienced repeated waves of migration from dominant savanna bulls, displacing more and more of the nuclear gene pool in each wave, this could explain why today there are some savanna herds that have mtDNA that is characteristic of forest elephants but little or no trace of forest DNA in the nuclear genome [13],[14],[39],[40].

    The scenario may fit with the facts. It was first proposed by Roca and colleagues [4], who framed it as a "genomic record of ancient habitat changes" that had brought the forest and savanna populations into contact across shifting hybrid zones. They reiterated the hypothesis in a later paper [5] supported with larger samples.

    Further progress will require larger samples and better models. I was interested in Debruyne's account of the geographic holes in genetic sampling across the African range of forest elephants. A highly-resolved test of recent gene flow demands finding and sampling potential contact zones between two populations. Some hypotheses can be tested surprisingly strongly using only a single individual from each population. But the power of such tests depends on the pattern of inbreeding in the past. We can imagine that the ancestry of a single individual stretches through the genealogical network of a species like a cone, widening into the past. Recent events are poorly tested by single individuals.

    If geographic structure is strong enough, distant populations will approximate different species in their recent genealogical connections. So the single individuals in the more recent study by Rohland and colleagues [1] carry a lot of weight.

    There are many parallels here between hominin population dynamics and the elephants. Also, as I pointed out in 2006, the elephant situation helps to clarify how we should consider genetic samples from living great apes.

    The past year has seen a real reversal in the race between data and analysis. For a long time, sequencing has been a bottleneck in serious analysis of population history. The genealogical connections among individuals double in every generation, so that the inheritance of a single gene reflects one possibility among countless trillions. If we can only afford to sequence a single gene, we are limited to a single sample of the genealogical links among individuals. Whole genomes give enormous samples of the genealogical history among individuals. But they create their own challenges of analysis.
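
    To put a rough number on that doubling, here is a minimal sketch in Python. It counts pedigree slots rather than distinct people, since inbreeding means the same ancestor can fill many slots. The function names are mine, for illustration only.

```python
def pedigree_slots(generations):
    """Number of ancestral slots in a pedigree this many generations back."""
    return 2 ** generations


def generations_to_exceed(threshold):
    """Smallest number of generations whose pedigree slots exceed threshold."""
    g = 0
    while pedigree_slots(g) <= threshold:
        g += 1
    return g


if __name__ == "__main__":
    # Slots 40 generations back, and the first generation past one trillion.
    print(pedigree_slots(40))
    print(generations_to_exceed(10 ** 12))
```

    Forty generations is enough to pass a trillion slots. A single gene copy follows just one path back through them, which is why one locus is such a thin sample of the genealogy.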


    References

    Synopsis: 
    Mammoth paleogenomics and African elephant population structure pose similar problems of sampling.
  • Mummy trouble redux

    Thu, 2011-04-28 22:56 -- John Hawks

    Speaking of Jo Marchant, she has a long article in the current Nature about the mummy DNA controversy ("Ancient DNA: Curse of the Pharaoh's DNA").

    I wrote about the problem earlier this year: "Mummy troubles". My opinion is that this work has been relentlessly hyped and hasn't presented adequate information to assess whether the results are genuine:

    Can we accurately type STR alleles from mummies? I wouldn't rule it out given the quantity of tissue available, but there should be many more controls for a high-profile study like this one. The work took place over several years, so it's a bit unrealistic to expect the latest sequencing methods. But JAMA and the Discovery Channel presented the results as important science. They should have ensured that solid answers for the obvious questions were at hand.

    Marchant digs up some quotes from the authors:

    The researchers deny that the television involvement put them under excessive pressure to produce dramatic results. But working for the cameras did make a challenging project even tougher, says Pusch. "Each time they came in to film, we had to close the lab for a week to clean." Eventually the TV crew was banished and the lab scenes reconstructed.

    The article gives an interesting sociology of the competing groups of ancient DNA researchers. I dispute that the field is evenly divided, however. There is a very long list of laboratories doing ancient DNA work according to standardized protocols on skeletal remains from the past several thousand years. Only a few groups claim to be working with nuclear DNA or microbial DNA, the areas of contention in the mummies. Among that small set of labs, most follow similar, conservative techniques.

    Then there are the handful that come up with "surprising" results time and again. If the surprising results are accompanied by substantial evidence, I have no problem. But when a paper has no clear explanation why it arrives at results that others think impossible, that raises my skepticism.

  • Mummy troubles

    Sun, 2011-01-23 01:32 -- John Hawks

    Mummies are always trouble. I hate to say it. You see, in my line of work we can do an awful lot with a skeleton. We're usually down to a few pieces of bone, so that a skeleton is an unimaginable luxury.

    The typical mummified body carries so much more information than a skeleton. I mean, you've got soft tissue there, whole organs. Food left in the mummy's tummy. With Egyptian mummies, you had a whole crew of embalmers using special techniques to preserve the body. They could not possibly have done more to give us time capsules of human biology from the dawn of history.

    So why does it seem like every study of a mummy ends up in a fight?

    I think that mummies give too much to chew on. With a bone, it's sort of likely that you only have one indicator of pathology. One symptom makes for a pretty simple diagnostic problem. Sure, you're likely to be wrong, but with one symptom where's the argument?

    Now, a whole body -- well, there you'll probably have several symptoms. Or you'll have things you would expect to see with a pathology, but they're just not there. So every armchair paleopathologist ends up with his own theory about what the mummy's got.

    The mummies in the news this week are thought to be Akhenaten, Tutankhamen, and their relatives. Last year, Zahi Hawass and colleagues [1] published a paper in the Journal of the American Medical Association, reporting on their Discovery Channel-funded research on these mummies. They ran a series of tests to assess the paleopathology of these mummies, including some work demonstrating the presence of falciparum malaria. They also extracted DNA from the mummies and constructed a pedigree connecting them based on shared microsatellite alleles.

    I wouldn't ordinarily write about mummies. They're really not my thing. If there were a Neandertal mummy, well, I'd be all over that. Ain't gonna happen.

    But if you follow ancient DNA, that last detail probably gave you a bit of a hiccup. Can we really amplify STR alleles from mummies with any accuracy?

    Well, that's why the story is in the news this week. For example, Jo Marchant in New Scientist writes "Royal rumpus over King Tutankhamun's ancestry", quoting geneticists who question the results. Eline Lorenzen and Eske Willerslev wrote a letter to JAMA pointing out the literature on the topic [2]. There are just so many problems with contamination and DNA degradation, even if you have a large tissue sample to work with. The idea that you could extract DNA and do straight-up PCR amplification to identify microsatellite alleles seems, well, optimistic.

    The geneticists involved in the study, Albert Zink and Carsten Pusch, defend their approach in a published reply, as well as in the New Scientist piece.

    I'm skeptical. In 2000, Pusch was involved in a study that claimed to extract DNA from Neandertal and early modern human remains, testing their similarity by means of Southern hybridization [3]. That's an even simpler technique, and the published result surprised a lot of people. Cooper and Poinar [4] immediately criticized the study for lacking the proper controls. Shortly afterward, Geigl [5] challenged the result by demonstrating that the strength of the results could not have emerged among closely related primate species and likely reflected the presence of soil microorganisms. Considering what we now know about the low endogenous DNA content of ancient skeletal remains, DNA-DNA hybridization just couldn't possibly have gotten any result that wasn't noise.

    That's the kind of problem that emerges regularly with ancient DNA studies. When someone is taking an approach outside of the ordinary, they'd better document extremely well their attempts to quantify contamination and present many different approaches to validate their results. At a minimum it is very surprising that mtDNA sequence data were not available with the initial results. The lack of adequate documentation in the Hawass study is why a controversy is arising now.

    Can we accurately type STR alleles from mummies? I wouldn't rule it out given the quantity of tissue available, but there should be many more controls for a high-profile study like this one. The work took place over several years, so it's a bit unrealistic to expect the latest sequencing methods. But JAMA and the Discovery Channel presented the results as important science. They should have ensured that solid answers for the obvious questions were at hand.


    References

    1. Hawass Z, Gad YZ, Ismail S, Khairat R, Fathalla D, Hasan N, Ahmed A, Elleithy H, Ball M, Gaballah F, et al. 2010. Ancestry and Pathology in King Tutankhamun's Family. JAMA: The Journal of the American Medical Association [Internet] 303:638–647. Available from: http://dx.doi.org/10.1001/jama.2010.121
    2. Lorenzen ED, and Willerslev E. 2010. King Tutankhamun's Family and Demise. JAMA: The Journal of the American Medical Association [Internet] 303:2471. Available from: http://dx.doi.org/10.1001/jama.2010.818
    3. Scholz M, Bachmann L, Nicholson G, Bachmann J, Giddings I, Ruschoffthale B, Czarnetzki A, and Pusch C. 2000. Genomic Differentiation of Neanderthals and Anatomically Modern Man Allows a Fossil–DNA-Based Classification of Morphologically Indistinguishable Hominid Bones. The American Journal of Human Genetics [Internet] 66:1927–1932. Available from: http://dx.doi.org/10.1086/302949
    4. Cooper A, and Poinar HN. 2000. Ancient DNA: do it right or not at all. Science (New York, N.Y.) [Internet] 289. Available from: http://view.ncbi.nlm.nih.gov/pubmed/10970224
    5. Geigl EM. 2001. Inadequate use of molecular hybridization to analyze DNA in Neanderthal fossils. American journal of human genetics [Internet] 68:287–291. Available from: http://dx.doi.org/10.1086/316948
  • Hobbit DNA hunt

    Wed, 2011-01-05 19:30 -- John Hawks

    Every so often, a reader asks me if I know any new rumors about DNA sampling of "Homo floresiensis". I'm not holding out much hope for success given the tropical location and past failure, but with new technology, who knows? In Nature News, Cheryl Jones tells us that the University of Adelaide's Centre for Ancient DNA is set to try again: "Researchers to drill for hobbit history".

    I mentioned yesterday that dental cementum is packed with calcified epithelial cells, among other things ("Tartar control and Neandertal plant use"). The presence of this organic material in calculus has led to some recent success with ancient DNA recovery:

    Most genetics research on ancient teeth has focused on the inner tooth tissue, dentine, but Adler's team found that cementum, the coating of the root, was a richer source of DNA.

    Drilling is a technique commonly used to sample teeth and bone, because it minimizes damage to the precious specimen. But Adler's team found that the heat generated at standard drill speeds of more than 1,000 revolutions per minute (RPM) destroys DNA rapidly, causing yields to be up to 30 times lower than for samples pulverized in a mill. Reducing the drill speed to 100 RPM alleviated the problem.

    I hope they have some luck; the results will surely be interesting no matter what they may be.

    Jones is an author of The Bone Readers: Science and Politics in Human Origins Research.

    (via Dienekes)

  • The Denisova mtDNA sequence: The X-Woman

    Wed, 2010-03-24 13:12 -- John Hawks

    In this week's copy of Nature, Johannes Krause and colleagues [1] report on the complete mitochondrial sequence of a pinky bone from Denisova Cave, in the Altai Mountains of Siberia.

    You might expect this sequence would look like a Neandertal. After all, two other specimens from a little further to the West have both produced mitochondrial sequences very similar to those of Neandertals from Europe.

    But you would be wrong. This sequence turns out to be a surprise.

    Instead of falling within the Neandertal clade, the sequence in this pinky bone lies as an outgroup to Neandertals and as an outgroup to modern humans.

    Assuming an average divergence of human and chimpanzee mtDNAs of 6 million years ago, the date of the most recent common mtDNA ancestor shared by the Denisova hominin, Neanderthals and modern humans is approximately one million years ago (mean = 1,040,900 years ago; 779,300–1,313,500 years ago, 95% highest posterior density (HPD)), or twice as deep as the most recent common mtDNA ancestor of modern humans and Neanderthals (mean = 465,700 years ago; 321,200–618,000 years ago, 95% HPD) (Fig. 3). Although the absolute dates depend on several assumptions and are subject to uncertainty (Supplementary Information), the fact that the divergence of the Denisova hominin mtDNA is about twice as old as the divergence of Neanderthal and modern human mtDNAs is robust to most assumptions (Krause et al. 2010: 2).

    If you are sharp-eyed, you may notice that the mean value from the Neandertal-human comparison, at 465,700 years ago, is rather substantially lower than has previously been reported -- Green and colleagues [2] put this divergence at 660,000 years ago. Including the new Denisova specimen in the comparison provides a much more recent branch point than the human-chimpanzee divergence date. That means some of the ambiguity in the long branch between the chimpanzees and the human-Neandertal ancestor can be resolved, effectively pushing the Neandertal a little bit closer to us.
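
    For readers who want to check the "about twice as old" claim and the size of the shift from the earlier estimate, the arithmetic can be sketched in a few lines of Python. The numbers are the mean dates quoted above; the variable and function names are mine.

```python
# Mean dates (years ago) quoted from Krause et al. 2010 above.
DENISOVA_TMRCA = 1_040_900
NEANDERTAL_HUMAN_TMRCA = 465_700
# Earlier Neandertal-human estimate attributed above to Green and colleagues.
EARLIER_NEANDERTAL_HUMAN_TMRCA = 660_000


def ratio(older, younger):
    """How many times deeper one divergence date is than another."""
    return older / younger


if __name__ == "__main__":
    # Denisova branch point relative to the Neandertal-human branch point.
    print(round(ratio(DENISOVA_TMRCA, NEANDERTAL_HUMAN_TMRCA), 2))
    # Earlier Neandertal-human estimate relative to the new one.
    print(round(ratio(EARLIER_NEANDERTAL_HUMAN_TMRCA, NEANDERTAL_HUMAN_TMRCA), 2))
```

    On the means alone, the Denisova branch point is a bit more than twice as deep, and the earlier Neandertal-human estimate is roughly 40 percent older than the new one. The wide HPD intervals mean neither ratio should be read too precisely.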

    As you might have guessed from the paper's title, the authors interpret the deep divergence of the new Denisova sequence as evidence of a previously unknown, "genetically distinct" lineage of hominins. I want to be very precise about what they say and don't say, because it is a very short paper. Nowhere in the paper do they use the word "species". But in the conclusion, they do discuss lineages and "forms".

    We note that the stratigraphy and indirect dates indicate that this individual lived between 30,000 and 50,000 years ago. At a similar time individuals carrying Neanderthal mtDNA were present less than 100 km away from Denisova Cave in the Altai Mountains, whereas the presence of an Upper Palaeolithic industry at some sites, such as Kara-Bom and Denisova, has been taken as evidence for the appearance of anatomically modern humans in the Altai before 40,000 years ago. Although these dates are associated with large and unknown errors, this temporal concurrence suggests that complete and successive replacements of distinct hominin forms, similar to what occurred in Western Europe, may not have taken place in southern Siberia. Rather, representatives of three genetically distinct hominin lineages may all have been present in this region at about the same time. Thus, the presence of Homo floresiensis in Indonesia about 17,000 years ago and of the Denisova mtDNA lineage in southern Siberia about 40,000 years ago suggest that multiple Late Pleistocene hominin lineages coexisted for long periods of time in Eurasia.

    The mention of Homo floresiensis in this conclusion seems unlikely to be an accident, particularly in Nature, the hobbits' birthplace. I haven't seen any press coverage of this yet, obviously, as I'm writing before the embargo breaks. But I can only imagine the likely spin: just as Homo floresiensis has demonstrated the diversity of archaeologically recent hominins in Asia, this new mitochondrial sequence adds even more to that diversity.

    One of my long-time correspondents is already calling it "the Yeti".

    Is this a new species?

    As my students have heard me say many, many times, gene trees are not species trees. The different genetic loci within a population have diverse genealogies. Often, when two populations diverge from each other, their gene genealogies will show similar patterns of divergence. But not always.

    When we look within a single population, gene genealogies are likewise diverse. But within a single population, there is no population divergence. There must be an oldest branch point in the genealogy of any single gene. Here's a question: how many individuals do you have to sample so that you are sure you will find this deepest branch point? The answer to that question depends on the frequencies of the lineages on either side of that branch. If one of them happens to be rare, you're unlikely to find it unless you sample lots and lots of individuals.

    And if the population is spread across a substantial amount of space, it is very likely that one of the clades will be geographically limited compared to the other.

    Put these two things together, and apply them to a widespread population like the Neandertals. It is pretty likely that if we sample a dozen Neandertals across a subset of their range, we will miss the deepest divergence in the genealogy of a single gene. That may be what has happened here. By extending the known mitochondrial sample of Neandertals even further to the east, this study may have discovered a deeper branch point than was previously known within the Neandertal population.
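
    The sampling argument can be made concrete with a small sketch in Python. It assumes individuals are drawn at random from a well-mixed population, which is the generous case; geographic structure only makes a rare clade easier to miss. The 5 percent frequency is a hypothetical value chosen for illustration, not an estimate for Neandertals, and the function names are mine.

```python
def prob_both_clades_sampled(rare_freq, n):
    """Chance that n random individuals include both sides of the deepest split.

    rare_freq is the population frequency of the rarer clade (between 0 and 1).
    The sample misses the split if everyone falls in the common clade or
    everyone falls in the rare clade.
    """
    all_common = (1.0 - rare_freq) ** n
    all_rare = rare_freq ** n
    return 1.0 - all_common - all_rare


def sample_size_needed(rare_freq, confidence=0.95):
    """Smallest sample that finds both clades with at least this probability."""
    n = 1
    while prob_both_clades_sampled(rare_freq, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # A dozen individuals, rare clade at a hypothetical 5 percent.
    print(round(prob_both_clades_sampled(0.05, 12), 3))
    print(sample_size_needed(0.05))
```

    Under these assumptions a dozen individuals would miss a 5 percent clade more often than not, and it would take dozens more to be reasonably sure of catching it.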

    Indeed, a million-year-old clade divergence would be entirely normal for a large mammal. That's what we see in chimpanzees, and as I pointed out yesterday, it's smaller than the clade divergence we see among mammoth mtDNA across a similar time range and geographic extent.

    I think the mammoth paper makes a really nice comparison to this one. In that case, they discovered a deep clade divergence in an ancient population, one branch of which was geographically restricted within a part of northern Siberia. They didn't conclude that multiple species of mammoths had been sampled -- despite the fact that one mtDNA lineage significantly outlasted the other. That was variation within one geographically diverse species, consistent with what we know about other species' mtDNA variation.

    So it is unnecessary to posit the existence of an unknown species of hominins in southern Siberia, based on the mitochondrial evidence alone. Whether we're talking about an unexpected diversity of forms -- well, I want to see something other than a pinky bone.

    Does it add to our understanding of Neandertal phylogeography?

    Well, first we need to know if it's a Neandertal. We don't. It's a pinky bone.

    But if it were a Neandertal, then the appearance of a deep clade at the very eastern extent of the population's range might suggest something about its diversification. The western Neandertals in that scenario have relatively restricted diversity, as if they had descended from a recent mtDNA ancestor. That pattern would be consistent with a range expansion from the east to the west. So maybe the Late Pleistocene Neandertals invaded Europe from elsewhere?

    Could this be Homo erectus?

    Of course, at the very furthest eastern extreme of the Neandertal range, we might well be running out of Neandertals and running into another kind of hominin. Even as recently as 40,000 years ago, it is not entirely obvious who those hominins would have been. The archaeological transition is nowhere near as clear in the east as in Europe, and even in Europe the archaeological transition to Upper Paleolithic industries is not the same as the biological transition.

    Before 100,000 years ago, the humans in China could plausibly be assigned to Homo erectus. It seems likely that much, if not most, of the genetic heritage of the pre-40,000 year population of China would have been derived from these ancient Chinese hominins. It is unknown how much genetic exchange there would have been between east and west at this time. I suspect that there were substantial genetic exchanges, both along the southern coast of Asia and across Central Asia. So China might well provide an alternative geographical origin for this mitochondrial clade.

    If we look to China as the ancestor population for this mitochondrial sequence, we can ask whether the roughly million year divergence date makes sense. As a marker of populations, a single gene can inform us about the maximum time of population divergence, not the minimum. The minimum is in effect zero: in other words, a million-year-old divergence genetically could occur within a single human population. So a widespread human population across much of Asia could contain such a deep branch, just as Neandertals -- equally widespread across West and Central Asia -- could have contained such a branch.

    But a million-year-old divergence does tell us one thing: this cannot represent a Homo erectus population that originated in Africa 2 million years ago, colonized Asia around the time of Dmanisi, and was isolated after that time.

    In other words, it would argue strongly against the hypothesis of a deep divergence of eastern and western hominin species, starting with the initial dispersal of humans from Africa in the Early Pleistocene. It argues in favor of continued genetic exchanges or a more complex history of population movements.

    I hesitate to take this line of reasoning too far. It's a pinky bone.

    Could this be a modern human?

    Even though the date of the cave could be as recent as 30,000 years ago, it is very unlikely that this mitochondrial sequence would have occurred within the growing population of "modern" humans. A growing population is relatively unlikely to lose mitochondrial variants. An ancient clade like this one, which survived in the population for a million years, might have been just at the edge of extinction at the time the population started to grow and therefore might just have missed its opportunity to survive. But it seems sort of unlikely.

    Do they know more than they are letting on?

    In the back of my mind I'm thinking this: if Krause's team has done enough sequencing to do the entire mitochondrial genome, they surely already know something about what the nuclear genome looks like. The increasing success of DNA recovery from these very fragmentary fossils has been stunning over the last several years. It is incredible that we are likely to recover a substantial amount of autosomal sequence from the distal phalanx of a (did I mention?) pinky. A quick comparison against raw data, without much systematic analysis, would be enough to check the mtDNA result.

    I wonder if this is only the first shoe, and there is another left to drop? These guys know as well as I do that gene trees are not species trees, and that is such an obvious point that -- even though this is Nature we're talking about -- the reviewers should have caught it.

    So maybe there are already hints that the autosomal comparison will fall in the same direction as the mitochondrial comparison with Neandertals: different from them, different from us.

    Maybe it's a Yeti after all.

    UPDATE (2010-03-24): Man, the press is worse than I imagined. Nature's news article goes fully with the "new species" interpretation -- even though the paper itself does not include the word "species" -- and every other outlet I've seen is following suit.

    I have to teach my class this afternoon where we'll be talking about this mtDNA sequence, so I don't have time for a longer update. Let me say very clearly: nothing about this sequence requires there to have been an undiscovered hominin species.

    UPDATE (2010-08-10): References updated.


    References

  • Mailbag: Haplogroups of Peruvian mummies

    Sat, 2010-02-20 20:31 -- John Hawks

    Now that we have looked at the DNA of the Tarim Basin mummies, when is somebody going to do the same for the mummies found at Paracasa, Peru? I know that anyone who is interested in them is considered a crank or a racist, but dammit--they do look very Caucasian. The hair is not just light colored, but very fine and wavy in texture. The funerary masks sometimes have blue-colored stones embedded in them to represent the eyes.

    If they do turn out to be Caucasian, it could be the biggest story in anthropology in a century. They could be a remnant population of our paleolithic ancestors if the Folsom/Solutrean hypothesis is true. Or if they are more recent arrivals, they could show some affinities for some still extant population. Greeks, Romans, wandering Irishmen? Who knows? I don't have any axe to grind in this, I just want to know where such unusual looking people came from.

    There has been some ancient DNA work on ancient Paracas culture mummies; Dienekes wrote about this a little bit last year:

    http://dienekes.blogspot.com/2009/07/mtdna-from-pre-columbian-peru.html

    ..and I found a few more references. There are none but the usual South American mtDNA haplogroups, but that leaves quite a bit of uncertainty about the relationships of the ancient and living populations, which apparently differ substantially in frequency. The same is true in Europe between Neolithic and recent samples. Whole-genome sequencing will be very interesting, not least because the South Americans should have different recent selection histories compared to Old World populations.

  • A genome from the mid-Holocene of Greenland

    Sun, 2010-02-14 15:40 -- John Hawks

    I was really busy meeting awesome people and making new friends in Georgia this week. So, although I got to read and think about the Greenland ancient genome paper, I didn't have a lot of time to sit down and write my thoughts.

    If you want a good, non-pay article, I think Alan Boyle has pulled together the essential details as well as some interesting sidelights.

    This is really excellent work, in terms of technical achievement and application. Clearly Willerslev and his team were dealing with an exceptional case in terms of preservation, and it won't be easy to duplicate this kind of sequencing in other ancient sites, certainly not for the next few years.

    But...

    Somebody ought to take torches and pitchforks over to NIH and find out why we can't get better coverage of U.S. population diversity. If the Danes can afford to pull a genome out of a 4000-year-old frozen skeleton with 20X coverage, why are we still stuck with a couple of white guys? I mean seriously. There are nine "complete" genomes, and two of them are from ancient skeletons of one kind or another? Awesome for anthropologists. For medicine, not so much.

    Yes, yes, I know. Thousand genomes, they're coming. But they won't represent the diversity of U.S. resident populations, much less other parts of the world. It's one thing to say that there's a region of the world for which our knowledge of ancient genetics may be better than our knowledge of the genetics of living populations. But here, that region is the entire Western Hemisphere.

    OK, enough about that. What's interesting about this paper?

    It isn't the functional SNPs. These are kind of a "show and tell" -- like, oooh, if we have the genome, we must know that he had dry earwax. True, but pretty trivial. These are all pretty standard variants common in Asians and New World peoples, and are therefore more a confirmatory negative. If they'd found some recently selected European haplotype, that would be a problem. These data have a lot of promise for studying ancient functional variants, but confirming the presence of high-likelihood known variants is only the first step.

    It seems to me there's something curious about this whole paper. "Ancient human genome sequence of an extinct Palaeo-Eskimo." Why "extinct"? We don't usually call ancient peoples extinct, even if they belonged to extinct cultures. In this case, there was a clear culture replacement in Greenland, and previous mtDNA evidence has shown that there was at least a partial biological replacement. The term is used in the text of the paper only in the context of an "extinct culture", so maybe the title is just extending that use.

    Or maybe it's something in Greenland or Danish politics that I don't understand. If the same headline were applied to Kennewick, for example, it would be interpreted as a political statement.

    I think the most interesting part of the paper is the "Author Information" paragraph at the end:

    Sequences have been deposited to the short read archive with accession number SRA010102; summary data are also available via http://www.ancientgenome.dk. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. This paper is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share-Alike license, and is freely available to all readers at www.nature.com/nature. Correspondence and requests for materials should be addressed to E.W. (ewillerslev@snm.ku.dk) or J.W. (wangj@genomics.org.cn).

    A Nature paper that's open access. I could kiss them full on the lips for that one. The genome is also available by open access, of course. The equal collaboration between Willerslev's and Wang's labs is newsworthy, I'd say. This is all stuff that nobody's pointed out, at least not that I've seen.

    What can we learn from this 4000-year-old genome? I have a few ideas. Data are rarely perfect for testing hypotheses. In this case, the fact of population replacement after this ancient culture was already expected based on other ancient DNA work. The individual is not perfectly placed to test hypotheses about the origin of Arctic or New World populations, yet is clearly relevant -- these people might have shared Beringian ancestors, or closely related Asian ones.

    There are a few other things that strike me as obvious tests, so I'll keep quiet about them until we've managed to do them. It's not the iceman I was expecting, but very interesting nonetheless.

    References:

    Rasmussen M and lots and lots of others. 2010. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463:757-762. doi:10.1038/nature08835
