john hawks weblog

paleoanthropology, genetics and evolution

genetic divergence

  • More on the mutation rate

    Mon, 2011-07-18 04:43 -- John Hawks

    I've received several questions over the last few weeks about human genome-wide mutation rates. Some people are noticing heterogeneity in mutation rate estimates among family trios (spurred by a recent paper from the 1000 Genomes Project) while others are asking about apparent contradictions between estimates from pedigree-based methods and those based on phylogenetic comparisons with other primates (see, for instance, Dienekes' discussion of the recent paper by Li and Durbin [1]).

    I wrote a very extensive and referenced post last fall about this issue, and I just want to bring it to people's attention: "What is the human mutation rate?"

    The 1000 Genomes Project has adopted the low per-generation mutation rate that has been coming out of the family trio comparisons. This low rate is around 1.2e-8 per site per generation as opposed to the estimate of around 2.4e-8 per site per generation that was often used prior to last year. Several new or upcoming papers will use the lower rate as applied to comparisons in humans or other hominoids.

    I'll just point out two conclusions I arrived at last fall:

    1. The 1000 Genomes comparisons are not very strong evidence in favor of a low rate. There is too much error in the sequences, and the means of filtering errors may affect the rate estimation. Much stronger evidence comes from pedigree-based comparisons of de novo Mendelian diseases, which encompass tens of thousands of mutational events instead of a few dozen. These also suggest a low rate -- in particular Michael Lynch's work from early 2010 [2]. This work also demonstrates that different sequence contexts give rise to different effective rates of mutation.

    2. The higher rate based on phylogenetic comparisons was always based on circular reasoning. People applied a rate that would fit the observed sequence differences to some paleontological event. Logically, the fossil appearance of an extant lineage puts a minimum time on the divergence of that lineage from others; but geneticists typically assumed that this was the expected time of sequence divergence, not the minimum possible time of species divergence. These two dates may easily differ by a factor of two, given the quality of the hominoid fossil record. Sequence divergence must always precede speciation, and speciation must always precede the earliest fossil occurrence of a lineage. The paleontological dates were then often bootstrapped from estimated mutation rates. The famous "6 million year human-chimpanzee divergence" was always based on these faulty assumptions -- that we knew with exactitude the human-orang or human-macaque sequence divergence time, and that the sequence divergence time between humans and chimpanzees was identical to the speciation time of the two lineages.

    I've had several conversations with people about this issue during the past year. Some of them take it very seriously, others don't. Myself, I see that the lower rate simplifies many problems with the fossil record and comparisons of archaic genomes, but creates some others. For this reason, I'm cautious about it.


    References

    1. Li H, and Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature [Internet] 475:493–496. Available from: http://dx.doi.org/10.1038/nature10231
    2. Lynch M. 2010. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet] 107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    Synopsis: 
    The 1000 Genomes Project is on the verge of demonstrating a lower mutation rate in humans. Should we believe it?
  • What is the human mutation rate?

    Thu, 2010-11-04 01:33 -- John Hawks

    Last spring I wrote about a study that used whole-genome comparisons between parents and offspring to estimate the rate of per-genome mutation in humans ("A low human mutation rate may throw everything out of whack").

    The study was by Jared Roach and colleagues [1], and as you might guess from my post title, the result was surprising. Previous work had suggested a human mutation rate around 2.5 x 10^-8 per site per generation. The new study found less than half the expected number of mutations between these parents and offspring, an estimated rate of only 1.1 x 10^-8 per site.

    If this lower rate of mutation were to hold up, it would affect much of our understanding of the chronology of human evolution. Fossils and archaeological sites would not change in date, but some hypotheses about their relationships would be challenged. For example, the higher rate of 2.5 x 10^-8 per site suggests a chimpanzee-human population divergence around 4 million years ago. A new rate of 1.1 x 10^-8 would not have a linear effect on this divergence time -- the genes don't have genealogical roots at the same instant as the population divergence. But the human-chimpanzee divergence time would be radically higher than in many recent estimates.

    The same might be true for other primate divergences, and for genealogical relations within human populations today. Basically any times that are estimated from genetic differences may be affected by our knowledge of the per-generation rate of mutations.

    What does this mean?

    What mutations are we counting?

    Human genomes differ from each other in many ways. There are single base-pair changes in sequences, insertions and deletions, repeat polymorphisms, and larger-scale rearrangements such as inversions and gene duplications. Recent work suggests that some of these larger-scale effects may be very important to phenotypic variation among people. So why should we be talking about only the first of these kinds of variation?

    Single nucleotide mutations have been the focus of most attention about mutation rates because they are relatively easy to find and quantify. In high-quality sequence data, a single nucleotide change is relatively unambiguous. Reversals are fairly unlikely, although at a small fraction of "hotspot" sites, recurrent mutations can make a big difference.

    It is somewhat misleading to refer to "a" rate of single nucleotide mutations, because some kinds of sites (e.g., CpG nucleotides) have had a much higher probability of mutations than others. This affects the apparent rate of mutations in noncoding versus synonymous sites [2]. Also, the germline in males has been estimated to be as much as 6 times more likely to suffer mutations than the germline in females (discussed by Crow [3]). The idea of a genome-wide rate assumes that when we bin all the single nucleotide mutations together, across large amounts of sequence, we do arrive at a relatively stable rate that can be applied to similarly broad extents of sequence data. Or at least that we can identify sequence regions with compatible rates (e.g., noncoding DNA or synonymous sites).

    At the moment, technical issues make it hard to find and quantify many other kinds of variation. The current generation of sequencing devices tends to generate short reads, which make it difficult to assess the presence of insertions or deletions of more than a few base pairs. Duplications and other rearrangements require special treatment such as higher coverage or longer sequence reads. By contrast, a single nucleotide mutation will typically align in the proper location and be quite evident in a read. In principle, we can just run down the genome and count them.

    Still, finding novel mutations is not without its problems. Recent sequencing projects have yielded a very high rate of false positives. The rate of false negatives is really not yet known. We have a good reason to suspect that the false negative rate will be high. In a low-coverage genome, many short segments of the genome will have very low read numbers, making it likely that the sequence reads represent only one of the two copies of the genome present at that location. Any novel mutations in that area have a 50-50 chance of being missed by our sequencing efforts. This false negative risk can be reduced by adding higher sequence coverage, but we're not yet at the point where we have a lot of genomes sequenced at the 10x or higher coverage that we would really want.
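    To put a rough number on that false negative risk, here is a minimal sketch. It assumes that read depth at a site follows a Poisson distribution and that each read samples either chromosome copy with probability 1/2; both are simplifications of mine, not properties of any particular sequencing project. It only counts the case where no read at all carries the new mutation, so real false negative rates (where callers need several supporting reads) would be higher.

```python
import math

def expected_miss_probability(mean_coverage: float, max_depth: int = 200) -> float:
    """Chance that no read at a site comes from the copy carrying a new mutation.

    Assumes Poisson-distributed read depth and that each read samples either
    chromosome copy with probability 1/2 (both are simplifying assumptions).
    """
    weight = math.exp(-mean_coverage)  # Poisson probability of depth 0
    total = 0.0
    for depth in range(max_depth + 1):
        total += weight * 0.5 ** depth         # all `depth` reads hit the other copy
        weight *= mean_coverage / (depth + 1)  # Poisson probability of depth + 1
    return total

for coverage in (2, 4, 10, 30):
    print(f"{coverage:>2}x mean coverage: {expected_miss_probability(coverage):.4f}")
```

    Under these assumptions the sum collapses to exp(-coverage / 2), so the chance of missing a mutation entirely falls off quickly with depth but is far from negligible at 2x to 4x.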

    So while sequencing a parent and offspring genome is the most direct way to estimate the per-generation mutation rate, it is not yet ideal.

    Where did the high rate come from?

    That means we need to look very closely at other sources of data, to see if they may provide some independent confirmation of a lower per-generation mutation rate. In the process, we should ask, why did the higher rate, around 2.5 x 10^-8 per generation, become so widely accepted?

    The source cited by Roach and colleagues for the higher rate, 2.5 x 10^-8 per site, is a paper by Michael Nachman and Susan Crowell [4]. Nachman and Crowell examined processed pseudogenes in humans and chimpanzees, under the assumption that mutations in these pseudogenes would be neutral to selection in the human and chimpanzee lineages.

    The average mutation rate was calculated from the average autosomal rate of evolution assuming a generation time of 20 years (Table 3). Recent estimates of the time since humans and chimpanzees diverged (T) include 4.5 mya (TAKAHATA and SATTA 1997), 5.5 mya (KUMAR and HEDGES 1998), and 6.0 mya (GOODMAN et al. 1998). ARNASON et al. 1998 estimated the Homo-Pan divergence at 10–13 mya; however, their estimate is based on a calibration using distant, nonprimate species and is at odds with most other recent estimates. Mutation rates were calculated for a range of different human-chimpanzee divergence times and for two different ancestral population sizes. Mutation rate estimates vary from 1.3 x 10^-8 (assuming T = 6 mya and Ne = 10^5) to 2.7 x 10^-8 (assuming T = 4.5 mya and Ne = 10^4). If the average generation time is assumed to be 25 years (e.g., EYRE-WALKER and KEIGHTLEY 1999), then mutation rates are estimated to be between 1.6 x 10^-8 and 3.4 x 10^-8.

    Wait a minute. There's no independent estimate of mutation rate here at all!

    What they did was to assume values for the human-chimpanzee divergence and ancestral (chuman) effective size, and then provide an estimate of mutation rate consistent with those assumptions. That's perfectly reasonable as a way of quantifying the genetic divergence that they observed. If our goal is to predict the per-generation mutation rate from interspecific divergence, that's more or less the kind of estimate that we want.
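    The dependence on those assumptions is easy to see with a small sketch. It uses the standard neutral expectation for divergence between two species, k = 2*mu*t + 4*Ne*mu (t in generations since the split, Ne the ancestral effective size). That formula is my assumption about the kind of calculation involved, and the function names are made up for illustration. Back-solving from the two endpoints quoted above gives essentially the same divergence both times; only the assumed T and Ne differ.

```python
def implied_divergence(rate, split_years, generation_years, ancestral_ne):
    """Per-site divergence k = 2*mu*t + 4*Ne*mu, with t in generations."""
    generations_since_split = split_years / generation_years
    return rate * (2 * generations_since_split + 4 * ancestral_ne)

def mutation_rate(divergence, split_years, generation_years, ancestral_ne):
    """Invert the same relation: the rate that fits an observed divergence."""
    generations_since_split = split_years / generation_years
    return divergence / (2 * generations_since_split + 4 * ancestral_ne)

# Endpoints quoted from Nachman and Crowell, 20-year generations.
high = implied_divergence(2.7e-8, 4.5e6, 20, 1e4)  # T = 4.5 mya, Ne = 10^4
low = implied_divergence(1.3e-8, 6.0e6, 20, 1e5)   # T = 6 mya,   Ne = 10^5
print(f"divergence implied by the high-rate endpoint: {high:.4f}")
print(f"divergence implied by the low-rate endpoint:  {low:.4f}")

# Same divergence, different assumptions, different "mutation rate".
for split, ne in ((4.5e6, 1e4), (6.0e6, 1e5)):
    print(f"T = {split / 1e6} mya, Ne = {ne:.0e}: mu = {mutation_rate(0.013, split, 20, ne):.2e}")
```

    Both endpoints come back to a divergence of roughly 1.3 percent, so the spread in rates reflects the assumed divergence time and ancestral population size rather than anything measured about mutation.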

    But many, many other studies have instead used a citation to the Nachman and Crowell rate as a justification for their own estimates of the human-chimpanzee divergence time! That's not perfectly reasonable; in fact, it's perfectly circular. It's turtles all the way down!

    Worse, those citations tend to cite the midpoint of Nachman and Crowell's range of estimates (2.5 x 10^-8) as if it were a true value measured with little error. Reading the original reference, you can plainly see that Nachman and Crowell reported estimates that varied over a factor of three, corresponding to a wide range of chuman population histories. From their discussion:

    Mutation rates estimated for a range of divergence times and ancestral population sizes fall between 1.3 x 10^-8 and 2.7 x 10^-8 assuming a generation time of 20 years (Table 3) or between 1.6 x 10^-8 and 3.4 x 10^-8 assuming a generation time of 25 years. We suggest that 2.5 x 10^-8 is a reasonable estimate of the average mutation rate per nucleotide site (but caution that the actual rate may be between 1.3 x 10^-8 and 3.4 x 10^-8).

    That 2.5 x 10^-8 is simply the midpoint of their range of estimates with the 25-year generation time.

    What would be more reasonable? For hominins and chimpanzees, we probably want to apply a shorter generation length, a larger ancestral effective size, and a higher time of divergence. All of these would have yielded a lower rate for the Nachman and Crowell data. But we don't want to just assume these values, we should try to test whether they are valid based on other data.

    Other mutation rates from phylogenetic comparisons

    Nachman and Crowell have not been alone in their ultimate reliance on fossil evidence as an assumption underlying the per-generation mutation rate. But several other studies came to a slower mutation rate. Mostly, these studies have assumed that the human-chimpanzee divergence happened significantly earlier than 5 million years ago. Necessarily, then, the human per-generation mutation rate would have to be lower, as long as the sequence divergence remained the same.

    These estimates are ultimately rooted in the date of one or more fossils, among which the generation time certainly varied. The resulting per-site mutation rates are often reported as per-year instead of per-generation. For example, Yi and colleagues [5] yielded a rate of 0.99 x 10^-9 per year for the human-chimpanzee comparison, which would multiply to 1.98 x 10^-8 per 20-year generation. They propose this as a maximal rate, assuming that Sahelanthropus at a minimum date of 6 million years ago is a hominin. With an older divergence date, they propose a correspondingly lower rate (e.g., 0.79 x 10^-9 per year, not accounting for ancestral population polymorphism).

    Similarly, Steiper and Young [6] considered a long (1.9 Mb) alignment of sequence from 19 primate species. In their model to estimate relative rates on different branches of the primate phylogeny, they incorporated the assumption that Sahelanthropus is on the hominin clade. A divergence date of 6 million years gave rise to a human per-site mutation rate of 0.65 x 10^-9 per year (1.3 x 10^-8 per 20-year generation). A divergence date of 7 million years lowered the mutation rate to 0.57 x 10^-9 per year.
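    Since these studies report per-year rates, it helps to put them on the same per-generation footing as the other estimates. The conversion is just a multiplication by generation length; the 20-year figure matches the one used above, and the 25-year column is added only to show how sensitive the result is to that assumption.

```python
def per_generation_rate(per_year_rate: float, generation_years: float = 20.0) -> float:
    """Convert a per-site, per-year rate into a per-site, per-generation rate."""
    return per_year_rate * generation_years

# Per-year rates quoted above.
per_year_rates = {
    "Yi et al., 6 My divergence": 0.99e-9,
    "Yi et al., older divergence": 0.79e-9,
    "Steiper and Young, 6 My divergence": 0.65e-9,
    "Steiper and Young, 7 My divergence": 0.57e-9,
}

for label, rate in per_year_rates.items():
    at_20 = per_generation_rate(rate, 20)
    at_25 = per_generation_rate(rate, 25)
    print(f"{label}: {at_20:.2e} (20-year gen.), {at_25:.2e} (25-year gen.)")
```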

    Low mutation rates do not always result from these studies. Several have arrived at either a high human mutation rate or a recent human-chimpanzee divergence time. Sometimes a recent human-chimpanzee divergence emerges simply by assuming the rate given by Nachman and Crowell. Yang [7] provides an example of this -- a paper that very thoroughly explores the relationship of divergence time and ancestral effective population size, but ultimately roots the estimates on a single value for mutation rate. This rate we have already seen was itself based on an assumption about divergence time.

    Kumar and colleagues [8] came to a much lower estimate for the human-chimpanzee divergence time, based on an Old World monkey-hominoid divergence at 23.8 million years ago. This estimate did not consider the effect of ancestral polymorphism on the mean genetic divergence time, and so should -- in the language of computer software -- be deprecated.

    I should reiterate that none of these estimates are suitable for testing the times of phylogenetic divergences, because they all assume that the date of some particular fossil (or set of fossils, by fitting a model) is the minimum divergence time for a clade.

    So much of the literature in this area is ultimately circular, I'm pulling out my sparse hair reading through it. By the time we get back to the mid-1990's, the sequence data are even sparser than my hair by today's standards -- only a few hundred base pairs, or a sampling of restriction sites. But the divergence time estimates have propagated forward from that time to today, recycled through the assumptions of papers in the intervening time. It's like the genetic equivalent of money laundering!

    Evidence from parent-offspring sequence differences

    There is another way besides phylogenetic comparison: Simply look at living people and see how many new mutations they have.

    But this is tricky because we are rarely in a position to know which mutations are new. Most variations that we see between two people have persisted in the population for hundreds of generations or more. It takes a special kind of mutation to make its newness evident.

    Up until the advent of large-scale sequencing, the most important source of information about the mutation rate came from the rates of spontaneous Mendelian diseases. When a person has a dominant genetic disorder not carried by either of his parents, you know that the mutation must be new. Disease rates have long been tracked as standard public health data.

    However, the per-genome or per-locus rate of Mendelian disorders can estimate the per-site rate of mutations only by adding well-resolved information about the target size of functional genes. For example, if we know the average gene length and the proportion of different amino acids in functional proteins we can make some estimate of the ratio of synonymous to nonsynonymous sites. But we would still lack information about the fraction of nonsynonymous mutations that cause deleterious effects on protein function. For this reason, it was possible for very early workers (e.g., Haldane) to come within the ballpark of per-locus mutation rates even before the genetic code was available. Yet such estimates are not strictly useful for understanding per-site rates of mutation.

    By 2000, widespread sequencing had begun to identify disease-causing mutations at the sequence level. When exons are known, it is possible to determine the "target size" -- the number of sites at which loss-of-function mutations may occur. These two values provide the numerator and denominator for an estimate of the per-site mutation rate.
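    As a rough sketch of that numerator-and-denominator logic, consider a fully penetrant dominant disorder in which every new case is detected. The numbers below are hypothetical, chosen only to make the arithmetic visible, and the published analyses handle many complications (mutation classes, ascertainment, penetrance) that this ignores.

```python
def per_site_rate(de_novo_incidence: float, target_sites: int) -> float:
    """Per-site, per-generation rate from the incidence of new dominant cases.

    Each newborn carries two copies of the gene, so the per-copy (per-locus)
    rate is half the incidence of de novo cases. Dividing by the number of
    sites where a mutation causes the disease gives a per-site rate.
    """
    per_locus_rate = de_novo_incidence / 2
    return per_locus_rate / target_sites

# Hypothetical inputs: 1 new case per 50,000 births, 500 disease-causing sites.
example = per_site_rate(de_novo_incidence=1 / 50_000, target_sites=500)
print(f"per-site rate: {example:.1e} per generation")
```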

    Kondrashov [9] applied this method to estimate the per-site mutation rate across 20 human genes. He surveyed the literature for genes where more than 100 patients had been sequenced completely for the causative locus, finding the causal mutations. Using this value and the disease incidence allowed an estimate of the per-site rate of mutation for different categories of transitions and transversions. There was some variation among loci, with an average rate of per-site mutation equal to 1.8 x 10^-8 per generation.

    Kondrashov observed a few hotspots in these genes, with substitution or deletion rates as much as a hundred times the average site. He also observed that the per-gene rate of mutation varies according to the number of CpG sites. The rate of short deletions was on the order of 5 x 10^-10; insertions were even less frequent.

    The rate estimate by Kondrashov is within the range considered by Nachman and Crowell, but only 3/4 of the value 2.4 x 10^-8 widely cited as the long-term estimate. If this rate were applied to Nachman and Crowell's pseudogene data, it would predict a human-chimpanzee divergence time around 6 million years.

    This year, Lynch [10] performed a more extensive comparison using methods similar to Kondrashov's. Including more genes, and considering a broader range of mutational effects (including missense as well as nonsense coding mutations), Lynch found an even lower estimate of mutation rate per generation -- only 1.28 x 10^-8 per site.

    These estimates are not precisely the same as comparing parent-offspring pairs, but they are exceedingly powerful because the data on disease rates encompass very large populations of people.

    We should keep in mind the result of Subramanian and Kumar [2], which showed that exons have a higher effective rate of substitution than do noncoding regions. That result implies that the genome-wide rate of change should be lower than estimated by Lynch, because his estimate encompasses only coding mutations. Also, any effect of purifying selection on these mutations will tend to decrease the long-term rate of substitutions per site to a lower value than the rate of mutations. The rate estimated by Lynch should then be an overestimate of the substitution rate that would be applicable to hominoid phylogenetic relationships.

    A slower rate

    These estimates of the per-generation mutation rate are all low compared to the commonly-cited 2.5 x 10^-8. They are not quite as low as the rate estimated by Roach and colleagues [1], but the Lynch estimate is very close: 1.28 x 10^-8 compared to 1.1 x 10^-8 per site.

    The lower estimate from Roach and colleagues is a direct comparison of parent and offspring. In my earlier discussion of that rate, I suggested that false negatives in the sequence comparisons might have lowered the apparent rate of mutations. I still think we can't rule out that possibility. But the rate is not alone, and so it is less surprising than it may have seemed.

    My post last week on the 1000 Genomes Project results ("Now for anthropological genomics") mentioned that the 1000 Genomes comparisons have arrived at essentially the same rate as Roach and colleagues. Comparison of one family trio led to a rate of 1.0 x 10^-8 per site per generation; the other family trio gave rise to an estimate of 1.2 x 10^-8 per site per generation. These bracket the estimate given by Roach and colleagues.

    My basic observation about the human-chimpanzee divergence time is still sound:

    If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.
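    The tradeoff in that passage can be sketched with the usual expectation that two gene copies sampled across a species split coalesce, on average, 2*Ne generations before the species divergence. The 20-year generation time is an assumption here (the passage doesn't state one), and the function names are just for illustration.

```python
def ancestral_ne(gene_divergence_years, species_divergence_years, generation_years=20.0):
    """Ancestral effective size that reconciles a gene divergence with a species divergence."""
    extra_years = gene_divergence_years - species_divergence_years
    return extra_years / (2 * generation_years)

def species_divergence(gene_divergence_years, ne, generation_years=20.0):
    """Species divergence implied by an average gene divergence and an ancestral Ne."""
    return gene_divergence_years - 2 * ne * generation_years

# 11 My average gene divergence with a 7 My species divergence.
print(f"Ne needed: {ancestral_ne(11e6, 7e6):,.0f}")

# The same gene divergence with a smaller ancestral population.
print(f"species divergence with Ne = 50,000: {species_divergence(11e6, 50_000) / 1e6:.1f} My")
```

    With 20-year generations, a 7-million-year species divergence needs an ancestral effective size of about 100,000, while an effective size of 50,000 pushes the species divergence back to 9 million years.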

    As we go further back in time, this lower human mutation rate may be less and less relevant, because different primate lineages may have higher (or lower) rates. When some of the kinks have been worked out of whole-genome sequencing, it would be tremendously useful to sequence parent-offspring pairs in other primate species. With those data, rate heterogeneity could be tested directly.

    For events within the hominins, the parent-offspring rate of mutations ought to be better than a rate estimated from phylogenetic distance. Phylogenetic distances are estimated with even more error than mutations, increasingly so as our methods for comparing genomes improve. But some fraction of new mutations will ultimately be lost to purifying selection. That implies, again, that the longer term rate of substitutions will be lower than the rate of mutations measured from parent-offspring comparisons.

    A rate of 1.1 x 10^-8 would have no effect on the number of genetic differences observed between people, because these differences are just counted, not estimated by genealogical relationships that are known. It is the unknown genealogical relationships, which are estimated from genetic differences, that may change substantially.

    Let's consider an example. Harris and Hey [11] sequenced 4200 bp of the gene PDHA1, an X-linked gene whose product is part of a mitochondrial enzyme complex. At the time of their study (1999), their result was one of the oldest coalescence times estimated for non-African populations based on sequence data; they estimated the root of the PDHA1 genealogy was 1.8 million years old. This estimate was based on the assumption that human and chimpanzee copies, which differed by an average of 40.42 substitutions, had diverged at 5 million years ago. That would imply that the average genetic difference between humans across the deepest root of the genealogy, 15.05 mutational differences, corresponds to 1.86 million years of time. If we instead assert a per-generation rate of 1.1 x 10^-8 per site, these data would generate an estimate of 163,000 generations for the root of the genealogy, roughly 3.3 million years.

    In other words, a coalescence that appeared to have happened in early Homo now looks rooted at the age of A. afarensis. The chimpanzee-human genetic root would be around 8.7 million years for these data.
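    The arithmetic behind the PDHA1 example is short enough to write out. This sketch assumes differences accumulate along both lineages back to the common ancestor (hence the factor of 2) and uses a 20-year generation to convert to years; the function name is only for illustration.

```python
def generations_to_root(differences: float, rate_per_site: float, sites: int) -> float:
    """Generations back to a common ancestor, given pairwise differences.

    Differences accumulate on both lineages, so the expected count is
    2 * rate * sites * generations.
    """
    return differences / (2 * rate_per_site * sites)

SITES = 4200             # bp of PDHA1 sequenced by Harris and Hey
RATE = 1.1e-8            # per site per generation
GENERATION_YEARS = 20    # assumed generation length

# Original calibration: scale human differences by an assumed 5 My chimpanzee split.
old_root_years = 15.05 / 40.42 * 5e6

# Direct calibration with the per-generation mutation rate.
root_generations = generations_to_root(15.05, RATE, SITES)
chimp_generations = generations_to_root(40.42, RATE, SITES)

print(f"root, fossil-calibrated: {old_root_years / 1e6:.2f} My")
print(f"root, rate-calibrated:   {root_generations:,.0f} generations, "
      f"{root_generations * GENERATION_YEARS / 1e6:.2f} My")
print(f"human-chimpanzee root:   {chimp_generations * GENERATION_YEARS / 1e6:.2f} My")
```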

    These estimates would likely be biased too low, because the X chromosome has a lower rate of mutation than the autosomes by some extent. That issue was addressed by Lynch [10], due to the fact that X chromosomes are in males (with their higher rate of mutations) only 1/3 of the time compared to 1/2 the time for autosomes. Any purifying selection would also bias the estimate too low. If these 4200 bp have a higher-than-average CpG content, that is one factor that might require a higher per-generation rate.
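    The size of that X chromosome effect depends on how much higher the male mutation rate is. As a minimal sketch, take the female germline rate as 1 and the male rate as alpha, then weight by the time each chromosome spends in each sex (X: 2/3 in females, 1/3 in males; autosomes: 1/2 each). This is a textbook simplification, not necessarily the exact correction Lynch applied.

```python
def x_to_autosome_ratio(alpha: float) -> float:
    """Ratio of X-linked to autosomal mutation rate for a male-to-female rate ratio alpha."""
    x_rate = (2 * 1.0 + alpha) / 3       # X spends 2/3 of its time in females
    autosome_rate = (1.0 + alpha) / 2    # autosomes split time evenly
    return x_rate / autosome_rate

for alpha in (1, 2, 4, 6):
    print(f"alpha = {alpha}: X rate is {x_to_autosome_ratio(alpha):.2f} of the autosomal rate")
```

    At the upper figure mentioned earlier (a male germline 6 times more mutable), the X-linked rate comes out at about three quarters of the autosomal rate, so applying an autosomal rate to PDHA1 would understate the time depth by a corresponding amount.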

    Is any of this a problem? I don't think we know yet. A lower rate must readjust the apparent correspondence of some molecular time estimates with the archaeological record. But to be honest, most of the apparent correspondences of such dates have been illusory, because genealogical relationships among genes have such large expected variance under any realistic human population model. It is really the availability of whole-genome comparisons that has a chance of improving these population models.


    References

    1. Roach JC, Glusman G, Smit AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. 2010. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science [Internet] 328:636–639. Available from: http://dx.doi.org/10.1126/science.1186802
    2. Subramanian S, and Kumar S. 2003. Neutral Substitutions Occur at a Faster Rate in Exons Than in Noncoding DNA in Primate Genomes. Genome Research [Internet] 13:838–844. Available from: http://dx.doi.org/10.1101/gr.1152803
    3. Crow JF. 2000. The origins, patterns and implications of human spontaneous mutation. Nature Reviews Genetics [Internet] 1:40–47. Available from: http://dx.doi.org/10.1038/35049558
    4. Nachman MW, and Crowell SL. 2000. Estimate of the Mutation Rate per Nucleotide in Humans. Genetics [Internet] 156:297–304. Available from: http://www.genetics.org/cgi/content/abstract/156/1/297
    5. Yi S, Ellsworth DL, and Li W-H. 2002. Slow Molecular Clocks in Old World Monkeys, Apes, and Humans. Molecular Biology and Evolution 19:2191–2198.
    6. Steiper ME, and Young NM. 2006. Primate molecular divergence dates. Molecular Phylogenetics and Evolution [Internet] 41:384–394. Available from: http://dx.doi.org/10.1016/j.ympev.2006.05.021
    7. Yang Z. 2002. Likelihood and Bayes Estimation of Ancestral Population Sizes in Hominoids Using Data From Multiple Loci. Genetics [Internet] 162:1811–1823. Available from: http://www.genetics.org/cgi/content/abstract/162/4/1811
    8. Kumar S, Filipski A, Swarna V, Walker A, and Hedges BS. 2005. Placing Confidence Limits on the Molecular Age of the Human-Chimpanzee Divergence. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 102:18842–18847. Available from: http://dx.doi.org/10.1073/pnas.0509585102
    9. Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. [Internet] 21:12–27. Available from: http://dx.doi.org/10.1002/humu.10147
    10. Lynch M. 2010. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet] 107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    11. Harris EE, and Hey J. 1999. X chromosome evidence for ancient human histories. Proceedings of the National Academy of Sciences, U. S. A. 96:3320–3324.
    Synopsis: 
    The 1000 Genomes Project is finding that the mutation rate is half the value usually assumed.
  • Time to revise the mtDNA timescale?

    Wed, 2010-08-18 23:35 -- John Hawks

    Krzysztof Cyran and Marek Kimmel (2010) have presented a revised set of estimates of the human mtDNA most recent common ancestor (MRCA). It's an interesting theoretical paper, written for the purpose of developing a method that doesn't rely on the same assumptions as the usual coalescent models.

    Their new method gives an estimate of 174,000 years ago for the human MRCA. They report an upper/lower range as 96,000 to 449,000 years ago. That range does not represent a confidence interval on the estimate; it's an upper/lower range based on extreme assumptions about human/Neandertal genetic distance and the human/Neandertal MRCA.

    The Neandertal mtDNA has really affected the way we estimate human MRCA, at least for the mitochondrial genome. Chimpanzees are just too distant. When we compare human and chimpanzee mtDNA genomes, there has been a lot of parallelism and reversal on both lineages, because mutations have hit the same place multiple times. Multiple hits and purifying selection make a mess out of rate estimation -- generally, they make the human MRCA seem a lot older than it truly was. The Neandertals are closer, and are therefore less of a problem.

    But the Neandertal-human MRCA itself was poorly known, as long as we had only chimpanzees to calibrate the mutation rate....

    That's what we discovered earlier this year with the mtDNA genome of the Denisova specimen [1] ("The Denisova mtDNA sequence: The X-Woman"). Denisova is an outgroup to the human-Neandertal mtDNA clade, which diverged from our mtDNA ancestors around a million years ago. Sliding in that branch redated the human-Neandertal MRCA down to 460,000 years ago. Unfortunately, that paper came too late for Cyran and Kimmel [2] to use the revised human-Neandertal MRCA in their calculations. They assumed a date of 511,000 years ago for the human-Neandertal MRCA.

    Still, the paper gives enough detail to work out the effect of a lower human-Neandertal MRCA on their estimate. They obtained their lower bound (96,000 years) by assuming a human-Neandertal MRCA of 389,000 years. If we substitute in the Denisova-informed human-Neandertal MRCA, we can figure that the human MRCA will be around 130,000 years ago or so.

    That's awfully recent.

    I don't want to go too far with these numbers. My first objection is that they all assume the total absence of selection, when we have long known that some human mtDNA clades have been selected in some parts of the world. It's entirely possible that the human MRCA is recent because of natural selection on some mitochondrial-linked phenotype ("Complete Neandertal mitochondrial sequence, and selection on human (not Neandertal) mtDNA", "Has the dam broken on mtDNA selection?", "Selection, nuclear genetic variation, and mtDNA").

    And even if we assume no selection at all, there's not a lot to be gained by increased precision of these estimates. Branch lengths of an mtDNA genealogy give only extremely wide estimates of ancient events. If we say that something happened "around 50,000 years ago, plus or minus 35,000", it hardly matters whether we change that to "around 43,200 years ago, plus or minus 35,000." I would even argue that the round estimate is better, because it doesn't communicate a misleading impression of precision.

    Still, it does a lot of good to know whether estimates are systematically biased in one direction. And this work, combined with what we know about the Neandertal and Denisova complete mtDNA genomes, suggests that our mtDNA branch lengths may have been biased too high.

    It remains to be seen how much of the human mtDNA tree will be affected by this logic. The most recent branches can in many cases be calibrated against historical events, and ultimately parent-offspring comparisons. So those aren't likely to change much. What worries me is that critical period around 30,000--80,000 years ago, when human mtDNA lineages were diversifying worldwide. The timescale of mtDNA divergence is already out of whack with the rest of the genome. Pushing these divergences more recent will make the fit between mtDNA and autosomal estimates worse. But given the wide variance on coalescence times, Cyran and Kimmel's estimates are consistent with the hypothesis that these might be substantially higher -- so it's hard to guess whether the apparent mismatch is real or not.

    I might have missed this paper if it weren't for the press release about it from Rice University. But what a misleading release! It's headlined, "Mother of all humans lived 200,000 years ago" -- which the paper doesn't conclude. If that were the conclusion, it wouldn't be news, because it's confirming a widely-used estimate that's more than 20 years old.

    But there are actually several interesting angles to the story that the press release fails to mention. Their estimation method may prove useful for many species for which we have no good demographic model -- a problem that the release alludes to, but doesn't feature. The method they develop came from a similar process, which had formerly led to a much, much higher estimate of human MRCA. Their estimate is a lot lower -- in large part because they can exploit the Neandertal genetic information. And then there's the likely possibility that the actual MRCA may be much lower, which would truly be unexpected compared to most earlier work.

    At the end of their paper, Cyran and Kimmel give a short discussion of the history of the Out of Africa mtDNA story. They mention the idea that some people favoring the multiregional hypothesis had suggested older dates for the human mtDNA MRCA. Aside from O'Connell [3], however, they didn't cite this literature. The conclusion of a short timescale, with a MRCA around 200,000 years ago, was challenged by a number of geneticists [4],[5]. The most common point was that the upper confidence limit on the MRCA estimate must be very high -- potentially 800,000 years ago or more, because of the great uncertainty about rates, coming from the chimpanzee-human branch length. This remains a problem, although the availability of a Neandertal outgroup helps to clarify which changes on the human lineage are actually recent.

    It's sort of interesting that even in the current paper, we still have an upper estimate of the human MRCA that's nearly 450,000 years ago! I don't think that the assumptions going into that value are realistic, but there's no real upper confidence bound on the central estimate. It might well go as high as 450,000 years, given the huge uncertainty in the depth of the deepest branches of that African mtDNA genealogy.

    So I guess I'm not really sure we've advanced very far in 20 years!


    Synopsis: 
    A study of human variation adds precision to the human mtDNA mutation rate; I compare to results from archaic humans.
  • Ardipithecus challenge explication: the molecular clock

    Sat, 2010-05-29 18:28 -- John Hawks

    I've had a chance to mull over the exchange between Esteban Sarmiento and Tim White and colleagues in Science this week (Sarmiento 2010, White et al. 2010). It is not really fair to rely on brief technical comments to straighten out the meaning of a fossil skeleton. Each set of authors had less than 1000 words to put forth their arguments, which means that there were doubtless many pieces of support that they had no space to include.

    But when I write a technical comment, I spend a lot of time and effort to make the important points in that small space. If we can't agree on the basic outlines of the issues in a thousand words, I expect that ten thousand wouldn't settle anything, either.

    I have a wide array of reactions to the points in these comments, but I think it will be most useful for me to focus on just three issues. I'm going to include a lot more text than I would for a technical comment, both so that I can include the direct quotes from Sarmiento and White and colleagues, and so that readers with less direct knowledge of the issues can follow along. And I'll divide each issue into its own post, so that it doesn't take a week to get something posted.

    Let's start with the molecular clock argument. Sarmiento puts it briefly, depending on citations to do the lifting:

    Over the past 40 years, a multitude of independent biomolecular studies based on different methods, some analyzing millions of DNA base-pair sequences, have arrived at a minimum human/African ape divergence date of ~3 to 5 million years before the present (19–26)—a date that accords well with those based on comparative anatomical studies of living and fossil hominoids (15). With a 4.4-million-year geologic age (1), Ar. ramidus probably predates the human and African ape divergence.

    As I mentioned earlier this week, I discussed the issue in some depth last fall. The same argument originally was made by Vince Sarich, when the biomolecular evidence was based on antibody reactions to blood albumin, and the question was whether Ramapithecus was too old to be a hominin. Sarich (1971:76) memorably wrote:

    [O]ne no longer has the option of considering a fossil specimen older than about eight million years a hominid no matter what it looks like.

    David Pilbeam and others had claimed Ramapithecus as a hominid mostly because of its dental similarities to Australopithecus. Later, it became clear (especially thanks to David Frayer and Leonard Greenfield) that Ramapithecus wasn't even a valid taxon; the remains were females of Sivapithecus. Still later, Sivapithecus itself was shown to have the forelimb of an arboreal quadruped; it apparently did not have a locomotor strategy like that of living great apes.

    Sound familiar?

    Sarmiento is correct. Over the past ten years, the human-chimpanzee divergence time has usually been put around 4 million years ago. Two things make this a deeper problem than it may appear. First, this estimate refers to the population divergence, and is a function both of the average genetic divergence and the variance among genetic loci in that divergence. That means that a simple recalibration to a lower mutation rate may not be enough to raise the estimate substantially.

    Second, the date of Ardipithecus isn't 4.4 million years -- it's 5.5 million, the time assigned to Ardipithecus kadabba. Unless they want to sunder the genus, White and colleagues really need a much higher population divergence time than the range most studies have been reporting.

    It's a complicated issue. So I was very interested to see which parts of this problem White and colleagues were especially focused on. How would they respond to the Sarich argument?

    [Sarmiento] argues that biomolecular studies accurately converge on a divergence date of approximately 3 to 5 million years ago, concluding that Ar. ramidus "probably predates the human and African ape divergence." However, his cited estimates vary widely and all rely on inadequate calibration. Indeed, the strongest calibration is now from hominids themselves: Late Miocene fossils from Chad, Kenya, and Ethiopia whose derived characters effectively falsify late divergence estimates (2).

    I found this really disappointing. There's no attempt here at any sensible critique of the molecular divergence time. Why is the calibration inadequate? What is the maximum human-chimpanzee divergence date we get by assuming that Chororapithecus is on the gorilla clade? Do they have a candidate for a significantly earlier pongine than Sivapithecus indicus? Do White and colleagues advocate an Eocene divergence of hominoids and cercopithecoids? Do they claim a mutation rate slowdown in humans, or in hominoids?

    Instead of giving a sensible response, White and colleagues resort to circular logic. In their description, molecular comparisons can never show that Ardipithecus is too early to be a hominin, because we can never accept a calibration that shows Ardipithecus is too early to be a hominin.

    References:

    Frayer DW. 1976. A reappraisal of Ramapithecus. Yearbook Phys Anthropol 18:19-30.

    Greenfield LO. 1979. On the adaptive pattern of "Ramapithecus". Am J Phys Anthropol 50:526-548.

    Sarich VM. 1971. A molecular approach to the question of human origins. In (P. Dolhinow & V.M. Sarich, Eds.) Background for Man: Readings in Physical Anthropology, pp. 60-81. Boston: Little, Brown.

    Sarmiento EE. 2010. Comment on the paleobiology and classification of Ardipithecus ramidus. Science 328:1105. doi:10.1126/science.1184148

    White TD, Suwa G, Lovejoy CO. 2010. Response to Comment on the paleobiology and classification of Ardipithecus ramidus. Science 328:1105. doi:10.1126/science.1185462

  • Were there Cretaceous anthropoids? Part 2: What is an anthropoid?

    Wed, 2010-05-26 16:13 -- John Hawks

    This is the second post in a series, "Were there Cretaceous anthropoids?"

    Before I go too far, I think I'd better make sure everybody knows what an anthropoid is. The living anthropoids include Old World and New World monkeys (the superfamilies Cercopithecoidea and Ceboidea) and apes and humans (superfamily Hominoidea). To give a more technical definition, applicable to fossil as well as living taxa, I can hardly improve on this passage from Williams et al. (2010):

    By definition, crown Anthropoidea includes all species, living and fossil, descended from the last common ancestor of extant anthropoids. Stem Anthropoidea includes all fossil taxa that are more closely related to crown anthropoids than they are to tarsiers, but are outside the anthropoid crown group.

    The concepts of "crown" and "stem" groups are very important to paleontological systematics. The concepts recognize that the most recent common ancestor of the living anthropoids (the crown ancestor) lived after the most recent common ancestor of anthropoids and any other primates (the stem ancestor). Williams and colleagues (2010) assume that tarsiers are the closest living relatives of anthropoids, and this is the most widespread hypothesis today despite some detractors. A systematist tries to distinguish crown and stem groups based on derived features. A fossil that shares one or more derived features with a particular group of living anthropoids (and not some others) would be interpreted as a member of the crown clade. By contrast, a fossil that shares some anthropoid derived features, but none of the derived traits of any particular group of living anthropoids, would be a stem anthropoid.
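
    The crown/stem distinction can be sketched in a few lines of code. This is only an illustration: the toy tree below is a hypothetical topology, "fossil_X" is an invented placeholder taxon, and the node names and the `classify` helper are my own labels rather than anything from Williams and colleagues. Eosimias is placed on the stem only because that is the interpretation discussed in this post.

```python
# Toy phylogeny stored as child -> parent links.
# Hypothetical topology for illustration only, not a published tree.
PARENT = {
    "Tarsius": "haplorhine_root",
    "anthropoid_total_group": "haplorhine_root",
    "Eosimias": "anthropoid_total_group",
    "crown_node": "anthropoid_total_group",
    "Cebus": "crown_node",
    "catarrhines": "crown_node",
    "fossil_X": "catarrhines",   # invented placeholder fossil
    "Macaca": "catarrhines",
    "Homo": "catarrhines",
}

# Living anthropoids define the crown group.
LIVING_ANTHROPOIDS = {"Cebus", "Macaca", "Homo"}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(taxa):
    """Return the most recent common ancestor node of a set of taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking up from one taxon, the first shared node is the most recent.
    for node in lineage(taxa[0]):
        if node in shared:
            return node


def classify(taxon):
    """Label a taxon as crown anthropoid, stem anthropoid, or neither."""
    crown_ancestor = mrca(LIVING_ANTHROPOIDS)
    if crown_ancestor in lineage(taxon):
        return "crown anthropoid"
    split_from_tarsiers = mrca(LIVING_ANTHROPOIDS | {"Tarsius"})
    split_from_taxon = mrca(LIVING_ANTHROPOIDS | {taxon})
    # Stem: joins the anthropoid line after the tarsier split,
    # but before the crown ancestor.
    if (split_from_taxon != split_from_tarsiers
            and split_from_tarsiers in lineage(split_from_taxon)):
        return "stem anthropoid"
    return "not an anthropoid"


for name in ("Homo", "fossil_X", "Eosimias", "Tarsius"):
    print(name, "->", classify(name))
```

    The point of the sketch is that "crown" and "stem" are defined by position on the tree, not by any single trait; the derived features are only the evidence used to place a fossil on the tree.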

    Williams and colleagues went on to list a number of features by which fossils might be recognized as stem anthropoids. These aren't always the features that would come to mind if you're thinking of living anthropoids.

    Many features distinctive to living monkeys and apes are soft tissue characters. These traits may be useful evidence about our relationship to other living primate superfamilies, because some of them are shared with tarsiers. Nowadays, many derived genetic markers are known to characterize anthropoids. Again, these are useful for ascertaining the relationships of anthropoids to other living primates, but not so useful for fossils.

    So I've extracted from Williams et al. (2010:4798) a list of those characters that comprise bony anthropoid derived characters:

    Most anthropoids have orbits that are relatively small, forward facing, and convergent (20). A bony lamina posterior to the orbit completely separates the eyes from the chewing muscles in the temporal fossa (21).

    Anthropoid features of the auditory region include a distinctive configuration of the internal carotid arterial system that supplies the orbit and much of the cerebrum (23). The middle ear cavity of the temporal bone extends forward into an air-filled accessory chamber containing a network of bony trabeculae (24). The tympanic bone that supports the eardrum is fused to the bony sidewall of the middle ear.

    Early fusion occurs in both the frontal metopic suture and the mandibular symphysis (25). The body of the mandible is relatively deep (26).

    Other anthropoid dental features include small, vertically implanted and spatulate lower incisors, simplified molar trigonids, and lower third molars with short heels (8).

    To these dental traits can be added another feature important to the identification of Eosimias as a stem anthropoid: the lower premolars are implanted obliquely in the jaw, rather than parallel to the mesiodistal axis of the mandibular corpus.

    And then there's the foot:

    The bony anatomy of the crown anthropoid foot is distinctive: the facet between the talus and fibula is steep-sided, and the groove for the tendon of the flexor fibularis muscle is in a midtrochlear position (6). The calcaneus is wide with a shortened heel and a distinctive calcaneocuboid joint shape (27). The peroneal tubercle on the first metatarsal that receives the tendon of the peroneus longus muscle is reduced in size (28).

    The ankle and foot are among the biggest contrasts between living tarsiers and anthropoids, as the tarsiers are highly derived in support of their leaping. Their fibulae are fused distally with the tibia and their foot bones are elongated and specialized.

    With this list of bony characters, we're relatively well-equipped to recognize an anthropoid skeleton. But fossils of potential stem anthropoids still present some obstacles. Of course, fossils are fragmentary, so a specimen may only preserve a small part of the anatomy. If it happens to be a partial mandible with the symphysis and a molar or two, we're on the right track. That gives us the opportunity to look at most of the dental and mandibular features that are derived in anthropoids. Or a partial skull that preserves the temporal features together with the orbit. That's really a rich environment for anthropoid-derived traits.

    The lack of associations among specimens can be a big problem. If we have a mandible with some anthropoid traits, and then we find a tibia, what are we to make of it? It may not be safe to assume that they come from the same kind of animal, even if both are anthropoid-like. If one has only one or two anthropoid-like features, and we assume that they come from the same species, it will influence our phylogenetic interpretation of other fossil lineages.

    The biggest obstacle is that the characters of crown anthropoids didn't all evolve simultaneously. A stem anthropoid may have some of them, but not others. It may also have its own derived traits not present in any crown anthropoids. In a complete specimen, this mixture of traits may be the best evidence for the pattern of evolution of the derived traits in the crown group. But in a fragmentary fossil, such a mixture may easily cause confusion.

    Next: Ghost lineages.

    References:

    Williams BA, Kay RF, Kirk EC. 2010. New perspectives on anthropoid origins. Proc Nat Acad Sci USA 107:4797-4804. doi:10.1073/pnas.0908320107

  • Were there Cretaceous anthropoids? Part 1. The problem in a nutshell

    Tue, 2010-05-25 00:19 -- John Hawks

    The evolution of early primates is a field that has developed rapidly in the last fifteen years. Many of the central issues were reviewed earlier this year by Blythe Williams, Richard Kay and E. Christopher Kirk ("New perspectives on anthropoid origins"). I want to touch on some issues, in a series of posts that may seem like a bit of a grab-bag.

    I got started with a fairly simple question -- the one from the title -- were there anthropoid primates in the Cretaceous? Since this has expanded into a series, I think I'd better lead with the answer: We really don't know.

    From the fossil perspective, fifteen years ago you'd have thought I was crazy to even ask the question. It was common knowledge that the living orders of mammals diversified after the extinction of the dinosaurs 65 million years ago. Even today, there are no widely-accepted anthropoid fossils earlier than the Middle Eocene. A few specimens argued to represent anthropoids are earlier, one as early as the Late Paleocene. But that's it. No positive evidence of earlier primate diversification, nothing that even looks like a primate before the Paleocene. Many paleontologists concerned with primate origins have assumed that the common ancestors of today's primates lived in the Late Paleocene, and that the primates had diverged from their closest relatives -- tree shrews, colugos, or bats -- sometime after the end of the Cretaceous.

    Opposed to this traditional view, molecular comparisons of living primates and other mammals have suggested an earlier diversification of primates, as early as 90 million years ago. For example -- and I'll review many others during the course of the series -- Steiper and Young (2006) estimated the ages of primate divergences from long sequences homologous to an area around the CFTR gene on human chromosome 7. Like most studies, they assumed some "calibration points" with dates based on fossil evidence. One of these was the human-chimpanzee divergence (7 million years, based on Sahelanthropus), the other was the macaque-baboon divergence (8 million years). Steiper and Young did not have a tarsier sequence, so they reported the estimated strepsirrhine/anthropoid divergence -- necessarily older than the first anthropoids, since tarsiers are the sister group of the anthropoid clade. Their estimate: between 88.2 and 110.2 million years ago. In addition to these "old" estimates, they reported a range of "young" estimates based on lower calibration times, only 60.8 to 75 million years ago. That's a mere 20 to 30 million years older than the oldest uncontroversial anthropoids.
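
    To see why the choice of calibration point matters so much, here is a minimal strict-clock sketch. It is not Steiper and Young's method, and both distances below are hypothetical values chosen only for illustration; the function name is my own. It also ignores ancestral polymorphism, treating sequence divergence and lineage divergence as the same event.

```python
def strict_clock_date(d_target, d_calib, t_calib_myr):
    """Date a split under a strict clock calibrated on one known split.

    d_target    -- pairwise sequence distance across the split to be dated
    d_calib     -- pairwise sequence distance across the calibration split
    t_calib_myr -- assumed age of the calibration split, in millions of years
    """
    # Rate per lineage: the pairwise distance accumulates along two branches.
    rate_per_myr = d_calib / (2.0 * t_calib_myr)
    return d_target / (2.0 * rate_per_myr)


D_CALIB = 0.012   # hypothetical distance across the calibration split
D_DEEP = 0.15     # hypothetical distance across a much deeper split

for t_calib in (7.0, 5.0):   # an older and a younger calibration age
    age = strict_clock_date(D_DEEP, D_CALIB, t_calib)
    print(f"calibration {t_calib} Myr -> deep split {age:.1f} Myr")
```

    Under these assumptions the deep date scales in direct proportion to the calibration age, which is why moving a single calibration point can shift every older estimate on the tree.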

    Primates are only one skirmish point of a much larger battle about the timing of diversification of mammal orders. DNA comparisons have consistently sketched out a long pre-Tertiary history for placental mammals. Many paleontologists favor a hybrid view, in which some superordinal groups of mammals -- like Afrotheria or Archonta -- existed in the Late Cretaceous, while the modern orders themselves got started after the extinction of the dinosaurs. That's a softer view of mammal diversification than the traditional idea of a rapid origination of all these groups after the K-T boundary. Even the hybrid hypothesis hides an apparent problem: It means that the K-T impactor must have spared dozens of distinct lineages of mammals, even as it wiped out every kind of dinosaur, large marine reptile, and 75% of the rest of species.

    For a brief review of this issue as applied to mammals generally, I can suggest a 2009 article by Jennifer Evans, "The disputed rise of mammals." She reports on the largest molecular phylogeny yet constructed for mammals, called the "supertree," and its apparent conflict with the fossil record:

    Although the supertree dated the origin of mammals at 93 million years ago and showed 43 placental lineages surviving the K/T boundary, Wible's analysis of more than 400 morphological characters in Cretaceous fossils across 69 taxa placed the oldest placentals at 63 million years ago. "There was no evidence in the fossil record that any of Cretaceous forms previously identified [by molecular biologists] as placentals were in fact placentals," says Wible.

    Although fossils were used to date divergence points on the supertree, they could only date back to around 55-65 million years ago, where paleontologists have fossil evidence of modern mammals. "Until there's [fossil evidence of] a Cretaceous primate that everyone agrees upon there will be conflict between molecular and paleontological evidence," says Ross MacPhee, from the American Museum of Natural History.

    Primates take center stage as one of the best-documented early mammalian lineages. Some expect to find anthropoids as early as 90 million years ago, yet the oldest well-established anthropoids are only around 45 million years old.

    You might think that the solution is simple. For example, we might posit a different mutation rate on early branches of the primate phylogeny. Or we might simply give up on proposed Miocene hominins like Ardipithecus. Move those calibration points, and you totally eliminate the conflict between molecular and fossil information. But there's uncertainty on the fossil side as well. We expect the known fossils to underestimate the ages of branches on the primate phylogeny -- missing information can do nothing else. Some scientists have suggested that the known primate record is fully consistent with a Cretaceous origin and diversification, given what we know about which lineages of living primates are missing from the fossil record.

    So understanding this problem will require some examination of both the fossil record and molecular evidence. Today this molecular side can be expanded to whole-genome comparisons of various kinds, and the data have expanded faster than anybody's analysis of them. That makes it a great topic -- there's work to do here, for those who understand the connections between the fossil and genetic records.

    Next: "What is an anthropoid?"

    References:

    Evans J. 2009. The disputed rise of mammals. The Scientist 23:47. Online.

    Steiper ME, Young NM. 2006. Primate molecular divergence dates. Mol Phylogenet Evol 41:384-394. doi:10.1016/j.ympev.2006.05.021

    Williams BA, Kay RF, Kirk EC. 2010. New perspectives on anthropoid origins. Proc Nat Acad Sci USA 107:4797-4804. doi:10.1073/pnas.0908320107

  • Return of the Neanderchimps

    Mon, 2010-05-17 23:42 -- John Hawks

    Back in 2005, I reviewed the first description of fossil chimpanzee teeth, from the Middle Pleistocene of the Kapthurin Formation, Kenya, dating to around 500,000 years ago. At the time, I noted that no chimpanzees have lived in the area in historic times, and that mtDNA evidence then suggested that East African chimpanzees (Pan troglodytes schweinfurthii) may have been recently derived from Central Africa. Together, those observations raised a mystery -- if today's chimps had no ancestors anywhere near Kenya 500,000 years ago, to what group did these fossil chimpanzee teeth belong? I suggested an answer: a cryptic population of chimpanzees partially or completely replaced by the dispersal of Eastern chimpanzees. In other words, Neanderchimps.

    Well, now that we know for sure that Neandertals are human, too... it's a good time to revisit the Neanderchimps. What can we say today about the population structure of chimpanzees in the past, and is it still possible that these chimpanzee fossil teeth are out of kilter with the population genetics of today's chimpanzees?

    A few weeks ago, we had Jody Hey visiting here on campus, and he gave a talk about his recent work on chimpanzee population genetics. Together with Rasmus Nielsen and others, Hey has been developing Bayesian methods for estimating the times of divergence, migration rates, and effective population sizes of species.

    The basic idea is that present-day samples of a species like chimpanzees reflect a branching process from an ancestral population. Each branch may exchange migrants with other branches, each branch has an effective population size, and each may begin with some kind of population bottleneck. That makes for a very complicated model -- for example, with only two populations, there are six parameters, not counting bottlenecks. Each additional population brings additional effective sizes, a time of splitting, and migration rates to and from all the other populations it coexists with. The number of parameters therefore grows much faster than the number of populations.
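
    The bookkeeping can be sketched as follows. The counting rules here are my assumptions about this class of model, not figures taken from Hey's papers: a fully bifurcating population tree, one effective size for every sampled and every ancestral population, one splitting time per ancestral population, and one migration rate in each direction between every pair of populations that exist at the same time. The function name is hypothetical.

```python
def isolation_migration_parameter_counts(k):
    """Count parameters for k sampled populations on a bifurcating tree.

    Assumed rules (see the paragraph above): no bottleneck parameters,
    one size per sampled or ancestral population, and directional
    migration between every pair of populations that coexist in time.
    """
    if k < 2:
        raise ValueError("need at least two sampled populations")
    splitting_times = k - 1              # one per ancestral population
    effective_sizes = 2 * k - 1          # k sampled + (k - 1) ancestral
    migration_rates = 2 * (k - 1) ** 2   # two directions per coexisting pair
    return {
        "splitting_times": splitting_times,
        "effective_sizes": effective_sizes,
        "migration_rates": migration_rates,
        "total": splitting_times + effective_sizes + migration_rates,
    }


for k in (2, 3, 4):
    print(k, isolation_migration_parameter_counts(k))
```

    With two populations these rules give one splitting time, three effective sizes and two migration rates, which matches the six parameters mentioned above.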

    Hey began this work several years ago, initially limited to the two-population case. Together with Yong-Jin Won, he showed that West African chimpanzees (P. troglodytes verus) have a substantially smaller effective size than central African chimpanzees (P. troglodytes troglodytes). These two subspecies appeared to have diverged within the last 300,000-400,000 years. And while there was little evidence for gene flow from central into west African chimpanzees, there was clear evidence for gene flow in the other direction, from west into central Africa.

    Sound familiar?

    In a series of two-way analyses, Won and Hey showed that bonobos diverged from chimpanzees approximately 400,000-800,000 years ago, that there was no substantial evidence of gene flow into or out of bonobos after their speciation, and that the effective size of bonobos was around the same as that of west African chimpanzees, a bit under 10,000 effective individuals.

    Now, in 2010, Hey has extended both the data and method to encompass more than a single divergence between two populations. In the case of Pan, Hey has included three extant subspecies of common chimpanzees (P. t. troglodytes, P. t. verus, and P. t. schweinfurthii), together with bonobos (P. paniscus). Among those, in a bifurcating model of population divergence, there are three splitting times, seven effective sizes (four sampled populations plus their three ancestral populations), and lots of asymmetrical migration rates, all scaled in one way or another to mutation rate. It takes a lot of data to estimate these parameters simultaneously. The study uses 73 loci from an average of 78 individuals split among the populations. That is apparently not quite enough data to get good parameter estimates for the migration rates: with a few exceptions, the probability surfaces for these are shallow and relatively unresolved.

    The parameters describing divergence times and effective sizes under the model have tighter posterior probability distributions, so that they are reasonably well estimated using these data. Here are the highlights:

    1. Bonobos split from chimpanzees around 930,000 years ago (680,000-1.54 million).

    2. The effective sizes of most populations were small (around 10,000 or less). The Pan ancestral population was moderately larger (around 17,000 effective individuals).

    3. Only central African chimpanzees were substantially larger in effective size, upward of 25,000-30,000 effective individuals during the last 460,000 years.

    4. All common chimpanzees (Pan troglodytes) descend from an ancestral population that existed 460,000 years ago (350,000-650,000).

    5. East African chimpanzees split very recently, only around 93,000 years ago (41,000-157,000) from central African chimpanzees.

    All these estimates result from a fairly restrictive model. Each population is described by two parameters, their interactions by an additional two parameters per population pair. The ideas of pulses of population mixture or founder effects are simply not possible in the model. I don't see this as a weakness -- I'd much rather begin with even simpler models. But it does mean that we cannot generalize the results past the model. In particular, we shouldn't compare these times and migration rates directly with those obtained under the model that Green and colleagues (2010) applied to the Neandertal genome.

    But after those words of caution, what can we make of this proposed population history for chimpanzees? Here are some possible conclusions relevant to human evolution:

    1. Eastern and central chimpanzee subspecies share a more recent history than would have been true of humans and Neandertal populations at the time the latter existed. Western chimpanzees are more distant from other chimps than the Neandertals and humans were from each other.

    2. For that matter, population differences between MSA humans within Africa may have been nearly as great as those between eastern and central African chimpanzee subspecies.

    3. Bonobos and chimpanzees split roughly a million years ago with little if any subsequent interbreeding. At least in the west (Africa, Europe and West Asia), Pleistocene human populations did not experience this kind of allopatric speciation. At the moment, I enter that as an assertion, which I'll follow up later by some discussion of the pre-Neandertal problem.

    4. The effective sizes estimated for ancient human populations are not especially low.

    5. Range expansions and partial or complete replacements were part of the population history of chimpanzees. They managed these dynamic events without handaxes, fire, projectile weapons, language, or any of the other proposed trappings of Pleistocene humans.

    I want to follow up on a couple of these. First, effective size: You often hear people claiming that humans have much lower genetic diversity than chimpanzees. It is true only in a limited sense. Bonobos, west African and east African chimpanzees are populations with lower genetic variation than humans. The estimate for the effective size of the common chimpanzee ancestral population, 7100, is substantially lower than estimated for the human ancestral population during the same time period, a period stretching from roughly a million to 460,000 years ago. The common ancestral population of chimpanzees and bonobos is inferred to have had an effective size close to that of ancestral humans at the same time, around 17,000 effective individuals prior to a million years ago.

    One may object that chimpanzees cover a much smaller area than Pleistocene humans, so we should expect their effective size to be much lower. But genetic variation can be related to population size only by assuming a population model, and Hey's analysis gives us a model quite starkly different from the usual. That doesn't mean it's correct, or that it is a better estimator of the census size of the ancient populations. But it reminds us that comparing the genetic variation of humans and chimpanzees is too simplistic; the gene trees within each population are very sensitive to the relative contributions of different parts of each species' range during the last 500,000 years. In chimpanzees, the high genetic variation mostly can be attributed to the central African subspecies; in humans, the extant genetic variation can mostly be attributed to Africa.

    Let's ponder chimpanzee range expansions for a moment longer. We know that in the early Middle Pleistocene, chimpanzee-like apes lived in western Kenya. The only chimpanzees who live anywhere near that area today seem to have been much more strongly connected to chimpanzees in western Congo prior to 93,000 years ago, and that central African population still has much more variation than the eastern ones. That suggests a recent range expansion, Late Pleistocene in age, into East Africa.

    We don't know that the earlier chimpanzees became extinct. They may have contributed genes into later P. t. schweinfurthii, just as Neandertals did into living humans. We can tell stories about climate change and the former East African chimpanzees, just as people have done about human origins, megadroughts and volcanoes. But one thing is clear about the chimpanzees: there was no modern chimpanzee revolution. The other chimpanzee subspecies, P. t. verus, is still here.

    UPDATE (2010-05-20): "More on chimpanzee population structure" discusses a subsequent paper on the same topic.

    References:

    Gagneux P, Gonder MK, Goldberg TL, Morin PA. 2001. Gene flow in wild chimpanzee populations: what genetic data tell us about chimpanzee movement over time and space. Phil Trans R Soc Lond B 356:889-897.

    Goldberg TL, Ruvolo M. 1997. Molecular phylogenetics and historical biogeography of east African chimpanzees. Biol J Linn Soc 61:301-324.

    Hey J. 2010. The divergence of chimpanzee species and subspecies as revealed in multipopulation isolation-with-migration analyses. Mol Biol Evol 27:921-933. doi:10.1093/molbev/msp298

    McBrearty S, Jablonski NG. 2005. First fossil chimpanzee. Nature 437:105-108. doi:10.1038/nature04008

    Won Y-J, Hey J. 2005. Divergence population genetics of chimpanzees. Mol Biol Evol 22:297-307. doi:10.1093/molbev/msi017

  • A low human mutation rate may throw everything out of whack

    Thu, 2010-03-18 16:30 -- John Hawks

    Last week, a paper looking for the genetic causes of Miller syndrome reported the whole genomes of four members of a single family: two siblings with the disorder and their two parents without. The idea was that they would simply compare the affected and unaffected genomes. They would then find candidate loci that might account for Miller syndrome in the affected siblings. By exploiting some other sources of information, they found what they were looking for. Daniel MacArthur covered the story in his post, "Disease hunting with whole genome sequences: the good news, and the bad news".

    I got interested in another aspect of the story. With whole-genome sequences of parents and offspring, it becomes possible to directly determine the rate of mutations in each generation. The paper by Roach and colleagues did just that -- they counted 28 mutations in the 2.3 billion bases of sequence they included in their comparison. That makes a per-site mutation rate of 1.1 x 10^-8 per generation.

    Which is a pretty interesting number. You see, it's less than half what it ought to be:

    [O]ur estimated human mutation rate is lower than previous estimates, the most widely cited of which is 2.5 x 10^-8 per generation (10) based on three parameters: a human-chimpanzee nucleotide divergence per site (Kt) of 0.013, a species divergence time of five million years ago, and an ancestral effective population size of 10,000. More recent estimates indicate a nucleotide divergence of 0.012 (9), species divergence time between six and seven million years ago (11–15), and ancestral effective population size between 40,000 and 148,000 (16–19). With these parameter ranges and a generation length of 15 to 25 years, the mutation rate estimate is between 7.6 x 10^-9 and 2.2 x 10^-8 per generation, which is consistent with our intergenerational estimate of 1.1 x 10^-8. Our estimate is within one standard deviation (SD) of an earlier estimate of 1.7 x 10^-8 (SD: 9 x 10^-9) based on 20 disease-causing loci (20). The rate we report is for autosomes, and should be several-fold lower than that of the Y chromosome, as in the male germline more cell divisions occur per generation. Though our rate differs approximately as expected from the recently reported estimate of 3.0 x 10^-8 (95% CI: 8.9 x 10^-9 – 7.0 x 10^-8) for the Y chromosome, the error rates make this difference not significant (21).

    You can see the obvious implication: If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.
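
    A rough sketch of the arithmetic behind that implication follows. It is not the calculation from Roach and colleagues. It assumes the standard neutral relation K = 2 * mu * (t + 2 * Ne), with times in generations, which is the same form the quoted passage relies on. The divergence of 0.012 and the rate of 1.1 x 10^-8 come from the quote; the 20-year generation length is an assumption, and the function names are made up for the example.

```python
def gene_divergence_years(k, mu, gen_years):
    """Average gene divergence time in years: K / (2 * mu) generations."""
    return k / (2.0 * mu) * gen_years


def ancestral_ne_for_split(gene_div_years, split_years, gen_years):
    """Ancestral effective size needed so that split + 2*Ne generations = gene divergence."""
    return (gene_div_years - split_years) / (2.0 * gen_years)


def split_years_for_ne(gene_div_years, ne, gen_years):
    """Species split implied by a given ancestral effective size."""
    return gene_div_years - 2.0 * ne * gen_years


K = 0.012      # human-chimpanzee divergence per site (from the quoted passage)
MU = 1.1e-8    # per-site, per-generation rate (from the quoted passage)
GEN = 20.0     # years per generation (assumption)

t_gene = gene_divergence_years(K, MU, GEN)
ne_needed = ancestral_ne_for_split(t_gene, 7e6, GEN)
split_50k = split_years_for_ne(t_gene, 50_000, GEN)

print(f"average gene divergence: {t_gene / 1e6:.1f} million years")
print(f"Ne needed for a 7-million-year split: {ne_needed:,.0f}")
print(f"split implied by Ne = 50,000: {split_50k / 1e6:.1f} million years")
```

    With these inputs the average gene divergence comes out near 11 million years. A 7-million-year split then needs an ancestral effective size close to 100,000, and an effective size of 50,000 puts the split near 9 million years. A different generation length shifts all three numbers.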

    There is a second implication. Most studies of human genetic variation have assumed that 5-million-year-old human-chimpanzee divergence and the high associated rate of mutations. If the true rate is less than half that, then the coalescence times of human genes are more than double most estimates. That would include our estimates of human-Neandertal genetic differences.
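
    The size of that correction is a simple ratio, because a date inferred from sequence differences is inversely proportional to the assumed mutation rate. A minimal sketch, using the two rates from the quoted passage; the 500,000-year input is a made-up placeholder, not an estimate from any study, and the function name is hypothetical.

```python
def rescale_date(date_years, old_rate, new_rate):
    """Rescale a date computed under old_rate to what new_rate would imply."""
    return date_years * old_rate / new_rate


OLD_RATE = 2.5e-8   # widely cited phylogenetic rate (from the quoted passage)
NEW_RATE = 1.1e-8   # pedigree-based rate (from the quoted passage)

factor = OLD_RATE / NEW_RATE
example = rescale_date(500_000, OLD_RATE, NEW_RATE)  # placeholder input date

print(f"dates grow by a factor of {factor:.2f}")
print(f"a 500,000-year estimate becomes {example:,.0f} years")
```

    The factor is about 2.3, which is where "more than double" comes from.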

    Well, that's a fine pickle.

    I'm not quite ready to believe the very low rate estimate. The analysis in this paper uncovered tens of thousands of false positives, and had to filter through those to arrive at 28 true mutations. The filtering involved resequencing all the positives to determine which were true and which were false, but maybe there's room in there for a substantial number of false negatives, too.

    If this low estimate were true of the human-chimpanzee divergence, it would imply vastly higher ages for other primate divergences, or a much lower rate on the human lineage specifically. So that allows another check on the process.

    But generally, I'll be looking at whole-genome family comparisons with great interest, because they will give us a much more precise understanding of the rate of mutations and recombinations across the genome.

    References:

    Roach JC and 14 others. 2010. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science (early online) doi:10.1126/science.1186802

    Synopsis: 
    Whole genome sequencing of a family finds a very low number of mutations, suggesting evolution doesn't have the timescale we thought.
  • Unbelievable Y chromosome differences between humans and chimpanzees

    Thu, 2010-01-14 00:11 -- John Hawks

    Holy crap!

    Indeed, at 6 million years of separation, the difference in MSY gene content in chimpanzee and human is more comparable to the difference in autosomal gene content in chicken and human, at 310 million years of separation.

    So much for 98 percent. Let me just repeat part of that: humans and chimpanzees, "comparable to the difference ... in chicken and human".

    This is from a new paper that's just shown up in the Nature advance publication zone. The authors are Jennifer Hughes and colleagues, and the subject is the first complete sequencing of the chimpanzee Y chromosome. "MSY" stands for "male-specific region of the Y chromosome" -- it's most of the Y, aside from a small fraction that recombines with the X chromosome.

    The Y chromosome was part of the initial chimpanzee genome draft, and was recognized then as a "clear outlier" in showing low human-chimpanzee sequence similarity (Chimpanzee Genome Consortium 2005). But it wasn't obvious just how different it was because the relatively short sequencing reads aligned fairly well with the human draft. That comparison also seems not to have included the missing genes (they might have just been missed during sequencing), or duplications. Moreover, the Y chromosome includes a high fraction of repetitive sequence, including long front-to-back, or "palindromic" passages. Only with very long reads with long overlaps is it possible to straighten out the large-scale sequence, and thereby detect sequence reorganizations and large copy number variants. This kind of intensive sequencing has so far been completed only for chromosome 21 and now the Y chromosome.

    I can't believe how sedated the reaction to this paper has been so far. The outcome of the sequencing is really, really weird. More than thirty percent of the chimpanzee Y chromosome has no homolog in humans, and likewise for the human Y in chimpanzees.

    I mean, really -- here's a map:

    [Figure: Chimpanzee compared to human Y chromosome]

    Just glancing at the ideograms, they don't even look like homologous chromosomes!

    Obviously they are; there's a whole lot of homologous sequence in there including functional genes. But the structure of both human and chimpanzee Y chromosomes has evolved incredibly fast compared to the rest of the genome.

    The central question: beyond its interest for Y chromosome structural evolution, what does this result say about the evolution of human (and chimpanzee) phenotypes?

    Option 1: Maybe nothing. The main mechanism for the rapid structural evolution was probably autologous recombination. Imagine that the Y chromosome wriggles around and different copies of repetitive sequences get together with each other.

    The molecular mechanisms that enabled this wholesale remodelling of ampliconic regions merit consideration. Although the chimpanzee and human MSYs do not normally participate in meiotic exchange with a partner chromosome, the mirroring of sequences in the ampliconic regions provides ample opportunity for ectopic homologous recombination within the MSY. This recombinational proclivity is well documented in the human MSY, where it has repeatedly given rise to large-scale structural polymorphisms during the past 100,000 years of human history as well as to Y-chromosomal anomalies that cause spermatogenic failure and sex reversal in current generations. We suggest that ectopic homologous recombination between MSY amplicons has similarly accelerated structural remodelling of the MSY in the chimpanzee and human lineages during the past 6 million years.

    That leads to rapid structural evolution, but not necessarily any functional changes.

    Option 2: Massive changes in gene regulation. Then again, widespread relocations of genes have a way of stripping them apart from upstream (or downstream) elements that may regulate their expression. Besides that, chimpanzees have lost several genes entirely, while humans have picked up a few that weren't in the common ancestor. So there's a potential for phenotypic evolution from these changes, possibly reverberating through the genome.

    In aggregate, the consequence of gene loss and gain in the chimpanzee and human lineages, respectively, is that the chimpanzee MSY contains only two-thirds as many distinct genes or gene families as the human MSY, and only half as many protein-coding transcription units.

    That's pretty amazing. They speculate that the most important phenotypic correlates of these genetic changes may be related to sperm or testicular function, which certainly is a target of rapid evolution elsewhere in the chimpanzee and human genomes.

    Option 3: Hitchhiking. OK, this isn't different from, or mutually exclusive with, the above, but it's worth remembering that it only takes a single advantageous mutation to fix the entire Y chromosome in the population. That event carries with it whatever strange mutations might be on the same copy as the initial advantageous change. This kind of event may have happened dozens or even hundreds of times on the chimpanzee and human lineages. Indeed, if it is common enough, hitchhiking can drive its own dynamic, since it tends to fix lots of slightly deleterious variations that later have to be repaired or accommodated.

    An interesting possibility: Maybe the extreme evolution of the Y chromosome in the emerging human and chimpanzee lineages explains the unusual similarity of their X chromosomes.

    I'm thinking back to the story about chumans and the divergence of chimpanzee and human lineages ("The dawn chumans"). Patterson and colleagues (2006) suggested that the two lineages had undergone some kind of hybridization event long after they began to diverge. This surprising hypothesis was meant to explain why the X chromosome shows a substantially lower level of genetic difference between humans and chimpanzees, compared to the average autosomal locus. I don't think that a late hybridization is necessary to account for X chromosome similarity. A large ancestral effective population size implies a wide variance in coalescence times in the ancestral population; the average on the X will be lower than the autosomes, and if there was any hitchhiking the X would be lower still.
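
    The expectation in that last sentence follows from the standard neutral result that, with equal numbers of breeding males and females, the X chromosome has three quarters the effective size of the autosomes. A rough sketch under that assumption follows. The split time, ancestral size, and generation length are placeholders chosen for illustration, and the function name is hypothetical. The sketch ignores any difference in mutation rate between the X and the autosomes, and it ignores hitchhiking.

```python
def expected_gene_divergence_years(split_years, ne, gen_years, ne_scale=1.0):
    """Expected gene divergence: the split plus 2 * (ne_scale * Ne) generations
    of waiting time in the ancestral population."""
    return split_years + 2.0 * ne_scale * ne * gen_years


SPLIT = 5e6    # placeholder species split, in years
NE = 50_000    # placeholder ancestral autosomal effective size
GEN = 20.0     # placeholder generation length, in years

autosomes = expected_gene_divergence_years(SPLIT, NE, GEN)
x_chrom = expected_gene_divergence_years(SPLIT, NE, GEN, ne_scale=0.75)

print(f"autosomes: {autosomes / 1e6:.1f} million years")
print(f"X chromosome: {x_chrom / 1e6:.1f} million years")
print(f"X relative to autosomes: {x_chrom / autosomes:.3f}")
```

    The larger the ancestral population, the bigger the gap between the X and the autosomes, with no hybridization required.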

    But...that X chromosome similarity might have a different explanation. A fraction of the human Y chromosome continues to recombine with the X. Imagine an initially rapid divergence of Y chromosomes within the chuman population. For a while, there might have been a strong selection pressure on the ancestral X to equip it for the structural diversity of the Y. Possibly an inverse relation would have emerged: as the Y becomes variable (possibly in partially isolated subpopulations), the X adapts to that variation until reproductive isolation finally occurs.

    Could this have been the proximate cause of human-chimpanzee reproductive isolation? The sex chromosomes are often implicated in speciation through Haldane's rule. It's a bit of speculation, but not too far from some discussion within the paper, particularly the relation between Y chromosome variations and infertility.

    References:

    Hughes JF and 16 others. 2010. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature (early online) doi:10.1038/nature08700

  • Reviewing the clock, and phylogenomics

    Tue, 2009-11-17 18:27 -- John Hawks

    After reading yesterday's penguin post, one of my readers thought I'd given up the ghost on the molecular clock.

    But notice the bottom line of that message: those ancient penguins didn't tell us anything new about the rate of mitochondrial changes over tens of thousands of years. The rate, over that time period, is pretty much what you would expect from comparing humans, or comparing Neandertals. Considering that the generation-to-generation rate of mutations of the mitochondrial DNA is maybe an order of magnitude higher, I'd say that consistency is pretty impressive.

    Much more important, when it comes to comparing humans and chimps, we've come billions of base pairs beyond the mitochondrial DNA alone. We have drafts of the complete genomes of humans, chimpanzees and macaques, and working drafts for gorillas, orangutans, and a handful of other primates. We have a better ability than ever to reconstruct the phylogenetic relationships of those species, the times that they diverged from each other, and even something about the number of individuals and structure of their ancient populations.

    For the past five years, almost every study including more than a single gene has agreed on one central fact: humans and chimpanzees last exchanged genes less than 6 million years ago. Most of them place the date much younger -- an average of less than four and a half million years ago.

    Still, these kinds of comparisons can be quite complicated, and many -- maybe most -- of my paleoanthropology colleagues would prefer to remain ignorant of the details.

    I can kind of sympathize. If somebody is willing to say it could be 6 million years, well, that doesn't sound so different from seven. And Sahelanthropus is only seven. What's the problem anyway?

    I've got to say, though, that attitude is a fundamental lack of seriousness about the data. It's like if I said about Lucy, "Hey it's just a pelvis, right, what's the big deal?"

    Well, it's the evidence, that's what. A. afarensis is a large and substantial sample with dozens of shared homologous features with humans and other hominins. If the genetics told us that humans and chimpanzees diverged less than 2 million years ago that would be a substantial conflict. Either that estimate would be wrong, or much of what we thought we knew about the pattern of hominin evolution would be.

    We are in fact at that point in genetics. If the human-chimpanzee divergence really were much older than 5 million years ago, then much of what we think we know about population genetics of primates must be wrong.

    I understand that many of my readers might welcome that suggestion. I, on the other hand, am having a hard time figuring out just how I'm supposed to make the divergence date much older than the current best estimates. In the 1990s, it was fashionable to just say that the clock was wrong, because our estimates of mutation rate were wrong, and leave it at that. People even did silly things like provide "confidence intervals" based on different assumptions about the human-orangutan divergence. If it was 12 million years ago, you'd get one (low) answer; if it was 16 million years ago, you'd get another (high) answer. Report the low and high ends, there's your "confidence" interval. Human-chimpanzee divergence: 4 to 6 million years.

    It was a joke, but that's where things stood.

    Nowadays, we know an awful lot more about the relations of these populations. I'm going to point everybody to a recent review paper -- it was released the same week as Ardipithecus was -- by Adam Siepel, in Genome Research. It's a very good review of the recent literature on the human-chimpanzee divergence, and by implication the human-gorilla and other primate divergences. It is not about building a phylogenetic tree -- it's about how we use sequence data from many genes to put together a phylogenomic tree, one that involves the divergences of populations and also their inbreeding and selection characteristics.

    The time we estimate for a population divergence depends on the size of the ancestral population, as well as the pattern of selection within it. These factors also affect the sorting of gene variants of the ancestors into the descendant populations. As Siepel points out, these effects have led to two different methods of examining the demography and divergence times of ancient species:

    Two simple, but ingenious, approaches were proposed early on, both of which exploited the fact that, with sparse sampling across the genome, the loci under study were likely to be unlinked, and their genealogies could be assumed to be statistically independent. The first method, by Takahata (1986), derived information about ancestral population sizes from the variance in the estimated divergence times for pairs of orthologous sequences. The second, by Wu (1991) (see also Hudson 1983a; Nei 1987), made use of the variance in tree topologies estimated from three or more orthologous sequences. Takahata's method essentially estimated [population divergence time] and [effective size] from the variance in estimates of [genetic divergence time] at multiple loci (in the notation above), while Wu's method estimated [effective size] from the relative frequency of topological inconsistency in reconstructed gene trees.

    Those topological inconsistencies began to show up during the 80's and 90's, when people would publish sequences that favored human-gorilla or chimpanzee-gorilla clades. These were genes in which humans really were more closely related to gorillas, because the human-chimpanzee (chuman) ancestral population was large enough to retain two divergent alleles for the two million or so years that chumans existed.
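
    Both approaches in the quoted passage can be sketched with textbook coalescent results. For three species, the chance that a gene tree disagrees with the species tree is (2/3) * exp(-T / (2 * Ne)), where T is the number of generations between the two splits; inverting it gives an effective size, in the spirit of Wu's method. For Takahata's idea, the waiting time in the ancestral population is roughly exponential with mean 2 * Ne generations, so its standard deviation equals its mean. The sketch below is a simplification and not the procedure of any paper in the review: it ignores mutational sampling noise, rate variation among loci, and selection. The function names and input values are placeholders, apart from the two-million-year chuman interval mentioned above.

```python
import math


def discordance_probability(internode_years, ne, gen_years):
    """Chance that a gene tree mismatches the species tree (three species)."""
    t_gen = internode_years / gen_years
    return (2.0 / 3.0) * math.exp(-t_gen / (2.0 * ne))


def ne_from_discordance(p_discordant, internode_years, gen_years):
    """Invert the formula above: effective size from the observed mismatch frequency."""
    if not 0.0 < p_discordant < 2.0 / 3.0:
        raise ValueError("mismatch frequency must lie between 0 and 2/3")
    t_gen = internode_years / gen_years
    return -t_gen / (2.0 * math.log(1.5 * p_discordant))


def moments_split_and_ne(mean_div_years, sd_div_years, gen_years):
    """Method-of-moments version of the variance idea:
    mean = split + 2*Ne generations, sd = 2*Ne generations."""
    ne = sd_div_years / (2.0 * gen_years)
    split_years = mean_div_years - sd_div_years
    return split_years, ne


GEN = 20.0  # years per generation (assumption)

p = discordance_probability(2e6, 50_000, GEN)      # placeholder Ne
ne_back = ne_from_discordance(p, 2e6, GEN)
split, ne_mom = moments_split_and_ne(7e6, 2e6, GEN)  # placeholder mean and sd

print(f"mismatching gene trees: {p:.3f}")
print(f"Ne recovered from that frequency: {ne_back:,.0f}")
print(f"moments estimate: split {split / 1e6:.1f} million years, Ne {ne_mom:,.0f}")
```

    Under these placeholder values about a quarter of gene trees would group humans or chimpanzees with gorillas. A small ancestral population would push that fraction toward zero, which is why the mismatching genes are informative about ancestral size.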

    Siepel goes on to review the literature using variants of these two approaches during the last seven or eight years. The Nature chimp-human hybridization paper by Patterson and colleagues (2006, which I reviewed here) forms a central part in the discussion, as people have reacted to that paper and the major issue it raised.

    Reading the review, one cannot help but notice the low age estimates that keep coming up again and again. Most of them are under 4.5 million years. Patterson and colleagues had one of the highest recent estimates, putting the speciation at less than 5.4 million years. That's because they assume a smaller effective size in the ancestral lineages -- pushing the date higher. The more that demography fiddles with the assortment of ancestral genes before a population divergence, the younger the resulting estimate of divergence date will be.

    To make the date older, you need to assume there was no demography -- an extreme chuman bottleneck. But that would be inconsistent with the evidence of incomplete lineage sorting -- those gorilla genes that we share. And it would take some magical rate discontinuities among genetic loci to give them the amount of interlocus variability that they have.

    The review mentions some recent work suggesting that background selection may have reduced the site diversity in the ancestral species -- work to explain why the human X chromosome is even more similar to chimpanzees than the autosomes. Taken to an extreme, background selection or massive hitchhiking could raise the divergence estimate a bit, but it doesn't overcome the issue of incomplete lineage sorting, either.

    You could push the human-orangutan divergence higher, or the human-macaque divergence, both of which help to calibrate the mutation rate. But that's not going to make 4 million years into 8 million, not unless orangutans diverged from us in the Oligocene.

    You could propose a massive slowdown in mutations in the chuman lineage. But why? How? Like I said earlier, you'd have to change something pretty fundamental about our understanding of primate genetics.

    No, it's very hard to see how these dates are going to get much older. What I'm saying is that you can't just wave them away; these are serious estimates and I don't see any simple way to get a better one.

    Now, the question is, do the geneticists insufficiently appreciate the hominins? Do they just not care about the havoc this wreaks in paleoanthropology-land?

    In fact, Siepel addresses this issue. The review mentions that Patterson and colleagues (2006) offered their hybridization idea in part to explain the early "hominin", Sahelanthropus. With the revelation of Ardipithecus' postcranial anatomy, I don't think we need to resort to chuman hybrids.

    I think it's more parsimonious to imagine a widespread population of chumans, a large-bodied, basically Ardipithecus-like primate, structured into regional populations in much the way that today's chimpanzees and gorillas are. This population was numerous and stable, and it gave rise over time to many more arboreally adapted branches -- first the gorillas and later the chimpanzees. The remainders, as it were, became the hominins.

    There are various hangups with this scenario that make me hesitate. I do take Orrorin seriously, for example -- it is hard to accommodate a 6-million-year-old hominin under the large-population recent-divergence hypothesis.

    And on the genetic side, the substitution rate in the nuclear genome is affected by positive selection, background selection, duplications and unequal crossing over. It's quite possible that some odd demographic scenario might reduce the genetic divergence date yet further, or increase it to some extent.

    What's encouraging is that today's dense genetic data and fast modeling give us the chance to test these scenarios. We can model selection and demography directly and compare results to observed genetic patterns.

    OK, it's bedtime. More on this later...

    References:

    Siepel A. 2009. Phylogenomics of primates and their ancestral populations. Genome Res 19:1929-1941. doi:10.1101/gr.084228.108
