john hawks weblog

paleoanthropology, genetics and evolution

population genetics

  • Inbreeding impression

    Wed, 2012-09-26 00:13 -- John Hawks

    I ran across an io9 article from 2011, "Why inbreeding really isn’t as bad as you think it is", which is topical for some of the genetics I'll be teaching over the next couple of weeks in my introductory course. It has a lot of fun details about historical inbreeding, including the case of Charles II of Spain:

    From 1550 onward, not a single outsider married into the Spanish royal line. The result of all this was Charles II, quite possibly the most inbred person in history.

    Charles's ancestry was so ridiculously intertwined that he actually had a higher relationship coefficient than the child of two siblings, and 95.3% of his genes could be traced back to just five ancestors. While the previous kings had escaped their already considerable inbreeding relatively unscathed, Charles suffered from massive mental, physical, and emotional disabilities, earning him the nickname El Hechizado, "The Hexed."

    The article does a very good job of describing the effects of bottlenecks, concepts like "pedigree collapse", and the consequences of the exponential growth in the number of genealogical ancestors as we go back in time.
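    Two of these ideas are easy to put in numbers. The sketch below assumes the usual idealization of exactly two parents per person per generation, so there are 2^g ancestor slots g generations back; for any population size N (a hypothetical input here, not a historical estimate) there is a generation beyond which the slots must be filled by repeated individuals, which is pedigree collapse. It also applies Wright's path-counting rule, F = sum of (1/2)^(n1 + n2 + 1)(1 + F_A) over common ancestors, to two simple pedigrees. The helper names are illustrative.

```python
def ancestor_slots(generations_back):
    """Number of genealogical ancestor positions g generations back (2^g)."""
    return 2 ** generations_back

def first_generation_exceeding(population_size):
    """Smallest g at which 2^g ancestor slots exceed a population of the given size.

    Beyond this point some slots must be occupied by the same people (pedigree collapse).
    """
    g = 0
    while ancestor_slots(g) <= population_size:
        g += 1
    return g

def inbreeding_coefficient(paths):
    """Wright's path method.

    Each path is (n1, n2, f_ancestor): n1 and n2 are the generations from each parent
    up to a common ancestor, and f_ancestor is that ancestor's own inbreeding coefficient.
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths)

# Child of two full siblings: two common ancestors (the grandparents), one generation above each parent.
sibling_child = inbreeding_coefficient([(1, 1, 0.0), (1, 1, 0.0)])
# Child of first cousins: two common ancestors (the cousins' shared grandparents), two generations above each parent.
cousin_child = inbreeding_coefficient([(2, 2, 0.0), (2, 2, 0.0)])

print(ancestor_slots(30))
print(first_generation_exceeding(1000))
print(sibling_child, cousin_child)
```

    A child of full siblings comes out at F = 0.25, the benchmark the quoted article compares Charles II against.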

  • Tomoko Ohta profile

    Thu, 2012-08-23 01:16 -- John Hawks

    Current Biology has published an interview of the esteemed Japanese population geneticist Tomoko Ohta [1].

    But, you chose not to stay in the US? I was a Fulbright student, and four years was the maximum time students were allowed to stay in the US. So, in 1966, after finishing my PhD, I went back to Japan. I asked Dr Motoo Kimura at the National Institute of Genetics, Mishima, if I could do research in his laboratory, simply because he was the only theoretical population geneticist in Japan at that time. At first, he was skeptical to let me do research in his field, but he finally accepted me as a postdoctoral fellow. Kimura was a typical Japanese man of his time, who regarded women's scientific activities as insignificant. After two years or so, I had convinced him that I should continue to do research.

    I think that if we plotted biologists on two axes, (1) Scientific Value, and (2) Public Awareness of Their Work, Kimura and Ohta would be outliers with high value and unusually low awareness.

    (via Sandwalk)


    References

    1. Ohta T. Tomoko Ohta. Current Biology. 2012;22(16):R618–R619.
  • Effective size through genealogy

    Thu, 2011-11-24 23:48 -- John Hawks

    Sandwalk: "What William the Conqueror's Companions Teach Us about Effective Population Size".

    Let's assume that there are 20 well-documented companions. Only one of these (William Mallet) has possibly passed on his Y chromosome to the present time and even that male line of descent is disputed. This is fully consistent with our understanding of genetics when you consider that most male lines are likely to die out in a few generations. Those that survive ten generations or so are unlikely to become extinct since there will likely be several male lines at that time.

    Only 10 of the companions have descendants who are alive today. This could be due to the fact that genealogists don't have perfect records for all the companions and their families but it's also quite in line with expectations.

    A nice illustration, with a link to my own review article (available free here), "From genes to numbers: Effective population sizes in human evolution".
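    The fate of male lines can be sketched as a Galton-Watson branching process. Assuming (our simplification, not the Sandwalk post's) that each man has a Poisson-distributed number of sons with mean m, the probability that a single patriline has died out by generation g follows from iterating q_g = exp(m(q_{g-1} - 1)) starting from q_0 = 0. With m = 1, a stationary population, extinction piles up quickly over the first few generations; with m = 1.5 it levels off at the smallest root of q = exp(1.5(q - 1)), about 0.42.

```python
import math

def male_line_extinction(mean_sons, generations):
    """Cumulative probability that one male line is extinct by each generation.

    Assumes a Galton-Watson process with Poisson(mean_sons) sons per man, whose
    probability generating function is f(s) = exp(mean_sons * (s - 1)).
    """
    q = 0.0  # generation 0: the founder himself is alive
    history = []
    for _ in range(generations):
        q = math.exp(mean_sons * (q - 1.0))
        history.append(q)
    return history

stationary = male_line_extinction(1.0, 20)
print([round(p, 3) for p in stationary[:3]])
print(round(stationary[9], 3), round(stationary[19], 3))
print(round(male_line_extinction(1.5, 200)[-1], 3))
```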

  • Overdominance and rapid adaptation

    Mon, 2011-08-22 13:19 -- John Hawks
    Publication information: 

    This is a pre-publication manuscript. Please contact the authors for information about how to cite this article.

    Work status: 

    This paper is substantially complete but is being transitioned from a LaTeX manuscript piece by piece, and so is not yet completely on the website.

    Abstract: 

    We suggest that in diploid organisms, most adaptive mutations of large fitness effect are overdominant. R. A. Fisher's geometric model of adaptation, which has been fruitfully applied to investigate the size and dynamics of adaptive mutations, has previously been limited to the haploid case. By extending it to the diploid case for mutations of additive effect on phenotype, we show that only a small fraction of large-effect mutations that bring the heterozygote phenotype closer to the optimum will also bring the homozygote phenotype closer to the optimum. More generally, the probability that a mutation is adaptive in heterozygotes is greater than the probability that it will be equally or more adaptive in homozygotes. Together, these considerations imply that most mutations that are adaptive in heterozygotes will have lower fitness in homozygotes and will therefore reach an equilibrium frequency, not fixation. We show that this theoretical expectation is consistent with the evidence of recent adaptive mutations in human populations, where a significant deficit of fixed or near-fixed selective sweeps has been identified compared to the number of apparent new adaptive gene variants. Partial selective sweeps in human evolution should be much more common than complete selective sweeps.

    Introduction

    The last decade has seen an explosion of interest in ongoing and recent natural selection in human populations. The HapMap project and other surveys of SNPs in human populations revealed many regions in the genome where one SNP allele is surrounded by large regions of linkage disequilibrium while the other is on a more heterogeneous background [1][2][3][4]. The inference has been that the former were undergoing or had undergone a selective sweep, and had experienced an increase in frequency too great to be explained by genetic drift. There have been attempts to construct bottlenecked population histories that would explain these patterns, but the elevated concentration of this pattern in genic regions apparently makes such explanations unlikely [1][2].

    A puzzling finding of this line of research is that the sweeping SNPs are disproportionately at intermediate frequencies rather than near fixation. Part of the explanation is that the methods for discovering these regions usually rely on contrasting the linkage disequilibrium around the putative sweeping SNP allele with that around the alternative allele, so both kinds of chromosomes must be present. But there is also a "shortage" of fixed genomic regions showing the high disequilibrium that complete sweeps would produce.

    Why are there so many incomplete sweeps? Some have suggested that most phenotypic adaptation occurs through "soft sweeps" or polygenic adaptation, in which evolution of a phenotype occurs because of slight changes in frequency of many standing genetic variants [5]. But soft sweeps of standing genetic variants do not explain the pattern of long-range LD haplotypes of intermediate frequency. These appear to be young haplotypes that have risen to intermediate frequencies quite rapidly, not old haplotypes that have changed in frequency marginally. One good possibility is what we have called the "stooge effect", after the Three Stooges all trying to get through a doorway at the same time. Selection following an environmental change will favor any allele (either standing variation or new mutation) that provides increased fitness in the new environment, perhaps leading to change at many loci that decreases the fitness advantage of any one advantageous mutation. For example, the fitness advantage of sickling hemoglobin must have been much greater long ago, when there were no high-frequency genetic responses to malaria. The competition among adaptive alleles may slow down and perhaps stop the sweep of any one new mutant.

    We know that some sweeping alleles, for example some of the hemoglobinopathies, are loss-of-function mutations, broken versions of the ancestral allele. They can have strong negative effects in homozygotes, who have no working copy of the gene. Such overdominant mutations are easy to recognize and understand.

    Our analysis (based on Fisher's geometric model) suggests that overdominance is common in newly originated beneficial alleles of large effect, even when the mutation changes gene function rather than reducing or eliminating it. This is a natural consequence of a high degree of pleiotropy. Such overdominant alleles will never fix. While homozygote fitness will be lower than heterozygote fitness in these cases, usually it will not be extremely low: it will not cause death or obvious disease, and so it will not be easily observed.

    Fisher's geometric model

    Fisher considered adaptation as an aspect of an organism's phenotype.

    An organism is regarded as adapted to a particular situation, or to the totality of situations which constitute its environment, only in so far as we can imagine an assemblage of slightly different situations, or environments, to which the animal would on the whole be less well adapted; and equally only in so far as we can imagine an assemblage of slightly different organic forms, which would be less well adapted to that environment [6].

    Fisher defined the organism's fitness as its capacity for intrinsic population growth, the 'Malthusian parameter'. He imagined an optimal phenotype from which no change in the phenotype, however slight, could increase the organism's fitness (in some particular environment, obviously). A well-adapted organism would be one whose phenotype was very close to that optimum. In his discussion, he did not distinguish between the case where the optimal form is the phenotype of an individual and the case where it is the average phenotype of a population. Fisher used this model as part of his argument that most evolutionary changes are small, suggesting that most individuals would be very near the average phenotype of the population most of the time.

    Fitness can thus be considered as a function of position in a multidimensional phenotype space. Fisher assumed that the distribution of fitness in that phenotype space has a single optimum (point O). Coordinates are normalized so that fitness is a function of the distance from O and declines monotonically as that distance increases.

    Consider a non-optimal phenotype, which we model as a point A in this phenotype space, at a distance d/2 from O. Fisher considered this distance as the radius of a multidimensional sphere centered on O. All the points on that sphere have the same suboptimal fitness; you might say that they are all equally poorly adapted.

    Now imagine a mutation that shifts the phenotype a distance r in some random direction, moving it to a new position B. If B is inside the hypersphere, it will be closer to O than is the initial point A, and so the mutation will be favored by selection. If B is on the hypersphere, the mutation will have equal fitness to the wild type; if B is outside the hypersphere, the mutation is deleterious relative to the wild type. Hence, the probability that the mutation is adaptive is the same as the probability that B is inside the hypersphere.

    We can derive this probability by considering the boundary condition in which the new position B lies exactly on the hypersphere: that is, when the distance |AO| = |BO|. The angle θ' between AO and AB in that case determines the maximum angle of a change that improves fitness: θ' = arccos(r/d).

    [Figure 1 here]

    If the angle between AB and AO is less than θ', B is closer to the optimum than A and fitness increases. If the angle is greater, B is farther from the optimum than A and fitness decreases.

    In order to determine the probability that a random change of size r will increase fitness, we need to find what fraction of the surface of a hypersphere of dimension n lies within the cap with half-angle θ'. Hartl and Taubes [7] gave an exact expression for this probability:

    \begin{equation}
    \frac{\int_0^{\theta'} \sin ^{n-2} \theta d\theta}{\int_0^{\pi}
    \sin^{n-2} \theta d\theta}
    \label{eq:surface-fraction-adaptive}
    \end{equation}

    Fisher argued that n, the effective dimensionality of the phenotype space, was likely to be large in real cases because many different traits influence biological success. He proceeded to develop a large-n approximation for this probability integral that gives considerable insight. Back in 1930, it must also have been considerably easier to calculate than the exact integral.

    \begin{equation}
    \frac{1}{\sqrt{2 \pi}} \int_{x}^{\infty} e^{-t^2/2}\,dt, \qquad x = \frac{r\sqrt{n}}{d}
    \end{equation}

    One can see from this expression that the probability of a favorable change is close to 1/2 when r is small, while decreasing rapidly as the size of the change becomes larger than d/√n, which one might call the "standard magnitude" of change. Fisher concluded that mutations with small favorable effects are the main players in adaptive evolution.
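    The exact Hartl and Taubes fraction and Fisher's approximation can be compared directly. This sketch integrates sin^(n-2) θ with Simpson's rule (our choice of quadrature) and evaluates Fisher's normal tail with math.erfc. It assumes |AO| = d/2 = 1, that is d = 2, which appears to be the scaling behind the heterozygote table below; the function names are hypothetical.

```python
import math

def _simpson(f, a, b, steps=20_000):
    """Composite Simpson's rule for f on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def p_adaptive_exact(r, d, n):
    """Hartl-Taubes fraction of random directions that bring the phenotype closer to O."""
    if r >= d:
        return 0.0  # B cannot land inside the sphere of radius d/2
    theta_crit = math.acos(r / d)

    def sin_power(theta):
        return math.sin(theta) ** (n - 2)

    return _simpson(sin_power, 0.0, theta_crit) / _simpson(sin_power, 0.0, math.pi)

def p_adaptive_fisher(r, d, n):
    """Fisher's large-n approximation: upper standard-normal tail beyond x = r*sqrt(n)/d."""
    x = r * math.sqrt(n) / d
    return 0.5 * math.erfc(x / math.sqrt(2))

# Assumption: |AO| = d/2 = 1, i.e. d = 2. Each cell prints "exact (Fisher)".
for n in (16, 32, 64, 128):
    cells = []
    for r in (0.01, 0.1, 0.25, 0.5):
        cells.append(f"{p_adaptive_exact(r, 2.0, n):.4f} ({p_adaptive_fisher(r, 2.0, n):.4f})")
    print(n, "  ".join(cells))
```

    The two agree closely for large n; the approximation underestimates the exact value slightly in the tails when n is small.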

    But there are other factors that Fisher did not consider in his analysis. Kimura [8] considered an additional aspect of selective dynamics that influences the effect size of adaptive phenotypic changes: new mutations that confer small increases in fitness are likely to be lost by chance, almost as likely as a neutral mutation. The probability that a beneficial mutation survives increases linearly with its fitness benefit [9]. Thus, although larger phenotypic changes are less likely to be adaptive, changes of large effect that are adaptive are more likely to persist in the population. Kimura showed that a population is most likely to undergo adaptive changes that are intermediate in effect: large enough to survive genetic drift, but small enough that they remain relatively likely to move the phenotype closer to the optimum.
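    Kimura's trade-off can be sketched numerically in Fisher's scaled units, x = r√n/d. The sketch multiplies Fisher's probability that a change is favorable by a fixation probability proportional to x. Treating the selective advantage s as proportional to x is a simplifying assumption made here for illustration, and the function names are hypothetical. Under that assumption the rate of adaptive substitution peaks at an intermediate x.

```python
import math

def prob_adaptive(x):
    """Fisher's approximation: chance that a random change of scaled size x is favorable."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def relative_rate(x):
    """Relative rate of adaptive substitutions of scaled size x.

    Assumption: fixation probability is about 2s (Haldane) with s proportional to x,
    so the rate is proportional to x * prob_adaptive(x); constant factors are dropped.
    """
    return x * prob_adaptive(x)

def most_likely_size(x_max=4.0, steps=40_000):
    """Grid search for the scaled size x that maximizes relative_rate."""
    grid = (x_max * i / steps for i in range(steps + 1))
    return max(grid, key=relative_rate)

best = most_likely_size()
print(round(best, 2), round(relative_rate(best), 4))
```

    Very small changes are favorable about half the time but rarely survive drift; very large ones survive if favorable but are rarely favorable.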

    Orr [10] considered not only the first adaptive change, but the entire series of adaptive changes as a population approaches a phenotypic optimum. He showed that Kimura's relation held for the first adaptive change, bringing the population a considerable distance toward the optimum. Subsequent changes are likely to be smaller. As the phenotype nears the optimum, large phenotypic changes are less and less likely to bring it closer, so the entire sequence is dominated by small changes. Considering the process as a whole, Orr showed that the effect sizes of adaptive changes will be approximately exponentially distributed. Extreme value theory likewise predicts an exponential distribution for the fitness effects of beneficial mutations [10].

    Diploid genotypes and Fisher's model

    Fisher's geometric analogy holds up well for haploids, because the phenotypic change induced by a mutation may be thought of as a vector, just as in Fisher's model. The difference between individuals and populations need not be strictly defined in the model, because the effect of a mutation on the fitness of an individual will be the same as its effect on the fitness of a population. Given the amount of effort put into extending Fisher's phenotype model to mutational effects, even in diploid organisms like Drosophila, it seems remarkable that nobody has observed that the analogy does not hold for diploid genotypes. Diploids are difficult to treat with this geometric model, because each genotype may have a distinct phenotypic effect. Unlike the haploid case, we cannot assume that the effect of a mutation in an individual is the same as the effect of a substitution in the population. For an autosomal mutation to proceed to fixation in the population, thereby becoming a substitution, a mutant homozygote must have fitness equal to or greater than that of a heterozygote. Even if a heterozygote has a phenotype that is intermediate between the original-allele and mutant homozygotes, its fitness may not be.

    For simplicity, we assume that phenotypic change is a linear function of gene dosage. In that case, two copies of the mutant allele result in exactly twice the change in phenotype caused by one copy. We designate the original phenotype as point A and the phenotype resulting from one copy of the mutation as point B, as before. We will designate the phenotype in mutant homozygotes as point C. This means that the distance AC is exactly double AB (that is, 2r in Fisher's model). Given this constraint, we can find the angle φ at which heterozygotes and homozygotes have equal fitness, that is, where |BO| = |CO|. Since the displacement is larger, φ, the critical angle for homozygotes, is smaller than θ', the critical angle for heterozygotes.

    [Figure 2 here]
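    The critical angles follow from the law of cosines. A worked version, derived here from the definitions above rather than taken from Fisher or Hartl and Taubes: with |AO| = d/2, |AB| = r, |AC| = 2r under the linear-dosage assumption, and θ the angle between AB and AO,

    \begin{equation}
    |BO|^2 = \frac{d^2}{4} + r^2 - r d \cos\theta, \qquad |CO|^2 = \frac{d^2}{4} + 4r^2 - 2 r d \cos\theta
    \end{equation}

    Comparing these with |AO|^2 = d^2/4 and with each other gives three thresholds: heterozygotes are fitter than the wild type when cos θ > r/d, which recovers θ' = arccos(r/d); mutant homozygotes are fitter than the wild type when cos θ > 2r/d; and mutant homozygotes are fitter than heterozygotes when cos θ > 3r/d. The angle defined by |BO| = |CO| is therefore

    \begin{equation}
    \varphi = \arccos\left(\frac{3r}{d}\right) < \theta' = \arccos\left(\frac{r}{d}\right), \qquad 0 < r \le \frac{d}{3}
    \end{equation}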

    We can use the critical angle φ to calculate the probability that individuals who are homozygotes for a beneficial mutation of effect size r are fitter than heterozygotes. If r is small, about 50% of mutations increase heterozygote fitness, almost all of which confer even higher fitness in homozygotes. If r is large, most mutations will not increase heterozygote fitness, and even those that do are unlikely to increase fitness further in homozygotes.

    \begin{table}[h]\centering
    \begin{tabular}{|c|cccc|}
    \hline
    & \multicolumn{4}{c|}{$|r|$} \\
    $n$ & $0.01$ & $0.1$ & $0.25$ & $0.5$ \\
    \hline
    16 & 0.4924 & 0.4244 & 0.3163 & 0.1666 \\
    32 & 0.4890 & 0.3911 & 0.2441 & 0.0803 \\
    64 & 0.4842 & 0.3462 & 0.1606 & 0.0223 \\
    128 & 0.4776 & 0.2868 & 0.0792 & 0.0021 \\
    \hline
    \end{tabular}
    \caption{Probability that fitness increases in heterozygotes, by phenotypic dimensionality $n$ and mutational effect size $|r|$, with the initial phenotype at unit distance from the optimum ($|AO| = 1$)}
    \end{table}

    \begin{table}[h]
    \centering
    \begin{tabular}{|c|cccc|}
    \hline
    & \multicolumn{4}{c|}{$|r|$} \\
    $n$ & $0.01$ & $0.1$ & $0.25$ & $0.5$ \\
    \hline
    16 & 0.9846 & 0.8277 & 0.5266 & 0.1230 \\
    32 & 0.9775 & 0.7411 & 0.3289 & 0.0190 \\
    64 & 0.9675 & 0.6181 & 0.1388 & 0.0005 \\
    128 & 0.9532 & 0.4524 & 0.0270 & 0.0000004 \\
    \hline
    \end{tabular}
    \caption{Probability that homozygotes are fitter than
    heterozygotes} \label{tab:second}
    \end{table}

    In other words, there are three tests that an adaptive mutation must pass in order to reach fixation. First, it must increase fitness in heterozygotes, which, as Fisher showed [6], is unlikely if its phenotypic effect is large. Second, it must avoid stochastic loss when rare. Haldane showed [9] that a favorable mutation's probability of avoiding stochastic loss is about 2s, where s is its selective advantage. This means that a new allele will probably be lost if its effect size is small, because a small phenotypic effect leads to a small selective advantage. Third, it must confer higher fitness in homozygotes than in heterozygotes, which is unlikely if it has a large phenotypic effect. The first two tests are considered in most treatments of the genetics of natural selection, but the third has seldom been discussed.
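    Haldane's 2s rule for the second test can be checked with a branching-process sketch. Assuming each copy of the new allele leaves a Poisson-distributed number of copies with mean 1 + s in the next generation (the idealization behind the 2s approximation), the survival probability is one minus the smallest root of q = exp((1 + s)(q - 1)). The function name and tolerances are illustrative.

```python
import math

def survival_probability(s, tol=1e-14, max_iter=1_000_000):
    """Probability that a new favorable mutant lineage escapes stochastic loss.

    Assumes a Galton-Watson process with Poisson(1 + s) offspring per copy. The
    extinction probability q is the smallest root of q = exp((1 + s) * (q - 1));
    iterating the map from q = 0 converges monotonically to that root.
    """
    mean = 1.0 + s
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(mean * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q

for s in (0.001, 0.01, 0.05):
    print(s, round(survival_probability(s), 5), 2 * s)
```

    For small s the result stays close to 2s, and the approximation slowly overstates survival as s grows.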

    Our assumption of linearity is optimistic. If the phenotypic change induced in homozygotes is nonlinear and in a significantly different direction in phenotype space than the change in heterozygotes, fitness will almost certainly be lower than in heterozygotes, since favorable changes are possible only in a narrow range of angles in a high-dimensional space.

    True loss-of-function mutations, in which the gene's function is eliminated rather than merely reduced, make up an important class of alleles with nonlinear effects. These changes are usually nonsense mutations, frameshifts, radical amino acid changes, and the like. Many such alleles have moderate phenotypic effects in a single dose (effects that can increase fitness) while causing drastic fitness decreases in homozygotes, which have no working copy of the gene. Many are lethal in homozygotes. Such alleles cause some of the most common human genetic diseases, such as cystic fibrosis (the ΔF508 mutation) and the β⁰-thalassemias.

    Generally speaking, this kind of direction-changing nonlinearity should become more and more likely as effect size increases. Where it is significant, the phenotypic change in homozygotes will be essentially unrelated to the beneficial change in heterozygotes and will not even point in the same general direction in phenotype space, so such mutations are almost always deleterious in homozygotes.

    Lower fitness in homozygotes than in heterozygotes doesn't necessarily imply that homozygote fitness is extremely low or obviously depressed. For example, consider a case in which one copy of a new allele increased fitness in past environments by 5%, while two copies increased fitness by 2%. The new allele would never go to fixation: it would eventually approach an equilibrium frequency of 62.5%. But, quite possibly, none of the people with these genotypes (0, 1, or 2 copies) would be obviously ill or need to seek medical attention. This is particularly the case in modern environments, which are less harsh in many ways than those our ancestors experienced. For example, alleles that conferred protection against famine or smallpox might have reached high frequencies in the past, but their advantages would be unnoticeable and effectively unmeasurable in modern populations. We would have next to no chance of detecting small differences in the fitness effects of such an allele in heterozygotes and homozygotes.
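    The equilibrium in this example follows from the standard result for heterozygote advantage. With fitnesses 1.00 (no copies), 1.05 (one copy) and 1.02 (two copies), the mutant allele settles where its marginal fitnesses balance, at s/(s + t) = 0.05/(0.05 + 0.03). A minimal sketch, assuming random mating, an infinite population and constant fitnesses, checks this by iterating the selection recursion; the function names are hypothetical.

```python
def equilibrium_frequency(w_wild, w_het, w_mut):
    """Stable mutant-allele frequency under heterozygote advantage: s / (s + t)."""
    s = w_het - w_wild  # heterozygote advantage over wild-type homozygotes
    t = w_het - w_mut   # heterozygote advantage over mutant homozygotes
    if s <= 0 or t <= 0:
        raise ValueError("a stable polymorphism needs w_het above both homozygotes")
    return s / (s + t)

def iterate_selection(p, w_wild, w_het, w_mut, generations):
    """Deterministic single-locus selection with random mating; p is the mutant frequency."""
    for _ in range(generations):
        q = 1.0 - p
        mean_fitness = q * q * w_wild + 2 * p * q * w_het + p * p * w_mut
        p = (p * p * w_mut + p * q * w_het) / mean_fitness
    return p

# Fitnesses from the example: wild type 1.00, one copy +5%, two copies +2%.
print(equilibrium_frequency(1.00, 1.05, 1.02))
print(round(iterate_selection(0.01, 1.00, 1.05, 1.02, 2000), 4))
```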

    The X chromosome

    The fate of an adaptive variant that appears on the X chromosome is quite different in eutherian mammals, humans among them. Males have only a single copy of the X, so their gene dosage does not vary. Females have two copies, but only one copy of the X chromosome is active in each cell, while the other copy is condensed and inactive [11]. Upwards of 85% of genes on the condensed chromosome are silenced; genes in the pseudoautosomal regions escape inactivation. Since one of the X chromosomes is randomly inactivated in each cell, female heterozygotes will have the wild-type allele active in some cells and the adaptive variant in others, while female homozygotes will have the same effective gene dosage as males with their one copy.

    So, if a variant on the X chromosome increases fitness in males, it is likely to have the same effect in females with two copies. The effective dose of the new allele will be half as large in heterozygous females, but if the phenotypic effects are linear with gene dosage, heterozygote fitness should still be higher than wild-type fitness. In Fisher's model, a fitness increase for a given displacement in a particular direction implies a fitness increase for any smaller displacement in that same direction.
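    The last step is a convexity argument. Under the linear-dosage assumption, write the heterozygous female's phenotype as A + λ(B - A), with B the phenotype of males (and of homozygous females) and 0 < λ < 1 (λ = 1/2 for half dosage). Then

    \begin{equation}
    |A + \lambda(B - A) - O| = |(1-\lambda)(A - O) + \lambda(B - O)| \le (1-\lambda)|AO| + \lambda|BO| < |AO| \quad \text{whenever } |BO| < |AO|
    \end{equation}

    so any displacement that brings males closer to the optimum also brings heterozygous females closer to it than the wild type.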

    Under these assumptions, most X-chromosome gene variants that increase fitness in males should go to fixation, as long as they escape stochastic loss. Interestingly, the tight regulation of gene dosage on the X chromosome implies that changes in dosage do indeed influence fitness: why else would X-inactivation have evolved?

    Such X-linked beneficial recessive alleles sweep more slowly than an autosomal allele with the same selective advantage, since only one third (those in males) manifest the advantage when the allele is rare. However, the elimination of the requirement that the new variant confer higher fitness in homozygotes than heterozygotes is more important, and should result in a disproportionate number of completed sweeps on the X chromosome. As it happens, that is exactly what we see in humans.

    SNPs with high allele frequency differences are relatively rare in humans, but they are particularly common on the X chromosome [12][13]. Out of the 3.2 million SNPs in the HapMap data set, only 479 have FST greater than or equal to 0.90. Of those 479 high-FST SNPs, 379 are on the X chromosome. The majority of those highly differentiated SNPs cluster into five distinct regions, which apparently correspond to six selective sweeps; two of these sweeps happened near the centromere, in different populations. Five of the six sweeps are in populations outside sub-Saharan Africa, with each sweep reaching fixation in East Asia, existing at lower frequencies in Europe and West Asia, and remaining rare in sub-Saharan Africa. One of the two sweeps near the centromere has the opposite pattern: essentially fixed in sub-Saharan Africa and rare outside it. That second pattern is unusual: there are few high-FST SNPs for which the derived alleles are near fixation in sub-Saharan Africa. The best-known example is the mutation responsible for the Duffy Fy*O blood type.
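    Why an FST cutoff of 0.90 is so strict can be seen with a simple two-population calculation. This sketch uses a Nei-style definition, FST = (HT - HS)/HT for one biallelic SNP in two equally weighted populations; the cited studies may use a different estimator with sample-size corrections, and the allele frequencies below are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Nei-style FST for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled expected heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population heterozygosity
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: nothing to partition
    return (h_total - h_within) / h_total

# Hypothetical derived-allele frequencies in two populations.
for p1, p2 in [(1.0, 0.0), (0.95, 0.0), (0.8, 0.0), (0.95, 0.2)]:
    print(p1, p2, round(fst_two_populations(p1, p2), 3))
```

    Under this definition a SNP clears 0.90 only when the allele is close to fixation in one population and nearly absent in the other, which is the pattern of a completed or nearly completed sweep.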

    Conclusion

    Our conclusion is that most mutations with strong effects that increase fitness in heterozygotes confer lower fitness in homozygotes; that is, they are overdominant.

    This effect may not matter much in a well-adapted, stable species. Over many generations, overdominant alleles that partially solve some adaptive problem should eventually be replaced by alleles that confer high fitness in both heterozygotes and homozygotes and so go to fixation. This could occur through the evolution of modifier loci and by rare favorable mutations that are essentially additive. At steady state, there should be relatively few common overdominant alleles, except for cases of frequency-dependent selection.

    It may, however, play an important role in species that have experienced strong selection, ones whose environment has changed drastically. As it happens, this is the case for a number of species of interest: we would put humans and most domesticated species in this category.


    References

    1. Hawks J, Wang ET, Cochran G, Harpending HC, Moyzis RK. Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences, U. S. A. [Internet]. 2007;104:20753–20758. Available from: http://dx.doi.org/10.1073/pnas.0707650104
    2. Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Research [Internet]. 2009;19:826–837. Available from: http://dx.doi.org/10.1101/gr.087577.108
    3. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biology [Internet]. 2006;4(3):e72. Available from: http://dx.doi.org/10.1371/journal.pbio.0040072
    4. Wang ET, Kodama G, Baldi P, Moyzis RK. Global Landscape of Recent Inferred Darwinian Selection for Homo sapiens. Proceedings of the National Academy of Sciences, U. S. A. [Internet]. 2006;103:135–140. Available from: http://dx.doi.org/10.1073/pnas.0509691102
    5. Pritchard JK, Pickrell JK, Coop G. The Genetics of Human Adaptation: Hard Sweeps, Soft Sweeps, and Polygenic Adaptation. Current Biology [Internet]. 2010;20:R208–R215. Available from: http://dx.doi.org/10.1016/j.cub.2009.11.055
    6. Fisher RA. The Genetical Theory of Natural Selection. Oxford: Clarendon Press; 1930.
    7. Hartl D, Taubes C. Towards a theory of evolutionary adaptation. Genetica [Internet]. 1998;102-103:525–533. Available from: http://dx.doi.org/10.1023/A:1017071901530
    8. Kimura M. Some Problems of Stochastic Processes in Genetics. Annals of Mathematical Statistics. 1957;28:882–901.
    9. Haldane JBS. A Mathematical Theory of Natural and Artificial Selection, Part V: Selection and Mutation. Proceedings of the Cambridge Philosophical Society. 1927;23:838–844.
    10. Orr HA. The Distribution of Fitness Effects Among Beneficial Mutations. Genetics. 2003;163:1519–1526.
    11. Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434(7031):400–404.
    12. Lambert CA, Connelly CF, Madeoy J, Qiu R, Olson MV, Akey JM. Highly Punctuated Patterns of Population Structure on the X Chromosome and Implications for African Evolutionary History. The American Journal of Human Genetics [Internet]. 2010;86:34–44. Available from: http://dx.doi.org/10.1016/j.ajhg.2009.12.002
    13. Casto AM, Li JZ, Absher D, Myers R, Ramachandran S, Feldman MW. Characterization of X-linked SNP genotypic variation in globally distributed human populations. Genome Biology [Internet]. 2010;11:R10. Available from: http://dx.doi.org/10.1186/gb-2010-11-1-r10
  • Founder effect

    Sat, 2011-08-06 13:34 -- John Hawks
    Synopsis: 
    The founder effect is a special case of genetic drift that can happen when a small number of individuals found a new population
    The founder effect is caused by genetic drift in a small number of initial founders of a new population.

    One of the most important manifestations of genetic drift is in the founding of new populations by a small number of colonists. For example, the Afrikaner population of South Africa today descends from Dutch colonists who arrived during the seventeenth century. Some of the earliest colonists to arrive had a large genetic contribution to the later Afrikaner population, because they had a chance to have many offspring who intermarried with later arrivals. The first Dutch colonists landed in 1652, and one of these colonists was a man who carried an allele causing Huntington's disease, a rare genetic disorder of the nervous system. Huntington's is a dominant genetic disorder, affecting all individuals who carry the allele, but it exerts most of its effect late in life, after people generally reproduce. Although this harmful allele was carried by only one individual, that single copy made up a relatively large proportion of the founder population's genes, a much higher frequency than the allele had in Holland. After strong population growth, today's Afrikaners have a high frequency of the Huntington's allele, mainly from this single founder (Ridley 2002). This phenomenon of genetic drift is often called the founder effect.
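    The arithmetic behind a founder effect is simple: one carrier among N diploid founders gives the allele a frequency of 1/(2N). The sketch below uses a hypothetical founding group of 50 and a hypothetical source frequency of 1 in 10,000, chosen only to show the contrast, not as estimates for the Afrikaner founders, and assumes founders are a random sample of the source population.

```python
def founder_frequency(carrier_copies, n_founders):
    """Allele frequency among n_founders diploid founders carrying carrier_copies copies."""
    return carrier_copies / (2 * n_founders)

def chance_allele_is_sampled(source_frequency, n_founders):
    """Probability that at least one copy is among the founders' 2N gene copies,
    assuming the founders are a random sample of the source population."""
    return 1 - (1 - source_frequency) ** (2 * n_founders)

source_frequency = 1e-4   # hypothetical frequency in the source population
n_founders = 50           # hypothetical number of founders

print(founder_frequency(1, n_founders))
print(founder_frequency(1, n_founders) / source_frequency)  # enrichment relative to the source
print(round(chance_allele_is_sampled(source_frequency, n_founders), 4))
```

    Most small founding groups would carry no copy at all, but a group that happens to include one carrier starts with the allele far above its source frequency, which is why founder effects are matters of chance.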

    Population structure and genetic drift

    Genetic drift is stronger when there is more variability in reproduction.

    A simple reason for variability in reproduction is the different reproductive efforts of males and females. Female mammals face a high cost of reproduction. Mothers provide space and nutrients to their developing young while they are still in the womb, and mothers provide high-energy milk and protection to their young after they are born. Although female fish and frogs may lay hundreds, or even thousands, of eggs, female mammals are limited to many fewer offspring over the course of their lifetimes. Males, on the other hand, do not face the same reproductive costs. If a male can mate with many females, he can potentially have many times the number of offspring of any single female. But males face a different cost: if they want to mate at all, they must first face competition from other males. In many species, a lucky few males may mate with many females, while most males do not mate at all. Thus, males are often much more variable in their reproductive success than females. Each generation of offspring in such a population includes the genes of many different females but only a few males. Only a few genes may be responsible for the mating success of these males, but all the genes of these few males are boosted by genetic drift.
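    One standard way to quantify this is Wright's formula for effective population size when unequal numbers of males and females breed, Ne = 4NmNf/(Nm + Nf). It captures only the number of breeders of each sex, not all of the variance in mating success described above, so it is a simplified sketch; the census numbers below are hypothetical.

```python
def effective_size_sex_ratio(n_males, n_females):
    """Wright's effective population size for unequal numbers of breeding males and females."""
    if n_males <= 0 or n_females <= 0:
        return 0.0  # with no breeders of one sex there is no next generation
    return 4 * n_males * n_females / (n_males + n_females)

# Hypothetical population of 50 breeding females, varying how many males mate.
for breeding_males in (50, 20, 5):
    print(breeding_males, round(effective_size_sex_ratio(breeding_males, 50), 1))
```

    When only 5 of 50 males mate, the population drifts like an ideal population of about 18 individuals rather than 100.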

    Human history appears to have included some cases where single male lineages had exceptionally high mating success. Geneticists can trace male reproduction through the Y chromosome, which is passed only from father to son. Because of this unique pattern of inheritance, the Y chromosome marks patrilines, lineages of males. In many human societies, social status or power may also be passed along patrilines, as kings and chiefs pass power to their sons. This cultural pattern of inheritance generally lasts only for a few generations, as some member of the male lineage ultimately fails to have a son as an heir, or the patriline simply loses power. But the history of some cultures gave a few patrilines exceptional mating opportunities, as kings and other high-ranking men sometimes kept harems of dozens or more women for their own exclusive mating.

    \begin{figure}
    \includegraphics[width=\textwidth]{genghis.png}
    \caption[Frequency of ``Genghis Khan'' Y chromosome haplotype in Asia]{Frequency of the ``Genghis Khan'' Y chromosome haplotype in samples of Asian populations. The ``star cluster'' refers to the rapid expansion in numbers of the haplotype in different populations since its origin around 1000 years ago. Reprinted from Zerjal \emph{et al.} (2003).}
    \label{fig:genghis}
    \end{figure}

    Two Y chromosome haplotypes in Asia are shared by many millions of men, even though they emerged within the past thousand years. One of these, carried by 8 percent of men in Central and Northeast Asia, appears to have originated in Mongolia around a thousand years ago [1]. At this frequency, the haplotype would occur in as many as 16 million men, all descendants of a single man within the past 1000 years. The large current population implies that these men descend from an exceptionally widespread and productive patriline. During the past 1000 years in Asia, the best candidate for such a patriline is that of the Mongol emperor Genghis Khan, who lived from around A.D. 1162--1227. After conquering history's largest land empire, Genghis and his descendants installed their male relatives as rulers of much of Asia. These descendants themselves must often have had extraordinary reproductive opportunities, so that their Y chromosomes became more and more common in Asian populations. A second Y chromosome haplotype is carried by around 3 percent of men in China and Mongolia, and may derive from the Manchu dynasty, which dates to the year 1644 [2]. Together, these haplotypes illustrate the chance for some rare alleles to increase greatly in frequency due to genetic drift in human history.
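The arithmetic behind these figures is easy to check. This minimal sketch uses the 8 percent and 16 million figures from the text, plus an assumed 25-year male generation time (an assumption for illustration, not a figure from Zerjal and colleagues), to compute the male population implied and the average per-generation multiplication one patriline would need to go from a single man to 16 million carriers in 1000 years.

```python
import math

# Figures from the text.
carrier_fraction = 0.08     # share of men carrying the haplotype
carriers = 16_000_000       # men carrying it today
years = 1000                # approximate age of the haplotype

# Assumption for illustration only: average male generation time in years.
generation_time = 25

# Size of the male population that the 8 percent figure refers to.
regional_men = carriers / carrier_fraction

# Generations since the founder, and the average growth factor per generation
# needed to turn one carrier into `carriers` carriers.
generations = years / generation_time
growth_per_generation = math.exp(math.log(carriers) / generations)

print(f"implied male population: {regional_men:,.0f}")
print(f"generations since founder: {generations:.0f}")
print(f"average growth per generation: {growth_per_generation:.2f}")
```

An average increase of roughly 1.5-fold every generation for 40 generations summarizes what was probably a very uneven history; it mainly shows how unusual such sustained growth of a single patriline would be.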


    References

    1. Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q, Mohyuddin A, Fu S, et al. The Genetic Legacy of the Mongols. American Journal of Human Genetics. 2003;72:717–721.
    2. Xue Y, Zerjal T, Bao W, Zhu S, Lim S-K, Shu Q, Xu J, Du R, Fu S, Li P, et al. Recent Spread of a Y-Chromosomal Lineage in Northern China and Mongolia. American Journal of Human Genetics. 2005;77:1112–1116.
    Study questions: 
    1. Can you think of other populations in human history that might have undergone a founder effect?
    2. What evidence can we use to test whether a founder effect can explain the high frequency of an allele?
  • Kin selection strikes back

    Thu, 2011-03-24 19:37 -- John Hawks

    Last year I noted the publication of a paper in Nature by Martin Nowak, Corina Tarnita and Edward O. Wilson, which claimed that kin selection is not a sufficient explanation for anything in biology. My post ("Inclusive fitness works") basically expressed my incredulity that Nature published the thing.

    This week, Nature published several comments on the paper, including one signed by 137 evolutionary biologists. I think the best source to read is Jerry Coyne's post about the commentary ("Big dust-up about kin selection"):

    If the Nowak et al. paper is so bad, why was it published? That’s obvious, and is an object lesson in the sociology of science. If Joe Schmo et al. from Buggerall State University had submitted such a misguided paper to Nature, it would have been rejected within an hour (yes, Nature sometimes does that with online submissions!). The only reason this paper was published is because it has two big-name authors, Nowak and Wilson, hailing from Mother Harvard. That, and the fact that such a contrarian paper, flying in the face of accepted evolutionary theory, was bound to cause controversy. Well, Nature got its controversy but lost its intellectual integrity, becoming something of a scientific National Enquirer. Oh, and boo to the Templeton Foundation, who funded the whole Nowak et al. mess and highlighted the paper on their website.

    "Scientific National Enquirer"...wow, harsh words, but then the Weekly World News was unavailable for comment...

    Oh, and Richard Dawkins shows up in Coyne's comments section, including the awesome response, "You are still wrecked among heathen dreams."

  • DNA relatives

    Mon, 2011-03-07 15:07 -- John Hawks

    Steve Mount works through the math of "relative finder" predictions from 23andMe (and by extension, other personal genome tests): "Genetic genealogy and the single segment".

    He does a nice short explanation of a point that is counter-intuitive to many people. You don't actually share much DNA with your distant relatives by descent, and because chromosomes are inherited in chunks, you quickly (within 6 generations) get to a point where you're not likely to have any DNA in common at all. Yet...you do have to have DNA from somebody, which means that if you do share DNA, you'll probably share a big chunk of it.
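The pattern can be sketched with a rough approximation. Assuming about 35 crossovers per meiosis, 22 autosomes, and a pair of shared ancestors, two relatives separated by d meioses share on average about 2(35d + 22)/2^(d-1) segments; treating that count as Poisson gives the chance of sharing anything at all. These parameter values and the Poisson shortcut are assumptions for illustration, not 23andMe's method or Mount's own calculation.

```python
import math

AUTOSOMES = 22      # assumed: autosomes, each a separate unit of inheritance
CROSSOVERS = 35.0   # assumed: average crossovers per meiosis across the autosomes


def cousin_meioses(k: int) -> int:
    """Meioses on the path between two k-th cousins (first cousins: k = 1)."""
    return 2 * (k + 1)


def expected_segments(d: int, ancestors: int = 2) -> float:
    """Rough expected number of autosomal segments shared identical by descent.

    Over d meioses the genome is cut into roughly CROSSOVERS * d + AUTOSOMES
    pieces; each piece reaches both relatives from a given common ancestor
    (who carries two copies) with probability about 1 / 2**(d - 1).
    """
    return ancestors * (CROSSOVERS * d + AUTOSOMES) / 2 ** (d - 1)


def p_share_any(d: int, ancestors: int = 2) -> float:
    """Chance of sharing at least one segment, treating the count as Poisson."""
    return 1.0 - math.exp(-expected_segments(d, ancestors))


def mean_segments_if_shared(d: int, ancestors: int = 2) -> float:
    """Expected number of shared segments, given that at least one is shared."""
    e = expected_segments(d, ancestors)
    return e / (1.0 - math.exp(-e))


for k in range(1, 8):
    d = cousin_meioses(k)
    print(f"cousin degree {k}: expected segments {expected_segments(d):6.2f}, "
          f"P(any shared) {p_share_any(d):.2f}, "
          f"segments if shared {mean_segments_if_shared(d):.2f}")
```

Under these assumptions, fifth cousins (whose shared ancestors lived six generations back) have only about a one-in-three chance of sharing any segment, and when they do share, it is usually a single segment.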

  • What is the human mutation rate?

    Thu, 2010-11-04 01:33 -- John Hawks

    Last spring I wrote about a study that used whole-genome comparisons between parents and offspring to estimate the rate of per-genome mutation in humans ("A low human mutation rate may throw everything out of whack").

    The study was by Jared Roach and colleagues [1], and as you might guess from my post title, the result was surprising. Previous work had suggested a human mutation rate around 2.5 x 10^-8 per site per generation. The new study found less than half the expected number of mutations between these parents and offspring, an estimated rate of only 1.1 x 10^-8 per site.

    If this lower rate of mutation were to hold up, it would affect much of our understanding of the chronology of human evolution. Fossils and archaeological sites would not change in date, but some hypotheses about their relationships would be challenged. For example, the higher rate of 2.5 x 10^-8 per site suggests a chimpanzee-human population divergence around 4 million years ago. A new rate of 1.1 x 10^-8 would not have a linear effect on this divergence time -- the genes don't have genealogical roots at the same instant as the population divergence. But the human-chimpanzee divergence time would be radically higher than in many recent estimates.
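Because divergence between two lineages accumulates along both branches, d = 2μt, the average gene divergence time scales inversely with the mutation rate. The sketch below assumes a round single-site human-chimpanzee divergence of 1.2 percent and a 20-year generation, both illustrative. It gives gene divergence times, which lie earlier than the population split by the ancestral coalescence time, so the population divergence itself would not scale by the same factor.

```python
def divergence_generations(d_per_site: float, mu_per_generation: float) -> float:
    """Generations since two lineages shared an ancestor, from per-site divergence.

    Both lineages accumulate mutations independently, so d = 2 * mu * t.
    """
    return d_per_site / (2.0 * mu_per_generation)


D = 0.012        # assumed round figure for human-chimpanzee single-site divergence
GEN_YEARS = 20   # assumed generation time in years

for mu in (2.5e-8, 1.1e-8):
    t = divergence_generations(D, mu)
    print(f"mu = {mu:.1e}: {t:,.0f} generations, {t * GEN_YEARS / 1e6:.1f} million years")

# Every gene divergence time stretches by the ratio of the two rates.
print(f"scale factor: {2.5e-8 / 1.1e-8:.2f}")
```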

    The same might be true for other primate divergences, and for genealogical relations within human populations today. Basically any times that are estimated from genetic differences may be affected by our knowledge of the per-generation rate of mutations.

    What does this mean?

    What mutations are we counting?

    Human genomes differ from each other in many ways. There are single base-pair changes in sequences, insertions and deletions, repeat polymorphisms, and larger-scale rearrangements such as inversions and gene duplications. Recent work suggests that some of these larger-scale effects may be very important to phenotypic variation among people. So why should we be talking about only the first of these kinds of variation?

    Single nucleotide mutations have been the focus of most attention about mutation rates because they are relatively easy to find and quantify. In high-quality sequence data, a single nucleotide change is relatively unambiguous. Reversals are fairly unlikely, although at a small fraction of "hotspot" sites, recurrent mutations can make a big difference.

    It is somewhat misleading to refer to "a" rate of single nucleotide mutations, because some kinds of sites (e.g., CpG nucleotides) have a much higher probability of mutation than others. This affects the apparent rate of mutations in noncoding versus synonymous sites [2]. Also, the germline in males has been estimated to be as much as 6 times more likely to suffer mutations than the germline in females (discussed by Crow [3]). The idea of a genome-wide rate assumes that when we bin all the single nucleotide mutations together, across large amounts of sequence, we do arrive at a relatively stable rate that can be applied to similarly broad extents of sequence data. Or at least that we can identify sequence regions with compatible rates (e.g., noncoding DNA or synonymous sites).

    At the moment, technical issues make it hard to find and quantify many other kinds of variation. The current generation of sequencing devices tends to generate short reads, which make it difficult to assess the presence of insertions or deletions of more than a few base pairs. Duplications and other rearrangements require special treatment such as higher coverage or longer sequence reads. By contrast, a single nucleotide mutation will typically align in the proper location and be quite evident in a read. In principle, we can just run down the genome and count them.

    Still, finding novel mutations is not without its problems. Recent sequencing projects have yielded a very high rate of false positives. The rate of false negatives is really not yet known. We have good reason to suspect that the false negative rate will be high. In a low-coverage genome, many short segments of the genome will have very low read numbers, making it likely that the sequence reads represent only one of the two copies of the genome present at that location. Any novel mutations in that area have a 50-50 chance of being missed by our sequencing efforts. This false negative risk can be reduced by increasing sequence coverage, but we're not yet at the point where we have a lot of genomes sequenced at the 10x or higher coverage that we would really want.
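The false negative problem can be put in numbers under a simple model. This sketch assumes Poisson read depth, error-free reads that sample each chromosome copy with probability 1/2, and a hypothetical calling rule that requires some minimum number of reads carrying the new allele; real variant callers are considerably more complicated. At a site covered by exactly one read, the chance of seeing the mutation is 1/2, the 50-50 case described above.

```python
import math


def p_detect_het(mean_coverage: float, min_alt_reads: int = 1) -> float:
    """Chance that a new heterozygous mutation shows up in the reads.

    Illustrative assumptions, not any particular pipeline: read depth at the
    site is Poisson with the given mean, each read samples either chromosome
    copy with probability 1/2, reads are error-free, and the mutation is
    called when at least `min_alt_reads` reads carry it.
    """
    max_depth = int(10 * mean_coverage) + 50   # Poisson tail beyond this is negligible
    pmf = math.exp(-mean_coverage)             # P(depth = 0)
    total = 0.0
    for depth in range(max_depth + 1):
        # Count the equally likely read-to-copy assignments with enough alt reads.
        enough = sum(math.comb(depth, k) for k in range(min_alt_reads, depth + 1))
        total += pmf * enough / 2 ** depth
        pmf *= mean_coverage / (depth + 1)     # step to P(depth + 1)
    return total


for coverage in (1, 4, 10, 30):
    print(f"{coverage:>2}x  >=1 read: {p_detect_het(coverage, 1):.3f}  "
          f">=2 reads: {p_detect_het(coverage, 2):.3f}")
```

With a one-read threshold the sum reduces to 1 - exp(-c/2) for mean coverage c; requiring two supporting reads, as a stricter caller might, makes low-coverage false negatives much more common.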

    So while sequencing a parent and offspring genome is the most direct way to estimate the per-generation mutation rate, it is not yet ideal.

    Where did the high rate come from?

    That means we need to look very closely at other sources of data, to see if they may provide some independent confirmation of a lower per-generation mutation rate. In the process, we should ask, why did the higher rate, around 2.5 x 10^-8 per generation, become so widely accepted?

    The source cited by Roach and colleagues for the higher rate, 2.5 x 10^-8 per site, is a paper by Michael Nachman and Susan Crowell [4]. Nachman and Crowell examined processed pseudogenes in humans and chimpanzees, under the assumption that mutations in these pseudogenes would be neutral to selection in the human and chimpanzee lineages.

    The average mutation rate was calculated from the average autosomal rate of evolution assuming a generation time of 20 years (Table 3). Recent estimates of the time since humans and chimpanzees diverged (T) include 4.5 mya (TAKAHATA and SATTA 1997), 5.5 mya (KUMAR and HEDGES 1998), and 6.0 mya (GOODMAN et al. 1998). ARNASON et al. 1998 estimated the Homo-Pan divergence at 10–13 mya; however, their estimate is based on a calibration using distant, nonprimate species and is at odds with most other recent estimates. Mutation rates were calculated for a range of different human-chimpanzee divergence times and for two different ancestral population sizes. Mutation rate estimates vary from 1.3 x 10^-8 (assuming T = 6 mya and Ne = 10^5) to 2.7 x 10^-8 (assuming T = 4.5 mya and Ne = 10^4). If the average generation time is assumed to be 25 years (e.g., EYRE-WALKER and KEIGHTLEY 1999), then mutation rates are estimated to be between 1.6 x 10^-8 and 3.4 x 10^-8.

    Wait a minute. There's no independent estimate of mutation rate here at all!

    What they did was to assume values for the human-chimpanzee divergence and ancestral (chuman) effective size, and then provide an estimate of mutation rate consistent with those assumptions. That's perfectly reasonable as a way of quantifying the genetic divergence that they observed. If our goal is to predict the per-generation mutation rate from interspecific divergence, that's more or less the kind of estimate that we want.

    But many, many other studies have instead used a citation to the Nachman and Crowell rate as a justification for their own estimates of the human-chimpanzee divergence time! That's not perfectly reasonable; in fact, it's perfectly circular. It's turtles all the way down!

    Worse, those citations tend to cite the midpoint of Nachman and Crowell's range of estimates (2.5 x 10^-8) as if it were a true value measured with little error. Reading the original reference, you can plainly see that Nachman and Crowell reported estimates that varied over nearly a factor of three, corresponding to a wide range of chuman population histories. From their discussion:

    Mutation rates estimated for a range of divergence times and ancestral population sizes fall between 1.3 x 10^-8 and 2.7 x 10^-8 assuming a generation time of 20 years (Table 3) or between 1.6 x 10^-8 and 3.4 x 10^-8 assuming a generation time of 25 years. We suggest that 2.5 x 10^-8 is a reasonable estimate of the average mutation rate per nucleotide site (but caution that the actual rate may be between 1.3 x 10^-8 and 3.4 x 10^-8).

    That 2.5 x 10^-8 is simply the midpoint of their range of estimates with the 25-year generation time.

    What would be more reasonable? For hominins and chimpanzees, we probably want to apply a shorter generation length, a larger ancestral effective size, and a higher time of divergence. All of these would have yielded a lower rate for the Nachman and Crowell data. But we don't want to just assume these values, we should try to test whether they are valid based on other data.
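To see how strongly the assumptions drive the answer, here is a minimal sketch of this kind of calculation. It assumes the simple model d = 2μt + 4Neμ, where t is the split time in generations and the second term is the divergence already present in the ancestral population. The observed divergence of 1.33 percent is an assumption chosen to roughly match their reported range, not Nachman and Crowell's published figure, so the output only approximately reproduces their numbers.

```python
def mu_from_divergence(d: float, split_years: float, gen_years: float,
                       ancestral_ne: float) -> float:
    """Per-site, per-generation mutation rate implied by an assumed history.

    Expected divergence d = 2*mu*t + 4*Ne*mu, with t = split_years / gen_years.
    Solving for mu gives mu = d / (2*t + 4*Ne).
    """
    t = split_years / gen_years
    return d / (2.0 * t + 4.0 * ancestral_ne)


D = 0.0133  # assumed divergence, chosen to roughly match the published range

for split in (4.5e6, 6.0e6):
    for ne in (1e4, 1e5):
        for gen in (20, 25):
            mu = mu_from_divergence(D, split, gen, ne)
            print(f"T={split / 1e6:.1f} My  Ne={ne:.0e}  g={gen}  mu={mu:.2e}")
```

Each of the three changes suggested above (an older split, a larger ancestral Ne, a shorter generation) enlarges the denominator, so each one lowers the implied per-generation rate.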

    Other mutation rates from phylogenetic comparisons

    Nachman and Crowell have not been alone in their ultimate reliance on fossil evidence as an assumption underlying the per-generation mutation rate. But several other studies came to a slower mutation rate. Mostly, these studies have assumed that the human-chimpanzee divergence happened significantly earlier than 5 million years ago. Necessarily, then, the human per-generation mutation rate would have to be lower, as long as the sequence divergence remained the same.

    These estimates are ultimately rooted in the date of one or more fossils, among which the generation time certainly varied. The resulting per-site mutation rates are often reported as per-year instead of per-generation. For example, Yi and colleagues [5] derived a rate of 0.99 x 10^-9 per year for the human-chimpanzee comparison, which would multiply to 1.98 x 10^-8 per 20-year generation. They propose this as a maximal rate, assuming that Sahelanthropus, at a minimum date of 6 million years ago, is a hominin. With an older divergence date, they propose a correspondingly lower rate (e.g., 0.79 x 10^-9 per year, not accounting for ancestral population polymorphism).

    Similarly, Steiper and Young [6] considered a long (1.9 Mb) alignment of sequence from 19 primate species. In their model to estimate relative rates on different branches of the primate phylogeny, they incorporated the assumption that Sahelanthropus is on the hominin clade. A divergence date of 6 million years gave rise to a human per-site mutation rate of 0.65 x 10^-9 per year (1.3 x 10^-8 per 20-year generation). A divergence date of 7 million years lowered the mutation rate to 0.57 x 10^-9 per year.
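Converting these per-year rates into per-generation rates is a simple multiplication by the generation time. A minimal sketch using the per-year rates quoted above; the 20-year generation follows the text, and the 25-year column is an added assumption for comparison.

```python
def per_generation(rate_per_year: float, gen_years: float) -> float:
    """Convert a per-site, per-year rate into a per-site, per-generation rate."""
    return rate_per_year * gen_years


# Per-year rates quoted in the text.
rates = {
    "Yi et al., Sahelanthropus at 6 My": 0.99e-9,
    "Yi et al., older divergence": 0.79e-9,
    "Steiper and Young, 6 My": 0.65e-9,
    "Steiper and Young, 7 My": 0.57e-9,
}

for label, r in rates.items():
    print(f"{label}: {per_generation(r, 20):.2e} (20 y)  "
          f"{per_generation(r, 25):.2e} (25 y)")
```

Even with a 25-year generation, none of these per-year rates reaches 2.5 x 10^-8 per generation.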

    Low mutation rates do not always result from these studies. Several have arrived at either a high human mutation rate or a recent human-chimpanzee divergence time. Sometimes a recent human-chimpanzee divergence emerges simply by assuming the rate given by Nachman and Crowell. Yang [7] provides an example of this -- a paper that very thoroughly explores the relationship of divergence time and ancestral effective population size, but ultimately roots the estimates on a single value for mutation rate. This rate we have already seen was itself based on an assumption about divergence time.

    Kumar and colleagues [8] came to a much lower estimate for the human-chimpanzee divergence time, based on an Old World monkey-hominoid divergence at 23.8 million years ago. This estimate did not consider the effect of ancestral polymorphism on the mean genetic divergence time, and so should -- in the language of computer software -- be deprecated.

    I should reiterate that none of these estimates are suitable for testing the times of phylogenetic divergences, because they all assume that the date of some particular fossil (or set of fossils, by fitting a model) is the minimum divergence time for a clade.

    So much of the literature in this area is ultimately circular, I'm pulling out my sparse hair reading through it. By the time we get back to the mid-1990s, the sequence data are even sparser than my hair by today's standards -- only a few hundred base pairs, or a sampling of restriction sites. But the divergence time estimates have propagated forward from that time to today, recycled through the assumptions of papers in the intervening time. It's like the genetic equivalent of money laundering!

    Evidence from parent-offspring sequence differences

    There is another way besides phylogenetic comparison: Simply look at living people and see how many new mutations they have.

    But this is tricky because we are rarely in a position to know which mutations are new. Most variations that we see between two people have persisted in the population for hundreds of generations or more. It takes a special kind of mutation to make its newness evident.

    Up until the advent of large-scale sequencing, the most important source of information about the mutation rate came from the rates of spontaneous Mendelian diseases. When a person has a dominant genetic disorder not carried by either of his parents, you know that the mutation must be new. Disease rates have long been tracked as standard public health data.

    However, the per-genome or per-locus rate of Mendelian disorders can estimate the per-site rate of mutations only by adding well-resolved information about the target size of functional genes. For example, if we know the average gene length and the proportion of different amino acids in functional proteins we can make some estimate of the ratio of synonymous to nonsynonymous sites. But we would still lack information about the fraction of nonsynonymous mutations that cause deleterious effects on protein function. For this reason, it was possible for very early workers (e.g., Haldane) to come within the ballpark of per-locus mutation rates even before the genetic code was available. Yet such estimates are not strictly useful for understanding per-site rates of mutation.

    By 2000, widespread sequencing had begun to identify disease-causing mutations at the sequence level. When exons are known, it is possible to determine the "target size," the number of sites at which loss-of-function mutations may occur. The incidence of new cases and the target size then provide the numerator and denominator for an estimate of the per-site mutation rate.

    Kondrashov [9] applied this method to estimate the per-site mutation rate across 20 human genes. He surveyed the literature for genes where more than 100 patients had been sequenced completely for the causative locus, finding the causal mutations. Using this value and the disease incidence allowed an estimate of the per-site rate of mutation for different categories of transitions and transversions. There was some variation among loci, with an average rate of per-site mutation equal to 1.8 x 10^-8 per generation.
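The logic of the disease-based estimate can be sketched with hypothetical numbers. For a dominant disorder with no inherited cases, new cases per birth are roughly 2 x (target sites) x μ, because each child carries two copies of the gene. The function below assumes that every mutation at a target site causes the disease and is ascertained, something real studies like Kondrashov's have to correct for; the incidence and target size here are invented for illustration.

```python
def per_site_rate(sporadic_cases_per_birth: float, target_sites: int) -> float:
    """Per-site, per-generation mutation rate from a dominant disorder.

    Each newborn carries two copies of the gene, so new cases per birth are
    about 2 * target_sites * mu. Assumes every mutation at a target site
    causes the disease and is detected, and that no case is inherited.
    """
    return sporadic_cases_per_birth / (2.0 * target_sites)


# Hypothetical numbers, for illustration only.
rate = per_site_rate(sporadic_cases_per_birth=1e-5, target_sites=300)
print(f"{rate:.2e} per site per generation")
```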

    Kondrashov observed a few hotspots in these genes, with substitution or deletion rates as much as a hundred times that of the average site. He also observed that the per-gene rate of mutation varies according to the number of CpG sites. The rate of short deletions was on the order of 5 x 10^-10; insertions were even less frequent.

    The rate estimate by Kondrashov is within the range considered by Nachman and Crowell, but only about 3/4 of the value 2.5 x 10^-8 widely cited as the long-term estimate. If this rate were applied to Nachman and Crowell's pseudogene data, it would predict a human-chimpanzee divergence time around 6 million years.

    This year, Lynch [10] performed a more extensive comparison using methods similar to Kondrashov's. Including more genes, and considering a broader range of mutational effects (including missense as well as nonsense coding mutations), Lynch found an even lower estimate of mutation rate per generation: only 1.28 x 10^-8 per site.

    These estimates are not precisely the same as comparing parent-offspring pairs, but they are exceedingly powerful because the data on disease rates encompass very large populations of people.

    We should keep in mind the result of Subramanian and Kumar [2], which showed that exons have a higher effective rate of substitution than do noncoding regions. That result implies that the genome-wide rate of change should be lower than estimated by Lynch, because his estimate encompasses only coding mutations. Also, any effect of purifying selection on these mutations will tend to decrease the long-term rate of substitutions per site to a lower value than the rate of mutations. The rate estimated by Lynch should then be an overestimate of the substitution rate that would be applicable to hominoid phylogenetic relationships.

    A slower rate

    These estimates of the per-generation mutation rate are all low compared to the commonly cited 2.5 x 10^-8. They are not quite as low as the rate estimated by Roach and colleagues [1], but the Lynch estimate is very close: 1.28 x 10^-8 compared to 1.1 x 10^-8 per site.

    The lower estimate from Roach and colleagues is a direct comparison of parent and offspring. In my earlier discussion of that rate, I suggested that false negatives in the sequence comparisons might have lowered the apparent rate of mutations. I still think we can't rule out that possibility. But the rate is not alone, and so it is less surprising than it may have seemed.

    My post last week on the 1000 Genomes Project results ("Now for anthropological genomics") mentioned that the 1000 Genomes comparisons have arrived at essentially the same rate as Roach and colleagues. Comparison of one family trio led to a rate of 1.0 x 10^-8 per site per generation; the other family trio gave rise to an estimate of 1.2 x 10^-8 per site per generation. These bracket the estimate given by Roach and colleagues.

    My basic observation about the human-chimpanzee divergence time is still sound:

    If this mutation rate is accurate, then the average human-chimpanzee gene divergence has to be up around 11 million years ago. That can be accommodated with a 7-million-year-old species divergence only if we assume a very large ancestral population -- on the order of 50,000 or higher. Or, the ancestral effective size could be lower -- but that would make the species divergence substantially older -- 9 million years or more.

    As we go further back in time, this lower human mutation rate may be less and less relevant, because different primate lineages may have higher (or lower) rates. When some of the kinks have been worked out of whole-genome sequencing, it would be tremendously useful to sequence parent-offspring pairs in other primate species. With those data, rate heterogeneity could be tested directly.

    For events within the hominins, the parent-offspring rate of mutations ought to be better than a rate estimated from phylogenetic distance. Phylogenetic distances are estimated with even more error than mutations, increasingly so as our methods for comparing genomes improve. But some fraction of new mutations will ultimately be lost to purifying selection. That implies, again, that the longer term rate of substitutions will be lower than the rate of mutations measured from parent-offspring comparisons.

    A rate of 1.1 x 10^-8 would have no effect on the number of genetic differences observed between people, because these differences are simply counted; they are not inferred from known genealogical relationships. It is the unknown genealogical relationships, which are estimated from genetic differences, that may change substantially.

    Let's consider an example. Harris and Hey [11] sequenced 4200 bp of the gene PDHA1, an X-linked gene whose product is part of a mitochondrial enzyme complex. At the time of their study (1999), their result was one of the oldest coalescence times estimated for non-African populations based on sequence data; they estimated the root of the PDHA1 genealogy was 1.8 million years old. This estimate was based on the assumption that human and chimpanzee copies, which differed by an average of 40.42 substitutions, had diverged at 5 million years ago. That would imply that the average genetic difference between humans across the deepest root of the genealogy, 15.05 mutational differences, corresponds to 1.86 million years of time. If we instead assert a per-generation rate of 1.1 x 10^-8 per site, these data would generate an estimate of 163,000 generations for the root of the genealogy, roughly 3.3 million years.

    In other words, a coalescence that appeared to have happened in early Homo now looks rooted at the age of A. afarensis. The chimpanzee-human genetic root would be around 8.7 million years for these data.
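The arithmetic in this example can be reproduced directly. This sketch uses the Harris and Hey figures given above (4200 bp, and average differences of 15.05 and 40.42) along with the 20-year generation implied by the conversion from generations to years in the text.

```python
SEQ_LEN = 4200            # bp of PDHA1 sequenced by Harris and Hey
HUMAN_ROOT_DIFFS = 15.05  # average differences across the deepest human root
CHIMP_HUMAN_DIFFS = 40.42 # average human-chimpanzee differences
GEN_YEARS = 20            # generation time implied by the text's arithmetic

# 1999-style calibration: human and chimpanzee copies diverged 5 million years ago.
root_calibrated = HUMAN_ROOT_DIFFS / CHIMP_HUMAN_DIFFS * 5e6

# Direct rate: two lineages each gain MU * SEQ_LEN mutations per generation.
MU = 1.1e-8
per_generation_divergence = 2 * MU * SEQ_LEN
root_generations = HUMAN_ROOT_DIFFS / per_generation_divergence
chimp_generations = CHIMP_HUMAN_DIFFS / per_generation_divergence

print(f"calibrated root: {root_calibrated / 1e6:.2f} My")
print(f"direct-rate root: {root_generations:,.0f} generations, "
      f"{root_generations * GEN_YEARS / 1e6:.1f} My")
print(f"chimp-human genetic root: {chimp_generations * GEN_YEARS / 1e6:.1f} My")
```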

    These estimates would likely be biased too low, because the X chromosome has a somewhat lower rate of mutation than the autosomes. Lynch [10] addressed that issue: an X chromosome spends only 1/3 of its time in males (with their higher rate of mutations), compared to 1/2 the time for an autosome. Any purifying selection would also bias the estimate too low. On the other hand, if these 4200 bp have a higher-than-average CpG content, that is one factor that might require a higher per-generation rate.
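The X-chromosome correction can be written down under a simple model of male-driven mutation. With α the ratio of male to female germline mutation rates, the autosomal rate is proportional to (1 + α)/2 and the X-linked rate to (2 + α)/3. The sketch below assumes that 1.1 x 10^-8 is the sex-averaged autosomal rate and tries a few illustrative values of α, up to the factor of 6 mentioned earlier; these are not Lynch's values.

```python
def x_to_autosome_ratio(alpha: float) -> float:
    """Ratio of X-linked to autosomal mutation rate under male-driven mutation.

    With female rate mu_f and alpha = mu_m / mu_f, an autosome spends half its
    generations in each sex: mu_A = mu_f * (1 + alpha) / 2. An X spends 2/3 of
    its generations in females: mu_X = mu_f * (2 + alpha) / 3.
    """
    return 2.0 * (alpha + 2.0) / (3.0 * (alpha + 1.0))


MU_AUTOSOME = 1.1e-8      # assumed to be the sex-averaged autosomal rate
SEQ_LEN = 4200            # PDHA1 sequence length from the example
HUMAN_ROOT_DIFFS = 15.05  # average differences across the deepest human root
GEN_YEARS = 20            # generation time used in the example

for alpha in (1.0, 3.0, 6.0):
    mu_x = MU_AUTOSOME * x_to_autosome_ratio(alpha)
    generations = HUMAN_ROOT_DIFFS / (2 * mu_x * SEQ_LEN)
    print(f"alpha={alpha:.0f}: X/A={x_to_autosome_ratio(alpha):.3f}, "
          f"root {generations * GEN_YEARS / 1e6:.1f} My")
```

The stronger the male bias, the lower the X-linked rate and the older the corrected PDHA1 root, which is the direction of bias described above.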

    Is any of this a problem? I don't think we know yet. A lower rate must readjust the apparent correspondence of some molecular time estimates with the archaeological record. But to be honest, most of the apparent correspondences of such dates have been illusory, because genealogical relationships among genes have such large expected variance under any realistic human population model. It is really the availability of whole-genome comparisons that has a chance of improving these population models.


    References

    1. Roach JC, Glusman G, Smit AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science [Internet]. 2010;328:636–639. Available from: http://dx.doi.org/10.1126/science.1186802
    2. Subramanian S, Kumar S. Neutral Substitutions Occur at a Faster Rate in Exons Than in Noncoding DNA in Primate Genomes. Genome Research [Internet]. 2003;13:838–844. Available from: http://dx.doi.org/10.1101/gr.1152803
    3. Crow JF. The origins, patterns and implications of human spontaneous mutation. Nature Reviews Genetics [Internet]. 2000;1:40–47. Available from: http://dx.doi.org/10.1038/35049558
    4. Nachman MW, Crowell SL. Estimate of the Mutation Rate per Nucleotide in Humans. Genetics [Internet]. 2000;156:297–304. Available from: http://www.genetics.org/cgi/content/abstract/156/1/297
    5. Yi S, Ellsworth DL, Li W-H. Slow Molecular Clocks in Old World Monkeys, Apes, and Humans. Molecular Biology and Evolution. 2002;19:2191–2198.
    6. Steiper ME, Young NM. Primate molecular divergence dates. Molecular Phylogenetics and Evolution [Internet]. 2006;41:384–394. Available from: http://dx.doi.org/10.1016/j.ympev.2006.05.021
    7. Yang Z. Likelihood and Bayes Estimation of Ancestral Population Sizes in Hominoids Using Data From Multiple Loci. Genetics [Internet]. 2002;162:1811–1823. Available from: http://www.genetics.org/cgi/content/abstract/162/4/1811
    8. Kumar S, Filipski A, Swarna V, Walker A, Hedges SB. Placing Confidence Limits on the Molecular Age of the Human-Chimpanzee Divergence. Proceedings of the National Academy of Sciences, U. S. A. [Internet]. 2005;102:18842–18847. Available from: http://dx.doi.org/10.1073/pnas.0509585102
    9. Kondrashov AS. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. [Internet]. 2003;21:12–27. Available from: http://dx.doi.org/10.1002/humu.10147
    10. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences [Internet]. 2010;107:961–968. Available from: http://dx.doi.org/10.1073/pnas.0912629107
    11. Harris EE, Hey J. X chromosome evidence for ancient human histories. Proceedings of the National Academy of Sciences, U. S. A. 1999;96:3320–3324.
    Synopsis: 
    The 1000 Genomes Project is finding that the mutation rate is half the value usually assumed.
  • mtDNA, purifying selection and "distorted" genealogies

    Sat, 2010-10-23 11:13 -- John Hawks

    I'm going to pass along this paper without much comment, it's by Jon Seger and colleagues and it came out earlier this year in Genetics [1]:

    Gene Genealogies Strongly Distorted by Weakly Interfering Mutations in Constant Environments

    Neutral nucleotide diversity does not scale with population size as expected, and this "paradox of variation" is especially severe for animal mitochondria. Adaptive selective sweeps are often proposed as a major cause, but a plausible alternative is selection against large numbers of weakly deleterious mutations subject to Hill–Robertson interference. The mitochondrial genealogies of several species of whale lice (Amphipoda: Cyamus) are consistently too short relative to neutral-theory expectations, and they are also distorted in shape (branch-length proportions) and topology (relative sister-clade sizes). This pattern is not easily explained by adaptive sweeps or demographic history, but it can be reproduced in models of interference among forward and back mutations at large numbers of sites on a nonrecombining chromosome. A coalescent simulation algorithm was used to study this model over a wide range of parameter values. The genealogical distortions are all maximized when the selection coefficients are of critical intermediate sizes, such that Muller's ratchet begins to turn. In this regime, linked neutral nucleotide diversity becomes nearly insensitive to N. Mutations of this size dominate the dynamics even if there are also large numbers of more strongly and more weakly selected sites in the genome. A genealogical perspective on Hill–Robertson interference leads directly to a generalized background-selection model in which the effective population size is progressively reduced going back in time from the present.

    The topic arises for me at the moment because of some inconsistencies between the apparent timing of events from mtDNA estimates compared to nuclear DNA estimates. Across the crucial "out of Africa" time interval between 200,000 and 50,000 years ago, the mtDNA is not really giving the same chronology as might be expected from nuclear DNA comparisons.

    The mutation rate of mtDNA genome-wide is very high, giving rise to the possibility of interaction between weakly deleterious mutations on the same sequence. It is widely known that the apparent rate of mtDNA mutation depends on the timescale of the comparison in humans. Mothers and their offspring differ by much more than would be predicted by longer pedigrees or by comparisons between populations. Recently diverged populations (such as those in island Polynesia) differ much more than would be predicted from the difference between humans and Neandertals or humans and chimpanzees.

    This apparent "speed-up" of rate as we get closer to the present is consistent with the action of strong purifying selection. So establishing the other genealogical effects of this selection should help us understand the patterns of mtDNA sequence differences found in humans.


    References

  • Inclusive fitness works

    Wed, 2010-09-01 07:53 -- John Hawks

    I can't believe the amount of attention the paper by Martin Nowak, Corina Tarnita and Edward O. Wilson [1] has gotten. It was in last week's Nature. The basic idea was that the evolution of eusociality in insects could be explained in a different way than the usual explanation, which involves calculating the relatedness of worker insects to their reproductive siblings. Eusociality has been one of the most visible applications of inclusive fitness theory -- that is, the observation that the fitness of a gene that alters behavior may be calculated in terms of its effects on the reproduction and survival of relatives. The paper notes that some aspects of eusociality are not well explained in terms of relatedness, and derives an alternative explanation.
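The relatedness calculation at the center of the usual explanation is easy to sketch. In haplodiploid insects, daughters of a singly mated queen share their father's entire haploid genome, so full sisters are related by 3/4 rather than 1/2. The functions below assume unrelated fathers with equal paternity shares and state Hamilton's rule in its simplest form, rB > C; they illustrate the standard argument, not the model of Nowak and colleagues.

```python
def sister_relatedness(n_fathers: int = 1) -> float:
    """Relatedness of a worker to a sister in a haplodiploid colony.

    Half of each daughter's genome comes from the mother and matches a
    sister's with probability 1/2. The other half comes from a haploid father
    and is identical if both sisters have the same father. Assumes unrelated
    fathers with equal paternity shares.
    """
    if n_fathers < 1:
        raise ValueError("need at least one father")
    return 0.5 * 0.5 + 0.5 * (1.0 / n_fathers)


def helping_favored(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule in its simplest form: helping spreads when r * B > C."""
    return r * benefit > cost


for fathers in (1, 2, 4):
    print(f"{fathers} father(s): sister relatedness {sister_relatedness(fathers):.3f}")
```

With two equally successful fathers, relatedness among sisters already falls to 1/2, the same as a worker's relatedness to her own daughters.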

    The weird part of the paper is the way it describes inclusive fitness as some kind of theoretical afterthought, useful only as an ad hoc explanation for eusocial insects. It contrasts the inclusive fitness concept with "standard natural selection" as if it were possible for organisms to erase the fact that they're related to each other! And the authors imply that they have fatally damaged the concept of kin selection.

    It's so contrary to evolutionary theory that I thought maybe I was missing something. But I've been spending time on another problem this week and haven't had time to follow it up.

    Fortunately, Jerry Coyne and Richard Dawkins have both given the paper some attention and written reactions to it. First, Coyne ("A misguided attack on kin selection") reminds us of why kin selection has been such a successful part of "standard" evolutionary theory for the past fifty years.

    Sex ratio theory, in which mothers produce different proportions of males and females, has been a particularly fruitful area for applying inclusive fitness theory. So has “altruism”—suicidal honeybees are just one example. And so are parental care and aspects thereof, especially parent-offspring conflict, a field brought to life by Bob Trivers using inclusive fitness theory. How else can you explain weaning conflict except by a conflict between the mother’s genetic welfare and that of her offspring?

    I’m baffled not only by Nowak et al.’s apparent and willful ignorance of the literature, but by statements that are just wrong. They flatly assert, for instance, that “inclusive fitness theory” is something different from “standard natural selection theory.” But it’s not: it’s simply a natural extension of population genetics to the situation in which one’s behavior affects related individuals.
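    Coyne's weaning example follows Trivers's parent-offspring conflict argument, which can be sketched as a pair of Hamilton's-rule comparisons. This is a minimal sketch assuming full or half siblings, no inbreeding, and that B (benefit of further investment to the current offspring) and C (cost to the mother's future offspring) are measured in the same fitness units; the function names are hypothetical.

```python
from fractions import Fraction

R_MOTHER = Fraction(1, 2)  # a mother's relatedness to each of her offspring


def mother_favors(benefit, cost):
    """Mother values current and future offspring equally (r = 1/2 each),
    so she favors more investment only when r * B > r * C, i.e. B > C."""
    return R_MOTHER * benefit > R_MOTHER * cost


def offspring_favors(benefit, cost, r_sib=Fraction(1, 2)):
    """Current offspring values itself at r = 1 and a future sibling at
    r_sib, so it favors more investment whenever B > r_sib * C."""
    return benefit > r_sib * cost


def in_conflict(benefit, cost, r_sib=Fraction(1, 2)):
    """True when the offspring wants more investment than the mother does."""
    return offspring_favors(benefit, cost, r_sib) and not mother_favors(benefit, cost)


if __name__ == "__main__":
    B = 2  # hypothetical benefit of further nursing to the current offspring
    for C in (1, 3, 5, 9):  # hypothetical costs to the mother's future offspring
        print(C, in_conflict(B, C), in_conflict(B, C, Fraction(1, 4)))
```

    For full siblings (r_sib = 1/2) the two parties disagree whenever C lies between B and 2B; for half siblings (r_sib = 1/4) the zone of conflict widens to between B and 4B.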

    Richard Dawkins has also posted notes about the paper:

    Kin selection is not a subset of group selection, it is a logical consequence of gene selection. And gene selection is (everything that Nowak et al ought to mean by) 'standard natural selection' theory: has been ever since the neo-Darwinian synthesis of the 1930s. Inclusive fitness theory is not some kind of supernumerary excrescence, to be 'resorted to' only if 'standard natural selection theory' is found wanting (Misunderstanding One). On the contrary, inclusive fitness theory is one way of expressing what was logically inherent in the synthesis ever since Fisher and Haldane, but had been largely overlooked because people (with the exception of those two geniuses) didn't think about collateral kin.

    Yes, unless they're going to repeal the Price equation, they'll have to rely on relatedness to explain those phenotypes that never occur in reproductive individuals. As Dawkins puts it, "You have to talk about shared genes in individuals, with conditional phenotypic expression."
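    The Price equation is an exact accounting identity for the change in a trait mean over one generation: the change in the mean of z equals [Cov(w, z) + E(w Δz)] divided by mean fitness, where z_i is an individual's trait (or gene frequency), w_i its number of offspring, and Δz_i the difference between its offspring's mean trait and its own. Because it holds for any population, any model of selection has to satisfy it, and Hamilton's rule can be derived from it by expressing the covariance in terms of genetic similarity between individuals, which is where relatedness enters. This sketch, with hypothetical function names and toy data, only checks the identity numerically.

```python
def mean(xs):
    return sum(xs) / len(xs)


def price_terms(z, z_offspring, w):
    """Price equation terms, each already divided by mean fitness.

    z[i]           trait value (e.g. allele frequency) of parent i
    z_offspring[i] mean trait value of parent i's offspring
    w[i]           number of offspring of parent i
    Returns (selection, transmission, total), where
    selection = Cov(w, z) / w_bar and transmission = E(w * dz) / w_bar.
    """
    n = len(z)
    w_bar, z_bar = mean(w), mean(z)
    # Population (divide-by-n) covariance, as the identity requires.
    cov_wz = sum(wi * zi for wi, zi in zip(w, z)) / n - w_bar * z_bar
    e_w_dz = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring)) / n
    return cov_wz / w_bar, e_w_dz / w_bar, (cov_wz + e_w_dz) / w_bar


def direct_change(z, z_offspring, w):
    """Change in the mean trait, found by weighting offspring directly."""
    new_mean = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w)
    return new_mean - mean(z)


if __name__ == "__main__":
    # Toy data: carriers (z = 1) leave more offspring; transmission is imperfect.
    z = [0, 0, 1, 1]
    z_offspring = [0.1, 0.0, 0.9, 1.0]
    w = [1, 1, 3, 3]
    print(price_terms(z, z_offspring, w), direct_change(z, z_offspring, w))
```

    With perfect transmission the second term vanishes, leaving Cov(w, z) divided by mean fitness: the trait mean changes only insofar as the trait covaries with fitness.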


    References

    1. Nowak MA, Tarnita CE, Wilson EO. The evolution of eusociality. Nature. 2010;466:1057–1062. Available from: http://dx.doi.org/10.1038/nature09205
