john hawks weblog

paleoanthropology, genetics and evolution

exponential growth

  • Handling exponential growth in demographic models

    Fri, 2008-06-06 10:50 -- John Hawks

    Exponential growth is a feature of current human populations, and may represent how the human population behaved during some episodes of its demographic history. However, "exponential" can mean different things to different people if you're not used to thinking mathematically about growth. So I need to lay out some definitions:

    1. Linear population growth: The same number of individuals is added in each successive time interval. Hence, population size is a linear function of time. Think of driving your car at a constant velocity. Or, you deposit your paycheck every month into a bank account, without interest.

    2. Geometric population growth: The same proportion of individuals is added in each discrete time interval -- for example, in each generation. Time is not measured continuously. Consider a bank account, compounded annually.

    3. Instantaneous population growth: At one discrete time, the population is considered to transition immediately, without any time passing, from a small to a large size. Suddenly, a benefactor makes a large deposit in your bank account.

    4. Exponential population growth: The population grows by a constant proportion per unit time, measured continuously. Consider a petri dish with a growing colony of E. coli, or a bank account compounded continuously.
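    The four definitions can be made concrete with a small sketch. The function names and parameter values below are hypothetical, chosen only for illustration; each function returns the population size at time t, starting from n0.

```python
import math

def linear(n0, c, t):
    """Linear growth: a constant number c of individuals added per unit time."""
    return n0 + c * t

def geometric(n0, r, t):
    """Geometric growth: a constant proportion r added per discrete interval.
    Only completed intervals count, so t is truncated to a whole number."""
    return n0 * (1 + r) ** math.floor(t)

def instantaneous(n0, n1, t_jump, t):
    """Instantaneous growth: size jumps from n0 to n1 at time t_jump."""
    return n0 if t < t_jump else n1

def exponential(n0, k, t):
    """Exponential growth: constant proportional rate k, in continuous time."""
    return n0 * math.exp(k * t)

for t in (0, 0.5, 1, 2):
    print(t, linear(100, 10, t), geometric(100, 0.1, t),
          instantaneous(100, 200, 1, t), round(exponential(100, 0.1, t), 2))
```

    Note how the geometric model stays flat between intervals, while the exponential model grows at every instant.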

    If you drive your car at a constant speed, then in half the time it would take to reach your destination, you will be halfway there.

    But exponential growth does not work this way. Suppose you have a dollar in the bank now, and you invest it at a continuous rate of 5 percent per year. In 100 years, you expect to have $148. If your account grew linearly, you would have $74 in 50 years. But at your exponential growth rate, you will have only $12. In fact, it will take 86 years for your account to reach halfway to its "destination" of $148.
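    The bank-account numbers can be checked in a few lines. This sketch takes 5 percent as a continuous rate k = 0.05, since that is what reproduces the $148 figure (an effective 5 percent per year, compounded annually, would give only about $131 after a century).

```python
import math

k = 0.05      # continuous growth rate per year
x0 = 1.0      # initial deposit, in dollars

def balance(t):
    """Balance after t years of continuous growth at rate k."""
    return x0 * math.exp(k * t)

final = balance(100)                      # about 148.41
linear_midpoint = (x0 + final) / 2        # straight-line growth at year 50
exp_at_50 = balance(50)                   # about 12.18
t_half = math.log(final / 2 / x0) / k     # years to reach half the final balance

print(f"after 100 years: ${final:.2f}")
print(f"after 50 years, linear: ${linear_midpoint:.2f}; exponential: ${exp_at_50:.2f}")
print(f"half of the final balance is reached at {t_half:.1f} years")
```

    The halfway point falls ln(2)/k, about 13.9 years, before the end, no matter how long the account has been growing.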

    Now, what if we approached the question from the opposite direction? Suppose that our account really does grow exponentially, that we really did put in one dollar at the beginning, and we really did end up with $148 after 100 years. But suppose that we also really did have $74 in the account after 50 years. The form of the solution here is obvious: we are dealing with at least two different rates of increase -- one for the early part of the 100-year interval, and a different rate for the later part.
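    As a minimal sketch, assume the rate changed exactly once, at the 50-year mark (just one of infinitely many possible histories). Then each half of the century has its own constant continuous rate, and both can be solved for directly:

```python
import math

x0, x50, x100 = 1.0, 74.0, 148.0   # balances at years 0, 50 and 100

# Constant continuous rate within each 50-year half (assumed single change).
k_early = math.log(x50 / x0) / 50     # about 0.0861 per year
k_late = math.log(x100 / x50) / 50    # ln(2)/50, about 0.0139 per year

print(f"early rate: {k_early:.4f}/yr, late rate: {k_late:.4f}/yr")
```

    The early rate is more than six times the later one, even though the start and end balances match the single-rate account almost exactly.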

    In fact, there are an infinite number of ways that the rate might change over time to attain this result. Maybe it changed 30 years into the span, or 55 years in. Maybe it changed continuously. Maybe the account shrank at some times and grew at others.

    We can only attempt to deal with these unknowns by taking additional samples. What was the account balance after 20 years? After 21? 22? 73? I'll call these observations "signposts" -- because they give us markers along the path taken by the size of the account.

    You get the idea: this bank problem is very much like our problem reconstructing ancient demography in human populations. When we consider genetic variation, what we observe in today's genes was affected not only by the population sizes at the signposts that we observed in the past, but by every point in between.

    Suppose that our bank account was not merely symbolic money, but that the bank put in actual pennies when the amount increased. It's a simple enough matter to examine all 14,800 pennies at the end of the 100 years. We can ask, how many of those pennies will have mint marks dating 20 years into the span? How many will have mint marks dating 73 years in? The answers to those questions depend on the account balances across the entire 100-year span. That is the kind of question that we address about human history when we observe today's genetic variation. How many people today share haplotypes that originated 5000 years ago? What about 35,000 years ago? 143,000?
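    The penny thought experiment can be sketched under simple assumptions: the bank adds pennies as the balance grows and never removes any, and each penny's mint date is the moment it was added. The number of pennies dated to an interval is then just the balance gained over that interval. The two paths compared here (a single 5 percent continuous rate, and a hypothetical two-phase path passing through $74 at year 50) are illustrations, not data.

```python
import math

def exp_path(t, k=0.05):
    """Single continuous rate: $1 grows to about $148 in 100 years."""
    return math.exp(k * t)

def two_rate_path(t):
    """Hypothetical two-phase path: $1 -> $74 by year 50 -> $148 by year 100."""
    k1 = math.log(74) / 50
    k2 = math.log(2) / 50
    if t <= 50:
        return math.exp(k1 * t)
    return 74 * math.exp(k2 * (t - 50))

def pennies_minted(path, t0, t1):
    """Pennies dated to [t0, t1): the dollars gained, converted to cents."""
    return round(100 * (path(t1) - path(t0)))

for name, path in [("single rate", exp_path), ("two rates", two_rate_path)]:
    total = round(100 * path(100))
    early = pennies_minted(path, 0, 50)
    print(f"{name}: {total} pennies, {early} dated to the first 50 years")
```

    Both accounts end with roughly the same number of pennies, but the two-phase path leaves more than six times as many pennies minted in the first half-century. The endpoints alone cannot tell the two histories apart; the age distribution can.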

    When we make a prediction from evolutionary theory -- for example, the prediction of the age distribution of haplotypes in a population given the assumption of no selection -- then we must assert a model of demographic history. It used to be that you could simply assert a constant population size. But that's no longer any good for human evolution, since our population has obviously grown massively over time.

    If we want our predictions to relate to the real population history, then we ought to use as many signposts as we can find, so that we can constrain our models. For human demographic history, those signposts come from several sources, including the archaeological record, ethnographic comparisons, and increasingly genetic sampling. As I'm going to show, it's really not good enough to just pick numbers out of thin air. The reason is that there are many ways that your model can work against you unless you put in the most accurate numbers you can find.

    How not to handle exponential growth

    A simple exponential model has the benefit of simplicity. But if we don't choose our signposts carefully, a simple model will lead us badly wrong. Here, I'm going to examine the demographic simulations performed by Voight et al. (2006). I'm not picking on this paper in particular -- it actually stands out as a relatively good example of demographic modeling in genetics. This paper has been cited many times, and it is valuable in part because of its detailed analysis of the power of detecting recent selection.

    Some of the power analyses were based on demographic models applied to the data from the Yoruba HapMap sample. Voight et al. (2006) considered only exponential growth models for the Yoruba (as opposed to the Asian and CEU HapMap samples, for which they also considered bottlenecks of various kinds). At the low end, the authors considered a model with no growth at all -- a constant effective population of 11,156 individuals. At the high end, they considered a model in which the population grew exponentially from an ancestral size of 10,018 individuals up to a current size of 1,910,000 individuals, with growth commencing 750 generations in the past. Other models were in between these extremes, although many had earlier onsets of population growth (up to 4000 generations ago). These values are reported in the online correction to the original article.

    At the outset, we can observe that these values are far too low, both for the ancestral and the current populations. The current population size of sub-Saharan Africa is on the order of 650 million individuals. This, of course, disproportionately represents the last few generations of rapid growth. But even in the year 1500, sub-Saharan Africa had a population on the order of 80 million people (Biraben 2003). The effective size of this population would be between 20 and 40 million. Of course, the Yoruba HapMap sample does not represent this population uniformly. The present population of Nigeria is 148 million, and the number of Yoruba within this population is approximately 30 million. Applying the same growth constant, we might estimate that this population had numbered around 5 million in the year 1500. But as we go back in time, we must encompass a wider cone of ancestry, as genes have flowed into the Yoruba from other populations. Hence, an effective 2 million individuals is certainly too small for the present population by a factor of five to ten, and plausibly too small for the population of 500 years ago by a smaller factor.

    The ancestral size is more seriously in error. Certainly, going back to 500,000 years ago or earlier, the long-term effective population size for humans really was on the order of 10,000 individuals. Since autosomal genes coalesce across that span or longer, we need to employ demographic models that incorporate this small ancestral size. However, we now know that this small size did not characterize any of the Late Pleistocene of Africa (as I discussed last month). Instead, the African population had reached an effective 38,000 individuals by 144,000 years ago, and grew after that time. So the initial size used by Voight et al. (2006) is small by a factor of more than four.

    But what matters much more is the combination of date and size. That's because the entire period matters to genetic variation, not merely the signposts.

    The models applied by Voight et al. (2006) may be fourfold too small at the beginning of the Late Pleistocene. But what does archaeology tell us about the African population in the early LSA, around 20,000 years ago, when Voight et al. (2006) suggest it had just begun to increase in numbers? Biraben (2003) puts the world population over 5 million individuals by that time. Taking this estimate, the sub-Saharan fraction of the global population at that time may have been substantial, more than a million individuals. That would mean that the Voight et al. (2006) estimate is perhaps only a thirtieth of the true value. Still, Atkinson et al. (2008), surveying mtDNA variation, found that the sub-Saharan population was apparently small compared to southern Asia around 20,000 years ago, with a sub-Saharan effective size less than 100,000 individuals. In that view, the Voight estimate is at least a tenth of the most accurate value.

    But what about the span from 10,000 to 5000 years ago -- the time range corresponding to the highest fraction of ascertained selection in their data? At the end of this time range, 5000 years ago, the best demographic estimates place the sub-Saharan African population around 6 million individuals, or perhaps 1.5 to 3 million effective individuals. The largest exponential growth model applied by Voight et al. (2006) predicts a continuous growth rate of 0.00028 per year during the last 750 generations. That would predict an effective size 5000 years ago of only 470,000 individuals -- perhaps a third to a sixth of the real value.
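    The growth rate and the 470,000 figure follow from the model's endpoints. This sketch assumes 25 years per generation, which the post does not state but which reproduces the 0.00028 rate; the rate is then applied backward from the present.

```python
import math

N_ancestral = 10_018        # largest-growth Yoruba model, Voight et al. (2006)
N_current = 1_910_000
generations = 750
years_per_generation = 25   # assumption: not stated in the post

span_years = generations * years_per_generation
k = math.log(N_current / N_ancestral) / span_years   # continuous rate per year

N_5000 = N_current * math.exp(-k * 5000)             # run the model backward
print(f"k = {k:.5f} per year; N 5000 years ago = {N_5000:,.0f}")

# Compare with the 1.5 to 3 million effective-size estimate in the text.
for Ne in (1.5e6, 3e6):
    print(f"{Ne:,.0f} is {Ne / N_5000:.1f} times the model value")
```

    The ratios come out near 3 and 6, matching the "third to a sixth" in the text.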

    In other words, the simulations conducted by Voight et al. (2006) have overestimated the power of genetic drift during the last 144,000 years, and most critically in the period around 20,000 to 5000 years ago. The problem is that the signposts are wrong: replace the demographic assumptions with better ones, and you bring them more into line with reality. In this case, the estimate of current effective size was wrong, but not unreasonably so -- it's possibly within a factor of two. But the early values are wrong by a factor of ten or more, and the errors are compounded by the use of the simple exponential growth model. Replacing the more recent interpolated values with real estimates taken from archaeological and ethnographic models would be more complicated, but would actually remove uncertainty in the model.
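    Why larger population sizes mean weaker drift can be seen with a rough sketch based on the standard Wright-Fisher expectation that heterozygosity falls by a factor of (1 - 1/(2N)) each generation, ignoring mutation and selection. It runs the exponential trajectory of the largest-growth Voight et al. (2006) model generation by generation, then the same trajectory with every size multiplied by a hypothetical factor of four. The scaled trajectory is illustrative only, not a fitted demographic model.

```python
import math

def trajectory(n0, n_final, generations, scale=1.0):
    """Exponential growth from n0 to n_final, one size per generation."""
    k = math.log(n_final / n0) / generations
    return [scale * n0 * math.exp(k * g) for g in range(generations)]

def heterozygosity_lost(sizes):
    """Expected fraction of heterozygosity lost to drift alone:
    H_t / H_0 = product over generations of (1 - 1 / (2 * N_g))."""
    retained = 1.0
    for n in sizes:
        retained *= 1 - 1 / (2 * n)
    return 1 - retained

base = trajectory(10_018, 1_910_000, 750)
larger = trajectory(10_018, 1_910_000, 750, scale=4.0)  # hypothetical

print(f"lost, Voight-style trajectory: {heterozygosity_lost(base):.5f}")
print(f"lost, sizes x4 (hypothetical): {heterozygosity_lost(larger):.5f}")
```

    Because the early, small generations dominate the product, errors in the ancestral size matter much more for drift than errors in the recent, large sizes.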

    What are the effects of these models on the results of the paper? Figure 4 in the corrected paper shows the comparison of the real Yoruba data to the simulated datasets. In all cases, the simulated datasets have less variation in the critical statistic than the real data, which indicates the presence of widespread selection within the real data. If we incorporated a more accurate demographic model, the variation within the simulated data should reduce yet more, because genetic drift should have been much weaker than in the simulations performed by Voight et al. (2006). This would increase the proportion of inferred selection represented by the data. Likewise, the power to detect selection should increase for lower-frequency selected alleles -- because of the smaller chance that a long haplotype would increase by genetic drift alone.

    Next: Bottlenecks

  • Natural selection 101. Episode 1: The miracle of compound interest

    Sun, 2007-11-11 18:16 -- John Hawks

    --Originally posted August 24, 2007.

    Once upon a time, somebody probably told you that biologists don't need to know any calculus. Well, I suppose they were right: it is certainly true that most biologists don't use any calculus in their work. A purely practical biologist is like a purely practical banker -- as long as the computers do their jobs, why does anybody need to know how to calculate?

    Still, there is some point to knowing the theory that underpins the study of life. Math gives the theory its power. Understand the math, and you can unleash that power to find answers to new problems.

    During the last year or so, I have written nothing here about natural selection, quite purposely, even though anyone who knows me at all can tell you I hardly talk about anything else. Well, I tend not to write about what I'm working on, especially when it involves other people's observations as well as my own. I don't like it that way, but sometimes it's necessary. It especially stings when the major news in biology is that the world has changed to make selection relevant again. Still, to do my part in this change, I've maintained a respectable silence.

    Over this time, I have learned many mysteries about Darwin's force. Most geneticists approach natural selection as a kind of black magic. You see, find the right pattern of selection, and you can explain almost anything. You might think this is a desirable quality in a scientific hypothesis, but many people don't see it that way. Selection, in their view, is too often unfalsifiable. It's too hard to disprove. And besides, some things really do happen by chance alone. We have to give random chance at least a fair shot as an explanation, and if you can't disprove genetic drift (so the story goes), then you don't need to invoke selection.

    Besides, genetic drift is a much happier, friendlier hypothesis than selection. If somebody dies by genetic drift, it's nobody's fault. "Ooops, just a spot of bad luck, there! Move along, nothing to see here." By contrast, selection thuggishly entails that deaths and births have causes. For some reason, the idea that something should have a cause is offensive to some biologists. That is, after all, the point of The Spandrels of San Marco: Adaptationism, the assumption that phenotypic "traits" have discrete (and identifiable) causes, is a metaphysical assumption, not a tenet of Darwinism. Even those biologists who don't conform to the philosophy of narrow adaptationism, as described by Gould and Lewontin, have often felt the sting of the word; a real scarlet "A" for their dossiers.

    Perhaps more to the point, you can learn the essentials about genetic drift with a bit of algebra. Drift in a constant population is a linear process, and drift in non-constant populations can generally be approximated by linear modifications to the case of constant size. In contrast, natural selection is a logistic process, and understanding it requires differential equations.

    A combination of philosophy and calculus. You can see how selection got its reputation as black magic.

    Darwin's non-mathematical math

    The foundations of Darwinism are economic. This should not come as a shock: Darwin took his inspiration from Thomas Malthus, who formalized the idea that the geometric growth in population would outstrip resources that grow at a linear rate. That's math -- math that Darwin found compelling and used as the basis for his concept of natural selection. Here's a passage from page 47 of "On the Variation of Organic Beings in a state of Nature":

    It is the doctrine of Malthus (1826) applied in most cases with tenfold force. As in every climate there are seasons, for each of its inhabitants, of greater and less abundance, so all annually breed; and the moral restraint which in some small degree checks the increase of mankind is entirely lost. Even slow-breeding mankind has doubled in twenty-five years; and if he could increase his food with greater ease, he would double in less time. But for animals without artificial means, the amount of food for each species must, on an average, be constant, whereas the increase of all organisms tends to be geometrical, and in a vast majority of cases at an enormous ratio. Suppose in a certain spot there are eight pairs of birds, and that only four pairs of them annually (including double hatches) rear only four young, and that these go on rearing their young at the same rate, then at the end of seven years (a short life, excluding violent deaths, for any bird) there will be 2048 birds, instead of the original sixteen. As this increase is quite impossible, we must conclude either that birds do not rear nearly half their young, or that the average life of a bird is, from accident, not nearly seven years. Both checks probably concur. The same kind of calculation applied to all plants and animals affords results more or less striking, but in very few instances more striking than in man.
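    Darwin's bird arithmetic can be checked directly. Assuming, as his numbers imply, that no bird dies during the seven years and that young birds join the breeding pool the following year, half the pairs rearing four young each means births equal the current number of birds, so the flock doubles every year:

```python
pairs = 8
birds = 2 * pairs          # 16 birds at the start
for year in range(1, 8):   # seven breeding seasons
    # n/4 pairs each rear four young, so n young are added: the flock doubles.
    birds *= 2
print(birds)
```

    Sixteen birds doubled seven times gives 16 × 2^7 = 2048, Darwin's figure.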

    Darwin sat on this expressly mathematical insight for nearly twenty years, until Alfred Russel Wallace arrived at it independently. Wallace sent Darwin his manuscript, Darwin forwarded it to Charles Lyell, and Lyell arranged the remarkable double publication of Darwin's and Wallace's essays in the Journal of the Linnean Society. Wallace's essay contains a very similar section to Darwin's quoted above -- the observed birth rate of animals should lead to geometric growth, yet this is impossible except over the shortest time span, so the natural check on population growth must cause competition and selection of traits favorable to survival.

    Math-avoiding biologists have a true hero in Darwin, who -- even allowing for his characteristic nineteenth-century modesty -- was profoundly self-conscious about his failure to master algebra. In an autobiographical chapter of the collected papers edited by his son Francis, Charles Darwin himself describes his resignation about math:

    I attempted mathematics, and even went during the summer of 1828 with a private tutor (a very dull man) to Barmouth, but I got on very slowly. The work was repugnant to me, chiefly from my not being able to see any meaning in the early steps in algebra. This impatience was very foolish, and in after years I have deeply regretted that I did not proceed far enough at least to understand something of the great leading principles of mathematics, for men thus endowed seem to have an extra sense. But I do not believe that I should ever have succeeded beyond a very low grade (Darwin 1887:46).

    So it is ironic that Darwin's greatest insight was so expressly mathematical. The force of natural selection emerges from the necessary conflict between the potential of geometric population growth and the constraint of limited resources. The conflict arises from excess reproduction itself, for if many are being born but the population still does not grow, then we can infer that just as many must die. Wallace's essay makes this point crystal clear, after considering that birds produce four or more offspring per year:

    A simple calculation will show that in fifteen years each pair of birds would have increased to nearly ten millions! whereas we have no reason to believe that the number of the birds of any country increases at all in fifteen or in one hundred and fifty years. With such powers of increase the population must have reached its limits, and have become stationary, in a very few years after the origin of each species. It is evident, therefore, that each year an immense number of birds must perish — as many in fact as are born; and as on the lowest calculation the progeny are each year twice as numerous as their parents, it follows that, whatever be the average number of individuals existing in any given country, twice that number must perish annually,—a striking result, but one which seems at least highly probable, and is perhaps under rather than over the truth (Wallace 1858:55).

    Many historians of science have found it very meaningful that the two men independently arrived at this formulation. It suggests that the idea of natural selection was in some sense "ripe" -- that the tenor of the times made science ready for Darwinism.

    Maybe so. But this "zeitgeist" argument misses an important point: this mathematical theory went without any mathematical description for over fifty years.

    To some extent, this lack of development can be blamed on the lack of a satisfactory theory of inheritance. When the mathematical development of a theory of natural selection was finally advanced by Haldane and Fisher, they had Mendelism to build it upon. If inheritance had turned out not to be Mendelian, a mathematical description of selection would likely have been harder. It is plausible that an earlier acceptance of Mendelian inheritance would have led to an earlier population genetic theory -- it certainly didn't take very long after Mendelism was rediscovered for G. H. Hardy and Wilhelm Weinberg to describe its statistical foundations (Jim Crow described the context of these discoveries in a 1999 perspective piece).

    Demography and selection

    Still, I don't find the lack of a gene theory to be a very satisfactory explanation. There is nothing genetic about Darwin's and Wallace's logic. Both men posed the problem in exclusively demographic terms. Certainly, both assumed that characters are inherited in some way, because without inheritance, natural selection would be impossible. But they were content to refer to the competition between varieties, which itself is quite sufficient as a basis for a theory of selection. The replacement of one variety by another shares a common demographic basis with the replacement of one gene by another.

    In other words, Darwin's and Wallace's description of selection emerged from facts about demography, not inheritance. Both Darwin and Wallace make clear that selection depends on the conditions of existence -- it may be abated when resources are abundant, and it may intensify when populations decline. These demographic conditions could have been easily modeled along the lines that both Darwin and Wallace suggested. The essential facts are all there in the 1858 papers: when populations shrink, varieties that gain resources less effectively may disappear, and when populations grow, more fecund varieties will replace less fecund ones. This is the distinction between survival and fertility selection, already present in Darwin and Wallace.

    We can imagine an alternative history in which these insights were rapidly developed into a demographic model of selection. Mathematical models of demography were not only available at the time Darwin and Wallace wrote, they were the advancing frontier of social science. Mathematical descriptions of demography became important in the 1800's for the same reason they remain important today: actuarial predictions. In the 1820's, Benjamin Gompertz considered the effects of changing mortality, while the logistic model had been formulated by Pierre Verhulst as early as 1838. Both models presented substantial refinements of Malthus' conception of geometric growth, including the very thing Darwin and Wallace most needed: a description of an equilibrium. For that matter, Euler developed a true age-structured model of population growth in 1760! When we consider that the demographic model of natural selection is entirely pre-Darwinian, the possibility of an earlier development of theoretical population genetics seems quite plausible.

    Such speculations are something like steampunk, that narrow corner of fiction that supposes Babbage had really built his Difference Engine No. 2, and imagines what would have happened next. But there is a point to it: Nineteenth-century demography was already well-equipped to incorporate selection. Doing so might at least have jump-started epidemiology, which could have made much of good actuarial records. Tracking thousands of people was already undertaken by governments. On the other hand, the development of genetics required somebody to track thousands of flies, and that wouldn't happen for a while. Still, a good demographic theory of selection might have been incorporated into developmental biology, giving Mendelism a run for its money.

    So why didn't any biologist realize the potential of such modeling for understanding evolution? I can't find any historians of science who have considered this question, but we have some hints. Darwin and Wallace changed the direction of biology, but not its main research approaches. The nascent study of embryology and morphology, what we now would call "evolutionary developmental biology," was not based on demography, and had a radically different conception of possible mathematical descriptions of change. This may also account for the failure of biology to recognize the importance of Mendel's work -- another example of the power of algebra.

    Another reason for the tardy mathematical development: Rather than limiting themselves to a simplistic reductionist approach, biological theorists immediately tried to take in the full scope of nature in their evolutionary explanations. Haeckel was well known for this tendency in comparative biology -- he had to subsume every aspect of morphology into his Biogenetic Law. But the problems of demography could be equally baffling, if not reduced into a consideration of a single species at a time. For example, Alfred Lotka (1925:62) quotes this passage from Herbert Spencer's First Principles:

    Groups of organisms display this universal tendency towards a balance very obviously. In § 85, every species of plant and animal was shown to be perpetually undergoing a rhythmical variation in number -- now from abundance of food an absence of enemies rising above its average, and then by a consequent scarcity of food and abundance of enemies being depressed below its average. And here we have to observe that there is thus maintained an equilibrium between the sum of those forces which result in the increase of each race, and the sum of those forces which result in its decrease. Either limit of variation is a point at which the one set of forces, before in excess of the other, is counterbalanced by it. And amid these oscillations produced by their conflict, lies that average number of the species at which its expansive tendency is in equilibrium with surrounding repressive tendencies. Nor can it be questioned that this balancing of the preservative and destructive forces which we see going on in every race must necessarily go on. Since increase of numbers cannot but continue until increase of mortality stops it; and decrease of number cannot but continue until it is either arrested by fertility or extinguishes the race entirely (Spencer 1867:502).

    Spencer and others were not content with describing what happened to a single population, because the dynamics of one population obviously depend on the populations of other species -- predators, competitors, and prey. An equilibrium between "expansive and repressive" forces required a consideration of those other species. Interestingly, Lotka quoted this passage in the context of providing just such a complicated model -- a system of equations modeling the interactions of an entire community of species.

    Demographic modeling would not make an impact on evolutionary theory until after 1900. Much of the revival was due to Lotka, who not only developed a continuous version of the Euler age-structured equation for population growth, but also extended the work of Vito Volterra to account for predator-prey relationships. Verhulst's logistic model was revived in 1920 by Raymond Pearl and Lowell Reed to describe the growth of the U.S. population.

    By this time, the first population geneticists, including Haldane, Fisher, and Wright, were ready to think about the demographic foundations of natural selection. Fisher showed how Mendelian genes could explain the variation in quantitative traits. Haldane showed how an advantageous gene would behave in a population. And then, in rapid order, Fisher demonstrated the essential connection of natural selection to demography.

    Compound interest

    Most descriptions of natural selection begin with Mendelism, and follow Haldane's formulation of the replacement of a deleterious allele by an advantageous one. Certainly there is merit in this approach, but it's not especially Darwinian. Haldane's model is surprisingly complicated in its mathematics -- no doubt to the consternation of many would-be population geneticists. Moreover, its assumption of a static population bears little resemblance to the continuous demographic flux described by Darwin and Wallace.

    So I'm going to do something very different. Instead of beginning with Haldane, I'm going to start with Fisher's demographic model. Fisher's model is based on the Euler-Lotka equation, and it is often overlooked by geneticists -- in fact I've never seen it in any population genetics text other than Gillespie's. But it is the foundation of life history theory and led directly to Hamilton's insights about strategy variants, later developed by Price and Maynard Smith. Plus, it takes a form that builds immediately upon the logic of Darwin and Wallace.

    The essential insight is one that any nineteenth-century banker would understand: population growth is like compound interest.

    A hundred dollars in the bank at four percent annual interest will grow to $104 in a year. In two years, you'll have $108.16. That's the initial $100 times 1.04 (104 percent) for one year, times 1.04 again for the second year.

    A simple equation will give us that result: if t is the time in years, r is the rate of interest, and x_0 is the original principal, then after t years the account balance will increase to:

    x_t = x_0 * (1 + r)^t

    Now, if you will have $104 in a year, how much will you have in your account in six months? Simply, if we allow t to equal one-half (0.5) in the equation above -- for half a year of interest -- we find that the right amount is $101.98.
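    The compound-interest formula x_t = x_0 * (1 + r)^t translates directly into code; the function name here is just an illustrative choice. Fractional t gives the mid-year balance:

```python
def balance(principal, rate, years):
    """Annual compounding: x_t = x_0 * (1 + r)**t, with t possibly fractional."""
    return principal * (1 + rate) ** years

for t in (1, 2, 0.5):
    print(t, round(balance(100, 0.04, t), 2))
```

    With $100 at 4 percent this gives $104 after one year, $108.16 after two, and $101.98 after six months, as in the text.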

    The amount of interest in the first six months is different from that in the second six months -- and in general, the amount of interest in any period depends not only on the rate of interest but also the amount of principal at that instant. Banks generally simplify matters (to your slight disadvantage) by compounding interest only at long intervals of a month or more.
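    How much the compounding interval matters can be sketched by crediting a nominal 4 percent annual rate in n equal installments of r/n. This nominal-rate convention is an assumption (the textbook one), not something the post specifies; as n grows, the year-end balance approaches the continuous limit x_0 * e^r.

```python
import math

principal, nominal = 100.0, 0.04
for n in (1, 12, 365, 10_000):          # yearly, monthly, daily, "very often"
    year_end = principal * (1 + nominal / n) ** n
    print(f"compounded {n:>6} times: {year_end:.4f}")
print(f"continuous limit:        {principal * math.exp(nominal):.4f}")
```

    Each step toward more frequent compounding adds a little interest on interest, but the gains shrink quickly: monthly and daily compounding differ by well under a cent here.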

    However, we can write these relations in another form that will make them much more useful to us. In the equation above, we can consider the term (1 + r)^t as two parts: a base (1 + r) and an exponent (t). We may substitute a different exponent and base if we choose. In particular, if we substitute the base e, then the equation above may be written:

    x_t = x_0 * e^(kt)

    The exponential base e is exceedingly handy. Transforming our growth equation into an exponential growth equation lets us examine change as a continuous process. What is k? The value of k that will satisfy the equation is k = ln(1 + r). It is often called the constant of proportionality -- it represents not the annual rate of change, but the instantaneous rate of change. For a four percent annual rate of interest, k ≅ 0.0392. In other words, a bank that promises 4 percent interest compounded annually can meet that promise exactly by paying 3.92 percent compounded continuously, rather than the 4 percent continuous rate the headline number might suggest, and pocket the difference. It's not much of a margin, since r exceeds k by such a small amount. In fact, this amount is the interest on the interest earned continuously during the year.
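    A quick numerical check of k = ln(1 + r): with k computed this way, the annual-compounding form and the exponential form give the same balance at every time t, and the gap r - k is the small interest-on-interest term.

```python
import math

r = 0.04
k = math.log(1 + r)        # instantaneous rate equivalent to r per year
print(f"k = {k:.4f}")

# Both forms give the same balance at any time t.
for t in (0.5, 1, 10):
    annual = 100 * (1 + r) ** t
    continuous = 100 * math.exp(k * t)
    print(t, round(annual, 6), round(continuous, 6))

print(f"r - k = {r - k:.5f}")
```

    The difference r - k comes out a little under 0.0008, consistent with the small margin described in the text.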

    The equation x_t = x_0 * e^(kt) is a solution to the differential equation

    dx/dt = kx

    This equation says that the rate of change in x at each instant equals the product of k and x at that instant.
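    The differential equation can also be read as a recipe: over a short step dt, add k * x * dt to x. This sketch uses the forward Euler method (a swapped-in numerical technique, not anything from the post) and shows it converging to the closed-form solution as the steps get smaller.

```python
import math

def euler_growth(x0, k, t_end, steps):
    """Integrate dx/dt = k * x with the forward Euler method."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += k * x * dt   # change over one small step is about k * x * dt
    return x

k = math.log(1.04)
exact = 100 * math.exp(k * 1)   # closed-form solution: $104 after one year
for steps in (1, 10, 1000):
    print(steps, round(euler_growth(100, k, 1, steps), 4), round(exact, 4))
```

    Each Euler step is just compounding at rate k * dt, so shrinking the step is the same as compounding more often, and the limit is the exponential solution.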

    Malthusian population growth

    Malthus translated this simple logic underlying compound interest to an insight about populations. To do this, he had to ignore all the complexities that would later be pointed out by Darwin and Wallace. True, the annual numbers of births and deaths within natural populations are always changing. Natural resources, sources of food, enemies, and diseases all change, and all of these cause fluctuations in the birth and death rates. But if we ignore these fluctuations, and assume that the birth and death rates are perfectly constant, then a population should behave just like a bank account. If the annual rate of births (per individual) is higher than the annual rate of deaths (per individual), then the population will grow according to the equations above. This kind of population growth is generally called Malthusian growth.
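    A minimal sketch of Malthusian growth, assuming constant per-capita birth and death rates b and d treated as instantaneous rates, so that dN/dt = (b - d)N and k = b - d. The rates and starting size are hypothetical.

```python
import math

def malthusian(n0, b, d, t):
    """Population after t years with constant per-capita rates b and d."""
    k = b - d                      # net instantaneous growth rate
    return n0 * math.exp(k * t)

n0 = 1000
for b, d in [(0.03, 0.01), (0.02, 0.02), (0.01, 0.03)]:
    print(f"b={b}, d={d}: {malthusian(n0, b, d, 50):.0f} after 50 years")
```

    Only the difference between births and deaths matters in this model: the population grows, holds steady, or declines exponentially according to the sign of b - d.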

    From the 1950s to the 1970s, the human population of Earth grew by around 2 percent annually. Since then, global population growth has been somewhat slower; the United Nations estimates that in the year 2000 the global population grew at an annual rate of 1.14 percent.

    Biologists tend to measure time in generations rather than years. Anthropologists and geneticists often assume a generation length of 20 to 25 years, although these values vary in different populations. These times are intended to represent the average age at which people have children, but of course the actual times vary substantially. Why does all this variation matter? Well, for one thing, it's why we want to use a continuous model instead of a model that involves discrete generations. Since continuous means calculus, it's nice to have a reason for the effort!

    In the end, we will do a bit better than this for a model of population growth, by directly considering the variation in the age at reproduction. That will take a bit more doing, which will come after a couple more episodes.

    At the current annual rate of growth (1.14 percent), we can estimate the growth per 20-year generation as (1.0114)^20 ≈ 1.254, an increase of 25.4 percent. If this is the rate r per generation, the constant of proportionality is k = ln(1 + r) ≈ 0.226 per generation.
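    The same arithmetic as a short Python sketch, assuming the 20-year generation used above. Carried at full precision, k per generation is 20 × ln(1.0114) ≈ 0.227; the slightly lower 0.226 comes from rounding r to 25.4 percent before taking the logarithm.

```python
import math

annual_rate = 0.0114   # UN estimate of global growth in 2000
generation = 20        # assumed generation length, in years

r_gen = (1 + annual_rate) ** generation - 1   # discrete growth per generation
k_gen = math.log(1 + r_gen)                   # instantaneous rate per generation

print(f"r per generation: {r_gen:.3f}")   # r per generation: 0.254
print(f"k per generation: {k_gen:.3f}")   # k per generation: 0.227
```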

    Clearly Malthus was right: over the long term, this kind of population growth is not sustainable. Indeed, over the very long term, no rate of population growth can be sustainable. And yet, over evolutionary time, no species that is incapable of long-term growth can survive: the inevitable consequence of an indefinite decline in numbers is extinction.

    To examine natural selection, we will need a slightly more complicated model of demography -- one that combines the potential of growth with the fact that growth cannot continue indefinitely. In the next installment, we will see that model, and consider some of its distinctive predictions about the rate of change. These demographic conditions, as Darwin and Wallace saw, provide the context by which one variety may replace another.

    References:

    Crow JF. 1999. Hardy, Weinberg and language impediments. Genetics 152:821-825.

    Darwin C. 1858. On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. J Proc Linnean Soc Lond Zool 3:46-50.

    Darwin C. 1868. The variation of animals and plants under domestication. 1st ed., vol. 1. John Murray, London.

    Darwin F, ed. 1887. The life and letters of Charles Darwin, including an autobiographical chapter. vol. 1. John Murray, London.

    Pearl R, Reed LJ. 1920. On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proc Nat Acad Sci USA 6:275-288.

    Spencer H. 1867. First principles. Williams and Norgate, London.

    Wallace AR. 1858. On the tendency of varieties to depart indefinitely from the original type. J Proc Linnean Soc Lond Zool 3:53-62.

  • Two recent bottleneck studies

    Thu, 2005-01-06 22:31 -- John Hawks

    References:

    Marth, G. T., et al. 2003. "Sequence variations in the public human genome data reflect a bottlenecked population history." Proceedings of the National Academy of Sciences, USA 100:376-381.

    Marth, G. T., E. Czabarka, J. Murvai, and S. T. Sherry. 2004. "The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations." Genetics 166(1):351-372.

    Conclusions

    These two studies by Gabor Marth and colleagues attempt to test hypotheses of demographic history using genome-wide surveys of human single nucleotide polymorphism (SNP) data. The power of these data is that they are widely available: they are dispersed throughout the genome, and they have been collected from standard panels of populations for the purpose of localizing polymorphisms linked to disease expression. This means that they are probably the single largest source of evidence about human genetic variation at this point. However, they are also rather limited in scope. Because they were surveyed for non-demographic purposes, the data present some unique problems: ascertainment bias, potential non-independence, and a lack of knowledge about their evolutionary dynamics. Marth and colleagues have taken pains to develop modes of analysis that correct for these problems to the extent possible. In other words, they apply certain corrections to the data to make them consistent with the assumptions of the Fisher-Wright population model.

    The first of these studies examined the density of SNP candidates in different genomic regions as an indication of the frequency spectrum of mutations in the human genome. The second study went further by examining SNPs in three separate populations: European, Asian, and African-American samples. Thus, the first study is informative only in a global context, as some kind of average of what may have happened to all populations, while the second study distinguishes the populations from each other in terms of the frequency spectrum of mutations.

    According to Marth et al. (2003), the SNP data are consistent with a global population bottleneck dating to around the time of the Upper Paleolithic. This claim somewhat overstates their results, because the support for the bottleneck came entirely from the assumption of a particular degree of recombination among the sites they examined. This assumption might be accurate, but it is notable that without it there was no support for a bottleneck. Interestingly, the bottleneck model that fit the data best included a net decline in the human population rather than an expansion. In other words, this was not a bottleneck followed by exponential growth of the human population (as we know has been happening recently). It was a bottleneck followed by growth to a smaller size than before the bottleneck. The best-fitting bottleneck had an onset 1,600 generations (approximately 40,000 years) ago and a release 1,200 generations (30,000 years) ago. The study provided no confidence limits for these estimates.

    The second study (Marth et al. 2004) provided best-fit models of population history for the three separate population samples. For African-Americans, the best-fit model was a simple expansion of population size from around 10,000 to around 18,000, some 7,500 generations (187,500 years) ago. For Asians, the best-fit model was a bottleneck from 10,000 to 3,000 individuals at 3,800 generations (95,000 years) ago, with an expansion to 25,000 individuals some 80,000 years ago. For Europeans, a bottleneck from 10,000 to 2,000 began 3,500 generations (87,500 years) ago and ended 3,000 generations (75,000 years) ago with an expansion to 20,000 people. For the European sample, the study also presents a confidence region for the bottleneck model, which illustrates the relationship between the severity of the bottleneck (i.e., how many people there were then) and its duration. Simply put, a longer bottleneck can have a larger population and still fit the data (because drift has more time to act), while a more severe bottleneck can be shorter and have the same effect.
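    The trade-off between severity and duration can be illustrated with a crude proxy. This is not the frequency-spectrum fitting Marth and colleagues used; it is the textbook result that, under pure drift in a diploid population of constant effective size N with mutation ignored, the fraction of heterozygosity retained after t generations is (1 - 1/(2N))^t ≈ e^(-t/(2N)). What matters, roughly, is the ratio t/N. The 500-generation, 2,000-person bottleneck is the European figure above; the comparison bottleneck is hypothetical.

```python
def heterozygosity_retained(n: float, generations: int) -> float:
    """Fraction of heterozygosity retained after `generations` of pure drift at constant size n
    (diploid Wright-Fisher population, no mutation): (1 - 1/(2n))**t."""
    return (1.0 - 1.0 / (2.0 * n)) ** generations

# European bottleneck reported by Marth et al. (2004): 2,000 individuals for 3,500 - 3,000 = 500 generations.
short_severe = heterozygosity_retained(2_000, 500)
# A hypothetical bottleneck twice as long at twice the size.
long_mild = heterozygosity_retained(4_000, 1_000)
print(round(short_severe, 4), round(long_mild, 4))   # nearly identical loss of diversity
```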

    There is little point in dissecting these values, since they are simply best-fit numbers under the assumptions of the studies. It is worth pointing out, however, the large differences between the two studies. Notice in particular the discrepancy in the timing of the putative bottlenecks, especially considering that most of the sample used in the first study, taken from public genome datasets, probably represents Americans of European descent. Why these samples do not show a population history more similar to that of the Europeans in the second study is unexplained (although the authors note the discrepancy on page 362 of the second study). It is also interesting that the bottleneck in the second study is so ancient. If it actually reflects population dynamics within Europe 80,000 years ago, then presumably it reflects the population history of Neandertals! Or maybe the Levantine ancestors of later Europeans were facing population pressure when Neandertals moved south during the Würm glaciation to occupy Kebara and Amud caves? Whatever the case, the numbers are certainly strange.

    The reasons for the strangeness of these numbers are what interest me about the papers, and it is to these issues that I devote some more attention.

    Parameters

    Both studies assume a model for population histories in which a population may either grow or shrink at discrete times in the past. These times divide the entire population history into "epochs." For example, if the population never changed in size, the history would be a "one-epoch" population history, because all of time would be described by a single population size. If the population changed in size (grew or shrank) at a single time in the past, then it has a two-epoch history, reflecting the population size before and after the event. A two-epoch history is a three-parameter model, because three separate values must be known to predict the genetic characteristics of the population: the size before the event, the size afterward, and the time the event happened. Different values of these three independent parameters change the expected diversity and frequency spectrum of mutations in populations with such histories.
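    The epoch bookkeeping can be made concrete with a small data structure for a piecewise-constant history. This is a hypothetical sketch (`EpochHistory` is not taken from either study's software). It counts parameters as one size per epoch plus one time per change, so k epochs need 2k - 1 parameters, and it encodes the European best-fit history from Marth et al. (2004) as an example.

```python
from dataclasses import dataclass

@dataclass
class EpochHistory:
    """Piecewise-constant population history (hypothetical helper).

    sizes[0] is the present-day size; later entries lie further back in time.
    change_times[i], in generations ago, is when sizes[i + 1] gave way to sizes[i].
    """
    sizes: list[float]
    change_times: list[float]

    def __post_init__(self) -> None:
        if len(self.change_times) != len(self.sizes) - 1:
            raise ValueError("need exactly one change time between consecutive epochs")
        if any(a >= b for a, b in zip(self.change_times, self.change_times[1:])):
            raise ValueError("change times must increase going back in time")

    @property
    def n_parameters(self) -> int:
        # One size per epoch plus one time per change: 2 * epochs - 1.
        return len(self.sizes) + len(self.change_times)

    def size_at(self, generations_ago: float) -> float:
        """Population size at a given number of generations before the present."""
        for size, t in zip(self.sizes, self.change_times):
            if generations_ago < t:
                return size
        return self.sizes[-1]

# Best-fit European history reported by Marth et al. (2004), present day first.
europe = EpochHistory(sizes=[20_000, 2_000, 10_000], change_times=[3_000, 3_500])
constant = EpochHistory(sizes=[10_000], change_times=[])
print(europe.n_parameters, constant.n_parameters)
print(europe.size_at(0), europe.size_at(3_200), europe.size_at(10_000))
```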

    A test for a past expansion of population size is, then, a statistical test of whether the best-fitting three-parameter (two-epoch) model fits significantly better than the best-fitting one-parameter (one-epoch) model. A test for a past bottleneck takes this one step further. A bottleneck is a three-epoch model, in which an ancient large population crashes at one particular time in the past and then, at a later time, expands to another, larger size again. This model has five parameters: three for the population sizes in each of the three epochs and two for the times of the population crash and subsequent growth. Testing for a past bottleneck asks whether the best five-parameter model fits the data significantly better than both the best three-parameter and the best one-parameter models.
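    Nested comparisons like these are commonly carried out as likelihood-ratio tests. The sketch below is a generic version, not necessarily the procedure Marth and colleagues used, and the log-likelihood values are invented for illustration. It relies on the standard chi-square approximation, with degrees of freedom equal to the number of added parameters: 2 for one versus two epochs and for two versus three epochs, 4 for one versus three. That approximation is shaky here, since a change time means nothing when the sizes on either side of it are equal, so the p-values should be treated as rough. Because all the differences are even, the chi-square tail has a closed form and no statistics library is needed.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with an even number of degrees of freedom.

    For df = 2m the survival function is exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def lrt_p_value(loglik_simple: float, k_simple: int, loglik_complex: float, k_complex: int) -> float:
    """Likelihood-ratio test of a simpler nested model against a more complex one."""
    statistic = 2.0 * (loglik_complex - loglik_simple)
    return chi2_sf_even_df(max(statistic, 0.0), k_complex - k_simple)

# Hypothetical maximized log-likelihoods, keyed by number of parameters (1, 3, 5 = one, two, three epochs).
ll = {1: -1520.0, 3: -1512.0, 5: -1509.5}
print(lrt_p_value(ll[1], 1, ll[3], 3))   # expansion vs constant size (df = 2)
print(lrt_p_value(ll[3], 3, ll[5], 5))   # bottleneck vs expansion (df = 2)
print(lrt_p_value(ll[1], 1, ll[5], 5))   # bottleneck vs constant size (df = 4)
```

    In this invented example the two-epoch model beats constant size easily, but the extra bottleneck parameters do not significantly improve on the two-epoch fit, which is exactly the kind of judgment the test is meant to make.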

    You may notice that even a three-epoch (five-parameter) model is probably not very much like the actual behavior of ancient populations. A natural population decreases and increases in size on a generation-by-generation basis. There may have been large-scale changes in the past, but populations do not crash instantly at one time, or grow instantly. Instead, they grow gradually and fitfully, perhaps geometrically or perhaps not. In modeling terms, the demography of a natural population would require as many epochs to describe as there have been generations in its history, or even more.

    But genetic data do not preserve evidence of every generation in a population's history. Most of these possible multi-epoch histories are very similar to each other, so similar as to be indistinguishable. And genes are actually very weak discriminators, so a three-epoch model is about as far as we can expect them to be informative. So the question we can pose is ultimately a very simple one: is the history of the population more similar to a constant population size, a single expansion, or a bottleneck?

    What hypothesis are we testing?

    But consider for a moment the opposite corollary: the fit of a model can only improve, or at worst stay the same, as we add parameters. The statistical test asks whether the additional parameters make the fit significantly better, so that we don't automatically accept a more complicated model when a simpler one matches as well as we could expect it to. But in this case, our models represent only demographic processes, and not any other factor that may have affected human genes in the past. What if one of these other factors significantly reduced the fit of the one-epoch model? The two-epoch model would then clearly be a better fit, and might even be a significantly better fit if the unknown factor had an effect similar to a past change in population size.

    There are good reasons to think that exactly this situation might affect human genetic variation. For example, natural selection on genetic loci can produce a frequency spectrum of mutations similar to the one produced by past population growth. If we have a sample of genes that includes some under positive selection in the past, then a two-epoch model may fit the distribution of variation much better than any one-epoch model entirely because of this history of selection. The magnitude of the effect depends on the number of genes that have experienced selection and the pattern of that selection, but it wouldn't take very many to make a two-epoch model a better fit.
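    The shared signature can be made concrete with Tajima's D, a standard summary statistic (not the method of Marth et al., who fit the allele frequency spectrum directly). D turns negative when there is an excess of low-frequency alleles, the pattern that both population expansion and positive selection with hitchhiking can produce, so D alone cannot say which process caused it. This sketch assumes an unfolded spectrum from a complete, unascertained sample, which SNP-discovery data are not; both spectra below are hypothetical.

```python
import math

def tajimas_d(sfs: list[float]) -> float:
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i-1] is the number of segregating sites at which the derived allele
    appears in i of the n sampled chromosomes, for i = 1 .. n-1.
    """
    n = len(sfs) + 1
    s = sum(sfs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    # Mean pairwise differences (pi), computed directly from the spectrum.
    pi = sum(i * (n - i) * xi for i, xi in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    theta_w = s / a1  # Watterson's estimator
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

n = 20
neutral = [100.0 / i for i in range(1, n)]   # expected shape under constant size: proportional to 1/i
skewed = list(neutral)
skewed[0] *= 2                                # hypothetical excess of singletons
print(tajimas_d(neutral), tajimas_d(skewed))  # about zero, then negative
```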

    But a two-epoch demographic model obviously does not describe the effects of selection perfectly. What happens if we add more parameters? Presently, nobody knows the answer to this question. I speculate that a three-epoch model might very easily fit significantly better in a population with a combination of selective effects on different genes, because some of the genes would appear entirely unaffected by selection (these are the ones that look like their variation survived a "bottleneck") and other genes would be highly affected (these genes would look like variation had been lost during the "bottleneck"). But this is just speculation.

    The real issue is that adding parameters is very misleading if the assumptions underlying the models cannot be rigorously verified. In the case of demographic models about the past, the biggest assumption is selective neutrality. This assumption is necessary for using genes to test demographic hypotheses because, among the evolutionary forces, only genetic drift and mutation have effects that are strongly linked to the size of the population. But we know that many genes were not neutral.

    Presently, most molecular geneticists do not take this concern seriously. Marth et al. (2004, p. 363) consider the issue as follows:

    We must also acknowledge that the current shape of human variation structure is the result of a combination of neutral and nonneutral (selective) forces. The current state of the art in recognizing the effects of selection in variation data has been reviewed recently (Bamshad and Wooding 2003). Positive selection resulting in genetic hitchhiking can mimic the effects of population expansion in that it gives rise to an excess of low-frequency alleles (Kaplan et al. 1989; Braverman et al. 1995). Recent efforts have been aimed at detecting loci that exhibit signatures of positive selection (Cargill et al. 1999; Sunyaev et al. 2000; Akey et al. 2002; Payseur et al. 2002). However, the exact proportion of genes that have been targets of strong positive selection within our evolutionary past is unclear (Bamshad and Wooding 2003). It is also unclear, in general, how far the effects of hitchhiking extend beyond the locus under selection (Wiehe 1998). Given that only a few percent of the human genome represents coding DNA, and that not all genes are expected to be targets of positive selection, we speculate that the distortion due to selective forces on the AFS in our data set of >20,000 randomly selected genomic loci is small when compared to the global effects of drift modulated by long-term demography.

    This is basically boilerplate in studies like this one, a way of saying, "we know our assumptions are not entirely accurate, but we think it doesn't matter too much." But does it? When studies vary so widely in their estimated demographic parameters, what explanation should we logically adduce for the results?
