john hawks weblog

paleoanthropology, genetics and evolution

demography

  • From genes to numbers: effective population sizes in human evolution

    Mon, 2011-08-22 16:02 -- John Hawks
    Research authors: 
    Publication information: 

    This is a pre-review manuscript version of the book chapter published in Recent Advances in Paleodemography, J-P Bocquet-Appel, ed., Springer, doi:10.1007/978-1-4020-6424-1_1 (citation information)

    Work status: 

    This manuscript represents the completed work before peer review. It is posted here in accordance with the Springer copyright agreement. All citations and references to this work should direct readers to the final published version in the edited volume by Jean-Pierre Bocquet-Appel.

    Abstract: 

    The effective population size has become a central aspect of our understanding of the ancient structure of human populations. It is through this concept that the genetic variation of present-day humans may inform us about the number and relationships of humans in the past. However, effective population size itself is not a demographic parameter. If the theoretical model does not apply accurately to human evolution, then inferences based on the estimates of effective population size may be in error. Here, I present the theoretical basis of effective population size, including many of the demographic and evolutionary conditions that can confound the relationship of genetic variation and population size.

    Demography is the engine of evolution. Changes in allele frequencies require differential births and deaths of the individuals who carry the alleles. Under natural selection, these births and deaths approximate a deterministic process favoring the survival and reproduction of carriers of a particular allele. The histories of alleles themselves are demographic phenomena: the fitness advantage of a selected allele may be expressed as a relative intrinsic growth rate; its frequency over time follows a logistic growth curve.
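    The logistic claim can be written out as a short worked equation. This is a sketch under assumptions the text does not state: continuous time, a constant selective advantage s acting on the allele itself (genic selection), and an initial frequency p_0. The symbols s and p_0 are introduced here for illustration only.

```latex
\[
\frac{dp}{dt} = s\,p\,(1-p)
\qquad\Longrightarrow\qquad
p(t) = \frac{p_0\,e^{st}}{1 - p_0 + p_0\,e^{st}}
\]
```

    Under these assumptions s plays the role of the relative intrinsic growth rate of the allele, and the frequency rises along a logistic curve from p_0 toward 1.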

    In the absence of selection, allele frequencies vary as a stochastic process. The parameters influencing this process are themselves demographic: population size and mating pattern. Ultimately, the rate of evolution of a population must be constrained by these parameters. This means that the observable genetic characteristics of populations are to some extent natural estimators of demographic characteristics. The relationship between the demographic parameters of a population and its genetic characteristics may in some cases be approximated by a single parameter: the "effective population size." Effective population size refers the demographic complexity of some real population to the simplicity of some ideal population; in other words, it is a measure of the extent to which a natural population corresponds to some theoretical population model.

    The Wright-Fisher model

    The mathematical theory of population genetics was developed early in the twentieth century, principally by Ronald A. Fisher, Sewall Wright, and J. B. S. Haldane [1]. The initial success of population genetics was the development of a mathematical account of inheritance that reconciled Mendelian inheritance with continuous traits [2]. This development made possible a deterministic model of Darwin's natural selection in terms of change in gene frequencies [3][4][5]. However, the deterministic model depends on differential equations that are strictly true only in an infinite population. In a finite population, stochastic factors also change gene frequencies. The evolution of natural populations is caused by a hierarchy of factors, some of which are deterministic in their effect on the gene frequency, others predictable only in their variance, and yet others unique or idiosyncratic [6]. The importance of the stochastic factor was considered by both Fisher (1930) [4] and Wright (1931) [5]; their disagreement about its importance became a major focus of theoretical population genetics.

    Many phenomena in finite populations may amplify or dampen stochastic change in gene frequencies. In an infinite population, the variance in the time or number of events such as births, deaths, and matings does not matter to the gene frequency. Absent selection or mutation, an infinite population does not evolve. In a finite population, variance in the times or numbers of births, deaths, and matings causes evolution even in the absence of selection and mutation, as gene frequencies fluctuate slightly from generation to generation. Other factors may increase or decrease the variance in births, deaths or matings, such as assortative instead of random mating, high variance in mating success, or inbreeding instead of outbreeding.

    In the course of several publications, Wright and Fisher explored the stochastic factor by application of a simple population model (e.g., [5][4]), which became known as the Wright-Fisher model. In this model, the population consists of N diploid individuals. These individuals mate randomly, die immediately upon reproduction, and are monoecious (i.e., no sex-specific effects of alleles, selfing possible). The population therefore contains 2N genes in each generation, which are assumed to be sampled randomly from the 2N genes in the preceding generation, with replacement.

    The main feature of this model is that it is mathematically tractable. The gene frequency in any given generation is a binomial random variable based on the frequency in the previous generation [7]. The expectation of a gene frequency $p_t$ is simply its frequency in the preceding generation, $p_{t-1}$; that is, there is no change in frequency on expectation. The variance in the gene frequency is equal to $p_{t-1}(1-p_{t-1})/(2N)$; this variance is larger for smaller N and for gene frequencies near 0.5. The probability of fixation of a given allele is equal to the initial frequency of the allele, so that the fixation probability of a newly introduced mutation is $1/(2N)$. Likewise, the probability that two genes taken at random in the population are descendants of a single parent gene in the preceding generation is $1/(2N)$. The model is a Markov process in which the transition matrix (the probabilities of $p_t$ given $p_{t-1}$) has a maximum nonunit eigenvalue equal to $1 - 1/(2N)$. As can be seen from these relations (summarized in Ewens 2004 [7]), stochastic evolution in the Wright-Fisher model is determined by the single parameter of population size; indeed, the model assumes all other possible factors constant.
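    Two of these relations are easy to check by simulation. The following is a minimal sketch rather than a definitive implementation: the function names, the seed, the population size N = 20, and the replicate counts are all arbitrary choices made here for illustration. It resamples 2N genes binomially each generation, then compares the one-generation variance in gene frequency and the fixation rate of a single new mutation with the expected values.

```python
import random

def next_generation(count, n_diploid, rng):
    """Sample 2N genes with replacement from the parental gene pool."""
    genes = 2 * n_diploid
    p = count / genes
    return sum(1 for _ in range(genes) if rng.random() < p)

def one_generation_variance(p, n_diploid, replicates, rng):
    """Sample variance of gene frequency after a single generation."""
    genes = 2 * n_diploid
    start = round(p * genes)
    freqs = [next_generation(start, n_diploid, rng) / genes
             for _ in range(replicates)]
    mean = sum(freqs) / replicates
    return sum((f - mean) ** 2 for f in freqs) / (replicates - 1)

def fixes(start_count, n_diploid, rng):
    """Run until the allele is lost or fixed; return True if it fixed."""
    genes = 2 * n_diploid
    count = start_count
    while 0 < count < genes:
        count = next_generation(count, n_diploid, rng)
    return count == genes

rng = random.Random(1)   # arbitrary seed, for repeatability only
N = 20                   # hypothetical number of diploid individuals

observed_var = one_generation_variance(0.5, N, 20000, rng)
expected_var = 0.5 * (1 - 0.5) / (2 * N)        # p(1-p)/(2N)

trials = 5000
fixation_rate = sum(fixes(1, N, rng) for _ in range(trials)) / trials
expected_fixation = 1 / (2 * N)                 # initial frequency of a new mutation

print("variance:", observed_var, "expected:", expected_var)
print("fixation:", fixation_rate, "expected:", expected_fixation)
```

    With finite replicates the simulated values only approximate the expectations; larger replicate counts tighten the agreement.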

    Mutation may be added to the model, at a rate u per gene, in which case the expected number of new mutations in any given generation is 2Nu [4]. When mutations are included in the model, it is possible to derive expectations for sample characteristics such as the frequency spectrum of alleles and the probability of gene identity [8]. Such values involve the parameter θ=4Nu, which indicates that mutation and finite population size are inversely related stochastic factors: A small population with a high mutation rate may have similar sample characteristics to a large population with a low mutation rate.
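    The trade-off between N and u can be illustrated numerically. This sketch assumes the infinite-alleles model at equilibrium, under which the expected heterozygosity is θ/(1+θ); that model and the specific values of N and u below are assumptions made here for illustration and are not taken from the text.

```python
def theta(n_diploid, mu):
    """Scaled mutation parameter, theta = 4 * N * u."""
    return 4 * n_diploid * mu

def expected_heterozygosity(n_diploid, mu):
    """Equilibrium heterozygosity under the infinite-alleles model (assumed here)."""
    t = theta(n_diploid, mu)
    return t / (1 + t)

# Hypothetical populations: small with a high mutation rate, large with a low one.
small_fast = expected_heterozygosity(1_000, 1e-5)
large_slow = expected_heterozygosity(100_000, 1e-7)

print(small_fast, large_slow)
```

    Both populations have θ = 0.04, so a sample statistic that depends only on θ cannot distinguish between them.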

    No natural population reproduces according to this simple model. However, the model gives rise to calculations of the expectation and variance of many genetic characteristics that might be empirically observed in natural populations. Wright (1931) considered that deviations from the simple model might be treated in terms of their effects on sample characteristics. In this respect, a nonideal population with N individuals might behave in a similar way to the ideal population of some different size, $N_e$, which he termed the "effective population size." The effective population size of a study population is therefore the number of individuals in an ideal Wright-Fisher model with the same sample characteristics as the nonideal population under study.

    But from the considerations above, it is evident that different sample characteristics depend differently on population size in the Wright-Fisher model. In particular, the probability of identity of two randomly chosen genes depends on the probability of inbreeding ($1/(2N)$ in the Wright-Fisher model), while the change in gene frequency over time depends on the variance in gene frequency ($p_{t-1}(1-p_{t-1})/(2N)$ in the Wright-Fisher model). Departures from the Wright-Fisher model may affect these two values in different directions. For example, assortative mating may greatly increase the probability of gene identity without greatly affecting the allele frequency. This insight can be important to conservation, since inducing assortative mating may allow more effective selection against deleterious recessives without materially reducing the frequencies of other genes [9].
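    For reference, the two quantities can be set side by side for the ideal model. This is a standard textbook sketch, not a derivation from the chapter; F_t, the probability that two genes drawn at random in generation t are identical by descent, is a symbol introduced here.

```latex
\[
F_t = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1},
\qquad
1 - F_t = \left(1 - \frac{1}{2N}\right)^{t} (1 - F_0),
\]
\[
\operatorname{Var}(p_t \mid p_{t-1}) = \frac{p_{t-1}\,(1 - p_{t-1})}{2N}.
\]
```

    In the Wright-Fisher model the same N appears in both expressions. In a nonideal population the size that reproduces the first relation need not equal the size that reproduces the second.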

    Evidently, a single "effective" population size cannot summarize all departures from the Wright-Fisher model: natural populations are not described by a single stochastic parameter. For this reason, three distinct concepts of effective population size are often considered. The inbreeding effective population size is the size of the Wright-Fisher population with the same probability of inbreeding as the study population. The variance effective population size is the size of the Wright-Fisher population with the same variance in gene frequencies as the study population. The eigenvalue effective population size is the size of the Wright-Fisher population in which the maximum nonunit eigenvalue is the same as the study population. It is important to note that "study population" here may refer to an empirically observed natural population, or it may apply to a population model. It is also worth noting that population models other than the Wright-Fisher model are sometimes considered, such as the Cannings model [10] or the Moran model [11]. These models sometimes give rise to different effective population sizes, because the parameterization of population size may differ from the Wright-Fisher version.

    These effective population sizes have different uses. Molecular data empirically provide estimates of sample characteristics such as the probability of gene identity and the frequency spectrum of alleles, both of which depend on the probability of inbreeding. For this reason, the inbreeding effective size is most relevant for most studies of genetic data. Sometimes inbreeding is relevant to ecological comparisons; in other cases the variance in gene frequencies may be more relevant. In particular, the variance effective size is relevant to conservation because conservation efforts often attempt to assess the rate of gene frequency change [12]. The eigenvalue effective population size is based on the transition probabilities among gene frequencies, with a leading nonunit eigenvalue of $1 - 1/(2N)$ in the Wright-Fisher model. Like variances in gene frequencies, these transition probabilities are not easily estimable from empirical molecular samples, and the eigenvalue effective size has rarely been applied in human population genetics. However, it is important in modeling and has emerged recently in considerations of metapopulation dynamics (e.g., [13][14]).

    The model-dependence of effective population size is rarely considered in analyses of molecular data. Ewens (2004) [7] gives a good account of the problem:

    Except in simple cases, the concept [of effective population size] is not directly related to the actual size of the population. For example, a population might have an actual size of 200 but, because of a distorted sex ratio, have an effective population size of only 25. This implies that some characteristic of the model describing this population, for example a leading eigenvalue, has the same numerical value as that of a Wright-Fisher model with a population size of 25. It would be more indicative of the concept if the adjective "effective" were replaced by "in some given respect Wright-Fisher model equivalent." Misinterpretations of effective population size calculations frequently follow from a misunderstanding of this fact (Ewens 2004: 37-38) [7].
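    The sex-ratio example can be made concrete with the standard approximation for unequal numbers of breeding males and females, Ne = 4 Nm Nf / (Nm + Nf). The formula is not given in the passage, and the split of 6 males to 194 females is a hypothetical choice made here to land near the quoted figure; neither should be read as Ewens's own calculation.

```python
def sex_ratio_effective_size(n_males, n_females):
    """Approximate effective size with unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

balanced = sex_ratio_effective_size(100, 100)  # equal sexes: effective size equals census size
skewed = sex_ratio_effective_size(6, 194)      # hypothetical, strongly distorted sex ratio

print(balanced, skewed)
```

    With the same census size of 200, the distorted sex ratio gives an effective size in the low twenties, close to the value of 25 in the quotation.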

    Changing population size

    The utility of effective population size comes from the fact that it concatenates many separate stochastic phenomena into a single parameter. As an example, a gene frequency is a single value, with a single degree of freedom. It is therefore sufficient to estimate only a single parameter. This approach obviously runs into trouble when more than one stochastic factor varies in the population.

    One of the most troublesome cases is a change in population size. A population that changes in size violates a basic element of the Wright-Fisher population model. Sjodin et al. (2005) [15] assert that "effective population size" is meaningless in the context of most changes in population size, because the allele frequency spectrum, variance in gene identity, and other sample characteristics will be altered in ways that have no equivalent in the Wright-Fisher model. In their view, only changes in size that occur on a different time scale (either much shorter or much longer) than genealogical events can be reconciled with the concept of effective size. Indeed, a survey of the literature on human prehistoric population dynamics shows that changes in size create much confusion, with divergent definitions and concepts of "long-term effective population size."

    Nevertheless, the treatment of changing population size in terms of effective size originated with Wright himself and is well-entrenched. Wright (1938) [16] considered the effect of fluctuating population size on inbreeding, finding that the effective size of a population that fluctuates in size is approximated by the harmonic mean of population size taken across all generations. The harmonic mean is much closer to the smallest of a set of values than the largest; effective population size is generally closer to the minimum population size than the maximum. This is the inbreeding effective population size, which predicts gene identity and other sample characteristics that derive from it, such as allele frequency spectra.
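    A brief numerical sketch shows how strongly the harmonic mean is pulled toward the smallest value. The sequence of sizes below is hypothetical: nine generations at 10,000 individuals and one generation at 100.

```python
from statistics import harmonic_mean, mean

# Hypothetical census sizes over ten generations, with a single crash.
sizes = [10_000] * 9 + [100]

arithmetic = mean(sizes)
harmonic = harmonic_mean(sizes)   # approximates the inbreeding effective size

print(arithmetic, harmonic)
```

    The arithmetic mean stays above 9,000, while the harmonic mean falls below 1,000 because of the single small generation.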

    The harmonic mean approximation breaks down as changes in population size become more and more rare or exceptional. For example, we might estimate an "effective size" for a population that has undergone a bottleneck, a period of small population size flanked by periods of larger size, which would be useful for predicting the expected heterozygosity. But the coalescence times of different genetic loci would be much more variable than expected for the corresponding Wright-Fisher population. For many bottlenecks, these times might have a bimodal distribution, with some genes having been fixed by drift during the bottleneck and others having escaped fixation. This bimodal distribution may particularly characterize different gene loci that themselves have different effective numbers, for instance autosomal versus mitochondrial genes [17].

    Simple population growth induces a disequilibrium compared to the Wright-Fisher model, in which the number of new alleles arising by mutation increases more rapidly than the mean difference between individuals [18]. For growing populations, different characteristics of single molecular samples may lead to very divergent estimates of effective population size. For instance, allele number may lead to a large effective population size estimate at the same time that gene identity generates a small estimate. The discrepancy emerges from the temporal scope of inbreeding underlying the two observed values: some are influenced by population growth more rapidly than others. The disequilibrium itself serves as a test of population growth [18][19].
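    The kind of discrepancy described here can be sketched with two common estimators of θ = 4Nu, one based on the number of segregating sites (Watterson's estimator) and one based on the mean number of pairwise differences. These estimators are swapped in for illustration and are not necessarily the statistics used in [18] or [19]. The six haplotypes, in which every variant is a singleton as in a star-like genealogy, and the per-locus mutation rate are hypothetical.

```python
from itertools import combinations

def segregating_sites(haplotypes):
    """Number of sites at which the sample is polymorphic."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def watterson_theta(haplotypes):
    """Segregating sites divided by the harmonic number a_n."""
    n = len(haplotypes)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n

def effective_size(theta, mu_per_locus):
    """Solve theta = 4 * N * u for N."""
    return theta / (4 * mu_per_locus)

# Hypothetical sample: six sequences, each carrying one private mutation.
sample = ["100000", "010000", "001000", "000100", "000010", "000001"]
mu = 1e-4   # assumed mutation rate per locus per generation

ne_from_sites = effective_size(watterson_theta(sample), mu)
ne_from_pairs = effective_size(mean_pairwise_differences(sample), mu)

print(ne_from_sites, ne_from_pairs)
```

    An excess of rare variants raises the estimate based on segregating sites above the estimate based on pairwise differences, so one sample yields two different "effective sizes."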

    Natural selection

    Generally, analyses of effective population size assume neutrality; that is, they attempt to quantify the stochastic factor in the absence of selection. Natural selection is a deterministic force, which itself is influenced by the stochastic factors in finite populations. Still, genes under selection are influenced by demography. For example, the long-term selective balance affecting many HLA loci has preserved their allelic diversity over millions of years, but the major functional alleles themselves occur on different haplotypes that are neutral relative to each other, and respond to the population effective size [20]. Balancing selection may mask the effects of population growth, or vice versa [21]. And the long-term survival of polymorphisms under selection assumes some demographic prerequisites (Ayala 1995), which may be used to test demographic hypotheses.

    Linkage to selected sites may impact the variation of neutral sites, distorting estimates of effective size. The relationship of recombination rate and genetic diversity may reflect these selective processes [22][23]. "Genetic hitchhiking" is a phenomenon in which neutral sites linked to a positively selected allele show vast reductions in variability [24][25]. Hitchhiking induces disequilibria that resemble those resulting from population growth, naturally because positive selection is the logistic growth of one adaptive allele. Constant purifying selection across the genome can reduce the variation of linked neutral alleles, a phenomenon called "background selection" [26][25]. Gillespie (2000) [27] showed that recurrent positive selection could restrict the variation of weakly linked neutral sites even in a population of infinite size. This gives rise to a stochastic effect called "pseudohitchhiking," which generates an estimate of effective population size even for evolutionary models where it is undefined. If the force is powerful in natural populations, it would greatly restrict genetic variation below the amount expected for the Wright-Fisher population model. Pseudohitchhiking may even generate an "effective population size" for a population of infinite numbers [28].

    As evolutionary factors, both genetic drift (influenced by population size and mating structure) and natural selection influence the genetic variability of natural populations. For any particular locus, these factors may confound each other, so that the reasons for a particular level of genetic variability may not easily be attributed to either. For any bias in the genetic parameters that might result from selection, an equivalent bias may be found as a product of some demographic history. Indeed, this equivalence marks a deep symmetry between the stochastic effects of drift and selection: ultimately, selection is a demographic phenomenon as concerns a particular allele, as opposed to a full population. It has often been assumed that the effects of drift and selection may be clearly differentiated by among-locus analyses: while selection should affect different functional loci differently, genetic drift should affect all loci in the same way. However, pseudohitchhiking exerts stochastic effects across many loci [27]. This may explain some cross-species comparisons, which show that genetic diversity does not correlate strongly with population size [29], including mtDNA where there is no correlation between population size and diversity across large groups of animal species [30]. The importance of selection in shaping genome-wide variation remains an unresolved question.

    Genetic versus ecological estimates

    From its definition and application to theoretical populations, it should be clear that the utility of "effective population size" is that it provides a way of relating the genetic characteristics of a population to those expected of an ideal population under the Wright-Fisher model. Yet, the genetic characteristics of a population always trail to some extent the demographic and ecological factors that influence them. Because genetic variation "looks to the past" in this way, a discrepancy arises between estimates of effective size based on genes and so-called "ecological" estimates based on observations of demography and behavior.

    Nunney and Elam (1994) [31] reviewed genetic approaches to estimating effective population size, compared to approaches based on field observations of ecology. Genetic approaches are very straightforward: mathematical expressions derived from the Wright-Fisher model generally include population size. Genetic data from a natural population may be entered into these expressions, yielding a solution for population size. This solution is the effective population size: it is the value of population size in the Wright-Fisher model that corresponds to the observed genetic data. Nunney and Elam (1994) divided genetic approaches into "long-term" and "short-term" methods. Long-term methods track the changes in gene frequencies over time, and require recurrent sampling of populations over timescales long relative to their generation lengths. Such surveys may be plausible for genes that are phenotypically apparent (e.g., coat color polymorphisms), although estimates must ensure that such traits are neutral. Sampling of molecular characteristics is more costly, and tracking gene frequency change in long-lived populations may be impractical; for example, no such study has been performed on a human population. Nevertheless, such long-term studies have great relevance to conservation because they assess the variance effective size. Most important, they estimate the current variance effective size, without being confounded by the cumulative effects of genetic drift in the past.

    The vast majority of studies that estimate effective population size from genetic data are short-term studies. These use the characteristics of a single genetic sample, taken at one time, and the result is generally an estimate of the inbreeding effective size. This estimate entails all of the potential confounding factors that have influenced gene frequencies over a long, long time in the study population; generally over a period spanning four times as many generations as the estimate of effective size. Thus, an estimated effective size of 10,000 individuals is an assertion that the gene frequencies have been changing by drift in a population of this size for a time period on the order of 40,000 generations. Such estimates obviously have weaknesses as applied to conservation: although they may assess the current level of variation, they do not inform about the current rate of change in gene frequencies. Most important, because the potential confounding effects include both ancient demographic changes and ancient selection over a very long time period, these estimates have a necessarily uncertain connection to current or historic demography.

    For this reason, ecological estimates of effective size may be more satisfactory. Such estimates require observations concerning natural population densities, migration rates, life history, sex ratio and other aspects of mating pattern. The practical interest in conserving natural populations has engendered a substantial body of theoretical work on the relationship between census and effective population sizes, considering variation in these factors. The following list discusses several classes of factors that influence the ratio of effective to census population size. The list is not intended to be comprehensive, but gives a sampling of important phenomena in natural populations and their effects on neutral genetic variation. These factors are considered in terms of their effects on the inbreeding effective population size, although for the most part they influence variance and eigenvalue effective sizes in similar ways.

    Age structure

    Age-structured populations are all those in which death is not coincident with reproduction. For mammals, the reproductive lifespan is relatively long and features intermittent births of single or multiple offspring. This life history pattern leads to an overlap of two or more generations within the population at any given time. Because a large proportion of individuals are either pre- or post-reproductive, the effective population size of an age-structured population is generally half or less the census size [32].

    1. Maturation age: A higher maturation age leads to a higher proportion of nonreproductive juveniles in the population, reducing effective size relative to census size [32][33].
    2. Variance in breeding age: Earlier breeding has a greater effect than later breeding on changes in gene frequencies [4], so that a population with a high variance in reproductive ages will have a reduced effective size.
    3. Postreproductive lifespan: A long postreproductive lifespan increases the number of individuals without increasing the birth rate, reducing effective size relative to census size. Postreproductive helpers may enable a higher birth rate than otherwise possible, but only among those females for whom mothers or other postreproductive helpers have survived. In this way, helpers may also tend to decrease effective population size relative to census size.

    Population structure

    Splitting a population into partially isolated subpopulations or groups tends to impede the fixation of alleles in the population as a whole. But if these subpopulations themselves undergo evolutionary stochasticity, then the fate of alleles will be tied to the fate of the subpopulations. When the population behaves as a metapopulation [34], different subpopulations may have greatly different net reproduction, some areas of suitable habitat may be unoccupied, and the fission and subsequent growth of successful subpopulations may dominate the population history [35].

    1. Subpopulations: A population divided into partially inbred subpopulations retains more genetic variation than a panmictic population of the same size. This is a major factor increasing effective population size in geographically dispersed populations.
    2. Isolation by distance: Wright (1943) [36] defined the concept of effective population size in his isolation by distance model to encompass a finite ``neighborhood'' of spatially proximate individuals. The neighborhood size is used to estimate the inbreeding coefficient for this model, and is much smaller than the total population size.
    3. Source/sink dynamics: A species with static population size may nevertheless occupy geographic areas that differ in productivity. Areas where reproduction is lower than the replacement rate will contribute relatively little to the ancestry of the total population over the long term. The effective population number is reduced by such variation [37][38].
    4. Extinction and recolonization: At an extreme, local groups frequently become extinct and are replaced by colonists from other groups. The population will be derived from a small number of groups at earlier times, which may drastically reduce genetic variation and effective population size [39].

    Family size

    Family size is simply the number of offspring per individual. Under the Wright-Fisher population model, a substantial proportion of individuals have no offspring at all, which makes genetic drift possible. But when the variance in family size exceeds the binomial variance predicted under the Wright-Fisher model, genetic drift may be substantially stronger.

    1. Variation in family size: Low variance in family size tends to increase effective size relative to census size; high variance tends to decrease effective size.
    2. Heritability of family size: If large families generate offspring that themselves tend to have large families, this inheritance can vastly decrease effective population size [40].
    3. Polygyny/polyandry: These mating systems tend to alter effective sex ratio away from 1.0, which increases the variance in family size in the population, and decreases effective population size.
    4. Distribution of family size: The Wright-Fisher model predicts that family size will follow a Poisson distribution [41]; different distributions (e.g., binomial) may increase or decrease effective population size.
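
    The effect of family-size variance in the list above can be sketched with a standard approximation for a population of constant size (mean family size 2): Ne = (4N - 2)/(Vk + 2), where Vk is the variance in the number of offspring per individual. The formula is not given in the text and is used here only as an illustration. The function name, the census size of 1,000, and the variances tried are hypothetical.

```python
def effective_size_from_family_variance(n_census, family_size_variance):
    """Approximate inbreeding effective size at constant population size (mean family size 2)."""
    return (4 * n_census - 2) / (family_size_variance + 2)

N = 1_000  # hypothetical census size

equal_families = effective_size_from_family_variance(N, 0.0)   # every individual leaves two offspring
poisson_like = effective_size_from_family_variance(N, 2.0)     # variance equal to the mean
high_variance = effective_size_from_family_variance(N, 10.0)   # a few very large families

print(equal_families, poisson_like, high_variance)
```

    When the variance is near the mean, the effective size is close to the census size. Equalizing family sizes roughly doubles it, and a variance of 10 cuts it to about one third.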

    The majority of these phenomena tend to reduce genetic variability below that expected for a Wright-Fisher model of the same population size, although there are several exceptions to this trend. This bias toward factors that reduce variation may emerge as a natural consequence of fitness-seeking by organisms: if given a chance, individuals should tend to increase the representation of their own genes at the expense of other individuals. Equal representation of all individuals in the gene pool — as in the Wright-Fisher model — is an unlikely outcome. Natural factors that deviate from the Wright-Fisher model should often bias the gene pool toward a subset of individuals, which increases both inbreeding and the rate of change of gene frequency.

    Human societies

    No study of a human population has considered more than a handful of the factors that might influence the relation of effective population size and census size. Some of the factors, such as the effect of age structure or migration, are relatively visible in the ethnographic present. In a village census, the demographer can note the ages of respondents and their place of birth. She may be able to determine inbreeding patterns (e.g., cousin marriages) and factors influencing reproductive variance (e.g., polygyny). But longer-term factors such as population extinction and recolonization, imbalanced migration, or fluctuations in population size are generally beyond measuring with ecological or demographic means in humans. Although no study of ecological factors influencing effective population size in humans is comprehensive, each provides important evidence about the constraints that affect gene frequencies and gene identity over the short run. These studies may be evaluated in the context of longer-term genetic data to examine the way that human demography itself may have evolved over time.

    Wood (1987) [42] applied the ecological approach to a human society, using the methods of [32] and [43]. He estimated the ratio of effective to census population size for the Gainj tribe of highland New Guinea, a group of slash-and-burn horticulturalists numbering around 1500 individuals at the time of the study. There were two important departures in this study population compared to the Wright-Fisher model: overlapping generations and a high male reproductive variance. Both features tend to decrease effective size compared to census size; with a census count of 1318 individuals in the study, Wood estimated an effective population size of 650.5, for a ratio of Ne/N of approximately 1/2. In the Gainj, reproductive heterogeneity in males was mainly a result of polygyny. However, although the male reproductive variance was approximately three times that of females, this mating structure was estimated to decrease effective population size by a relatively modest 7 percent. Wood also noted that the estimate of approximately 1/2 for Ne/N is substantially higher than the value of 1/3 that had often been taken for humans. He interpreted this discrepancy in terms of reproductive lifespan: in his sample, individuals of reproductive age made up a larger proportion than 1/3 of the population. High infant mortality and higher adult mortality rates tend to increase the ratio of effective to census population size.

    Austerlitz and Heyer (1998) [44] (see also [45]) examined pedigrees from French Canadian families, finding an autocorrelation in family size from one generation to the next. In this population, large families themselves tended to beget large families, leading to a strong reduction in the effective population size. They estimated the harmonic mean size of this growing population to have been ca. 17000, but the inheritance of family size reduces the effective size to only ca. 1000 individuals. This leads to an estimate of the ratio of effective to census size well under 1/10. Sibert et al. (2002) [46] found that such intergenerational correlations in family size could affect gene genealogies in a pattern similar to that of population size bottlenecks. It is not known to what extent family size may be inherited in most human populations. The Québécois may be an extreme example where rapid growth is concentrated in large families, or perhaps stationary populations may also have such strong intergenerational correlations.

    Migration is an important influence on genetic diversity in most human populations. It is very difficult to examine the effect of migration apart from other factors, because migration patterns have depended strongly on local population growth. Cavalli-Sforza (1959) [47] considered the effect of migration on effective population size for village isolates in Parma, Italy. With a unique knowledge of the historical context of migration among these villages, Cavalli-Sforza was able to demonstrate that their present genetic differentiation was a product of their history. This genetic differentiation does not characterize all human populations, but provides an important reason why genetic diversity may exceed estimates based on other demographic observations.

    Social stratification by cultural mechanisms may affect genetic differentiation within and among human groups. A single society with little gene flow from outside will tend to have a reduction in heterozygosity if stratification affects mating, just as for assortative mating and other deviations from panmixia. Estimates of effective population size will be more strongly influenced by differential gene flow into different social strata. For example, Bamshad et al. (2001) [48] found that genetic samples from higher-ranking castes in India tended to share more alleles with Europeans than samples from lower-ranking castes, which share more alleles with other Asians. Since gene flow from different source populations appears to have been correlated with caste, the overall effect of stratification has been to inflate the overall genetic diversity of the population while limiting within-caste variation. Likewise, differences in admixture rates between Africans and other populations within the New World have influenced the genetic diversity of local geographic regions. For example, Parra et al. (2001) [49] assessed the frequencies of genetic markers in African Americans in different parts of South Carolina, finding that European gene flow increased with distance from the Atlantic Coast, and exhibited a historic sex bias. The net effect was an increase in genetic diversity and differentiation with geographic location. Boundaries between living hunter-gatherers and agricultural populations may exhibit differential gene flow that generates similar patterns of differentiation. This may be an important reason for the apparent high genetic diversity of living hunter-gatherer populations within Africa, despite their current small census sizes [50][51].

    Pleistocene human populations

    Ancient human material and skeletal remains have been found across large parts of Africa, Asia, and Europe. By the beginning of the Middle Pleistocene, some 780,000 years ago, ancient humans occupied at least 35 million square kilometers [52][53][54]. This estimate includes large parts of the tropical and subtropical Old World, but excludes constant and periodic desert, rain forest, inundated continental shelf, and the northern tier of steppe and boreal forest. Although there were likely substantial fluctuations in geographic range over time, the estimate of 35 million km² is conservatively low for the past 500,000–800,000 years.

    To arrive at an estimate of population numbers, the geographic range must be multiplied by some population density. The range includes areas with varying resource densities, some of which may have been marginal for ancient hunter-gatherers without projectile weapons or sophisticated organizational strategies [55][56]. Therefore, the population density applied across this entire range would be substantially lower than might have obtained within long-lasting local breeding populations. Observations of population densities in ethnographic hunter-gatherers vary substantially. Weiss (1984) [54] applied estimates of population density based on ethnographic observations in recent Native Australian groups [57][58]. The overall estimate of Australian population density before European contact was approximately 0.28 persons per square kilometer [54]. However, this overall continental estimate includes groups with widely varying ecologies, from those living in subtropical rainforests, to temperate open woodlands or desert. Birdsell (1993) [59] estimated that the range of population densities among Australian groups may have varied from 1 person per square kilometer in areas of dense resource availability to 1 person per 100 square kilometers in marginal desert regions. Applying the minimum estimate of 1 person per 100 km² yields a global census size estimate of 350,000 individuals. This is likely to have been near the minimum of a long-term fluctuating population of Pleistocene humans.
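
    The calculation in this paragraph is a simple product of area and density. The Python sketch below reproduces it under the figures quoted above; the function name and dictionary labels are hypothetical. Only the minimum density is the one applied in the text. The other two densities are included purely to show how strongly the result depends on that choice, not as estimates that the text endorses for the whole range.

```python
RANGE_KM2 = 35_000_000  # occupied range quoted in the text, in square kilometers

# Densities in persons per square kilometer, as quoted in the text.
densities = {
    "marginal desert (minimum)": 1 / 100,
    "Australian continental average": 0.28,
    "dense resources (maximum)": 1.0,
}

def census_estimate(area_km2, persons_per_km2):
    """Census size implied by a uniform density over a given area."""
    return area_km2 * persons_per_km2

estimates = {label: census_estimate(RANGE_KM2, d) for label, d in densities.items()}

for label, persons in estimates.items():
    print(f"{label}: {persons:,.0f} persons")
```

    The first line of output corresponds to the 350,000 figure used in the text.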

    This estimate of 350,000 individuals would be of the census population size of humans globally during the Middle Pleistocene. In strong contrast, the effective population size of humans globally during this time period has been estimated from many sources at only 10,000 individuals.
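
    As a rough illustration of the size of this contrast, the sketch below divides the two global figures quoted above. It is only a restatement of the text's numbers in Python, with illustrative variable names.

```python
census_size = 350_000     # minimum global census estimate from the text
effective_size = 10_000   # commonly cited global effective size

ratio = effective_size / census_size
fold_difference = census_size / effective_size

print(f"Global Ne/N = {ratio:.3f}, or about 1/{fold_difference:.0f}")
```

    A ratio near 1/35 is far lower than the ratios reported in the ethnographic and historical studies above, which were approximately 1/2 for the Gainj and somewhat under 1/10 for French Canadians.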

    The earliest studies of variation used protein polymorphisms to arrive at this figure [60][61][62]. Haigh and Maynard Smith (1972) [61] proposed that the slight amount of human polymorphism might be explained by an ancient bottleneck of population size, a period of time during which human populations were very small compared to their present numbers. This hypothesis was later applied to a broader range of protein polymorphism data [29], and then RFLP data from the mitochondrial DNA [63]. Later studies discovered consistent levels of variation for Y chromosome [64] and autosomal genes [65][66]. The Wright-Fisher equivalent of the ancestral human population would have contained 10,000 persons.

    Considering the number of ways that natural populations may differ from the Wright-Fisher model, there might have been many reasons that human populations had such low genetic variation compared to their census numbers. It is important to note that this discrepancy between census and effective sizes characterizes most mammal species to some extent, with carnivores and primates in particular showing low genetic variation compared to their census sizes [29]. A number of phenomena may explain this discrepancy, at the same time providing valuable information about the dynamics of Pleistocene human groups.

    One explanation for low human genetic variation is that ancient population structures resulted in higher inbreeding than typical today. Takahata (1994) [67] applied a model of extinction and recolonization of subpopulations to human evolution. In this model, the human population is assumed to have consisted of small groups that frequently became extinct and were replaced by other groups. Eller et al. (2004) [68] extended the model to demographic parameters drawn from the ranges observed in recent hunter-gatherers. This kind of model can account for a severe reduction in genetic variation compared to the expectations for the census size of a population, because most of the population will be descended from a few ancestors at any earlier time. Considering the fluidity of hunter-gatherer groups, it may be unclear whether a model of recurrent extinctions and low migration is appropriate [69].

    In many other respects, it seems likely that the ratio of effective to census population size actually decreased over time. For example, overlapping generations present more of a limit on genetic variability today than at any time during the Pleistocene, because the human lifespan is much longer [70], generating a much larger number of postreproductive individuals. Likewise, migration distances were greatly reduced after the advent of agricultural economies, increasing the genetic differentiation of local populations from each other.

    A second explanation for low genetic variation relative to census population size is that the census population size used to be much smaller. A bottleneck with a short duration can explain some aspects of human genetic variation, such as the much lower variation of mtDNA and Y chromosome compared to autosomes and the X chromosome [71]. However, a short bottleneck can have only a slight effect on the overall level of genetic variation. A number of researchers adopted the hypothesis that current human genetic variation is the product of a very long history of small population size in equilibrium [72][73][74]. In this view, the reason why human genetic systems have an inbreeding effective size on the order of 10,000 is that the number of breeding individuals in the human species was in fact near 10,000 during most of the Pleistocene. A corollary of this hypothesis is that many ancient human fossils must represent different species not ancestral to any living people; otherwise, their genes should remain with us today and inflate the current level of genetic variation.
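
    The contrast between short and long bottlenecks can be illustrated with the standard expectation for loss of heterozygosity under drift alone, in which a fraction 1/n of heterozygosity is lost per generation when n gene copies are transmitted. The Python sketch below is a simplified illustration and not a model from the studies cited here. It ignores mutation, assumes an idealized population with an equal sex ratio, and uses a hypothetical bottleneck size and hypothetical durations.

```python
def heterozygosity_retained(gene_copies, generations):
    """Expected fraction of initial heterozygosity left after drift alone."""
    return (1 - 1 / gene_copies) ** generations

# Hypothetical bottleneck of 1000 breeding individuals (assumption).
N = 1_000
autosomal_copies = 2 * N   # diploid, both sexes transmit
mtdna_copies = N // 2      # haploid, transmitted by females only

# Hypothetical durations in generations (assumptions).
for generations in (10, 100, 2_000):
    autosomal = heterozygosity_retained(autosomal_copies, generations)
    mtdna = heterozygosity_retained(mtdna_copies, generations)
    print(f"{generations:>5} generations: autosomal {autosomal:.3f}, mtDNA {mtdna:.3f}")
```

    Under these assumptions a brief bottleneck leaves autosomal heterozygosity nearly intact while reducing mtDNA variation more noticeably, and only a long period at small size removes most autosomal variation.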

    Since the population size is clearly much larger than 10,000 today, the bottleneck hypothesis also requires a massive expansion of population size during the Late Pleistocene. It is clear from archaeological data that human populations did expand massively during the Late Pleistocene [75]. But there is little genetic evidence for such an expansion, aside from the mtDNA and Y chromosome [76][21]. Instead, autosomal variation suggests at best a very slight bottleneck during the past 70,000 years [77][78]. And a long-term bottleneck down to as few as 10,000 individuals is inconsistent with anatomical and genetic evidence for gene flow among Pleistocene human populations [79][80][81]. This evidence supports the hypothesis that a substantial proportion of Pleistocene human remains represent ancestors of living people instead of extinct species.

    A third hypothesis is that selection has limited the genetic variation of humans and other species. In order to affect both functional and apparently nonfunctional sites, this selection would involve widespread hitchhiking or pseudohitchhiking. Theoretical models suggest that pseudohitchhiking may explain some empirical results, such as the lack of relationship of mtDNA variation and census size across animal species [30], or the association of genetic diversity and local recombination rate in Drosophila [82]. It is now known that recent selection was very widespread in human prehistory [83][84]. However, there is no strong association of local recombination rate and genetic diversity in humans [85], even though hitchhiking would predict such an association [23].

    None of these three hypotheses yet provides a compelling account of human effective population size. It is clear today that an effective size of 10,000 individuals refers only to a theoretical model that is inaccurate in many possible ways. But we do not know whether a more correct population model would have 30,000 individuals or 300,000, or even more. Therefore, it is not yet obvious whether human genetic variation can inform us about the geographic location or mating systems of ancient people. The few estimators available are very coarse in their resolution. Deciding which factors actually operated on Pleistocene humans remains an active area of theoretical interest.

    Summary

    Effective population size is one of the central concepts of population genetics, but its complexity is seldom fully understood. The concept pertains to an ideal population model, the Wright-Fisher model. The primary purpose of the model is mathematical simplicity, and no natural population conforms to its predictions. However, the model forms a kind of baseline against which the variation in natural populations of the same size can be measured. The genetic evolution of a population is predicted to be constrained by demography in accordance with the effective size. However, at least three different effective population sizes (inbreeding, variance, and eigenvector) predict different aspects of the genetic evolution of a population.

    Several demographic and evolutionary factors may deviate from the Wright-Fisher model. Most of these tend to reduce effective population size compared to the census size. Of these, the largest effects relevant to human evolution come from fluctuations in population size, hitchhiking due to selection on linked sites, overlapping generations, and between-generation autocorrelation of family sizes.

    Human populations during the Middle Pleistocene and later appear to have had census numbers of 350,000 persons or more. In contrast, human genetic variation is consistent with a Wright-Fisher population of only 10,000 persons. The apparent discrepancy between these values has led to much theoretical and empirical investigation of human genetic variation. At present, the relative importance of demography, selection, and changing environments to human genetic variation during the past million years remains unclear.


    References

    1. Provine WB. 1971. The Origins of Theoretical Population Genetics. Chicago.
    2. Fisher RA. 1918. The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Transactions of the Royal Society of Edinburgh 52:399–433.
    3. Haldane JBS. 1927. A Mathematical Theory of Natural and Artificial Selection. Transactions of the Cambridge Philosophical Society 23:19–41.
    4. Fisher RA. 1930. The Genetical Theory of Natural Selection. Oxford.
    5. Wright S. 1931. Evolution in Mendelian Populations. Genetics 16:97–159.
    6. Wright S. 1955. Classification of the Factors of Evolution. Cold Spring Harbor Symposia in Quantitative Biology 20:16–24D.
    7. Ewens WJ. 2004. Mathematical Population Genetics. Cambridge, UK.
    8. Ewens WJ. 1972. The Sampling Theory of Selectively Neutral Alleles. Theoretical Population Biology 3:87–112.
    9. Templeton AR, and Read B. 1994. Inbreeding: one word, several meanings, much confusion. In: Loeschcke V, Tomiuk J, Jain SK, editors. Conservation Genetics. Birkhäuser Verlag. p 91–106.
    10. Cannings C. 1974. The Latent Roots of Certain Markov Chains Arising in Genetics: A New Approach. 1. Haploid Models. Advances in Applied Probability 6:260–290.
    11. Moran PAP. 1958. Random Processes in Genetics. Proceedings of the Cambridge Philosophical Society 54:60–71.
    12. Crow JF, and Denniston C. 1988. Inbreeding and Variance Effective Numbers. Evolution 42:482–495.
    13. Whitlock MC, and Barton NH. 1997. The effective size of a subdivided population. Genetics 146:427–441.
    14. Lehmann L, and Perrin N. 2006. On Metapopulation Resistance to Drift and Extinction. Ecology 87:1844–1855.
    15. Sjödin P, Kaj I, Krone S, Lascoux M, and Nordborg M. 2005. On the Meaning and Existence of an Effective Population Size. Genetics [Internet] 169:1061–1070. Available from: http://dx.doi.org/10.1534/genetics.104.026799
    16. Wright S. 1938. Size of a population and breeding structure in relation to evolution. Science 87:430–431.
    17. Fay JC, and Wu CI. 1999. A human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear DNA variation. Molecular Biology and Evolution 16:1003–1005.
    18. Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis of DNA polymorphism. Genetics 123:585–595.
    19. Fu Y, and Li W. 1997. Estimating the age of the common ancestor of a sample of DNA sequences. Molecular Biology and Evolution 14:195–199.
    20. Takahata N, and Satta Y. 1998. Footprints of intragenic recombination at HLA locus. Immunogenetics 47:430–441.
    21. Harpending HC, and Rogers AR. 2000. Genetic perspectives on human origins and differentiation. Annual Review of Genomics and Human Genetics 1:361–385.
    22. Nachman MW, Bauer VL, Crowell SL, and Aquadro CF. 1998. DNA variability and recombination rates at X-linked loci in humans. Genetics 150:1133–1141.
    23. Nachman MW. 2001. Single Nucleotide Polymorphisms and Recombination Rate in Humans. Trends in Genetics 17:481–485.
    24. Braverman JM, Hudson RR, Kaplan NL, Langley CH, and Stephan W. 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783–796.
    25. Kim Y, and Stephan W. 2000. Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415–1427.
    26. Charlesworth B, Morgan MT, and Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
    27. Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909–919.
    28. Gillespie JH. 2001. Is the population size of a species relevant to its evolution?. Evolution 55:2161–2169.
    29. Nei M, and Graur D. 1984. Extent of protein polymorphism and the neutral mutation theory. Evolutionary Biology 17:73–118.
    30. Bazin E, Glémin S, and Galtier N. 2006. Population Size Does Not Influence Mitochondrial Genetic Diversity in Animals. Science [Internet] 312:570–572. Available from: http://dx.doi.org/10.1126/science.1122033
    31. Nunney L, and Elam DR. 1994. Estimating the Effective Population Size of Conserved Populations. Conservation Biology 8:175–184.
    32. Hill WG. 1972. Effective Size of Populations with Overlapping Generations. Theoretical Population Biology 3:278–289.
    33. Nunney L. 1993. The influence of mating system and overlapping generations on effective population size. Evolution 47:1329–1341.
    34. Levins R. 1969. Some Demographic and Genetic Consequences of Environmental Heterogeneity for Biological Control. Bulletin of the Entomological Society of America 71:237–240.
    35. Gilpin M. 1991. The Genetic Effective Size of a Metapopulation. Biological Journal of the Linnaean Society 42:165–175.
    36. Wright S. 1943. Isolation by Distance. Genetics 28:114–38.
    37. Beerli P, and Felsenstein J. 2001. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 98:4563–4568. Available from: http://dx.doi.org/10.1073/pnas.081068098
    38. Wakeley J. 2001. The Coalescent in an Island Model of Population Subdivision with Variation among Demes. Theoretical Population Biology [Internet] 59:133–144. Available from: http://dx.doi.org/10.1006/tpbi.2000.1495
    39. Maruyama T, and Kimura M. 1980. Genetic variability and effective population size when local extinction and recolonization of subpopulations are frequent. Proceedings of the National Academy of Sciences, U. S. A. 77:6710–6714.
    40. Nei M, and Murata M. 1966. Effective population size when fertility is inherited. Genetical Research 8:257–260.
    41. Hudson RR. 1990. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7:1–44.
    42. Wood JW. 1987. The genetic demography of the Gainj of Papua New Guinea. 2. Determinants of effective population size. American Naturalist 129:165–187.
    43. Emigh TH, and Pollak E. 1979. Fixation Probabilities and Effective Population Numbers in Diploid Populations with Overlapping Generations. Theoretical Population Biology 15:86–107.
    44. Austerlitz F, and Heyer E. 1998. Social transmission of reproductive behavior increases frequency of inherited disorders in a young-expanding population. Proceedings of the National Academy of Sciences, U. S. A. 95:15140–15144.
    45. Gagnon A, and Heyer E. 2001. Intergenerational Correlation of Effective Family Size in Early Québec (Canada). American Journal of Human Biology [Internet] 13:645–659. Available from: http://dx.doi.org/10.1002/ajhb.1103
    46. Sibert A, Austerlitz F, and Heyer E. 2002. Wright-Fisher Revisited: The Case of Fertility Correlation. Theoretical Population Biology 62:181–197.
    47. Cavalli-Sforza LL. 1959. Some Data on the Genetic Structure of Human Populations. In: Proceedings of the 10th International Congress on Genetics. Vol. 1. Toronto. p 389–407.
    48. Bamshad M, Kivisild T, Watkins SW, Dixon ME, Ricker CE, Rao BB, Naidu MJ, Prasad RBV, Reddy GP, Rasanayagam A, et al. 2001. Genetic Evidence on the Origins of Indian Caste Populations. Genome Research [Internet] 11:994–1004. Available from: http://dx.doi.org/10.1101/gr.GR-1733RR
    49. Parra EJ, Kittles RA, Argyropoulos G, Pfaff CL, Hiester K, Bonilla C, Sylvester N, Parrish-Gause D, Garvey WT, Jin L, et al. 2001. Ancestral Proportions and Admixture Dynamics in Geographically Defined African Americans Living in South Carolina. American Journal of Physical Anthropology [Internet] 114:18–29. Available from: http://dx.doi.org/10.1002/1096-8644(200101)114:1%3C18::AID-AJPA1002%3E3.0.CO;2-2
    50. Chen Y-S, Olckers A, Schurr TG, Kogelnik AM, Huoponen K, and Wallace DC. 2000. mtDNA Variation in the South African Kung and Khwe – and Their Genetic Relationships to Other African Populations. American Journal of Human Genetics [Internet] 66:1362–1383. Available from: http://dx.doi.org/10.1086/302848
    51. Tishkoff SA, and Williams SM. 2002. Genetic Analysis of African Populations: Human Evolution and Complex Disease. Nature Reviews Genetics [Internet] 3:611–621. Available from: http://dx.doi.org/10.1038/nrg865
    52. Hawks JD. 1999. The Evolution of Human Population Size: A Synthesis of Fossil, Archaeological, and Genetic Data.
    53. Biraben JN. 1979. Essai sur l'evolution du nombre des hommes. Population 1:13–25.
    54. Weiss KM. 1984. On the number of members of the genus Homo who have ever lived, and some evolutionary implications. Human Biology 56:637–649.
    55. Whallon R. 1989. Elements of cultural change in the later Paleolithic. In: Mellars P, Stringer CB, editors. The Human Revolution: Behavioural and Biological Perspectives on the Origins of Modern Humans. Edinburgh. p 433–454.
    56. Gamble CS. 1994. Timewalkers. The Prehistory of Global Colonization. Cambridge, MA.
    57. Birdsell JB. 1972. The human numbers game. In: Supplement No. 9 in Human Evolution. Chicago. p 291–294.
    58. Tindale NB. 1940. Distribution of Australian Aboriginal tribes: a field survey. Transactions of the Royal Society of South Australia 64:140–231.
    59. Birdsell JB. 1993. Microevolutionary Patterns in Aboriginal Australia: A Gradient Analysis of Clines. Oxford, UK.
    60. Nei M. 1970. Effective size of human populations. American Journal of Human Genetics 22:694–696.
    61. Haigh J, and Maynard Smith J. 1972. Population size and protein variation in man. Genetical Research [Internet] 19:73–89. Available from: http://dx.doi.org/10.1017/S0016672300014282
    62. Nei M, and Roychoudhury AK. 1982. Genetic relationship and evolution of human races. In: Hecht MK, Wallace B, Prance GT, editors. Evolutionary Biology. Vol. 14. New York. p 1–59.
    63. Cann RL, Stoneking M, and Wilson AC. 1987. Mitochondrial DNA and human evolution. Nature 325:31–36.
    64. Underhill PA, Li J, Lin AA, Mehdi QS, Jenkins T, Vollrath D, Davis RW, Cavalli-Sforza L, and Oefner PJ. 1997. Detection of numerous Y-chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Research 7:996–1005.
    65. Wang DG, Fan J, Siao C, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, et al. 1998. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077–1081.
    66. The International HapMap Consortium. 2005. A Haplotype Map of the Human Genome. Nature [Internet] 437:1299–1320. Available from: http://dx.doi.org/10.1038/nature04226
    67. Takahata N. 1994. Repeated failures that led to the eventual success in human evolution. Molecular Biology and Evolution 11:803–805.
    68. Eller E, Hawks J, and Relethford JH. 2004. Local Extinction and Recolonization, Species Effective Population Size, and Modern Human Origins. Human Biology 76:689–709.
    69. Yellen J, and Harpending H. 1972. Hunter-Gatherer Populations and Archaeological Inference. World Archaeology 4:244–253.
    70. Caspari R, and Lee S-H. 2004. Older Age Becomes Common Late in Human Evolution. Proceedings of the National Academy of Sciences, U. S. A. 101:10895–10900.
    71. Fay JC, and Wu C-I. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
    72. Harpending HC, Sherry ST, Rogers AR, and Stoneking M. 1993. The genetic structure of ancient human populations. Current Anthropology 34:483–496.
    73. Harpending HC, Batzer MA, Gurven M, Jorde LB, Rogers AR, and Sherry ST. 1998. Genetic traces of ancient demography. Proceedings of the National Academy of Sciences, U. S. A. 95:1961–1967.
    74. Sherry ST, Harpending HC, Batzer MA, and Stoneking M. 1997. Alu evolution in human populations: using the coalescent to estimate effective population size. Genetics 147:1977–1982.
    75. Stiner MC, Munro ND, and Surovell TA. 2000. The Tortoise and the Hare: Small-Game Use, the Broad-Spectrum Revolution, and Paleolithic Demography. Current Anthropology 41:39–73.
    76. Hawks J, Hunley K, Lee SH, and Wolpoff MH. 2000. Bottlenecks and Pleistocene human evolution. Molecular Biology and Evolution 17:2–22.
    77. Marth G, Schuler G, Yeh R, Davenport R, Agarwala R, Church D, Wheelan S, Baker J, Ward M, Kholodov M, et al. 2003. Sequence Variations in the Public Human Genome Data Reflect a Bottlenecked Population History. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 100:376–381. Available from: http://dx.doi.org/10.1073/pnas.222673099
    78. Marth GT, Czabarka E, Murvai J, and Sherry ST. 2004. The Allele Frequency Spectrum in Genome-Wide Human Variation Data Reveals Signals of Differential Demographic History in Three Large World Populations. Genetics 166:351–372.
    79. Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, and Lahn BT. 2006. Evidence that the Adaptive Allele of the Brain Size Gene microcephalin Introgressed Into Homo sapiens from an Archaic Homo Lineage. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 103:18178–18183. Available from: http://dx.doi.org/10.1073/pnas.0606966103
    80. Frayer DW, Wolpoff MH, Thorne AG, Smith FH, and Pope GG. 1994. Getting it straight. American Anthropologist 96:424–438.
    81. Wolpoff MH, Hawks J, Frayer DW, and Hunley K. 2001. Modern Human Ancestry at the Peripheries: A Test of the Replacement Theory. Science 291:293–297.
    82. Betancourt AJ, Kim Y, and Orr AH. 2004. A Pseudohitchhiking Model of X vs. Autosomal Diversity. Genetics 168:2261–2269.
    83. Wang ET, Kodama G, Baldi P, and Moyzis RK. 2006. Global Landscape of Recent Inferred Darwinian Selection for Homo sapiens. Proceedings of the National Academy of Sciences, U. S. A. [Internet] 103:135–140. Available from: http://dx.doi.org/10.1073/pnas.0509691102
    84. Voight BF, Kudaravalli S, Wen X, and Pritchard JK. 2006. A Map of Recent Positive Selection in the Human Genome. PLoS Biology [Internet] 4. Available from: http://dx.doi.org/10.1371/journal.pbio.0040072
    85. Hellmann I, Ebersberger I, Ptak SE, Pääbo S, and Przeworski M. 2003. A Neutral Explanation for the Correlation of Diversity with Recombination Rates in Humans. American Journal of Human Genetics [Internet] 72:1527–1535. Available from: http://dx.doi.org/10.1086/375657
  • Agriculture, population expansion and mtDNA variation

    Mon, 2011-05-23 11:50 -- John Hawks

    Earlier this spring, I wrote about a paper by Brenna Henn and colleagues that presented new data on SNP variation in recent African hunter-gatherer populations [1] ("Population structure within Africa: has 'modern human origins' become a non sequitur?").

    Another paper that came out this spring from the same research group is also very interesting. Christopher Gignoux, Henn and Joanna Mountain [2] examined the evidence for Holocene population growth in Europe, Africa and Southeast Asia, from within-haplogroup variability of mtDNA haplogroups. The idea is that earlier samples were not finely resolved enough to examine events of the last few thousand years, either because they included only small sequences (e.g., control region) with limited variation, or because they included whole mtDNA genomes with too few individuals to look at within-haplogroup coalescents. So here they add more individuals. It is still a small number (425 total) and so I expect that we will see better samples in the next few years.

    The results are nonetheless useful because they provide some nice matches for the archaeology of early agriculture. For example, in Africa:

    We find two periods of population expansion within our sample of lineages originating during the Holocene in western Africa. Although the majority of coalescent events occur during the Holocene, a number of lineages from this sample also coalesce during the Upper Paleolithic. The earliest growth begins at ≈38,000 ya (CI: 33,500–45,000 ya) (Table 1 and Fig. S1) and the second period begins at ≈4,600 ya (CI: 3,000–10,000 ya) (Table 1 and Fig. 1B). The correspondence between the timing of genetic evidence for a sharp increase in population size at 4,600 ya in our Holocene sample of sub-Saharan Africans and the archaeological evidence for origins of agriculture in western Africa is quite close (Fig. 1B and Table 1). In contrast, our southern African Upper Paleolithic sample representative of hunter-gatherers shows no growth over the past 20,000 y. We suggest Bantu-speaking farmers and other pastoralist groups migrated throughout southern Africa 2,000 ya (27) without impacting southern African mtDNA lineages (Fig. 1B).

    We can't really understand the pattern of genetic variation within Africa without understanding when the population grew. In Africa, Middle Stone Age genetic variation must have been more extensive than that in other regions of the world. But the survival of that MSA variation to the present day depends on the demography of populations over the past 50,000 years. In a growing population, fewer lineages will be lost by random genetic drift. So if Gignoux, Henn and Mountain are right about the growth of West African populations by 35,000 years ago, we might expect that region to preserve some extensive variation from MSA times. That might explain why that population preserves very deep Y chromosome lineages [3]. Regarding only mtDNA, one might conclude that a historical paucity of migration between hunter-gatherer and agricultural groups would be the most important reason why MSA variation remains in the present-day African population. This has been the explanation for survival of deep mtDNA lineages in southern Africa, for example. The Y chromosome result and the current paper remind us that population growth can also preserve variation from earlier time periods.

    I think this proposal of African population history matches very well the model that we assumed in our acceleration paper [4], which we based on the archaeological record. We suggested early population growth in Africa by 35,000 years ago followed by an agricultural expansion after 5000 years ago. The evidence for relatively late agricultural intensification, within the last 4000-5000 years in sub-Saharan Africa, is very clear archaeologically. Less clear: How big was the earlier, pre-agricultural human population? The LSA might correspond to a demographic intensification, generally after 45,000 years ago. Genetics has certainly seemed to support such a view, and we found it consistent with the evidence that positive selection had increased in rate much earlier in Africa than in other regions. Still, the more detailed study by Gignoux and colleagues helps to clarify this picture.

    The results also show agricultural population growth to have been late in Southeast Asia.

    Direct archaeological evidence for rice agriculture in southeastern Asia dates to only ≈4,400 ya in Thailand (28). Agriculture spread throughout Island Southeast Asia, with evidence of rice in Taiwan again dating to ≈4,400 ya. Our Southeastern Asian Holocene population size curve indicates expansion beginning ≈4,700 ya (CI: 3,000–5,700 ya) (Fig. 1C and Table 1).

    Again, useful. I think we need to exert some effort making sure that the initial dispersal of people into South/Southeast Asia can be differentiated from the post-agricultural history. But assuming that Gignoux and colleagues are correct, it makes sense in an overall picture of slowly adapting early crops to tropical climate regimes, or replacing early domesticates with different ones in those areas.

    I am less sanguine about their results for Europe. They show a gradual period of growth associated in time with the Younger Dryas (around 12,000 years ago), which could make sense in the archaeology. But I am not convinced that the "European" haplogroups here are really European to that time depth. We know that the Neolithic and post-Neolithic saw some large-scale shifts in the frequencies of mtDNA haplogroups in Central and Western Europe. Some Upper Paleolithic Europeans probably contributed mtDNA to this later population, but I have no confidence that the proportion was great enough to accurately infer the demography of that pre-Neolithic population. (This is also a problem with the current paper in Current Anthropology by Peter Rowley-Conwy. I'll discuss this sometime soon.)

    The next frontier in reconstructing the population history of Europe will be ancient DNA. A good sample of Neolithic and pre-Neolithic whole mtDNA genomes would settle this question and allow inferences about the kind of demographic recovery Europe underwent after the Last Glacial Maximum.

    An open question is to what extent the other populations have similar problems. The European population of today reflects West Asian population dynamics 10,000 years ago. The East African population today reflects West African population dynamics from before the Bantu expansion, possibly to a similar extent. The population of Southeast Asia reflects the population dynamics of early rice agriculturalists in South China. And so on.

    Adding large-scale migration and partial population replacement to this kind of demographic analysis is not easy, but it will be essential if we want a better picture of how agriculture affected human populations. Considering these problems, I think it's easy to see why I started working on Holocene population dynamics. Evidence about Late Pleistocene populations, like MSA Africans and Neandertals, still lies within our genomes. But we see it through a lens. Holocene population dynamics -- movements and population growth -- distort that lens. If we don't account for those Holocene dynamics, we will conclude wrongly about the earlier dynamics.

    I like this a lot, because this is what anthropology is really good for. We can bring a lot of archaeological and historical knowledge to bear on the question of post-agricultural population dynamics. But it's a deep, deep field with a lot of specialized literature.


    Synopsis: 
    A study of mtDNA variation attempts to find the times and magnitudes of population expansions in early agriculturalists.
  • Passing on your fertility to your kids

    Fri, 2010-05-14 10:37 -- John Hawks

    From the NY Times earlier this spring, a profile of a New York woman with an exceptional legacy:

    WHEN Yitta Schwartz died last month at 93, she left behind 15 children, more than 200 grandchildren and so many great- and great-great-grandchildren that, by her family’s count, she could claim perhaps 2,000 living descendants.

    The story talks about her history and how she came to have such a large family. By itself, having 15 children would be unremarkable except that the children and grandchildren themselves all went on to have large families ("Like many Hasidim, Mrs. Schwartz considered bearing children as her tribute to God."). After a couple of generations, it adds up to a lot of descendants.

    I don't think the story is all that unique. Within the United States there are many communities, like the Hutterites, Old Order Amish, and Hasidic Jews, where large family sizes are the norm. Probably hundreds of women on earth can claim more than a thousand living descendants, and thousands more have only to wait until they are old enough, while their children and grandchildren's families continue to grow.

    You can get there by having 10 children, each of whom has 10, with each grandchild having 10 in turn -- that adds up to 10 + 100 + 1,000 = 1,110 descendants, giving some extra for different generation times and losses. Of course, it's a trick to live long enough to see the 1,000 great-grandchildren, but the early ones should already have given you a fraction of your 10,000 great-great-grandchildren.
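
    The arithmetic is easy to check. Here is a minimal sketch, assuming every descendant has exactly the same number of surviving children; the function name is just illustrative:

```python
def descendants_by_generation(children_per_person, generations):
    """Number of descendants in each generation, assuming every
    descendant has exactly `children_per_person` surviving children."""
    return [children_per_person ** g for g in range(1, generations + 1)]


# Children, grandchildren and great-grandchildren of one woman.
counts = descendants_by_generation(10, 3)
total = sum(counts)
print(counts, total)
```

    Asking for a fourth generation gives the great-great-grandchildren mentioned above.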

    What's surprising here? Not the family sizes themselves -- big families are common in most human populations. The high offspring numbers are not as apparent in populations that have high juvenile and infant mortality, but many pregnancies were the norm prior to the industrial transition.

    No, what's surprising about huge numbers of living descendants is the correlation between generations. In these cases, the correlation is driven by religion and various social proscriptions related to religious observance.

    I often talk about models and real human population structures in my classes. One obviously unrealistic aspect of the Wright-Fisher population model is its reproductive variance. In the Wright-Fisher model, reproductive variance is binomial -- every gene in an offspring population is equally likely to descend from each gene in the parental generation. In the model, it is possible -- albeit extraordinarily unlikely -- for a single parent to give rise to the entire offspring generation. That just can't happen in a real population, certainly not in humans. The effect of that unrealistic assumption of the model is not great, however, because even in the model the chance of having more than 10 offspring, while nonzero in theory, is negligible. If anything, the Wright-Fisher model is too conservative about the variance of offspring number -- real human populations have a non-negligible fraction of women who have 10 or more live births.
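
    To put a number on "negligible", here is a rough sketch. It assumes a constant-size Wright-Fisher population in which the number of offspring copies left by one parental gene copy is Binomial(n, 1/n), where n is the number of gene copies, and compares that with the Poisson(1) approximation. The value n = 20,000 is an illustrative choice, not a figure from any study.

```python
import math


def binomial_tail(n_genes, threshold, n_terms=30):
    """P(X > threshold) for X ~ Binomial(n_genes, 1/n_genes).

    Only the first `n_terms` terms of the upper tail are summed;
    the later terms are vanishingly small."""
    p = 1.0 / n_genes
    return sum(
        math.comb(n_genes, k) * p ** k * (1.0 - p) ** (n_genes - k)
        for k in range(threshold + 1, threshold + 1 + n_terms)
    )


def poisson_tail(mean, threshold, n_terms=30):
    """P(X > threshold) for X ~ Poisson(mean), truncated the same way."""
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(threshold + 1, threshold + 1 + n_terms)
    )


# Assumed example: 10,000 diploid individuals, so 20,000 gene copies.
p_binomial = binomial_tail(20_000, 10)
p_poisson = poisson_tail(1.0, 10)
print(f"binomial: {p_binomial:.3e}   poisson: {p_poisson:.3e}")
```

    Under these assumptions both tails come out on the order of one in a hundred million, which is why the model's theoretical allowance for enormous families has so little practical effect.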

    I get more concerned about other deficiencies of simple models, which are sometimes harder to deal with. One of those is the correlation of offspring number between generations. If there is even a slight correlation, with women tending to have more children because they came from larger families, it has a major effect on the amount of inbreeding in the population.

    You can think about it genealogically. Suppose you live in a small town with a few big families. The chance that you yourself were born into one of those big families is small. But if today's big families tended to come from yesterday's big families, with each generation we go back in time, it becomes more and more likely that one of your ancestors came from one of those big families. Still looking backward in time, your genealogy becomes captured by those big families, branch by branch. Since there are few big families in the town, once two or more lines of your ancestry trace to them, those lines will rapidly share a common ancestor. That's inbreeding, from the perspective of your genealogy.

    In small towns, that process isn't inevitable because people move in from elsewhere. Most of the lines of your genealogy will probably come from other towns within a few generations. But if we consider the human species as a small town, well, there's nowhere else to move in from. If the population structure of our species has included a strong correlation of offspring number between generations, it will have massively reduced our genetic variation.

    Since we have low genetic variation as a species, you can see why this is potentially interesting.

    Masatoshi Nei and Motoi Murata back in 1966 worked out a relation between intergenerational correlation in offspring number and effective population size. That's before the days of computer models, for you simulation jocks out there. The "effective" size of a population, as I've noted here many times, is the one parameter of a Wright-Fisher model, as estimated from the genetic variation within a population. It's a statement about how inbred the population looks, assuming that its evolution followed a random-mating model throughout its history. Now, that model is wrong in pretty much every interesting case, and so there are various mathematical transformations that attempt to account for the effects of different mating structures.

    In the case of intergenerational correlation of offspring number, Nei and Murata derived an expression to predict the reduction of effective size to be expected from this correlation, assuming a model in which the variance in offspring number is distributed in a certain way. The solution isn't general -- if offspring number were distributed in some other way, the effect of the same measured correlation may be quite different. And in their model, they were concerned with the case where the correlation of offspring number is influenced by genes that determine fitness -- in other words, genes under selection in the population. So it's not a complete answer, but it's a start.

    Nei and Murata cited empirical data from several earlier studies that showed a correlation of 0.20 to 0.40 in human offspring number between generations. Under the assumptions of their model, a correlation of 0.30 would cause a reduction of the effective size by roughly half.

    That's a big effect. We already expect a reduction of effective size compared to the census count of a human population, because human populations include many non-reproductive individuals -- kids and postreproductive adults make up half to two-thirds of small-scale foragers. If big families have an additional effect of half, it means that the effective size of the population starts out at a fourth to a sixth the census count. So an effective size of 10,000 really means 40,000 to 60,000 people on the ground.
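
    The conversion from effective size back to census count can be written out explicitly. This is a back-of-the-envelope sketch that treats the two reductions as simple multiplicative factors, which is an assumption of the illustration rather than a result from Nei and Murata; the function and parameter names are hypothetical.

```python
from fractions import Fraction


def census_from_effective(effective_size, breeding_fraction, correlation_factor):
    """Census count implied by an effective size, assuming the two
    reductions simply multiply (an illustrative simplification)."""
    return effective_size / (breeding_fraction * correlation_factor)


halving = Fraction(1, 2)  # reduction attributed to inherited family size

# Breeders are one-half to one-third of the census population.
low = census_from_effective(10_000, Fraction(1, 2), halving)
high = census_from_effective(10_000, Fraction(1, 3), halving)
print(low, high)
```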

    Still low, but as one factor among many it may be very important -- and possibly the distribution of variance caused a further decline. It's well worth investigating.

    A correlation of offspring number between generations can be caused by many ecological or cultural factors. Nei and Murata (1966) had considered the case where fitness itself is inherited, because of the presence of selected genes. But in humans, a more pervasive force is cultural inheritance. This factor was discussed in 1976 by the demographer Samuel Preston, attending to the importance of cultural preferences in contemporary populations:

    Since children of each generation are drawn disproportionately from families of women with high fertility achievements in the past, it may be expected that a pronatalist selective bias operates each generation with respect to the transmission of "tastes" for children. It has also been suggested that personality traits which may affect fertility achievement, such as the ability to defer gratification, may be transferred to some extent between parent and child (Kantner and Potter, 1954). It is also reasonable to suggest that biological fecundability is partially inherited. The positive correlation between the social classes of parent and child implies that economic constraints impinging on the childbearing process tend to be similar for the two generations (Preston 1976:110).

    In small-scale societies, these forces are somewhat different. But I wouldn't expect them to be less -- indeed, the social competition between families is probably more intense. The entire "Machiavellian intelligence" model of cognitive evolution implies that these kin-level effects were pervasive throughout human evolution over the past 2 million years or more. A strong cultural inheritance of fitness is really necessary for selection on genes that influence prosocial kin-related behaviors.

    How intense? Seems like a good question to investigate, as it may have a lot of importance to understanding genetic variation in our ancestors -- including our common ancestors with the Neandertals, whose genetic variation was limited just as much as our own.

    On the subject of effective population size, I'll be posting next week about chimpanzees and bonobos. More genetically variable than us? Well, some of them...

    References:

    Preston SH. 1976. Family sizes of children and family sizes of women. Demography 13:105-114.

    Nei M, Murata M. 1966. Effective population size when fertility is inherited. Genet Res 8:257-260.

  • Misinformation about brain evolution

    Mon, 2010-03-29 10:58 -- John Hawks

    Due to Jerry Coyne, I encountered an interview in the Guardian with Colin Blakemore: "Colin Blakemore: How the human brain got bigger by accident and not through evolution."

    The headline is a misnomer: Blakemore is not denying evolution, he is denying selection. But Blakemore's argument is based completely on a false presentation of the facts. Consider:

    The question is: why is it so big compared to the brains of our predecessors, such as Homo erectus? Until 200,000 years ago, there had been a gradual increase in brain size among hominins, starting three million years ago. Then, abruptly, there was a remarkable increase of about 30% or so.

    That's Blakemore. Now, here's a chart of endocranial volumes of Pleistocene human fossils:

    [Figure: Endocranial volume against time for fossil Homo. Time is in thousands of years before present, running left to right.]

    As you can see, there's no sudden jump 200,000 years ago, or at any other time. The data, such as they are, are consistent with a single pattern of increase over time, as pointed out by Sang-Hee Lee and Milford Wolpoff (2003).

    Heck, it's the lack of a sudden jump that has gotten all the attention. Because if "modern" humans suddenly showed up in Africa 200,000 years ago, and all of a sudden had vastly larger brains than any other hominins, wouldn't that be a simple and tidy story? Don't you think we'd all be talking about the sudden origin of modern humans as reflected by their larger brains?

    It just didn't happen.

    Well, it's one thing to be empirically wrong. That's a simple error that's easily corrected. But Blakemore, relying on the erroneous assumption of a single shift in brain size, asserts that neutral macromutations must be an important mode of human brain evolution:

    Genetic studies suggest every living human can be traced back to a single woman called "Mitochondrial Eve" who lived about 200,000 years ago. My suggestion is that the sudden expansion of the brain 200,000 years ago was a dramatic spontaneous mutation in the brain of Mitochondrial Eve or a relative which then spread through the species. A change in a single gene would have been enough.

    I hope that the empirical pattern is enough to convince you that this hypothesis is false. The "sudden increase" simply did not happen.

    But in case you need more persuasion: Blakemore here assumes that the increase in brain size had no negative consequences. Otherwise it couldn't proceed neutrally. Here is his argument:

    The environment of early humans was so clement and rich in resources that this greedy new brain, which would have absorbed even more of the body's energy, could be sustained without danger. Later, when times got hard, during droughts or climate changes, it helped us deal with these crises, which could otherwise have killed us off, by dreaming up novel ideas to problems.

    You see the outline: Life was easy, and humans could grow fat-brained, like so many sheep. Fortunately, our fat brains were then useful when times were tough. Blakemore describes this as somehow different from the idea that brains were adaptive -- it's in fact just another adaptive story for larger brains.

    But it falls apart when we consider that assumption -- life was easy. I put it to my students this way: Suppose you lay a lot of sugar beets out on your land. What will happen to the deer population?

    The answer is not that the deer will grow fat-brained and later evolve to conquer humanity. The answer is that there will be a lot more deer.

    Population growth is much faster than adaptation, and it's hundreds of times faster than a neutral gene can transit through the population. Humans in the past were not a static population, living in peace with an abundant environment. They were repeatedly faced with Malthusian crises -- on submillennial timescales. That's why a close understanding of climate variability is so relevant to our evolution. The fact that tools and behaviors change so slowly in the Middle Pleistocene is informative -- it shows that humans weren't coming up with dramatically new ways to track shifting ecologies.

    And that means that the selection pressures of the energetic and life history constraints on the brain were repeatedly imposed on human populations. A substantial increase in brain size should have immediately been disadvantageous -- if it had no compensatory benefits to fitness.

    What remains is testing the hypotheses about those benefits to fitness. Blakemore actually is presenting one such hypothesis -- that a larger brain mostly was adaptive because of its ability to transmit traditions. That's testable, and is consistent with the greater transfer of information apparent in recent archaeological traditions compared to Middle Pleistocene ones. But there are other hypotheses as well, and it is quite difficult to compare them with the available record.

    That's why it's so important to state the empirical record accurately.

    UPDATE (2010-03-29): A reader points out that Malthusian crises, in terms of resource or food availability, may have been avoided by warfare or predation -- people kill each other instead of starving. I see that point, particularly where we consider the way that epidemic disease can relax competition for food until population growth resumes. Performing well under predation or competition would be one way that brain size might have had compensatory benefits to fitness beyond its energetic and life history costs.

    References:

    Lee S-H, Wolpoff MH. 2003. The pattern of evolution in Pleistocene human brain size. Paleobiology 29:186-196.

  • The other story about the mammoth DNA

    Wed, 2010-03-24 00:04 -- John Hawks

    I got to writing about a story a couple of years ago, and then stalled out. That happens every so often -- remember, most of my research-related entries are my own notes. You can only imagine how many half-written posts I have, but the AI on my computer has gotten better and better at archiving them.

    In this case, the half-written post lately has grown in relevance, so I've revisited it. In the summer of 2008, Thomas Gilbert and (many) colleagues reported on a phylogenetic analysis of 18 mtDNA genomes from extinct woolly mammoths.

    That's pretty cool, by the way. We now know a lot more about woolly mammoth mtDNA variation than we knew about human mtDNA variation in 1980.

    The mammoth mtDNA is an example of something slightly different than the usual phylogeography -- it adds the dimension of time. Call it phylotemporogeography, if you like. The best comparison? Neandertals -- a group for which the number of mtDNA sequences is very similar, over a similarly wide Palearctic geographic range. I wrote about Neandertal phylogeography last year ("Neandertal races?"), and the topic will surely return sometime this year.

    Different mammoth mtDNA clades, which originated millions of years ago, apparently became extinct at different times. The paper divided the mammoth mtDNA variation into two clades, which diverged approximately 1.7 million years ago. These two clades have different geographic distributions. One, which the authors termed "clade I," was broadly distributed across Siberia and Beringia. The other, "clade II," appears to have been restricted to one area of Arctic Siberia, between the Taymyr Peninsula and the Lena River. Each of these clades has highly restricted diversity, and taking all the mammoth mtDNA sequences together, they are roughly as diverse as the within-subspecies diversity in living elephants. So that deep branch dividing the two clades accounts for a lot of the restricted diversity within mammoths.

    The interesting thing is that the two clades also have different temporal distributions, based on the radiocarbon dates associated with the remains. The geographically restricted clade II is systematically earlier. The time distributions overlap somewhat, but there is no clade II mtDNA after 30,000 years ago, while clade I lasts up to the extinction of the mammoths in the early Holocene.

    First question: why the deep branch? The simple answer is probably that it's just one of those things. It's difficult to weigh the importance of different parts of the geographic range of mammoths, so I hesitate to guess whether the relatively smaller region of clade II mammoths is "peripheral". It's not at a geographic extreme, but it's hard to judge the migration potential among these regions.

    The region occupied by a minor clade doesn't have to be peripheral or geographically isolated. The oldest branch point in a mtDNA tree is unlikely to be evenly balanced, and given that one clade is likely to be less numerous than the other, it is also likely to be geographically restricted. For all we know, the spatial distribution found among these mammoth mtDNAs is perfectly consistent with neutrality.

    Moreover, given the disappearance of clade II after 30,000 years ago, there aren't very many contemporary sequences that are clade I. We don't really know that they weren't evenly balanced at that time -- nor do we know what mtDNA clades may have been present in the broader range of mammoths across Europe and Beringia (although subsequent papers may have given some information on this).

    Second question: why the replacement of one clade by another? The authors first considered whether the mammoth mtDNA might have undergone a selective sweep:

    All of the observed substitutions appear to be between closely related amino acids. For those proteins having a close homolog with an experimentally determined structure (namely, COX1, COX2, COX3, and Cytb), we also modeled the structure of the mammoth proteins. All substitutions appear in regions on the surface or in loop regions that neither seem essential for proper folding nor would be expected to alter protein function in any obvious way. Therefore, the evidence from the modeled structures suggest [sic] that it is unlikely that the nonsynonymous differences found in the mitochondrial genomes of the two mammoth clades have resulted in any physiological disparities, and thus a selective advantage for clade I based on mtDNA sequence differences alone is not expected (Gilbert et al. 2008:8331).

    I think the authors have done as much analysis of this question as possible, given the available data, but I still think this is very weak evidence against selection as an explanation for the clade II extinction. After all, positively selected mtDNA variants are unlikely to change function in a major way -- big changes being much more likely to be bad under the usual Fisher model of adaptation.

    At any rate, the alternative hypothesis is local extinction, taking a geographically-localized clade with it.

    A more likely alternative is that the loss of clade II is a consequence of its restricted geographical distribution, because taxa with small ranges are generally more prone to extinction compared with widespread taxa. It is therefore conceivable that clade II was lost because of a demographic bottleneck resulting in genetic drift or a local population extinction.

    This seems contradictory. Given that there are no noticeable phenotypic differences between these clades, and that mtDNA clades I and II coexisted in the Lena-Kolyma region, a purely local demographic bottleneck doesn't make much sense. Now, there are alternatives that retain mtDNA neutrality -- for example, a demographic replacement of the Arctic Siberian mammoths by populations expanding from elsewhere (either east or south). This might have been driven by selection involving other aspects of physiology, enhanced by climate forcing. For instance, a long-lasting locally adapted population might give way to a more generalized form due to climate oscillations.

    Bottom line: mammoths were a dynamic population, capable of high mobility and rapid clade replacements on the scale of tens of thousands of years. And the Late Pleistocene was a time of high population turnover even across what should have been ideal mammoth habitat. That dynamism is not unusual for large, long-lived mammals, and is something we should be looking for in the DNA phylogeography of Late Pleistocene hominins.

    References:

    Gilbert MTP and 32 others. 2008. Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial sequences. Proc Nat Acad Sci USA 105:8327-8332. doi:10.1073/pnas.0802315105

  • SNPs and culture history

    Tue, 2010-02-23 07:30 -- John Hawks

    Razib lists a taxonomy of culture-gene historical scenarios. Real worked examples for several of these would be worthwhile.

    It's now several years since I've noticed a lot of interest in the project of correlating gene trees and language trees. That may be because human geneticists have reflected on the importance of geography -- which in most cases seems stronger than any culture-historical factor in explaining allele frequencies. Or maybe it's because nobody ever really understood the "synthetic map" approach.

    Most of the people interested in culture history accounts of migration have focused on Y and mtDNA haplotypes, but I think there's room for new work on SNP genotypes and population history. We need some better models of culture contact and demography, and we need to integrate selection with the models.

  • An insertion into deep history

    Wed, 2010-02-10 00:28 -- John Hawks

    A couple of weeks ago I noted a new article by Chad Huff and colleagues in PNAS. It wasn't available yet when I wrote, but I've had the chance to study it now.

    The paper presents a tremendously clever way of using contemporary genetics to look at different time slices in Pleistocene human evolution. If you can imagine traveling to different parts of the human genome and looking at different times in the past, that's more or less what they are doing.

    We have the genomes of several people now -- the paper focuses on Venter's sequence versus the official HGP draft sequence, but there are others. A whole genome is limited in its utility to look at genetic variation, but it has some very interesting sampling properties. Much of population genetics theory is based on a simple question: what happens if you sample two individuals at random? How similar are they? What will be the distribution of genetic differences between them? How long ago did each of their genes descend from a single common ancestor? Sampling a diploid genome yields precisely the data for which these questions were designed.

    Huff and colleagues dredge up a relatively obscure point of theory. Suppose you take a particular kind of rare event -- they consider mobile element insertions, including Alu and LINE insertions. Even though these elements make up a large fraction of the human genome, the events that give rise to them are rare, occurring only once in a whole genome every 20 births or more. Now, look around the genome and partition it into two kinds of regions. One kind of region will include the rare events (insertions in this case) and the area immediately flanking them. The other will include everywhere else in the genome. Now, the partitioning creates a bias. The areas that include these rare events will, on average, represent more diverse parts of the genome, with deeper genealogies. This is because the intrinsically rare event is more likely to have happened in the long time span represented by such areas than in the relatively shorter times represented by the remainder of the genome. In fact, the average depth of these areas including the insertions should be precisely double the average depth of the areas that lack them.
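
    The doubling is a size-biased sampling effect, and it can be checked numerically. The sketch below is not the method of Huff and colleagues. It assumes that the genealogical depth of each region is exponentially distributed with mean 1 (in coalescent units) and that insertions arrive at a constant, low rate per unit of depth. The rate of 0.02 and the function names are illustrative choices.

```python
import math
import random
from statistics import fmean


def simulated_depths(rate, n_regions=200_000, seed=1):
    """Mean depth of regions with and without an insertion.

    Each region's depth is drawn from Exponential(1); an insertion
    occurs in a region of depth t with probability 1 - exp(-rate * t)."""
    rng = random.Random(seed)
    with_insertion, without_insertion = [], []
    for _ in range(n_regions):
        t = rng.expovariate(1.0)
        if rng.random() < 1.0 - math.exp(-rate * t):
            with_insertion.append(t)
        else:
            without_insertion.append(t)
    return fmean(with_insertion), fmean(without_insertion)


def expected_depths(rate):
    """Closed-form means under the same assumptions."""
    return (2.0 + rate) / (1.0 + rate), 1.0 / (1.0 + rate)


sim_with, sim_without = simulated_depths(0.02)
exp_with, exp_without = expected_depths(0.02)
print(f"simulated ratio: {sim_with / sim_without:.2f}")
print(f"expected ratio:  {exp_with / exp_without:.2f}")
```

    Under these assumptions the ratio of mean depths works out to 2 + rate, so it approaches exactly 2 as the events become rarer. That is the sense in which regions carrying an insertion are twice as deep on average.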

    In other words, looking at these rare events is sort of like opening the box on Schroedinger's cat. There's something that we shouldn't be able to find out a priori -- how old is the genealogy of a part of the genome? By sifting through the genome and picking out all the parts that have these insertions, we know something about them: We know that they represent a time interval double that of the rest of the genome. Our looking at these insertions has collapsed the likelihood function that relates genetic location to age. When we look at the variation around insertions, we can then ignore some of the events that changed the population's diversity in the last couple of hundred thousand years. And by comparing these sites with the rest of the genome, we have another way to test hypotheses about whether the population was once a lot bigger or smaller than it has been over the last few hundred thousand years.

    The analysis shows that the population in that early part of the genealogy -- corresponding more or less to dates over 1.2 million years ago -- was consistent with an effective population size of 18,000 individuals, give or take. As I pointed out in my earlier post, that value itself isn't surprising -- it's a bit higher than the average genome-wide. The best-fit model, including both areas near insertions and the rest of the genome, was one in which the effective population size actually declined from 18,500 to 8,500 individuals at 1.2 million years ago. They explain that the recent value should be depressed by the separation of present human populations -- Venter and the human reference sequence both being primarily derived from Europe, they undersample human variation.

    Now, it's easy to see some of the limitations on the analysis. The authors considered only a two-epoch model of population history. That is to say, once upon a time the population was x individuals, then at some time t, the population becomes y individuals. Two epochs of population size, separated by one time. Clearly the actual history of human populations was more complicated than this, but does it matter? Recent history will not greatly influence nucleotide diversity, and in particular the insertions -- because they are intrinsically rare -- are likely to reflect much more ancient events that have survived any subsequent vicissitudes of population.
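
    To see what a two-epoch model implies for a pair of gene copies, here is a minimal sketch of the expected pairwise coalescence time under standard coalescent assumptions (diploid population, coalescence rate 1/(2N) per generation). It is an illustration, not the authors' inference procedure. The 25-year generation time used to convert 1.2 million years into generations is an assumption, and the function name is hypothetical.

```python
import math


def expected_pairwise_coalescence(n_recent, n_ancient, t_change):
    """Expected coalescence time, in generations, for two gene copies.

    The population has effective size `n_recent` from the present back
    to `t_change` generations ago, and `n_ancient` before that."""
    # Probability that the pair has not yet coalesced by t_change.
    survive = math.exp(-t_change / (2.0 * n_recent))
    return 2.0 * n_recent * (1.0 - survive) + 2.0 * n_ancient * survive


YEARS_PER_GENERATION = 25  # assumed, not taken from the paper
t_change = 1_200_000 / YEARS_PER_GENERATION

two_epoch = expected_pairwise_coalescence(8_500, 18_500, t_change)
constant = expected_pairwise_coalescence(8_500, 8_500, t_change)
print(f"two-epoch: {two_epoch:.0f} generations")
print(f"constant:  {constant:.0f} generations")
```

    With a constant size the expectation collapses to 2N generations. The larger ancient size only matters for the fraction of pairs whose ancestry reaches back past the change point.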

    But, I suspect that the distribution of insertions with relation to recent selection will make an appreciable difference to the nearby SNP diversity. The geographic distribution of variation will also make some difference, although we won't know how much until we look at non-European genomes.

    Meanwhile, if I were looking to the archaeological record to identify times that made a difference to the human population, 1.2 million years ago would really not register. It certainly would not strike me as a time of substantial reduction of the human population.

    The lack of any archaeological referent is typical of such studies -- after all, they're not trying to match numbers from archaeology, they're trying to establish internally consistent genetic tests of population history. But if these values are real, they must match what we know from the fossil and archaeological record. There is some text in the paper about the small effective size and its relevance to humans as a sign of repeated bottlenecks or other events. As I pointed out earlier, I think 18,000 is pretty significantly large compared to most other estimates of human effective population size. When we get an estimate of human effective size so near those of other apes, we are looking at a value consistent with habitation of a large, certainly continent-wide range by large populations. So now I have to think what the pertinent comparison from the archaeological record should be.

    One archaeological comparison is of special interest to me: a real-life comparison that will be immediately relevant. This study should be giving us information about the population ancestral to Neandertals and humans. In that sense, it duplicates the information that we ought to be able to derive from the comparison of human and Neandertal genomes.

    Interestingly, the effective size estimates published so far for the human-Neandertal ancestral population are much lower than the 18,500 estimated in this study. Green and colleagues (2006) made a point estimate of 3000 effective individuals at the time of Neandertal-human divergence. That estimate is likely to be supplanted by the Neandertal genome release, because the Green et al. (2006) estimate was influenced by some fraction of contaminating sequence from humans. And the error bars on that estimate are large. But there's a lot of space between them -- we're talking about at least a sixfold difference.

    Something doesn't add up. The human-Neandertal ancestral population must have contained all these polymorphic insertions that supposedly occurred before 800,000 years ago. The effective size of the population may have been lower, but if so we should look for some explanation for that substantial loss of variation.

    UPDATE (2010-02-10): A couple of people have asked about effective population size. Here's a helpful post that explains why a small effective size may not mean a small population size, and some of the current hypotheses that try to explain the human value.

    References:

    Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Pääbo S. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444:330-336. doi:10.1038/nature05336

    Huff CD, Xing J, Rogers AR, Witherspoon D, Jorde LB. 2010. Mobile elements reveal small population size in the ancient ancestors of Homo sapiens. Proc Nat Acad Sci USA (early online) doi:10.1073/pnas.0909000107

  • High Pleistocene human effective population size

    Mon, 2010-01-18 21:43 -- John Hawks

    Nicholas Wade is reporting on an upcoming paper by Chad Huff and Lynn Jorde: "Genome Study Provides a Census of Early Humans".

    The Utah team based its estimate on the genetic variation present in two complete human genomes, one prepared by the government’s human genome project and the other by J. Craig Venter, the genome sequencing pioneer. The government decoded a single copy of a mosaic genome derived from a medley of people, apparently of European and Asian origin. Dr. Venter decoded both copies of his own genome, the one inherited from his father and the one from his mother.

    The Utah team thus had three genomes to work with and looked at ancient elements known as Alu insertions, the youngest class of which appeared in the human genome around a million years ago. The amount of variation seen in the DNA immediately surrounding the Alu insertions gave a measure of the size of human population at that time.

    Their estimate agrees almost exactly with an earlier one, also based on Alu insertions but with sparser data. The insertions tag ancient regions of the genome that are unaffected by the recent growth in population, Dr. Huff said.

    I'll probably write some more notes on this when I can get a copy.

    At the moment I think it's worth pointing out that the lede of Wade's story is exactly backward. The story is all about how the effective size estimate, 18,500 effective people, is very low. But in reality that's a high estimate compared to what most human geneticists have assumed, only 10,000 individuals.

    Neither estimate is really news. Observations in the early 1970's established that 10,000 was around the right order of magnitude for human effective population size. Around 10 years ago, some gene systems, including Alu insertions, appeared to support a higher estimate of effective size up around 18,000 individuals. That still seemed pretty small in evolutionary terms, and didn't change anybody's ideas about ancient population bottlenecks.

    The differences between these estimates have never really been resolved. As more and more genes got sequenced, human geneticists seem to have just standardized on the small estimate of 10,000 effective individuals -- even as they started to apply more and more complicated computer models to try to derive estimates of expansion and bottleneck times. (I wrote about the problem of effective population size last year, "Cultural impedance, demographic growth, effective population size".)

    A few years ago we started to get good effective size estimates for other primates. As Wade's article points out, the genetic variation of chimpanzees and gorillas leads to estimates of effective size on the order of 25,000 or so individuals. Geneticists noted that these species are therefore much more diverse than humans, with our puny effective size of around 10,000 individuals. Only bonobos seem to be close to the low human value.

    Well, if Huff and Jorde are right, human variation is a lot like the amount of variation in chimpanzees and gorillas. Those other apes have lived in geographically structured subspecies spanning tropical Africa for several hundred thousand years.

    Or have they? Maybe there were massive bottlenecks and population replacements among chimpanzee subspecies. Maybe there was a recent "out of Congo" migration that accounts for the low genetic variation of bonobos. Maybe chimps themselves derive recently from some part of their current range.

    Or, maybe the human effective population size isn't so probative.

    In any event, the genomes here are all Eurasian. I wonder how much African genomes will increase the diversity? Could it be that we're even more diverse than chimpanzees?

  • Double the bottlenecks

    Fri, 2009-10-09 20:49 -- John Hawks

    Amos and Hoffman (2009) describe a study of microsatellite (STR) data taken from 53 populations -- the HGDP dataset. They suggest that the worldwide diversity of STR loci is consistent with a double-bottleneck population history: An initial bottleneck accompanying a dispersal of humans from Africa some 50,000 years ago, followed by a second bottleneck as people moved into Beringia much later. Despite the cline of diversity across Eurasia leading to lower diversity farther from Africa, they do not see any evidence for successive (sequential) bottlenecks across this region.

    In general, I think that people are going farther than the data on this question of human migrations. Even considering only 53 population samples, there are thousands of ways that they might have been connected to each other over time. Rarely does anybody test simple null hypotheses, like isolation by distance. Often they mix two or more distinct scenarios in an attempt to find a closer fit to data. The problem is that two or more distinct scenarios will inevitably provide a closer fit, merely by virtue of adding parameters. The question is whether the fit is significantly better.
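
    One common way to ask whether the extra parameters earn their keep is a likelihood ratio test between nested models. The sketch below is only an illustration: the function name is hypothetical, the log-likelihood values are invented, and it handles only the case of one additional parameter, where the chi-square tail probability has a simple closed form. The chi-square approximation is itself an assumption that can fail, for instance when a parameter sits on a boundary.

```python
import math

def lrt_p_value_one_extra_param(lnl_simple, lnl_complex):
    """P-value for a nested model comparison with ONE extra parameter.

    The test statistic is D = 2 * (lnL_complex - lnL_simple). Under the
    simpler model, D is approximately chi-square with 1 degree of
    freedom, whose upper tail is erfc(sqrt(D / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_complex - lnl_simple))
    return math.erfc(math.sqrt(statistic / 2.0))

# Invented log-likelihoods, for illustration only.
p = lrt_p_value_one_extra_param(lnl_simple=-1000.0, lnl_complex=-999.0)
print(f"p = {p:.3f}")
```

    The richer model always has the higher likelihood. The test asks whether the improvement is larger than what one more free parameter would buy by chance.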

    I find some things to like in this paper. They treat STR data in a better way than many other studies, and I think they've done the right thing in examining the question of heterozygosity versus allele number. That statistic is worth more consideration.

    From their introduction:

    The question of how many bottlenecks account for the distribution of modern human diversity has been relatively little studied (Rogers & Harpending 1992) and yields conflicting results. First, simulations indicate that the observed pattern is consistent with a linear stepping-stone model featuring a long series of founder events (Ramachandran et al. 2005; Liu et al. 2006). However, this does not preclude equally good fits based on other models. Equally, at the other extreme, large steps in single nucleotide polymorphism diversity between adjacent populations have been used to argue for two dominant bottlenecks, one ‘out of Africa’ and one around the Bering land bridge where humans crossed into the Americas (Hellenthal et al. 2008). The latter event is supported by both mitochondrial data (Wallace et al. 1985; Fagundes et al. 2008) and data from a few nuclear markers (Hey 2005). However, mitochondrial sequences only inform on female lineages, while the adjacent population approach is least reliable in regions like the Bering Strait where population samples are extremely sparse.

    They make a good point here: that "other models" may produce "equally good fits." That is a routine problem in "modern human origins" research -- alternative models are rarely evaluated, and people almost never take a null hypothesis testing approach.

    I've always been hesitant to give much credence to demographic studies based on microsatellites. The evidence for mutation-drift disequilibrium in a stepwise mutation model is an unusual pattern of variance among the individual length variances of STR loci. That's a complicated statistic, and it responds poorly to deviations from the pure stepwise mutation model. In particular, any constraints on allele length will eliminate outliers in allele length variance, making the population look like it went through a bottleneck of some kind.

    The current study looks for disequilibrium in the relation of heterozygosity to allele number -- the logic being that a population crash will eliminate rare alleles but not common ones, leaving heterozygosity nearly the same but cutting allele number substantially. They also are explicit about the problems of the stepwise mutation model:
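
    The logic is easy to see with a toy locus. This is a deterministic caricature, not the test used in the paper: the allele frequencies are made up, the function names are hypothetical, and the "crash" simply deletes every allele below a frequency cutoff and renormalizes the rest.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def drop_rare_alleles(freqs, cutoff):
    """Remove alleles rarer than cutoff and renormalize the survivors."""
    kept = [p for p in freqs if p >= cutoff]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical locus: three common alleles and three rare ones.
before = [0.45, 0.35, 0.15, 0.02, 0.02, 0.01]
after = drop_rare_alleles(before, cutoff=0.05)

h_before = expected_heterozygosity(before)
h_after = expected_heterozygosity(after)
print(len(before), "alleles ->", len(after), "alleles")
print(f"heterozygosity {h_before:.3f} -> {h_after:.3f}")
```

    Half the alleles disappear while heterozygosity falls by only a few percent, which is the kind of mismatch the test looks for.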

    One problem with the Bottleneck test is that microsatellites do not follow a strict SMM. Known deviations include mutation biases favouring expansion or contraction (Xu et al. 2000), interruption mutations within the repeat tract that slow the rate of slippage (Jin et al. 1996; Kruglyak et al. 1998), occasional larger ‘jump’ mutations of several repeat units (Di Rienzo et al. 1994; Schlötterer et al. 1998) and some form of upper length boundary that prevents indefinite expansion (Amos & Clarke 2008).

    They go on to argue that their test is less susceptible to deviations in the mutation model. That claim deserves further evaluation. The information they provide about the pattern of variation of different classes of STR loci is useful, as it points to ways that the mutation model may have influenced the appearance of a bottleneck. But they find "strong and consistent evidence of a bottleneck at the lowest variability loci." Given a correlation between diversity and strength of evidence of a bottleneck, and remembering that the signature of a bottleneck in their test is high diversity per allele, I would want to look for some mechanical explanation for the correlation.

    A (possibly additional) problem: The number of rare alleles within any single subpopulation is rapidly increased by migration from other subpopulations. All you need is a single migrant to bring in a new allele from somewhere else, and you've got another allele. Now, obviously this allele may not be sampled in any given dataset, so you have to account for sampling. But the point remains: it doesn't take much migration to increase allele numbers.

    Migration after any bottlenecks should, then, hide the evidence for them. It's not hard to imagine scenarios equally consistent with the data. For example, if every different population underwent a single bottleneck simultaneously, they would be unlikely to lose the same rare alleles, but would retain nearly the same heterozygosity. As migration resumed between them after the bottleneck, the rare alleles would be replenished in a way that reflects the subsequent migration rate and population size (the same number of migrants has a faster effect on allele frequencies in a smaller population).
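
    The parenthetical point about population size can be sketched with a one-way migration model. Everything here is an assumption for illustration: the function name is hypothetical, the numbers are invented, drift is ignored entirely, and the migration fraction is taken as the number of migrants per generation divided by the size of the receiving population.

```python
def allele_freq_after_migration(start_freq, source_freq, migrants_per_gen,
                                pop_size, generations):
    """Deterministic allele frequency under one-way migration (no drift).

    Each generation a fraction m = migrants_per_gen / pop_size of the
    receiving population is replaced by migrants carrying the allele at
    source_freq, so the gap to source_freq shrinks by (1 - m) each time.
    """
    m = migrants_per_gen / pop_size
    return source_freq + (start_freq - source_freq) * (1.0 - m) ** generations

# Invented numbers: the allele was lost locally (frequency 0) but sits at
# 2% in the source; 5 migrants arrive per generation for 100 generations.
small = allele_freq_after_migration(0.0, 0.02, 5, 1_000, 100)
large = allele_freq_after_migration(0.0, 0.02, 5, 10_000, 100)
print(f"small population: {small:.4f}, large population: {large:.4f}")
```

    With the same five migrants per generation, the allele recovers much faster in the population of 1,000 than in the population of 10,000.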

    Or, if there were any interaction between a single bottlenecked ("founder") population and other pre-existing populations, it would tend to reduce the sign of a bottleneck. It seems plausible that the data in this paper might be explained by such interactions in South or East Asia, providing them with a store of rare alleles that didn't make it to the Near East.

    This is why it's useful to start simple. The simplest model in this case would include migration and expansion (which we know from non-genetic evidence happened) and no bottlenecks. Plumb these parameters to see if any acceptable fits turn up. They do for the heterozygosity cline considered alone -- a simple trend of directionally biased migration is sufficient for that. Looking at the allele number together with the heterozygosity may reject that model, in which case you'd want to pick out the next simplest. In that sense, mtDNA may actually be giving more information than the hundreds of STR loci, because with mtDNA clades there is a way to estimate haplotype ages -- meaning that the demographic hypothesis must give rise to those haplotype ages in addition to the geographic dispersion of haplotypes and within-subpopulation diversity.

    References:

    Amos W, Hoffman JI. 2009. Evidence that two main bottleneck events shaped modern human genetic diversity. Proc R Soc Lond B (online) doi:10.1098/rspb.2009.1473

  • Norman Borlaug

    Sun, 2009-09-13 11:00 -- John Hawks

    Yesterday, Nobel-Peace-Prize-winning agricultural scientist Norman Borlaug died. This AP story reviews his life and accomplishments. Without question, Borlaug deserved to be better-known -- a scientist whose work reached out to touch almost all the world's population. One indication of the neglect: going out to find more material to link about Borlaug, the best sources were written years ago.

    Gregg Easterbrook profiled Borlaug in The Atlantic twelve years ago. This article is one that's been cribbed in many of the obituaries you'll see today, and is worth reading in its entirety.

    The popular image casts Borlaug as a foil to doomsayers like Paul Ehrlich. Indeed, the trend toward higher productivity, begun before Borlaug's work in postwar Mexico, seems to have escaped the awareness of the "Population Bomb" crowd, impressed by the demographic trends but ignorant of agricultural trends. But countervailing trends can only go as far as agricultural science progresses, and Borlaug himself saw problems in the future:

    Borlaug continues, "But Africa, the former Soviet republics, and the cerrado are the last frontiers. After they are in use, the world will have no additional sizable blocks of arable land left to put into production, unless you are willing to level whole forests, which you should not do. So future food-production increases will have to come from higher yields. And though I have no doubt yields will keep going up, whether they can go up enough to feed the population monster is another matter. Unless progress with agricultural yields remains very strong, the next century will experience sheer human misery that, on a numerical scale, will exceed the worst of everything that has come before."

    In 2000, Reason's Ronald Bailey (author of Liberation Biology: The Scientific and Moral Case for the Biotech Revolution) interviewed Borlaug, touching on a range of topics from biotechnology to the problems maintaining food supply chains in sub-Saharan Africa. Some highlights:

    Reason: Environmentalists say agricultural biotech will harm biodiversity.

    Borlaug: I don't believe that. If we grow our food and fiber on the land best suited to farming with the technology that we have and what's coming, including proper use of genetic engineering and biotechnology, we will leave untouched vast tracts of land, with all of their plant and animal diversity. It is because we use farmland so effectively now that President Clinton was recently able to set aside another 50 or 60 million acres of land as wilderness areas. That would not have been possible had it not been for the efficiency of modern agriculture.

    And on the history of doomsaying about impending collapse:

    Reason: You mentioned that you are afraid that the doomsayers could stop the progress in food production.

    Borlaug: It worries me, if they gum up all of these developments. It's elitism, and the American people are vulnerable to this, too. I'm talking about the extremists here and in Western Europe....In the U.S., 98 percent of consumers live in cities or urban areas or good-size towns. Only 2 percent still live out there on the land. In Western Europe also, a big percentage of the people live off the farms, and they don't understand the complexities of agriculture. So they are easily swayed by these scare stories that we are on the verge of being poisoned out of existence by farm chemicals.

    Borlaug was far from alone in working on agricultural productivity, but he did stand apart in his social engagement, his early demonstrations that massive gains were possible on dwarf varieties of wheat, and his long sustained focus. He began his work in Mexico at age 32.

    Related: I reviewed The Murder of Nikolai Vavilov, a biography of the Russian agricultural scientist who tried to develop highly productive and disease-resistant crops, but was foiled by Lysenko.
