This is a story about my work on recent human evolution, describing some of the main results and how the work came about. The story refers to my paper (with Gregory Cochran, Eric Wang, Henry Harpending, and Robert Moyzis), "Recent acceleration of human adaptive evolution," which came out in December 2007.
Like most good stories in biology, this one begins with Darwin. Darwin was always very interested in animal breeding, which he considered the best analogy for the process of natural selection. Of course, if you're breeding livestock and want to select for some characteristics, it is important to select from as large a herd as possible, because large populations have more variation in them. Darwin recognized this as an important condition for natural selection, which relies on sufficient variation in natural populations.
[A]s variations manifestly useful or pleasing to man appear only occasionally, the chance of their appearance will be much increased by a large number of individuals being kept.... Hence, number is of the highest importance for success.
These words from the Origin, "number is of the highest importance for success," were influential.
This is a quick review of the research, based on a presentation I gave earlier this year. It is not complete, and glosses a number of very important details. A close reader looking for how to do genomics would be better served reading the actual research paper. Here, I'm trying to express the science for everyone else.
By 1930, R. A. Fisher had picked up Darwin's idea about numbers, predicting that evolution in large populations could be faster than in small populations. This does not hold in all circumstances, however, but only where the number of new adaptive mutations is quite small -- in other words, where evolution is "mutation-limited":
The great contrast between abundant and rare species lies in the number of individuals available in each generation as possible mutants.... The importance of the contrast lies with the extremely rare mutations, in which the number of new mutations occurring must increase proportionately to the number of individuals available.
A long history of research in plant genetics (corn breeding), microbial chemostat experiments, and the examination of pesticide resistance in insects supports Fisher's concept. For example, flies subjected to low doses of pesticide in the laboratory tend to acquire very complicated patterns of resistance -- involving slight changes in many different genes. These usually aren't transmitted perfectly and often have fitness costs; it's a very imperfect adaptation. But if pesticide is sprayed over a large area, flies sometimes appear very quickly with a single mutation that confers very complete resistance. Here, the very advantageous resistance mutation is incredibly rare -- it only occurs in maybe one in a billion flies. It would almost never occur in the small laboratory population.
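To make the fly example concrete, here is a minimal sketch of the chance that at least one carrier of a one-in-a-billion mutation exists in a population. The "one in a billion" figure is from the text; the two population sizes are hypothetical values chosen only for illustration.

```python
import math

def prob_at_least_one_mutant(per_individual_rate, population_size):
    """P(at least one carrier) = 1 - (1 - rate)^N, computed in a numerically stable way."""
    return -math.expm1(population_size * math.log1p(-per_individual_rate))

RATE = 1e-9                        # "one in a billion flies" (from the text)
LAB_POPULATION = 1_000             # hypothetical laboratory colony
FIELD_POPULATION = 10_000_000_000  # hypothetical population over a sprayed region

lab = prob_at_least_one_mutant(RATE, LAB_POPULATION)
field = prob_at_least_one_mutant(RATE, FIELD_POPULATION)

print(f"lab colony:       {lab:.2e}")
print(f"field population: {field:.5f}")
```

Under these assumed sizes the mutation is essentially absent from the lab colony and almost certain to be present somewhere in the field population.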
Our growing population
Human populations have been growing rapidly during the last 50,000 years or so. That increase began around the time of the Upper Paleolithic -- that's documented by archaeological evidence. There was a later massive increase during the Neolithic. This agricultural transition actually was quite heterogeneous: earlier in West Asia and China, later in Europe, and then later still in sub-Saharan Africa. Last, within the last few hundred years we have seen a massive increase in numbers associated with industrialization and globalization of technology.
One day a couple of years ago, Greg Cochran and I were talking about brain evolution. You have to understand, this is long before we knew about any of these genome scans -- they hadn't come out yet. One of the main mysteries of human brain evolution is why it happened apparently gradually for such a long period of time. It is one of the best cases of evolutionary gradualism. But this is a problem, because directional selection would have to be too weak to take such a long time. Now, we know that brain size is constrained in two directions -- larger brains cost more energy to maintain, but smaller brains come with some functional disadvantages. So this creates a situation where new variants that satisfy both constraints -- costing little energy, or making great improvements in brain function -- must be very rare. It should be mutation-limited.
I remember very well that at precisely the same moment, we both realized -- "Hey, maybe this great increase in human population size made a difference!" Because as we'll see later, the pattern of change in brain size really changed when populations started to get really big.
You see, this is one of those very rare cases where the theory preceded the data! It is quite simple; the rate of mutations in a population is a linear product of the rate per genome and the population size.
Not all mutations are advantageous, and not all advantageous mutations will be fixed. The vast majority are lost. If a mutation has a selective advantage, then the chance that it will proceed toward fixation (and attain high frequency) is 2s -- "s" here is the fitness advantage. That means that 90 percent of new mutations with a 5 percent fitness advantage are simply lost.
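The 2s rule is easy to check numerically. The sketch below evaluates it for a 5 percent advantage and, as a cross-check, solves for the survival probability of a simple branching process. The Poisson offspring distribution is my assumption for the illustration; it is not something the paper specifies.

```python
import math

def fixation_probability_2s(s):
    """Small-s approximation: chance a new advantageous mutation escapes loss."""
    return 2.0 * s

def survival_probability_poisson(s, iterations=5000):
    """Assumption: each copy leaves a Poisson(1 + s) number of descendant copies.
    The survival probability p then solves p = 1 - exp(-(1 + s) * p),
    found here by fixed-point iteration."""
    p = 0.5
    for _ in range(iterations):
        p = 1.0 - math.exp(-(1.0 + s) * p)
    return p

s = 0.05
p_fix = fixation_probability_2s(s)
p_lost = 1.0 - p_fix
p_branching = survival_probability_poisson(s)

print(f"2s approximation:        escapes loss {p_fix:.3f}, lost {p_lost:.3f}")
print(f"Poisson branching model: escapes loss {p_branching:.4f}")
```

The branching-process value comes out slightly below 2s, which is why 2s is best read as an approximation for small s rather than an exact result.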
The most beneficial mutations are very rare; it is much more likely that a new mutation will be weakly selected. This is another aspect of selection that has been well-known since Fisher. So the chance of fixation increases with s, but the likelihood of the mutation decreases with s -- in fact, the number decreases exponentially as selection is stronger and stronger.
If you put all these together, you can predict how many selected changes you should see in a population that has been growing in size. This tells us the number of new adaptive mutations that should come into the population each generation. It is still linear with population size -- a larger population should have more mutations in precise proportion to its size.
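A back-of-the-envelope version of this combination can be sketched as follows. It multiplies the supply of new advantageous mutations by the average chance of escaping loss, assuming selection coefficients follow an exponential distribution (so the average of 2s is twice the mean s). All parameter values and names here are hypothetical placeholders, not estimates from the paper.

```python
import math

def adaptive_mutations_per_generation(pop_size, adv_mutation_rate, mean_s):
    """Expected number of new advantageous mutations per generation that escape loss.

    pop_size          -- number of diploid individuals (2 genome copies each)
    adv_mutation_rate -- advantageous mutations per genome copy per generation (hypothetical)
    mean_s            -- mean of an assumed exponential distribution of s
    """
    new_mutations = 2 * pop_size * adv_mutation_rate  # supply scales with population size
    mean_escape_probability = 2 * mean_s              # E[2s] when s is exponential
    return new_mutations * mean_escape_probability

ADV_RATE = 1e-5  # hypothetical
MEAN_S = 0.01    # hypothetical

small = adaptive_mutations_per_generation(10_000, ADV_RATE, MEAN_S)
large = adaptive_mutations_per_generation(1_000_000, ADV_RATE, MEAN_S)

print(f"N = 10,000:    {small:.4f} per generation")
print(f"N = 1,000,000: {large:.4f} per generation")
print(f"ratio: {large / small:.0f}x")
```

Whatever values are plugged in for the rate and mean s, a hundredfold larger population yields a hundredfold more adaptive mutations per generation, which is the linearity the argument relies on.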
Still, a very small fraction of the mutations in any given population will be advantageous. And the longer a population has existed, the more likely it will be close to its adaptive optimum -- the point at which positively selected mutations don't happen because there is no possible improvement. This is the most likely explanation for why very large species in nature don't always evolve rapidly.
Instead, it is when a new environment is imposed that natural populations respond. And when the environment changes, larger populations have an intrinsic advantage, as Fisher showed, because they have a faster potential response by new mutations.
From that standpoint, the ecological changes documented in human history and the archaeological record create an exceptional situation. Humans faced new selective pressures during the last 40,000 years, related to disease, agricultural diets, sedentism, city life, greater lifespan, and many other ecological changes. This created a need for selection.
Larger population sizes allowed the rapid response to selection -- more new adaptive mutations. Together, the two patterns of historical change have placed humans far from an equilibrium. In that case, we expect that the pace of genetic change due to positive selection should recently have been radically higher than at other times in human evolution.
Finding selection in the genome
Now, it comes to a problem of how we can see recent mutations that have been selected. A genome scan is based on things that vary, not things that are fixed. So we are looking at some window of frequencies. In our study, that was a window from around 22 to 78 percent.
Before we go too far, it is important to point out that an adaptive gene will be in a window where we can detect it for only a short time -- it spends a long time getting up to an appreciable frequency (here 22 percent, which is our lower ascertainment bound) and a long time going from a high frequency (here 78 percent) to fixation -- this is for a dominant. But it spends only a very short time in the window where we can see it.
And strongly selected genes go through this window quite a lot faster than weakly selected ones.
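To get a feel for these time scales, here is a simplified deterministic sketch. It assumes genic (additive) selection, under which the odds p/(1-p) grow by a factor of (1+s) each generation; this is a simpler model than the dominant case described above. The population size and the two values of s are hypothetical.

```python
import math

def generations_between(p_start, p_end, s):
    """Generations for an allele to rise from p_start to p_end under genic selection,
    where the odds p / (1 - p) are multiplied by (1 + s) every generation."""
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log1p(s)

POP_SIZE = 10_000                    # hypothetical diploid population size
INITIAL_FREQ = 1.0 / (2 * POP_SIZE)  # a single new copy
LOW, HIGH = 0.22, 0.78               # the ascertainment window from the text

for s in (0.01, 0.05):
    rise = generations_between(INITIAL_FREQ, LOW, s)
    window = generations_between(LOW, HIGH, s)
    print(f"s = {s:.2f}: {rise:6.0f} generations to reach 22%, "
          f"{window:5.0f} generations inside the window")

window_weak = generations_between(LOW, HIGH, 0.01)
window_strong = generations_between(LOW, HIGH, 0.05)
rise_strong = generations_between(INITIAL_FREQ, LOW, 0.05)
```

Even in this simplified model, the allele spends several times longer climbing to 22 percent than it does crossing the window, and the strongly selected allele crosses the window about five times faster than the weakly selected one.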
The importance of this is that we will see genes with different strengths of selection at different ages. Our constraint is that right now all the things we can see are variable -- but some are variable because they originated a short time ago and were very strongly selected, and others are variable because they originated a long time ago, but were very weakly selected.
You can guess that we expect to see more of the weak ones than the strong ones, because there should be more of them! So the window should give us a view of the strength of selection as well as the number of mutations. If we can estimate the ages of our mutations, then we can predict how many there should be at different strengths of selection, and try to quantify the effect of population size.
Here, we've drawn a graph showing the number of genes in the window, compared with the number that are still variable in the population -- they are on their way to fixation -- but they are outside the window. This is for a growing population, so you see that the number of these genes increases as you get closer to the present.
There are many more that we can't see than the ones we can see -- this is like the tip of the iceberg. That is one aspect of recent selection; these genes are in this intermediate frequency range for a short time, and there will be many more genes that are too rare for us to see with our current methods, but might be very important regionally or locally in some populations.
Based on a model of population growth, we expect to see a big peak corresponding to the period when humans were growing rapidly during the Neolithic. The distribution should plunge down toward the present, because selection would have to be so strong on such a recent mutation for us to see it -- we're talking about 20 percent or more. Those just almost never happen. The true number, remember, is the iceberg under the water -- but we must make predictions about the part we can see.
Linkage disequilibrium and selection
Now, I need to say a few words about how we find these genes when we scan the genome. The International HapMap consists of a list of over 3 million genetic polymorphisms -- SNPs -- taken from a sample of people with ancestry in Northern Europe, West Africa, and East Asia. When we look at a sample of a long stretch of DNA from several people, we will be considering the frequency of many different polymorphisms.
But more important, we have studied whether each polymorphism is linked to the others. As a new positively selected allele increases in frequency in a population, it is initially linked to a wide region including many nearby polymorphisms. This induces a long-distance association among SNPs, which is called linkage disequilibrium.
When we are looking at a stretch of chromosome, what we can observe is that there are areas where recombination seems to be very rare around one SNP -- and in particular where one of the two SNP alleles has almost no recombinant chromosomes, but the other allele appears to have been recombining normally. That kind of mismatch is a strong indication of selection.
I'm not going into the details of that process right now; I'll be posting some real examples of such LD decay analyses later in the week. After applying the analysis, we found more than 3000 candidate selected variants in the Yoruba sample, more than 2800 in Europeans, and more than 2300 in Asians.
These numbers are very large -- they make it look like this aspect of evolution, positive selection on new adaptive alleles, has been going very fast. But how long a time period are we looking at? Based on the local rate of crossing-over, we can say how quickly LD ought to be broken by new recombinations, and that allows us to derive age estimates. The ages represent the time that has elapsed since the initial mutation that established each adaptive allele.
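The logic of dating an allele from LD decay can be sketched with a deliberately simplified model; this is not the estimator used in the paper. It assumes that the fraction of selected-allele chromosomes still unrecombined out to a genetic distance r (in Morgans) decays as exp(-r * t) after t generations. The example inputs and the 25-year generation time are hypothetical.

```python
import math

def age_in_generations(fraction_intact, genetic_distance_morgans):
    """Invert fraction_intact = exp(-r * t) to get t (generations).
    Simplifying assumption: recombination is the only process breaking up the haplotype."""
    return -math.log(fraction_intact) / genetic_distance_morgans

GENERATION_YEARS = 25    # hypothetical generation time
fraction_intact = 0.5    # hypothetical: half the carriers still share the ancestral haplotype
distance = 0.001         # hypothetical: 0.1 centimorgans from the selected site

generations = age_in_generations(fraction_intact, distance)
years = generations * GENERATION_YEARS

print(f"estimated age: {generations:.0f} generations (~{years:.0f} years)")
```

The key point the sketch captures is that the local crossing-over rate sets the clock: the same amount of haplotype decay implies a younger allele where recombination is frequent and an older one where it is rare.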
Here is a comparison between the ages of selected variants in the African HapMap and in the European HapMap. Let's look at this graph a little bit.
Each of these dots represents a number of different genes -- the y-axis is number; this is a histogram. The x-axis is the age. So you see, there are many of these selected genes that started around 10,000 years ago; there are many fewer that started around 40,000 years ago, and even fewer starting 80,000 years ago.
These fitted lines are what you get if you fit a one-parameter model with very strong selection to these curves. You can fit these without considering the effects of population growth.
But you notice some differences here between the African and European distributions. Africa has a few more total variants, but it especially has more older variants, before 10,000 years ago. You can see that during that time period, Europe has very few. And Europe has this later peak, where we see an earlier peak in Africa.
These details are a very good match to demographic growth -- Africa had a much larger population size during the Late Pleistocene than Europe, but West Asia, and then Europe, had an earlier Neolithic expansion than Africa -- so we see these early times have a lot more selected variants within Africa, and later on there is a pulse of adaptive variants in Europe.
At this point, we have a theory that predicts acceleration of new adaptive variants, and we have data that appear to show a very fast recent rate. But we haven't yet directly tested the hypothesis of acceleration.
We chose a null hypothesis approach. After all, the rate of change looks like it has been very high recently, but what if it were always very high? A constant rate of change is a null hypothesis -- the hypothesis of no change, or in our case, no acceleration. So we worked out the predictions of this hypothesis: a constant, high rate of selection. If we could show that those predictions aren't true, then we could disprove the null hypothesis and show that adaptive human evolution accelerated.
We took several different approaches, testing predictions on different kinds of data. For one thing, if the null hypothesis were true, then there should be a whole lot more selected mutations that have already reached or approached fixation, than the relatively small number that we see still varying in human populations. So to test the null hypothesis, we should look for evidence of these fixed selected substitutions.
That's exactly what we did -- we looked at other means of assessing the number of recently fixed and near-fixed variants.
On the bottom of this graph, we have the European age distribution of variants in our window. This should represent a small fraction of the total number that have happened across this time period. But you can see from this graph that if the rate were constant, the total number should be very, very large -- since we are looking at 10-generation bins, here we have around 150 predicted substitutions every 10 generations, or around 1/2 per year. Most of these should be way above our window. In fact, as we go back toward 40,000 years ago, almost all should be close to or at fixation.
This large number of completed sweeps should have vastly reduced human genetic variation, because polymorphisms tend to hitchhike along with nearby selected alleles. Hitchhiking up to fixation tends to eliminate variation. When we look at the effect of hitchhiking under this constant selection hypothesis, the genome-wide average diversity should be less than a tenth of what we actually observe. So that also disproves the null hypothesis.
How much acceleration?
Down at the bottom of the graph, you see the predicted number of selected variants over our window, under the hypothesis of population growth -- exactly the demographic growth that really happened to humans. And here you see, that there are many, many fewer of these predicted, and in fact over the long course of human evolution, the rate would have been very low.
We can put a number on just how low, and when we do that, we can see how much human evolution has sped up. For example, if we have 1/2 of a substitution per year, well, there are around 12,000,000 years separating humans and chimpanzees (6 million since the common ancestor, in both these lineages). So if adaptive substitutions had happened at a constant rate as high as the last few thousand years, we should be looking at around 6 million fixed adaptive substitutions between humans and chimpanzees.
But in reality there have been nowhere near that number. There are only 40,000 total amino acid substitutions between humans and chimps. Not all those were selected -- maybe only a third. We can add in some additional selected sites outside of coding regions, but still we are looking at an increase in the rate of new adaptive mutations in humans that is 100 times faster than could possibly have been true during most of human evolution.
Our evolution has recently accelerated by around 100-fold. And that's exactly what we would expect from the enormous growth of our population.
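The arithmetic behind this comparison is simple enough to check directly. The sketch below uses only the figures quoted above; it does not count selected sites outside coding regions, so its ratios are upper-end illustrations rather than the rounder 100-fold figure.

```python
RATE_PER_YEAR = 0.5                  # predicted adaptive substitutions per year (constant-rate null)
YEARS_SINCE_ANCESTOR = 6_000_000     # per lineage
LINEAGES = 2                         # human and chimpanzee
AMINO_ACID_SUBSTITUTIONS = 40_000    # total human-chimp amino acid differences
ADAPTIVE_FRACTION = 1 / 3            # "maybe only a third" were selected

separation_years = LINEAGES * YEARS_SINCE_ANCESTOR
expected_under_null = RATE_PER_YEAR * separation_years

observed_adaptive = AMINO_ACID_SUBSTITUTIONS * ADAPTIVE_FRACTION
ratio_if_third_adaptive = expected_under_null / observed_adaptive
ratio_if_all_adaptive = expected_under_null / AMINO_ACID_SUBSTITUTIONS

print(f"expected under constant rate: {expected_under_null:,.0f}")
print(f"ratio if a third of amino acid changes were adaptive: {ratio_if_third_adaptive:.0f}x")
print(f"ratio if every amino acid change were adaptive:       {ratio_if_all_adaptive:.0f}x")
```

Even on the generous assumption that every amino acid difference was adaptive, the constant-rate prediction overshoots by more than a hundredfold.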
What is all this selection for?
We know something about the functional categories of genes inferred to be under selection; we are studying this now. We expect it will keep us busy for some time.
In a general view, they illustrate the idea that changing cultures and ecologies have been important in changing the pattern of selection. For example, many of the selected genes are involved with pathogen defense -- for new pathogens that didn't always exist. Some are apparently related to metabolism or even directly to diet, in terms of processing new food sources. Of course, lactase is an excellent example in this category.
These are not the kinds of phenotypes that have a lot of visibility in skeletal remains. But we have a skeletal record of these populations during the last 40,000 years. We know a lot about what they looked like and how they changed. So we may try to relate the pattern of genetic, skeletal, archaeological, and other kinds of changes over time.
One obvious way to test hypotheses about these changes would be to sample ancient DNA from skeletons. In this way, we could see if the new selected alleles are in them or not. This spring, a paper by Burger and colleagues (PNAS) sampled ancient European skeletons, Neolithic skeletons, for the lactase persistence allele. They didn't find any who had that allele -- not a single one, and this is in Neolithic populations where today the allele is up over 90 percent in frequency. What is going on there?
In this case, it is quite obvious by considering population genetics. We have a very good date for this lactase persistence allele, from many sources -- it is around 6000-10,000 years old. And you can see in the figure, a new selected allele will remain at a very low frequency for a long, long time after its origin. Here, these skeletons were sampled at a time when the selection pressure favoring the allele was present, but the allele had not yet increased to a substantial frequency. In fact, this allele would have been rapidly increasing through these intermediate frequencies much more recently -- we're talking here about Roman times. And today it is over 90 percent in Scandinavia, but considerably lower in Italy and Southern Europe.
In the future, we will be able to sample for genes more widely in ancient skeletons. At the same time, we will be able to sample skeletal changes to try to correlate them with allele origins. That is some research that I have applied for a number of grants to support, and I think it will be very promising.
I hope that this essay gives an introduction to the work we have done. This was based on a presentation about the research I gave earlier this year. There are many loose ends, and I'll be adding more information over the next several days about ways of testing for selection, as well as some of the more surprising implications of our research. I've written it without a bibliography; for a full set of references, I can direct you to the paper.