john hawks weblog

paleoanthropology, genetics and evolution

Some genetic drift graphs with Mathematica

Mon, 2008-09-15 00:16 -- John Hawks

The first thing to come up in my lectures is genetic drift. Pretty much everyone who lectures about drift needs a figure showing the results of simple Monte Carlo simulations of sampling drift in a finite population. You start a population with two alleles, sample it randomly in each generation until one or the other alleles disappears. I tend to start with a "population size" of 1000 gene copies, and an initial allele frequency of 50%.

We can do this kind of simulation in Mathematica pretty easily. We'll work with three variables:

  • popSize, which we'll set equal to 1000;
  • a, the number of gene copies with one of the two alleles, initially equal to 500;
  • frequencyList, which will hold the value of a for each generation. Initially, frequencyList holds the first value of a, 500.

Now, we're ready to code the simulation. We set the three variables, and then set up a While[] loop that samples a random binomial deviate in each generation, based on the previous generation's allele frequency (a/popSize):

a = N[500];
popSize = 1000;
frequencyList = {a};
While[a 0,
a = N[RandomInteger[BinomialDistribution[popSize, a/popSize]]];
AppendTo[frequencyList, a]]

The last line appends the value a to the list. Nesting BinomialDistribution[] inside RandomInteger[] gives us a binomial deviate, based on popSize trials and frequency a/popSize. The loop executes until a reaches either zero or 1000, at which point the simulation stops.

OK, that gives us a list, frequencyList, which contains the number of gene copies with one of the two alleles over time. Now, we want to plot that list. We can try:

ListPlot[
frequencyList,
PlotJoined -> True]

...which gives us:

Monte Carlo simulation of genetic drift

That's servicable, but a bit simple. And it's confusing where the axis cuts across. It would be better to have the plot bounded at 0 and 1000, the min and max possible. Also, we need a better font than Times.

Let's try:

ListPlot[
{frequencyList},
PlotJoined -> True,
Frame -> True,
AxesOrigin -> {0, 0},
PlotStyle -> Thick,
FrameLabel -> {"Time (generations)", Frequency},
TextStyle -> {FontFamily -> "Myriad Pro", FontSize -> 12},
Filling -> 500]
Monte Carlo simulation of genetic drift

That's a bit more like it. The Filling -> 500 gives us shading everywhere between our frequency line and the 500 mark, a nice cue reminding us where the simulation started.

Now, we can add a second run in the same plot:

Monte Carlo simulation of genetic drift, two populations

Oh, well that shifted the x-axis. No harm, though. And we continue...

Monte Carlo simulation of genetic drift, two populations

Not a bad representation of the process, with fixation times ranging from very short to quite long, and one going to fixation at zero. The different color shading tends to confuse the picture a bit, and doing it in a larger size for the projector will take some tweaks, but so far it's much more attractive than the Excel version.

We could run a few more simulations to substitute, just in case one made the overall picture more clear (for example, by moving the lines apart in the early phase). But here we have the benefit of honesty --- these are the first four sims I ran.

UPDATE (2009-09-13): I'm revisiting this post as I work on a Mathematica Demonstration of genetic drift. It's much simpler to implement the central drift algorithm with a NestList or a NestWhileList. For example:

NestList[RandomInteger[BinomialDistribution[1000, #/1000]] &, 500, 4000]

gives you the trajectory over 4000 generations. This:

NestWhileList[
RandomInteger[
BinomialDistribution[1000, #/1000]] &, 500, # 0 &]

replicates the output above.

Neandertals

For years, I've worked on their bones. Now I'm working on their genes. Read more about the science studying these ancient people.

Denisova

From a finger bone of an ancient human came the record of a completely unexpected population. My lab is working on the science of the Denisova genome.

Acceleration

The advent of agriculture caused natural selection to speed up greatly in humans. We're uncovering some of the ways that populations have rapidly changed during the last 10,000 years.

Malapa

Just outside Johannesburg, the Malapa site is producing some of the most exciting finds in human evolution. This site is the headquarters of the Malapa Soft Tissue Project.