john hawks weblog

paleoanthropology, genetics and evolution

allele frequency

  • Genetic variation and the Hardy-Weinberg proportions

    Mon, 2011-10-03 09:54 -- John Hawks
    Synopsis: 
    Allele frequencies and genotype frequencies are connected by math

    The fundamental information about genetics for any individual is her genotype — the alleles that she has. But genes in populations can be considered in other ways as well.

    For instance, a population consists of individuals, so a geneticist may count the number of individuals with every possible genotype. Comparing these numbers with the total number of individuals, the geneticist may calculate the genotype frequencies, the proportion of individuals who have each possible genotype.

    Cystic fibrosis (CF) is a very rare disorder. Among Americans of European ancestry, only around 1 in 2500 people will develop CF during their lifetimes. The disorder is even rarer among people of non-European origin. Geneticists have surveyed many people to discover how many of them carry the disorder, and today a number of states screen newborns for cystic fibrosis as a way of directing affected children to early medical treatment (AAP Newborn Screening Task Force 2000). From this information, geneticists have determined that while only around 0.04 percent of the population is affected by cystic fibrosis, with a genotype of ff, approximately 4 percent of people are carriers of the allele, with a genotype of Ff. This leaves nearly 96 percent of the population with the genotype FF. These proportions are the frequencies of the three possible genotypes for this gene: 0.04% ff, about 4% Ff and nearly 96% FF.

    The frequency of an allele is the proportion of copies of that allele compared to the total number of copies of all alleles in a population.

    When geneticists know the frequencies of genotypes in a population, they can estimate how many copies of each allele are in the population as a whole. One recent study used genetic techniques to assess the genotypes of a group of colorectal cancer patients for the two possible alleles (A and B) of the p73 gene on chromosome 1 (Pfeifer et al. 2004). In this sample, 113 people (63 percent) were AA, 54 people (30 percent) were AB, and 12 people (7 percent) were BB. Each AA person has two copies of the A allele and each AB person has one copy, so the total sample included 280 copies of the A allele and 78 copies of the B allele. The allele frequency of the A allele is then 280/358, or 78 percent. The frequency of the B allele in this sample is 22 percent.
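
    The counting in this example can be written out as a short Python sketch. The function name and its arguments are only illustrative; the genotype counts are the ones reported for the sample above.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_A, freq_B) from counts of the three genotypes."""
    # Every person carries two copies of the gene.
    total_copies = 2 * (n_aa + n_ab + n_bb)
    # AA people contribute two A copies, AB people contribute one.
    copies_a = 2 * n_aa + n_ab
    # BB people contribute two B copies, AB people contribute one.
    copies_b = 2 * n_bb + n_ab
    return copies_a / total_copies, copies_b / total_copies


freq_a, freq_b = allele_frequencies(113, 54, 12)
print(f"A: {freq_a:.2f}, B: {freq_b:.2f}")  # A: 0.78, B: 0.22
```

    The two frequencies always add up to 1, because every copy in the sample is either A or B.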

    Hardy-Weinberg proportions

    While geneticists can determine the frequency of an allele from the proportions of genotypes in a population, they may also do the same calculation in reverse, figuring out the expected proportions of genotypes from the frequencies of alleles. For example, if the cystic fibrosis f allele is found at a frequency of 5 percent in a population, then the chance that an individual will have two copies of this allele is simply the 5 percent chance for each of the two alleles, multiplied by each other. Five percent times five percent is 0.0025, which is 0.25 percent, or twenty-five chances out of 10,000.

    The Hardy-Weinberg genotype proportions are p² + 2pq + q² for a two-allele gene.

    The proportion of both homozygotes in the population may be determined in the same way. The probability that an individual will be a homozygote for either allele equals the frequency of the allele times itself, or squared. Thus, in the example above, the chance of f homozygotes is equal to the frequency of f squared. The proportion of heterozygotes is equal to the chance that one allele is F times the chance that the other allele is f, plus the chance that the first allele is f times the chance that the other allele is F. Since these likelihoods are the same, the total chance of an individual being a heterozygote is 2 times the frequency of one allele times the frequency of the other allele. Thus, if p and q are the frequencies of the two alleles, the proportions of genotypes are p² and q² homozygotes, and 2pq heterozygotes.
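
    As a minimal sketch, here is the same arithmetic in Python, assuming a gene with exactly two alleles and random mating. Here p stands for the frequency of one allele and q = 1 - p for the other; the function name is only illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype proportions for a two-allele gene.

    p is the frequency of the first allele; the second allele has
    frequency q = 1 - p. Returns (p*p, 2*p*q, q*q).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("an allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# The cystic fibrosis example: F at 95 percent, f at 5 percent.
ff_big, carriers, ff_small = hardy_weinberg(0.95)
print(f"FF: {ff_big:.4f}  Ff: {carriers:.4f}  ff: {ff_small:.4f}")
# FF: 0.9025  Ff: 0.0950  ff: 0.0025
```

    The three proportions sum to 1, since p² + 2pq + q² is just (p + q)² and p + q = 1.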

    These proportions are called the Hardy-Weinberg proportions, after the British mathematician G. H. Hardy and German physician Wilhelm Weinberg, who independently formulated the relation in 1908. The proportions come from simple probability of sampling copies from a population with given allele frequencies. These proportions are expected to form an equilibrium. That is, they should stay the same over time, as long as the allele frequencies stay the same. When individuals mate without regard to the alleles they carry, every generation of a population should have genotype frequencies in approximately the Hardy-Weinberg proportions.

    The Hardy-Weinberg proportions are important for two reasons. First, the proportions of genotypes in a population may diverge from the expected ones for many reasons, including natural selection, division of populations into different subgroups, or mating that is not entirely random. Comparing the expected and observed proportions of genotypes allows biologists to determine whether these evolutionary forces may be acting on a population.
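
    One way to make such a comparison concrete is sketched below in Python. The text does not name a particular test; the chi-square goodness-of-fit statistic used here is one common choice and should be read as an assumption, as should the function name. The counts are those of the p73 sample described earlier (113 AA, 54 AB, 12 BB).

```python
def hw_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    the counts expected under Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    # Estimate the allele frequencies from the sample itself.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = hw_chi_square(113, 54, 12)
print(f"chi-square = {chi2:.2f}")
# With three genotype classes and one estimated allele frequency there is
# one degree of freedom; the usual 5 percent cutoff is 3.84.
print("departs from Hardy-Weinberg" if chi2 > 3.84 else "consistent with Hardy-Weinberg")
```

    A statistic below the cutoff does not prove that no evolutionary force is at work. It says only that this sample gives no clear sign of a departure.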

    The heterozygosity of a population is the expected proportion of heterozygotes, given the allele frequencies in the population.

    Second, the proportions lead to a natural definition of genetic variation in a population: the heterozygosity. A population's heterozygosity is the expected proportion of heterozygotes from the Hardy-Weinberg formula, 2pq. Two populations may be compared by their heterozygosities: the one with the higher heterozygosity has a higher chance that any single individual will have two different alleles, which means the population is genetically more variable. Variation is a consequence of evolutionary history, including the patterns of selection and genetic drift, and the amount that individuals have moved from one population to another in the past. Thus, the Hardy-Weinberg proportions give an important way to study the evolution of populations over time.
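
    A short Python sketch of such a comparison follows. The two sets of allele frequencies are borrowed from the earlier examples (78 and 22 percent from the p73 sample, 95 and 5 percent from the cystic fibrosis illustration) purely for convenience; treating them as two "populations" is an assumption made for the example.

```python
def heterozygosity(p):
    """Expected proportion of heterozygotes, 2pq, for a two-allele gene."""
    q = 1.0 - p
    return 2.0 * p * q


h_first = heterozygosity(0.78)   # alleles at 78 and 22 percent
h_second = heterozygosity(0.95)  # alleles at 95 and 5 percent
print(f"first: {h_first:.3f}, second: {h_second:.3f}")  # first: 0.343, second: 0.095
```

    The first set of frequencies gives the higher heterozygosity, so by this measure it is the more variable of the two.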

    Study questions: 
    1. Suppose a population has two alleles, with frequencies of 70% and 30%. What are the Hardy-Weinberg proportions expected for the three genotypes of these alleles?
    2. Mendelian recessive disorders are rare in most populations, but their allele frequencies may be surprisingly high. Why?
    3. For a gene with two alleles, what is the highest possible value of heterozygosity? What is the lowest?
  • Eye pigmentation and allele frequencies

    Tue, 2011-09-06 00:46 -- John Hawks
    Synopsis: 
    A single nucleotide polymorphism is associated with blue eyes in Europeans, which leads into an explanation of genetic associations.

    Eye pigmentation in humans varies along a spectrum of colors from dark brown, through lighter brown, hazel, and green, to light blue. These differences are caused by variation in the content of the dark pigment, eumelanin, in the layers of the iris. Several genes are involved in the variation in color, but most of the lighter colors require a change in the expression of a gene called OCA2.

    Photo credit: blue and brown by Look Into My Eyes, on Flickr. Mixed eye color can sometimes occur, due to somatic mutations that affect the pigment expression in the iris.

    The lighter eye colors are most common in Europe, and in northern Europe in particular. Much of the variation in eye pigmentation in this population is associated with one area of the genome, on chromosome 15 in the region of the genes HERC2 and OCA2. The strongest association is with a single site, 28365618 nucleotides from the beginning of chromosome 15 in the current draft of the human genome. At this site, some human sequences carry an A, and others have a G.

    This kind of variation is called a single nucleotide polymorphism (SNP). The word polymorphism means "many forms", but in fact this SNP has only two different forms, or alleles, in human populations.

    There are millions of SNPs in the human genome. When they sequence many people, geneticists often find SNPs they have never noticed before, and enter them into a catalog called dbSNP. Each SNP gets a catalog number, beginning with the letters "rs". This one, associated with eye color in Europeans, is known as rs12913832 (dbSNP link).

    We know that rs12913832 is associated with variation in eye color because it has been genotyped in thousands of people. Blue-eyed people are very likely to carry two G's here. Why this SNP is associated with eye color is not yet clear. OCA2 is essential to forming normal pigmentation, but rs12913832 does not change the amino acid sequence of this gene. In fact, it lies within another gene, HERC2. The SNP may change the regulation of OCA2, or it may be linked on the same chromosome sequence to another mutation that does. Or the activity of HERC2 may itself affect pigmentation. Right now, scientists simply don't know.

    Finding a genetic association, like a correlation, is not the same as finding a cause. An association doesn't necessarily tell us that the genetic change caused a change in the body; it merely indicates that one form of the gene is common in people with a particular trait.

    An association may give some hint about the history of a trait. In the case of eye color, blue eyes are most common in northern Europe, and occur more rarely across southern Europe, north Africa, and West Asia. The G allele of rs12913832 has roughly the same distribution:

    Allele frequencies of rs12913832 in human populations surveyed as part of the Human Genome Diversity Project. Map courtesy of HGDP Selection Browser.

    The G allele is most common in northern Europe, and is rare or absent in most of Africa and East Asia. However, the indigenous people of South America actually have fairly high frequencies (up to 30-40%) for this allele. Those populations do not have blue eyes at any appreciable frequency. What can explain this discrepancy?

    Again, a genetic association is not the same as a genetic cause. This SNP allele may be linked to blue eyes in Europe because of its history: Another mutation that causes blue eyes may have happened on a copy of chromosome 15 that carried this SNP allele. Meanwhile, a different copy of chromosome 15 carrying this SNP allele but unrelated to eye pigmentation was in the population that entered the New World some 15,000 years ago, and became common in the ancestors of South American populations.

    Understanding the history of human movements helps us to uncover the genetic causes of traits. In this case, the SNP allele reflects two different histories in western Eurasia and in the New World.

    Study questions: 
    1. Use the genome browser to look around rs12913832. Use the tools to zoom out until you can see the gene OCA2. How far away is OCA2 from this SNP?
    2. The population of the United States was not surveyed in the project that gave rise to the map above. What do you predict about the allele frequencies of rs12913832 in the present U.S. population?
    3. What is the frequency of the trait blue eyes in your classroom?
  • Founder effect

    Sat, 2011-08-06 13:34 -- John Hawks
    Synopsis: 
    The founder effect is a special case of genetic drift that can happen when a small number of individuals found a new population
    The founder effect is caused by genetic drift in a small number of initial founders of a new population.

    One of the most important manifestations of genetic drift is in the founding of new populations by a small number of colonists. For example, the Afrikaner population of South Africa today descends from Dutch colonists who arrived during the seventeenth century. Some of the earliest colonists to arrive had a large genetic contribution to the later Afrikaner population, because they had a chance to have lots of offspring who intermarried with later arrivals. The first Dutch colonists landed in 1652, and one of these colonists was a man who carried an allele causing Huntington's disease, a rare genetic disorder of the nervous system. Huntington's is a dominant genetic disorder, affecting all individuals who carry the allele, but it exerts most of its effect late in life, after people have generally reproduced. Although this harmful allele was carried by only one individual, that individual made up a relatively large proportion of the small founder population, so the allele was much higher in frequency there than it had been in Holland. After strong population growth, today's Afrikaners have a high frequency of the Huntington's allele, mainly from this single founder (Ridley 2002). This phenomenon of genetic drift is often called the founder effect.

    Population structure and genetic drift

    Genetic drift is stronger when there is more variability in reproduction.

    A simple reason for variability in reproduction is the different reproductive efforts of males and females. Female mammals face a high cost of reproduction. Mothers provide space and nutrients to their developing young while they are still in the womb, and mothers provide high-energy milk and protection to their young after they are born. Although female fish and frogs may lay hundreds, or even thousands, of eggs, female mammals are limited to many fewer offspring over the course of their lifetimes. Males, on the other hand, do not face the same reproductive costs. If a male can mate with many females, he can potentially have many times the number of offspring of any single female. But males face a different cost: if they want to mate at all, they must first face competition from other males. In many species, a lucky few males may mate with many females, while most males do not mate at all. Thus, males are often much more variable in their reproductive success than females. Each generation of offspring in such a population includes the genes of many different females but only a few males. Only a few genes may be responsible for the success of these males, but all the genes of these few males are boosted by genetic drift.

    Human history appears to have included some cases where single male lineages had exceptionally high mating success. Geneticists can trace male reproduction through the Y chromosome, which is passed only from father to son. Because of this unique pattern of inheritance, the Y chromosome marks patrilines, lineages of males. In many human societies, social status or power may also be passed along patrilines, as kings and chiefs pass power to their sons. This cultural pattern of inheritance generally lasts only for a few generations, as some member of the male lineage ultimately fails to have a son as an heir, or the patriline simply loses power. But the history of some cultures gave a few patrilines exceptional mating opportunities, as kings and other high-ranking men sometimes kept harems of dozens or more women for their own exclusive mating.

    Frequency of the "Genghis Khan" Y chromosome haplotype in samples of Asian populations. The "star cluster" refers to the rapid expansion in numbers of the haplotype in different populations since its origin around 1000 years ago. Reprinted from Zerjal et al. (2003).

    Two Y chromosome haplotypes in Asia are shared by many millions of men, even though they emerged within the past thousand years. One of these, carried by 8 percent of men in Central and Northeast Asia, appears to have originated in Mongolia around a thousand years ago [1]. At this frequency, the haplotype would occur in as many as 16 million men, all descendants of a single man within the past 1000 years. The large current population implies that these men descend from an exceptionally widespread and productive patriline. During the past 1000 years in Asia, the best candidate for such a patriline is that of the Mongol emperor Genghis Khan, who lived from around A.D. 1162 to 1227. After conquering history's largest land empire, Genghis and his descendants installed their male relatives as rulers of much of Asia. These descendants themselves must often have had extraordinary reproductive opportunities, so that their Y chromosomes became more and more common in Asian populations. A second Y chromosome haplotype is carried by around 3 percent of men in China and Mongolia, and may derive from the Manchu dynasty, which dates to the year 1644 [2]. Together, these haplotypes illustrate the chance for some rare alleles to increase greatly in frequency due to genetic drift in human history.


    References

    1. Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q, Mohyuddin A, Fu S, et al. 2003. The Genetic Legacy of the Mongols. American Journal of Human Genetics 72:717–721.
    2. Xue Y, Zerjal T, Bao W, Zhu S, Lim S-K, Shu Q, Xu J, Du R, Fu S, Li P, et al. 2005. Recent Spread of a Y-Chromosomal Lineage in Northern China and Mongolia. American Journal of Human Genetics 77:1112–1116.
    Study questions: 
    1. Can you think of other populations in human history that might have undergone a founder effect?
    2. What evidence can we use to test whether a founder effect can explain the high frequency of an allele?
  • Genetic drift

    Fri, 2011-08-05 01:24 -- John Hawks
    Synopsis: 
    Many changes in gene frequencies are caused by random chance differences in reproduction.

    If everyone in a population lived a long life, mated, and reproduced absolutely equally (two offspring per person), then the population size would never change. There would always be approximately the same number of individuals, allowing for variations in when people are born or die. In this population, every gene has an equal chance of being passed into the next generation. Natural selection depends on differences in the chance that genes will survive and reproduce, so this population would not evolve by natural selection.

    But the population would still evolve by random chance. A single chromosome can illustrate this potential for evolution. The Y chromosome determines whether humans will be male or female: males have one X chromosome and one Y, females have no Y and two X chromosomes. Mendelian genetics predicts that if a father has two offspring, each of these children has a 50 percent chance of inheriting his Y chromosome and thereby being a son. But these odds mean that the man has a substantial chance of having no sons at all: 25 percent of the time, both children will be daughters. If the man has no sons, then his Y chromosome is simply lost from the next generation. Genes disappear due to chance, even if everyone mates and reproduces equally.

    Genetic drift is a random change in allele frequencies.

    These random changes in allele frequency can accumulate over time. Across many generations, the frequency of an allele can gradually increase, gradually decrease, or fluctuate back and forth. In other words, the frequencies of different alleles seem to "drift" up and down, without any direction. This is why the random change in allele frequencies is called genetic drift. Over time, genetic drift can make once rare alleles common, or eliminate alleles altogether.
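
    A small simulation can show this wandering. The sketch below is in Python and assumes the simplest textbook setup, often called the Wright-Fisher model: generations do not overlap, the population size stays fixed, and each gene copy in the next generation is drawn at random from the current one. The function name, its parameters and the fixed random seed are illustrative choices, not part of the text.

```python
import random


def simulate_drift(n_individuals, start_freq, generations, seed=0):
    """Track one allele's frequency under random sampling alone.

    Each generation, all 2 * n_individuals gene copies are drawn at
    random according to the current allele frequency.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    n_copies = 2 * n_individuals
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Count how many of the new copies happen to be this allele.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        history.append(freq)
    return history


trajectory = simulate_drift(n_individuals=50, start_freq=0.5, generations=100)
print(" ".join(f"{f:.2f}" for f in trajectory[::10]))
```

    Changing the seed gives a different path each time. Once the frequency reaches 0 or 1 it stays there, because the allele has been either lost or has replaced all the others.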

    Genetic drift is stronger in small populations.

    Genetic variation under genetic drift as a function of population size. The expected amount of genetic variation increases as a linear function of the size of the population, when genetic drift and mutation are the only causes of evolution. Larger populations are more variable; smaller populations are less variable.

    The most obvious factor affecting the rate of genetic drift is the size of the population. If the population is small, then only a small sample of gametes is drawn from the parents in every generation. Small samples tend to differ more from the sets they are drawn from than large samples do, so genetic drift is more powerful in smaller populations. For example, in a population of five individuals, an allele that exists in a single copy in one individual has a frequency of ten percent. Nevertheless, this allele is in constant jeopardy of being eliminated from the population: it needs only to fail to be passed on once, and it will never be found again. Likewise, it is very possible that within a few generations this allele might increase from one copy to ten, eliminating all other alleles. In contrast, in a population of a thousand individuals, an allele with a frequency of ten percent exists in 200 copies. While random sampling of gametes will cause this number to fluctuate over time, it is extremely unlikely that chance alone would allow no copy of this allele to be passed on in any given generation. Indeed, it would likely take many hundreds of generations for random events to eliminate either this allele or all the others.
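
    The contrast between the two populations can be put into rough numbers. The Python sketch below assumes that each of the 2N gene copies in the next generation is drawn independently from the current one (the Wright-Fisher model), so the chance that an allele at frequency f leaves no copies at all is (1 - f) raised to the power 2N. The model and the function name are assumptions made for illustration.

```python
def chance_of_loss(n_individuals, freq):
    """Chance that an allele leaves no copies in the next generation,
    assuming 2 * n_individuals independent random draws."""
    n_copies = 2 * n_individuals
    return (1.0 - freq) ** n_copies


small = chance_of_loss(5, 0.10)     # one copy among ten
large = chance_of_loss(1000, 0.10)  # 200 copies among 2000
print(f"five individuals:     {small:.3f}")
print(f"thousand individuals: {large:.1e}")
```

    Under these assumptions the single copy in the small population is lost in roughly one generation out of three, while in the large population the chance is so close to zero that it would essentially never happen in one step.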

    Study questions: 
    1. Can you think of other human populations that have undergone founder effects?
  • Genetic variation and the Hardy-Weinberg formula

    Thu, 2011-08-04 15:32 -- John Hawks
    Synopsis: 
    We can predict the proportions of genotypes in a population from the allele frequencies, which gives us a way to measure variation.

    The fundamental information about genetics for any individual is her genotype. The genotype for a single genetic locus is simply a list of two alleles, whether they're the same or different. Within a population, we can count alleles in other ways also, giving us ways to measure the genetic variation among many individuals.

    A population consists of individuals, so a geneticist may count the number of individuals with every possible genotype. Comparing these numbers with the total number of individuals, the geneticist may calculate the genotype frequencies, the proportion of individuals who have each possible genotype.

    For example, cystic fibrosis (CF) is a very rare disorder. Among Americans of European ancestry, only around 1 in 2500 people will develop CF during their lifetimes. The disorder is even rarer among people of non-European origin. Geneticists have surveyed many people to discover how many of them carry the disorder, and today a number of states screen newborns for cystic fibrosis as a way of directing affected children to early medical treatment (AAP Newborn Screening Task Force 2000). From this information, geneticists have determined that while only around 0.04 percent of the population is affected by cystic fibrosis, with a genotype of ff, approximately 4 percent of people are carriers of the allele, with a genotype of Ff. This leaves nearly 96 percent of the population with the genotype FF. These proportions are the frequencies of the three possible genotypes for this gene.
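
    These three figures fit together neatly, as a quick Python check suggests. The sketch assumes that genotypes occur in the proportions expected when the two alleles in each person are drawn independently; the variable names are illustrative only.

```python
import math

affected = 1 / 2500            # proportion of ff individuals
q = math.sqrt(affected)        # frequency of the f allele if ff = q * q
p = 1.0 - q                    # frequency of the F allele
carriers = 2.0 * p * q         # expected proportion of Ff individuals
unaffected_ff = p * p          # expected proportion of FF individuals

print(f"f allele: {q:.2%}")                   # f allele: 2.00%
print(f"Ff carriers: {carriers:.2%}")         # Ff carriers: 3.92%
print(f"FF individuals: {unaffected_ff:.2%}") # FF individuals: 96.04%
```

    So an allele carried by about one person in twenty-five produces the disorder in only about one person in 2500.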

    The frequency of an allele is the proportion of the total number of gene copies in the population that are that allele.

    When geneticists know the frequencies of genotypes in a population, they can determine how many copies of each allele are in the population as a whole. For example, one recent study used genetic techniques to assess the genotypes of a group of colorectal cancer patients for the two possible alleles (A and B) of the p73 gene on chromosome 1 (Pfeifer et al. 2004). In this sample, 113 people (63 percent) were AA, 54 people (30 percent) were AB, and 12 people (7 percent) were BB. Considering that each AA person has two copies of the A allele and each AB person has one copy, the total sample includes 280 copies of the A allele and 78 copies of the B allele. The allele frequency of the A allele is then 280/358, or 78 percent. The frequency of the B allele in this sample is 22 percent.

    While geneticists can determine the frequency of an allele from the proportions of genotypes in a population, they may also do the same calculation in reverse — figuring out the expected proportions of genotypes from the frequencies of alleles. For example, if the cystic fibrosis f allele is found at a frequency of 5 percent in a population, then the chance that an individual will have two copies of this allele is simply the 5 percent chance for each of the two alleles, multiplied by each other. Five percent times five percent is 0.0025, or twenty-five chances out of 10,000.

    The Hardy-Weinberg genotype proportions are p² + 2pq + q² for a two-allele gene.

    The proportion of both homozygotes in the population may be determined in the same way. The probability that an individual will be a homozygote for either allele equals the frequency of the allele times itself, or squared. Thus, in the example above, the chance of f homozygotes is equal to the frequency of f squared. The proportion of heterozygotes is equal to the chance that one allele is F times the chance that the other allele is f, plus the chance that the first allele is f times the chance that the other allele is F. Since these likelihoods are the same, the total chance of an individual being a heterozygote is 2 times the frequency of one allele times the frequency of the other allele. Thus, if p and q are the frequencies of the two alleles, the proportions of genotypes are p² and q² homozygotes, and 2pq heterozygotes.

    These proportions are called the Hardy-Weinberg proportions, after the British mathematician G. H. Hardy and German physician Wilhelm Weinberg, who independently formulated the relation in 1908. The proportions come from simple probability of sampling copies from a population with given allele frequencies. These proportions are expected to form an equilibrium — that is, they should stay the same over time — as long as the allele frequencies stay the same. When individuals mate without regard to the alleles they carry, every generation of a population should have genotype frequencies in approximately the Hardy-Weinberg proportions.

    The Hardy-Weinberg proportions are important for two reasons. First, the proportions of genotypes in a population may diverge from the expected ones for many reasons, including natural selection, division of populations into different subgroups, or mating that is not entirely random. Comparing the expected and observed proportions of genotypes allows biologists to determine whether these evolutionary forces may be acting on a population.

    The heterozygosity of a population is the expected proportion of heterozygotes, given the allele frequencies in the population.

    Second, the proportions lead to a natural definition of genetic variation in a population: the heterozygosity. A population's heterozygosity is the expected proportion of heterozygotes from the Hardy-Weinberg formula, 2pq. Two populations may be compared by their heterozygosities: the one with the higher heterozygosity has a higher chance that any single individual will have two different alleles, which means the population is genetically more variable. Variation is a consequence of evolutionary history, including the patterns of selection and genetic drift, and the amount that individuals have moved from one population to another in the past. Thus, the Hardy-Weinberg proportions give an important way to study the evolution of populations over time.

    Study questions: 
    1. Suppose that the frequency of one allele is 0.3 (30%). Use the Hardy-Weinberg formula to predict the proportion of homozygotes in the population with two copies of that allele.
    2. Consider a population with two alleles, with frequencies 0.4 and 0.6. What is the heterozygosity of the population?
    3. How could a population have many different alleles but still have a low heterozygosity?