john hawks weblog

paleoanthropology, genetics and evolution

allele

  • Genetic variation and the Hardy-Weinberg proportions

    Mon, 2011-10-03 09:54 -- John Hawks
    Synopsis: 
    Allele frequencies and genotype frequencies are connected by math

    The fundamental information about genetics for any individual is her genotype — the alleles that she has. But genes in populations can be considered in other ways as well.

    For instance, a population consists of individuals, so a geneticist may count the number of individuals with every possible genotype. Comparing these numbers with the total number of individuals, the geneticist may calculate the genotype frequencies, the proportion of individuals who have each possible genotype.

    Cystic fibrosis (CF) is a very rare disorder. Among Americans of European ancestry, only around 1 in 2500 people will develop CF during their lifetimes. The disorder is even rarer among people of non-European origin. Geneticists have surveyed many people to discover how many of them carry the disorder, and today a number of states screen newborns for cystic fibrosis as a way of directing affected children to early medical treatment (AAP Newborn Screening Task Force 2000). From this information, geneticists have determined that while only around 0.04 percent of the population is affected by cystic fibrosis, with a genotype of ff, approximately 2.5 percent of people are carriers of the allele, with a genotype of Ff. This leaves some 97.5 percent of the population with the genotype FF. These proportions are the frequencies of the three possible genotypes for this gene: 0.04% ff, 2.5% Ff and 97.5% FF.

    The frequency of an allele is the proportion of copies of that allele compared to the total number of copies of all alleles in a population.

    When geneticists know the frequencies of genotypes in a population, they can estimate how many copies of each allele are in the population as a whole. One recent study used genetic techniques to assess the genotypes of a group of colorectal cancer patients for the two possible alleles (A and B) of the p73 gene on chromosome 1 (Pfeifer et al. 2004). In this sample, 113 people (63 percent) were AA, 54 people (30 percent) were AB, and 12 people (7 percent) were BB. Each AA person has two copies of the A allele and each AB person has one copy, so the total sample included 280 copies of the A allele and 78 copies of the B allele. The allele frequency of the A allele is then 280/358, or 78 percent. The frequency of the B allele in this sample is 22 percent.
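    The allele count above can be reproduced with a few lines of Python. This is only a sketch: the function name `allele_frequencies` and the dictionary layout are illustrative choices, not anything taken from the study, and each genotype is assumed to be written as a two-letter string with one letter per allele copy.

```python
# Genotype counts reported above for the p73 sample (Pfeifer et al. 2004).
genotype_counts = {"AA": 113, "AB": 54, "BB": 12}


def allele_frequencies(counts):
    """Count allele copies across genotypes; return (copies, frequencies)."""
    copies = {}
    for genotype, n_people in counts.items():
        # Each letter in the genotype string is one allele copy per person.
        for allele in genotype:
            copies[allele] = copies.get(allele, 0) + n_people
    total = sum(copies.values())
    freqs = {allele: n / total for allele, n in copies.items()}
    return copies, freqs


copies, freqs = allele_frequencies(genotype_counts)
print(copies)  # {'A': 280, 'B': 78}
for allele, f in sorted(freqs.items()):
    print(f"{allele}: {f:.1%}")  # A: 78.2%, then B: 21.8%
```

    Because every person carries two copies, the 179 people in the sample contribute 358 allele copies in total.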

    Hardy-Weinberg proportions

    While geneticists can determine the frequency of an allele from the proportions of genotypes in a population, they may also do the same calculation in reverse, figuring out the expected proportions of genotypes from the frequencies of alleles. For example, if the cystic fibrosis f allele is found at a frequency of 5 percent in a population, then the chance that an individual will have two copies of this allele is simply the 5 percent chance for each of the two alleles, multiplied together. Five percent times five percent is 0.0025, which is 0.25 percent, or twenty-five chances out of 10,000.

    The Hardy-Weinberg genotype proportions are p² + 2pq + q² for a two-allele gene.

    The proportion of both homozygotes in the population may be determined in the same way. The probability that an individual will be a homozygote for either allele equals the frequency of the allele times itself, or squared. Thus, in the example above, the chance of f homozygotes is equal to the frequency of f squared. The proportion of heterozygotes is equal to the chance that one allele is F times the chance that the other allele is f, plus the chance that the first allele is f times the chance that the other allele is F. Since these likelihoods are the same, the total chance of an individual being a heterozygote is 2 times the frequency of one allele times the frequency of the other allele. Thus, if p is the frequency of one allele and q is the frequency of the other, the proportions of genotypes are p² and q² for the two homozygotes, and 2pq for heterozygotes.
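    As a rough illustration, here is the same arithmetic for the 5 percent example in Python. The function name `hardy_weinberg` and the genotype labels are hypothetical; the only assumption is a gene with exactly two alleles, F and f.

```python
def hardy_weinberg(q):
    """Expected genotype proportions for a two-allele gene.

    q is the frequency of the f allele; p = 1 - q is the frequency of F.
    """
    p = 1.0 - q
    return {
        "FF": p * p,      # both copies are F
        "Ff": 2 * p * q,  # F then f, or f then F
        "ff": q * q,      # both copies are f
    }


proportions = hardy_weinberg(0.05)
for genotype, value in proportions.items():
    print(f"{genotype}: {value:.4f}")
```

    The three proportions always add up to 1, because p² + 2pq + q² is just (p + q)² and p + q = 1.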

    These proportions are called the Hardy-Weinberg proportions, after the British mathematician G. H. Hardy and German physician Wilhelm Weinberg, who independently formulated the relation in 1908. The proportions come from simple probability of sampling copies from a population with given allele frequencies. These proportions are expected to form an equilibrium. That is, they should stay the same over time, as long as the allele frequencies stay the same. When individuals mate without regard to the alleles they carry, every generation of a population should have genotype frequencies in approximately the Hardy-Weinberg proportions.

    The Hardy-Weinberg proportions are important for two reasons. First, the proportions of genotypes in a population may diverge from the expected ones for many reasons, including natural selection, division of populations into different subgroups, or mating that is not entirely random. Comparing the expected and observed proportions of genotypes allows biologists to determine whether these evolutionary forces may be acting on a population.
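    One way to make that comparison concrete is to turn the expected proportions into expected counts and set them beside the observed ones. The sketch below does this for the p73 sample described earlier. The chi-square statistic is a common choice for summarizing the gap, but it is an assumption here and not a method this post names. The sample is also a group of patients, not a random draw from a population, so this is an illustration only.

```python
# Observed genotype counts from the p73 sample (Pfeifer et al. 2004).
observed = {"AA": 113, "AB": 54, "BB": 12}
n_people = sum(observed.values())

# Allele frequencies estimated from the sample itself.
p = (2 * observed["AA"] + observed["AB"]) / (2 * n_people)  # frequency of A
q = 1.0 - p                                                 # frequency of B

# Expected counts if the sample were in Hardy-Weinberg proportions.
expected = {
    "AA": p * p * n_people,
    "AB": 2 * p * q * n_people,
    "BB": q * q * n_people,
}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

for g in observed:
    print(f"{g}: observed {observed[g]}, expected {expected[g]:.1f}")
print(f"chi-square = {chi_square:.2f}")
```

    With two alleles this test has one degree of freedom, so the conventional 5 percent cutoff is 3.84. The statistic for this sample falls below that, meaning the small shortfall of heterozygotes is within what chance alone could produce.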

    The heterozygosity of a population is the expected proportion of heterozygotes, given the allele frequencies in the population.

    Second, the proportions lead to a natural definition of genetic variation in a population: the heterozygosity. A population's heterozygosity is the expected proportion of heterozygotes from the Hardy-Weinberg formula, 2pq. Two populations may be compared by their heterozygosities: the one with the higher heterozygosity has a higher chance that any single individual will have two different alleles, which means the population is genetically more variable. Variation is a consequence of evolutionary history, including the patterns of selection and genetic drift, and the amount that individuals have moved from one population to another in the past. Thus, the Hardy-Weinberg proportions give an important way to study the evolution of populations over time.
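    A short sketch of such a comparison follows. The two populations are hypothetical; their allele frequencies simply reuse the 5 percent and 22 percent figures from the examples above, and the function name `heterozygosity` is an illustrative choice.

```python
def heterozygosity(q):
    """Expected proportion of heterozygotes, 2pq, for a two-allele gene."""
    p = 1.0 - q
    return 2 * p * q


# Hypothetical populations, labeled by the frequency of the rarer allele.
population_a = heterozygosity(0.05)
population_b = heterozygosity(0.22)

print(f"Population A: {population_a:.4f}")
print(f"Population B: {population_b:.4f}")
print("More variable:", "B" if population_b > population_a else "A")
```

    Population B has the higher heterozygosity, so a randomly chosen individual there is more likely to carry two different alleles.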

    Study questions: 
    1. Suppose a population has two alleles, with frequencies of 70% and 30%. What are the Hardy-Weinberg proportions expected for the three genotypes of these alleles?
    2. Mendelian recessive disorders are rare in most populations, but their allele frequencies may be surprisingly high. Why?
    3. For a gene with two alleles, what is the highest possible value of heterozygosity? What is the lowest?
  • Eye pigmentation and allele frequencies

    Tue, 2011-09-06 00:46 -- John Hawks
    Synopsis: 
    A single nucleotide polymorphism is associated with blue eyes in Europeans, which leads to an explanation of genetic associations.

    Eye pigmentation in humans varies along a spectrum of colors from dark brown, through lighter brown, hazel, and green, to light blue. These differences are caused by variation in the content of the dark pigment, eumelanin, in the layers of the iris. Several genes are involved in the variation in color, but most of the lighter colors require a change in the expression of a gene called OCA2.

    Photo credit: blue and brown by Look Into My Eyes, on Flickr. Mixed eye color can sometimes occur, due to somatic mutations that affect the pigment expression in the iris.

    The lighter eye colors are most common in Europe, and in northern Europe in particular. Much of the variation in eye pigmentation in this population is associated with one area of the genome, on chromosome 15 in the region of the genes HERC2 and OCA2. The strongest association is with a single site, 28365618 nucleotides from the beginning of chromosome 15 in the current draft of the human genome. At this site, some human sequences carry an A, and others have a G.

    This kind of variation is called a single nucleotide polymorphism (SNP). The word polymorphism means "many forms", but in fact this SNP has only two different forms, or alleles, in human populations.

    There are millions of SNPs in the human genome. When they sequence many people, geneticists often find SNPs they have never noticed before, and enter them into a catalog called dbSNP. Each SNP gets a catalog number, beginning with the letters "rs". This one, associated with eye color in Europeans, is known as rs12913832.

    We know that rs12913832 is associated with variation in eye color because it has been genotyped in thousands of people. Blue-eyed people are very likely to carry two G's here. Why this SNP is associated with eye color is not yet clear. OCA2 is essential to forming normal pigmentation, but rs12913832 does not change the amino acid sequence of this gene. In fact, it lies within another gene, HERC2. The SNP may change the regulation of OCA2, or it may be linked on the same chromosome sequence to another mutation that does. Or the activity of HERC2 may itself affect pigmentation. Right now, scientists simply don't know.

    Finding a genetic association, like a correlation, is not the same as finding a cause. An association doesn't necessarily tell us that the genetic change caused a change in the body; it merely indicates that one form of the gene is common in people with a particular trait.

    An association may give some hint about the history of a trait. In the case of eye color, blue eyes are most common in northern Europe, and occur more rarely across southern Europe, north Africa, and West Asia. The G allele of rs12913832 has roughly the same distribution:

    Allele frequencies of rs12913832 in human populations surveyed as part of the Human Genome Diversity Project. Map courtesy of HGDP Selection Browser.

    The G allele is most common in northern Europe, and is rare or absent in most of Africa and East Asia. However, the indigenous people of South America actually have fairly high frequencies (up to 30-40%) for this allele. Those populations do not have blue eyes at any appreciable frequency. What can explain this discrepancy?

    Again, a genetic association is not the same as a genetic cause. This SNP allele may be linked to blue eyes in Europe because of its history: Another mutation that causes blue eyes may have happened on a copy of chromosome 15 that carried this SNP allele. Meanwhile, a different copy of chromosome 15 carrying this SNP allele but unrelated to eye pigmentation was in the population that entered the New World some 15,000 years ago, and became common in the ancestors of South American populations.

    Understanding the history of human movements helps us to uncover the genetic causes of traits. In this case, the SNP allele reflects two different histories in western Eurasia and in the New World.

    Study questions: 
    1. Use the genome browser to look around rs12913832. Use the tools to zoom out until you can see the gene OCA2. How far away is OCA2 from this SNP?
    2. The population of the United States was not surveyed in the project that gave rise to the map above. What do you predict about the allele frequencies of rs12913832 in the present U.S. population?
    3. What is the frequency of the trait blue eyes in your classroom?
  • Genetic variation and the Hardy-Weinberg formula

    Thu, 2011-08-04 15:32 -- John Hawks
    Synopsis: 
    We can predict the proportions of genotypes in a population from the allele frequencies, which gives us a way to measure variation.

    The fundamental information about genetics for any individual is her genotype. The genotype for a single genetic locus is simply a list of two alleles, whether they're the same or different. Within a population, we can count alleles in other ways also, giving us ways to measure the genetic variation among many individuals.

    A population consists of individuals, so a geneticist may count the number of individuals with every possible genotype. Comparing these numbers with the total number of individuals, the geneticist may calculate the genotype frequencies, the proportion of individuals who have each possible genotype.

    For example, cystic fibrosis (CF) is a very rare disorder. Among Americans of European ancestry, only around 1 in 2500 people will develop CF during their lifetimes. The disorder is even rarer among people of non-European origin. Geneticists have surveyed many people to discover how many of them carry the disorder, and today a number of states screen newborns for cystic fibrosis as a way of directing affected children to early medical treatment (AAP Newborn Screening Task Force 2000). From this information, geneticists have determined that while only around 0.04 percent of the population is affected by cystic fibrosis, with a genotype of ff, approximately 4 percent of people are carriers of the allele, with a genotype of Ff. This leaves nearly 96 percent of the population with the genotype FF. These proportions are the frequencies of the three possible genotypes for this gene.

    The frequency of an allele is the proportion of the total number of gene copies in the population that are that allele.

    When geneticists know the frequencies of genotypes in a population, they can determine how many copies of each allele are in the population as a whole. For example, one recent study used genetic techniques to assess the genotypes of a group of colorectal cancer patients for the two possible alleles (A and B) of the p73 gene on chromosome 1 (Pfeifer et al. 2004). In this sample, 113 people (63 percent) were AA, 54 people (30 percent) were AB, and 12 people (7 percent) were BB. Considering that each AA person has two copies of the A allele and each AB person has one copy, the total sample includes 280 copies of the A allele and 78 copies of the B allele. The allele frequency of the A allele is then 280/358, or 78 percent. The frequency of the B allele in this sample is 22 percent.

    While geneticists can determine the frequency of an allele from the proportions of genotypes in a population, they may also do the same calculation in reverse, figuring out the expected proportions of genotypes from the frequencies of alleles. For example, if the cystic fibrosis f allele is found at a frequency of 5 percent in a population, then the chance that an individual will have two copies of this allele is simply the 5 percent chance for each of the two alleles, multiplied together. Five percent times five percent is 0.0025, or twenty-five chances out of 10,000.

    The Hardy-Weinberg genotype proportions are p² + 2pq + q² for a two-allele gene.

    The proportion of both homozygotes in the population may be determined in the same way. The probability that an individual will be a homozygote for either allele equals the frequency of the allele times itself, or squared. Thus, in the example above, the chance of f homozygotes is equal to the frequency of f squared. The proportion of heterozygotes is equal to the chance that one allele is F times the chance that the other allele is f, plus the chance that the first allele is f times the chance that the other allele is F. Since these likelihoods are the same, the total chance of an individual being a heterozygote is 2 times the frequency of one allele times the frequency of the other allele. Thus, if p is the frequency of one allele and q is the frequency of the other, the proportions of genotypes are p² and q² for the two homozygotes, and 2pq for heterozygotes.
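    The cystic fibrosis figures quoted earlier in this post can be checked against these formulas. The sketch below starts from the 1 in 2500 incidence, treats it as the ff proportion q², and works back to the allele and carrier frequencies. It assumes the population is close to Hardy-Weinberg proportions for this gene, which is an idealization; the variable names are illustrative.

```python
import math

# Incidence of cystic fibrosis quoted above: about 1 in 2500 people.
ff_proportion = 1 / 2500          # treated as q squared

q = math.sqrt(ff_proportion)      # frequency of the f allele
p = 1.0 - q                       # frequency of the F allele

carriers = 2 * p * q              # expected Ff proportion
unaffected_noncarriers = p * p    # expected FF proportion

print(f"f allele frequency: {q:.2%}")
print(f"Ff carriers:        {carriers:.2%}")
print(f"FF:                 {unaffected_noncarriers:.2%}")
```

    An allele at a frequency of only 2 percent gives a carrier proportion near 4 percent and an FF proportion near 96 percent, in line with the figures reported for this population.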

    These proportions are called the Hardy-Weinberg proportions, after the British mathematician G. H. Hardy and German physician Wilhelm Weinberg, who independently formulated the relation in 1908. The proportions come from simple probability of sampling copies from a population with given allele frequencies. These proportions are expected to form an equilibrium — that is, they should stay the same over time — as long as the allele frequencies stay the same. When individuals mate without regard to the alleles they carry, every generation of a population should have genotype frequencies in approximately the Hardy-Weinberg proportions.

    The Hardy-Weinberg proportions are important for two reasons. First, the proportions of genotypes in a population may diverge from the expected ones for many reasons, including natural selection, division of populations into different subgroups, or mating that is not entirely random. Comparing the expected and observed proportions of genotypes allows biologists to determine whether these evolutionary forces may be acting on a population.

    The heterozygosity of a population is the expected proportion of heterozygotes, given the allele frequencies in the population.

    Second, the proportions lead to a natural definition of genetic variation in a population: the heterozygosity. A population's heterozygosity is the expected proportion of heterozygotes from the Hardy-Weinberg formula, 2pq. Two populations may be compared by their heterozygosities: the one with the higher heterozygosity has a higher chance that any single individual will have two different alleles, which means the population is genetically more variable. Variation is a consequence of evolutionary history, including the patterns of selection and genetic drift, and the amount that individuals have moved from one population to another in the past. Thus, the Hardy-Weinberg proportions give an important way to study the evolution of populations over time.

    Study questions: 
    1. Suppose that the frequency of one allele is 0.3 (30%). Use the Hardy-Weinberg formula to predict the proportion of homozygotes in the population with two copies of that allele.
    2. Consider a population with two alleles, with frequencies 0.4 and 0.6. What is the heterozygosity of the population?
    3. How could a population have many different alleles but still have a low heterozygosity?
