john hawks weblog

paleoanthropology, genetics and evolution

epigenetics

  • Gene expunction

    Fri, 2010-09-24 09:13 -- John Hawks

    So, it's a perfectly ordinary story about epigenetics and how the methylation of some genes may be correlated with BMI. But what I don't understand is the headline:

    Study: Can We Tell Our Genes to Make Us Fat?

    YES! That's exactly what I need -- to tell my genes to make me fat. Please tell me more!

  • Methylation in Neandertals

    Wed, 2009-12-30 00:17 -- John Hawks

    A new Neandertal (and mammoth) genetics paper from Adrian Briggs and colleagues is bigger news than it might appear at first glance:

    DNA sequences determined from ancient organisms have high error rates, primarily due to uracil bases created by cytosine deamination. We use synthetic oligonucleotides, as well as DNA extracted from mammoth and Neandertal remains, to show that treatment with uracil–DNA–glycosylase and endonuclease VIII removes uracil residues from ancient DNA and repairs most of the resulting abasic sites, leaving undamaged parts of the DNA fragments intact. Neandertal DNA sequences determined with this protocol have greatly increased accuracy. In addition, our results demonstrate that Neandertal DNA retains in vivo patterns of CpG methylation, potentially allowing future studies of gene inactivation and imprinting in ancient organisms.

    That last sentence is the one that made my mouth drop open when I read it. Traces of epigenetic signals are still there in the degraded DNA of ancient Neandertals.

    I haven't seen anybody else notice this paper yet. It's largely technical, describing how well particular treatments increase sequencing accuracy. One complication in the chemical diagenesis of ancient DNA involves CpG sites: a deaminated methylated cytosine becomes thymine rather than uracil, so the UDG treatment does not remove it. The C-to-T changes that remain at CpG sites are what reveal the original methylation.

    We still don't know how to interpret methylation in the DNA of living people. So there's some limit to the utility of this observation. But it's still really cool.

    Some excerpts from the paper follow. Here's a passage reminding us of the high quality of libraries coming from some specimens:

    When analyzing Neandertal DNA sequences, contamination of experiments with contemporary human DNA is a potential problem (10,34). However, the level of such contamination in a Neandertal DNA library can be assessed by counting the ratio of Neandertal versus contaminant fragments at nucleotide positions where Neandertals differ from all or almost all present-day humans (33). The mtDNA of this Neandertal carries 133 such diagnostic positions (23). The ‘no repair’ dataset yielded 139 mtDNA fragments that overlapped such positions; 138 carried the Neandertal base while one matched modern human mtDNA. The UDG/endoVIII treated dataset yielded 128 informative fragments, of which all were the Neandertal type. Thus, the mtDNA in all libraries was almost completely free of contamination by modern human mtDNA, even after treatment with UDG and endoVIII. Since the ratio of mitochondrial to nuclear DNA may differ between the contaminating and the Neandertal DNA, this estimate is strictly applicable only to the mtDNA (33). However, the estimate of mtDNA contamination in these libraries is low enough that within even a few-fold variation in mtDNA:nuclear DNA ratios between the Neandertal and contaminating DNA, sequences aligning to the human nuclear genome will be predominantly of Neandertal origin (Briggs et al. 2009: 9).
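
    The counting argument in that passage can be put into numbers. What follows is a minimal sketch and not the authors' analysis: it treats each diagnostic fragment as an independent draw and attaches a one-sided exact binomial (Clopper-Pearson) upper bound, which is an assumption made here about a reasonable way to express the uncertainty. The function names are invented for illustration; only the fragment counts come from the quoted text.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def contamination_upper_bound(k, n, alpha=0.05):
    """One-sided exact upper bound on the contamination rate, given
    k contaminant fragments out of n informative fragments.

    Finds by bisection the rate p at which seeing k or fewer
    contaminants would have probability alpha."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # this rate is still compatible with the data
        else:
            hi = mid
    return hi

# (contaminant fragments, informative fragments) from the quoted passage
datasets = {"no repair": (1, 139), "UDG/endoVIII": (0, 128)}

for label, (k, n) in datasets.items():
    point = k / n
    upper = contamination_upper_bound(k, n)
    print(f"{label}: estimate {point:.2%}, 95% upper bound {upper:.2%}")
```

    Under these assumptions both libraries come out with mtDNA contamination bounded at a few percent, even though one of them shows no contaminant fragments at all. Zero observed contaminants does not mean zero contamination; the bound depends on how many informative fragments were available.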

    A recurring theme in the paper is that a great deal of resequencing will be necessary to improve the accuracy of ancient DNA. Probing for a particular genetic variant -- such as the "Neandertal diagnostic" mtDNA sites -- is less of a problem, because there is a low probability of a sequencing error at any particular site. But summed across thousands or millions of base pairs, the number of sequence errors can easily exceed the number of genuine differences between Neandertal and human genomes. For example:

    Figure 6 shows that for mitochondrial sequences, overall error rate per base (Figure 6) was 2.20% for ‘no repair’ sequences; 0.40% for UDG/endoVIII sequences and 0.09% for multi-pass UDG/endoVIII-treated sequences. Thus, UDG/endoVIII treatment alone results in a 5.5-fold reduction in error rates while deep sequencing results in an additional 4.4-fold reduction. In combination this results in a 22-fold reduction in errors. In nuclear DNA, for which we removed CpG sites from the analysis due to the effect of methylation described above, a similar pattern is observed (Figure 6) although the true error rate at these low levels cannot be accurately calculated due to genuine sequence differences between this Neandertal and the human reference.

    An impressively low error rate in the best case -- which here involves sequencing many copies of the same sites, generating "deep coverage". But consider: the average sequence divergence between two humans is around 0.1 percent. With a total error rate of 0.09 percent, there would be nearly as many erroneous differences as real ones between an ancient genome and the modern reference. And comparing two ancient genomes with each other would double the errors, so that the apparent genetic differences would be roughly three times the actual value.
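
    The arithmetic behind that comparison is short enough to write down. This is a back-of-the-envelope sketch: it assumes sequencing errors are rare and independent, so that they simply add to the real differences, and it uses the round 0.1 percent divergence figure from the paragraph above. The function name is invented for illustration.

```python
# Per-base rates as fractions. The error rates come from the quoted passage;
# the 0.1 percent human-human divergence is the round figure used in the post.
TRUE_DIVERGENCE = 0.001
ERROR_RATES = {
    "no repair": 0.0220,
    "UDG/endoVIII": 0.0040,
    "multi-pass UDG/endoVIII": 0.0009,
}

def apparent_divergence(true_div, error_rate, n_ancient):
    """Apparent per-base difference between two genomes when n_ancient of
    them (0, 1 or 2) carry sequencing errors at error_rate.

    Assumption: errors are rare and independent, so they add to the real
    differences; coincident errors at the same site are ignored."""
    return true_div + n_ancient * error_rate

for label, err in ERROR_RATES.items():
    vs_reference = apparent_divergence(TRUE_DIVERGENCE, err, 1)
    two_ancient = apparent_divergence(TRUE_DIVERGENCE, err, 2)
    print(f"{label}: {err / TRUE_DIVERGENCE:.1f} errors per real difference; "
          f"apparent/true = {vs_reference / TRUE_DIVERGENCE:.1f} with one ancient genome, "
          f"{two_ancient / TRUE_DIVERGENCE:.1f} with two")
```

    With the best-case rate, the apparent difference is about 1.9 times the true value against the modern reference and about 2.8 times for two ancient genomes, which is the "roughly three times" above. Without any repair, errors would outnumber real differences by about 22 to 1.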

    Even deeper coverage will be necessary, hopefully reducing error rates further. This may lead to a hypothesis-testing approach for sequence differences found in Neandertal genomes, as the false positives are sifted out of the data. It's going to take some creative population genetics to deal with this problem, as we try to analyze the data from these ancient specimens.

    References:

    Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Pääbo S. 2009. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res (advance) doi:10.1093/nar/gkp1163

  • Mailbag: Holding on to ancient DNA

    Wed, 2009-08-26 19:12 -- John Hawks

    Hi,

    It is often claimed that ancient genes that were once very adaptable are discarded over time by drift, bottlenecks etc. What if an ancient trait became valuable again as the climate swings or other environmental opportunities arise? My point is that an organism, especially in a variable climate, that carried this gene would be at a selective advantage if that trait were inherited. The inheritable “trait” being the ability to retain ancient DNA. Also, this trait could be inherited in pieces spread over more than one organism, which are recombined through hybridization with the same results.

    The most basic version of this is frequency-dependent polymorphism. Suppose that an allele is useful when rare, and harmful when common. Over the long term, it will never approach fixation, but nor will it become extinct unless its advantage when rare is too weak to overcome drift in a population of that size.
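
    A toy recursion shows the pull toward an intermediate frequency. It is a sketch under strong assumptions that are not in the text above: a haploid population, an infinite population size (so no drift at all), and a linear fitness function in which the allele is favored below a balance frequency and disfavored above it. The parameter names `s` and `p_balance` are hypothetical.

```python
def next_freq(p, s=0.1, p_balance=0.3):
    """One generation of negative frequency-dependent selection.

    The allele's relative fitness is 1 + s * (p_balance - p): above 1
    when the allele is rarer than p_balance, below 1 when it is more
    common. The alternative allele has fitness 1."""
    w_allele = 1.0 + s * (p_balance - p)
    mean_w = p * w_allele + (1.0 - p)
    return p * w_allele / mean_w

def trajectory(p0, generations, **params):
    """Allele frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], **params))
    return freqs

# Start rare and start common: both runs head for the balance point.
for p0 in (0.01, 0.99):
    final = trajectory(p0, 2000)[-1]
    print(f"start {p0:.2f} -> after 2000 generations {final:.4f}")
```

    Because this version has no drift, the allele can never be lost. The caveat about weak advantages only bites once random sampling in a finite population is added.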

    Now, suppose that the allele is advantageous only some of the time, and otherwise neutral. Now it can drift to fixation. If the times when it is useful are far enough apart, it can drift to loss. But anytime the environment is favorable for the allele, it will get a little boost. The tendency will be toward fixation, biased just to the extent of the strength of selection and duration of the favorable time intervals.

    OK, add another element of complexity. The allele is favored during some intervals, and disfavored during others. Motoo Kimura described the dynamics of this scenario; the ultimate fate of the allele depends on the duration of the time intervals, of course, and may lead to an unstable polymorphism, fixation or loss.
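
    These scenarios are easy to play with in a small Wright-Fisher simulation. The sketch below is illustrative only, not Kimura's treatment: it assumes a haploid population of 50, an allele starting at 5 copies, selection that switches every 10 generations, and selection coefficients of 0.2. All of those values and all function names are invented here.

```python
import random

def wright_fisher(n, copies, s_of_t, rng, max_gen=100_000):
    """Follow one allele in a haploid population of n individuals.

    s_of_t(t) gives the selection coefficient in generation t.
    Returns 1 if the allele fixes and 0 if it is lost."""
    for t in range(max_gen):
        if copies == 0:
            return 0
        if copies == n:
            return 1
        p = copies / n
        w = 1.0 + s_of_t(t)
        p_selected = p * w / (p * w + (1.0 - p))  # frequency after selection
        # Drift: binomial sampling of n offspring from the selected pool
        copies = sum(rng.random() < p_selected for _ in range(n))
    raise RuntimeError("allele neither fixed nor lost within max_gen")

def fixation_fraction(s_of_t, n=50, copies=5, reps=200, seed=1):
    """Fraction of replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    return sum(wright_fisher(n, copies, s_of_t, rng) for _ in range(reps)) / reps

def neutral(t):
    return 0.0

def episodic(t):
    # favored for 10 generations, then neutral for 10, and so on
    return 0.2 if (t // 10) % 2 == 0 else 0.0

def fluctuating(t):
    # favored for 10 generations, then disfavored for 10
    return 0.2 if (t // 10) % 2 == 0 else -0.2

for label, regime in [("neutral", neutral), ("episodic", episodic),
                      ("fluctuating", fluctuating)]:
    print(f"{label}: fixed in {fixation_fraction(regime):.0%} of runs")
```

    Under pure drift the allele should fix in about 10 percent of runs, its starting frequency. The episodic regime shows the "little boost" described above, and the fluctuating regime depends heavily on the length of the intervals, which can be explored by changing the period in `fluctuating`.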

    You propose a "reserve" mechanism, where the genome holds on to old variants to resurrect them at some later time when they become useful.

    Of course, we potentially have such a mechanism now, as we can dig up ancient DNA and experiment with it in vivo. But you suggest that a reserve of ancient genetic material might be adaptive.

    I believe the dynamics of such a mechanism would be the same as if the population were merely larger. In that case, drift (and selection against recessives) would be much slower to eliminate alleles that had lost their advantage. So when the environment changed, the population could respond more quickly without waiting for the old variants to reappear by de novo mutations.

    Also, a larger population makes it much more likely for mutations to happen.
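
    Both effects of population size can be given rough numbers. The sketch below uses two standard textbook approximations rather than anything derived in this post: the expected number of new mutations at a site per generation in a diploid population, 2Nμ, and the Kimura-Ohta diffusion result for the mean time until a neutral allele at frequency p is lost, given that it is lost. The mutation rate and allele frequency are placeholder values chosen for illustration.

```python
from math import log

def new_mutations_per_generation(n_diploid, mu):
    """Expected new mutant copies at one site per generation (2 * N * mu)."""
    return 2 * n_diploid * mu

def mean_generations_to_loss(n_e, p):
    """Diffusion approximation for the mean time, in generations, until a
    neutral allele at frequency p is lost, conditional on its loss:
    -4 * Ne * (p / (1 - p)) * ln(p)."""
    return -4 * n_e * (p / (1 - p)) * log(p)

MU = 1e-8   # per-site, per-generation mutation rate (illustrative value)
P = 0.05    # frequency of an allele that has lost its advantage (illustrative)

for n in (1_000, 10_000, 100_000):
    print(f"N = {n:>7}: "
          f"{new_mutations_per_generation(n, MU):.0e} new mutations per site per generation, "
          f"about {mean_generations_to_loss(n, P):,.0f} generations to loss")
```

    Both quantities scale linearly with population size under these assumptions: ten times as many individuals means ten times the mutation supply and ten times as long before drift removes an old variant.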

    There's no evidence that a store of ancient genetic variants lies silent in our genomes, but I think you might look at actual gene silencing mechanisms as a parallel to your suggestion. We do retain functional genes within our genomes that we turn off by methylation early in development. The genes either act early in development, are imprinted by maternal or paternal origin, or are turned off in tissues that don't need them. That's a way of maintaining variations for use in some circumstances but not all.

  • The classical model of the gene and information contrasts

    Wed, 2008-11-12 21:16 -- John Hawks

    There has been much wringing of hands about the definition of the term "gene." The worries aren't new, but they have become a topic this week because of a NY Times article by Carl Zimmer.

    It seems that the Mendelian gene just isn't doing it for some geneticists. One wonders whether they would prefer to call themselves DNAticists, alternate splicists and methylatrices.

    I lecture on this topic every semester when I introduce Mendelian inheritance, so I don't see the urgency or news value here. The simple fact is that "gene" is a term that was defined to apply to a theory of the transmission of traits -- the Mendelian theory, although Mendel himself had used the term elementen. As Zimmer explains, "gene" came later:

    The word was coined by the Danish geneticist Wilhelm Johannsen in 1909, to describe whatever it was that parents passed down to their offspring so that they developed the same traits. Johannsen, like other biologists of his generation, had no idea what that invisible factor was. But he thought it would be useful to have a way to describe it.

    “The word ‘gene’ is completely free from any hypothesis,” Johannsen declared, calling it “a very applicable little word.”

    But still, the definition is fundamentally Mendelian. And Mendel's theory was fundamentally a theory of contrasts. Peas were round or wrinkled, yellow or green. The exact degree of wrinkling, or greenness, is not an issue. What is important is the contrast between the two phenotypes. That contrast is attributable to the presence of two alleles.

    When Fisher (and others) reconciled the Mendelian theory with continuous traits, the meaning of "gene" subtly shifted. Allelomorphs were still responsible for contrasts in the phenotype, but these contrasts were not absolute; they depended on the influence of the environment and the background of other genes. Still, they were inherited as units: the reconciliation was between the Mendelian inheritance and the continuous nature of variation.

    I want to point out that the original use of "gene" corresponds very closely to linguistic concepts that are defined by contrasts. The tongue and the rest of the vocal tract can configure themselves in a near-infinite number of shapes, and different people have differently shaped vocal tracts. Out of this continuous range of variability, our auditory processing system perceives a finite (and small) number of distinct phonemes. Those phonemes are defined by their contrasts with other phonemes. English speakers perceive "r" and "l" as distinct phonemes; speakers of some other languages do not. Different English speakers generate different "a" sounds, but a single listener can recognize them as the same phoneme because of the contrast between "a" and other vowel sounds.

    If we consider short fragments of a recording of continuous speech, we may have much trouble recognizing phonemes. Their function (distinguishing words from each other by minimal contrasts, say "bat" versus "pat") depends on their position and the nearby sounds. If we look at a spectrogram of the frequencies of sounds, we will observe that one person's "b" covers a range of actual sounds, and two people may have quite different "b"s.

    Now, I suppose we could say that phonemes are a poorly defined concept, that they correspond to many different things at different parts of an utterance, or that the definition of phoneme must somehow cover every possible case. We might say that "junk" utterances or exclamations, like "AIEEEEEE!" actually contain hidden phonemes that should cause us to revisit what a phoneme really is. We might claim that "phoneme" is on the edge of a revolution, because inspection of spectrograms is showing unexpected diversity.

    I don't think any of that is necessary, though, and luckily neither do linguists (as far as I know). Nor do I think any of that is necessary for the concept of a "gene." It's a perfectly good concept in the context of Mendelian inheritance. Life hasn't changed that much in a hundred years.

    OK, but what about epigenetics, and alternative splicing, and highly conserved noncoding elements, and all that? First, there's nothing wrong with observing that the actual contrasts described by the Mendelian theory are almost always caused by DNA differences. Sometimes those differences are coding, and change amino acid sequences; sometimes they are regulatory and change RNA transcription. Some of these changes are minor, and the contrasts they generate may grade into the unmeasurable.

    Nor is there anything wrong with observing that some mechanisms may cause inheritance that is not Mendelian in character. Certainly X-inactivation is such a process, and in general epigenetic changes may have non-Mendelian effects. I don't see why these processes require us to revisit the definition of "genes", though.

    The idea of contrasts may have some value in considering the observation that most noncoding DNA is transcribed into RNA, at least sometimes. Observing such transcription with today's high-sensitivity methods is no guarantee (or maybe no evidence) of functional importance. After all, if a cell can replicate all this junk DNA with little fitness cost, why should we assume that occasional transcription would be unbearably expensive?

    But if all that extra junk DNA transcription is just biochemical noise, then it follows that cells may use the contrast between highly transcribed genes and rarely transcribed non-genes. In a simplistic sense, the contrast is information-theoretic, based on the probabilities of transcription of different DNA segments. Likewise, phonemes are defined by the probabilities of occurrence of sound sequences of certain recognizable qualities.

    So in my view, "gene" is only problematic if we insist on confusing distinct biological processes. It is defined by transmission contrasts, in a Mendelian sense; it corresponds often (but perhaps not exclusively) with delimited DNA sequences, and it cannot by itself describe more complex functional properties such as methylation and epigenetic interactions.

    And ENCODE -- the Encyclopedia of DNA Elements? It brings us full circle to Mendel's elementen, derived after all from the Latin elementum, the "first principle."

    UPDATE (2008-11-12): I'm realizing after reading this over that I may have given the impression that I embark on some kind of strange journey through linguistics when I lecture on Mendelian inheritance. Not so -- that's just a riff for the blog. "Gene" is introduced in Mendelian terms (where after all I use it most of the time, since I apply population genetics), and then I discuss what exactly that means in terms of the chromosome theory (and ultimately DNA sequences). The result is inevitable: "gene" means different things in these contexts, and obviously must include many distinct kinds of DNA configurations, from coding regions, to regulatory elements, to conserved noncoding segments. Since "allele" is extended even more broadly (any variant site qualifies), I don't think "gene" is the problem here.
