The classical model of the gene and information contrasts

There has been much wringing of hands about the definition of the term "gene." The worries aren't new, but they have become a topic this week because of a NY Times article by Carl Zimmer.

It seems that the Mendelian gene just isn't doing it for some geneticists. One wonders whether they would prefer to call themselves DNAticists, alternate splicists and methylatrices.

I lecture on this topic every semester when I introduce Mendelian inheritance, so I don't see the urgency or news value here. The simple fact is that "gene" is a term that was defined to apply to a theory of the transmission of traits -- the Mendelian theory, although Mendel himself had used the term elementen. As Zimmer explains, "gene" came later:

The word was coined by the Danish geneticist Wilhelm Johannsen in 1909, to describe whatever it was that parents passed down to their offspring so that they developed the same traits. Johannsen, like other biologists of his generation, had no idea what that invisible factor was. But he thought it would be useful to have a way to describe it.
The word gene is completely free from any hypothesis, Johannsen declared, calling it a very applicable little word.

But still, the definition is fundamentally Mendelian. And Mendel's theory was fundamentally a theory of contrasts. Peas were round or wrinkled, yellow or green. The exact degree of wrinkling, or greenness, was not at issue. What mattered was the contrast between the two phenotypes, and that contrast is attributable to a difference between two alleles.

When Fisher (and others) reconciled the Mendelian theory with continuous traits, the meaning of "gene" subtly shifted. Allelomorphs (alleles, in today's usage) were still responsible for contrasts in the phenotype, but these contrasts were no longer absolute; they depended on the influence of the environment and the background of other genes. Still, they were inherited as units: the reconciliation was between Mendelian inheritance and the continuous nature of variation.
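To make that reconciliation concrete, here is a minimal sketch, assuming a toy additive model rather than anything Fisher actually wrote down: each locus has a "+" allele that adds one unit to the trait and a "-" allele that adds nothing, both at frequency 0.5, with optional Gaussian environmental noise. The function names (`simulate_trait`, `distinct_classes`) and all parameter values are invented for illustration.

```python
import random


def simulate_trait(n_loci, n_individuals, env_sd=0.0, seed=0):
    """Return phenotypes under a purely additive toy model (an assumption).

    Each individual inherits two allele copies per locus. A "+" allele
    (probability 0.5) adds 1 to the trait; a "-" allele adds 0.
    """
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_individuals):
        # Count "+" alleles among the 2 * n_loci inherited copies.
        genetic_value = sum(rng.random() < 0.5 for _ in range(2 * n_loci))
        # Environmental deviation, drawn only when env_sd is positive.
        environment = rng.gauss(0.0, env_sd) if env_sd > 0 else 0.0
        phenotypes.append(genetic_value + environment)
    return phenotypes


def distinct_classes(phenotypes):
    """Number of different phenotype values observed."""
    return len(set(phenotypes))


one_locus = simulate_trait(n_loci=1, n_individuals=2000)
many_loci = simulate_trait(n_loci=50, n_individuals=2000)
with_env = simulate_trait(n_loci=50, n_individuals=2000, env_sd=2.0)

print(distinct_classes(one_locus))  # at most 3 classes: 0, 1 or 2 "+" alleles
print(distinct_classes(many_loci))  # many more classes, still discrete
print(distinct_classes(with_env))   # noise blurs the classes into a continuum
```

With one locus the phenotypes fall into sharply contrasting classes; with many loci plus environmental noise the same discrete units of inheritance yield what looks like continuous variation.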

I want to point out that the original use of "gene" corresponds very closely to linguistic concepts that are defined by contrasts. The tongue and the rest of the vocal tract can configure themselves in a near-infinite number of shapes, and different people have differently shaped vocal tracts. Out of this continuous range of variability, our auditory processing system perceives a finite (and small) number of distinct phonemes. Those phonemes are defined by their contrasts with other phonemes. English speakers perceive "r" and "l" as distinct phonemes; speakers of some other languages do not. Different English speakers generate different "a" sounds, but a single listener can recognize them as the same phoneme because of the contrast between "a" and other vowel sounds.

If we consider short fragments of a recording of continuous speech, we may have much trouble recognizing phonemes. Their function (distinguishing words from each other by minimal contrasts, say "bat" versus "pat") depends on their position and the nearby sounds. If we look at a spectrogram of the frequencies of sounds, we will observe that one person's "b" covers a range of actual sounds, and two people may have quite different "b"s.

Now, I suppose we could say that phonemes are a poorly defined concept, that they correspond to many different things at different parts of an utterance, or that the definition of phoneme must somehow cover every possible case. We might say that "junk" utterances or exclamations, like "AIEEEEEE!", actually contain hidden phonemes that should cause us to revisit what a phoneme really is. We might claim that "phoneme" is on the edge of a revolution, because inspection of spectrograms is showing unexpected diversity.

I don't think any of that is necessary, though, and luckily neither do linguists (as far as I know). Nor do I think any of that is necessary for the concept of a "gene." It's a perfectly good concept in the context of Mendelian inheritance. Life hasn't changed that much in a hundred years.

OK, but what about epigenetics, and alternative splicing, and highly conserved noncoding elements, and all that? First, there's nothing wrong with observing that the actual contrasts described by the Mendelian theory are almost always caused by DNA differences. Sometimes those differences are coding, and change amino acid sequences; sometimes they are regulatory, and change RNA transcription. Some of these changes are minor, and the contrasts they generate may grade into the unmeasurable.

Nor is there anything wrong with observing that some mechanisms may cause inheritance that is not Mendelian in character. Certainly X-inactivation is such a process, and in general epigenetic changes may have non-Mendelian effects. I don't see why these processes require us to revisit the definition of "genes", though.

The idea of contrasts may have some value in considering the observation that most noncoding DNA is transcribed into RNA, at least sometimes. Observing such transcription with today's high-sensitivity methods is no guarantee (or maybe no evidence) of functional importance. After all, if a cell can replicate all this junk DNA with little fitness cost, why should we assume that occasional transcription would be unbearably expensive?

But if all that extra junk DNA transcription is just biochemical noise, then it follows that cells may use the contrast between highly transcribed genes and rarely transcribed non-genes. In a simplistic sense, the contrast is information-theoretic: it is based on the probabilities of transcription of different DNA segments. Likewise, phonemes are defined by the probabilities of occurrence of sound sequences of certain recognizable qualities.
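One hedged way to put a number on such a contrast is to treat each DNA segment as a coin that is either transcribed or not in a given cell, and to ask how far its transcription probability sits from a background "noise" rate. The sketch below uses the Kullback-Leibler divergence between two Bernoulli distributions for that purpose. That choice of measure is an assumption made here for illustration, and the segment names and probabilities are made up, not measurements.

```python
import math


def bernoulli_kl_bits(p, q):
    """KL divergence D(p || q), in bits, between Bernoulli(p) and Bernoulli(q).

    Requires 0 < q < 1. A term with zero probability contributes nothing.
    """
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)

    return term(p, q) + term(1 - p, 1 - q)


# Hypothetical per-cell transcription probabilities (illustrative only).
segments = {
    "highly_transcribed_gene": 0.90,
    "rarely_transcribed_nongene": 0.02,
    "background_like_segment": 0.01,
}
background_rate = 0.01  # assumed probability of spurious transcription

for name, p in segments.items():
    contrast = bernoulli_kl_bits(p, background_rate)
    print(f"{name}: {contrast:.3f} bits")
```

A segment transcribed at the background rate scores zero, a rarely transcribed non-gene scores close to zero, and a highly transcribed gene stands out by several bits. That gap is the kind of contrast a cell could, in principle, exploit.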

So in my view, "gene" is only problematic if we insist on confusing distinct biological processes. It is defined by transmission contrasts, in a Mendelian sense; it often (but perhaps not exclusively) corresponds to delimited DNA sequences; and it cannot by itself describe more complex functional properties such as methylation and epigenetic interactions.

And ENCODE -- the Encyclopedia of DNA Elements? It brings us full circle to Mendel's elementen, derived after all from the Latin elementum, the "first principle."

UPDATE (2008-11-12): I'm realizing, after reading this over, that I may have given the impression that I embark on some kind of strange journey through linguistics when I lecture on Mendelian inheritance. Not so -- that's just a riff for the blog. "Gene" is introduced in Mendelian terms (where after all I use it most of the time, since I apply population genetics), and then I discuss what exactly that means in terms of the chromosome theory (and ultimately DNA sequences). The result is inevitable: "gene" means different things in these contexts, and obviously must include many distinct kinds of DNA configurations, from coding regions, to regulatory elements, to conserved noncoding segments. Since "allele" is extended even more broadly (any variant site qualifies), I don't think "gene" is the problem here.