john hawks weblog

paleoanthropology, genetics and evolution

copy number variants

  • Copy number variation in 1000 Genomes

    Sat, 2010-10-30 13:01 -- John Hawks

    When I wrote earlier in the week about the 1000 Genomes Project results, I mentioned that a second paper was being published in Science. That paper, by Peter Sudmant and colleagues [1], works to quantify the amount of copy number variation of genes in the genomes of the study participants.

    It can be challenging to study copy number variation using shotgun sequencing methods, because each duplicated part of the genome creates multiple alignment targets for short reads. One way to deal with this problem is to use the drawbacks of shotgun sequencing as an advantage: Look for template regions of the genome that have much higher read depth than others. These places include many where a gene has been duplicated in the target genome, giving one-and-a-half or twice the number of reads for each duplication. Looking at read depth genome-wide is a quick way to assess copy number variation at sites where it was previously unknown. Once these are ascertained in a sample of genomes, they can be targeted for further study, including characterizing the boundaries of the duplicate region.
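
    The read-depth idea can be sketched in a few lines. This is only an illustration, not the pipeline from the paper: the function names, the window depths, and the 2.75-copy threshold are all made up for the example, and it assumes that the median window depth stands for the ordinary two-copy state.

```python
from statistics import median

def estimate_copy_number(window_depths, baseline_copies=2):
    """Estimate copy number per window from read depth.

    Assumption: the median window depth represents the normal
    two-copy (diploid) state, so copies scale linearly with depth.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median read depth must be positive")
    return [baseline_copies * d / baseline for d in window_depths]

def flag_duplications(window_depths, min_copies=2.75):
    """Return indices of windows whose estimated copy number is at least min_copies.

    The 2.75 threshold is an arbitrary illustrative cutoff.
    """
    copies = estimate_copy_number(window_depths)
    return [i for i, c in enumerate(copies) if c >= min_copies]

# Hypothetical read depths for seven windows along a chromosome.
depths = [30, 31, 29, 45, 60, 30, 28]
print(estimate_copy_number(depths))
print(flag_duplications(depths))
```

    In this toy input the windows with depth 45 and 60 carry one-and-a-half and two times the median depth, which corresponds to roughly three and four copies. Real data would also need correction for GC content and mappability, which this sketch ignores.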

    The paper describes this methodology in some detail, with various embellishments to get more precise answers to certain kinds of structural questions. They developed a large set of SNPs that differentiate paralogous gene copies, among other things allowing them to examine which members of various gene families had been duplicated, and whether events were shared between populations.

    Through our analysis, we identified that duplicated regions are more likely to be stratified between human populations when compared with copy number variation within unique regions of the genome. For example, 59 (92%) of the top 64 stratified gene families overlap segmental duplications (P –16). Remarkably, many of these highly polymorphic genes map to duplications that promote recurrent rearrangements associated with intellectual disability, autism, schizophrenia and epilepsy. We hypothesize that the extreme polymorphism may contribute to genomic instability associated with disease and may predispose certain populations to different chromosomal rearrangements (30).

    Segmental duplications can be relatively effective ways to change the amount of gene product without changing the gene product. In other words, a duplication can increase the dosage of a particular gene product. That can sometimes be very useful. For example, salivary amylase production varies among people due to the number of duplicate copies of the gene [2]. The copy number variation is related to population history of agricultural subsistence -- old agricultural populations have more amylase copies. It's a simple case where the dietary ecology favors a dosage increase for an enzyme.

    Gene duplications and other structural changes to the genome are rare events -- any particular kind of change is substantially less likely than a single nucleotide mutation at a given point in the genome. So it is of some interest to consider which regions are actually invariant in copy number -- duplications that occurred on the human lineage but have been conserved in more recent populations -- because these may reflect old adaptations essential to the evolution of hominins. Here's what the paper concludes:

    We have also defined the ~49% of gene duplicates that are largely invariant in copy among humans. Although this is based only on an assessment of 159 genomes from select populations, the fact that this fraction of genes remains copy number invariant in a milieu of recurrent unequal crossover suggests functional importance. Among these, we find a number of genes involved in neurological development and disease. We note that many of these duplicated genes are themselves incomplete and may represent nonprocessed pseudogenes, which may modulate the expression of the ancestral gene. The characterization of the most recently duplicated genes should facilitate identification of those that acquired new functions (neofunctionalization) versus those that have become pseudogenes or have partitioned their function among duplicate copies (31).

    I was going to write that there's not much analysis in the paper and let it go at that. But the paper has a 108-page supplement.

    I know I write this like once a week, but what the heck is the point of a 4-page paper with a 108-page supplement? Granted, 7 of the supplement pages are the author list (!!), but I view the whole thing mainly as a rip-off for the people who did the analyses in the supplement. Why don't they get their own first-authored publications? Are other journals satisfied to accept first-authored versions of analyses that have already been in a supplement in Science?

    The supplement lists 64 gene families including segmental duplications that differ substantially in average copy number among the CEU, YRI and CHB/JPT samples to which the low-coverage whole-genome sequencing has been applied thus far. The table (S8) lists the mean copy number in the three populations and the total variance in copy number; the key statistic is a value called Vst, which is analogous to FST for length variations.
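
    For readers who have not met Vst before, here is a minimal sketch of how such a statistic is usually computed: total variance in copy number, minus the sample-size-weighted average of the within-population variances, divided by the total variance. I am assuming that standard definition and using population (not sample) variances; the supplement may differ in detail, and the copy numbers below are invented, not taken from table S8.

```python
from statistics import pvariance

def vst(populations):
    """Compute Vst = (V_T - V_S) / V_T for copy number estimates.

    populations: dict mapping a population label to a list of
    per-individual copy numbers.
    V_T is the variance over all individuals pooled together.
    V_S is the within-population variance, averaged with weights
    proportional to each population's sample size.
    """
    all_values = [x for pop in populations.values() for x in pop]
    total_var = pvariance(all_values)
    if total_var == 0:
        return 0.0  # no variation at all, so no stratification
    n_total = len(all_values)
    within = sum(len(pop) * pvariance(pop) for pop in populations.values()) / n_total
    return (total_var - within) / total_var

# Hypothetical copy numbers for one duplicated region in three samples.
example = {
    "CEU": [2, 2, 3, 2],
    "YRI": [4, 5, 4, 4],
    "CHB/JPT": [2, 3, 2, 2],
}
print(round(vst(example), 3))
```

    A value near 0 means the populations have similar copy number distributions; a value near 1 means nearly all of the variation lies between populations.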

    These are not generally duplications of whole genes, and their boundaries don't generally correspond to the boundaries of coding regions or exons. Without further analysis, it is not clear which of these duplicated regions may have functional import. Many of the additional copies may be inactive, either because of pseudogenization or because the duplication may not include the promoter/enhancer elements needed for gene expression. Some of the duplications occur in regions with known pseudogenes. The "involvement" of some genes in these regions with neurological development and disease is interesting, but the paper attempts no statistical assessment of this. It's a list of candidates, with some interesting ones that are obviously worth further examination, but without a clear story for any of them.

    It is perhaps interesting that salivary amylase didn't make the list. It's not clear from the supplement whether that is an omission or whether its population differentiation, great as it is, simply falls below the cutoff for the list. The greatest differentiation for amylase copy number is between populations that are not yet represented in the 1000 Genomes whole-genome sequencing.

    That raises an interesting question: What if we applied the same methods to the read data from some of the other public genomes? The Bushman genomes from earlier this year are an especially interesting sample because they are notably not drawn from a long-time agricultural population. In which areas would they score atypical copy number variation compared to the 1000 Genomes samples?


    References

  • Unbelievable Y chromosome differences between humans and chimpanzees

    Thu, 2010-01-14 00:11 -- John Hawks

    Holy crap!

    Indeed, at 6 million years of separation, the difference in MSY gene content in chimpanzee and human is more comparable to the difference in autosomal gene content in chicken and human, at 310 million years of separation.

    So much for 98 percent. Let me just repeat part of that: humans and chimpanzees, "comparable to the difference ... in chicken and human".

    This is from a new paper that's just shown up in the Nature advance publication zone. The authors are Jennifer Hughes and colleagues, and the subject is the first complete sequencing of the chimpanzee Y chromosome. "MSY" stands for "male-specific region of the Y chromosome" -- it's most of the Y, aside from a small fraction that recombines with the X chromosome.

    The Y chromosome was part of the initial chimpanzee genome draft, and was recognized then as a "clear outlier" in showing low human-chimpanzee sequence similarity (Chimpanzee Genome Consortium 2005). But it wasn't obvious just how different it was because the relatively short sequencing reads aligned fairly well with the human draft. That comparison also seems not to have included the missing genes (they might have just been missed during sequencing), or duplications. Moreover, the Y chromosome includes a high fraction of repetitive sequence, including long front-to-back, or "palindromic" passages. Only with very long reads with long overlaps is it possible to straighten out the large-scale sequence, and thereby detect sequence reorganizations and large copy number variants. This kind of intensive sequencing has so far been completed only for chromosome 21 and now the Y chromosome.

    I can't believe how sedated the reaction to this paper has been so far. The outcome of the sequencing is really, really weird. More than thirty percent of the chimpanzee Y chromosome has no homolog in humans, and likewise for the human Y in chimpanzees.

    I mean, really -- here's a map:

    [Figure: Chimpanzee compared to human Y chromosome]

    Just glancing at the ideograms, they don't even look like homologous chromosomes!

    Obviously they are; there's a whole lot of homologous sequence in there including functional genes. But the structure of both human and chimpanzee Y chromosomes has evolved incredibly fast compared to the rest of the genome.

    The central question: beyond its interest for Y chromosome structural evolution, what does this result say about the evolution of human (and chimpanzee) phenotypes?

    Option 1: Maybe nothing. The main mechanism for the rapid structural evolution was probably ectopic homologous recombination within the Y chromosome itself. Imagine that the Y chromosome wriggles around and different copies of repetitive sequences get together with each other.

    The molecular mechanisms that enabled this wholesale remodelling of ampliconic regions merit consideration. Although the chimpanzee and human MSYs do not normally participate in meiotic exchange with a partner chromosome, the mirroring of sequences in the ampliconic regions provides ample opportunity for ectopic homologous recombination within the MSY. This recombinational proclivity is well documented in the human MSY, where it has repeatedly given rise to large-scale structural polymorphisms during the past 100,000 years of human history as well as to Y-chromosomal anomalies that cause spermatogenic failure and sex reversal in current generations. We suggest that ectopic homologous recombination between MSY amplicons has similarly accelerated structural remodelling of the MSY in the chimpanzee and human lineages during the past 6 million years.

    That leads to rapid structural evolution, but not necessarily any functional changes.

    Option 2: Massive changes in gene regulation. Then again, widespread relocations of genes have a way of stripping them apart from upstream (or downstream) elements that may regulate their expression. Besides that, chimpanzees have lost several genes entirely, while humans have picked up a few that weren't in the common ancestor. So there's a potential for phenotypic evolution from these changes, possibly reverberating through the genome.

    In aggregate, the consequence of gene loss and gain in the chimpanzee and human lineages, respectively, is that the chimpanzee MSY contains only two-thirds as many distinct genes or gene families as the human MSY, and only half as many protein-coding transcription units.

    That's pretty amazing. They speculate that the most important phenotypic correlates of these genetic changes may be related to sperm or testicular function, which certainly is a target of rapid evolution elsewhere in the chimpanzee and human genomes.

    Option 3: Hitchhiking. OK, this isn't distinct from, or mutually exclusive with, the options above, but it's worth remembering that it only takes a single advantageous mutation to fix the entire Y chromosome in the population. That event carries with it whatever strange mutations might be on the same copy as the initial advantageous change. This kind of event may have happened dozens or even hundreds of times on the chimpanzee and human lineages. Indeed, if it is common enough, hitchhiking can drive its own dynamic, since it tends to fix lots of slightly deleterious variations that later have to be repaired or accommodated.

    An interesting possibility: Maybe the extreme evolution of the Y chromosome in the emerging human and chimpanzee lineages explains the unusual similarity of their X chromosomes.

    I'm thinking back to the story about chumans and the divergence of chimpanzee and human lineages ("The dawn chumans"). Patterson and colleagues (2006) suggested that the two lineages had undergone some kind of hybridization event long after they began to diverge. This surprising hypothesis was meant to explain why the X chromosome shows a substantially lower level of genetic difference between humans and chimpanzees, compared to the average autosomal locus. I don't think that a late hybridization is necessary to account for X chromosome similarity. A large ancestral effective population size implies a wide variance in coalescence times in the ancestral population; the average on the X will be lower than the autosomes, and if there was any hitchhiking the X would be lower still.
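
    The arithmetic behind that last point can be sketched with the textbook neutral expectation: two lineages sampled from the ancestral population take on average 2N generations to coalesce on an autosome, and three-quarters of that on the X if the sex ratio is even. The function name and the numbers below are hypothetical, chosen only to show the direction of the effect; the sketch ignores hitchhiking and any sex difference in mutation rate.

```python
def expected_divergence_generations(split_generations, ancestral_ne, chromosome="autosome"):
    """Expected time separating a human and a chimpanzee sequence, in generations.

    Assumes neutrality, a constant ancestral effective size, and an even
    sex ratio, so the X has 3/4 the effective size of an autosome.
    The expected coalescence time within the ancestral population is
    2 * Ne generations, scaled by that factor.
    """
    scale = {"autosome": 1.0, "X": 0.75}[chromosome]
    return split_generations + 2 * ancestral_ne * scale

# Hypothetical values: split 250,000 generations ago, ancestral Ne of 50,000.
auto = expected_divergence_generations(250_000, 50_000, "autosome")
x = expected_divergence_generations(250_000, 50_000, "X")
print(auto, x, x / auto)
```

    The larger the ancestral effective size relative to the time since the split, the bigger the expected gap between the X and the autosomes, with no hybridization required.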

    But...that X chromosome similarity might have a different explanation. A fraction of the human Y chromosome continues to recombine with the X. Imagine an initially rapid divergence of Y chromosomes within the chuman population. For a while, there might have been a strong selection pressure on the ancestral X to equip it for the structural diversity of the Y. Possibly an inverse relation would have emerged: as the Y becomes variable (possibly in partially isolated subpopulations), the X adapts to that variation until reproductive isolation finally occurs.

    Could this have been the proximate cause of human-chimpanzee reproductive isolation? The sex chromosomes are often implicated in speciation through Haldane's rule. It's a bit of speculation, but not too far from some discussion within the paper, particularly the relation between Y chromosome variations and infertility.

    References:

    Hughes JF and 16 others. 2010. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature (early online) doi:10.1038/nature08700
