john hawks weblog

paleoanthropology, genetics and evolution

How much data in your genome

Sat, 2008-06-28 10:46 -- John Hawks

Daniel MacArthur, of Genetic Future, reviews how much data it takes to store a genome. Naturally, you'd probably think it was around 12 billion bits (2 bits per base pair, for the roughly 6 billion base pairs of a diploid genome), but sequencing technologies and the availability of reference sequences from other people make things a little more complicated.
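To make the 2-bits-per-base arithmetic concrete, here is a minimal Python sketch. The bit assignment (A=00, C=01, G=10, T=11), the function names, and the round figure of 6 billion bases are all illustrative assumptions of mine, not anything taken from the linked post or from a real sequencing format.

```python
# Illustrative 2-bit encoding; the mapping below is an arbitrary assumption.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n_bases)
    )

def raw_size_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to a whole byte."""
    return (2 * n_bases + 7) // 8

# Assumed round number for a diploid human genome.
DIPLOID_BASES = 6_000_000_000
print(raw_size_bytes(DIPLOID_BASES) / 1e9, "GB")
```

Seven bases such as "GATTACA" fit in two bytes this way, and the same arithmetic scaled up to 6 billion bases gives the 1.5 gigabyte figure.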

This interesting quote about the raw image files generated by the Illumina platform illustrates some of those complications:

Almost as soon as these images are generated they are fed into an algorithm that processes them, creating a set of text files containing the sequence of each of the fragments. The image files are then almost always discarded. Why are they discarded? Because, as you will see in a minute, storing the raw image data from each run in even a moderate-scale sequencing facility quickly becomes prohibitively expensive - in fact, several people have suggested to me that it would be cheaper to just repeat the sequencing than to store these data long-term.

An accurate read requires lots of redundant bits, which adds up to lots and lots of data storage. If these are winnowed down to a single "best" sequence, then you're back to 12 billion bits (= 1.5 gigabytes), more or less. Of course, most of that sequence is redundant and can be compressed significantly. And if you compare with a reference sequence, only a small amount of information is needed to distinguish your genome from the reference. Anyway, all this is explained at the link.
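The reference-comparison idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the sequences are made up, the two are assumed to be the same length and already aligned, and only single-base substitutions are handled (real genomes also differ by insertions, deletions, and larger rearrangements). The function names are hypothetical.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """List (position, base) pairs where sample differs from reference.

    Assumes equal-length, pre-aligned sequences; substitutions only.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

def apply_diff(reference: str, variants: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus its list of differences."""
    seq = list(reference)
    for pos, base in variants:
        seq[pos] = base
    return "".join(seq)

# Made-up toy sequences, not real genomic data.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

variants = diff_against_reference(reference, sample)
print(variants)
print(apply_diff(reference, variants) == sample)
```

Storing only the list of differences, plus a pointer to which reference was used, is enough to rebuild the sample, which is why the per-person storage can be far smaller than the full sequence.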
