john hawks weblog

paleoanthropology, genetics and evolution

biotech

  • Exome sequencing into Norway national health care

    Sat, 2012-02-04 10:24 -- John Hawks

    From Ewen Callaway: "Norway is set to become the first country to incorporate genome sequencing into its national health-care system."

    In its three-year pilot phase, the Norwegian Cancer Genomics Consortium will sequence the tumour genomes of 1,000 patients in the hope of influencing their treatments. It will also look at another 3,000 previously obtained tumour biopsies to get a better idea of the mutations in different cancers, and how they influence a patient's response to a drug. In a second phase, the project will build the laboratory, clinical and computing infrastructure needed to bring such care to the 25,000 Norwegians who are diagnosed with cancer each year.

    Expect to see much, much more of this.

  • Synthetic biology explainer

    Sat, 2012-01-14 14:47 -- John Hawks

    Nice piece on synthetic biology by Adam Rutherford:

    But Freckles is a long way from normal. She is an extraordinary creation, an animal that could not have existed at any point in history before the 21st century. She is all goat, but she has something extra in every one of her cells: Freckles is also part spider.

    UPDATE (2012-01-14): A knowledgeable reader writes:

    Ah, journalists! What do transgenic animals have to do with synthetic biology? Absolutely nothing, in fact. And the hyperbole fails, too. If the protein was human instead of arachnid (as is the case with many cows now), that goat would be part human then? Which would then mean that a lot of bacterial, insect and mouse cells I grow in the lab are part human, too! Woo-hoo! Meet Dima Klenchin, a synthetic biologist...

    And Venter Institute's bacterium is not a synthetic life by any stretch of imagination and neither is anything else described in the article (modifying microorganisms for industrial production is about two decades old news). In fact, "synthetic biology" seems to be simply a new buzz word to get funding easier. You see, "transgenic organisms" is getting too routine and stale from the funding point of view and "nanotechnology" too has been overused to the point of losing much of its buzz power as well. So pharmacology is now "chemical biology" and gene engineering is now "synthetic biology".

    As for the truly synthetic life, we are finding that it is a very hard going. Everyone would be enormously impressed by a single brand new enzyme or a metabolic pathway (no aping from existing prototypes in nature). Alas, even that turns out to be easier said than done. But fear not - all those computer scientists and physicists will soon, veeeery soon, come to the rescue. :-)

    I thought it was a fun article but your points are well taken. I think that there is a faction who are trying to "define down" the term synthetic biology so it applies to everything from recombinant DNA upward. Venter obviously hasn't helped matters by trying to lower the bar for an artificial life form.

    But maintaining any useful distinction may become impossible anyway if molecular machines can be made to interact in any useful way with endogenous genomes. Of course, if they make spider silk that comes straight out of the goat's udder, it would be more awesome than tomacco!

  • The Mayflower criminal registry

    Fri, 2012-01-13 22:25 -- John Hawks

    Of some interest with respect to DNA databases and privacy concerns: "DNA links 1991 killing to Colonial-era family".

    The DNA sample was taken in the death of 16-year-old Sarah Yarborough, who was killed on her high school campus in Federal Way, Washington, in December 1991. The King County Sheriff's Office has circulated two composite sketches of a possible suspect -- a man in his 20s at the time with shoulder-length blonde or light brown hair -- but had been unable to put a name to the sketch.

    In December, though, the department sent the DNA profile to California-based forensic consultant Colleen Fitzpatrick. Fitzpatrick compared the profile to others in genealogy databases and found the closest match was to the family of Robert Fuller, who settled in Salem, Massachusetts, in 1630 and had relatives who came over before him on the Mayflower.

    This is a Y chromosome match based on the genealogical research of people who may be completely unknown to the "suspect". Fitzpatrick offers that a Y-chromosome match may be expected to share a surname, which is probative in the forensic situation. Obviously there are many possible scenarios in which such information will not lead to discovery of a suspect: the chance of non-acknowledged paternity events across 200 years is very high. I don't view the result as strongly actionable, but I do think it raises important questions about the future of genealogy databases.

    We are near the time when whole-genome sequencing will make this kind of identification much more likely, because unique genetic matches to 3rd and 4th degree relatives will be plausible. Finding a handful of rare mutations shared between a crime scene sample and an individual in a whole-genome database would be a strong indication of a relationship. It's possible that the databases for whole genomes will grow faster than the technology will allow reliable whole-genome sequencing from a crime scene sample. So in this case, the issues with database use may be primary.
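    To give a feel for why 3rd and 4th degree relatives are within reach of whole-genome data, here is a minimal sketch of the textbook expectation that relatives of degree d share, on average, (1/2)^d of their autosomal DNA identical by descent. The function name is hypothetical, and real sharing varies around this expectation because of recombination.

```python
def expected_autosomal_sharing(degree: int) -> float:
    """Expected fraction of autosomal DNA shared identical by descent.

    Uses the standard (1/2)**degree expectation; actual sharing between
    any two relatives scatters around this value.
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return 0.5 ** degree


# Illustrative degrees only: 3rd and 4th from the text, plus a distant 10th.
for d in (3, 4, 10):
    print(f"degree {d}: {expected_autosomal_sharing(d):.4%} expected sharing")
```

    The expectation halves with every degree, so distant autosomal matches rest on very little shared DNA. The Y chromosome and mtDNA are different: they pass down a single line nearly intact, which is why a Y match can point to a family separated by centuries.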

    It would be an interesting exercise to estimate the fraction of unknown samples from crime scene Y chromosome and mtDNA that could be matched to a 10th-degree relative in the Genographic (or any other large) dataset.

  • A quick look at your Neandertal fraction

    Fri, 2011-12-16 15:13 -- John Hawks

    The 23andMe blog, the Spittoon, has a description of their new technique to use 23andMe SNPs to estimate any customer's fraction of Neandertal: "Find your inner Neanderthal".

    The result is a rough-and-ready numerical estimate of your Neandertal ancestry fraction. For me it's 2.5 percent. Gretchen is 3 percent, and she's been lording it over me all day.

    The estimate is the work of Eric Durand, who broke ground on the D-statistic method for finding introgression from archaic genomes [1]. He has made public a short white paper describing the application.

    So far, all estimates of Neandertal (or other archaic human) ancestry have come from the proportion of a genome (or genotypes from a genome) that is derived and shared with Neandertals. That includes the results I've been posting here for the 1000 Genomes Project samples this week.
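    As a rough illustration of the counting behind a D-statistic, here is a minimal sketch of the ABBA-BABA form: it compares how often each of two modern samples shares a derived allele with the archaic genome. This is not the 23andMe pipeline, whose details are in the white paper. The function name, the 0/1 allele encoding, and the toy sequences are all assumptions made up for the example.

```python
from typing import Sequence


def d_statistic(p1: Sequence[int], p2: Sequence[int],
                archaic: Sequence[int], outgroup: Sequence[int]) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA) over biallelic sites.

    Each sequence holds one allele (0 or 1) per site. The outgroup
    allele is treated as ancestral, which is an assumption of this sketch.
    """
    abba = baba = 0
    for a1, a2, a3, anc in zip(p1, p2, archaic, outgroup):
        derived = 1 - anc
        if a3 != derived:
            continue  # archaic carries the ancestral allele: uninformative
        if a2 == derived and a1 == anc:
            abba += 1  # p2 shares the derived allele with the archaic genome
        elif a1 == derived and a2 == anc:
            baba += 1  # p1 shares the derived allele with the archaic genome
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy data, invented for illustration (not real genotypes).
african = [0, 0, 1, 0, 0, 1, 1]
european = [1, 0, 0, 1, 1, 1, 0]
neandertal = [1, 1, 0, 1, 1, 1, 1]
chimp = [0, 0, 0, 0, 0, 0, 0]

toy_d = d_statistic(african, european, neandertal, chimp)
print(toy_d)
```

    A positive value means the second sample shares more derived alleles with the archaic genome than the first does. On its own this does not separate recent introgression from ancient incomplete lineage sorting at any particular site.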

    The next step is to uncover exactly which parts of a person's genome have come from Neandertal ancestors. To discover this, we have to further determine which shared alleles come from recent introgression as opposed to ancient incomplete lineage sorting. We have been working very hard on that problem here, as you'll see, and it has been an important aspect of our work in pigmentation genes in the archaic genomes.

    If you have been considering getting your genotypes from 23andMe, now is a very good time to do it. The overall fraction of your DNA derived from Neandertals is only the beginning. Soon we'll be able to specify which parts, and in a few cases we'll have a good guess as to what difference it makes. If you want to participate in this research, I'm hoping to gather as many interested people as I can -- so keep your eyes here over the next month.

    And if you are interested in having your genotypes done, feel free to use my link to the 23andMe promotion. I've been very happy with their way of presenting the genotypes and their updates, and know many other people who have also found it interesting. As I wrote a couple of years ago, it's not something to spend your food money on, but it does have an entertainment value. And the potential to be an active research participant.


  • Sequencing is outpacing computing

    Wed, 2011-11-30 23:36 -- John Hawks

    The New York Times notices DNA sequencing's Malthusian trap: "DNA sequencing caught in deluge of data."

    That is a decline [in sequencing costs] by a factor of more than 800 over four years. By contrast, computing costs would have dropped by perhaps a factor of four in that time span.

    The lower cost, along with increasing speed, has led to a huge increase in how much sequencing data is being produced. World capacity is now 13 quadrillion DNA bases a year, an amount that would fill a stack of DVDs two miles high, according to Michael Schatz, assistant professor of quantitative biology at the Cold Spring Harbor Laboratory on Long Island.
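    The figures in the quote can be checked with a little arithmetic. The sketch below turns the two four-year factors into halving times and redoes the DVD estimate. The storage assumptions (one byte per base, 4.7 GB per single-layer DVD, 1.2 mm per disc) are mine, not the article's, and the function names are hypothetical.

```python
import math

YEARS = 4            # time span given in the quote
SEQ_FOLD = 800       # drop in sequencing cost over that span
COMPUTE_FOLD = 4     # drop in computing cost over the same span


def halving_time_years(fold_drop: float, years: float) -> float:
    """Years needed for cost to halve, assuming a steady exponential decline."""
    return years / math.log2(fold_drop)


def dvd_stack_miles(bases: float, bytes_per_base: float = 1.0,
                    dvd_bytes: float = 4.7e9,
                    dvd_thickness_mm: float = 1.2) -> float:
    """Height of a DVD stack holding `bases`, under the assumed storage figures."""
    discs = bases * bytes_per_base / dvd_bytes
    metres = discs * dvd_thickness_mm / 1000.0
    return metres / 1609.344


seq_halving = halving_time_years(SEQ_FOLD, YEARS)
compute_halving = halving_time_years(COMPUTE_FOLD, YEARS)
stack_miles = dvd_stack_miles(13e15)  # 13 quadrillion bases per year

print(f"sequencing cost halves every {seq_halving * 12:.1f} months")
print(f"computing cost halves every {compute_halving:.1f} years")
print(f"DVD stack: {stack_miles:.2f} miles")
```

    Under these assumptions sequencing cost halves roughly every five months against every two years for computing, and the stack comes out at about two miles, in line with Schatz's figure.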

    I have spoken with several scientists in other fields, like astronomy and particle physics, who deal with truly big datasets. Until now, biology data has actually been pretty small potatoes compared with the sheer amount pumped out by large projects in other fields. But that's changing. The Times article points out a unique aspect of the data problem in genetics: There are now thousands of labs that can generate large datasets, many of which have no special plan for data archiving or availability.

    “Google has enough capacity to do all of genomics in a day,” said Dr. Schatz of Cold Spring Harbor, who is trying to apply Google’s techniques to genomics data. Prodded by Senator Charles E. Schumer, Democrat of New York, Google is exploring cooperation with Cold Spring Harbor.

    Google’s venture capital arm recently invested in DNAnexus, a bioinformatics company. DNAnexus and Google plan to host their own copy of the federal sequence archive that had once looked as if it might be closed.

    I don't see Google as a deus ex machina for this one -- although I do observe that several other big data projects are sponsored by large Microsoft investors or founders.

  • Sequence the old, fast

    Wed, 2011-10-26 10:13 -- John Hawks

    The Archon Genomics X Prize is a $10 million contest to see which company or organization can develop a low-cost, accurate sequencing technology. The AP's Malcolm Ritter reports that the test genomes will come from 100 centenarians ("Secrets of long life sought in DNA of the elderly"), which is a pretty interesting test cohort.

    Protective features of a centenarian's DNA can even overcome less-than-ideal lifestyles, says Dr. Nir Barzilai of the Albert Einstein College of Medicine in New York. His own study of how centenarians live found that "as a group, they haven't done the right things."

    Many in the group he studied were obese or overweight. Many were smokers, and few exercised or followed a vegetarian diet. His oldest participant, who died this month just short of her 110th birthday, smoked for 95 years.

    "She had genes that protected her against the environment," Barzilai said. One of her sisters died at 102, and one of her brothers is 105 and still manages a hedge fund.

    I doubt they'll be able to explain much of the variance in longevity with 100 genomes, but they'll surely find some things that make a small difference and will lead to a newsworthy outcome. Larger samples will find more of the genetic pathways that influence lifespan, as will adding a wider range of elderly samples from other populations.

  • Exome sequencing as a stopgap

    Fri, 2011-10-14 12:09 -- John Hawks

    The new Genome Biology has a perspective piece by Jacob Tennessen and colleagues, titled "The promise and limitations of population exomics for human evolution studies" [1]. Exomics is the study of the coding part of the genome, which is only 30 megabases as opposed to the 3 gigabases of a whole genome. Today it is possible to sequence only the protein-coding parts of the genome, by combining methods that capture such regions with next-generation sequencing. The result is vastly cheaper than a whole genome, and some of the savings can be applied to increasing the coverage, which improves sequence accuracy.
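    The size difference is what drives the coverage argument. A minimal sketch, using the 30 megabase and 3 gigabase figures above: the 30x genome in the example is a hypothetical input, and the calculation assumes every read lands on target, which real capture methods do not achieve.

```python
EXOME_BASES = 30e6    # coding portion, from the text
GENOME_BASES = 3e9    # whole genome, from the text

exome_fraction = EXOME_BASES / GENOME_BASES


def ideal_exome_coverage(genome_coverage: float) -> float:
    """Exome coverage if the raw bases of a whole-genome run were all on target.

    Idealized upper bound: ignores capture efficiency and off-target reads.
    """
    return genome_coverage * GENOME_BASES / EXOME_BASES


print(f"exome is {exome_fraction:.0%} of the genome")
print(f"30x genome output is about {ideal_exome_coverage(30):.0f}x on the exome")
```

    Nobody needs coverage that deep, so in practice the spare sequencing output goes to more samples, somewhat higher coverage, or both.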

    Tossing away 99% of the genome is not an ideal sampling strategy for many purposes. However, when it comes to phenotype prediction, we can make some predictions about how changes in amino acid sequences will affect protein function. Many important phenotypic changes are caused by non-coding variations in gene regulation, but genetics has not yet reached a state of knowledge where these can be readily predicted. So, if we're sequencing people's genomes for the purposes of finding disease or phenotype variants, exome sequences give much of the information that we can presently evaluate.

    James Hadfield noted the spree of exome sequencing publications at his blog, Core Genomics ("Exome capture comparison publication splurge"). He tags the rationale:

    A lot of people I have talked to are now looking at screening pipelines which use Exome-Seq ahead of WGS to reduce the number of whole Human genomes to be sequenced. The idea being that the exome run will find mutations that can be followed up in many cases and only those with no hits can be selected for WGS.

    I have heard a number of geneticists describe exome sequencing as an intermediate step in population genetics, a way to increase sample sizes more affordably than whole-genome sequencing makes possible at present. I don't think this will last long, as whole genomes offer much more for population genetic analysis and are rapidly dropping in price, but that depends on how technology develops. If we are consistently in the situation where researchers can multiplex 50 exomes at high coverage for the same price as one whole genome, it may make sense to use that strategy for a long time.

    23andMe is starting an exome sequencing project. Daniel MacArthur's comments on G+ and the subsequent reader comments are interesting.


  • Personalized genomics beats personalized genetics

    Fri, 2011-09-16 01:00 -- John Hawks

    Joe Pickrell encountered sticker shock when faced with the prospect of a medical sequencing test: "The week that I worried I had a rare genetic disease".

    What’s really striking to me is that the price of whole genome sequencing is already competitive with commercial Sanger sequencing tests of individual genes.

    Amazing how much patent-laden (and labor-intensive) sequencing work can charge to insurance.

  • Anodyne DTC genetics

    Sun, 2011-04-24 16:51 -- John Hawks

    The Wall Street Journal has an op-ed by Matt Ridley, on the topic of possible regulation of consumer genetic testing. He writes that after years of relative non-interest in such tests, he ordered his own because of the likelihood that the FDA will limit their availability in the near future.

    The champions of regulation respond that some firms in the direct-to-consumer genetic-testing industry are sometimes much exaggerating the health benefits of genotyping. As I said above, most results are anodyne and close to useless in terms of telling you how to live your life, but that is not how it sounds on the websites. However, this is not an argument for FDA medical-device regulation or requiring doctors' prescriptions before testing. It is an argument for plain, old-fashioned truth-in-advertising regulation of the kind effected by the Federal Trade Commission.

    The AMA always seems to think it's fighting Doc Brinkley. Personally, I'd say the supplement industry is a far greater threat to the public health than DTC genetic testing, and regulating it would surely be a better use of the FDA's time.

    UPDATE (2011-04-24): More on Gene Expression. Much tweeting of the final line of the op-ed, as well:

    Genetic knowledge, whether the high priests like it or not, is going to be a crowd-sourced phenomenon.

  • Delete the troubling data

    Thu, 2011-04-14 14:00 -- John Hawks

    Misha Angrist turns on the sarcasm filter for a proposal to discard raw data that may trouble research subjects ("If you want to destroy my sweater"):

    Pay attention, kids: If it poses an ethical problem, then the obvious thing to do is to just throw it away! Delete it! Burn it! Shred it! Avert your eyes! The patient/research participant/taxpayer won’t mind! Trust me!

    This is so annoying. It's cheaper in many contexts to do genome-wide genotyping than to assay specific gene variants. So we'll increasingly see gene testing done on whole-genome platforms of various kinds.

    But doctors don't order clinical tests for whole genomes, they order particular genetic tests. It's an obvious strategy for a testing company to provide only the ordered results, and either retain or discard further data, in the hopes of additional sales later. The company can upsell its "filtered" service as including additional validation or additional interpretive information of the kind that software can automatically add (for example, short-range phased haplotypes).

    Angrist references a suggestion from an academic paper that a subject's APOE status should be blindly deleted from such results, to avoid the necessity of informing the subject about Alzheimer's risk.

    This is the sort of thing we need to be thinking harder about -- how to alert unsuspecting people to minor or moderate risks that will be routine in whole-genome data.

