john hawks weblog

paleoanthropology, genetics and evolution

open science

  • Update, March 2013

    Sat, 2013-03-16 22:11 -- John Hawks

    We want to thank everyone who has assisted with the project. If you're just arriving here, welcome!

    Rachelle Keeling has completed the first round of analysis, including several parts based on suggestions from the open project. She is moving forward to publish the results, and once they are through review we will be able to share more.

  • Open 3-d archive of Kromdraai

    Tue, 2013-05-14 08:38 -- John Hawks

    A new paper in the Journal of Human Evolution by Matthew Skinner and colleagues [1] announces an open archive of microCT data for the large collection of hominin specimens from Kromdraai, South Africa, curated in Pretoria at the Ditsong National Museum:

    Digital representations of vertebrate fossils are quickly becoming a standard source of data for scientific inquiry and non-destructive imaging of the internal structure of fossils is opening up new avenues of research that will further our understanding of fossil taxa. The purpose of this paper is to formally announce the availability of high-resolution microtomographic (microCT) scans of hominin fossils from the site of Kromdraai B (known as the ‘hominin site’, as opposed to Kromdraai A, the ‘faunal site’), South Africa. These microCT scans are the result of a collaborative research project between the curatorial institution of the Kromdraai fossils, the Ditsong National Museum of Natural History (DNMNH; formerly the Transvaal Museum) in Pretoria, South Africa and the Department of Human Evolution of the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) in Leipzig, Germany.

    In publishing these scans, we hope to stimulate research on these important specimens using virtual representations of the original fossils. We also envision that increased access to such data will stimulate additional research requiring study of the original fossils.

    This is really an outstanding development, a rich resource for education and further research. I want to congratulate all the people involved!

    The CT archive is hosted at the Max Planck Institute website, "DITSONG - CT Archive".

    Solving the problem of access has been especially difficult for scan data of hominin fossils. The physical specimens that represent ancient hominins are curated by museums in many countries around the world. Museums and other national institutions have a mission to steward their historical and cultural resources. A central archive, whether from a government sponsor like NSF or from a commercial entity such as Google, would be more convenient for researchers and save facilities and labor costs, but might take away some of the stewardship capability exerted by the separate institutions now. So to see a large institute entering into this kind of collaboration with a prominent national museum in another country makes me hopeful that the field is being persuaded about the benefits of more open access -- especially for educational use.

    Paleoanthropology is a comparative science, and good science requires comparing a specimen to the variation across samples of fossils and living primates. This is why casts have had such an important role in the history of the field. The key fossil specimens may never be in the same room with each other, but casts can be brought together for comparisons.

    Now with CT data, technology in principle makes it possible for every paleoanthropologist to have an archive of fossil morphology. That has great potential to save morphologists time and trouble. Simply keying a publicly available 3-d scan model with a label can allow much clearer communication about the form of a trait that may appear ambiguous on a fossil fragment.

    So why has this technology taken so long to get into the hands of paleoanthropologists? In 2005, I reflected on an article that forecasted a bright future for CT data archives in paleoanthropology: "Frontiers of human origins". I wrote:

    Personally, I think CT will have a limited set of impacts. The best thing is that it will allow any lab in the world to have as full a set of comparative data as have been released. Currently, it's useless for that purpose; there's just not enough access. But that is changing, and CT scans are as useful to a practiced eye as casts -- which are much less available today even as CT increases. In fact, high-resolution CT may essentially end casting of new fossils, since that is one of the major sources of damage. We'll be doing a lot of comparative work with imaging in the future.

    There is still not enough access. There are a very limited number of scans of hominin fossils that are openly available for download -- for example, Harvard's Peabody Museum made CT data for the Skhul V cranium available several years ago. Other CT data are available for sale; some are available with a consortium membership, and some can be acquired by direct inquiry to researchers. Right now, a student cannot use these Kromdraai data in comparison with other open fossil data unless that student is well-connected in the hierarchy of paleoanthropology. The day is still far away when every laboratory has access to a useful archive of fossil hominin morphology.


    References

    Synopsis: A new resource gives unprecedented access to imagery of a fossil hominin collection

  • Scholarship and experience outside the academy

    Thu, 2013-02-07 11:27 -- John Hawks

    The Wall Street Journal has an inspiring story of a hairdresser who turned her curiosity about Roman hairstyles into novel scholarship: "On Pins and Needles: Stylist Turns Ancient Hairdo Debate on Its Head".

    In 2007, she sent her findings to the Journal of Roman Archaeology. "It's amazing how much chutzpah you have when you have no idea what you're doing," she says. "I don't write scholarly material. I'm a hairdresser."

    John Humphrey, the journal's editor, was intrigued. "I could tell even from the first version that it was a very serious piece of experimental archaeology which no scholar who was not a hairdresser—in other words, no scholar—would have been able to write," he says.

    Ms. Stephens' article was edited and published in 2008, under the headline "Ancient Roman Hairdressing: On (Hair)Pins and Needles." The only other article by a nonarchaeologist that Mr. Humphrey can recall publishing in the journal's 25-year history was written by a soldier who had discovered an unknown Roman fort in Iraq.

    There is so much room in archaeology for people with deep subject knowledge, but not necessarily archaeological training, to make original contributions. Last night's NOVA episode, with a group of people trying to reconstruct Egyptian chariots, is another case where an ancient tradition can only be examined by those with insights about the subject beyond the historical and archaeological record -- in this instance, how to get a team of horses to work together using bridles, bits and yokes that no one had seen used in more than 2000 years.

    One of the great potential strengths of online media and open access is to enable this kind of participation by non-academicians. I'm hoping to capture some of that enthusiasm and knowledge in an upcoming project.

    (via Charles Mann)

  • "Brittle techniques"

    Mon, 2013-01-28 00:03 -- John Hawks

    I was pointed to a rant from early last year written by Fred Ross: "A farewell to bioinformatics".

    Like any good rant, it is extreme, and I don't endorse it, but it has kernels of truth.

    This all seems an inauspicious beginning for a field. Anything so worthless should quickly shrivel up and die, right? Well, intentionally or not, bioinformatics found a way to survive: obfuscation. By making the tools unusable, by inventing file format after file format, by seeking out the most brittle techniques and the slowest languages, by not publishing their algorithms and making their results impossible to replicate, the field managed to reduce its productivity by at least 90%, probably closer to 99%. Thus the thread of failures can be stretched out from years to decades, hidden by the cloak of incompetence.

    Data structures in bioinformatics should be designed for robustness and ease of reuse by different research teams. But that won't happen unless the grants that fund data collection require it. Open access to data is wonderful, but it is only the first step toward open science.

  • Crowdsourcing paleoecology

    Mon, 2013-01-07 23:55 -- John Hawks

    Jacqueline Gill reports on a conference with a provocative organization: "Crowd-sourcing the 50 most pressing questions in paleoecology".

    Conference attendees (of which I believe were around 60) were emailed the questions in advance, and asked to narrow them down to each of our own individual top fifty, as well as rank which subgroups we were most interested in– I ended up in Biodiversity Through Time. Every subgroup had a scribe (to record information about which questions were particularly contentious, or when concerns or points were raised), a chair, and a co-chair (for organizational and time-keeping purposes). Each subgroup was given dozens of questions, organized into loose themes, that we had to narrow down to twenty in the first day. This process was much more complex that it initially sounds– after an initial round of voting, there was a considerable amount of discussion, word-smithing, and merging of questions.

    What a neat idea -- a conference with a real agenda and public product at the end of it. Like paleoanthropology, paleoecology is a field where data are hard to obtain and require very specialized analytical methods. Getting the public involved in the science means finding ways to get people engaged in the questions and hypothesis formation. A ranking of important questions is a great idea, and may help to shape granting priorities.

  • Open access and Creative Commons

    Wed, 2013-01-02 21:23 -- John Hawks

    Cameron Neylon comments interestingly in Nature on the intellectual property drawbacks of publications that are free to access but not to reuse: "Science publishing: Open access must enable open use".

    The success of PubMed Central and of other disciplinary and institutional repositories illustrates a weakness. Although millions of articles are accessible to read, the majority of them cannot be used for anything except reading. If, for instance, you wish to index all the gene names in a set of papers, put them on a website, translate them, use text or images in a summary or even just print out several copies of the collected papers, you are limited to a much smaller set of around 500,000 articles that carry a Creative Commons licence (see go.nature.com/heaqoe). For any commercial purpose, which could include simply making copies for a class or company meeting, one is restricted to the smaller subset of papers that have a CC BY licence.

    A heated discussion arises in the comments section. I would point out that making an index of gene names does not violate copyright in the U.S., and many other data reuses are perfectly consistent with publisher copyright. Image reuses are more important to me, as the restrictive copyright terms on distributing images of fossils on a website have severely restricted what I have been able to do here.

    I do not presently have much under a Creative Commons license, but am exploring this option for some upcoming projects.

  • Bigfoot DNA?

    Mon, 2012-11-26 09:57 -- John Hawks

    A press release claims the recovery of Sasquatch DNA:

    “Sasquatch nuclear DNA is incredibly novel and not at all what we had expected. While it has human nuclear DNA within its genome, there are also distinctly non-human, non-archaic hominin, and non-ape sequences. We describe it as a mosaic of human and novel non-human sequence. Further study is needed and is ongoing to better characterize and understand Sasquatch nuclear DNA.”

    This has been developing for a while. Until I see the data, I am withholding judgment.

    One benefit of the world of genetics as opposed to traditional anthropology: The original sequence data must be made available to the public. No data, no discovery.

  • Link parade, 2

    Tue, 2012-10-23 23:43 -- John Hawks

    Ben Phelan at Slate writes about the recent evolution of lactase persistence: "The Most Spectacular Mutation in Recent Human History".

    The plot is still fuzzy, but we know a few things: The rise of civilization coincided with a strange twist in our evolutionary history. We became, in the coinage of one paleoanthropologist, “mampires” who feed on the fluids of other animals. Western civilization, which is twinned with agriculture, seems to have required milk to begin functioning. No one can say why. We know much less than we think about why we eat what we do. The puzzle is not merely academic. If we knew more, we might learn something about why our relationship to food can be so strange.

    I wanted to quote that passage because it was my friend Greg Cochran's son Roddy who coined the term "mampires", which is exceptionally clever. On the article as a whole, I think Phelan makes too much of the "mystery" aspect of the advantage of lactase persistence. There are really only two serious hypotheses, and they are not mutually exclusive. I would have liked to see the article devote more attention to the multiple lactase persistence mutations in other populations, which together point to the very great advantage of the trait in association with dairying.


    David Dobbs writes in the New York Times about the genetics of intelligence and what we know (and don't know) about it: "If Smart Is the Norm, Stupidity Gets More Interesting". The piece emphasizes that geneticists haven't had much luck finding genes that explain the heritability of intelligence. The problem of "missing heritability" has loomed over complex trait genetics for the last several years, meaning that we can estimate the heritability of traits with twin studies and other traditional pedigree approaches, but single gene loci do not account for much of the variance of these traits. One possibility is that common genes have such small effects that they are statistically difficult to find.

    Another possibility is that very rare genes of large effect -- or new mutations -- may explain the heritability of such traits within families. The most likely reason for large-effect mutations to be rare is that they are deleterious. Across a population, the burden of many rare deleterious mutations is called "genetic load":

    But in some other genetic realms we do differ widely, for example, mutational load — the number of mutations we carry. This tends to run in families, which means some of us generate and retain more mutations than others do. Among our 23,000 genes, you may carry 500 mutations while I carry 1,000.

    Most mutations have no effect. But those that do are more likely to bring harm than good, Dr. Mitchell said in an interview, because “there are simply many more ways of screwing something up than of improving it.”

    This is a nicely balanced treatment and emphasizes evolutionary approaches in an accessible way for Times readers.


    From the San Jose Mercury News, a story by Lisa Krieger: "Open-source science helps San Carlos father's genetic quest".

    One tiny flaw in one gene in one little girl. That explains why Beatrice Rienhoff, 8, is so lean and leggy.

    ...

    No one else in her family had such a syndrome. In fact, apparently no one else in the world did either.

    Rienhoff -- a biotech consultant trained in math, medicine and genetics at Harvard, Johns Hopkins and the Fred Hutchinson Cancer Research Center in Seattle -- launched a search.

    Yes, you can do this now. This father is now making transgenic mice with his daughter's mutation to better understand its effects.

    (via Gene Expression)


    Ken Weiss writes about some of the reasons a family medical history is a better predictor of individual health than genotyping: "23andLess".

    The most likely truth at this stage is that such common traits like heart disease or how tall or heavy you are, are determined by a very large number of genes, mostly with individually very small effects. Each person with the 'same' trait--each diabetic, say--has that trait for a different genetic reason. Individual genetic variants may be causal contributors, but they are not very important.

    I agree with his point...although as I was reading the post, it occurred to me that doctors treat family history as if it were much more effective than it should be, if causal variants really have small effect sizes. Complex disorders are not the same as Mendelian disorders with low penetrance. Having a grandfather with heart disease, for example, should mean substantially less to you than having a grandfather who is tall.

  • Without the code, it's hand-waving

    Mon, 2012-09-03 23:03 -- John Hawks

    A new post by C. Titus Brown is worth reading: "Anecdotal science".

    I'm starting to notice that a lot of bioinformatics is anecdotal.

    People publish software that "works for them." But it's not clear what "works" means -- all to often either the exact parameters or the specific evaluation procedure is not provided (and yes, there's a double standard here where experimental methods are considered more important than computational methods).

    This means that their result is not an example of computational science. It's an anecdote.

    He gives an example and discusses the real cost, which is that a published advance really doesn't advance anything, because everyone else has to spend so much time trying to get the code to work for their projects.

    Time after time I'm reminded of my conversation with the big data astronomer, who reflected that his friends who are biologists complain that students are all being trained in computer programming instead of biology. Compared to astronomy, he said, biologists don't have a data problem at all.

    Clearly, bioinformatics isn't taking seriously the need to really engineer software, with documentation and standard programming interfaces.

  • "We find it hard to see what publication would achieve at this stage"

    Mon, 2012-08-27 21:05 -- John Hawks

    Theoretical physicist Terry Rudolph shares a story about preprints and the editorial process at a top science journal: "Guest Post: Terry Rudolph on Nature versus Nurture". In short, there was no problem posting a potentially interesting physics paper on the arXiv, and then getting it reviewed by the journal. But when the authors posted a follow-up preprint, it sabotaged the "interest" of their first submission:

    While it mildly rankles that my own participation in that “wide debate” was curbed by the blurry lines of their own policies, I’m not particularly upset by the episode – perhaps indicative of my well documented own laissez-faire attitude to publishing, but perhaps because I know the result is ultimately more important than the journal it appears in.

    The ironic part is that Nature wrung the news value out of the first preprint with coverage from its news division. Rudolph's story gives the appearance that the journal was happy to promote the work before it accepted the paper, but later claimed it was not newsworthy.

    I don't really have any problem with journals pursuing papers that are newsworthy. My problem is that these journals make papers appear newsworthy by their control of information flow. I've said it before ("The costs of publication delays"): We need to eliminate the myth that publication itself is a newsworthy event.
