NSF and data access

Mark Weiss from NSF appeared at the AAPA business meeting to discuss recent changes in the funding guidelines from the Physical Anthropology program. The most significant change, effective in the upcoming (July 2005) funding cycle, is the requirement to file and follow a data access plan with every grant. This change is the NSF response to the questionnaire circulated last year among physical anthropologists and archaeologists. It follows policy changes at the top levels of NSF, ultimately initiated by the Clinton and Bush administrations toward greater openness of publicly funded research data and protocols.

From the Physical Anthropology grant information page:

NSF is committed to the principle that research supported with public funds should be made widely available. Under NSF's data sharing policy, the Foundation expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections, and other supporting materials created or gathered in the course of the work. To implement that policy in ways appropriate to Physical Anthropology and Archaeology, beginning July 1, 2005 these Programs will require that all proposals include a one-page detailed description of the applicant's data access plan in the "Supplementary Documents" section. This page will be in addition to the standard 15-page project description. Applications lacking this statement will not be reviewed. The Programs realize that individual cases may differ widely and recognize that any absolute timeline or rigid set of rules is not possible. They also recognize that revision and adjustment may often be required as the work proceeds. The data access plan, however, will be considered an integral part of the project and therefore subject to reviewer and panel evaluation. Major departure from it will constitute a significant project change and require NSF approval. Successful applicants will be required to address this issue in every progress and final report. PIs on all awards made under these guidelines will be expected to discuss implementation of their plans in the "Results of Prior Research" section when they submit subsequent applications.

To me, this appears to be a good compromise between the different positions on data access. Some researchers would prefer to have casts, photographs, and measurements of specimens become publicly available (without restrictions) immediately after they are published. Others (including a subset of primary excavators) would prefer to limit access to photos and data until after a full monographic treatment of the specimens is published. There are good arguments on both sides.

In favor of limiting access, specimens are rare and fragile, and access to them should be carefully limited to preserve them. The skills required to prepare fossil specimens are rare, and they must be cultivated in long-term research projects. The only way that such projects can survive is if they can maximize the impact of their most important finds, and this means controlling the publication of pictures, limiting the creation and distribution of casts, and promoting students of the principal investigators as groundbreakers making new and important discoveries. If such research projects had to make their data public immediately, there would be no incentive for them to continue their work.

But even those who are in favor of limiting access to fossil specimens must recognize that the situation in paleoanthropology today does not benefit them. There are very few publicly accessible datasets. Even pure electronic data for which analysis has been published and for which the cost of transmission is negligible, such as CT data, generally cannot be had. There are exceptions: researchers who provide data either for sale or for free, and from the bottom of my heart I thank them for their choice to better the science. Their choice is all the more laudable, because the situation at present has created absolute disincentives to share data. At present, closing access is the only way to punish freeriders who fail to share data themselves. And commoditizing data and casts can be the only way to get valuable data out of other researchers.

And in my opinion, the issue of access to new fossil hominids has received an unwarranted share of the attention. Ann Gibbons' 2002 article, "Glasnost for hominids," is an excellent treatment, but it only scratches the periosteum of the problem. If the only problem with access to specimens were that only a few people could see something until ten years after it was unearthed, that would be bad, but still much better than the situation as it stands.

The real problem is that twenty to thirty years after many fossils are uncovered, there is no cast availability, little public data access, and few financial accommodations to make such access possible. Specialists like me often find ways around these barriers. But I do not think it would be overstating the problem to suggest that perhaps half the people teaching human evolution in four-year universities have never touched a cast of a Hadar fossil. I would be delighted to be proved wrong, but I don't think I am. Our field is educating students into a world in which A. afarensis is unknown in the laboratory and poorly represented in our textbooks. I'm not talking about new specimens here; I'm talking about fossils that were found in the mid-1970s and monographed in 1982. Nor is this problem limited to early hominids. What proportion of people teaching about the modern human origins problem do you suppose have seen a cast of any "early modern" fossil other than Skhul 5?

One may object that this kind of teaching effort really isn't the same thing as primary research, and one would be right. But I am one who thinks that teaching is essential to my research. And I see it the same way as my high school band teacher: you can't have a good high school band without a good junior high and grade school band program. We can't train competent professionals without a strong undergraduate training, and the undergraduate training of our professional paleoanthropologists is a lot more varied than the graduate programs. Unless we strengthen the broad base of the field, we have little hope of strengthening its research depth.

And the fact is that primary paleoanthropological research is no longer the province of a few dozen professionals. The field is increasingly interdisciplinary, involving hundreds of people with no expertise in anatomy at all. The fossil record is an afterthought to many of these people, and it is our task to continue to show its relevance. We can't do this without the tools.

Righting the paleoanthropology ecosystem

In this sense, the current ecosystem in paleoanthropology is dysfunctional, and the problem of data access has had a negative impact on the quality of science in the field.

New specimens are a bottleneck in paleoanthropological research. The pace of research is positively limited by the rarity of fossils. This bottleneck has several consequences, including the complete absence of research on some topics that are poorly addressed by fossils, the high citation rates of initial announcements of fossil discoveries, and a funding structure that privileges field research leading to new discoveries. Because this bottleneck is so acute, a naive observer may confuse it for the entire field.

But except for this one part, paleoanthropology as a whole is a normal part of evolutionary biology. Like other parts of biology, ours is a comparative science in which all competent work depends on thorough procedural knowledge of evolutionary theory and factual knowledge of comparative samples, such as extant apes, humans, and other primates.

Even most paleoanthropologists do not themselves recognize the breadth of their field. There is a tendency to see the field as an unstable ecosystem, in which a very small number of primary producers (who find new sites and excavate and prepare fossils) support a huge number of consumers:

The classical ecological pyramid has a broad base of primary producers, with increasingly smaller numbers of secondary and tertiary consumers. Modern paleoanthropology, however, is like an inverted ecological pyramid. Armchair commentators abound. Actual producers of fossil data are increasingly rare. But boosting the number of producers is not feasible because so few professionals have the requisite specialized skills. Even fewer are qualified to teach them. The production of primary paleoanthropological data requires physical search, discovery, extraction, dating, contextualizing, preparation, photography, molding, analysis, writeup, and publication. The process now takes years of work by large coordinated teams (White 2000:289).

Tim White is one of the premier fieldworkers in the discipline, and it is not surprising that he should display a fossil-centric view of the field. But is it really true that we have nothing of value besides the fossils; that they are the only "product" we deal in? Are the rest of us really nothing more than jackals nipping at his heels?

I would propose an alternative model of our ecosystem. Rather than privileging the mere objects that fossils actually are, I would privilege the knowledge that we gain about human origins from them. Fossils are far from the only source of this knowledge. Indeed, all the knowledge that we obtain from fossils ultimately comes from comparing those ancient fragmentary remains with the more complete comparative samples of extant species, not to mention their rich genetic, behavioral, and soft-tissue morphological record. Even modelers and mathematicians, like myself, wring data out of fossils that ultimately do not inhere in the bones themselves but in their relationship with other specimens and species.

Left to itself, this work is steady and vegetative. We produce observations, comparisons, hypotheses, and ultimately evolutionary theory. We travel, we study specimens, we present our work to public audiences and to groups of our peers for scrutiny and comment. And this open process helps us to make our knowledge better. Without a single fossil, this body of theory would be left sorely wanting for accuracy, but it would exist nonetheless and would nonetheless be the most valuable evidence for our evolution that we have. Just as Darwin's Descent of Man preceded all but the Neandertals, our work today precedes the next hundred years of fossil discoveries and awaits testing in light of them.

Those of us familiar with this kind of work tend to call it not "armchair commentating" but instead "critical thinking." We train our students in it, and work to make them knowledge producers as well. We socialize them that the best way to succeed in the real world is to share data and to play well with others. And we hope they won't get burned in their first encounter with a real predator.

Our field has its T. rex and the like. The activity of these top predators is spastic and episodic. When they roar, presenting us with a new precious relic, much of the field cowers and prays that we don't have to relearn everything from our graduate training that the new fossil makes obsolete. These carnivores devour comparative biology, for their fossils have little relevance outside its context. Newton called it "standing on the shoulders of giants," but sometimes it seems more like Spinal Tap dwarves trodding on a tiny Stonehenge.

Most of us recognize that new fossils are more than bludgeons to beat away the jackals. They are the only tests that many of our hypotheses can ever hope to have. And I don't see anything to be gained in classing part of our science as highly important and another part as irrelevant or worse. The fact is that all of us work with each other's data and conclusions. Some of us have established barriers to make that process more difficult. All of us deal with the same bottleneck of fossil evidence, but for many of us that bottleneck is a mere inconvenience, while for others it is the crook used to lever an entire career.

It is fitting and just that the acquisition of new fossils should be a high funding priority, if not the highest. This bottleneck prevents progress, and we should do anything in our power to alleviate it. But high funding for new field research does not imply that access should not be more open.

Closed access unnecessarily impedes progress in other areas that might otherwise be made. The present situation is unstable, and I see these critical problems:

  1. The slow reporting of specimens and failure to share casts and data slow research on some important topics, limiting them to a small cadre of researchers. As an extreme example, no study of the energetics of the earliest bipeds is now possible, because many major specimens currently exist without having been reported, and none of the people working on them specialize in energetics. But more practically, only two years ago it was reported that only a single person had seen all of the then-extant evidence for Miocene hominids (Gibbons 2002). How can a field progress when so few people are in a position to review its data? If these people review each other's papers (because they are the only recognized experts), then how can any of us have confidence in their rigor?
  2. Studies published on inaccessible fossils are not replicable. Suppose that someone publishes the energetics of the earliest bipeds, using measurements from new specimens. Certainly anyone reading this research can run the same measurements through the equations, but how can they be sure that the measurements are accurate or relevant, without examining the fossils or reconstructions themselves? This is the current situation with Sahelanthropus and its CT reconstruction, for instance: the publication exists, but is not replicable because access does not exist.
  3. Students who can study inaccessible fossils can trade on this knowledge to promote themselves. Now, I don't think there's anything wrong with self-promotion; after all, jobs are scarce. But quality of access has increasingly become confused with quality of training. Ideally, a student will have both. Paleoanthropology is a comparative science, and extensive experience with comparative samples such as extant apes is needed for any competent research. To the extent that some students exploit the fossil bottleneck to leverage greater visibility, the quality of training expected of new hires is diminished.
  4. Casts are generally inaccessible. Despite the current ubiquity of CT scanning of fossils and creation of stereolith casts, even these cannot be purchased. All of the problems above would be less pressing if there were some assurance that eventually all qualified researchers would have access to casts and scans. But when an initial description, peer-reviewed by only friendly colleagues, stands for decades without reanalysis because of the lack of access, a mistake that shouldn't occupy more than five pages in a dissertation ultimately bends the course of the discipline for years.
  5. Most important, public support for our discipline depends on its perception in a country where a majority of people don't believe that humans evolved. Those arrayed against us argue that new fossils are hidden away and not studied by the scientific process of peer review. They argue that many human fossils are manufactured, and that there are no guarantees that they are not the product of a small group of scientists with an anti-creationist agenda. As long as we do not open access to the primary evidence of human evolution, these criticisms are not only damaging, as far as the nonspecialist public is concerned they are also valid. We do nothing but damage the profession when we fail to share the products of our research as freely as possible, not only with each other, but with humanity.

Will the policy work?

To the extent that new grants will make data more available, will encourage the spread of CT scans of fossils, and will help to spread photos and observations of new discoveries to the public, I think the data access policy will be helpful. I think there may be nothing to be done about the availability of casts, as long as museums control their reproduction. I respect and value the work of all museums who conserve fossil remains, but they are not set up for widespread public sale of fossil replicas. And a commercial solution will have little incentive to reproduce rare fossils that are not part of the central story of evolution. In my opinion, the most important aspect of data access is to increase the effectiveness of peer review and to guarantee replicability of research. For these goals, I think the new policy has a maximal chance of success.

Of course the real test of the value of the new policy is to see whether grants start to be declined on the basis of data access restrictions. As I read it, this new policy basically sets the clock at zero. There is no condition that specifies that previously funded work should be made public, and no effective means of pressure to create a situation favoring the sharing of old data and specimens. There are now specimens that have been out of the ground for thirty years that cannot be studied. There are hominid specimens that have been out of the ground for ten years or longer that remain undescribed. This situation will not change.

If the new policy is to be a success, then the proof of it cannot wait for ten to thirty years. It needs teeth. It needs two or three high-profile grants to be declined because of data access issues. And it needs those cases to be made public, so that everyone can have confidence in the openness of the process. This doesn't mean that the names of the applicants and their alleged sharing violations should be dragged through the press. It does mean that NSF should publish the number of grants (and their proposed funding amounts) declined for failings in the data access plan.

But more importantly, it needs replication among other granting agencies. A large set of molecular anthropologists have just shown their willingness to completely forgo public funding, in order to maintain certain kinds of controls (in this case ethical ones) over their research (see Genographic Project). Will paleoanthropologists do the same? It would be helpful if some of the important private foundations, such as the National Geographic Society, the Leakey Foundation, Wenner-Gren, and others would establish data access provisions also.

Another helpful idea would be for one of these foundations to establish a data bank. Notice what is missing in the NSF policy is any discussion of a data archive. Other areas of NSF and NIH have such archives and maintain policies of mandatory deposition of data. This is most prominent for genetics, with the GenBank archive and journal publication of most results conditional on mandatory submission of data to the archive. Thus, there is no logical impediment to the creation of such a resource by a federal agency. The fact that they chose not to implement such a policy, I find significant.

Among other considerations, this choice probably depended upon discussions with museums and governmental agencies in other countries, who are the conservators and permit-granters for most fossil research. There are good reasons for the U.S. government not to compromise the activities of international museums by making public images, casts, and CT data of their fossils. On the other hand, much money and effort could be saved if such an archive were available, and it would increase the quality of published science by increasing sample sizes and the consistency of measurements and estimates. It would also help conserve the fossils by protecting them from the investigators themselves. Non-governmental agencies are probably the best sources for such a centralized archive because they may have more ability to work directly with multinational sources to broker a solution. In my opinion, such an archive would be more important and would have a more positive scientific effect than five years of ordinary research funding for such an organization.

Not so long ago, Wenner-Gren was the principal international sponsor of cast production. There is no logical reason why it or some other foundation could not be again.

Final thoughts

This turned into more of an essay than I really intended, but it is a subject that I think all of us are strongly invested in. The issues at stake are what kind of science we want to have, and how we want to limit access to its findings. I believe that our research should be as public as possible. I think that openness leads to better science, and I think that restrictions to access only make us suffer at the hands of those who wish us ill. I hope that this new policy will lead to more conversations about the future of the field. I will be most pleased if I can play some role in moving those conversations forward.

References:

Gibbons A. 2002. Glasnost for hominids: seeking access to fossils. Science 297:1464-1468.

White TD. 2000. A view on the science: physical anthropology at the millennium. Am J Phys Anthropol 113:287-292.