john hawks weblog

paleoanthropology, genetics and evolution

data access

  • Public interests in data from federally funded research

    Thu, 2012-01-12 20:20 -- John Hawks

    I submitted the following essay in response to the Request for Information on Public Access to Digital Data Resulting from Federally Funded Research from the National Science and Technology Council's Interagency Working Group on Digital Data.

    This RFI is not the same as the current bill before Congress ("Open access op/ed in NY Times"), which would restrict public access to research articles based on federally funded research. Research articles are a very important issue, but I hope that access to digital data will not be overshadowed by the attention to published results. As a paleoanthropologist, I believe that access to digital data from federally funded research projects is a fundamentally important issue, as I explain below.

    Introduction

    The United States provides grant funding to scientists through many federal programs. This funding advances work of public interest that might not happen without federal assistance.

    The creation of scientific knowledge may serve the public interest directly by enabling useful inventions or supplying actionable information on issues of public importance. A funded project may also serve the public interest indirectly, by (1) finding negative results that prevent wasted effort or public harm; (2) building the scientific infrastructure that enables future discoveries and advances; (3) training new and established scientists in effective research techniques; and (4) enhancing international cooperation and public/private partnerships.

    Congress and the Executive Branch have recognized that access to the published results of scientific research is not sufficient to advance the direct and indirect public interests served by federally funded projects. Facilitating the indirect benefits of research is a major aim of federal agencies' "Broader Impacts" and data access rules. These policies have been a qualified success since their implementation, limited mainly by exceptions that programs and agencies have carved out to exempt certain kinds of data from being reported along with research results.

    I argue that open public access to digital data should be a requirement for all federally funded scientific research. Digital data can be maintained by federal agencies as a part of the reporting requirement of federal grant funding. Doing so will advance the interest of the public and ensure that today's science generates a continuing heritage of research excellence.

    Data access and transparency

    Transparency is essential to public trust. Scientific conclusions are formed by observation and replication, and for this process to be transparent, all data must be available for independent inspection. The possibility of such inspection should not be limited to qualified researchers, because the very existence of special access requirements blocks transparency of the scientific process.

    Changing technology has shifted the public's expectations about transparency. Digital technology enables most research data to be shared rapidly and at low cost. If data are produced in digital form, and digital data can be shared at low cost, researchers and agencies cannot credibly claim that the difficulty of reproducing and disseminating data is a sufficient reason to restrict access. Where no competing interest argues for restricted access (such as human subjects protections), a lack of access to digital data itself can now be a compelling reason for public distrust.

    Therefore, federally funded researchers should release digital data to the public by default. Federal agencies should facilitate this public reporting by requiring digital data to be supplied as part of final project reporting.

    Data access has a well-established record of success

    The recent history of human genetics demonstrates that open access to data has unforeseen benefits that can spawn innovation, support more effective education, and catalyze new discovery. In genetics, both federal and journal policies require release of data; raw data from federally funded projects are often available as they are generated, long before publication.

    My own laboratory has no federal research funding to date, but is actively engaged in research using data from federally funded projects. Today my laboratory trains undergraduate students in genetics with new data from ongoing federally funded genetic projects such as the 1000 Genomes Project. We use open access data from archaic human genomes to investigate the variation of ancient people and their relationships to living humans. This kind of work would be impractical without clearly established open data access policy.

    The open access to data from the Human Genome Project facilitated the rapid development of microarrays that are now used on a broad scale in human genetics to investigate the genetic correlates of human health and disease. Access to data from these studies has enabled other scientists to independently replicate many genetic associations. More important, meta-analysis of such data has shown that many associations cannot be replicated, while also showing some cases in which nonsignificant results across different samples give rise to a significant finding when pooling those samples. Access to negative results and raw data is necessary, in other words, to establish the facts in subsequent research. This goes beyond access to published research results and requires open access to unpublished digital data.
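    The pooling point can be made concrete with a fixed-effect, inverse-variance meta-analysis, one standard way of combining effect estimates across studies. The sketch below is illustrative only: the two effect sizes (think of them as log odds ratios) and their standard errors are invented, not taken from any real association study, and the published meta-analyses mentioned above may have used other methods. Each study alone misses p < 0.05, but the pooled estimate does not.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical effect estimates from two independent samples.
betas = [0.20, 0.18]
ses = [0.12, 0.11]

for b, se in zip(betas, ses):
    print(f"single study: z={b / se:.2f}, p={two_sided_p(b / se):.3f}")

pooled, pooled_se = fixed_effect_meta(betas, ses)
z = pooled / pooled_se
print(f"pooled: beta={pooled:.3f}, z={z:.2f}, p={two_sided_p(z):.3f}")
```

    This only works if the raw estimates and their uncertainties are available for every study, including the ones that were never published as "significant." A real meta-analysis would also have to consider heterogeneity between samples (for example with a random-effects model), which this sketch ignores.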

    Intellectual property protections and data access

    Research data are somewhat distinct from the intellectual property issues relating to research publications. Some kinds of data do not meet the standard of originality necessary for copyright protection, such as sequence data, CT or MRI data, or data from measurement instruments. For raw data from instruments, there is no intellectual property reason why a federal agency should not maintain an open archive for the public.

    Much research data is unquestionably subject to copyright protection, such as lab notebooks, written descriptions, photographs, and original reconstructions. Yet there is still a substantial public and scientific interest in inspecting such data. For example, photographic documentation of archaeological sites and specimens is of particular scientific value and is today routinely produced by digital technologies and stored in digital form. Some primary digital records are unique products that cannot be recreated at another time and place: for example, in situ photographs of specimens, photographs and records of sites before excavation, and digital reconstructions. The scientific record would be incomplete without such contributions, and maintaining an archive of such data over the long term is a difficult task for a single investigator, beyond the scope of a grant term.

    In cases where it is impracticable to obtain Creative Commons or other open licenses to such content, a funding agency should at a minimum require that a copy of all such archival information be deposited with the final project report, together with a limited-use non-commercial license that permits electronic dissemination of these materials to the public as part of the report.

    Metadata and data access

    Many have noted that raw data may be useless in the absence of additional information about how the data were obtained. Such information is known as "metadata". Researchers generate instrumental data using particular instrument settings and recording standards. They gather observational data under particular research protocols. These standards may change quickly as instrumentation, technology, and scientific results themselves demand new practices.
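    A metadata record can be as simple as a structured description stored alongside each data file. The sketch below is a hypothetical example, not any agency's or repository's actual schema: the field names, the instrument, and the settings are assumptions chosen for illustration. The `compatible_with` check is deliberately crude, but it shows the distinction drawn in this section: pooling data across studies requires shared protocols and units, while transparency for a single study only requires that its own record be complete.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class CollectionMetadata:
    """Hypothetical record describing how one dataset was produced."""
    dataset_id: str
    instrument: str                               # scanner, sequencer, etc.
    settings: dict = field(default_factory=dict)  # instrument settings at acquisition
    protocol: str = ""                            # recording or observational protocol
    units: str = ""                               # units of the recorded values

    def compatible_with(self, other: "CollectionMetadata") -> bool:
        """Crude test of whether two datasets could be pooled without reconciliation."""
        return self.protocol == other.protocol and self.units == other.units

scan = CollectionMetadata(
    dataset_id="specimen-001-ct",
    instrument="hypothetical micro-CT scanner",
    settings={"voltage_kV": 120, "voxel_size_mm": 0.05},
    protocol="lab protocol v2",
    units="Hounsfield units",
)

# Serialize the record so it can be archived next to the raw data file.
print(json.dumps(asdict(scan), indent=2))
```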

    Some scientists point to the problem of incompatible metadata as an argument for delaying the establishment of open public access to data. In their view, the public are likely to misunderstand or misuse scientific data where metadata are not clearly indicated. Meta-analyses combining data from multiple research projects are an important secondary use of digital data, and such meta-analyses are impossible when data cannot be reconciled into common observational or instrumental frameworks. Performing original work with data collected in heterogeneous contexts is a research specialty of its own, and is itself sometimes targeted by federal grants.

    However, meta-analysis is only one purpose of data access. Transparency, replicability, and education are central public interests that do not require the reconciliation of data collection methods from multiple studies. They require only clear description of the methods under which data were obtained. At a minimum, final research reports on federally funded projects must describe the standards of data collection with sufficient detail to allow independent replication, including all unpublished results and data.

    Successes of data access in paleoanthropology

    I am an anthropologist, and am most familiar with the scientific data relating to human evolution. These data include genetic observations on living and skeletal samples of humans. They also include fossil and archaeological evidence such as photographs, CT scans, isotopic records, anatomical measurements and descriptions.

    For many years, nearly all genetic data resulting from federally funded research have been made available for public download. Much genetic data generated by non-federally funded research programs, including foreign and domestic institutes, has also been free for public download. These data have resulted in a massive acceleration of research on recent human evolution and human origins. They have also led to unexpected discoveries and a burgeoning contribution of other disciplines to understanding our evolution.

    Data from radiocarbon dating and other isotopic sampling have also been made available to the public. Human occupation sites are among the best sources of evidence about past climates. The investment of federal resources in human evolution research has generated a temporal record that is now essential to studying changes in the faunal and plant compositions of past environments. Free access to records has enabled stronger calibration of radiocarbon dates, the development of a more secure chronology, and a more highly replicable scientific record correlating different regions of the world. Our understanding of such changes is vastly stronger when data are made public.
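    For readers outside the field, the basic arithmetic behind a radiocarbon date is short. By convention, a conventional radiocarbon age is computed from the measured fraction of modern carbon F using the Libby mean life of 8033 years (5568 / ln 2), so that t = -8033 ln F. The sketch below applies that formula with a first-order error propagation. Note that it returns radiocarbon years BP, not calendar years: converting to calendar ages requires calibration against a curve such as IntCal, which is the step that shared data help to improve, and that step is not shown here.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; 5568 / ln(2), the reporting convention for 14C ages

def conventional_radiocarbon_age(fraction_modern: float, sigma: float = 0.0):
    """Return (age in 14C years BP, 1-sigma uncertainty) from fraction modern carbon."""
    if fraction_modern <= 0:
        raise ValueError("fraction modern must be positive")
    age = -LIBBY_MEAN_LIFE * math.log(fraction_modern)
    # First-order propagation: d(age)/dF = -8033 / F
    age_sigma = LIBBY_MEAN_LIFE * sigma / fraction_modern
    return age, age_sigma

# A hypothetical sample retaining a quarter of modern 14C activity.
age, err = conventional_radiocarbon_age(0.25, sigma=0.002)
print(f"{age:.0f} +/- {err:.0f} 14C years BP")
```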

    Institutions and data access in paleoanthropology

    By contrast, CT scans and photographs pertaining to human origins are typically not made freely accessible to the public. The United States funding agencies are not the only parties with an interest in such data. In particular, museums and institutes that curate specimens often permit data collection under agreements that restrict the dissemination of the resulting data. Such agreements may be equated to "non-disclosure agreements" with respect to scientific data.

    An institution has a legitimate interest in controlling the public use of images and access to curated materials. Nevertheless, the lack of access to digital data results in reduplication of effort, overapplication of destructive sampling and measurement techniques, and unnecessary handling of precious and fragile specimens. Where it is practical, the United States should facilitate agreements with institutions that allow the release of digital data produced by public funding. Where release is not possible, funding should be granted only for those activities that will result in the release of data under a limited-use non-commercial license. Non-disclosure of data from instruments such as CT scanners, electron microscopes, or mass spectrometers is incompatible with scientific replication.

    Scientific careers and data access in paleoanthropology

    The economy of federal funding for scientific production sometimes leads to perverse incentives for high-ranking researchers that prevent public access to research data. Some scientists believe that their own future research will require exclusive access to data. Others want to impede research achievements by their academic rivals, or to maintain prestige and future funding opportunities.

    Scientific data in some areas may constitute "trade secrets" until they are protected by patents. Even in noncommercial research, federally funded scientists sometimes claim exclusive ownership over data that they plan to use in future research. In my own field of paleoanthropology, data secrecy supports a clandestine "quid pro quo" economy among researchers, in which established researchers and institutions allow furtive looks at unpublished data, to support and consolidate their power and influence.

    This is a game that the United States should simply decline to play. When federal research supports scientific results that are not subject to independent replication, it betrays the public interest in science.

    Established collaborations and centers of scientific research will always exert a strong influence upon the future of science, irrespective of federal data access policies. But established players should not use federal funding to construct barriers to open inquiry.

    Conclusion

    Open public access to data is one indication that a research project is following scientific principles. Making digital data available to the public would be good practice for any researcher, irrespective of funding source. Data access mitigates the risk that negative data will be unreported. Data access facilitates broader stewardship of research projects, in particular where collaborations create data that are distributed across many institutions. Data access and reporting standards enable other researchers to fill in for those who cannot complete a scientific project due to health or other personal reasons.

    Federal grant agencies already have successful repositories for many kinds of digital data. Such data are shared with the public at minimal cost relative to the overall budget for federal research grants. Supporting digital data repositories has itself been an important granting aim for several federal agencies and continues to be an active part of scientific infrastructure. Limiting such repositories to the exclusive use of a small cadre of researchers is enormously wasteful of resources when they can be opened to an interested public at a small incremental cost.

    The public has repeatedly invented surprising uses for digital data that can complement or enhance the scientific record. But much more important, open access to digital data serves the scientific values of transparency and independent replication, essential to maintaining public trust and investment in the research enterprise.

    Synopsis: 
    My response to a federal Request for Information on the topic of digital data access to federally funded research
  • Ecologists against public access to peer reviewed publications

    Fri, 2012-01-06 14:59 -- John Hawks

    This seems incredible, from Jonathan Eisen: "YHGTBFKM: Ecological Society of America letter regarding #OpenAccess is disturbing".

    Wow -- I am really disturbed by the letter the Ecological Society of America (ESA) has written to the White House OSTP in regard to Open Access publishing.

    ...

    So - the justification here for not making ecological articles available is that they are MORE important over time? So the taxpayers pays for research that is valuable and because it is valuable over time we should make it less freely available? Seriously?

    This next week is an important one for proponents of open access publication and data access, as the White House Office of Science and Technology Policy has requested public comments related to both these issues for federally funded research. I will be posting my letter about data access when I complete it this weekend. I encourage everyone to pay attention and submit a letter if possible. It is dismaying to see professional scientific societies take public stands against making their members' research available.

  • Will monographs arise from the dead, or eat our brains?

    Sat, 2011-10-01 21:26 -- John Hawks

    Inside Higher Ed reviews and interviews an author who argues that the scholarly monograph shackles academics to an obsolete model of communication:

    So it is strategic that Kathleen Fitzpatrick, director of scholarly communication at the Modern Language Association and a professor of media studies at Pomona College, invokes the living dead early to illustrate her argument in Planned Obsolescence: Publishing, Technology, and the Future of the Academy (NYU Press). The scholarly press book, she writes, “is no longer a viable mode of communication … [yet] it is, in many fields, still required in order to get tenure. If anything, the scholarly monograph isn’t dead; it is undead."

    I agree with this thesis in part. Sixty-dollar monographs are going the way of the thylacine. Locking scholarly content in the tall stacks of university libraries doesn't disseminate it. Peer review no longer improves work to the extent that it's worth locking it up in response. It is ridiculous for anyone to judge the quality of a young scholar's work by the imprint of a "prestigious" academic press. Tenure committees have simply delegated their responsibilities to editors, and the editors do a poor job.

    But I disagree that the scholarly monograph is dead. Personally, I expect monographs to undergo a renaissance as more academics adopt e-publishing. Academic presses affiliated with universities should be going all-digital, and should start massively promoting their back catalogs as e-books at fire-sale prices. The smart ones will take the opportunity to change their agenda, competing to publish new books by a new generation of scholars who are building a broad readership both inside and outside academia. There's no reason why we need to constrain our scholarship to books so boring that nobody wants to read them. Tomorrow's scholars should be engaging with a much broader public than university presses have historically cultivated.

    The stumbling block is that these books still must serve as a guide to the academic quality of young scholars' work. On this count, Fitzpatrick provides some useful ideas about how to build quality scholarship under a more collaborative model:

    The way to make this work, Fitzpatrick says, is to change the currency of scholarly communications from paper to credit. Instead of rewarding faculty for getting a lot of paper published, universities should consider how helpful tenure candidates have been in parsing other people’s articles written and helping others refine their ideas, she says. Journals could help out with this by creating “trust metrics” that cede more weight to academics who consistently give constructive feedback. They could also encourage frequent, thoughtful reviews by making them prerequisites for publishing one’s own work — thus attracting the sort of critical mass of reviewers that Fitzpatrick argues is necessary for successful peer-to-peer review (and which some previous high-profile experiments with the model failed to get).

    Under such a system, faculty members could glide to tenure on the wings of their reputations as positive contributors to the advancement of knowledge in their field — a metric the current “publish-or-perish” model does not adequately represent, Fitzpatrick says. “Little in graduate school or on the tenure track inculcates helpfulness,” she writes, “and in fact much militates against it.”

    Obviously I think this model would be better than our current one. Still, I worry about the actual assignment of credit. Quite frankly, all my writing here has done wonders for my influence, but has had a substantial drawback: Many of my ideas are used by other scholars without credit or citation. We compete for research support, and in that competition I get no credit or acknowledgement whatsoever for any contributions I make. That's a cost I've been willing to pay for what I do, but if we expect more young academics to share their ideas broadly, we're going to need to change the culture of research funding to recognize their contributions appropriately.

    My favorite part of the interview is the last question, which asked Fitzpatrick to give advice about new models of publication to a junior faculty member, librarian, and university provost, respectively.

    Finally, to the provost: understand that scholarly communication is a core responsibility of the university – so fundamental to the university mission, in fact, that it must be thought of as part of the institution’s infrastructure, not as a revenue center. And every university must develop some kind of plan for scholarly communication. If you leave disseminating the work of your faculty exclusively to corporate publishers, corporations will profit from it at your institution’s expense. Instead, invest in the structures that will get your faculty’s work into broader circulation – not least because those structures will help you make clear to the concerned public why the university continues to matter today.

    I'm going to append to this post the first link to my entry in the Anthropologies project: "What's wrong with anthropology?" where I discuss my own perspective on these problems. Needless to say, I think things need to change. I expect the change in scholarly communication to be highly specific to each academic field, as what works for cultural anthropology will not be the same as what works for genetics or English. But new approaches will be digital, and that means a university may be able to support many more approaches than is possible with print. The tools to support varied forms are already available; if universities would support and extend them, they could meet much of the need for academic communication.

    Synopsis: 
    Making academic writing relevant means abandoning the monograph, says a specialist.
  • The great world CT-scanning tour

    Fri, 2011-09-16 22:24 -- John Hawks

    The international version of Der Spiegel is running an English-language profile of the traveling CT-scan project from Jean-Jacques Hublin and the Max Planck Institute for Evolutionary Anthropology: "German Scientists Bring Fossils into the Computer Age"

    To show just what the future holds for his field, Hublin crossed the back courtyard of the anatomy institute in Tel Aviv. There, next to the dumpsters, stands a 20-foot (6-meter) container that the Israeli technicians like to smoke behind. The box's exterior gives no hint that it holds a laboratory on prehistoric man unlike any other one in the world.

    This is a topic that should be followed closely by anyone interested in paleoanthropology's future. The article seems to imply that the data are being made freely available, but of course they are not. I am confident that, in the future, all data like these will be openly available, as they are now made routinely available in other fields of science. But for the time being, our field is one of the exceptions, and the closed nature of the data is a serious impediment given the great challenges we face educating the public about human evolution.

    The Spiegel article sets up the politics as a confrontation between Hublin and museum curators:

    Until now, Hublin says, it was usual to handle fossils from the dawn of mankind "like relics or national treasures." Under these circumstances, curators assumed the role of keepers of the Grail.

    In this way, curators were holding on the reins of scientific power. After all, it is vital for researchers to have access to the fossils. "Whoever is denied (this access) will never get anywhere," Hublin says.

    A New Era for Research

    Indeed, Hublin believes having a virtual fossil archive could herald the end of this system. He sees his work as boosting accessibility to the objects and says curators "are afraid of losing control."

    In my experience, the article's frame is overly simplistic. Scans aren't open unless the people who have them make them open. Believe me, if there were a lot of open scans out there, I'd be posting visualizations here on the weblog. Obviously people use funding and position to compete for prestige and control, and their strategies depend on the resources under their charge. When curators or institutions give permission to scan, it becomes a contractual matter. A foreign researcher coming to scan may demand a period of exclusivity; an institution might demand some meaningful local involvement in the research. The ultimate disposition of the data may be of little importance to either party relative to their more immediate needs. I am familiar with cases where scan data were never returned to the institution, despite promises of access, and other cases where institutions have refused to allow scanning because they objected to a long exclusivity period for the scanning team.

    Fossil remains of our ancestors and relatives are national treasures; indeed, even more broadly, they are pieces of world heritage. We have the technology today to bring those extraordinary objects to everyone in the world. So I think it's a great shame that the politics of science continues to obscure our fossil record.

    Synopsis: 
    Der Spiegel profiles the Max-Planck CT-scanning trek to Israel, raising the politics of data access.
  • Floating on the data

    Mon, 2011-08-22 12:19 -- John Hawks

    Technology Review reports on a recent conference trying to spread data mining techniques. The point of departure is the growth of electronic sensor networks in industry and online social media information: "The New Big Data".

    People have been working with graphs of data for hundreds of years, but the graphs now being plotted from social networks or sensor networks are of an unprecedented scale, Apte says. "These are gigantic graphs," he says. "You're talking about millions of nodes and tens of millions of links."

    Dealing with graphs of that size and scope, and applying modern analytic tools to them, calls for better algorithms and other innovations.

    I'm dealing here with genetic data networks, which are rapidly becoming denser, and we're beginning to apply these kinds of network methods to understand them. Once you pass beyond the analysis of a single locus and spread the data across the whole genome, it becomes necessary to go beyond a single tree, to understand the relationships (and commonalities) among genealogical networks that connect people with each other. In some ways, this shares more with epidemiological modeling than with traditional genetics.
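    As a toy illustration of moving from single loci to a network, the sketch below links individuals whose genotypes are similar across many loci and then finds connected groups with a breadth-first search. It uses identity-by-state (IBS) similarity as a simple stand-in. Genealogical network methods of the kind described above work with shared haplotype segments and inferred trees, which this does not attempt. The genotypes, the 0/1/2 coding (copies of one allele), and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def ibs_similarity(g1, g2):
    """Mean identity-by-state across loci; genotypes coded as 0/1/2 allele copies."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))

def sharing_network(genotypes, threshold):
    """Undirected graph linking individuals with IBS similarity >= threshold."""
    graph = defaultdict(set)
    for a, b in combinations(genotypes, 2):
        if ibs_similarity(genotypes[a], genotypes[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def components(graph, nodes):
    """Connected components found by breadth-first search."""
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(sorted(group))
    return groups

# Hypothetical genotypes for four individuals at six loci.
genotypes = {
    "A": [0, 1, 2, 1, 0, 2],
    "B": [0, 1, 2, 1, 1, 2],
    "C": [2, 1, 0, 1, 2, 0],
    "D": [2, 1, 0, 0, 2, 0],
}
net = sharing_network(genotypes, threshold=0.9)
print(components(net, genotypes))
```

    At genome scale the same idea runs into exactly the problem the Technology Review piece describes: the number of pairwise comparisons grows with the square of the sample size, so real analyses need smarter algorithms than this all-pairs loop.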

  • Goodall record digitization

    Mon, 2011-03-28 22:05 -- John Hawks

    Jason Goldman covers the acquisition of Gombe chimpanzee records from the Jane Goodall Institute by Duke University ("Digitizing Jane Goodall's legacy at Duke").

    Now, researchers at Duke University are taking more than twenty file-cabinets full with fifty years of check-sheets, longhand narratives in both English and Swahili, hand-drawn maps, videos, and photos, and carefully digitizing everything. This will allow researchers to construct searchable life-histories of the chimpanzees of Gombe, for the first time. The word "archives" is a bit misleading, though. The new Jane Goodall Institute Research Center at Duke is continuing to receive new data from Gombe, which will all become digitized and included in the collection as well.

    The move toward digitizing and making primate field records available has been a major challenge for primatology. Different research teams have legacies of partially incompatible records, which complicates the process of comparing data from different sites and different species. My UW-Madison colleague Karen Strier together with many of the leading figures in primate field research have been involved for several years in an effort to bring life history records from different primate species together. One of the first tangible results of the collaboration is a paper that appeared earlier this month in Science by Anne Bronikowski and colleagues [1].

    Seems to me that this kind of archiving is absolutely essential to our ability to study primate behavior in the future. Not least, data archives will be necessary to document the effect of range contractions and habitat fragmentation on primate behavior. Openness is difficult to negotiate in these contexts, because of the long-term effort put into data collection. But in thirty years, these archives will not be useful unless they are extended and put into accord with formats that are widely used. Goldman describes the idiosyncrasies of Goodall's data, and many other field projects have similar traditions that differ from each other. Without building a larger community capable of understanding these records, the data may be as useful as WordStar files from 1981.


    References

  • Genomes unzipped, unzipped

    Mon, 2010-10-11 14:49 -- John Hawks

    Genomes Unzipped has finally unzipped:

    From today, we’ll be making all of our raw genetic data and the reports generated from these tests freely available online. As the project proceeds, we aim to obtain data from an ever larger array of tests – ultimately extending to whole-genome sequencing – and release it openly. Right now you can freely download the 23andMe data from everyone in the project from this website.

    It's a great project, putting personal stories and reactions together with a scientific view on genotype data. It's also the perfect topic for a blog -- just the right amount of navel-gazing. It's worth doing just to make you figure out how to use the browser software.

    What I wonder is, how much will personal genomics be like nude beaches? I mean, it's been a long time since the first nude beaches, but most people don't take advantage of the opportunity. Clearly, there's variation in different countries! But most people neither feel compelled to see others' data nor feel comfortable sharing their own.

    Well, they used the word unzipped, not me!

  • NSF to require data access plan

    Thu, 2010-05-06 12:14 -- John Hawks

    Science Insider reports that the National Science Foundation is going to make a "data management plan" a requirement of every grant application.

    NSF's current policy requires grantees to share their data within a reasonable length of time so long as the cost is modest. "That's nice, but it doesn't have much teeth," said Seidel. Under the new policy, which is expected to be unveiled this fall, a researcher would submit a data management plan as a two-page supplement to any regular grant proposal. That would make it an element of the merit review process.

    NSF wants to avoid a one-size-fits-all approach to the issue, Seidel explained, because each discipline has its own culture about data-sharing. "A scientist might say that my plan is that I don't need one, because I don't save my data," he told the board committee, which has just formed a task force on data policy. "The important thing is that it puts people on notice that they have to think about it, maybe for the first time."

    It sounds to me like it still doesn't "have much teeth." The kind of scientist he describes, who "doesn't need a plan", doesn't need any federal money, either.

    I mean, seriously -- they're going to "put people on notice that they have to think about it"? Give me a break.

  • Online toolkits -- the good and the frustrating

    Sun, 2010-02-21 14:51 -- John Hawks

    In pursuit of my DIY genomics posts, I've been playing around with the Galaxy bioinformatics web tools. The team responsible for the South African genomes published the data to Galaxy, and their uploads are easy to get -- either to download, or to work with the online Galaxy platform.

    Working with a resource like this illustrates both how tremendously useful bioinformatics tools can be and how frustrating it can be to figure them out. Some things are a breeze; others are completely obscure. Documentation for the uploads is skimpy so far. One thing that drove me up the wall is that SNPs are listed by genome, but without genotypes -- is the individual a homozygote or a heterozygote? The paper by Schuster and colleagues describes their genotype calling procedure, but the calls turn out not to be posted along with the rest of the data. I'm sure they'll become available as the data are updated, but I did waste some time figuring out how the releases correspond to the descriptions in the paper's online supplementary material.
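    To make the missing-genotype problem concrete: a list of variant sites tells you where an individual differs from the reference, but not whether they carry one or two copies of the variant allele. The Python sketch below assumes a hypothetical record format (the `SnpCall` class and its field names are illustrative only, not the format of the actual Galaxy uploads) and shows that without a genotype field, zygosity simply cannot be determined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpCall:
    """Hypothetical SNP record; field names are illustrative, not Galaxy's format."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: Optional[str] = None  # e.g. "A/G"; None when only the variant site is listed


def zygosity(call: SnpCall) -> str:
    """Classify a call as homozygote or heterozygote, if a genotype is present."""
    if call.genotype is None:
        # The site is listed as variant, but nothing says how many copies
        # of the alternate allele this individual carries.
        return "unknown"
    allele1, allele2 = call.genotype.split("/")
    return "homozygote" if allele1 == allele2 else "heterozygote"


if __name__ == "__main__":
    print(zygosity(SnpCall("chr1", 1000, "A", "G")))          # site only
    print(zygosity(SnpCall("chr1", 1000, "A", "G", "A/G")))   # with genotype
    print(zygosity(SnpCall("chr1", 1000, "A", "G", "G/G")))   # with genotype
```

    The first record, a variant site with no genotype attached, is the situation described above: the analysis can't proceed until the genotype calls are published alongside the sites.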

    Despite occasional frustrations, we seem to be heading in the direction of all-in-one online bioinformatics toolkits. Galaxy, for example, lists several advantages on a promo page. A couple of entries:

    Now your results are reproducible! | When publishing results, replace “the data were analyzed using a collection of in-house scripts” with a URL pointing to Galaxy’s history. Your reviewers will have no further questions. That’s reproducible genomics!

    ...

    No tools for new datatypes | Some datatypes generated by high throughput genomics are so new that there are no tools to analyze them. For example, how do you extract sequences of coding exons from the latest 28-way alignments of vertebrate genomes or analyze quality scores from 454/Solexa/SOLiD? With Galaxy.

    I live at the mathematical end of this stuff. I work with models of populations and assume that sequences are known, you know, as if we looked at them and read off the ACGTs. But in reality, a lot of complexity lies between the models and the biochemistry. Going from sequencing reads to genomes, and then to aligned genomes, involves a lot of analysis, and many of the details differ entirely between sequencing platforms. As we continue to move toward whole-genome analyses of populations and other species, it's really important to have an abstraction that accommodates different underlying sequencing models while still allowing the population genetics modeling to be replicated.

    The disadvantage of a single widely used tool is that it can limit creativity and lock people into a certain way of processing data. Locked-in assumptions sometimes lead to wrong conclusions -- as we've seen in human genetics many times over the years. But the advantage is that it allows everybody access to the same methods and data, so that results can be replicated and augmented with new observations.

  • NAS president calls for data sharing

    Sat, 2010-02-06 22:58 -- John Hawks

    Science has a one-page editorial by National Academy of Sciences President Ralph Cicerone. He alludes to the climate change scandals of the last few months and points to a significant resulting loss of public confidence in science:

    In the wake of the [University of East Anglia] controversy, I have been contacted by many U.S. and world leaders in science, business, and government. Their assessments and those from various editorials, added to results from scattered public opinion polls, suggest that public opinion has moved toward the view that scientists often try to suppress alternative hypotheses and ideas and that scientists will withhold data and try to manipulate some aspects of peer review to prevent dissent. This view reflects the fragile nature of trust between science and society, demonstrating that the perceived misbehavior of even a few scientists can diminish the credibility of science as a whole.

    Cicerone argues that scientists need to shape up. The only way to maintain confidence in the scientific enterprise is to establish "clarity and transparency":

    Clarity and transparency must be reinforced to build and maintain trust—internal and external—in science. Scientists are taught to describe experiments, data, and calculations fully so that other scientists can replicate the research. Last year, the Committee on Science, Engineering, and Public Policy (COSEPUP) of the National Academy of Sciences (NAS), National Academy of Engineering, and Institute of Medicine put forth a framework for dealing with research data, emphasizing that "Research data, methods and other information integral to publicly reported results should be publicly accessible." Some journals have established policies that require the sharing of materials and data. However, post-publication complaints regarding data sharing persist. Despite many efforts, the scientific community has failed to uniformly integrate these standards into their practices.

    Access to data may not be enough. In the case of climate research, open access to models and software is equally important -- otherwise, results are not replicable. This means grant agencies must give greater support to public accessibility and publication of research methods, including software archives.

    It also means that data sharing policies must have some teeth in them. At a minimum, funding renewal should be contingent on meeting the data sharing guidelines proposed in the grant application. In 2010, there is no reason in the world why data cannot be deposited with third-party archives and downloaded freely from them, so that scientists do not feel "harassed" by requests for information.

    References:

    Cicerone RJ. 2010. Ensuring integrity in science. Science 327:624. doi:10.1126/science.1187612
