Brett Smith for redOrbit.com – Your Universe Online
While scientific study results from decades ago are only a mouse click away, 80 percent of study data sets are lost within two decades of publication, according to a new study from a team of British and Canadian researchers.
Published on Thursday in the journal Current Biology, the study was based on an attempt to collect the original research data from more than 510 randomly chosen ecology studies published between 1991 and 2011. The study team found that every data set was available two years after publication. However, the odds of tracking down a data set fell by 17 percent per year after that.
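A 17 percent yearly drop in odds is not the same as a 17 percent yearly drop in the share of data sets that can be found, because odds are the ratio of the chance of success to the chance of failure. The short Python sketch below illustrates the arithmetic only. The function names and the starting probability of 0.9 are hypothetical choices made for illustration and do not come from the study, which reported that all data sets were available at the two-year mark.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back into a probability p."""
    return odds / (1.0 + odds)


def probability_after(years: int, start_probability: float,
                      yearly_odds_drop: float = 0.17) -> float:
    """Probability of finding a data set after `years` of odds decline.

    `start_probability` is a hypothetical baseline strictly between 0 and 1;
    `yearly_odds_drop` is the 17 percent figure quoted in the article.
    """
    start_odds = start_probability / (1.0 - start_probability)
    # The odds shrink by the same factor every year, so the decline compounds.
    odds = start_odds * (1.0 - yearly_odds_drop) ** years
    return odds_to_probability(odds)


if __name__ == "__main__":
    # Hypothetical baseline: 90 percent of data sets obtainable at the start.
    for years in (0, 5, 10, 18):
        print(years, round(probability_after(years, 0.9), 3))
```

Under these assumptions the probability falls slowly at first and faster later, which is how a steady decline in odds can leave most data sets unavailable after two decades.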
“The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes,” said study author Timothy Vines, an evolutionary ecologist at the University of British Columbia. “I don’t think anybody expects to easily obtain data from a 50-year-old paper, but to find that almost all the datasets are gone at 20 years was a bit of a surprise.”
“Most of the time, researchers said ‘it’s probably in this or that location’, such as their parents’ attic, or on a zip drive for which they haven’t seen the hardware in 15 years,” Vines told Nature News. “In theory, the data still exist, but the time and effort required by the researcher to get them to you is prohibitive.”
The researchers had difficulty simply tracking down study authors, which they were able to do only 37 percent of the time. The odds of finding a working e-mail address, even after a wide-ranging online search, dropped by 7 percent per year, the researchers said. In addition, only about half of the authors with working addresses responded to requests for data, regardless of publication date.
Matthew Woollard, director of the UK Data Archive in Colchester, who was not among the study authors, noted that the analysis considered neither the size of the individual data sets nor whether the information was being held by institutions.
“In the late 1990s or even early 2000s, much larger data sets would be more unlikely to end up in personal collections and so, possibly, have a higher chance of being kept institutionally,” he said.
To solve the problem of disappearing data, Vines suggested that journals request data sets and archive them as a condition of publication.
“It’s a very easy thing for journals to do, and I think it would dramatically improve the quality and quantity of data that are archived,” he said. “Losing data is a waste of research funds and it limits how we can do science. Concerted action is needed to ensure it is saved for future research.”
However, that suggestion may not be popular: a survey presented in September at the International Congress on Peer Review and Biomedical Publication in Chicago found that medical researchers may be becoming more reluctant to share their data.