DataONE:GEO reuse study

From OpenWetWare
Revision as of 08:03, 18 June 2010 by Heather A Piwowar

This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.



Analysis of reuse of data from NCBI's GEO database

Long-term Aims

To understand the extent and value of reuse of data stored in NCBI's GEO database.

Short-term Aims

To fill in the blanks in these sentences:

We have collected information on the GEO database using ???, which is possible because citations to GEO data are indexed in ????. For each year, we recorded all papers that cite GEO data and the number of data sets in GEO. We examined a subset of XXX of those citing papers to estimate the proportion of citations that (1) reused the original data in a substantive way (rather than simply alluding to its existence) and (2) did not include an author of the original work (because these authors would have had access to the data even without the archive). We also used this sample to record the nature of the reuse: verification, meta-analysis, or new questions.
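A minimal sketch of how the hand-coded sample could be tallied, assuming each examined paper is recorded with three fields: whether it substantively reuses the data, whether it shares an author with the original data set, and the type of reuse. The `CitingPaper` record and the helper functions are hypothetical names for illustration, the sample is made up, and tallying reuse types only among papers meeting both criteria is an assumption the notes do not state.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitingPaper:
    """One hand-coded paper from the examined sample (hypothetical fields)."""
    reuses_data: bool             # criterion (1): substantive reuse, not just a mention
    shares_original_author: bool  # criterion (2) is met only if this is False
    reuse_type: Optional[str]     # e.g. "verification", "meta-analysis", "new question"


def is_third_party_reuse(paper: CitingPaper) -> bool:
    """True when a paper meets both criteria (1) and (2)."""
    return paper.reuses_data and not paper.shares_original_author


def third_party_reuse_fraction(sample: list[CitingPaper]) -> float:
    """Share of sampled citing papers that meet both criteria."""
    if not sample:
        raise ValueError("sample is empty")
    return sum(is_third_party_reuse(p) for p in sample) / len(sample)


def reuse_type_counts(sample: list[CitingPaper]) -> Counter:
    """Tally the nature of reuse among papers meeting both criteria (an assumption)."""
    return Counter(p.reuse_type for p in sample if is_third_party_reuse(p))


# Made-up example sample; not data from this study.
example_sample = [
    CitingPaper(True, False, "meta-analysis"),
    CitingPaper(True, True, "verification"),  # shares an original author: excluded
    CitingPaper(False, False, None),          # only mentions the data: excluded
    CitingPaper(True, False, "new question"),
]

print(third_party_reuse_fraction(example_sample))
print(reuse_type_counts(example_sample))
```

The fraction returned here is the kind of number that would fill the "XXX%" blank below, once computed on the real sample.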
For every data set in GEO, there are XXX citations to the data. However, GEO is growing rapidly and there is an unavoidable time lag between deposition and reuse (on average XXX months after deposition), so this raw ratio understates how often each data set is eventually reused. Correcting for the lag, we estimate that the typical paper is likely to generate YYY citations over the short term. This number should increase as more time passes and citations continue to accumulate for each paper, and it is an underestimate because not all citations to the data use standard references that can be tracked by ????. Of these citations, an estimated XXX% result in novel scientific work that could not have been performed without the archive, for a total of XXX new pieces of work for each archived data set.
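The arithmetic above can be sketched as follows. This is only one crude way to correct for the reuse lag, not necessarily the method behind YYY: it divides all citations only by data sets deposited at least `lag_years` before the last year observed, on the assumption that newer data sets have not yet had time to be cited. The lag is expressed in whole years to match yearly counts (the notes give it in months), the function names are hypothetical, and the yearly numbers are made-up placeholders, not results of this study.

```python
# Hypothetical toy numbers for illustration only; not results of this study.
deposits_by_year = {2005: 100, 2006: 200, 2007: 400, 2008: 800}  # new data sets per year
citations_by_year = {2006: 10, 2007: 40, 2008: 120}              # data citations per year


def naive_citations_per_dataset(citations, deposits):
    """Total citations divided by all data sets, ignoring the reuse lag."""
    return sum(citations.values()) / sum(deposits.values())


def lag_adjusted_citations_per_dataset(citations, deposits, final_year, lag_years):
    """Total citations divided only by data sets deposited at least `lag_years`
    before `final_year`, i.e. those old enough to have plausibly been reused."""
    eligible = sum(n for year, n in deposits.items() if year <= final_year - lag_years)
    if eligible == 0:
        raise ValueError("no data sets are old enough to count")
    return sum(citations.values()) / eligible


def new_work_per_dataset(citations_per_dataset, novel_fraction):
    """Citations per data set times the sampled share that are novel third-party reuse."""
    return citations_per_dataset * novel_fraction


naive = naive_citations_per_dataset(citations_by_year, deposits_by_year)
adjusted = lag_adjusted_citations_per_dataset(
    citations_by_year, deposits_by_year, final_year=2008, lag_years=1
)
print(f"naive: {naive:.3f}, lag-adjusted: {adjusted:.3f}")
print(f"new pieces of work per data set: {new_work_per_dataset(adjusted, 0.5):.3f}")
```

Because a growing archive has most of its data sets in the most recent years, the lag-adjusted ratio comes out larger than the naive one, which is the effect the paragraph describes.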