DataONE:GEO reuse study/Phase 1
From OpenWetWare
Research Plan
- Query PubMed Central for GEO accession number patterns
- Restrict the query to one year of PMC, because the deposit rate (and possibly the spectrum of deposits) is not constant over time
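The plan above could be sketched roughly as follows. This is a minimal illustration, not the study's actual method: it assumes GEO accession numbers take the form of a prefix (GSE, GDS, GSM, or GPL) followed by digits, and that PMC would be queried through the NCBI E-utilities `esearch` endpoint restricted to a single publication year. The function names `find_geo_accessions` and `build_pmc_query_url` are hypothetical, and the search term is left as a parameter because the notes do not specify one.

```python
import re
from urllib.parse import urlencode

# Assumption: GEO accessions are a known prefix followed by digits
# (GSE = series, GDS = dataset, GSM = sample, GPL = platform).
GEO_ACCESSION = re.compile(r"\b(?:GSE|GDS|GSM|GPL)\d+\b")

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def find_geo_accessions(text):
    """Return the sorted, de-duplicated GEO-style accessions found in text."""
    return sorted(set(GEO_ACCESSION.findall(text)))


def build_pmc_query_url(term, year, retmax=100):
    """Build (but do not send) an esearch URL for PMC limited to one year.

    `term` is a placeholder; the notes do not say which search term to use.
    """
    params = {
        "db": "pmc",
        "term": term,
        "datetype": "pdat",      # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "retmax": retmax,
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    sample = "Expression data were taken from GSE1234 and GDS567 (see GSE1234)."
    print(find_geo_accessions(sample))
    print(build_pmc_query_url("GEO accession", 2009))
```

Matching on the article text rather than relying only on the search index keeps the pattern explicit, so the same regular expression can be reused on other full-text sources if they are added later.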
Open Questions
- Also look at Highwire Press, Google Scholar, other full text sources?
- More difficult because queries to these sources can't be processed automatically
- Look for accession number patterns for both datasets (GDS) and data series (GSE)?
Limitations
Important for argument
This is a conservative estimate because:
- Many papers not in PMC (source for percentages?)
- Many data citations not attributed using accession numbers (source for percentages?)
Less important for argument
- Doesn't capture reuse outside the peer-reviewed literature (for example, reuse during training)
- Deposits into PMC are not stable over time, so the distribution of papers may shift between years