This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.
Reuse of Repository Data
Notes for June 29, 2010
- Go through previous entries, spreadsheets and data
- Write initial outline summarizing findings so far based on above review
- Create PowerPoint presentation based on outline
Outline for Meeting
I. Motivations and initial questions
- Data deposit vs. data reuse
- How difficult is it to find data citations and why?
- How do the citations vary across discipline, repository and publication?
- What is the most common citation? Repository name? Data author name? Unique identifier like a study number or DOI?
- Initial search process: Test searches for TreeBASE yielded sample articles, study accession numbers and data author names to search for later.
- Focused search
- ORNL DAAC
- ISI Web of Science Cited Reference Search
- Google Scholar
- Date range: 2008-2010
- Language: English
- Journal articles only
- Repository-specific search terms
- TreeBASE: repository name, study accession number, data author name
- Pangaea: repository name, DOI prefix, data author name
- ORNL DAAC: repository name, DOI prefix, data author name, project name (BOREAS, FLUXNET, etc.)
- 1. Search comparison spreadsheet hosted here
- Captures the search methods, the search terms and the datasets used to construct them, along with the total number of results for each search, broken down into hits and misses.
- Percentages of hits vs. misses calculated within the spreadsheet.
- Reasons for miss captured
- Reasons for hit captured
- 2. Shared fields template from Sarah with my input data hosted here
- Hosts data about individual articles, including DOIs as applicable, metadata and coding for hits and misses.
- Browse through observations made within OpenWetWare journal entries
- Look through Search Comparisons spreadsheet for percentage of hits versus misses as well as the types of hits.
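The hit-versus-miss percentages described above were calculated in the spreadsheet itself. As a rough illustration only, the same tally could be sketched in Python as below. The rows and the `hit_rates` function are hypothetical placeholders, not values or code from the actual project.

```python
from collections import Counter

# Hypothetical rows: (repository, database, search strategy, outcome).
# These are made-up placeholders, not values from the actual spreadsheet.
rows = [
    ("TreeBASE", "Google Scholar", "repository name", "hit"),
    ("TreeBASE", "Google Scholar", "repository name", "miss"),
    ("TreeBASE", "Google Scholar", "repository name", "miss"),
    ("Pangaea", "Google Scholar", "DOI prefix", "hit"),
    ("Pangaea", "Google Scholar", "DOI prefix", "hit"),
    ("Pangaea", "Google Scholar", "DOI prefix", "miss"),
    ("Pangaea", "Google Scholar", "DOI prefix", "hit"),
]


def hit_rates(rows):
    """Return {(repository, database, strategy): percent of results coded 'hit'}."""
    totals = Counter()
    hits = Counter()
    for repository, database, strategy, outcome in rows:
        key = (repository, database, strategy)
        totals[key] += 1
        if outcome == "hit":
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


rates = hit_rates(rows)
for key, pct in sorted(rates.items()):
    print(key, f"{pct:.1f}% hits")
```

Grouping by repository, database and strategy together keeps the percentages comparable with the "most effective" and "least effective" judgments made per repository and per database.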
- Finding focus and the difficulty of going beyond the obvious
- A mention of a repository could mean that data was either deposited there or downloaded from there.
- TreeBASE study accession numbers cited in an article may have changed over time (from StudyID to LegacyID after study publication).
- “Pangaea” can refer to either Pangaea.de data repository or the Pangaea supercontinent. How do I exclude these results?
- Sometimes narrowing search terms with Boolean operators or “-” exclusion returned no results at all, while broadening back out returned too many results to read through manually.
- Google Scholar does not distinguish between published journal articles and dissertations deposited into academic repositories.
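One way to experiment with the “-” exclusion approach mentioned above is to build query strings programmatically, so that each narrowing step can be recorded and repeated. The sketch below is a minimal illustration only. The `build_query` helper and the exclusion terms are assumptions, not the terms actually used in these searches, and the exact operator syntax accepted varies by search engine.

```python
def build_query(phrase, required=(), excluded=()):
    """Build a search string: quoted phrase, required terms, then '-' exclusions.

    Multi-word terms are wrapped in double quotes so they are treated as phrases.
    """
    def quote(term):
        return f'"{term}"' if " " in term else term

    parts = [f'"{phrase}"']
    parts += [quote(term) for term in required]
    parts += ["-" + quote(term) for term in excluded]
    return " ".join(parts)


# Hypothetical attempt to drop results about the supercontinent.
query = build_query(
    "Pangaea",
    required=["doi"],
    excluded=["supercontinent", "continental drift"],
)
print(query)
```

Exclusion terms like these can also remove genuine hits, for example a paleoclimate article that reuses Pangaea.de data, which is consistent with the observation that narrowing sometimes returned nothing at all.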
- “Missing” searches (use Search Methodology Table as visual aid in slideshow)
- For the sake of thoroughness, I intended to go through each possible search combination.
- Not all searches worked, and I did not record the ones that failed in my notebook. However, it is important to record these “failures” for future reference.
- Also, using the above-linked table helped show me that I missed some possible combinations.
- “Like trying to find someone on Facebook only knowing their hair color and favorite breakfast cereal.”
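The idea of checking every possible search combination against what was actually recorded can be sketched as a set difference. The repositories, databases and term types below are taken from the outline above. The `recorded` list is a hypothetical stand-in for the notebook entries, and the function names are illustrative only.

```python
from itertools import product

# Databases and repository-specific search terms as listed in the outline.
databases = ["ISI Web of Science Cited Reference Search", "Google Scholar"]
terms = {
    "TreeBASE": ["repository name", "study accession number", "data author name"],
    "Pangaea": ["repository name", "DOI prefix", "data author name"],
    "ORNL DAAC": ["repository name", "DOI prefix", "data author name", "project name"],
}


def all_combinations(databases, terms_by_repository):
    """Return every (repository, database, term type) combination as a set."""
    return {
        (repo, db, term)
        for repo, repo_terms in terms_by_repository.items()
        for db, term in product(databases, repo_terms)
    }


def missing_searches(expected, recorded):
    """Return combinations that were expected but never recorded, sorted."""
    return sorted(expected - set(recorded))


# Hypothetical log of searches actually written down in the notebook.
recorded = [
    ("TreeBASE", "Google Scholar", "repository name"),
    ("Pangaea", "Google Scholar", "DOI prefix"),
]

expected = all_combinations(databases, terms)
missing = missing_searches(expected, recorded)
print(len(expected), "possible;", len(missing), "not yet recorded")
```

Recording failed searches in the same list, perhaps with an extra outcome field, would keep them from showing up as gaps later.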
IV. Findings by Repository
1. TreeBASE
- Most effective: Searching for author name and citations of the original article in which the dataset was used.
- Least effective: Searching by mention of repository name (also did not allow for search for study accession number).
- Most effective: Search for mention of TreeBASE with controlled vocabulary.
- Least effective: Author name or study accession number.
- Google Scholar
- Most effective: This should more accurately be called "least ineffective."
- Least effective: Even with controlled vocabulary, searching by mention of TreeBASE was not helpful. Neither was searching for study accession numbers or data author names.
2. Pangaea
- Most effective: Search by individual author name and mention of Pangaea in Cited Author or Cited Work fields.
- Least effective: Some DOIs turned up in search results, but I could not actually search by DOI in the search fields.
- Most effective: Searching by DOI prefix with "*" wildcard.
- Least effective: Searching by author name.
- Google Scholar
- Most effective: Possibly searching by DOI prefix with controlled vocabulary (not the same controlled vocabulary as used with TreeBASE, however).
- Least effective: Everything else.
3. ORNL DAAC
- Most effective: Search by data author name/original publication.
- Least effective: Once again, DOI cannot be used in search fields.
- Most effective: Search by DOI prefix with "*" wildcard and search by author name with ORNL project name (FLUXNET, BOREAS, etc.). Search for mentions of ORNL DAAC also yielded solid hits.
- Least effective: Surprisingly, none.
- Google Scholar
- Most effective: Search for DOI prefix.
- Least effective: Search for mention of repository name (ORNL DAAC) even with controlled vocabulary.
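Searching by DOI prefix with a "*" wildcard, which worked well for Pangaea and ORNL DAAC above, amounts to matching any DOI that begins with the repository's prefix. The sketch below shows that matching step on article text already in hand. The prefix values, the sample sentences and the DOI suffixes in them are assumptions for illustration and are not taken from these notes, so verify the prefixes against each repository before relying on them.

```python
import re

# Assumed DOI prefixes; not stated in these notes.
DOI_PREFIXES = {
    "Pangaea": "10.1594/PANGAEA",
    "ORNL DAAC": "10.3334/ORNLDAAC",
}


def find_dataset_dois(text, prefix):
    """Return DOIs in text that start with prefix, like a 'prefix*' wildcard search."""
    pattern = re.compile(re.escape(prefix) + r"[^\s\"<>]*", re.IGNORECASE)
    # Strip trailing punctuation that belongs to the sentence, not the DOI.
    return [match.group(0).rstrip(".,;)") for match in pattern.finditer(text)]


# Hypothetical sentences with made-up DOI suffixes.
sample = (
    "Data are archived at doi:10.1594/PANGAEA.000000. "
    "Flux data came from the ORNL DAAC (doi:10.3334/ORNLDAAC/000)."
)

for repository, prefix in DOI_PREFIXES.items():
    print(repository, find_dataset_dois(sample, prefix))
```

This only helps once full text or reference lists are available. It does not get around the limitation noted above that some databases do not accept a DOI in their search fields.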
VI. Future Plans
- Other repositories, search terms and databases
- Compare data with other interns