DataONE:Notebook/Reuse of repository data

{| width="800"

Project Description/Abstract

 * This project will observe how data housed in scientific repositories is cited and attributed. How do these citations vary by discipline, repository, or publication? How have these practices changed over time?
 * Initial scope (06/10/2010): The reuse of TreeBASE data in articles from ISI Web of Science, Scirus, and Nature. Other sources (e.g., SysBio, Google Scholar) will be added as they are found.
 * Procedure: Using various search terms, find articles that reuse data from TreeBASE and document how the data are cited: by DOI or URI (in TreeBASE's case, the study accession number), by the principal investigator's name only, or according to TreeBASE's recommendations (including all author names).
 * Analysis: To be determined.
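To make the three citation forms in the procedure concrete, the following Python sketch labels a passage of article text. It is a hypothetical helper, not the project's coding scheme: the identifier patterns (a DOI, an http(s) URI, or a TreeBASE-style study ID written as S followed by digits), the assumption that the first listed dataset author is the principal investigator, and the priority given to identifiers are all assumptions, and plain substring matching will misfire on some texts.

```python
import re
from typing import Sequence

# Labels for the three citation forms in the procedure, plus a fallback.
IDENTIFIER = "DOI, URI, or study accession number"
PI_ONLY = "principal investigator's name only"
ALL_AUTHORS = "all author names"
UNCITED = "no recognizable citation"

# Assumed patterns: a DOI, an http(s) URI, or a TreeBASE-style study ID such as S1234.
# The study-ID pattern is crude and will also match unrelated tokens (e.g. S100).
ID_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+|https?://\S+|\bS\d{3,6}\b")


def classify_citation(snippet: str, dataset_authors: Sequence[str]) -> str:
    """Label how a passage cites a reused dataset.

    dataset_authors lists the dataset's authors in order; the first entry is
    assumed to be the principal investigator. Identifiers take priority.
    """
    if ID_PATTERN.search(snippet):
        return IDENTIFIER
    text = snippet.lower()
    # Plain substring matching: crude, but enough for a first pass.
    named = [author for author in dataset_authors if author.lower() in text]
    if dataset_authors and len(named) == len(dataset_authors):
        return ALL_AUTHORS
    if dataset_authors and dataset_authors[0] in named:
        return PI_ONLY
    return UNCITED
```

For a single-author dataset the "PI only" and "all authors" forms coincide, and the sketch reports them as all authors.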

Relation to Other Projects

 * Valerie Enriquez 09:46, 25 June 2010 (EDT): Following feedback from Carl Boettiger, I am adding information to this page about how this project connects to other DataONE projects this summer.
 * Relation to other intern projects: Sarah's project focuses on individual journals and their citation practices, including whether authors cite their own data or reuse data that others have deposited, while my focus is on search methods and on how reused data are cited. She has shared a spreadsheet format with me so that I can enter my data and we can easily compare our findings during later analysis.
 * Nic's project focuses more on the policies that funders, journals, and repositories place on authors regarding data citation. In phase II of his project, he will compare data with Sarah and me to look for correlations that may indicate whether these policies have an impact on research and publishing.

Search Limits

 * Journal articles published in 2008 or later
 * Articles in English
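The two limits amount to a simple filter over exported search results. The sketch below assumes a hypothetical `Record` type with year, language, and document-type fields; real exports from Web of Science, Scirus, or Google Scholar use their own formats and would need to be mapped onto it first.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_YEAR = 2008  # "published in 2008 or later"
ALLOWED_LANGUAGES = {"english"}


@dataclass
class Record:
    """Hypothetical minimal record for one search result."""
    title: str
    year: int
    language: str
    doc_type: str  # e.g. "article", "abstract", "patent"


def apply_search_limits(records: Iterable[Record]) -> List[Record]:
    """Keep English-language journal articles published in MIN_YEAR or later."""
    return [
        r
        for r in records
        if r.year >= MIN_YEAR
        and r.language.strip().lower() in ALLOWED_LANGUAGES
        and r.doc_type.strip().lower() == "article"
    ]
```

Applying the limits after export, rather than relying only on each tool's own filters, keeps the criteria identical across resources.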

Search Strategies by Repository and Resource

 * TreeBASE
 ** ISI Web of Science Cited Reference Finder: General searches for TreeBASE in the cited author or cited work fields are not very helpful, whereas searches for specific authors of datasets are more helpful. Example search: Cited Author=(Yoo) AND Cited Work=(BMC PLANT BIOL) AND Cited Year=(2006); Timespan=2008-2010; Databases=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH. Results: 6
 ** Scirus: Likewise, general searches for the phrases "from TreeBASE" and "study accession number" are not very helpful, while more specific phrases such as "obtained from TreeBASE" or "selected from TreeBASE" are more helpful. Searching by cited author is not helpful either, because, unlike ISI Web of Science, Scirus has no citation-based search. Example search: (("obtained from TreeBASE" OR "selected from TreeBASE")) -deposit* OR -submit; Dates: between 2008 and 2010 (month limit not available). Results: 38
 ** Google Scholar: As with the other tools, general search terms produced many false drops: articles that deposited data into TreeBASE rather than reusing data from it. I have not yet tried searching by cited data author. Adding more limits brought some success, although too many limits sometimes returned no results at all. Example search: treebase download "study accession" -deposit, -submit; Published between: 2008-2010 (month not available); Subject areas: Biology, Life Sciences, and Environmental Science; Medicine, Pharmacology, and Veterinary Science; All articles excluding patents. Results: 38
 * Pangaea
 ** ISI Web of Science Cited Reference Finder: A general search for "Pangaea" in the cited author or cited work fields was more successful here. Many of the articles found also cite the Pangaea DOI as well as the data author's name, so searching by data author name and DOI may prove useful. Example search: Cited Work=(Pangaea); Timespan=2008-2010 (month field not available in advanced search); Language: English. Results: 23
 ** Scirus: Searching for the term "Pangaea" alone is practically useless, since the results also include articles about the supercontinent. What proved fairly useful was searching for the DOI prefix of Pangaea datasets with a wildcard (*). Example search: doi:10.1594/PANGAEA* (exact phrase, in the complete document); Only show results published between 2008 and 2010 (month field not available in advanced search); Only show results that are abstracts or articles. Results: 12
 ** Google Scholar: As with the other searches, a general keyword search for mentions of Pangaea was not useful, and searching for the DOI prefix alone gave limited results. Combining the DOI prefix with a particular data author's name has given the most relevant hits in Google Scholar so far.
 * ORNL DAAC
 ** ISI Web of Science Cited Reference Finder: Searching for specific data authors had limited success, since data housed at the ORNL DAAC are not necessarily tied to a published study or article. Still, this is the only, and therefore the best, way to look for reuse of data stored there. Example search: Cited Author=(Mulholland, PJ) AND Cited Year=(2006); Timespan=2008-2010; Databases=SCI-EXPANDED, SSCI, A&HCI. Results: 16
 ** Scirus: Because the acronym is distinctive, a direct search for ORNL DAAC seems to return far more hits than misses. I will review the resulting articles in detail and refine the search further. Example search: All of the words (ORNL, DAAC) in the complete document; Only show results published between 2008 and 2010; Only show results that are abstracts or articles. Results: 90
 ** Google Scholar: More searches are needed before I can comment. The most recent search still returned a high number of hits. Example search: With all of the words (ORNL, DAAC); Return articles published between 2008-2010; Subject areas: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science. Results: 257
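The phrase-plus-exclusion queries above follow a common pattern: quoted exact phrases joined with OR, plus terms prefixed with a minus sign to drop articles that describe depositing rather than reusing data. The hypothetical `build_query` helper below assembles such a string. Exact syntax (grouping, wildcard support, how a minus sign is parsed) differs between search tools, so treat the output as a starting point to check against each tool's own search help. Note that it joins exclusions with spaces, which excludes articles containing either term, whereas the recorded Scirus string joins them with OR.

```python
from typing import Sequence


def build_query(phrases: Sequence[str], exclude: Sequence[str] = ()) -> str:
    """Build a query of OR-ed exact phrases followed by minus-prefixed exclusions."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    # Quote each phrase so it is matched as an exact phrase.
    included = " OR ".join(f'"{phrase}"' for phrase in phrases)
    if len(phrases) > 1:
        # Group the alternatives so the exclusions apply to the whole OR-group.
        included = f"({included})"
    # Prefix each excluded term with "-"; wildcards such as "deposit*" pass through as-is.
    excluded = " ".join(f"-{term}" for term in exclude)
    return f"{included} {excluded}".strip()
```

For example, `build_query(["obtained from TreeBASE", "selected from TreeBASE"], ["deposit*", "submit"])` returns `("obtained from TreeBASE" OR "selected from TreeBASE") -deposit* -submit`.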
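Because the DOI prefix is what separates Pangaea datasets from the supercontinent, the same prefix can also be used to check an article's full text once it has been retrieved. A minimal Python sketch, assuming dataset DOIs take the form 10.1594/PANGAEA followed by a dot and digits; the function name `find_pangaea_dois` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed form of a Pangaea dataset DOI: the prefix 10.1594/PANGAEA, a dot, then digits.
PANGAEA_DOI = re.compile(r"10\.1594/PANGAEA\.\d+", re.IGNORECASE)


def find_pangaea_dois(text: str) -> List[str]:
    """Return distinct Pangaea DOIs in order of first appearance, upper-cased for comparison."""
    seen: Dict[str, None] = {}
    for match in PANGAEA_DOI.finditer(text):
        # dict keys keep insertion order, so this de-duplicates without reordering.
        seen.setdefault(match.group(0).upper(), None)
    return list(seen)
```

DOIs are case-insensitive, so upper-casing lets the same dataset cited as `doi:` text and as a resolver URL be counted once.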


 * Valerie Enriquez 10:23, 22 June 2010 (EDT): As an update to this search: searching by data author name returns more relevant hits, but far fewer of them. A general search for mentions of ORNL DAAC yields far too many hits to go through manually in the time available. However, since ORNL DAAC holds data from several named projects, searching for project names such as "FLUXNET" or "Boreas" may also work.
 * Valerie Enriquez 10:01, 24 June 2010 (EDT): A further update: searching by specific project name and author name proved more helpful than searching by DOI or by mentions of the repository name in an article's full text.
 * Valerie Enriquez 17:32, 28 June 2010 (EDT): A table of revised search strategies can be found here (it probably still needs work).
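The trade-off noted on 22 June (more relevant hits, but far fewer) can be made explicit by recording, for each strategy, the total number of hits and the number confirmed relevant on manual review, and then comparing precision (relevant hits divided by total hits). The sketch below is hypothetical; the relevant-hit counts would have to come from manual review and are not recorded on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SearchOutcome:
    total_hits: int     # results returned by the search tool
    relevant_hits: int  # results confirmed on manual review to reuse repository data

    @property
    def precision(self) -> float:
        """Relevant hits divided by total hits (0.0 when nothing was returned)."""
        return self.relevant_hits / self.total_hits if self.total_hits else 0.0


def rank_strategies(outcomes: Dict[str, SearchOutcome]) -> List[str]:
    """Order strategy names by precision, breaking ties by the number of relevant hits."""
    return sorted(
        outcomes,
        key=lambda name: (outcomes[name].precision, outcomes[name].relevant_hits),
        reverse=True,
    )
```

Precision alone rewards very narrow searches, so the relevant-hit count is kept alongside it: a precise search that finds only a handful of articles may still miss much of the reuse.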

Correspondence

 * Valerie Enriquez 16:57, 30 July 2010 (EDT): Email Correspondence from Bob Cook and James Kidder at ORNL DAAC

Papers that reuse ORNL DAAC, Pangaea and TreeBASE data
ORNL DAAC Articles:

Pangaea Articles:

TreeBASE Articles:




 * }

Phase 2
Where do we go from here? Other repositories:
 * GODAE
 * ORNL DAAC archive - update: explored in this study
 * Pangaea - update: explored in this study
 * STD-DOI project