DataONE:Notebook/Reuse of repository data/2010/06/18

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Reuse of Repository Data
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Notes for June 18, 2010

 * See also: Data Citation Spreadsheet (File renamed to reflect other databases investigated with tabs for each database)
 * After gaining fulltext access through UNM, spent this morning manually searching for, copying and pasting the sentences for the spreadsheet linked above.
 * Start working on ORNL-DAAC searches. Data Product Citation Policy
 * Continue forming ideas for abstract.
 * Note the more successful search strings on front page
 * Based on email from Robert B. Cook, I now have additional resources to reference when looking for articles citing ORNL. It will be fascinating to see how often researchers adhere to these best practices when depositing or citing information.
 * Data Provider Information
 * Best Practices for Preparing Environmental Data Sets to Share and Archive
 * Search function
 * Another search function

Resources searched with search terms and hit count for ORNL-DAAC

 * 1) Resource: ISI Web of Science Search Term(s): Cited Author=(Schroeder W) AND Cited Year=(2007) Timespan=2008-2010. Databases=SCI-EXPANDED, SSCI, A&HCI. Results: 1
 * 2) Resource: ISI Web of Science Search Term(s): Cited Author=(Olson JS) AND Cited Year=(2003) Timespan=2008-2010. Databases=SCI-EXPANDED, SSCI, A&HCI. Results: 1
 * 3) Resource: ISI Web of Science Search Term(s): Cited Author=(San Jose) AND Cited Year=(1998) Timespan=2008-2010. Databases=SCI-EXPANDED, SSCI, A&HCI. Results: 44 (none relevant to dataset topic)
 * 4) Resource: ISI Web of Science Search Term(s): Cited Author=(Scurlock) AND Cited Year=(2003) Timespan=2008-2010. Databases=SCI-EXPANDED, SSCI, A&HCI. Results:  (none relevant to dataset topic)
 * 5) Resource: Scirus Search Term(s): All of the words (ORNL, DAAC) in complete document. Only show results between 2008 and 2010. Only show results that are abstracts and articles. Results: 90
 * 6) Resource: Google Scholar Search Term(s): with all of the words (ORNL, DAAC). Return articles published between 2008-2010. Search only articles in the following subject areas: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science. Results: 257
 * 7) Resource: ISI Web of Science Search Term(s): Cited Author=(Mulholland, PJ) AND Cited Year=(2006) Timespan=2008-2010. Databases=SCI-EXPANDED, SSCI, A&HCI. Results: 16

Observations

 * Valerie Enriquez 12:43, 18 June 2010 (EDT): Based on the spreadsheet Robert Cook had sent us, I chose the first citation listed and went to the article on PLoS ONE.


 * I found the following citation in the references for this article:
 * Schroeder W, Morisette JT, Csiszar I, Giglio L, Morton D, et al. (2007) LBA-ECO LC-23 Characterization of Vegetation Fire Dynamics for Brazil: 2001–2003. Data set. Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee Available from http://www.daac.ornl.gov, accessed September 30, 2007.


 * From here, I ran a search in ISI's Web of Science Cited Reference Search for articles that cite the author Schroeder W. The search listed 13 cited works by Schroeder W, but only one was relevant, and it had only one citing article (the Adeney JM article in PLoS ONE).


 * I will go through the spreadsheet list and take note of my findings as I go.


 * Valerie Enriquez 13:08, 18 June 2010 (EDT): This article found next on the spreadsheet:
 * Anaya, J.A., Chuvieco, E., Palacios-Orueta, A. (2009). Aboveground biomass assessment in Colombia: A remote sensing approach. Forest Ecology and Management, 257(4), 1237-1246.


 * Cites these datasets


 * Olson et al., 2003 J.S. Olson, J.A. Watts and L.J. Allison, LBA Regional Carbon in Live Vegetation, 0.5-Degree (Olson), Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, TN, USA (2003) http://www.daac.ornl.gov.
 * San Jose and Montes, 1998 J. San Jose and R.A. Montes, NPP Grassland: Calabozo, Venezuela, 1969–1987, Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, TN, USA (1998) Data set: http://Www.Daac.Ornl.Gov.
 * Scurlock et al., 2003 J.M. Scurlock, K.R. Johnson and J.S. Olson, NPP Grassland: NPP Estimates from Biomass Dynamics for 31 Sites, 1948–1994, Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, TN, USA (2003) Data set available on-line: http://www.daac.ornl.gov.


 * Once again, I will attempt a search for citation by data author. The first dataset can be found here. There is a note that this data is a subset of two articles by Olson:


 * Olson, J. S., J. A. Watts, and L. J. Allison. 2000. Major World Ecosystem Complexes Ranked by Carbon in Live Vegetation: A Database (Revised November 2000). NDP-017. Available on-line from Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee, U.S.A.
 * Olson, J. S., J. A. Watts, and L. J. Allison, 1985. Major World Ecosystem Complexes Ranked by Carbon in Live Vegetation. NDP-017. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee, U.S.A.


 * The 1985 article is also cited in Anaya's article. So, as with the TreeBASE data, searching ISI Web of Science's Cited Reference Search for citations of the related article, rather than of the dataset itself, may produce more hits. Also, it seems that only one article in the results cites the Olson dataset directly. Searching ISI for articles citing San Jose's 1998 dataset returned only articles unrelated to Venezuelan grassland, while searching for articles citing Scurlock's 2003 dataset pulled the Anaya article.


 * Valerie Enriquez 13:49, 18 June 2010 (EDT): Decided to go back to Scirus and reuse the earlier search method, keywords (ORNL, DAAC) plus DOI prefixes with wildcards, after realizing the limitations of ISI Web of Science's Cited Reference Search (it finds mostly articles, and finds datasets only when they are tied to articles). The first article on the result list of 90, doi:10.1016/j.jnc.2009.08.001, was a solid hit: it extracted data from ORNL DAAC's MODIS subsetted land products, Collection 5. The second article on the list, doi:10.1016/j.rse.2009.10.013, also extracted data, citing ORNL's MODIS project directly. It's interesting that a search that was almost fruitless for one repository (TreeBASE) is pulling more relevant hits for another (ORNL DAAC).


 * Instead of manually entering all articles today, will return to search based on notes later. In the meantime, will also try other searches.


 * Valerie Enriquez 14:33, 18 June 2010 (EDT): First attempt to find articles citing ORNL DAAC in Google Scholar. The first article found, doi:10.1016/j.rse.2007.06.025, had deposited data into ORNL DAAC as opposed to extracting data. As with the early TreeBASE searches, it looks like the search needs to be edited so that articles that only deposited their own data are eliminated from the results. The language this article uses about the data the researchers have stored in ORNL DAAC includes the phrase "can be downloaded."


 * However, upon closer look at this article, it seems that data has been extracted for the purpose of comparing with the data the authors have collected. Once again, I have the dilemma of needing to eliminate irrelevant search results but also the risk of eliminating articles that both deposit and reuse data from repositories. I suppose that for now, the best tools I have are my own two eyes for reading each fulltext article one at a time just to be sure.
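
One way to support the manual reading, rather than replace it, would be to flag articles that use deposit-style language so they get a closer look. The following is only a minimal sketch under that assumption: the function name and the phrase list are hypothetical, and "can be downloaded" is the only phrase actually taken from these notes.

```python
import re

# Hypothetical phrase list; only "can be downloaded" comes from the notes above.
DEPOSIT_PHRASES = ["can be downloaded"]


def flag_for_review(fulltext, phrases=DEPOSIT_PHRASES):
    """Return the deposit-style phrases found in an article's full text.

    A non-empty result means "read this one closely", not "discard it":
    an article may both deposit its own data and reuse repository data.
    """
    # Collapse line breaks and repeated spaces so phrases split across lines still match.
    lowered = re.sub(r"\s+", " ", fulltext.lower())
    return [p for p in phrases if p in lowered]


sample = "The field data can be\ndownloaded from the ORNL DAAC."
print(flag_for_review(sample))
```

Because the function only flags and never filters, an article that both deposits and reuses data stays in the result list for reading.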


 * The third article listed in the Google Scholar search results is the same Xiao article found using Scirus doi:10.1016/j.rse.2009.10.013. With the possibility of overlapping results found here and other repositories, this may be another reason for detailed manual viewing of each search result.
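
The overlap between search tools could be checked by comparing DOIs before reading each result. Below is a rough sketch, assuming each tool's results have been reduced to a list of DOI strings. The function names are made up for illustration, and the upper-case variant of the Xiao DOI is an invented example of how the same DOI might be formatted differently by two tools (DOIs are case-insensitive).

```python
def normalize_doi(doi):
    """Lower-case a DOI and strip a leading 'doi:' label or resolver URL."""
    d = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    return d.strip()


def overlap(results_by_source):
    """Map each DOI found by more than one search tool to the tools that found it."""
    seen = {}
    for source, dois in results_by_source.items():
        for doi in dois:
            seen.setdefault(normalize_doi(doi), set()).add(source)
    return {doi: sorted(srcs) for doi, srcs in seen.items() if len(srcs) > 1}


# DOIs taken from the notes above; the second Google Scholar entry is
# deliberately written in a different case to show the normalization.
results = {
    "Scirus": ["doi:10.1016/j.jnc.2009.08.001", "doi:10.1016/j.rse.2009.10.013"],
    "Google Scholar": ["doi:10.1016/j.rse.2007.06.025", "DOI:10.1016/J.RSE.2009.10.013"],
}
print(overlap(results))
```

This only catches duplicates that carry a DOI; results without one would still need to be compared by eye.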


 * Valerie Enriquez 15:44, 18 June 2010 (EDT): The email regarding ORNL from Robert B. Cook linked to the dataset Walker Branch Watershed Stream Chemistry, whose abstract gives this citation:
 * Mulholland, P.J. 2006. Walker Branch Watershed Stream Chemistry. Data set. Available on-line from Environmental Data for the Oak Ridge Area, Oak Ridge, Tennessee, U.S.A.
 * So, I ran a search in the ISI Web of Science Cited Reference Search. It returned several cited works that were not relevant to Walker Branch Stream Chemistry and one relevant article, doi:10.1899/0887-3593(2006)25[583:EOLONU]2.0.CO;2, with the citation:
 * Mulholland PJ, Thomas SA, Valett HM, Webster JR, Beaulieu J (2006) Effects of light on NO3− uptake in small forested streams: diurnal and day-to-day variations. J N Am Benthol Soc 25(3):583–595
 * For now, instead of manually entering the found citations I have created a .csv file of the citations found for later investigation/integration with the new spreadsheet.
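
For illustration, a citation file like this could be produced with Python's standard `csv` module. This is a sketch only: the column names are assumptions, since these notes do not record how the actual .csv file was laid out, and the example row reuses the Anaya and Olson citation noted earlier in the day.

```python
import csv
import io

# Column names are assumptions; the notes do not describe the real file layout.
FIELDS = ["repository", "citing_article", "cited_dataset"]


def write_citations(rows, handle):
    """Write citation records (dicts keyed by FIELDS) as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [{
    "repository": "ORNL DAAC",
    "citing_article": "Anaya et al. 2009, Forest Ecology and Management 257(4)",
    "cited_dataset": "Olson et al. 2003, LBA Regional Carbon in Live Vegetation",
}]

# An in-memory buffer keeps the example self-contained; for a real file,
# open it with open(path, "w", newline="") and pass that handle instead.
buffer = io.StringIO()
write_citations(rows, buffer)
print(buffer.getvalue())
```

The `csv` module quotes fields that contain commas, so citation strings can be stored as they are and read back later with `csv.DictReader`.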


 * Valerie Enriquez 17:54, 18 June 2010 (EDT): Drafting ideas for abstract: While publications and funding bodies provide repositories for research datasets, how often are these datasets used by other researchers after the initial publication of their results? If the repositories have recommendations for how their data should be cited, do other researchers follow these guidelines? This study explores how the reuse and citation of datasets vary across repositories and disciplines, and how easy or difficult it is to find cited datasets using different search tools. Sample searches were made using ISI Web of Science Cited Reference Search, Scirus and Google Scholar. These searches were constructed to find articles that cite data from TreeBASE, Pangaea and ORNL DAAC, or that cite studies or articles directly related to this data. [conclusion yet to be drawn]


 * }