DataONE:Notebook/Reuse of repository data/2010/06/21

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Reuse of Repository Data
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Notes for June 21, 2010

 * See also: Data Citation Spreadsheet (file renamed to reflect the other databases investigated, with a tab for each database)
 * After gaining full-text access through UNM, spent this morning manually searching for, copying, and pasting the sentences for the spreadsheet linked above.
 * Continue working on ORNL-DAAC searches. See: Data Product Citation Policy
 * Conduct Pangaea searches in Google Scholar as per notes on main page.
 * Continue writing and refining abstract started on 06/18/2010
 * After meeting earlier today, will start integrating my data with Sarah's spreadsheet format.

Resources searched with search terms and hit count for ORNL-DAAC

 * 1) Resource: Google Scholar Search term(s): Regional Carbon Live Vegetation "Olson et al 2003" Search date range: 2008-2010 Search only in: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science Results: 4
 * 2) Resource: Google Scholar Search term(s): downloaded, selected "ORNL DAAC" Search date range: 2008-2010 Search only in: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science Results: 7

Resources searched with search terms and hit count for Pangaea

 * 1) Resource: Google Scholar Search term(s): "doi:10.1594/PANGAEA%" Search date range: 2008-2010 Search only in: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science Results: 1,180
 * 2) Resource: Google Scholar Search term(s): "doi:10.1594/PANGAEA%" Stein, R et al. 2004 Search date range: 2008-2010 Search only in: Biology, Life Sciences, and Environmental Science; Chemistry and Materials Science; Engineering, Computer Science, and Mathematics; Medicine, Pharmacology, and Veterinary Science; Physics, Astronomy, and Planetary Science Results: 6

Observations

 * Valerie Enriquez 14:16, 21 June 2010 (EDT): I more or less constructed search #1 for ORNL-DAAC based on an article I had found yesterday, searching by author name and subject matter/keywords gleaned from the title.


 * The first article found in today's first Google Scholar search for ORNL-DAAC was the same Anaya article found in yesterday's entry. The second result cited an article by Olson et al. regarding "a major gene influencing hair length and heat tolerance in Bos taurus cattle," which is clearly an irrelevant hit. The third result was a PhD thesis, and the fourth had this citation:


 * Olson, R. J., Shalapyonok, A., and Sosik, H. M.: An automated submersible flow cytometer for analyzing pico- and nanophytoplankton, Deep-Sea Res., 50, 301–315, 2003.

Once again, either I did not narrow the search terms enough, or only one article cites that particular set of repository data.


 * Valerie Enriquez 15:08, 21 June 2010 (EDT): Opting for a broader scope, with a more general search for data cited from ORNL DAAC, I constructed the second search based on prior success with TreeBASE searches that used keywords related to downloading data. The first result was an article that only mentioned ORNL DAAC as the developer of the WebGIS, which uses OGC web services to provide geospatial data for FLUXNET sites. However, the remaining results included more relevant hits (although two of them were unpublished articles hosted at a research repository).


 * Valerie Enriquez 16:24, 21 June 2010 (EDT): Using Pangaea's DOI prefix as a search term did not prove useful: too many of the results either came directly from Pangaea or were articles whose authors had deposited their own data in Pangaea. Searching by cited author will most likely turn up more relevant hits.


 * Valerie Enriquez 16:30, 21 June 2010 (EDT): I ran a narrower search based on an author name from a data citation found in my June 17, 2010 entry. It returned fewer, more relevant results, some of which overlap with the June 17, 2010 search in ISI Web of Science Cited Reference Search.
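
As a possible aid to the manual copying and pasting, the Pangaea DOI prefix from the searches above could be matched in an article's full text once it is in hand. This is only a minimal sketch: it assumes Pangaea dataset DOIs take the form 10.1594/PANGAEA.<digits>, and both the function name `pangaea_dois` and the sample text are hypothetical placeholders, not real citations.

```python
import re

# Assumption: Pangaea dataset DOIs look like 10.1594/PANGAEA.<digits>.
PANGAEA_DOI = re.compile(r"10\.1594/PANGAEA\.\d+", re.IGNORECASE)


def pangaea_dois(text):
    """Return the unique Pangaea DOIs found in text, in order of first appearance."""
    found = []
    for match in PANGAEA_DOI.findall(text):
        doi = match.upper()  # DOIs are case-insensitive, so normalize before comparing
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical sample text with placeholder DOI numbers (not real datasets).
sample = (
    "Data were obtained from doi:10.1594/PANGAEA.000001 and "
    "doi:10.1594/pangaea.000001; see also 10.1594/PANGAEA.000002."
)
for doi in pangaea_dois(sample):
    print(doi)
```

A match only shows that a Pangaea dataset is mentioned. Whether the article reuses the data or is the depositing article itself would still need to be checked by hand.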

Abstract Drafting

 * Valerie Enriquez 17:40, 21 June 2010 (EDT): Why is this sort of study needed? Online data repositories are growing rapidly, and many journals already track citations of articles, so an initial study is necessary. Also, while citation practices for articles have been studied, there is not much out there regarding the citation of data, much less the citation of reused data from repositories.


 * Selection process: Searches of random publications found in ISI Web of Science Cited Reference Search, Scirus, and Google Scholar. Citations are sought for datasets from the following three repositories: 1. TreeBASE 2. Pangaea 3. ORNL DAAC. Sample sizes will be relatively small, as most searches revolve around finding articles that cite particular data authors.


 * Measurement: Counting citations that fall under the following categories: 1. Direct mention of repository name (TreeBASE, Pangaea, or ORNL DAAC) 2. Citation using a unique identifier, such as a study accession number (TreeBASE) or a DOI (Pangaea and ORNL DAAC) 3. Author name only 4. As per the full recommendations of the repository (varied, will consult with Nic's findings).


 * Interpretation: After data is aggregated, will make interpretations based on statistical significance and create appropriate tables, charts or graphs as needed.
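
The counting step in the measurement plan above could be sketched roughly as follows. It assumes each spreadsheet row has already been coded by hand with a repository and one of the four category numbers. The `tally` function, the category labels, and the sample rows are all hypothetical illustrations, not actual results.

```python
from collections import Counter

# The four measurement categories from the notes, keyed by their number.
CATEGORIES = {
    1: "repository name mentioned",
    2: "unique identifier (accession number or DOI)",
    3: "author name only",
    4: "full repository-recommended citation",
}


def tally(coded_rows):
    """Count hand-coded (repository, category) pairs."""
    counts = Counter()
    for repository, category in coded_rows:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[(repository, category)] += 1
    return counts


# Hypothetical hand-coded rows, for illustration only (not real findings).
rows = [
    ("TreeBASE", 2),
    ("Pangaea", 2),
    ("Pangaea", 1),
    ("ORNL DAAC", 3),
    ("Pangaea", 2),
]

counts = tally(rows)
for (repository, category), n in sorted(counts.items()):
    print(f"{repository:10s} {CATEGORIES[category]:45s} {n}")
```

Keeping the repository and the category together as the key means the same counts can later be summed by repository or by category when building tables or charts.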


 * }