DataONE:Notebook/Reuse of repository data/2010/06/14

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Reuse of Repository Data
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Notes for June 14, 2010

 * See also: TreeBASE Citation Spreadsheet
 * For citations of resources found, please refer to this CiteULike page.

Resources searched with search terms and hit count

 * 1) Resource: Systematic Biology Search term(s):  data matrix TreeBASE (all words anywhere in article) Limit: Published between two dates: January 2008-May 2010 Results: 56
 * 2) Resource: Systematic Biology Search term(s):  obtained OR selected "from TreeBASE" (all words anywhere in article) Limit: Published between two dates: January 2008-May 2010 Results: 6 (including Ripplinger and Lakner articles)
 * 3) Resource: Scirus Search term(s): Exact Phrase: "(obtained OR selected) AND from TreeBASE" ANDNOT All of the word:(deposit* OR submit*) Dates: between 2008 and 2010 (month limit not allowed) Results: 22
 * 4) Resource: Scirus Search term(s):(("obtained from TreeBASE" OR "selected from TreeBASE")) -deposit* OR -submit Dates: between 2008 and 2010 (month limit not allowed) Results: 38
 * 5) Resource: Scirus Search term(s): Exact Phrase:((obtained OR selected) AND from TreeBASE) ANDNOT ((deposit* OR submit*)) Dates: between 2008 and 2010 (month limit not allowed) Results: 73

Observations

 * Valerie Enriquez 08:58, 14 June 2010 (EDT): First result for first search term: Jennifer Ripplinger and Jack Sullivan. "Does Choice in Model Selection Affect Maximum Likelihood Analysis?" Syst. Biol., February 2008; 57: 76 - 85. doi:10.1080/10635150801898920 In text, phrase "data sets obtained from TreeBASE" found.
 * Valerie Enriquez 09:24, 14 June 2010 (EDT): Fourth article result in first search term: Clemens Lakner, Paul van der Mark, John P. Huelsenbeck, Bret Larget, and Fredrik Ronquist "Efficiency of Markov Chain Monte Carlo Tree Proposals in Bayesian Phylogenetics" Syst Biol, February 2008; 57: 86 - 103. doi:10.1080/10635150801886156 In text, phrase "data sets selected from TreeBASE" found.
 * Valerie Enriquez 09:38, 14 June 2010 (EDT): Most of the 56 articles found in first search today included phrasing such as: "provided as supplementary data in TreeBASE," "data set and analysis [...] available in TreeBASE," "submitted to TreeBASE," or "supermatrix deposited in TreeBASE."
 * Valerie Enriquez 10:34, 14 June 2010 (EDT): The second search found 6 articles, two of which overlapped with the first search. Free full text was not available for two articles. The remaining articles still used terms like "deposited" and "available" for data placed in TreeBASE as opposed to data taken out. While altering the search in Syst. Biol. did not yield valid new hits, it may help find articles missed in Scirus, Web of Science or Nature.
 * Valerie Enriquez 16:24, 14 June 2010 (EDT): Third search resulted in articles that still included the word "deposited" as well as articles about depositing articles into TreeBASE (such as doi:10.1186/1471-2105-8-158). Further refined search to what is now listed as resource search number 4.
 * Valerie Enriquez 17:03, 14 June 2010 (EDT): Even after refinement, this combined keyword search has limitations. In some articles the phrase "can be obtained from TreeBASE" refers to data generated for the article and deposited into TreeBASE, as opposed to data obtained and reused from TreeBASE. The search may also miss articles that reused data from TreeBASE but also generated new data and deposited it into TreeBASE. Also, all except five of the 38 items found were either dissertations or other pieces not published in journals.
 * Valerie Enriquez 17:39, 14 June 2010 (EDT): Broadened third search again by removing quotation marks in case I missed anything.
 * Valerie Enriquez 17:39, 14 June 2010 (EDT): I am noticing a lot of articles that also mention GenBank along with TreeBASE. Sometimes the information is obtained from GenBank and used to create a supermatrix deposited by the authors into TreeBASE, as is the case in this article: doi:10.1093/sysbio/syp060.
 * Valerie Enriquez 17:39, 14 June 2010 (EDT): Also, after re-broadening the search, the term "available in TreeBASE" is usually found in the context of deposited/uploaded data as opposed to reused/downloaded data.
 * Valerie Enriquez 18:03, 14 June 2010 (EDT): Another limit to this search is that "obtained" exists in the passive form "can be obtained" and is found in many articles, resulting in false drops.
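
The phrase-level distinctions noted above (deposit wording, reuse wording, and the passive "can be obtained" false drops) could be screened roughly as follows. This is only a sketch under stated assumptions: the pattern lists are illustrative, drawn from the phrasings quoted in these notes, and the function name `classify_mention` is hypothetical. It is not the search method actually used in Systematic Biology or Scirus.

```python
import re

# Illustrative patterns only (assumptions based on phrasings quoted in the notes).
DEPOSIT = re.compile(
    r"\b(?:deposited|submitted|available|provided as supplementary data)"
    r"\s+(?:in|into|to)\s+TreeBASE",
    re.IGNORECASE,
)
# Passive form that may mean either deposit or reuse, so it needs a manual check.
AMBIGUOUS = re.compile(r"\bcan be obtained from TreeBASE", re.IGNORECASE)
# Reuse wording, skipping matches directly preceded by "can be ".
REUSE = re.compile(
    r"(?<!can be )\b(?:obtained|selected)\s+from\s+TreeBASE",
    re.IGNORECASE,
)


def classify_mention(sentence):
    """Return the set of labels whose pattern matches the sentence."""
    labels = set()
    if DEPOSIT.search(sentence):
        labels.add("deposit")
    if AMBIGUOUS.search(sentence):
        labels.add("ambiguous")
    if REUSE.search(sentence):
        labels.add("reuse")
    return labels


if __name__ == "__main__":
    for text in (
        "data sets obtained from TreeBASE",
        "supermatrix deposited in TreeBASE",
        "The alignment can be obtained from TreeBASE",
    ):
        print(text, "->", sorted(classify_mention(text)))
```

Returning a set rather than a single label keeps articles that both reuse and deposit data visible, which a plain "ANDNOT deposit*" query would drop.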

Conversation with Heather Piwowar around 9:30 a.m.

 * Note: Anonymous user 1748 is Heather

Anonymous user 1748: Hi Valerie, is that you? This is Heather

me: hello

yes

Anonymous user 1748: Good morning :)

me: good morning

Anonymous user 1748: I was just looking at your spreadsheet.

Looks like you are finding a lot of cases of people depositing data into TreeBASE, eh?

oh, I see....

me: yes

Anonymous user 1748: are the first few rows from when you were still extracting data that was being shared into treebase?

me: I'm trying to narrow it down

yes

Anonymous user 1748: I see the last few rows are reuse = Great

me: I'm improving the search

(and making notes in the OWW lab notebook)

Anonymous user 1748: are you thinking you'll keep adding rows for the shared cases?

me: I hope I'm more successful in the search.

shared cases?

Anonymous user 1748: hmmmm. let's see if I can explain it better

me: when there's both reused data AND deposited data?

Anonymous user 1748: nope. I mean I can see by your notes that one of your searches returned 56 results, is that right?

me: yes

Anonymous user 1748: are you putting all 56 results in the spreadsheet?

and then some turn out to be reuse of treebase and some turn out to be people depositing data into treebase?

me: I have not been (I am making a note that many of the articles found have the phrasing "data deposited in")

Anonymous user 1748: right.

great.

me: should I be adding them anyway?

or is the note sufficient

Anonymous user 1748: nope, I think what you are doing sounds right on.

me: ok good

Anonymous user 1748: adding them to the SS would be way too time consuming for too little benefit, that's what you are thinking too?

me: because I was afraid doing that would detract from time spent finding actual hits

Anonymous user 1748: yes

I was just a bit confused because I saw

row 19, 20, etc

that say available/deposited in/etc

me: oh, I think those were from Friday

Anonymous user 1748: gotcha!

me: only the last three or so are from today

(maybe two)

Anonymous user 1748: ah hah!

yes, and those last few look great :)

cool.

me: I'm going to go back to Web of Science and Nature and Scirus with the improved search later.

I just figured it was probably better if I left the original searches in the spreadsheet

Anonymous user 1748: one idea then might be to add a column that simply encodes whether the mention of TreeBASE is in the context of putting data in or out

me: at least for now, for my own reference

ah, good idea

Anonymous user 1748: hmmm maybe two columns "data into treebase"

"data out of treebase"

and then both could be true or false

and mostly your new columns will only be "data out of treebase"

me: ok, would that go at the end of the rows or near the beginning?

er, columns

Anonymous user 1748: but it will help describe what is "old data" from when you were doing it a different way

your call where the columns go :)

me: ok

Anonymous user 1748: Your idea of keeping track of the refs via CiteULike is handy

me: it might get confusing because I had exported a full list of articles prior to looking through the text first

adding the DOIs is really useful too

Anonymous user 1748: hmmmm I see. well, yeah, though I could see how adding them to citeulike any other way would be harder. well, your call how you want to handle that.

good

me: I'll try to figure something out.

Anonymous user 1748: yeah, though don't spend a lot of time on it because it is only a nice-to-have.

me: maybe I'll import a list after I've completed the searches using DOIs

ok

Anonymous user 1748: your last few rows you've added capture lots of the meat... they are great

me: thanks

we're meeting today at noon, right?

Anonymous user 1748: one thought I had is that you might want to specify the end date in your searches a little bit more definitely

otherwise every time you search for something ending in 2010 you'll get different results :)

me: ok, I know Sys. Bio automatically set it to May 2010 ah

Anonymous user 1748: so maybe end in 2009, or on June 1 2010.

ok, yup, that would work

me: ok

I know some searches only let you search down to the month.

Anonymous user 1748: just wanted to make sure it was defined and it wasn't immediately obvious from a quick glance at the notes.

yup!

good

me: ok

Anonymous user 1748: any questions or anything before our meeting, or are you good?

me: when I re-do the Web of Science and other searches, I'll make sure to make that definition

I think I'm a lot clearer than I was before.

thanks for helping me sort that out

Anonymous user 1748: no problem! ok, talk more at noon your time, ping me in the meantime if you get stuck on anything.....

me: ok, neat.

thanks again!

Anonymous user 1748: (one more small comment in your new columns... I would explicitly add "0" for false, to distinguish it from blank = don't know or haven't figured it out yet or wasn't clear....)

bye :)

me: ok, thanks!
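
The two-column idea from the conversation ("data into treebase" and "data out of treebase", with an explicit "0" for false and blank for not yet known) could be read back from a spreadsheet export along these lines. This is a minimal sketch: the CSV layout, the `row` identifiers, and the cell values are made-up placeholders, not rows from the actual TreeBASE citation spreadsheet, and the helper names are hypothetical.

```python
import csv
import io

# Column names follow the two columns suggested in the conversation;
# the "row" identifiers and cell values are made-up placeholders.
SAMPLE = """row,data into treebase,data out of treebase
example-1,0,1
example-2,1,0
example-3,1,
example-4,,
"""


def parse_flag(cell):
    """Map a cell to True ("1"), False ("0"), or None (blank = not yet known)."""
    value = cell.strip()
    if value == "":
        return None
    if value in ("0", "1"):
        return value == "1"
    raise ValueError(f"unexpected cell value: {cell!r}")


def load_rows(text):
    """Parse CSV text into dicts carrying the two tri-state flags."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "row": record["row"],
            "data_in": parse_flag(record["data into treebase"]),
            "data_out": parse_flag(record["data out of treebase"]),
        })
    return rows


rows = load_rows(SAMPLE)
reuse = [r["row"] for r in rows if r["data_out"] is True]
undecided = [r["row"] for r in rows
             if r["data_in"] is None or r["data_out"] is None]
print("reuse:", reuse)
print("still to check:", undecided)
```

Keeping blank distinct from "0" means rows that have not been read yet are never counted as "no reuse".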


 * }