DataONE:Notebook/Reuse of repository data/2010/06/11
Notes for June 11, 2010
Resources searched with search terms and hit count
Valerie's Conversation with Heather in Google Docs around 2:00 pm June 11, 2010
Note: Heather was both Anonymous User 166 and Anonymous User 183
Anonymous user 166: Hi Valerie! I'm looking at your google spreadsheet. Nice start!
Anonymous user 166: I have a few suggestions. When would be a good time?
me: I'm good now. I just finished adding the new page you told me about.
Anonymous user 166: cool. great.
me: http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data oh wait, am I talking to Hannah or someone else?
Anonymous user 166: perfecto. You are talking to Heather. I think maybe sometimes you have given me the nickname Hannah by mistake :) I like the name Hannah, as it turns out LOL
me: whoops, sorry
Anonymous user 166: That's ok, no problem at all.
me: I think it's because you're listed under my former TA Hannah on my gchat
Anonymous user 166: ah hah! That'd do it. that notebook page looks great.
me: thanks, I more or less took the summary from the questions page
Anonymous user 166: Add
This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.
to the top to help navigation, and add a link back to the Research Questions page, and then keep fleshing out content. I haven't used the calendar parts of Notebooks yet, but I think it may be handy
me: I've been copy/pasting my thoughts from emails into the dated entries
Anonymous user 166: Do you have questions for me before I suggest a few additional columns? Valerie, perfect! Yes I think putting the correspondence there makes sense. We're clearly figuring it out still, so let us all know if that organization ends up working for you
me: ok I like being able to see the dates and timestamps on things
Anonymous user 166: yeah, agreed. It looks like you are getting a bit of traction on finding things
me: should I log everything?
Anonymous user 166: is the overall project making sense at this point? do you have questions now that you've had a chance to think about the aims/context/etc a bit?
me: I've only been logging things that looked like data reuse, then making a note of it if they're not after running a search. I just want to make sure I'm not doing things wrong :)
Anonymous user 166: good question, I don't know if you should log everything. I think log at the level of detail that you feel is right for you... err on being open, if you are otherwise on the fence
me: the thing is, sometimes the search results end up in the hundreds with maybe only about 20 somewhat relevant. ok
Anonymous user 166: oh, I see what you mean by everything. ok, well here is one idea
me: or should I just mention the number of search results for each source
Anonymous user 166: start a new wiki page for today (or click on today in your Notebook calendar, and add a new "search results" section under a "correspondence" section or something) And then keep a running commentary
me: that makes much more sense
Anonymous user 166: so say "first I ran this search" and then paste in the url
me: because otherwise the spreadsheet would end up having too much information
Anonymous user 166: and then say "it returned 23423 hits" and then maybe paste in the return-url if there is one (or what makes sense)
me: ok, that does make a lot more sense. I really like OWW so far
Anonymous user 166: and then you could comment and say "but I did a quick look at them and they were almost all data CREATION links, so I then refined the search to this" etc. oh good, I'm glad
me: ok, I figured I should keep detailed information about my search methodology anyway
Anonymous user 166: yes.
me: just in case someone has better ideas
Anonymous user 166: yes :) and also, to keep track of what doesn't work
me: ok, that's definitely important too
Anonymous user 166: one of the founders/initial instigators of the open notebook science concept, Jean-Claude Bradley, feels that that is one of the main benefits of ONS: the fact that you record what doesn't work
me: sweet. I kept a leadership journal like that once in college.
Anonymous user 166: great. then you are ahead of me. I've never been very systematic about it... but I think there are huge benefits in it so I'm hoping to start
me: I'm not sure if anyone referenced it, but I left it in the organization's library for future reference.
Anonymous user 166: wow, cool!
me: well, admittedly, a lot of the journal was written after the fact, but in some cases, hindsight is 20/20. it'll be interesting to keep a journal through the whole process though.
Anonymous user 166: yeah, agreed. ok, so for your search results I imagine you will play with the queries a bit, recording your decisions and tweaks, then get to something that looks good enough to dig into in more detail (you may already be there). I think there is some benefit in recording some detail about the links that aren't reuse... but it doesn't have to be at the same level of detail.
me: or the fact that sometimes even the original data creators don't cite their data properly (like the link that goes to Harvard for some reason)
Anonymous user 166: right!
me: and about what you said earlier, it might not be that helpful for me to go through the articles citing the original research article
Anonymous user 166: I think it is worthwhile to keep some level of detail about how people talk about data creation links to the extent that you find them when you are looking for data reuse links
me: because if the other researchers got the information from the article and not TreeBASE, it's kind of a moot point. ok
Anonymous user 166: because that information would help you refine your query. For example: said "we deposited data in", 5 times; said "we have uploaded data into", 10 times; etc.
me: ok, that makes sense
Anonymous user 166: not exactly that, modify it so that it is easy to update
me: I was going to try boolean searching next with NOT and "uploaded" or NOT "deposited"
Anonymous user 166: that may be too much detail... but capture some detail about what you are capturing that you don't want to help inform the NOTs. yes, exactly.
Anonymous user 166: now I'd also keep track of the fact that you are doing that. obviously, but also keep track of it in another section. For example, on your main new project page you might want to start a section called "limitations" and add to it "if I end up using a query that has NOT uploaded in it, this will mean that I will not find papers that created AND reused data"
me: ah, this is true
Anonymous user 166: this would be fine, but it would be worth remembering that it is a limitation of the approach that affects generalizability. I don't think it affects it too much, and it may definitely be worth it in an 80/20 study like this one. (personally I think it will be a big help and you should add the NOTs)
me: ok. I'll also make a note of searches that for whatever reason don't allow boolean searching
Anonymous user 166: just brainstorming ways we can use OWW to keep track of potential thoughts and implications of our research as we are doing them. yup, great
me: so in other words, all else fails: make an entry in OWW
Anonymous user 166: so you could have a "lessons learned" section (or something) that says "the Nature website is really hard to query for purposes like this because it doesn't support ABC" or whatever. yup :)
me: ok, I can definitely do that
Anonymous user 166: great. I personally thank you, because it will help me later :)
me: no problem
Anonymous user 166: for what it is worth, I expect you will start data gathering about three times in the next three weeks, as you learn more.
me: I figure showing one's work is part of the whole scientific process anyway. ok
Anonymous user 166: so expect to have to start over again and don't consider it failing, it is all part of the learning. especially if documented ;) yeah.
me: ok, because at first I was really worried if I didn't get any solid results
Anonymous user 166: so here are a few thoughts on additional columns for your spreadsheet. no, don't be worried about that.
me: ok, I remember Sarah mentioning something about how the large blocks of text would be hard to import or count in a database
Anonymous user 166: first of all, I think it will take you a week (?) of trying before you start to get a feel for what works and what doesn't. This isn't an easy query task, so it will take a bit of exploring, both in terms of full text and in terms of looking up DOI prefixes in reference lists, etc. hrm, I wonder what Sarah meant by that, I'll have to go reread.
me: ok, I'm not sure if she sent everyone the message
Anonymous user 166: ok. I do think that small blocks of text could be really helpful though. so for example, it would be helpful if you added another column
me: this is what she said: "Generally speaking, I think it's good that we collect the original text (i.e. in your policy sheet which has large chunks of text), but I think each of those fields should be accompanied by a coded categorical or quantitative field, otherwise they aren't useful for statistics. There are different pros and cons to coding the data during or post data collection...we should discuss this on Monday."
Anonymous user 166: into which you paste the actual text of the sentence that makes the citation reference so we can see what words are used
me: ah, that is a good idea
Anonymous user 166: yes, agreed! So do copy the text, but we also need to break out the important parts.
Anonymous user 183: so I think the fact that you have a separate breakout column for the url or the citation itself is useful, in addition to the full sentence. mistake
me: wait, where should that go?
Anonymous user 183: trying to undo! Hmm, not quite sure what happened there.
me: it's ok, I need to go back through and look at most of these again anyway
Anonymous user 183: Maybe you changed the F19 cell? Or did I? I can't see the revision history, maybe because I'm Anonymous for some reason....
me: yeah, I'm not sure why everyone's showing up anonymous
Anonymous user 183: hmmm. whoops.
me: oh, I think it's because I'm sharing with everyone so they're not logging who's in the document at any given time
Anonymous user 183: well, I'll just let it go. gotcha. I think I just lost our chat history though. Not sure if it is helpful to save... if so, could you save it please?
me: ok, I'll try to copy/paste it, although it'll probably say anonymous for us
Anonymous user 183: anyway, a few more quick thoughts.... did I explain well enough what I mean by the reuse-sentence column? that's ok
me: the sentence within the body of the article that cites TreeBASE?
Anonymous user 183: yes, that's right.
me: like "The matrices in figure 1 (study accession number S### in TreeBASE)" or something
Anonymous user 183: yes, exactly
me: ok, I can do that, especially since I can actually get fulltext Nature articles when I'm on a Simmons computer
Anonymous user 183: or "We used the results from three Treebase studies [34-37]". great
me: most of what I've found so far has been "We deposited our data in TreeBASE" (but I'm working on a way to exclude those results)
Anonymous user 183: it would also be helpful to have a column that contains a link to the articles themselves
me: like the DOI?
Anonymous user 183: though I know that link depends on a user's proxy settings so it may not be reasonable
me: a lot of the articles have DOIs
Anonymous user 183: yeah, a doi might be a better idea. add a doi column
me: at the very least, links to the abstracts, right? ok
Anonymous user 183: right. two cols, for the DOI and a link to the abstract, would be great
me: ok. ok, just to be clear, who am I talking to now? Heather was "Anonymous user 166" earlier
Anonymous user 183: yup, I think that would be a really good start. Yes, sorry about that! Heather both times
me: oh ok, for some reason I thought there were multiple people talking
Anonymous user 183: one more thing
Anonymous user 183: if it turns out you are getting too many hits, you can decide to limit the journals or issues you are looking in. The number of hits in Nature might not be too great, but if you start searching in Scirus or something you might want to limit it to 2009 publications, perhaps something like that
me: ok, that makes sense. I should probably explicitly state a date limit and language limit (works in English), right?
Anonymous user 183: yes
me: (unfortunately the only languages I can read in are dead) ok, I'll make a note of that
Anonymous user 183: one more idea is to add the journal Sys Bio to your short list: http://sysbio.oxfordjournals.org/ They do lots of data sharing, and may also do lots of data reuse
me: oh neat
Anonymous user 183: I think that Sarah will be looking at it thoroughly for a few years, but you could do a repository-depth sort of search. ??? anyway, it is a thought
me: I can look at that as well
Anonymous user 183: Though as I said it, I think the idea of doing a look across many journals, limited to 2009 for example, is a better fit since Sarah is focusing on specific journals
me: many journals limited to 2009 in English
Anonymous user 183: you can limit Google Scholar etc that way. yup, that would be my current thought.
me: yeah, there's tons of stuff in Google Scholar
Anonymous user 183: (ok, one different idea to explore.... use 2008 instead. The reason is that there may be better full-text coverage in PubMed Central for 2008, since the NIH requirements would have started hitting. ????. I'm not sure in this domain though since it isn't NIH.... so probably not as relevant.) Your call, 2008 or 2009.
me: ok, some articles I found were in PubMed and Biomed, so I think I might start with 2008, and if that's too many I'll go to 2009
Anonymous user 183: ok! great. Send me more questions if/when you have them, maybe through your OWW pages... I'll go make sure I have them "watched"
me: all right. I'll make sure to do that either OWW or email or gchat
Anonymous user 183: great! looking forward to seeing those citation sentences... I'm curious to know how much they vary!
me: thanks again for all your help on this (since I'm rather new)
Anonymous user 183: oh, one more thing. You probably saw that I added a few comments to the "research questions" talk page
me: oh, yes
Anonymous user 183: if you could streamline the project research questions and plan there a bit more, so that it would be a good summary for people coming to check out the projects, that would be great. feel free to gut what is there and make it your own
me: ok, I wasn't sure how to answer all of the questions so I'll answer what I can and then get rid of the rest
Anonymous user 183: that's ok. You don't have to answer all the questions. you could just say "Pilot data collection is underway to help answer these questions" and then list the subset of the questions that you want to answer soon. And if you decide that some of the questions aren't relevant, that's ok. Mostly I just wanted to give you a flavour of what you could be considering. yup, great
Anonymous user 183: cool! Talk more sometime soon then. bye!