DataONE:Notebook/Summer 2010/2010/07/13

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Group chat agenda
Knoxville wrap-up
 * OWW page for Knoxville... agenda + slides + links to notebook summaries?
 * slides on plone
 * reimbursement
 * other

Other
 * blurbs on spreadsheet README
 * ASIS&T poster?
 * Other deadlines, thoughts?

Nic
 * Google Fusion recap
 * Data validation status
 * R and stats status

Valerie
 * spreadsheet walkthrough
 * article thoughts
 * DAAC followups?

Sarah
 * data collection status
 * R and stats status

Group Meeting Transcript July 13, 2010
12:59 PM Heather: You've been invited to this chat room! 1:00 PM Sarah has joined Heather: Hi all! me: hello Sarah: good morning! Heather: I just posted a last-minute agenda here: http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010/07/13 nicholas.m.weber: Hi Heather: Additions? 1:01 PM nicholas.m.weber: looks good me: not really, those points look like they address my questions 1:02 PM Heather: ok... let's start. Knoxville. Realized that although some people posted notes about the meeting, we don't have a face-to-face page on OWW would probably be good, post agenda, link to discussions, slides, etc. volunteer to make one? 1:03 PM me: I have some notes and could make a page about the meeting on OWW that others could add to Heather: thanks valerie, perfect. maybe link from the notebook date, too? have you guys put your presentations up on plone yet? me: you mean make a notebook with date entries with each of the notes? no, I don't believe so 1:04 PM (the main DataONE site?) Heather: I haven't, I need to do that. Can each of you do that too if you haven;t? yup, the main DataONE site... we are to put all of our "final" products there, and I think our presentations coudn as interm final products :) nicholas.m.weber: ok me: ok, cool. I can do that 1:05 PM Sarah: i have my notes on my oww calendar...you mean just link it to the main agenda? Heather: Valerie, sorry, I meant go to http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010/2010 and click on the first date of the face-to-face meeting and add a pointer from there Sarah: i'm putting my ppt up now me: oh, ok Heather: Sarah, yup. And/or link from the f2f page that Valerie will make to your notes pages 1:06 PM My goal: A "summer 2010" page that summarizes, briefly, our face 2 face meeeting so links from there to sarah's notes, the agenda, our slides, etc. does that make sense? kind of like our "correspondance" chat transcripts, but flushed out with a few more links..... 1:07 PM still confusing? 
not confusing? bueller?

1:08 PM Sarah: i'm clear nicholas.m.weber: I think it's clear 1:09 PM Heather: Valerie, you good? if not, ping me offline.... me: I'm good Heather: any other questions about knoxville wrap-up. reimbersement or anything? (Not that I know much about that, but you can ask....) nicholas.m.weber: not here 1:10 PM Heather: ok. you probably saw the attempts to provide more context to our OWW site disclaimers, clarifications, etc. me: yes Heather: any suggestions? 1:11 PM if so, now or later, let us know. or, for that matter, just make the changes yourself! it is a wiki after all :) nicholas.m.weber: I thought it was comprehensive... with respect to the "readme" disclaimer... should we be doing that with each ss? me: should we link to both our project pages and the main Summer 2010 page? Heather: Valerie, from your SS you mean? 1:12 PM Nic, I think yes. nicholas.m.weber: ok Heather: Valerie, from your spreadsheet README blurbs you can just link to one OWW page, I think, whichever one you would want to get to first if you were the person following the link.... me: yes 1:13 PM ok Sarah: i just added them on the first sheet of each ss Heather: great, thanks Sarah. ok, any other thoughts on that stuff? if so, chime in. 1:14 PM if not, just want to check in on the ASIS&T deadline and any other deadlines coming up.... Nic, you thinking a poster? Anyone else thinking a poster? or not? nicholas.m.weber: http://www.asis.org/asist2010/cfp-postersdemosvideoss.html 1:15 PM Sarah: i don't think i would be able to make it out to the meetings, and i don't know if it is the best place for my research anyways Heather: sarah, that works, agreed. me: I don't think I can make the meetings either, and I'm not sure if my data is best presented as a poster/video/demonstration. 1:16 PM Heather: ok, good valerie.
nicholas.m.weber: there are six tracks for this years conf http://www.asis.org/asist2010/schedule-track.html Heather: Nic, what do you think? were you planning to go? nicholas.m.weber: I think mine fits well into the Track 6 – Information in Context: Economic, Social, and Policy Perspectives and Im already planning on going Heather: yup... sounds like your stuff, eh? 1:17 PM nicholas.m.weber: so I'm hoping to get a draft done late tonight Heather: ok, great. Circulate in email tomorrow so that mentors have a chance to weigh in? nicholas.m.weber: sure Heather: and make sure the funding sentences are good and whatever else we need to make sure we get right for a "formal" DataONE release. I mean... the mentors can then make sure.... 1:18 PM nicholas.m.weber: I was going to search for a presentation to see what others had done and then try to model that Heather: great. yeah, don't stress over it. nicholas.m.weber: there might be something on the plone site? ok Heather: just wanted to reiterate that having mentors see it before submission is a good idea, so you need to give them two or three days ideally 1:19 PM not sure, probably feel free to bounce it off of me whenever, if you want faster feedback. otherwise, I look forward to seeing it when you send it out. 1:20 PM nicholas.m.weber: great Heather: want to give us a quick summary of what you think of Google Fusion? do you recommend it? if so, for what? nicholas.m.weber: sure. its really nice for making annotations Heather: if not, ???
nicholas.m.weber: but its very hard to make edits to individual cells 1:21 PM I think it would be nice to use in a situation where a group is trying to hash out the fields they need to gather Sarah: I like it better than regular docs, but am having some bugs me: ah, I noticed it only liked uploading one sheet at a time (as opposed to whole workbooks) nicholas.m.weber: it's nice that way me: (unless I'm doing it wrong) Heather: I noticed that too Valerie Sarah: agreed, i like the commenting feature but it might not be as useful to us at this point Heather: hrm... what does that mean about our README idea, btw? nicholas.m.weber: maybe I'm missing something, but I couldnt figure out a way to merge me discussions with new sheets Sarah: yes, but you can put the readme with the description 1:22 PM Heather: true nicholas.m.weber: good idea sarah Sarah: also, can you save figures (visualizations) that you like for others to see? Heather: sarah, what kind of bugs? Sarah: oh, just controlling the data Heather: nic, in what ways is it difficult to make edits? nicholas.m.weber: I like that you can set up alerts for comments 1:23 PM well when I was trying to change cells in a column it didn;t allow me to use keyboard nav Heather: I don't know, Sarah, about saving vizes arg that is a pain 1:24 PM nicholas.m.weber: so I was clicking between each cell and then moving the curser with the mouse... small complaint, but if you're editing a big set it can get time consuming Heather: yeah for sure so what do you guys think? are you goign to keep experimenting? 1:25 PM or decide to skip it at this point if it doesn't solve any major problems you were having? or ? (btw feel free to type while I'm typing and interrrupt me, I don't mind a bit.....) 1:26 PM nicholas.m.weber: I think I'll use it to share my sheets but not create them (meaning once I get them edited I'll begin uploading there instead of googledocs) it would be a lot easier for a mentor to give feedback that way Heather: yup. 
and you want to share them that way is because then other people can easily comment by using the comments? gotcha 1:27 PM so here is a different idea.... I think Google Spreadsheets supports RSS feeds. 1:28 PM You could have a "comments" column or two where you explicitly ask people to give comments, then monitor via RSS Downside: the comments aren't cell-specific nicholas.m.weber: that could work Heather: Upside: you could set it up such that people could actually edit the cells, which I think is a better approach for gathering input from the commumity Sarah: does fusion have rss ? Heather: less approval based 1:29 PM hrm, probably? nicholas.m.weber: i know in the discussion you can check an "alert me" not rss for the entire sheet though I don't think 1:30 PM Heather: I guess I'm thinking that if it is a pain to enter data there, that is a pretty big knock against it unless there are strong advantages to using it. (even just entering data as a modification activity, after doing most of the work creating the spreadsheet) 1:31 PM but I don't have strong opinions, just wanted to suggest alternatives the ability to merge tables via Fusion does look pretty cool.... shrug, I dunno. nicholas.m.weber: If I can't get comfortable with it by the end of the day I'll probably take the "comments" approach back to google spreadsheets 1:32 PM Heather: ok. and Sarah or Valerie if you keep using it that is fine too... I think collecting info about what works and what doesn't for what usecases is valuable 1:33 PM me: ok, sure Heather: ok. Nic, how are your tables going? You are commenting now? 1:34 PM Let us know when you are at the point where you want people to dive in and help curate the ambigous data?
nicholas.m.weber: good, yesterday and today I spent time defining what columns I had tried to collect and then figuring out what I was missing Heather: ok nicholas.m.weber: one sec Im trying to get the links for my tables Heather: I think that in parallel with this it will help to be thinking about stats 1:35 PM the reason I say that is that thinking about stats is often a fast and real way to figure out what data you really need, in what format, etc so I'd say don't try to get your spreadsheets perfect and then think about stats nicholas.m.weber: ok Heather: because it never works that way :) 1:36 PM nicholas.m.weber: so in thinking about stats... I started to play with the R commands that you gave me... but I think I need to spend more time validating so I know what is valuable to look for 1:37 PM Heather: ok, so you've got R installed and running and you can see plots etc? nicholas.m.weber: i could perform most of the commands 1:38 PM Heather: great. ok, then I suggest that maybe we have a dedicated chat to work through the next phase of R things nicholas.m.weber: I'm not real familiar with it but Im anxious to play around Heather: it might be lengthy, so don't necessarily want to do it here now nicholas.m.weber: ok Heather: valerie and sarah you are welcome to participate, but not sure how valuable? your call me: I haven't really poked around much with R yet. 1:39 PM (sorry) Heather: That's ok Valerie, I think maybe hold off on R for now because you have lots of cool article prep to do To the extent we do R things, we can do them customized to your data later me: ok, cool Heather: Sarah, I don't think we are goign to do anything you don't tknow 1:40 PM Nic, when you want to have an R talk? Sarah: yeah, i'm good in terms of r nicholas.m.weber: Maybe tomorrow ? Heather: ok. maybe 10am Pacific? nicholas.m.weber: sure 1:41 PM Heather: great see you on google chat then. nicholas.m.weber: ok Heather: anything else you want us to go over here now? 
nicholas.m.weber: I don't think so Heather: ok. 1:42 PM Maybe we just skip to Sarah quickly. Sarah, how's data collection going? Stats? Anything you want to cover here now? (I want to make sure we spend lots of time on Valerie's spreadsheets ;) ) 1:43 PM Sarah: sorry, i was on another window trying to get my ppt on the plone (which isn't working, i get an error...has anyone else tried?) anyways, 1:44 PM i'm finishing up data collection and anticipate being done by tomorrow Heather: haven't tried. Maybe as a PDF? great! Sarah: i'm shooting for at least 15 articles per journal per year....is that adequate for stats? that's for the 2000/2010 comparison and not all of them have a reuse 1:45 PM Heather: Hrm, it is low, 25 is probably better, but who knows. stats is a bit of an art when you don't know the magnitude of the effect you are expecting. Sarah: my problem right now is that many of the journals barely have 50 articles per year, so should i would be sampling a greater proportion from those journals 1:46 PM Heather: I don't think that is a big problem actually It means that those journal-years woudl be weighted more heavily, indirectly but in multivariate analysis that would mostly be taken care of better any remaining bias from that, I think, than not enough samples 1:47 PM Sarah: ok, well then, push my data collection projection back a bit also, the 2000 snapshot doesn't seem to be that informative very few have any reuse and sharing instances 1:48 PM so that cuts the sample size from 15 to 1 or 2 Heather: not surprising, but informative nontheless Sarah: so, should i proceed in 2000? Heather: yeah. that's ok though. hrm, I think it depends on your time constraints. I don't have a good enough sense of how long it takes you per article so what the cost is of you doing the extra 10 per year 1:49 PM me: Is that something that can be noted in the discussion or methods, an explanation of the sample size discrepancies? Sarah: um ...
2hours for 10 articles Heather: It makes it an easier story to tell if you have 25 for all years Sarah: that's conservative, but ends up being realistic when dealing with difficult articles Heather: Valerie, yes for sure. Methods if it is a reason for not doing something (ok, or discussion, fields vary) and discussion for how the sample size may limit the generalizability of results 1:50 PM yup, I believe it. That is still pretty fast, given all you are extracting. So if you were to go for 25 across all journal-years, when would you push your data collection done date back to? Sarah: friday probably considering download times and such Heather: Yeah, I 1:51 PM I'd say go for it and get a 25 across the board picture 1:52 PM That's my opinion. You are closer, to push back when/if your gut disagrees. Sarah: so....25 in 2010, 25 in 2000, and then 25 per year in my two time series correct? Heather: definitely 25 in 2010, 25 in 2000 Sarah: like i told you before, I think the time series, even though not relevant for trends, is the most statistically usable dataset 1:53 PM Heather: so yes, I'd say 25 in the time series too, ideally that said, since that dataset has more... um... what is the word that I'm looking for Sarah: robust ? better sampling of actual resuses? 1:54 PM Heather: the years are more similar to each other, so more overlap. yeah, robust... or hrm,,,, Sarah: i.e. though my sample size is 25, not all of those can be assessed for reuse/sharing practices Heather: not duplication, but similar datapoints anyway Sarah: yeah, i get it even though were lacking the word Heather: that dataset has more similar datapoints, so having 15 per year (or whatever) but lots of years 1:55 PM wouldn't be as bad as 15 per year with 10 years between them know what I mean?
Sarah: yeah Heather: so if you have to cut back on collecting extra you could probably manage without beefing up the time series 1:56 PM I hear you that not all can be assessed for sharing/reuse Ideally, sure, we'd have 25 sharing and 25 reuse or something (or a sample size large enough to acheive that) but that is clearly outside the possibility of this summer project so we'll make due with what we have and add a wish list in the discussion section 1:57 PM Sarah: ok, and just use the word "preliminary" a lot in the writeup Heather: yeah exactly Sarah: got it. i'm good for now then if you want to cover valerie's stuff Heather: great. Valerie. where do you want to start? 1:58 PM nicholas.m.weber: (maybe she's editing the OWW ?) me: probably the search spreadsheet (sorry) nicholas.m.weber: ah me: yeah, I was posting some links there 1:59 PM Heather: great. what do you think? want to start by giving us a tour of your spreadsheets? or talkig about DAAC data, or ? me: I think the overall search might be the data I work with the most. ok, tour of the spreadsheets sounds like a good starter 2:00 PM I can re-link if needed Heather: want to post links, or point us to the page that summarizes links, or? me: ok yes Heather: ohhhh I just thought of it, sarah. The word I was looking for was redundant. 2:01 PM anyway. valerie, carry on. me: My first raw data spreadsheet https://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE1LYlYtWHRXblNXa3ladXNNY3BDbEE&hl=en Heather: I love the red warning block.
me: My search comparison spreadsheet: http://spreadsheets.google.com/ccc?key=0AgM1E1R2tI_6dE9yX2J2NGwwcWhtSWg0NUZvRWlXdmc&hl=en ha I figured it was eye-grabbing (if not eye-gouging) 2:02 PM Heather: It definitely makes me want to think twice before basing my next research grant on your current results :) me: ha The spreadsheet I made based on Sarah's Shared Fields template http://dl.dropbox.com/u/2281212/SharedFields_Valerie.xls 2:03 PM I should probably start chronologically Heather: great. so those three are your main ones? me: yes Heather: (sorry, btw, that I didnt' keep better tabs on this. Portland conference and all that, but no excuse.) great. me: it's ok, a lot of the information overlaps/is still pretty raw 2:04 PM I more or less wanted to go through to see if there was anything else I needed to capture Heather: ok, so where would you start explaining these to someone? me: well, the first link, the data_citations spreadsheet was when I was running random searches for the databases 2:05 PM Heather: yup me: I captured information only for articles that had cited data, although a lot of times, I found it was cases of deposit so I created the "phase II" or "edit" sections Heather: so I'm a bit confused... ok, each row is a "hit" is that right? me: I went from a generalized search to a specific search yes yes 2:06 PM actually no in phase 1, some of them were misses after I copied the sentences and they were revealed to be data deposit and not data reuse Heather: where is this phase II/edit section? ah hah! me: on the TreeBASE and Pangaea pages, they're down Heather: I was on the DAAC tab and didn't see it me: (I mean, if you scroll down) 2:07 PM yeah, I hadn't done that with the DAAC tab because those were from the spreadsheet Bob sent Heather: gotcha! 2:08 PM ok, so if you found the same reference via multiple searches, it shows up on multiple rows? 
me: I think there was one case where that happened, and I made a note of it in the same row I probably should have made separate rows to avoid confusion but I'm pretty sure that only happened once 2:09 PM Heather: only one case? I'm surprised me: unless I have duplicates which may be possible Heather: I woudl have thought that doing similar searches in ISI or google scholar or whatever woudl have produced overlapping resutls me: ah, this is something I should be noticing actively and taking note of then? Heather: no, not necessarily 2:10 PM me: after awhile, the results sort of blur together Heather: I'm just trying to make sure I understand 100% me: so it is likely that there are overlaps Heather: yeah, I hear you me: I'm sorry I got confused Heather: also, I think at this point you were doing very exploratory searches, right? are the results from all of your "formal" 27 searches in here? 2:11 PM me: yes Heather: ok me: and the summary of those searches is the search_comparisons spreadsheet there's actually 38 there but I used multiple examples Heather: gotcha me: (for particular author names/datasets/etc.) Heather: I'm not at all suggesting you go back and get this now.... 2:12 PM but if/when doing things like this again, another piece of information that woudl be helpful is the sentence that makes the citation so not the citation itself (though I'm glad you have that too!) but the sentence that makes the citation to understand the context and words it uses to talk about its reuse me: ok. I thought I had put that in column P 2:13 PM Heather: maybe you did, hold on hmmm, so in row 53 of the treebase tab, column P looks like this, right? HIGDON JW Phylogeny and divergence of the pinnipeds (Carnivora : Mammalia) assessed using a multigene dataset BMC EVOLUTIONARY BIOLOGY 7 : ARTN 216 2007 me: oh Heather: was that in the article bibliography?
2:14 PM me: yeah, I might have had that as a placeholder Heather: ok, I can see that in some other rows above you do have the sentence me: (Since I didn't have fulltext on some of them) Heather: gotcha 2:15 PM hmmm, in that case it might just make sense to replace the placeholder with "unknown, no access to full text" or something like that? me: ok, that makes more sense Heather: I do see all the other sentences though in other rows. that's great I'll turn it around now then and ask for the other thing too.... 2:16 PM me: oh wait, are you looking at the RAW or EDIT page for TreeBASE? Heather: so in addition to the sentence, which it looks like you do mostly have.... me: I may have found the fulltext for all/most of the articles Heather: it would be useful to also have the reference, to see if it did say anything at all about the data within the bibliometric citation I was looking at TreeBASE RAW. Should I have been looking at EDIT? 2:17 PM me: oh yeah, sorry Heather: no problem me: I think I only went through and found the fulltext for the edited sheet Heather: ah yes, column P is much more complete! awesome me: I was worried for a second because I'm pretty sure I was able to get most of these through UNM 2:18 PM Heather: ok nice me: all right, but yeah, it was a lot more useful to see the citation in context like one example where it just mentions treebase in row 19 "The TreeBASE interface http://www.treebase.org supports six query types: author, citation, study accession number, matrix accession number, taxon and structure. " it's more of a mention than a citation 2:19 PM Heather: yeah. and not really data in or out, eh? so I'm looking at the DAAC tab in that spreadsheet and I'm a bit confused. me: yeah, I wasn't sure what to put for that Heather: what is in it? 2:20 PM me: I put in 1s where it should probably be 0 since it didn't actually use data Heather: right, yup, that might be best so on the DAAC tab, it has 17 data rows.
2:21 PM you didn't pull all of the data from Bob's spreadsheet for this sheet, but you did pull some? me: I think I pulled some to construct the searches for this spreadsheet Heather: ok, gotcha me: like the cited author search or the doi search 2:22 PM Heather: ok. I think I've got a handle on that spreadsheet now. on to your searching ss? me: should I have gone through all of the ones on Bob's spreadsheet? ok, yes as I mentioned, even though I came up with 27 types of searches, this sheet has 38 rows because I tried to use multiple examples of author name/doi 2:23 PM Heather: ok, makes sense now your dropbox sheet? me: ok Heather: (Nic and Sarah feel free to jump in whenever if you have comments!) 2:24 PM me: I had used Sarah's formulas on the ISIraw_PasteFullRecordHere page Heather: I'm a fan of your DAAC tab, I know that so far me: and copied and pasted the ISI downloads into it thanks the one thing I ran into was for my non-ISI searches, I ended up just entering the DOI on the Reuse pages as opposed to trying to fill out the ISI full record page 2:25 PM Heather: ok, so is the "Article" tab part of what makes the formulas work? so it contains transient info? me: I think so. I started filling it out, but as I got into articles not from ISI, I did most of my filling out of information on the Reuse pages 2:26 PM Heather: ok so the DAAC tab contains one row for every row in Bob's spreadsheets, right? me: I tried to copy/paste them all onto each page and place 0s where they didn't apply yes plus the other ORNL DAAC articles I had found through other searches Heather: what about thte other reuse sheets? 2:27 PM they contain the reuse articles you foudn for that repository type?
me: now that I think about it, I should have made a way to integrate the search spreadsheet with this spreadsheet so I would know which articles came from what searches yes Heather: do they also contain reuse articles that sarah found (I'm guessing not, just want to make sure) me: the 0s are placeholders because I tried to list all of the articles on each page 2:28 PM I think I cleared Sarah's data before putting mine in just in case of overlap Heather: ok me: although she had sent me the template early in her data collection Heather: so where there is orange zero blocks, that is because that article reused data from a repository other than the one the tab is named for, is that right? me: yes either the orange blocks or a 0 Heather: ok 2:29 PM me: although when I look through each page, I don't think that the rows all have the same numbers, so I may have accidentally pasted over some things Heather: ok. and the share tabs don't really have any useful content at this point, right? 2:30 PM me: I hadn't looked into the sharing, since I was mostly looking for reuse only. Heather: right. just making sure I understand and that there isn't secret text hiding behind the orange background or something

nice Tennessee orange btw me: there might be. I'm not quite familiar with all of the formulas ha, I didn't realize that Heather: nah, I was joking. Sarah: it's just conditional formatting 2:31 PM Heather: gotcha me: ah Sarah: i did so i could see empty cells that needed my attention Heather: good idea Sarah: so in theory, if the record is complete, nothing should be orange though in mine, i haven't taken the time to enter all the "no" data yet 2:32 PM Heather: ok, valerie is there anything else you want to point to in this data deluge? me: well, I was just wondering if I should try to combine either this sheet or my raw data sheets with the search comparison sheet would it be clearer to explain? 2:33 PM Heather: hmmm, I'd wait I'd figure out what your top goal is and be driven by that at this point me: ok Heather: You've done lots of bottom-up, which is super. Now time to flip around I think....... 2:34 PM me: is this where I go through the spreadsheets to do reverse searches? Heather: Well, I'm not sure. So I'm starting to get an idea in my mind about what the backbone of your article could be based on Do you have ideas about that? 2:35 PM Thinking if we start there, it will inform what other searches to do, data to gather, spreadsheets to consolidate, etc me: well, I'm figuring out that each search function is built to accommodate different methods Heather: So maybe let's talk ideas me: good plan. I wasn't sure if I was coming to the right conclusions 2:36 PM I had been making observations in OWW but nothing particularly in depth Heather: Yup. Want me to relay my DAAC idea, and you can see if it rings true for you, or inspires other things, or ? me: sure Heather: (or maybe you read it already and just want to cut straight to the commenting?) 2:37 PM here, I'll try saying it one more time nicholas.m.weber@gmail.com has left me: the searching by DOI idea?
Heather: because sometimes hearing things multiple ways/times can help make them clear. Yeah. or I guess more generally, using the DAAC experience as the backbone of the article 2:38 PM You paint a picture of a repository that wants to know how its data is reused 2:39 PM so that they can learn, give feedback to funders, provide links to data creators, etc They ask a librarian to look for these reuses once a year (hrm in case it isn't clear, you paint a picture of the DAAC repository itself, not a hypothetical one) me: ok 2:40 PM Heather: Ask the DAAC librarian who does it how long it takes her how she does it, etc Report that in the article I'm guessing it takes a while and is frustrating me: and compare that with my own experience? Heather: if her experience is anything like yours! me: she's probably a million times better at it Heather: I wasn't thinking compare, as much as compliment. me: ah, ok 2:41 PM Heather: Focus on the DAAC experience as a case study for the first half of the article, maybe me: and then mention the others? Heather: and towards the end of the DAAC part, you could say "and furthermore they offer DOIs!" ok without the ! me: ah Heather: and talk about how DOIs are supposed to make traking articke reuse easier but darn it, no one is using them 2:42 PM me: ooh, that's good Heather: and you can quantify the "no one" by finishing up the great analysis you were doing on the DAAC tab of your dropbox spreadsheet that will allow you to say somethin glike "only 14% of all the reuses found by the DAAC librarians actually used DOIs" or something like that 2:43 PM me: I wasn't really doing much analysis, I was just plugging away based on the awesome model Sarah set up. Heather: right, but I think you were capturing the references, is that right? or if you weren't, you could? me: yeah Heather: to see the patterns of citation?
me: the sentence and location 2:44 PM and the fact that almost none of them have it in the references section Heather: the benefit of the DAAC dataset is that it is a librarian-derived set of repository based reuses and so it provides a great baseline... something your study has been missing until now 2:45 PM you could say, because of this diversity in data citations, my initial attempts to find DAAC reuses met with little success. only 23% mention the DAAC url, etc 2:46 PM then follow on to the DAAC first half me: ah, ok others just mention the data authors, etc. Heather: by saying "most repostiories don't have DOIs and so finding their reuses is even harder" yeah so the points of the article would be something like 2:47 PM a) finding instances of data reuse is hard (estimate of difficulty: estimate from DAAC librarian plus a bit of anecdotal colour from you) 2:48 PM b) there are plans to make it easier, but so far the uptake has been low (estimate from # DOIs in references within DAAC set) c) hrm, I'm sure there is a third point in here somewhere :) Sarah has left 2:49 PM Heather: And this is done using the motivation/dataset/discussion of the DAAC case to start with, and flushed out towards the end with your experiences qualitatively and with different repositories Thinking you have a conversation with Bob Cook and the DAAC librarian and maybe others to collect their thoughts and experiences. Whatcha think? me: this sounds good definitely a way to open up a dialogue 2:50 PM Heather: Doesn't have to be like that exactly of course, make it your own me: it's good to have a framework to work with Heather: But I think the DAAC dataset and reuse-hunting-experiences provide a useful framework yeah exactly me: I've written arguments before, but not in a scientific/bibliometric capacity. 2:51 PM ok, neat Heather: so not sure where you start me: should I email the DAAC librarian? Heather: yeah. maybe Bob Cook first?
me: ok 2:52 PM Heather: maybe before you do that, you might want to have a look at the DAAC data you have so far extracted on the DAAC tab me: come up with a list of interview questions? oh ok Heather: so that you could come into the conversation with a bit of knowledge 2:53 PM along the lines of "I looked at 54 of the 124 reuses in that spreadsheet and it appears that only about 12 of them have DOIs" "this was less than I would have guessed" "is that in line with your experience?" etc. or whatever the real numbers are 2:54 PM me: yeah, that makes sense I could quantify those Heather: tally up a few other things too, maybe, so that when the librarian tells you how she looks for the reuses her answers make sense to you based on the data you have like are there some where when you look at the full text you have no idea how the librarian would have found them? 2:55 PM if so, you can ask her explicitly about those. you know what I mean. Spend a bit of time with the DAAC citations you've extracted so you are familiar and can ask good questions and understand the answers :) 2:56 PM me: ok Heather: one thought is before you do this, sleep on it me: that was why I was wondering about combining the search comparisons with the other sheet Heather: just to make sure you don't have some direction that you'd rather take it 2:57 PM another idea is to send an email to data_citations email list describing this direction for the article, seeing if they have any suggestions, etc. before you email Bob and the librarian anyway, follow your gut on that. 
just wanted to share a backbone idea so that you could start to frame your article and drill to what you needed me: excellent idea 2:58 PM I'll write out a rudimentary outline and some initial figures/questions and then email the datacitations list Heather: yeah, ok, if you think combining the search spreadsheet with the DAAC one would help, I could definitely see that me: ok 2:59 PM I think it could help at least for keeping me from getting dizzy juggling all the sheets :D Heather: I'm guessing you might also want to merge together your DAAC dropbox sheet with some of the columns of Bob's original sheet 3:00 PM like the column that said what dataset number they were reusing me: yeah Heather: what DAAC project, etc. (basically all the columns, why not) 3:01 PM it would be interesting to know what % of the DAAC reuses included the Data_Set_ID number in their citations/papers, etc. whatcha think? me: as opposed to the article doi good plan Heather: right. or in addition, or who knows? 3:02 PM shall we go through the DAAC dropbox sheet in detail, briefly? me: sure Heather: I know they are Sarah's columns, but want to make sure they are capturing everything you need to capture for your argument and story 3:03 PM Now in this case the orange 0s are because you didn't have full text, or you didn't get to those, or ? me: the orange 0s were the ones carried over from the other reuse pages except for one where I didn't have full text I meant to keep using my own color coding scheme that I started on the ISI raw data page 3:04 PM Heather: the DAAC spreadsheet from Bob had 116 rows of reuse articles I think, right? I think you want a sheet that only has those 116 reuse articles on it me: oh hm. 
I wonder why I have another ORNL DAAC spreadsheet with even fewer than that 3:05 PM I think I mixed up my spreadsheets Heather: so maybe make a copy of this sheet and cut out everything that comes from your searches instead me: It looks like I didn't copy/paste all of Bob's spreadsheet ok Heather: just for the sake of the "first half" of the paper as I'm envisioning it.... 3:06 PM yeah. make sure you have exactly those papers, no more, no less, then you can calculate stats based on "what the DAAC librarian found" me: ok Heather: ok, column D in the dropbox sheet is Y when they mention the DAAC somewhere in their reuse paper or citations? 3:07 PM me: yes Heather: great type of dataset me: There's a key I made up that either abbreviated the name of the project or did something else 3:08 PM Heather: ok me: like RP: River Productivity Data Heather: so maybe make a standalone spreadsheet of this stuff and in the README you could put those codes? or some other way you think it would be easy to communicate me: ok 3:09 PM Heather: location of in-text citation? intro methods abstract etc., right me: yes Heather: now if it is in multiple places, how did you decide what sentence to cut and paste into column I? 3:10 PM me: I tried to get all of them adding [...] between each sentence Heather: ok, just concatenated together. gotcha. me: it got lengthy Heather: is the "R" in location for references? me: yes 3:11 PM Heather: ok. me: although in some cases I might have mistaken D for R by saying Repository instead of Depository Heather: it might be useful to make a second column beside I to hold the references cut and pastes yeah, I hear you. The R/not R distinction is important, since it determines whether the info can be looked up through full text or ISI me: oh you mean like a 1 or 0 for if there is a relevant excerpt followed by the excerpt? 3:12 PM Heather: hmmm, not sure what you mean. 
me: oh Heather: I mean have two columns like your "I" column now me: there is a "relevant bibliographic citation" in the next rows Heather: where one of them has text that is in the body of the article me: in K, I believe Heather: ok, gotcha, so I'm jumping ahead of myself, eh? 3:13 PM me: Sarah was very thorough in her headers Heather: yeah, ok. hrmmm. ok, let me come back to this thought in a minute me: ok Heather: in the meantime, in column G, what is R? 3:14 PM references again? ?? me: I think that might have been R for repository I meant to put D for Depository Heather: ok me: since those cite ORNL or Oak Ridge, etc. 3:15 PM Heather: what is column H? me: it seemed redundant I took it as where did it come from Heather: what does "NI" stand for, do you know? me: either from the author or the repository or Not Indicated 3:16 PM Heather: ah hah. ok. is having a Y in column J the same thing as having an "R" in column F? me: not necessarily 3:17 PM Heather: so then what does it mean to have an R in column F? me: oh, I think I put it in R for references if that was the only mention 3:18 PM Heather: or there is an R in column F if the references themselves mention data me: yes Heather: as opposed to the references being the original data-collection paper? me: or the repository name yes as is the case for most citations by author name Heather: there is an R in column F if the references themselves mention data or the repository name is that right? me: yes 3:19 PM Heather: gotcha then the citation itself is in column K me: yes where either the repository name, author name, doi, etc. 
are mentioned in the reference page Heather: and what were the criteria you used for determining column L? me: er reference section Heather: yup, I hear you. that makes sense. 3:20 PM me: L was more or less if it was according to ORNL's data citation policy where it includes the DOI so I put N for most of them Heather: ok. me: I have a link to ORNL's data citation policy I can reference in the paper Heather: you might want to go recode that a bit one thing that would be useful to pull out explicitly, into its own column, is whether there is a DOI 3:21 PM it looks like, for example, rows 81 and 82 have a Y in that column but no DOI 3:22 PM me: oh, I think I counted it if the URL was included Heather: ok. me: like http://www.daac.ornl.gov/MODIS/modis.html Heather: yup, I'd break that into two columns for your own information me: so y/n DOI column Heather: yup me: ok cool Heather: plus a y/n url column 3:23 PM plus anything else that you think would be helpful to break out maybe a y/n "it mentions the name DAAC in the bibliographic reference" for example part of the point of data citations is that it would be WAY easier to track them if we could use bibliometric resources me: ok, so you mean like the first spreadsheet I made? Heather: like ISI or Scopus like we do with articles but we can't do that unless 3:24 PM a) people use bibliometric citations rather than just in-text mentions and b) we know what to look for (and what field to look for it in) within the bibliometric citation standard with articles but as you found with ISI it frankly isn't clear what to look up where to find doi and citations in bibliographies! 3:25 PM me: yeah Heather: whoops: but as you found with ISI it frankly isn't clear what to look up where to find doi and data citations in bibliographies! me: there's not a field for it Heather: right. 
that is a very useful point to make in your article I'm surprised ISI doesn't have an [all] search ability, to search in all facets of the citation it is a pain not to have it! me: yeah no fulltext 3:26 PM every other search has fulltext Heather: which reminds me... do you have Scopus access? me: I'm not sure I haven't used it Heather: yeah. I wouldn't put it at the top of your list, but I think it might be useful to use me: ok Heather: if you are going to redo any of your searches 3:27 PM it does have an [all] so you can search in all aspects of the citation me: good to know Heather: yeah... for what it is worth, I'd be careful using the word "full-text" with regard to citations it is easily confusing, for me because you mean the full string of the citation, right? but most people think of the full text of the article, the intro, results, etc. 3:28 PM me: oh that was what I meant full-text article searching Heather: hmmm. me: because I know Google and Scirus do that but not ISI Heather: yeah, so agreed, full-text article searching would be helpful and would solve the problem me: although it does have its limits 3:29 PM Heather: right like they don't have the coverage that ISI has because ISI only has metadata and references, right? me: yes Heather: because that is what publishers are willing to share with them. 
so while it would be nice if ISI had full-text searching, that isn't likely to happen any time soon 3:30 PM me: at the very least, a doi search Heather: what I wish that ISI had, in a practical, I-don't-see-why-they-can't sense is the ability to search for a word in any part of the citation as opposed to just in the authors field or just the journals field or just the first-page field admittedly there aren't many people who want to find all citations to papers by Dr Apple and published in the journal called Apple 3:31 PM me: yes Heather: but I think it would be useful in our case because we frankly don't know where the doi is going to show up, or where they are going to slot the "Data from Oak Ridge" phrase It doesn't look like ISI supports that to me, does it to you? you can OR all the parts together, but that is cumbersome and still maybe lossy 3:32 PM me: ok I was just thinking from the ANDS angle where they were working with Thomson Reuters and Elsevier to improve search functions for data Heather: yeah. agreed. 3:33 PM might be worth touching base with them when you have some of your article fleshed out, in case they want to add or give context to something me: ok, neat Heather: ok, so how are you feeling? me: more up to speed Heather: do you have stuff to go on? 3:34 PM me: I'll process these notes today and sleep on it Heather: does it feel like you have a clear path? one that makes sense to you? one that you believe in? that's the goal anyway

me: yes Heather: ok, sounds good. me: I'll go post this conversation. 3:35 PM Heather: ok, great. let me know if you need a sounding board if some of it isn't making sense or doesn't sit right me: sure thanks a ton Heather: you'll aim to have an email out to datacitations in the next day or two? me: yes once I get a solid outline of what I want to cover 3:36 PM and after I look through the 100+ articles Heather: cool. maybe we have another chat towards the end of the week? we'll play it by ear. me: definitely Heather: if you have trouble getting full text, remember that Bob offered to post them or something. me: ok Heather: guessing that will take a little time, so if you expect to have problems, probably best to ask for that earlier than later 3:37 PM I'm kicking myself that we didn't meet with Oak Ridge Bob when we were there. me: I guess we just didn't have time. Heather: Was he out of town? I don't know. Anyway, he has been very supportive via email so we'll just soldier on remotely me: yes, the spreadsheet helped guide my searches 3:38 PM Heather: Have a good rest of the day and talk soon! me: you too! later
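
Note on the recoding discussed above: splitting column L into separate y/n columns (DOI, URL, repository name in the bibliographic reference) and then tallying percentages could be sketched roughly as below. This is only a sketch under assumptions. The function names, the regular expressions, and the toy citation strings are all made up for illustration; none of them come from the actual spreadsheet, and the patterns would probably need tuning against the real column K text.

```python
import re

# Assumed patterns (illustrative only, not taken from the spreadsheet):
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")                       # DOI-like string
URL_RE = re.compile(r"https?://\S+|\bwww\.\S+", re.IGNORECASE)   # any web address
REPO_RE = re.compile(r"\bDAAC\b|\bORNL\b|Oak Ridge", re.IGNORECASE)


def code_citation(text):
    """Return y/n style flags (as booleans) for one bibliographic citation string."""
    return {
        "has_doi": bool(DOI_RE.search(text)),
        "has_url": bool(URL_RE.search(text)),
        "mentions_repo": bool(REPO_RE.search(text)),
    }


def percent_flagged(citations, flag):
    """Percentage of citation strings for which the given flag is true."""
    if not citations:
        return 0.0
    hits = sum(code_citation(c)[flag] for c in citations)
    return 100.0 * hits / len(citations)


# Made-up placeholder strings, NOT real rows from the DAAC sheet.
toy_citations = [
    "Smith, J. 2001. Example data set. ORNL DAAC, Oak Ridge, Tennessee. doi:10.0000/EXAMPLE",
    "Data available at http://www.daac.ornl.gov/MODIS/modis.html",
    "Jones, A. 1999. Some article. Journal of Examples 1:1-10.",
    "",  # row with no bibliographic citation recorded
]

if __name__ == "__main__":
    for flag in ("has_doi", "has_url", "mentions_repo"):
        print(flag, percent_flagged(toy_citations, flag))
```

Keeping the three flags separate means a row like the MODIS URL example counts as "has URL" without being counted as "has DOI", which is the distinction the single Y/N in column L was blurring.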


 * }