DataONE:Notebook/Summer 2010/2010/07/27 chat

From OpenWetWare

In the chat room: Heather Piwowar, Nicholas Weber

11:55 AM Heather: You've been invited to this chat room! has joined

11:56 AM hi guys

Heather: Hi guys. Trying to get Maribeth and Bruce
Nicholas: Hi
Sarah: Hello

11:57 AM Heather: Unfort they are just showing up as "invited" not "online." Feels like there is some google chat magic I'm missing for connecting to people the first time. Any suggestions?

 (btw Valerie had another commitment, but will try to join us later)

11:59 AM just popped off an email to them. it's not quite 3pm here yet, so...

12:00 PM while we are waiting, let me update you on the survey.

12:01 PM Apparently Rebecca Koskela was in a review panel last week, so never got the chance to set up the survey monkey form

 I would expect to see it before the end of the week, although she hasn't given us a precise date

12:02 PM has joined folks there?
Heather: Yay, Bruce is that you?
refbruce: here now welcome!

12:03 PM Heather: Great, then I think we can get started. I'll keep an eye out for

 Maribeth and Valerie.
refbruce: Haven't used this before. It looks like there's voice and video?
Heather: Yes,
 though they don't facilitate the lovely text capture we all have come to love type fast bruce
Heather: but useful sometimes

12:04 PM refbruce: 'k. Tx

Heather: proposed agenda:
 a) OWW lessons learned doc
 b) intern updates on what they've been doing, what they have planned next

12:05 PM (bruce, jump in whenever, interrupting is no problem) warning - will leave at 4
Heather: c) ideally some pointers on end-of-internship stuff
 other topics? thats a good start
Heather: we're thinking we'll keep the intern summaries brief this time
 Suzie sent me email asking about the OWW lessons learned doc

12:06 PM and when we would have it avail

 because she'd like to use OWW for a project for her new phds
 so mostly this is just a heads up
 that I'm going to make a page and ask for your help fleshing it out
Sarah: sure
Nicholas: ok from the perspective of students, or mentors, or both?

12:07 PM Heather: yeah, good question

Sarah: do we want individual perspectives (1st person) or topic by topic how to
Heather: I think there is a market for two different types of docs
 right i like the topic by topic idea
Heather: so I think one doc that would be valuable to Suzie, and I'm sure others, is "one way to get started"

12:08 PM and we could give them pointers to what we've learned about how to make a "lab", RSS feeds, talk pages, etc. Mostly pointing to the docs that exist.

 then I think attached or separately could be random "lessons learned" and tips.

12:09 PM start a google doc?

Nicholas: I think it would be good to share our "formatting" and lessons learned, but part of that value is learning and then posting for one another...
Heather: (needless to say I think we also highlight this doc/these docs on our blogs and welcome feedback) or do it in OWW?
Heather: I think in OWW

12:10 PM because valuable for it to live in this community so that others here can find it gd pt
Heather: agreed Nic, nothing beats learning it yourself. But hopefully we can help someone avoid the struggles with "do I make a notebook under my name or my lab's name?"
 ok, probably enough time on that, just wanted to give it visibility

12:11 PM I'll make a page and send around a link

Nicholas: you're right Heather, I guess I was just thinking out loud
refbruce: And what are the lessons learned that are relevant to other platforms along the lines of OWW, not just this specific toolset.
Heather: good point Bruce
 definitely some things
 like value of a text chat log
refbruce: The advantage of being old. I've seen lots of fads and evolution :-).

12:12 PM Heather: (value of posting the text chats regularly, with little lag, <<whoops, recently>>)

 ok! good. Will try to emphasize that.
 oh, there is Valerie, hold on

12:13 PM Valerie has joined

Heather: Hi Valerie
 Just in time for status reports
 Nic, do you want to go first?
 quick summary of where you've been and where you are going?
Nicholas: sure
 So this week I’ve spent most of my time getting phenomenal R and stat lessons from heather

12:14 PM and then using what she has taught me to discover what in my data needs to be cleaned up

 and trying to understand which variables are valuable
Todd: did you figure out a way to get comments from others - I remember Heather saying there was a bug with the way it was set up initially

12:15 PM Nicholas: I've kept a pretty close record on my OWW pages

Heather: I think Todd means your spreadsheets
Nicholas: well, I think fusion tables was really hard to edit in
Valerie has left
Nicholas: so I moved "feedback spreadsheets" to google docs
 and posted a note to the group
 (also I'm focusing almost entirely on Journal data this week)

12:16 PM get feedback yet? (I still need to give you mine)

Heather: (I still need to give him mine too)
Nicholas: not much... Sarah gave me good feedback last night so I could dive in and sort out my Subscription model column ko
Nicholas: where I plan to go:

12:17 PM is understanding my stats better so that I can write them into a paper

 I've started an Abstract and Introduction sections for the Data Science Journal any take home messages yet?
Nicholas: and I'll post them tomorrow publicly
 take home messages?

12:18 PM headline results

Valerie has joined

12:19 PM Heather: I've mostly been trying to convince him to not look at the results very much yet :)

Nicholas: hmmm, not that far honestly -- it seemed at first there was some evidence of a relationship between Impact Factor and journal policy.... but that's ebbing and flowing as I clean and change data sorry...
Heather: since we are still data cleaning, etc
 nah, that's ok!

12:23 PM

Nicholas: separate

12:24 PM I broke out the original request / require column (what I had in Knoxville) into separate columns

12:25 PM last question: what are the publisher categories?

Nicholas: and kept archiving directions and citation directions separate sounds good
Nicholas: Wiley Elsevier Springer and Other ok

12:26 PM Heather: Any other questions for Nic?

 Or Nic, questions for others?
Nicholas: it breaks down to Other - apx 125 -- The 3 Major Pubs -- 185

12:27 PM I don't think so, if you have input for my stats, I've been putting most things on my OWW calendar pages great thanks!
Heather: Yeah, Nic has not only been picking up the stats really fast,
Nicholas: I have a really really good instructor
Heather: he's also been blazing ground about how to show R code and results
 on OWW pages.
 Learning lots through his experiences that we'll all be able to use soon/later. Good stuff,

12:28 PM ok. Valerie, do you want to go next?

 Quick summary of what you've been doing, where you are going next?
Valerie: sure

12:29 PM Now that I have James's answers, I'll be able to finish this round of drafting on my perspectives piece

Heather: YAY! James's answers!
Valerie: Todd sent me guidelines to Learned Publishing, which is probably a good route to go
 since this is mostly aimed for publishers
refbruce: sorry that took so long on this end.
Valerie: oh no
 that's ok

12:30 PM I know you all are really busy there

 also, I've been adding links/files to Mendeley
 in collections based on articles I've found citing TreeBASE, Pangaea and ORNL DAAC data (and a separate folder for the ORNL DAAC articles found by James, etc.)

12:31 PM I also added the .pdfs for the web resources page

 (that was originally on the DataONE and my OWW notebook)
 the .pdfs that are up are only the web resources and the files Ranjeet sent me
 however, I am limited to how many people I can share all .pdfs with

12:32 PM (10 users per collection)

Heather: yes
 in general, how has the mendeley experience been?
Valerie: it's been great
 I've been able to add so much very quickly.
Heather: pros/cons as a way to keep up this biblio as a living and useful contribution?
Valerie: a lot of things will autopopulate if you just have a DOI I have to say I'm really impressed with it
Heather: yeah me too
Valerie: the only con I can think of is whether or not someone other than me will be able to maintain the collection

12:33 PM I haven't quite looked into that

refbruce: Been doing some looking at Mendeley. Interesting. But I don't see an export capability.
Heather: I think you can change the owner, but I don't know if there can be more than one owner. worth looking into.

12:34 PM Valerie: the desktop program is very slick too

12:35 PM Bruce - the desktop app exports in bibtex, ris or endnote xml

refbruce: Tx. Good enough. I don't like lock-in. Those are good formats.
Nicholas: in one of the research centers here, they have subscription that allows us to share with unlimited users
Valerie: oooh
Heather: people there like it, Nic?
Nicholas: it's a really great way to get a grasp on what people are doing research on
 they just started

12:36 PM like two weeks ago (just found out myself) not sure we need to share the PDFs as much as the citations
Valerie: ok
 yeah, there's a limit to how much file space you have for sharing
Heather: yeah, though sharing PDFs amongst ourselves can be useful
Valerie: (500mb, I think)
Heather: solves the "who has this 1988 paper" problem
Valerie: (for the free basic account at least)

12:37 PM I think I was able to find that one and upload it

 the one on JSTOR?
Heather: yup! great.
 ok, so thumbs up on mendeley so far. Valerie, send around a pointer to the public and shared collections
 and we can keep exploring/tagging/seeing what we think
 back to your research paper for a sec?
Valerie: a link to the datacites group?

12:38 PM (and/or on the OWW page?)

Heather: both
Valerie: ok
Heather: also maybe a blog post with a link to the public collection? that would be great
Valerie: sure
Heather: ok. as you looked over james's response, did it seem to answer all of your questions?

12:39 PM or enough of them? or ?

Valerie: I haven't looked at it yet (just got home), but the answers look very detailed
Heather: basically: do you have what you need? ok, great
Valerie: and all questions I listed look answered
Heather: let us know soon if you think you need more of something :)
Valerie: ok, if I have any followup, I'll be sure to send that around

12:40 PM Heather: great. and asking them if you can post their responses on our public OWW would be great.

Valerie: it shouldn't take me long to work these answers into the article text
Heather: ok, quickly moving to Sarah.
Valerie: oh yeah, I'll send that in my thank you email
Heather: so we can be done by the top of the hour
 Sarah, quick summary?
Sarah: ok....
 i'm in a similar state as nic

12:41 PM mid analysis and cleaning data

 i hashed out the rest of my factor classifications this morning with heather
 and am currently working on running analysis for reused and shared datasets in terms of what we're calling "resolvability" and "attribution"

12:42 PM and a combined score of "ideal" data citation

 *but it doesn't necessarily need to be called ideal
 just resolvable + attributed = a "good" data citation
 in that it allows you to find the dataset from the information in the paper and gives the original data author attribution for the dataset in the paper

12:43 PM so, that's where i'm at

 year and depository are coming out as significant factors with genbank reuses driving the trend
 i haven't yet run the statistics on sharing

12:44 PM but am expecting similar things

 any questions on that?
 or, heather, things that I'm missing that the others should know
Heather: nope, I think that is good

12:45 PM one attribute of score we aren't pursuing

 due to sparsity of data, is what we might call "discoverability"
Todd: did you agree to keep the two different samples separate for analysis?
Sarah: no
 we've combined them b/c the stats came out similar separate and combined
Heather: so how findable is the data reuse citation by someone doing what Valerie was trying to do. Would include where in the paper the reuse attribution was made. But since so few were in the biblio, for example, mostly not relevant for stats.

12:46 PM Sarah: i thought you said not to combine them, heather felt that you did

 so, that's still up for debate and could use further clarification, perhaps in response to the email string we've been passing back and forth
 or we can hash it out here

12:47 PM if the results are the same either way, that is justification for combining them - but make sure to note that the trend (as opposed to just the p value) is not different for the separate analysis

 if thats actually the case, of course
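
A minimal sketch of the check suggested above: fit a simple least-squares slope to each sample separately and to the pooled data, then confirm the trend points the same way in all three. The function names and the numbers are hypothetical placeholders, not the interns' data, and Python is used here only for illustration (the chat itself refers to R).

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def trends_agree(sample_a, sample_b):
    """True if both samples and the pooled data share the same slope sign."""
    xs_a, ys_a = sample_a
    xs_b, ys_b = sample_b
    slopes = [
        slope(xs_a, ys_a),
        slope(xs_b, ys_b),
        slope(xs_a + xs_b, ys_a + ys_b),
    ]
    return all(s > 0 for s in slopes) or all(s < 0 for s in slopes)


# Hypothetical (year offset, score) pairs for two sampling methods.
sample_a = ([1, 2, 3, 4], [2, 4, 5, 8])
sample_b = ([1, 2, 3, 4], [1, 3, 4, 6])
print(trends_agree(sample_a, sample_b))
```

This only compares the direction of the trend, which is the point being made here: similar p values alone do not show that the separate and combined analyses tell the same story.
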
Sarah: ok, i'll check for that as well
 i also have a factor for "sampling method" to make sure that isn't significant

12:48 PM but I haven't run that yet

12:49 PM not sure about the wisdom of including that factor. It's not that you necessarily expect the means to differ, but perhaps the shape of the distribution

Sarah: it wouldn't be included in the final analysis
 just as a test of sampling artifact
Heather: good point, Todd
Sarah: so, i'm planning to run the complete analysis first, then run it with the sampling artifact
Heather: Sarah's going to post her code and results soon, so we can have a deep look

12:50 PM Sarah: is that not the way i should do it?
Todd: fine as an exploratory thing
 just tough to interpret a negative result
Sarah: yeah, i'm hoping to put my cleaned code up today or the RSS
Heather: Todd, you are suggesting doing a stratified analysis
 to make sure the trends go the same way, right?

12:51 PM right

Heather: ok. but not formally stratified... just sanity checking the results from the separate analyses against the combined one
Heather: in general could do with third opinions on number of variables.
 yes, gotcha. that makes sense.

12:52 PM so I've been running with the "rule of thumb" of about 30 datapoints per degree of freedom in a regression

 of course such things depend on size of effect etc
 but do you have any rules of thumb that are different, that you prefer? my thumb has the same rules
Heather: ok :)
Sarah: ok...i'll proceed with that then
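
As a rough illustration of the rule of thumb above (about 30 datapoints per degree of freedom), the sketch below computes how many degrees of freedom a regression could spend for a given sample size. The function name is hypothetical, and the sample size is the approximate journal count Nic quoted earlier in this chat, so treat the result as a ballpark only.

```python
def max_model_df(n_datapoints, points_per_df=30):
    """Largest number of regression degrees of freedom the rule of thumb allows."""
    return n_datapoints // points_per_df


# Approximate journal counts quoted earlier in this chat: about 125 "Other"
# plus 185 from the three major publishers.
n_journals = 125 + 185
print(max_model_df(n_journals))
```

Keep in mind that a categorical factor with k levels uses k - 1 degrees of freedom, so a four-level publisher category would spend 3 of them.
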

12:53 PM are you too heavy with variables and factors?

Sarah: if I'm not careful
Heather: nic too
Sarah: depending on factors we want to consider and factor/character states of each
 so i'm whittling down on both ends

12:54 PM you can do multiple uni/bivariate analyses and only do a combined analysis when there's some interest in the interaction per se

 like from a prior hypothesis

12:55 PM did that make any sense?

Sarah: yep
 that's what i'm moving towards ie test the response variable against individual factors before getting complicated
Heather: so do univariate analyses of all the things we are interested in, and only put those that remain interesting into the multivariate analysis
Sarah: more general on the multivariate end and then univariate where it's interesting
Heather: yup

12:56 PM Nic, bookmark this part of the conversation for later reference :) sarah - is that reversed?
Sarah: hmm?

12:57 PM univariate on everything, multivariate where you have reason to wonder if there is something going on

Sarah: i mean, i have broader character states for multivariate and then more specific ones for univariate
Heather: yeah, so Sarah actually had a different approach than that
 so Sarah maybe we talk this through and see which approach makes more sense..... oh i see changing the number of factors. hmmm... i'd need to know more to respond
Heather: the "univariate first" approach is probably more standard
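
One way to picture the "univariate first" approach is as a screening step: run one test per factor, then carry only the factors that still look interesting into the multivariate model. The sketch below assumes "interesting" means a univariate p value under a cutoff, which is an assumption made for illustration and not something decided in this chat. The function name, the cutoff, and the p values are all hypothetical, not real results.

```python
def screen_factors(univariate_p_values, alpha=0.05):
    """Keep factors whose univariate test looks interesting (p < alpha)."""
    return sorted(
        name for name, p in univariate_p_values.items() if p < alpha
    )


# Hypothetical p values, one univariate test per factor; not real results.
p_values = {"year": 0.01, "depository": 0.03, "sampling_method": 0.60}
print(screen_factors(p_values))
```

A factor with a prior hypothesis about an interaction could still be kept by hand even if it fails the screen.
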
Sarah: most of it is in the emails we've been cc ing to you

12:58 PM i did univariate first

 when i was having multivar problems sorry i am not keeping up with my email deluge very well these days...
Sarah: and at that point we decided the multivar was more interesting
 as i remember at least
Heather: yup. I think maybe going through R results will help. i'll look back at the correspondence

12:59 PM ok - sounds good. I should take off now. any parting issues?

Heather: none that can't be done in email? ok - bye all!

1:00 PM Sarah: so, heather, i think you got cut off on "discoverability"

Heather: Bruce, any comments? has left
Heather: yeah, that's ok.
 Sarah, you and I offline will make sure we are on the same page about what Todd was suggesting, and what to do about it.

1:01 PM that works?

Nicholas: Sarah, if you could post some of the code for univariate stuff that you do that would also be really helpful for me to see, even if it's something simple that you just throw me
Heather: Yup. Nic, your summary() plot was "univariate" stuff too, just in case that wasn't clear
Nicholas: ok
Sarah: yeah, sorry i've been slow on code

1:02 PM refbruce: Nope, no immediate comments. I had a student who'd scheduled to come to my office at 4:00, and I've taken care of his needs :-).

1:03 PM Heather: ok! ok, shall we all sign off for now then?

refbruce: Will re-read the transcript and see if there are any thoughts that come up.
Nicholas: ok.. I guess not ... Bye
Heather: Maribeth said sorry she wasn't here... got caught in a meeting

1:04 PM Valerie: ok

Sarah: i'm good...heather, do you want to talk now or later?
Heather: she'll read the transcript and please email her if there is anything she can help with
 Sarah, now works for me
Valerie: is anyone posting the transcript?
Sarah: i will
Valerie: ok, thanks
Heather: Thanks. In general guys if you can help by posting any/all transcripts that aren't up yet
 I'll try to do them too, but I'm behind

1:05 PM (I know lots are up, but my conversations with Nic are slacking at least, whoops)

 ok! Bye all. Sarah, let's start a new chat window?
Nicholas: Ok, I'll get those up this afternoon
Valerie: ok, we'll talk soon

1:06 PM Nicholas: bye, thanks

Valerie: later and thanks again
Valerie has left has left
Sarah: yeah, new chat window is good
Heather: bye!
Heather has left