DataONE:meeting notes:8 June 2010follwupemail

Follow-up email from Heather regarding the June 8, 2010 meeting
Sarah and Nic,

You'll notice I moved the main home of the DataONE OWW site... I took Dryad out of the name to make it more broadly appealing.

I just talked to Todd and he likes the direction of the projects. He had a few ideas to make the results even more valuable.

Unfortunately I have to run in a few minutes, but here are the highlights:
 * 1) no need to do an inventory across all the Dryad journals; rather, pick a few that are representative of different areas (I have some background info on how to do this, will get it to you tomorrow)
 * 2) it would be very useful to look at trends over time, going back about 15 years, or basically to before electronic journal supplementary materials were common. Could do a continuous sample across time, or just a sample every five years, for example
 * 3) Nic, rather than just funders you could also look at the data sharing and data reuse/citation policies of journals and of the most common data repositories
 * 4) might be a neat integration across the two projects if you both look at the same journal issues. That way we would have a picture of what constraints/policies the authors were under with respect to sharing their own data (Nic's side) and how they behaved, as well as info on how the same authors behaved with respect to reusing other people's data (Sarah's side)
And finally, we need research questions :) and then of course a plan for the next three weeks.

Let's work together in the next day or two to flesh out the research questions... and let's do it openly on the OWW site to solicit wide feedback. Then you guys can run with the research plan and methods.

Will talk more soon but don't wait for me to keep hashing things out on OWW, Heather

Sarah's response to bulleted points

 * 1) Currently, I am focusing on only a few journals anyway... SysBio, AmNat, and MolecularEco, b/c they have the most datasets posted on Dryad, so I figured they offer the most possibility of dataset citation. Also they cover a good range... SysBio (strong journal policies about data sharing; subject: phylogenetics); MolecularEco (strong history of data sharing with GenBank; subject: genetics); and AmNat (I don't know about the history, but typically a leader; subject: broad - ecology... often behavioral or model based). I would appreciate the background info you promised, to help select a few other journals... particularly an earth science/GIS focused journal.
 * 2) I've started with 2010 just because it's most recent. I was thinking of sampling the first issue (or first 20ish articles, whichever comes first) of each journal for every year since 2000, then five-year samples pre-2000 (which is also generally pre-good internet access). I'm also interested in other trends, including whether rates of dataset citation correlate with an article/journal being open access, changes in the meta-analysis approach to data citation, and topic/field trends for data citation (phylo vs. genetics vs. behavior, etc). These are a few of the questions that have arisen as I have done preliminary article-centered investigation... they are purely philosophical and unrefined at this point.
 * 3) Great! I was wondering if we were looking into these things, especially: existence of a data citation policy, terminology of the policy, internal data repositories, partnerships with data repositories, and how authors are to submit data (cumbersome, independent, journal takes care of it, whether the journal validates that data was submitted... an investigation of how GenBank and its affiliates make it work would be good here). Some general journal stats may be good as well: impact factor, publications per year, turnover rate of publication, affiliation with a society, number of journals in that society (e.g. the ESA group of journals). I've thought of lots more... this might be good for Nic to do since he's already collecting the journal "metadata" related to funding. I also think this information would be good to have in a database with the article-centric data so we can look at global (journal) vs. local (article) factors that contribute to data citation practices.
 * 4) see above - also, one of the first articles I came across in AmNat was by a professor at my alma mater. I'm going to contact him soon about his experience getting the data cited in the journal, the encouragement he got from the journal, and his personal motivation for doing so. Though this will be qualitative, I thought it would be good to have the inside perspective of someone who has gone through that process. On a side note, I also spoke with NEON representatives today at the conference I'm attending. The organization was mentioned in our first DataONE meeting. Even though I have nothing formally "reportable" from that discussion, I felt it was beneficial to gauge general perceptions of data sharing in the society I'm affiliated with.

Heather's response to Sarah's response

 * Good stuff, that sounds like a diverse first selection of journals. There are some survey results that could inform how to define the disciplines and possibly allow some nice integration between this work and the survey results. I'll dig into those, upload them, and get back to you.
 * 2010 is a great place to start, and you are clearly already thinking about trends in time. Great. Before you collect much data, think ahead about what stats you might run to look for trends. Would you be better off with equal spacing across time? A few timepoints with a larger number of datapoints at each, or a more continuous sample? Depends on your analysis approach and goals, I think.
 * Agreed, correlation with other attributes would be very interesting. Agreed, most are probably outside the scope of your data collection, but they could possibly be done by Nic depending on where his research goes. Nice list of potential correlates, btw!
 * Cool on personal interactions.... agreed, getting general perceptions can definitely help focus/highlight/ground the work, good stuff.
 * Do you want to take your ideas here and translate them into feedback on the research questions and the draft plan and methods? See DataONE/Summer_2010/Research_questions
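
To make the sampling question above concrete, here is a minimal Python sketch of the two year-selection schemes discussed: Sarah's plan (every year since 2000, five-year samples before that) and an equally spaced alternative. The function names and the 1995 start year (roughly "15 years back" from 2010) are assumptions for illustration only, not an agreed plan.

```python
def mixed_sample_years(first_year=1995, dense_from=2000, last_year=2010, sparse_step=5):
    """Every `sparse_step` years before `dense_from`, then every year through `last_year`.

    All defaults are illustrative assumptions, not settled choices.
    """
    # Step backwards from dense_from so the sparse years line up with it.
    sparse = list(range(dense_from - sparse_step, first_year - 1, -sparse_step))[::-1]
    dense = list(range(dense_from, last_year + 1))
    return sparse + dense


def equally_spaced_years(first_year=1995, last_year=2010, step=5):
    """Equal spacing across the whole period (fewer timepoints, room for more articles at each)."""
    return list(range(first_year, last_year + 1, step))


if __name__ == "__main__":
    print("mixed:", mixed_sample_years())
    print("equal:", equally_spaced_years())
```

The mixed scheme gives more resolution in recent years, while the equally spaced one keeps the intervals uniform, which may be simpler for some trend analyses; which is better depends on the stats to be run.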