DataONE:Notebook/Data Citation and Sharing Policy/2010/07/19

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Entry title

 * Nic Weber 12:43, 19 July 2010 (EDT):Having not done qualitative data analysis before, I thought I would begin by brainstorming (nothing below is formal) what kinds of correlations I'd like to test in the data I've gathered thus far, keeping in mind that I don't yet know how to test these.

Repositories: It will be difficult to establish a value. Whether a repository is better, more important, or more respected is hard to quantify. Nevertheless, I would like to know correlations like

Journals: The impact measures and stats from Thomson Reuters Journal Citation Reports will be helpful in establishing a relative value of a journal's importance within its given category (...though calculations like Impact Factor are themselves flawed...)
 * Does a journal's request (or requirement) that data be shared affect impact factor?
 * Does place of recommended deposit affect impact factor?
 * What is the profile of a journal with required or requested data sharing? (Is it more likely to be from a larger publisher, affiliated with a society, or affiliated with other ISI categories, and if so which categories? etc.)

Funding Agency: Again, it's hard to determine a hierarchy of value for a funding agency. One interesting thing to look at might be the location of a funding agency... not sure how this would be analyzed. It might also be worth trying to establish whether national foundations (such as NSF) are more likely to have explicit sharing plans. Regardless, it would be interesting to compare the characteristics (that is, their specifications) of national government agencies vs. non-profits/charities. Not sure how to quantify the characteristics though.


 * Nic Weber 16:02, 19 July 2010 (EDT):I ran confidence intervals for the stats we entered in our paper (the code is saved, results below). It's obvious that we had much narrower CIs for the journals, with a sample size of approximately 300, than for the repositories (26) or funding agencies (53). So, we can either significantly increase the scope of our repositories and funding agencies...or

 > #confidence intervals
 > #Repositories
 > (100*binconf(03,26)) #direct journal affiliation
  PointEst    Lower    Upper
  11.53846 4.003245 28.97590
 > (100* binconf(03,26)) #require associated publication
  PointEst    Lower    Upper
  11.53846 4.003245 28.97590
 > (100* binconf(08,26)) #give instructions how to cite their holdings
  PointEst    Lower    Upper
  30.76923 16.50132 49.98826
 > #Journals
 > (100* binconf(30,307)) #request data to be archived
  PointEst    Lower    Upper
  9.771987 6.930944 13.60733
 > (100* binconf(10,307)) #require data to be archived
  PointEst    Lower    Upper
  3.257329 1.778763 5.891211
 > (100* binconf(32,307)) #give explicit directions on where to archive
  PointEst    Lower    Upper
  10.42345 7.480629 14.34447
 > (100* binconf(20,307)) #give instruction on how to share upon request
  PointEst    Lower    Upper
  6.514658 4.256477 9.847645
 > (100* binconf(17,307)) #give instructions on how to cite data
  PointEst    Lower    Upper
  5.537459 3.485725 8.688153
 > #Funding Agencies
 > (100* binconf(1,53)) #direction on how to cite data
  PointEst     Lower    Upper
  1.886792 0.0967798 9.942912
 > (100* binconf(23,53)) #require data to be shared in some way
  PointEst    Lower    Upper
  43.39623 30.95040 56.73465
 > (100* binconf(4,53)) #specify duration of data pres
  PointEst    Lower    Upper
  7.54717 2.973956 17.85848
 > (100* binconf(13,53)) #give directions on the type rep
  PointEst    Lower    Upper
  24.5283 14.93290 37.56656
 > (100* binconf(4,53)) #supplemental funds for data
  PointEst    Lower    Upper
  7.54717 2.973956 17.85848
 >
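For anyone wanting to check these numbers without R, here is a minimal Python sketch of a 95% Wilson score interval, which appears to be what `binconf` computes by default. This is an assumption based on the numbers agreeing, not something taken from the saved code. The function name `wilson_interval` and the `examples` list are made up for illustration; the counts are copied from the output above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, conf_level=0.95):
    """Return (point, lower, upper) as proportions for a Wilson score interval."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


# Hypothetical selection of rows from the transcript: (label, count, sample size)
examples = [
    ("repositories: direct journal affiliation", 3, 26),
    ("journals: request data to be archived", 30, 307),
    ("funding agencies: require data to be shared", 23, 53),
]

for label, x, n in examples:
    point, lower, upper = wilson_interval(x, n)
    width = 100 * (upper - lower)
    print(f"{label}: {100 * point:.2f}% "
          f"[{100 * lower:.2f}, {100 * upper:.2f}], width {width:.2f} points")
```

The printed widths make the sample-size point concrete: the journal interval (n = 307) is several times narrower than the repository interval (n = 26). One caveat: for a count of exactly 1 (the funding agency citation row), the lower bound in the output above does not match this plain formula, which suggests `binconf` applies a small-count adjustment there; the sketch does not try to reproduce it.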


 * }


 * Nic Weber 16:28, 19 July 2010 (EDT):Completely unrelated to what I was working on, I wanted to go back into the ESA journals and look at their metadata registry language. But while there I noticed this recommendation for citing Scientific reports:

Scientific and Technical Reports and their Parts

NOAA (National Oceanic and Atmospheric Administration). 1961. Climatological data–Kansas. Asheville (NC): Environmental Data and Information Service, National Climatic Center. Report NOAA-03-88-1.

Does this mean I retrieve it? (Well, after some poking around, I did...) But I'm still confused as to why a journal with the foresight to create and maintain a metadata registry would be so casual and unintentional about asking its authors to identify data reuse.