Talk:DataONE:Notebook/Data Citation and Sharing Policy/2010/07/21

re funders and repositories

 * Heather A Piwowar 01:42, 22 July 2010 (EDT): Nic, hold off doing stats on funders and repositories for now. The N there is WAY smaller, so there probably won't be many compelling patterns (in statistics language: the datasets are "underpowered" = too small to study the questions you are really interested in. A fact of life, often... when you know it to be true, it is often best not even to try running statistical tests with p-values). My guess is that just descriptive analysis, tables, and proportions (with confidence intervals) are the appropriate outputs for the funder and repository info. Focus on getting really comfortable with the journal data stats for now. Looking at alternative dependent variables (rather than the simple request/required variable) sounds like a good plan. Make them 0/1 so you can run the same analyses we have done so far.
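
For a small-N group like the funders, a descriptive summary (a proportion plus its confidence interval) might look roughly like the sketch below. The requires_sharing vector is made-up illustration data, not the real funder data, and binom.test is just one reasonable way to get an interval for a small sample.

```r
# Hypothetical 0/1 variable, invented for illustration:
# 1 = the policy requires data sharing, 0 = it does not.
requires_sharing <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0)

n <- length(requires_sharing)  # number of policies in this group
k <- sum(requires_sharing)     # number coded as 1

# Exact (Clopper-Pearson) 95% confidence interval for the proportion.
ci <- binom.test(k, n)$conf.int

# One descriptive row: N, the proportion, and the interval bounds.
summary_row <- data.frame(
  N = n,
  proportion = k / n,
  ci_lower = ci[1],
  ci_upper = ci[2]
)
print(summary_row)
```

With an N this small the interval comes out wide, which is the "underpowered" problem in concrete form.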

re confidence intervals in dot plots
 * Heather A Piwowar 01:42, 22 July 2010 (EDT): Nic, I've added some code to another gist: http://gist.github.com/485585
Download it into a text file named whatever.R, then from R run the line source("whatever.R") (or you can cut and paste the contents of the gist directly into R).

After that, try the code from earlier, but substitute plot.summary.formula.response.CIs for plot, for example: plot.summary.formula.response.CIs(response)

Can you see the lines on either side of the dots? They represent 95% confidence intervals of the proportions, computed on the subset of data included in that individual "row"; the N for that subset is shown on the far right.

If the confidence interval lines overlap with one another (= include a few of the same estimates), then there isn't strong statistical evidence that the actual proportions in the real world are really different from one another: the difference you see in this sample could plausibly be due to random luck. If the confidence intervals don't overlap, then there is statistical evidence that there is an "effect." (Overlap is a rough visual check and a bit conservative: intervals can overlap slightly even when a formal test would find a difference.)
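
The overlap check can also be done numerically. This is only a sketch with made-up counts (30 of 40 and 12 of 35 are invented, not from the journal data), and it uses base R's prop.test rather than the gist code, so the intervals may not match the ones drawn in the plot exactly.

```r
# Hypothetical counts for two rows of a dot plot, invented for illustration.
ci_a <- prop.test(x = 30, n = 40)$conf.int  # row A: 30 of 40 coded as 1
ci_b <- prop.test(x = 12, n = 35)$conf.int  # row B: 12 of 35 coded as 1

# Two intervals overlap when each one starts before the other one ends.
intervals_overlap <- ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]

print(ci_a)
print(ci_b)
print(intervals_overlap)
```

If intervals_overlap is FALSE, the two lines in the plot would not touch, which is the case described above as evidence of an "effect."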

Does it work? Make sense? Send me a chat if/if not.

PS: if you add so many variables that the text gets hard to read, play with the cex.labels and cex values, for example: plot.summary.formula.response.CIs(response, cex.labels=0.1, cex=0.7)