Talk:DataONE:Notebook/Data Citation and Sharing Policy/2010/07/12

R start

 * Heather A Piwowar 00:28, 13 July 2010 (EDT): Nic, once you install R, here are a few commands to get you going. Note: these graphs aren't pretty, they aren't very relevant, and the code is ugly... I just wanted to show you how to run some things on your own spreadsheets to get the feel of R.  I used your EvoBio_Journals sheet.

There are two ways to get the data in. Either make your Google Fusion Tables page public (Share->Visibility->Public) and then use


 * filename = "http://tables.googlelabs.com/api/query?sql=SELECT%20*%20FROM%213622"

Or export the table and then use (substituting in the proper path)


 * filename = "~/Downloads/EvoBio_Journals.csv"

After one of those lines, you can run the following, one line at a time at the R prompt:

 * dat.raw = read.csv(filename, stringsAsFactors=FALSE)
 * dim(dat.raw)
 * names(dat.raw)
 * str(dat.raw)
 * plot(table(dat.raw$Publisher))
 * plot(table(dat.raw$Peer.Reviewed))
 * plot(table(dat.raw$Policy.Has.Instructions.how.to.share.data))
 * hist(as.numeric(dat.raw$Impact.Factor))
 * plot(as.numeric(dat.raw$Impact.Factor), as.numeric(dat.raw$Cited.Half.life))
 * plot(as.numeric(dat.raw$Impact.Factor), as.numeric(dat.raw$X5.Year.Impact.Factor))
 * abline(lm(as.numeric(dat.raw$X5.Year.Impact.Factor) ~ as.numeric(dat.raw$Impact.Factor)))
 * lm(as.numeric(dat.raw$X5.Year.Impact.Factor) ~ as.numeric(dat.raw$Impact.Factor))
 * cor(as.numeric(dat.raw$X5.Year.Impact.Factor), as.numeric(dat.raw$Impact.Factor), use="complete.obs")
 * cor.test(as.numeric(dat.raw$X5.Year.Impact.Factor), as.numeric(dat.raw$Impact.Factor))
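If it helps to see why the column names look the way they do, and why everything gets wrapped in as.numeric(), here is a small sketch you can run without the real spreadsheet. The CSV contents, the header text, and the names tmp, dat.demo, impact, impact5 and ok are all made up for illustration; they are not from the EvoBio_Journals sheet. It assumes the sheet's headers contain spaces and that some cells hold text such as "n/a".

```r
# Write a tiny made-up CSV (NOT the real journal data) to a temporary file.
# The header text here is an assumption about what a sheet might look like.
tmp = tempfile(fileext = ".csv")
writeLines(c(
  "Journal,Impact Factor,5 Year Impact Factor",
  "A,1.2,1.5",
  "B,3.4,3.9",
  "C,n/a,2.0",
  "D,2.8,",
  "E,5.1,5.6",
  "F,0.9,1.1"
), tmp)

dat.demo = read.csv(tmp, stringsAsFactors=FALSE)

# read.csv turns headers into valid R names: spaces become dots, and a
# name starting with a digit gets an "X" in front
names(dat.demo)

# The "n/a" cell makes Impact.Factor a text column, so convert it.
# Entries that aren't numbers become NA (the warning is silenced here).
impact  = suppressWarnings(as.numeric(dat.demo$Impact.Factor))
impact5 = suppressWarnings(as.numeric(dat.demo$X5.Year.Impact.Factor))

# How many values are missing after conversion?
sum(is.na(impact))
sum(is.na(impact5))

# Rows where both values are present
ok = complete.cases(impact, impact5)
sum(ok)

# Correlation on the complete pairs only
r = cor(impact, impact5, use="complete.obs")
r

# cor.test drops incomplete pairs on its own
cor.test(impact, impact5)
```

Converting each column once and storing it under a short name also makes the plot(), lm() and cor() lines much easier to read.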

Let me know how it goes! Then we can start figuring out what stats you really want to calculate and how to do that.