DataONE:Notebook/Data Citation and Sharing Policy/2010/07/27

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Cleaner Analysis

 * Nic Weber 14:13, 27 July 2010 (EDT): glm with relevel to talk about with Heather:

> filename = "/Users/nicholasweber/Desktop/JournalData1.csv"
> mydata = read.csv(filename)
> ImFa = Impact.Factor
> ImFa[ImFa==0] = NA
> hist(ImFa)
> summary(ImFa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.064   1.000   1.578   2.132   2.762  16.690   6.000
> SomeOA = ifelse(Subscription.Model == "Sub", 0, 1)
> table(SomeOA)
SomeOA
  0   1
236  71
> Afil = ifelse(Affiliation.Code > 0, 1, 0)
> table(Afil)
Afil
  0   1
148 158
> table(PubCode)
PubCode
   other elsevier springer    wiley
     149       52       58       48
> PubCode = relevel(PubCode, ref="other")
> is.EnvSci = rep(0, length(ISI.Category))
> is.EnvSci[grep("*Environmental Sciences*", ISI.Category)] = 1
> table(is.EnvSci)
is.EnvSci
  0   1
143 164
> is.Eco = rep(0, length(ISI.Category))
> is.Eco[grep("*Ecology*", ISI.Category)] = 1
> table(is.Eco)
is.Eco
  0   1
181 126
> is.EvoBio = rep(0, length(ISI.Category))
> is.EvoBio[grep("*Evolutionary Biology*", ISI.Category)] =1
> table(is.EvoBio)
is.EvoBio
  0   1
267  40
>
> mylogit = glm(requests~log(ImFa)+ Afil+ PubCode+ is.Eco+ is.EnvSci+ is.EvoBio, family=binomial(link="logit"), na.action=na.omit) ## log creates even distribution for IF
> summary(mylogit)

Call:
glm(formula = requests ~ log(ImFa) + Afil + PubCode + is.Eco +
    is.EnvSci + is.EvoBio, family = binomial(link = "logit"),
    na.action = na.omit)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0326  -0.5140  -0.3973  -0.3177   2.5392

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      -2.8726     0.7481  -3.840 0.000123 ***
log(ImFa)         0.2252     0.2738   0.822 0.410910
Afil              0.4374     0.4100   1.067 0.285966
PubCodeelsevier   1.4796     0.4888   3.027 0.002472 **
PubCodespringer  -0.3144     0.6995  -0.450 0.653050
PubCodewiley      0.9403     0.5256   1.789 0.073634 .
is.Eco           -0.2317     0.5807  -0.399 0.689973
is.EnvSci         0.2959     0.6399   0.462 0.643746
is.EvoBio        -0.1708     0.7475  -0.228 0.819299
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 224.11  on 299  degrees of freedom
Residual deviance: 205.36  on 291  degrees of freedom
  (7 observations deleted due to missingness)
AIC: 223.36

Number of Fisher Scoring iterations: 5

> confint(mylogit)
Waiting for profiling to be done...
                     2.5 %     97.5 %
(Intercept)     -4.3924248 -1.4488793
log(ImFa)       -0.3014913  0.7736730
Afil            -0.3572635  1.2583180
PubCodeelsevier  0.5290583  2.4588049
PubCodespringer -1.8708292  0.9588558
PubCodewiley    -0.1174621  1.9669483
is.Eco          -1.4100069  0.8765251
is.EnvSci       -0.9617093  1.5543530
is.EvoBio       -1.7375732  1.2369173
> exp(mylogit$coefficients)
    (Intercept)       log(ImFa)            Afil PubCodeelsevier PubCodespringer    PubCodewiley          is.Eco       is.EnvSci
     0.05654958      1.25255157      1.54872058      4.39102729      0.73020193      2.56077481      0.79322294      1.34440052
      is.EvoBio
     0.84301868
> exp(confint(mylogit))
Waiting for profiling to be done...
                     2.5 %     97.5 %
(Intercept)     0.01237070  0.2348333
log(ImFa)       0.73971425  2.1677138
Afil            0.69958816  3.5194967
PubCodeelsevier 1.69733323 11.6908318
PubCodespringer 0.15399592  2.6087098
PubCodewiley    0.88917420  7.1488273
is.Eco          0.24414160  2.4025365
is.EnvSci       0.38223896  4.7320241
is.EvoBio       0.17594686  3.4449772
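As a rough illustration of what the relevel call in the session above does, here is a minimal sketch on synthetic data. Nothing in it comes from JournalData1.csv; the names pub, y, fit_default and fit_other are made up for this example. It only shows that changing the reference level changes which contrasts are reported, not the fitted model itself.

```r
# Synthetic data only; these values are NOT from the journal dataset.
set.seed(1)
pub <- factor(sample(c("elsevier", "other", "springer", "wiley"),
                     200, replace = TRUE))
y <- rbinom(200, 1, ifelse(pub == "elsevier", 0.4, 0.2))

# By default the first level (alphabetical order) is the reference.
levels(pub)[1]
fit_default <- glm(y ~ pub, family = binomial(link = "logit"))

# relevel() moves the chosen level to the front, making it the reference.
pub_other <- relevel(pub, ref = "other")
levels(pub_other)[1]
fit_other <- glm(y ~ pub_other, family = binomial(link = "logit"))

# Same model, different baseline: the "other vs elsevier" coefficient is
# the negative of the "elsevier vs other" coefficient, and the residual
# deviance is unchanged.
coef(fit_default)["pubother"]
coef(fit_other)["pub_otherelsevier"]
c(deviance(fit_default), deviance(fit_other))
```

With "other" as the reference, each PubCode coefficient reads as a log-odds difference relative to the pooled other publishers, which is presumably why the session releveled before fitting.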


 * Nic Weber 13:37, 27 July 2010 (EDT): I am going to attempt to clean up some of the last post. The code that I've just uploaded is embedded below. The changes include a summary for the Impact Factor category, and I called out the categories from the PubCode column in my dataset for individual tables. I neglected to properly include all of the publisher categories in the last code, so the stats have also changed. I will include those below as well:




 * This code then changed my stats, and the Other category is no longer statistically significant. I will include the Other category (coded as OthPub) just for comparison.
 * The coefficients, including the P values, for Impact Factor and Society Affiliation:

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.55278    1.23451  -3.688 0.000226 ***
log(ImFa)    1.04003    0.29007   3.585 0.000337 ***
Afil         1.06761    0.46302   2.306 0.021125 *
OthPub       1.60747    1.09173   1.472 0.140913

 * Confidence Int. for Coefficients (rows taken from the Full Stats output below):

               2.5 %    97.5 %
log(ImFa)  0.4998151 1.6443777
Afil       0.1929881 2.0227830
OthPub    -0.1593362 4.5613276

 * Exponents:

(Intercept)  log(ImFa)       Afil     OthPub
 0.01053787 2.82930675 2.90840677 4.99014645

 * Exp Conf Int.

                 2.5 %     97.5 %
log(ImFa) 1.6484164883  5.1777868
Afil      1.2128683481  7.5593332
OthPub    0.8527096723 95.7104652
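To make the link between the coefficient table and the exponents explicit, here is a small sketch that recomputes the odds ratios from the estimates reported above. The Wald interval is my own addition for comparison; it is not what confint() reports for a glm, since confint() profiles the likelihood, so the two sets of intervals differ somewhat. The names est, se, odds_ratio and wald_ci are hypothetical.

```r
# Estimates and standard errors copied from the coefficient table above.
est <- c("log(ImFa)" = 1.04003, Afil = 1.06761, OthPub = 1.60747)
se  <- c("log(ImFa)" = 0.29007, Afil = 0.46302, OthPub = 1.09173)

# Odds ratios: exponentiate the log-odds coefficients.
odds_ratio <- exp(est)

# Wald 95% interval on the log-odds scale, then exponentiated.
# (An approximation; confint() on a glm uses profile likelihood instead.)
z <- qnorm(0.975)
wald_ci <- exp(cbind(lower = est - z * se, upper = est + z * se))

round(cbind(odds_ratio, wald_ci), 3)
```

An interval for an odds ratio that contains 1 corresponds to a coefficient that is not significant at the 5% level, which matches OthPub here.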


 * Full Stats:

> filename = "/Users/nicholasweber/Desktop/JournalData1.csv"
> mydata = read.csv(filename)
> ImFa = Impact.Factor
> ImFa[ImFa==0] = NA
> hist(ImFa)
> summary(ImFa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.064   1.000   1.578   2.132   2.762  16.690   6.000
> SomeOA = ifelse(Subscription.Model == "Sub", 0, 1)
> table(SomeOA)
SomeOA
  0   1
223  84
> Afil = ifelse(Affiliation.Code > 0, 1, 0)] # Society Affiliation
Error: unexpected ']' in "Afil = ifelse(Affiliation.Code > 0, 1, 0)]"
> table(Afil)
Afil
  0   1
148 158
> is.EnvSci = rep(0, length(ISI.Category))
> is.EnvSci[grep("*Environmental Sciences*", ISI.Category)] = 1
> table(is.EnvSci)
is.EnvSci
  0   1
143 164
> is.Eco = rep(0, length(ISI.Category))
> is.Eco[grep("*Ecology*", ISI.Category)] = 1
> table(is.Eco)
is.Eco
  0   1
181 126
> is.EvoBio = rep(0, length(ISI.Category))
> is.EvoBio[grep("*Evolutionary Biology*", ISI.Category)] =1
> table(is.EvoBio)
is.EvoBio
  0   1
267  40
>
> Springer = rep(0, length(PubCode))
> Springer [grep("*springer*", PubCode)] =1
> table(Springer)
Springer
  0   1
249  58
> Elsevier = rep(0, length(PubCode))
> Elsevier [grep("*elsevier*", PubCode)] =1
> table(Elsevier)Wiley
Error: unexpected symbol in "table(Elsevier)Wiley"
> Wiley = rep(0, length(PubCode))
> Wiley [grep("*wiley*", PubCode)] =1
> table(Wiley)
Wiley
  0   1
259  48
> OthPub = rep(0, length(PubCode))
> OthPub [grep("*other*", PubCode)] =1
> table(OthPub) #Includes all other publishers from dataset
OthPub
  0   1
182 125
>
> mylogit = glm(requests~log(ImFa) + SomeOA+ Afil+ Elsevier+ Springer+ Wiley+ OthPub+ is.Eco+ is.EnvSci + is.EvoBio, family=binomial(link="logit"), na.action=na.omit) ## log creates even distribution for IF
> summary(mylogit)

Call:
glm(formula = requests ~ log(ImFa) + SomeOA + Afil + Elsevier +
    Springer + Wiley + OthPub + is.Eco + is.EnvSci + is.EvoBio,
    family = binomial(link = "logit"), na.action = na.omit)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5987  -0.5199  -0.3057  -0.1653   2.9759

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.55278    1.23451  -3.688 0.000226 ***
log(ImFa)    1.04003    0.29007   3.585 0.000337 ***
SomeOA      -0.02429    0.43966  -0.055 0.955949
Afil         1.06761    0.46302   2.306 0.021125 *
Elsevier     0.07862    1.20986   0.065 0.948188
Springer    -0.54932    1.45679  -0.377 0.706117
Wiley        1.19822    1.14467   1.047 0.295199
OthPub       1.60747    1.09173   1.472 0.140913
is.Eco      -0.33555    0.57484  -0.584 0.559403
is.EnvSci    0.46216    0.65644   0.704 0.481416
is.EvoBio    0.75327    0.67710   1.112 0.265925
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 224.11  on 299  degrees of freedom
Residual deviance: 179.30  on 289  degrees of freedom
  (7 observations deleted due to missingness)
AIC: 201.3

Number of Fisher Scoring iterations: 6

> confint(mylogit)
Waiting for profiling to be done...
                 2.5 %     97.5 %
(Intercept) -7.6716071 -2.4547973
log(ImFa)    0.4998151  1.6443777
SomeOA      -0.9051808  0.8301939
Afil         0.1929881  2.0227830
Elsevier    -2.1005325  3.1452976
Springer    -3.8359538  2.7374242
Wiley       -0.7306740  4.2048215
OthPub      -0.1593362  4.5613276
is.Eco      -1.4946319  0.7684235
is.EnvSci   -0.8165620  1.7687779
is.EvoBio   -0.5939997  2.0820360
> exp(mylogit$coefficients)
(Intercept)   log(ImFa)      SomeOA        Afil    Elsevier    Springer       Wiley      OthPub      is.Eco   is.EnvSci   is.EvoBio
 0.01053787  2.82930675  0.97600675  2.90840677  1.08179389  0.57734165  3.31422494  4.99014645  0.71494624  1.58749158  2.12394266
> exp(confint(mylogit)) # conf int for exp
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept) 0.0004658685  0.0858806
log(ImFa)   1.6484164883  5.1777868
SomeOA      0.4044687314  2.2937636
Afil        1.2128683481  7.5593332
Elsevier    0.1223912330 23.2265858
Springer    0.0215807450 15.4471456
Wiley       0.4815843005 67.0086336
OthPub      0.8527096723 95.7104652
is.Eco      0.2243311619  2.1563641
is.EnvSci   0.4419484746  5.8636828
is.EvoBio   0.5521145557  8.0207830
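The indicator columns in the session above are built by filling a vector with zeros and then setting the grep matches to 1. A possible shorter form, sketched here on a made-up five-element vector (not the real PubCode column), uses grepl, which returns one TRUE/FALSE per element. The surrounding asterisks in the patterns are not needed, because grep and grepl already match anywhere in the string.

```r
# Hypothetical publisher codes, for illustration only.
PubCode <- c("springer", "other", "wiley", "springer", "elsevier")

# Approach used in the session: start from zeros, set matches to 1.
Springer <- rep(0, length(PubCode))
Springer[grep("springer", PubCode)] <- 1

# One-step alternative: grepl() gives TRUE/FALSE per element;
# fixed = TRUE treats the pattern as a literal string.
Springer2 <- as.integer(grepl("springer", PubCode, fixed = TRUE))

table(Springer)
identical(as.integer(Springer), Springer2)
```

Either form gives the same 0/1 column; the grepl version just avoids the separate initialization step.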


 * Nic Weber 12:16, 27 July 2010 (EDT): The post below contains mistakes. As of 11:15 am CST I am cleaning up the code for Publishers. Another post to follow.

(Posted at approx. 10:30 CST.) Today I have cleaned up some of my code from yesterday and updated my public dataset to reflect the changes in the Publisher Code column and some cleaning in the Subscription Model column.


 * To begin, I ran the following code:



 * This gave me the following significant results (full results from R below).
 * P values for Impact Factor, Society Affiliation, and all publishers other than Wiley, Springer, Elsevier and Taylor & Francis Ltd.:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
log(ImFa)     1.04003    0.29007   3.585 0.000337 ***
Afil          1.06761    0.46302   2.306 0.021125 *
PubCodeother  1.52884    0.71422   2.141 0.032309 *

 * With confidence intervals of:

                 2.5 %    97.5 %
log(ImFa)    0.4998151 1.6443777
Afil         0.1929881 2.0227830
PubCodeother 0.2325047 3.1117025

 * And exp of:

   log(ImFa)         Afil PubCodeother
  2.82930675   2.90840677   4.61284400

 * With exp confidence intervals of:

                   2.5 %     97.5 %
log(ImFa)    1.648416488  5.1777868
Afil         1.212868348  7.5593332
PubCodeother 1.261756334 22.4592496


 * Below are the full results for context:

> summary(mylogit)

Call:
glm(formula = requests ~ log(ImFa) + SomeOA + Afil + PubCode +
    is.Eco + is.EnvSci + is.EvoBio, family = binomial(link = "logit"),
    na.action = na.omit)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5987  -0.5199  -0.3057  -0.1653   2.9759

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -4.47416    0.94554  -4.732 2.22e-06 ***
log(ImFa)        1.04003    0.29007   3.585 0.000337 ***
SomeOA          -0.02429    0.43966  -0.055 0.955949
Afil             1.06761    0.46302   2.306 0.021125 *
PubCodeother     1.52884    0.71422   2.141 0.032309 *
PubCodespringer -0.62794    1.19405  -0.526 0.598963
PubCodetaylor   -0.07862    1.20986  -0.065 0.948188
PubCodewiley     1.11960    0.76456   1.464 0.143093
is.Eco          -0.33555    0.57484  -0.584 0.559403
is.EnvSci        0.46216    0.65644   0.704 0.481416
is.EvoBio        0.75327    0.67710   1.112 0.265925
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 224.11  on 299  degrees of freedom
Residual deviance: 179.30  on 289  degrees of freedom
  (7 observations deleted due to missingness)
AIC: 201.3

Number of Fisher Scoring iterations: 6

> confint(mylogit)
Waiting for profiling to be done...
                     2.5 %     97.5 %
(Intercept)     -6.4901304 -2.7394727
log(ImFa)        0.4998151  1.6443777
SomeOA          -0.9051808  0.8301939
Afil             0.1929881  2.0227830
PubCodeother     0.2325047  3.1117025
PubCodespringer -3.6768695  1.5175314
PubCodetaylor   -3.1452976  2.1005325
PubCodewiley    -0.3119753  2.7709763
is.Eco          -1.4946319  0.7684235
is.EnvSci       -0.8165620  1.7687779
is.EvoBio       -0.5939997  2.0820360
> exp(mylogit$coefficients)
    (Intercept)       log(ImFa)          SomeOA            Afil    PubCodeother PubCodespringer   PubCodetaylor    PubCodewiley
     0.01139980      2.82930675      0.97600675      2.90840677      4.61284400      0.53368914      0.92439051      3.06363807
         is.Eco       is.EnvSci       is.EvoBio
     0.71494624      1.58749158      2.12394266
> exp(confint(mylogit)) # conf int for exp
Waiting for profiling to be done...
                      2.5 %     97.5 %
(Intercept)     0.001518351  0.0646044
log(ImFa)       1.648416488  5.1777868
SomeOA          0.404468731  2.2937636
Afil            1.212868348  7.5593332
PubCodeother    1.261756334 22.4592496
PubCodespringer 0.025302058  4.5609519
PubCodetaylor   0.043054111  8.1705199
PubCodewiley    0.731999586 15.9742218
is.Eco          0.224331162  2.1563641
is.EnvSci       0.441948475  5.8636828
is.EvoBio       0.552114556  8.0207830


 * }