DataONE:Notebook/Data Citation and Sharing Policy/2010/07/29

From OpenWetWare
  • Nic Weber 18:05, 29 July 2010 (EDT): Started a rough draft of a paper for the Data Science Journal. Their statement to authors is worth reading.
  • Nic Weber 11:23, 29 July 2010 (EDT): This morning I had a thought about journals that are ranked in more than one JCR category. My hypothesis was that journals ranked simultaneously in two of our three main categories (Ecology, Evolutionary Biology and Environmental Sciences) might be more likely to have a sharing plan. Even if that holds, it would not imply causation, but I wanted to explore the idea.
  • I first looked at the number of observations for each of the three overlaps:
    • Ecology and Evolutionary Biology:
> table(is.Eco*is.EvoBio, requests)
   requests
      0   1
  0 256  35
  1  12   4 

The number of observations is small (n = 16), and only 4 of these journals have a sharing plan.

    • Ecology and Environmental Sciences:
> table(is.Eco*is.EnvSci, requests)
   requests
      0   1
  0 256  34
  1  12   5

Again, the number of observations is small (n = 17), and only 5 of these journals have a sharing plan.

    • Environmental Sciences and Evolutionary Biology had no overlap: no journal is listed in both categories.
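The two cross-tabulations above can be turned into row proportions with a short R sketch. The counts are typed in from the tables; the `fisher.test()` call (an exact test suited to cells this small) is an added check, not part of the original analysis.

```r
# Counts copied from the two tables above.
# Rows: journal listed in both categories? (0 = no, 1 = yes); columns: requests (0/1).
eco_evo <- matrix(c(256, 12, 35, 4), nrow = 2,
                  dimnames = list(overlap = c("0", "1"), requests = c("0", "1")))
eco_env <- matrix(c(256, 12, 34, 5), nrow = 2,
                  dimnames = list(overlap = c("0", "1"), requests = c("0", "1")))

# Share of journals with a sharing plan within each row
prop_evo <- prop.table(eco_evo, margin = 1)
prop_env <- prop.table(eco_env, margin = 1)
round(prop_evo[, "1"], 3)  # 35/291 (about 0.120) vs 4/16 (0.250)
round(prop_env[, "1"], 3)  # 34/290 (about 0.117) vs 5/17 (about 0.294)

# Exact test of association; with only 4-5 journals in the "yes" cells
# the test has little power
fisher.test(eco_evo)$p.value
fisher.test(eco_env)$p.value
```

The overlap rows show a higher share of sharing plans, but they rest on 16 and 17 journals.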
  • I added the first two overlaps (Ecology + Evolutionary Biology; Ecology + Environmental Sciences) to my GLM as interaction terms, using the following code:

Code (embedded GitHub gist): http://gist.github.com/498381.js?file=TooManyVariables

    • This gave the following output:
>filename = "/Users/nicholasweber/Desktop/JournalDat.csv"
> mydata = read.csv(filename)
> attach(mydata)
> requests = ifelse(Policy.request...require.code > 0, 1, 0)

> ImFa = Impact.Factor
> ImFa[ImFa==0] = NA
> hist(ImFa)
> summary(ImFa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.064   1.000   1.578   2.132   2.762  16.690   6.000 
> SomeOA = ifelse(Subscription.Model == "Sub", 0, 1)
> table(SomeOA)
SomeOA
  0   1 
236  71 
> Afil = ifelse(Affiliation.Code > 0, 1, 0) 
> table(Afil)
Afil
  0   1 
148 158 
> table(PubCode)
PubCode
   other elsevier springer   taylor    wiley 
     125       52       58       24       48 
> PubCode = relevel(PubCode, ref="other")
> is.EnvSci = rep(0, length(ISI.Category))
> is.EnvSci[grep("*Environmental Sciences*", ISI.Category)] = 1 
> table(is.EnvSci)
is.EnvSci
  0   1 
141 166 
> is.Eco = rep(0, length(ISI.Category))
> is.Eco[grep("*Ecology*", ISI.Category)] = 1
> table(is.Eco)
is.Eco
  0   1 
181 126 
> is.EvoBio = rep(0, length(ISI.Category))
> is.EvoBio[grep("*Evolutionary Biology*", ISI.Category)] =1
> table(is.EvoBio)
is.EvoBio
  0   1 
267  40 
> table(is.Eco*is.EnvSci)

  0   1 
290  17 
> table(is.Eco*is.EvoBio)

  0   1 
291  16 
> table(is.EnvSci*is.EvoBio)

  0 
307 
> 
> mylogit = glm(requests~log(ImFa)+ SomeOA+ Afil+ PubCode+ (is.Eco*is.EnvSci)+ (is.Eco*is.EvoBio), family=binomial(link="logit"), na.action=na.omit) ## log creates even distribution for IF
> summary(mylogit)

Call:
glm(formula = requests ~ log(ImFa) + SomeOA + Afil + PubCode + 
    (is.Eco * is.EnvSci) + (is.Eco * is.EvoBio), family = binomial(link = "logit"), 
    na.action = na.omit)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4691  -0.4919  -0.3130  -0.1729   2.8107  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -2.92229    1.00264  -2.915 0.003561 ** 
log(ImFa)         1.02171    0.30037   3.402 0.000670 ***
SomeOA            0.65486    0.43746   1.497 0.134405    
Afil              1.65102    0.46251   3.570 0.000357 ***
PubCodeelsevier  -0.08405    0.75494  -0.111 0.911353    
PubCodespringer  -0.97327    0.93352  -1.043 0.297144    
PubCodetaylor     0.98921    0.77208   1.281 0.200117    
PubCodewiley      0.18189    0.63355   0.287 0.774034    
is.Eco           -1.69855    1.07497  -1.580 0.114088    
is.EnvSci        -1.04489    0.91824  -1.138 0.255150    
is.EvoBio         0.22520    1.24949   0.180 0.856968    
is.Eco:is.EnvSci  2.61601    1.22591   2.134 0.032848 *  
is.Eco:is.EvoBio  1.30203    1.36129   0.956 0.338838    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 224.11  on 299  degrees of freedom
Residual deviance: 178.31  on 287  degrees of freedom
  (7 observations deleted due to missingness)
AIC: 204.31

Number of Fisher Scoring iterations: 6

> confint(mylogit)
Waiting for profiling to be done...
                       2.5 %     97.5 %
(Intercept)      -5.12964850 -1.0731688
log(ImFa)         0.46426212  1.6472070
SomeOA           -0.22233647  1.5055746
Afil              0.78959105  2.6216448
PubCodeelsevier  -1.59701468  1.4160320
PubCodespringer  -2.94562018  0.7494305
PubCodetaylor    -0.60483176  2.4815256
PubCodewiley     -1.09579839  1.4290479
is.Eco           -3.78086695  0.5806436
is.EnvSci        -2.74867053  1.0122159
is.EvoBio        -2.15442671  2.8428185
is.Eco:is.EnvSci  0.07283376  4.9809395
is.Eco:is.EvoBio -1.50034861  3.9289123
> exp(mylogit$coefficients)
     (Intercept)        log(ImFa)           SomeOA             Afil  PubCodeelsevier  PubCodespringer 
      0.05381005       2.77795076       1.92486449       5.21230825       0.91938633       0.37784705 
   PubCodetaylor     PubCodewiley           is.Eco        is.EnvSci        is.EvoBio is.Eco:is.EnvSci 
      2.68910224       1.19948729       0.18294913       0.35173014       1.25257649      13.68103454 
is.Eco:is.EvoBio 
      3.67674169 
> exp(confint(mylogit))
Waiting for profiling to be done...
                      2.5 %      97.5 %
(Intercept)      0.00591864   0.3419233
log(ImFa)        1.59083990   5.1924572
SomeOA           0.80064592   4.5067425
Afil             2.20249553  13.7583354
PubCodeelsevier  0.20250014   4.1207368
PubCodespringer  0.05256945   2.1157948
PubCodetaylor    0.54616631  11.9594954
PubCodewiley     0.33427262   4.1747226
is.Eco           0.02280291   1.7871883
is.EnvSci        0.06401291   2.7516917
is.EvoBio        0.11596966  17.1640736
is.Eco:is.EnvSci 1.07555172 145.6111127
is.Eco:is.EvoBio 0.22305239  50.8516369
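The data preparation in the transcript can be sketched as one self-contained function. This is a hedged rewrite, not the code in the gist: it assumes the CSV has the columns used above (`Impact.Factor`, `ISI.Category`, `Policy.request...require.code`), avoids `attach()`, and uses fixed-string matching instead of the `*`-wrapped patterns (a leading `*` has no defined meaning in POSIX extended regular expressions). The `model.matrix()` call shows how `(is.Eco*is.EnvSci) + (is.Eco*is.EvoBio)` expands into three main effects plus two interactions. The toy rows are hypothetical and only exercise the function.

```r
# Build the model variables from a data frame shaped like JournalDat.csv
# (column names assumed from the transcript above).
prep_journals <- function(df) {
  # 1 if the category label appears anywhere in ISI.Category, else 0
  has_cat <- function(label) as.integer(grepl(label, df$ISI.Category, fixed = TRUE))
  imfa <- df$Impact.Factor
  imfa[imfa == 0] <- NA            # log(0) would be -Inf, so treat 0 as missing
  data.frame(
    requests  = as.integer(df$Policy.request...require.code > 0),
    logImFa   = log(imfa),
    is.Eco    = has_cat("Ecology"),
    is.EnvSci = has_cat("Environmental Sciences"),
    is.EvoBio = has_cat("Evolutionary Biology")
  )
}

# Hypothetical toy rows, only to exercise the function
toy <- data.frame(
  Impact.Factor = c(2.5, 0, 1.2, 4.0),
  ISI.Category  = c("Ecology; Evolutionary Biology", "Environmental Sciences",
                    "Ecology; Environmental Sciences", "Zoology"),
  Policy.request...require.code = c(1, 0, 2, 0)
)
prepped <- prep_journals(toy)

# `a*b` in a formula expands to a + b + a:b, so the two overlaps
# contribute three main-effect columns and two interaction columns
mm <- model.matrix(~ is.Eco * is.EnvSci + is.Eco * is.EvoBio, data = prepped)
colnames(mm)
```

With the real file, `prep_journals(read.csv(filename))` would replace the `attach()`-based steps; `SomeOA`, `Afil` and `PubCode` are left out for brevity, and on recent R versions `PubCode` may need `factor()` before `relevel()`.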
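  • Problems:
  1. The model now has more than 10 variables (12 coefficients besides the intercept), while at most 39 journals have a sharing plan, leaving very few events per variable.
  2. The p-value (0.033) and the log-odds confidence interval (0.07 to 4.98) for the Eco:EnvSci interaction coefficient look good, but the confidence interval for its odds ratio (the exponentiated coefficient) is huge, roughly 1.08 to 145.6.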
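Two pieces of arithmetic on the reported estimates may help with problem 2. First, the interaction coefficient is not the effect of being in both categories: for a journal listed in both Ecology and Environmental Sciences, the linear predictor adds both main effects and the interaction. Second, the odds-ratio interval is just the log-odds interval exponentiated, so an interval about 4.9 units wide on the log scale becomes a ratio of roughly 135 between its ends. The numbers below are copied from the output above; a confidence interval for the combined effect would also need the covariance matrix (`vcov(mylogit)`), which is not shown here.

```r
# Estimates copied from summary(mylogit) above (log-odds scale);
# eco_x_env and eco_x_evo are the two interaction terms
b <- c(is.Eco = -1.69855, is.EnvSci = -1.04489, is.EvoBio = 0.22520,
       eco_x_env = 2.61601, eco_x_evo = 1.30203)

# Log-odds shift for a journal in both categories vs. one in neither,
# holding the other covariates fixed
both_eco_env <- unname(b["is.Eco"] + b["is.EnvSci"] + b["eco_x_env"])
both_eco_evo <- unname(b["is.Eco"] + b["is.EvoBio"] + b["eco_x_evo"])
exp(c(eco_env = both_eco_env, eco_evo = both_eco_evo))  # odds ratios, both below 1

# Profile CI for is.Eco:is.EnvSci on the log-odds scale (from confint above)
ci_log <- c(lower = 0.07283376, upper = 4.9809395)
exp(ci_log)                                     # reproduces the exp(confint()) row
unname(exp(ci_log["upper"]) / exp(ci_log["lower"]))  # ratio between the two ends
```

Read as point estimates only, the combined odds ratios come out a little below 1 (about 0.88 and 0.84), so in this fit the significant interaction mostly offsets the negative Ecology and Environmental Sciences main effects rather than marking the overlap journals as clearly more likely to have a sharing plan.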