DataONE:Notebook/Summer 2010/2010/07/27

From OpenWetWare
Heather: Hi Nic!
Have you got a few minutes to chat stats, or when would be good?
10:52 AM 
me: Hi Heather,
now is good
Heather: cool
I love watching your evolving code, good stuff
A few ideas
10:53 AM 
one idea is not to worry right now about what is statistically significant or not
no need to call it out in your results
me: ok
Heather: you are still clearly getting the data fleshed out and cleaned, your columns rejiggered.... all of this makes the results change
and really the results aren't to be trusted until all of that is done
10:54 AM 
me: yeah, I've spent a lot of time cleaning up columns this morning
Heather: and in some ways when you comment on it early it suggests that there is something worth looking at, when really there isn't yet :)
what I mean is, there isn't anything worth looking at yet because it is all still transient
me: ok I understand
Heather: cool
10:55 AM 
ok, different topic: columns
and whether they are separate or together
I haven't explained "factors" to you yet, is that right?
me: that's right, we haven't gone over that
Heather: right. so let's do that a bit right now :)
me: ok
Heather: there are multiple datatypes for a column
10:56 AM 
and "category"
where category could be favourite colour
and if you could only have one favourite colour
then it would make sense to have a column called fav color
10:57 AM 
and within it it would say "Red" "Blue" "undecided" etc
these are also called nominal variables
and also called, in R, "factors"
where the levels of the factors are the distinct values it can take
make sense so far?
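A minimal R sketch of the favourite-colour example, using made-up answers (the values are hypothetical): `factor()` turns a character vector into a factor, `levels()` lists the distinct values it can take, and `table()` counts how many observations fall at each level.

```r
# Hypothetical survey answers: each person has exactly one favourite colour
fav_colour <- factor(c("Red", "Blue", "undecided", "Red", "Blue", "Red"))

levels(fav_colour)   # the distinct values (levels), sorted alphabetically by default
nlevels(fav_colour)  # how many levels there are
table(fav_colour)    # number of people at each level
```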
10:58 AM 
two more things about factors
one is that they can be ordered
me: yes
Heather: so for example a journal policy can be "weak" "medium" or "strong"
this isn't really an integer 0, 1, 2
10:59 AM 
because it doesn't make sense to do math on it. strong isn't medium*2
but it is more than just a category, because it is ordered
so in R this is called an ordered factor
and when you have an ordered factor it can help to tell R that because then the stats can use that information
11:00 AM 
me: ok
Heather: to do that, you can see the command
or we can just talk about it when it is relevant :)
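For reference, one way to declare the journal-policy variable as an ordered factor is sketched below with hypothetical values: `ordered = TRUE` plus an explicit `levels` vector tells R the ranking. By default, R's modelling functions such as `glm()` encode ordered factors with polynomial contrasts (`contr.poly`) instead of the treatment contrasts used for plain factors, which is one way the ordering information gets used.

```r
# Hypothetical policy strengths for five journals
policy <- factor(c("weak", "strong", "medium", "weak", "strong"),
                 levels = c("weak", "medium", "strong"),  # lowest to highest
                 ordered = TRUE)

is.ordered(policy)     # TRUE: R now knows weak < medium < strong
policy[2] > policy[3]  # comparisons respect the declared order ("strong" > "medium")
# Arithmetic is deliberately not meaningful: policy[2] * 2 gives NA with a warning
```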
a different conversation about factors is when to put them in the same column, and when to make a bunch of different binary columns
if you allow people to have one fav colour, then you should just have one column
11:01 AM 
but if you let people have several fav colours, all of a sudden one column doesn't work very well
and it works better to have multiple columns, that are all binary
so..... since a journal can have multiple ISI categories, each of the categories should have their own columns
11:02 AM 
but since they can only have one publisher, it makes most sense for the publisher to stay in a single column that has multiple factor "levels"
that helps to interpret the stats
I'll show you how to do that.
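A sketch of the two layouts, assuming a hypothetical data frame in which each journal has one publisher but its ISI categories are stored in a single semicolon-separated field (the column names, journal names, and separator are assumptions, not the project's actual data):

```r
# Hypothetical journals: one publisher each, possibly several ISI categories
journals <- data.frame(
  journal   = c("J1", "J2", "J3"),
  publisher = c("elsevier", "wiley", "other"),
  isi_cats  = c("Ecology;Genetics", "Genetics", "Ecology;Zoology"),
  stringsAsFactors = FALSE
)

# Publisher: keep a single column, stored as one factor with several levels
journals$publisher <- factor(journals$publisher)

# ISI categories: split the multi-valued field, then make one 0/1 column per category
cat_lists <- strsplit(journals$isi_cats, ";", fixed = TRUE)
all_cats  <- sort(unique(unlist(cat_lists)))
for (category in all_cats) {
  has_cat <- vapply(cat_lists, function(x) category %in% x, logical(1))
  journals[[paste0("ISI_", category)]] <- as.integer(has_cat)
}

journals
```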
make sense as a concept?
11:03 AM 
me: yes I think so
Heather: any questions about it? you seem a bit unsure?
11:04 AM 
me: no I think I get it
Heather: ok.
so I think using your PubCode variable in the analysis directly would probably work, wouldn't it?
how many different values can it take?
11:05 AM 
me: four
other, elsevier, wiley, springer
Heather: and taylor? or not?
me: well I was finding that I had too many variables, so I collapsed taylor
into Other
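Collapsing a sparse level into "other" can be done by renaming that level, since R merges factor levels that end up with the same name. A sketch with hypothetical codes (the lowercase level names are an assumption):

```r
# Hypothetical publisher codes, including a sparse "taylor" level
PubCode <- factor(c("elsevier", "taylor", "wiley", "other", "springer", "taylor"))

# Rename "taylor" to "other"; the two identically named levels are merged into one
levels(PubCode)[levels(PubCode) == "taylor"] <- "other"

table(PubCode)
```

If rows are only filtered out of a data frame rather than recoded, an empty level can still appear in `table()` output; `droplevels()` removes unused levels.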
11:06 AM 
Heather: gotcha
ok, so I think if you could rerun a glm including PubCode and post its results, I think we could go through them and I could show you how to interpret them.
11:07 AM 
one command that I've never used but I think would be helpful is relevel
it tells R which level to use as the basis, the reference level
I think your results would be most interpretable if that was "Other"
11:08 AM 
so I think (but am not sure) that the following code will work:
relevel(PubCode, ref="Other")
you'd put it right before the table(PubCode) command, before the glm call
11:09 AM 
let me know?
me: ok just one sec
11:13 AM 
ok, I just posted it
11:17 AM 
Heather: ok, so it does still have a taylor in it, is that right?
11:18 AM 
me: shoot I'm sorry
I called the wrong file in
11:19 AM 
Heather: also, it looks like this line has an error, an extra ] at the end?
> Afil = ifelse(Affiliation.Code > 0, 1, 0)] # Society Affiliation 
Error: unexpected ']' in "Afil = ifelse(Affiliation.Code > 0, 1, 0)]"
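For comparison, the corrected line with the stray `]` removed, run on a few hypothetical `Affiliation.Code` values; `as.numeric(Affiliation.Code > 0)` is an equivalent, shorter form:

```r
# Hypothetical affiliation codes (0 = no society affiliation)
Affiliation.Code <- c(0, 2, 0, 5, 1)

# Corrected line: same logic, without the trailing "]"
Afil = ifelse(Affiliation.Code > 0, 1, 0)  # Society Affiliation

# Equivalent form: convert the TRUE/FALSE comparison directly to 1/0
Afil2 <- as.numeric(Affiliation.Code > 0)
```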
11:21 AM 
Nic, I think I made a mistake... I think you actually have to make it
PubCode = relevel(PubCode, ref="Other")
11:22 AM 
relevel(PubCode, ref="Other")
isn't enough....
it has to be assigned back to the PubCode variable
I'm learning too, clearly :)
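A small sketch of the point just made, with hypothetical level names: `relevel()` returns a modified copy and leaves its argument unchanged, so the result has to be assigned back. `ref` must also match an existing level exactly, including case (for example `"Other"` versus `"other"`); otherwise `relevel()` stops with an error.

```r
# Hypothetical publisher codes (level names and their case are assumptions)
PubCode <- factor(c("elsevier", "other", "wiley", "springer", "other"))

relevel(PubCode, ref = "other")  # returns a releveled copy; PubCode itself is unchanged
levels(PubCode)                  # still alphabetical, so "elsevier" is the reference

PubCode <- relevel(PubCode, ref = "other")  # assign back to keep the change
levels(PubCode)                  # "other" is now the first (reference) level
table(PubCode)
```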
me: ok
let me fix that and the Afil
11:23 AM 
Heather: sorry about not seeing it before. your results up on your OWW page helped me figure it out :)
7 minutes
11:30 AM 
me: It might take me a few more minutes, I don't know why but it keeps showing Taylor in PubCode
Heather: ok, no prob
12 minutes
11:42 AM 
me: Ok, I posted what I ran in OWW-- I think there is a problem somewhere though
11:43 AM
11:45 AM 
Heather: what makes you think that?
11:46 AM 
the fact that there is no PubCodeother in the results is actually a good thing, in case that was it....
11:47 AM 
the reason that is true, is that "other" is used as the base case or the reference
so... to interpret these other factors,
using the "exp(confint(mylogit))" results
PubCodeelsevier 1.69733323 11.6908318
11:48 AM 
means that, compared to "other" publishers (= ones coded as other), journals published by elsevier have 1.7 to 11.7 times the odds of having a data sharing policy
PubCodespringer 0.15399592 2.6087098
11:49 AM 
means that, compared to "other" publishers, journals published by springer have between 0.15 and 2.6 times the odds of having a data sharing policy.
(since this interval goes from less than 1 to more than 1, it doesn't actually tell us anything interesting.... not coincidentally... the p-value for PubCodespringer is large!)
make any sense?
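To see where numbers like these come from, here is a sketch on simulated data (200 made-up journals and an arbitrary effect for elsevier, purely for illustration; `Policy` is a hypothetical 0/1 outcome, not the project's data). Exponentiated logistic-regression coefficients are odds ratios relative to the reference level, which is why there is no `PubCodeother` row. The sketch uses `confint.default()` (Wald intervals) to avoid extra dependencies; `confint()` on a `glm` gives profile-likelihood intervals, so its numbers will differ somewhat.

```r
set.seed(1)

# Simulated data for illustration only: 200 hypothetical journals
n <- 200
PubCode <- factor(sample(c("other", "elsevier", "wiley", "springer"), n, replace = TRUE))
PubCode <- relevel(PubCode, ref = "other")  # "other" becomes the reference level

# Hypothetical true log-odds: only elsevier journals are made more likely to have a policy
log_odds <- -1 + ifelse(PubCode == "elsevier", 1.2, 0)
Policy <- rbinom(n, size = 1, prob = plogis(log_odds))

mylogit <- glm(Policy ~ PubCode, family = binomial)

# Each coefficient is a log odds ratio versus "other"
exp(coef(mylogit))             # odds ratios
exp(confint.default(mylogit))  # Wald 95% intervals on the odds-ratio scale
summary(mylogit)$coefficients  # estimates, standard errors, and p-values (Pr(>|z|))
```

An interval that spans 1 is consistent with no difference from the reference level, which is why such rows tend to have large p-values.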
11:50 AM 
me: yes
Heather: cool
ok, any quick questions before we zoom over to the group chat?
11:51 AM 
I'm going to ask you and Sarah both (Valerie will be joining a bit later, hopefully) to give everyone a brief rundown
on what you've been doing and what your plans are.
that sound ok?
me: sure
no other questions right now
Heather: great!
11:52 AM 
you relatively comfortable with understanding the statistics you are running right now? on a scale of 0 to 10?
me: 6
Heather: nice. good.
11:53 AM 
me: I wouldn't say I totally get it, but it makes more sense when I re read our conversations
Heather: keep asking if there are things that you'd like to talk through some more.
yup, makes sense.
ok, off to hopefully try to join everyone in. wish us luck and strong connections!
me: ok