User:Carl Boettiger/Notebook/Comparative Phylogenetics/2010/08/26
![]() |
![]() ![]() ![]() |
Likelihood Ratio Paper: Power in Trees
Power in Trees tests (Results from single OU simulation per alpha, run yesterday)2000 replicates gives a cleaner picture of the critical alpha, though with low alpha values the estimates of p are quite variable. Also more variable on the smaller trees. Even arbitrarily large alpha aren't significant on the 5-taxa primate tree. On the small geospiza tree (13 taxa) from the geiger package we find: <syntaxhighlight lang="rsplus"> 0.5735 0.29 0.704 0.6465 0.993 0.838 0.8605 0.9355 0.782 0.845 0.93 "0.1" "0.2" "0.3" "0.4" "0.5" "0.6" "0.7" "0.8" "0.9" "1" "2" 0.945 0.9985 0.958 0.7725 0.9995 0.982 0.949 0.992 1 0.99 0.999 "3" "4" "5" "6" "7" "8" "9" "10" "20" "30" "40" 0.9895 "50" </syntaxhighlight> P values on top row, alpha values in quotes below the p value. Notice 9 fails to be significant at 0.95% confidence level, though might as easily be somewhere around 6. For the Labrid tree with 114 taxa the pattern is clearer though not perfect: <syntaxhighlight lang="rsplus"> 0.621 0.502 0.9735 0.66 0.5185 0.8075 0.9975 0.9665 0.907 0.997 0.9945 "0.1" "0.2" "0.3" "0.4" "0.5" "0.6" "0.7" "0.8" "0.9" "1" "2" 1 1 1 1 1 1 1 1 1 1 1 "3" "4" "5" "6" "7" "8" "9" "10" "20" "30" "40" 1 "50" </syntaxhighlight> Significance clearly sets in between alpha of 0.9 and 1. Summary: <syntaxhighlight lang="rsplus"> > birds$min_alpha [1] 10 13 taxa > primates$min_alpha [1] 1000 (5 taxa) > carnivora$min_alpha [1] 1 260 taxa > labrids$min_alpha [1] 1 114 taxa > carex$min_alpha [1] 2 53 taxa > </syntaxhighlight> reducing the noise in the power estimationI've just implemented the power test for the smallest alpha that can be detected at the 95% level using the (whatarewecallingitnow) likelihood ratio monte carlo approach on a given tree. I try simulating under successively higher alpha values and then see if the resulting dataset can be distinguished. In my first implementation, this involves only one simulation from the OU model at a given alpha, and then the N bootstrap simulations from the null brownian model per alpha. Obviously I want more than one simulation from the OU process. It seems I could (1) simulate under the OU model of a given alpha, (2) fit the BM model to the resulting data, (3) simulate under that BM model to get the "bootstrap" data, for which (4) I refit the BM and OU models and compare the likelihood ratio of these refit models. That sounds pretty convoluted when I describe it that way, but I think this is the "natural" thing to do? i.e. steps 2-4 are the bootstrap process which I normally repeat N times for a single dataset, now instead of using a single dataset I draw a dataset from the distribution....
New approach: multiple simulations from alphaPlots show decreasing power with decreasing alpha![]() ![]() ![]()
## Draw a data sample from the distribution produced by test data <- simulate(test, n=nboot) ## fit the null models to these data -- so we simulate from the appropriate null nulls <- lapply(1:nboot, function(i){ update(null, datai) }) tests <- lapply(1:nboot, function(i){ update(test, datai) }) t0_dist <- sapply(1:nboot, function(i){ if(is(testsi, "hansentree")){ t0 <- -2*(nullsi@loglik - testsi@loglik) } else { t0 <- -2*(nullsi@loglik - testsi$loglik) } })
t <- sapply(1:nboot, function(i){ refit_null <- update(null, Xi) refit_test <- update(test, Xi) if(is(refit_test, "hansentree")){ out <- -2*(refit_null@loglik - refit_test@loglik) } else { out <- -2*(refit_null@loglik - refit_test$loglik) } out }) plot(density(out$t), col='blue', lwd=3,xlim=range(c(out$t0_dist,out$t)) ) lines(density(out$t0_dist), col='red', lwd=3)
![]() From these it is possible to estimate the power (see powertest.R code)
![]()
Meanwhile
Slideshow of results<html> <div style="width:480px;text-align:right;"><embed width="480" height="360" src="http://static.pbsrc.com/flash/rss_slideshow.swf" flashvars="rssFeed=http%3A%2F%2Ffeed1212.photobucket.com%2Falbums%2Fcc458%2Fcboettig%2Ffeed.rss" type="application/x-shockwave-flash" wmode="transparent" /><a href="http://photobucket.com/redirect/album?showShareLB=1" target="_blank"><img src="http://pic.pbsrc.com/share/icons/embed/btn_geturs.gif" style="border:none;" /></a><a href="http://s1212.photobucket.com/albums/cc458/cboettig/" target="_blank"><img src="http://pic.pbsrc.com/share/icons/embed/btn_viewall.gif" style="border:none;" /></a></div> </html> Code updates--Carl Boettiger 03:26, 27 August 2010 (EDT)
Updated power curves for treesUsing the faster approach avoiding re-sampling the null distribution for each alpha, I generate power curves for trees of different sizes and complexity. The larger, richer trees show more power. |