User:Carl Boettiger/Notebook/Comparative Phylogenetics/2010/10/12
Comparative Phylogenetics  Main project page Previous entry Next entry 
a day of writingFiguring out the best way to present the critique of AIC, introduction to the method, introduction to the power test. Seeking inspiration from Huelsenbeck 1996 and digging farther back into the literature:
Interesting question on RsigphyloMy reply: A few responses inline below, hope they're helpful. In general averaging over trees to estimate parameters sounds great, a couple hurdles. On Tue, Oct 12, 2010 at 5:08 PM, Lara Poplarski <larapoplarski@gmail.com> wrote: Dear List, I am new to R, and hope someone can kindly help with the following task. I have a Bayesian sample of trees in nexus format and discrete data; example trees and data are at the bottom of the email. I would like to use fitDiscrete in geiger to estimate parameter lambda for all variables. The idea is to then compare the distributions across trees of lambda values for the different variables. I have been using the following: for(i in 1:5) { cat("t",i,"\n"); fd<fitDiscrete(treesi,data,treeTransform="lambda") print(fd) } A few specific questions: 1. Since trees in my sample are nonultrametric, I get the following warning: Warning: some tree transformations in GEIGER might not be sensible for nonultrametric trees. Is the lambda transformation sensible for nonultrametric trees? I could not find further information on this in the manual or in the Pagel references.
2. In some cases, I get the following warning: [1] "Warning: may not have converged to a proper solution." Does it make sense to get fitDiscrete to repeat the ML estimation for these cases, until convergence? Can someone suggest how to modify the loop above to do this?
In leiu of this, you'll want to note and exclude estimates that do not achieve convergence.
3. Finally, I would be grateful for pointers on how to tackle the output. For example, can someone suggest how to go about calculating the mean lambda value, across trees, for each of the 4 variables?
You want to create a matrix where each row represents a different tree, and each column a different trait value. <syntaxhighlight lang="rsplus"> lambda < sapply(1:n_trees, # Apply what follows to all trees function( i ){ sapply(1:n_traits, # Apply what follows for each trait function( j ){ fd < fitDiscrete(treei, data[, j ], treeT="lambda" ) # fitDiscrete for tree i, trait j fd1$treeParam }) } ) </syntaxhighlight> Then just calculate the average / variance / histogram / etc over the column.
I also have a more general question. Both the tree sample and dataset are relative large (1000 trees * 80 variables), so based on preliminary runs I expect the analysis to take rather long. Any suggestions on how to speed things up and/or on a different approach to the tree sample/multiple variable setup would be most welcome! Good point. Since the order you do the trees in doesn't matter, an sapply function can be used instead of a for loop, it is generally much faster. If you have a multicore machine, take a look at the snowfall package in R, which will let you run your sapply loop in parallel automatically. Sounds like it will take a few days to run them all.
Many thanks in advance,
