User:Carl Boettiger/Notebook/Comparative Phylogenetics/2010/05/05

From OpenWetWare
Jump to navigationJump to search
Owwnotebook icon.png Comparative Phylogenetics Report.pngMain project page
Resultset previous.pngPrevious entry      Next entryResultset next.png


  • Experimenting with distributing code and data effectively. Pluses and minuses of three options:
  1. zipped file with R code and data. Requires setting working directory, harder to update. Subsequent uploads must include all the data, rather than incremental.
  2. Directly grab repository over github. Simple to update and access, stores only incremental changes. Requires knowledge of git
  3. R package: Nicely integrates data, code, and documentation. Requires installing R packages.
  • Learning the R packaging system for data. You can have .R files in the data/ directory of a package. The call to the data() function runs those files from that directory, making it unnecessary to specify the path to the data and allowing the use of complicated functions, see the example from the ape package:

<syntaxhighlight lang ="rsplus"> require(ape, quietly = TRUE, save = FALSE) bird.orders <- read.tree("bird.orders.tre") </syntaxhighlight>

Primate Phylogeny

Carl Boettiger 00:37, 30 April 2010 (EDT):

  • Simon sent me the tree data from their paper:
  1. Chatterjee HJ, Ho SY, Barnes I, and Groves C. Estimating the phylogeny and divergence times of primates using a supermatrix approach. BMC Evol Biol. 2009 Oct 27;9:259. DOI:10.1186/1471-2148-9-259 | PubMed ID:19860891 | HubMed [primates]
  • This is the first calibrated ultrametric tree I've been able to couple to the primate data Daniel provided. I've preformed a quick ancestral state reconstruction on log brain weight (warmer colors are more massive).


R code to create figure

<syntaxhighlight lang="rsplus"> library(ape) library(geiger) genera_set <-"mt.genera.ucld.trees") nuc_set <-"nuc.ucld.trees") genera <- genera_set1 nuc <- nuc_set1 nuc_names <- nuc$tip.label genera_names <- genera$tip.label traits <- read.csv("Primate_brain_comparisons.csv")

dups <- duplicated(traits'Genus') trait_names <- as.character(traits$Genus[!dups]) x <- traits$log_brain.weight[!dups] names(x) <- trait_names

compare <- treedata(nuc, x) plot(compare$phy) out <- ace(compare$data, compare$phy)

  1. a similar function but for the continuous ancestral state

plot_cts_ancestor <- function(phy, data, ancestral){ plot(phy) # just to get treelength treelen <- max(axisPhylo() ) plot(compare$phy, cex=1, x.lim=1.3*treelen, edge.width=2) mycol <- function(x){ tmp = (x - min(data)) /max(x-min(data)) rgb(blue = 1-tmp, green=0, red = tmp ) } nodelabels(pch=19, col=mycol(ancestral$ace), cex=1.5 ) tiplabels(pch=19, col=mycol(data), cex=1.5, adj=0.5) # add tip color code } plot_cts_ancestor(compare$phy, compare$data, out) </syntaxhighlight>

  • This is essentially just at the proof of concept level. A couple obvious steps:
  1. This uses a single tree from the posterior distribution of trees, which we'll eventually want to average over in the parameter estimation. Meanwhile should use a consensus MLE tree instead.
  2. Trees are resolved to genus level, so traits should be genus averages (instead of subsamples).
  3. Ancestral state reconstruction under the Brownian Motion model is the most preliminary treatment, will certainly be interesting to consider the various multi-peak OU models and identify potential transitions between peaks, as well as power analysis for the model fits.
  4. Multidimensional trait inference should also be interesting
  • Possible bug? The Homo/Pan and Gorilla/Pan nodes would be expected to be in the Gorilla/Pan colour rather than the Homo colour.

Code and data on Github

New github repository for sharing code and data: sandbox

  • Grab this repository using git:
 git clone git://
  • Revert to this edit's version of code and data using git:
 git checkout 77af6613db294ca068cf
  • Carl Boettiger 16:17, 5 May 2010 (EDT):
  • Updated git version using R package is:
 git checkout b1a90ad97b90d9cbeaa7