User:Carl Boettiger/Notebook/Comparative Phylogenetics/2010/03/22

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Comparative Phylogenetics
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Phylogenetic Tree formats

 * A rooted, fully bifurcating tree with N nodes has $$ \frac{N+1}{2} $$ tips and $$ \frac{N-1}{2} $$ internal nodes.
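
These counts follow from a short argument, assuming the tree is rooted and strictly bifurcating (an assumption the note does not state explicitly). The symbols T (number of tips) and I (number of internal nodes) are introduced here only for the derivation. Every node except the root has exactly one incoming edge, so there are N-1 edges, and every internal node contributes exactly two of them:

$$ T + I = N, \qquad 2I = N - 1 \quad\Longrightarrow\quad I = \frac{N-1}{2}, \qquad T = N - I = \frac{N+1}{2} $$

For example, N = 5 gives T = 3 tips and I = 2 internal nodes.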

ape Format: class "phylo"

 * Elements of the data structure:
 * phy$edge -- Topology. An (N-1) by 2 matrix, one row per edge.  The first column contains internal nodes only, each appearing twice (once per child).  Tips are numbered 1 to (N+1)/2 and internal nodes (N+1)/2 + 1 to N.  Every node except the root appears exactly once in the second column, as the child of the node in the first column of the same row.

The order of the rows in the matrix does not affect the topology. Two further elements complete the essential components of a phylo object:
 * phy$tip.label -- character vector of length $$ \frac{N+1}{2} $$. The index values correspond to the tip numbers used in phy$edge.
 * phy$Nnode -- number of internal nodes, equal to $$ \frac{N-1}{2} $$.

Optional elements include the edge lengths and a list of names for internal nodes:
 * phy$edge.length -- numeric vector of length N-1. Lengths of the edges, in the same row order as phy$edge.  Not required for plotting.
 * phy$node.label -- character vector of labels for the internal nodes. phy <- makeNodeLabel(phy) adds it.
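
As a rough illustration of the elements listed above, the following base-R sketch builds a phylo-style list by hand for the three-tip tree ((A,B),C) and checks the numbering conventions. It deliberately avoids loading ape, so it only mimics the structure. The example tree, the branch lengths, and the variable names n_tips and root are assumptions made for illustration.

```r
# Hand-built list mimicking the elements of a "phylo" object for ((A,B),C).
# N = 5 nodes: tips 1..3, root 4, one further internal node 5.
edge <- matrix(c(4L, 5L,    # root -> internal node 5
                 5L, 1L,    # node 5 -> tip A
                 5L, 2L,    # node 5 -> tip B
                 4L, 3L),   # root -> tip C
               ncol = 2, byrow = TRUE)

phy <- list(
  edge        = edge,
  edge.length = c(1, 1, 1, 2),     # illustrative lengths, one per row of edge
  tip.label   = c("A", "B", "C"),  # label i belongs to tip number i
  Nnode       = 2L                 # number of internal nodes
)
class(phy) <- "phylo"

N      <- nrow(phy$edge) + 1       # every node except the root is a child once
n_tips <- length(phy$tip.label)

# Checks of the numbering conventions
stopifnot(n_tips == (N + 1) / 2, phy$Nnode == (N - 1) / 2)
stopifnot(all(phy$edge[, 1] > n_tips))       # ancestors are internal nodes
stopifnot(all(table(phy$edge[, 1]) == 2))    # each internal node has two children

root <- setdiff(seq_len(N), phy$edge[, 2])   # the only node that is never a child
stopifnot(root == n_tips + 1)
```

Shuffling the rows of edge (together with edge.length) leaves every check above satisfied, which is one way to see that the row order carries no topological information.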

The ape format is very picky. Any tree topology is completely specified by listing the ancestor of each node: if the nodes are identified as 1:N, the topology is an ordered vector of length N. This is the representation used by ouch, and the node numbers can be arbitrary identifiers.

Ape chooses a more restrictive representation. The numbers 1 to (N+1)/2 are assigned exclusively to tips, and the larger numbers exclusively to internal nodes. The topology is still specified by listing the ancestor of each node: the node itself is stored in edge[,2] and its ancestor in edge[,1]. The root has no ancestor, so it is the one node missing from edge[,2]. It appears, however, that the root cannot be numbered N; it seems that (N+1)/2 + 1 is best assigned to the root.

Conversion from ape to ouch (ape2ouch) could easily preserve the node numbering used in the ape representation, but for some reason the function renumbers the nodes anyway. The reverse direction is harder, as not every valid coding of a tree in the ouch representation meets the additional constraints of an ape tree.
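
The renumbering needed to go from an ancestor vector to an ape-style edge matrix can be sketched in base R as follows. This is a minimal sketch and not the conversion used by either package: the function name ancestors_to_edge is hypothetical, the root is assumed to be marked by an NA ancestor, and the root is placed at (N+1)/2 + 1 following the observation above.

```r
# Sketch: turn an ancestor vector (node i has ancestor ancestors[i], NA for
# the root) into a two-column edge matrix whose numbering puts tips first,
# then the root, then the remaining internal nodes.
ancestors_to_edge <- function(ancestors) {
  N     <- length(ancestors)
  nodes <- seq_len(N)
  root  <- which(is.na(ancestors))
  stopifnot(length(root) == 1)

  tips     <- setdiff(nodes, ancestors)        # nodes that are nobody's ancestor
  internal <- setdiff(nodes, c(tips, root))    # internal nodes other than the root

  # new_id[old] gives the new number of the node originally called `old`
  new_id <- integer(N)
  new_id[c(tips, root, internal)] <- nodes

  child <- nodes[-root]                        # every non-root node is a child once
  cbind(new_id[ancestors[child]], new_id[child])
}

# Arbitrary numbering: node 1 is the root, node 2 is internal, 3..5 are tips.
anc  <- c(NA, 1, 1, 2, 2)
edge <- ancestors_to_edge(anc)
# rows of edge: (4,5), (4,1), (5,2), (5,3)

# The other direction needs no renumbering: read the ancestors off the matrix.
anc_back <- rep(NA_integer_, length(anc))
anc_back[edge[, 2]] <- edge[, 1]
# anc_back is 4, 5, 5, NA, 4
```

The asymmetry is visible in the code: reading ancestors off an edge matrix is a single assignment, whereas the other direction first has to classify nodes as tips, root, or internal before numbers can be handed out.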

ouch Format: class "ouchtree"

 * Topology specified in tree@ancestors
 * each node lists its ancestor
 * fitted models have class "hansentree" or "browntree" and contain parameter fitting details.


 * }