Major overhaul (15 April 2009): I reorganized the wiki pages on simulations of metagenomic data; all related pages should now be linked from this page.
There appears to be no obvious choice of method for constructing protein family phylogenies from metagenomic data. (See the discussion on phylogenetic methods.) We hope to shed some light on this issue by creating some simple simulated data sets and then testing different methods, both existing and under development, on them.


As we set out to do these simulations, we discussed what parameters we would like to be able to tweak, what software we might use, and related issues.

The pipeline

Sam implemented the full pipeline.

  • See the simulation pipeline for a high-level description of the steps in the pipeline. The page also has links to scripts and examples, in case you want to run the pipeline yourself.

Simulated data sets

See the simulation pipeline web page and my iSEEM page for more information on available simulated data.

To do (as of Nov. 2009)

See the list of action items for the simulations and analysis.