Simulations of Metagenomic Data
Major overhaul (15 April 2009): I reorganized the wiki pages on simulations of metagenomic data; all related pages should now be linked from this page.
-- Sam Riesenfeld.
Motivation
There appears to be no obvious method for constructing protein family phylogenies from metagenomic data. (See the discussion on phylogenetic methods.) We hope to shed some light on this issue by creating simple simulated data sets and testing different methods, both existing and under development, on them.
Discussion
As we set out to do these simulations, we discussed which parameters we would like to be able to vary, which software we might use, and related issues.
- The discussion has been moved to a new page: Discussion on simulating metagenomic data.
The pipeline
Sam implemented the full pipeline.
- See the simulation pipeline page for a high-level description of the steps. That page also links to scripts and examples, in case you want to run the pipeline yourself.
Simulated data sets
See the simulation pipeline page and Sam's iSEEM page for more information on the available simulated data.
To do (as of Nov. 2009)
See the list of action items for the simulations and analysis.