Phylogenetic Methods

From OpenWetWare

Major overhaul (15 April 2009): I reorganized the wiki pages on phylogenetic methods, started a new page just for the discussion, and broke up the discussion page into several pages.
-- Sam Riesenfeld

Probing and discussing

There are many different ways to build phylogenies from metagenomic data, and it is not yet clear which are preferable. We have talked with quite a few people about this to get ideas and to find out what people are already doing.

Different categories of approaches

Initially, we saw two main categories of approaches to building trees from metagenomic data:

  1. Partial sequences get processed independently (the number of query sequences equals the number of trees to build). These trees may then get combined in some way.
  2. Partial sequences are pooled and processed together to make one tree.
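To make the difference between these two categories concrete, here is a minimal sketch of their control flow. Everything in it is an assumption made for illustration: the function names, the use of dictionaries mapping sequence names to sequences, and the idea that each per-read tree also includes the reference sequences. The build_tree argument is a hypothetical stand-in for whatever inference tool is actually used; toy_builder only joins sorted names so that the sketch can run.

```python
from typing import Callable, Dict

# Hypothetical interface: a tree builder takes {name: sequence} and returns
# a tree, represented here simply as a Newick-like string.
TreeBuilder = Callable[[Dict[str, str]], str]


def trees_per_read(reference: Dict[str, str],
                   reads: Dict[str, str],
                   build_tree: TreeBuilder) -> Dict[str, str]:
    """Category 1: build one tree per query sequence.

    Assumption: each tree contains the reference sequences plus one read.
    """
    trees = {}
    for name, seq in reads.items():
        seqs = dict(reference)   # copy so the reference set is not modified
        seqs[name] = seq
        trees[name] = build_tree(seqs)
    return trees


def pooled_tree(reference: Dict[str, str],
                reads: Dict[str, str],
                build_tree: TreeBuilder) -> str:
    """Category 2: pool all query sequences and build a single tree."""
    seqs = dict(reference)
    seqs.update(reads)
    return build_tree(seqs)


def toy_builder(seqs: Dict[str, str]) -> str:
    """Placeholder builder (not real inference): a star tree of sorted names."""
    return "(" + ",".join(sorted(seqs)) + ");"
```

With two reads, trees_per_read returns two trees and pooled_tree returns one, which is the only point the sketch is meant to show. How the per-read trees would later be combined is left open, as it is above.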

Later, another option surfaced:

  3. Partial sequences are processed iteratively.

Which methods we are currently testing

  • Steve is testing full maximum likelihood inference on a dataset with reference and query sequences using RAxML. See his pipeline.
  • Steve is testing the algorithm for placing short reads on a reference phylogeny that is implemented in the latest version of RAxML.
  • Sam is working on developing an iterative method that alternates placing a read on the tree (hopefully, using Erick Matsen's pplacer) and running maximum likelihood (probably with RAxML) on part of the tree. See her notes for more details.
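The alternation in the iterative method can be sketched roughly as follows. This is only an outline of the loop under stated assumptions, not Sam's actual method: place_read and refine_locally are hypothetical callables standing in for the placement step (the role pplacer might play) and the local maximum likelihood step (the role RAxML might play), and the tree is treated as an opaque object. No real pplacer or RAxML calls are made.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   place_read(tree, name, seq) -> (new_tree, location_of_placed_read)
#   refine_locally(tree, location) -> new_tree
PlaceFn = Callable[[Any, str, str], Tuple[Any, Any]]
RefineFn = Callable[[Any, Any], Any]


def iterative_placement(tree: Any,
                        reads: Dict[str, str],
                        place_read: PlaceFn,
                        refine_locally: RefineFn) -> Any:
    """Alternate between placing one read and re-optimizing part of the tree."""
    for name, seq in reads.items():
        # Step 1: add the read to the current tree.
        tree, location = place_read(tree, name, seq)
        # Step 2: re-run inference only on the region around the placement.
        tree = refine_locally(tree, location)
    return tree


# Toy stand-ins so the loop can be exercised; they only record what happened.
log: List[Tuple[str, str]] = []


def toy_place(tree: List[str], name: str, seq: str) -> Tuple[List[str], str]:
    log.append(("place", name))
    return tree + [name], name


def toy_refine(tree: List[str], location: str) -> List[str]:
    log.append(("refine", location))
    return tree
```

Calling iterative_placement(["ref1"], reads, toy_place, toy_refine) places and refines once per read, in order. Which part of the tree gets re-optimized, and in what order reads should be added, are open design questions that this sketch does not address.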

How will we test methods?

  • Steve has run his pipeline on the DeLong data set. The results were problematic because the pipeline found very few reads per gene. He plans to run it on the simulated data next.
  • Sam has developed a pipeline for producing simulated metagenomic sets that can be used in testing. See also the main page on simulating metagenomic data.