List of phylogenetic methods

From OpenWetWare

This page compiles a list of methods, and ways to tweak them, that we might want to compare using (at least) the simulated data.

Methods to create alignments

  • Metagenomic alignment (includes metagenomic reads identified and aligned with STAP or AMPHORA or similar)
  • Metagenomic alignment plus general reference alignment (i.e. the reference alignment from STAP or AMPHORA)
  • Metagenomic alignment plus general reference alignment plus targeted reference alignment
    • Steve said in an e-mail: "I was wondering if we could also think about trying a strategy similar to what is done in STAP: first BLAST problematic/short metagenomic reads to find some phylogenetic neighbors in a reference database and add those neighbors to the reference alignment, which could then be used to get better placement/BL estimates for those reads when building a tree?"
      • Sam: I'm a little bit confused about this idea; maybe we can discuss it during or before the iSEEM call.
  • We should try different criteria for including/excluding sequences (length, etc.); these will be informed by the simulations.
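
As a concrete starting point for the length criterion, here is a minimal sketch of filtering aligned sequences by their number of non-gap characters. The function names (`ungapped_length`, `filter_by_length`), the toy alignment, and the threshold of 5 are all made up for illustration; the real cutoff would come from the simulations.

```python
def ungapped_length(aligned_seq, gap_chars="-."):
    """Count the characters in an aligned sequence that are not gaps."""
    return sum(1 for c in aligned_seq if c not in gap_chars)


def filter_by_length(alignment, min_length):
    """Keep only sequences with at least min_length non-gap characters.

    alignment is a dict mapping sequence name -> aligned sequence string.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if ungapped_length(seq) >= min_length
    }


# Toy alignment (hypothetical data): one reference and two short reads.
alignment = {
    "ref_1": "ACGTACGTAC",
    "read_1": "--GTACG---",
    "read_2": "------GT--",
}

# Hypothetical threshold; read_2 has only 2 aligned bases and is dropped.
kept = filter_by_length(alignment, min_length=5)
print(sorted(kept))
```

Other criteria (e.g. fraction of alignment columns covered) could be swapped in by replacing `ungapped_length`.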

Methods to calculate distances among sequences in an alignment

  • "Traditional" (e.g. pick your favorite substitution model: JC, HKY, etc.)
  • Pseudocounts to account for non-overlapping sequences (e.g. as implemented in FastTree)
  • Methods in Cheng et al. to account for non-overlapping/fragmented sequences? (Which ones? Have they been tried on metagenomic data or only EST data?)
    • Katie will meet with Todd Vision in NC during the first week of Feb to discuss their approaches.
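
To make the non-overlap problem concrete, here is a minimal sketch of a "traditional" Jukes-Cantor (JC) distance computed only over columns where both sequences have a base. The function name `jc69_distance` and the choice to return `None` for pairs with no shared columns are assumptions made for illustration; this is not how FastTree or the methods in Cheng et al. handle the problem.

```python
import math

GAP_CHARS = "-."


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance over columns where both sequences are non-gap.

    Returns None when the two sequences share no aligned columns, which is
    the situation pseudocounts and related corrections are meant to address.
    Ambiguity codes (e.g. N) are not treated specially in this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns where either sequence is missing
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        return None  # no overlap: the distance is undefined

    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # JC correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two fragmentary reads covering different regions of the alignment.
print(jc69_distance("ACGT------", "------ACGT"))
# Two full-length sequences differing at one of ten sites.
print(jc69_distance("ACGTACGTAC", "ACGTACGTAT"))
```

In a metagenomic alignment many read pairs fall into the `None` case, so a plain distance matrix would have missing entries unless one of the approaches listed above is used.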

Methods to infer trees from alignments or distance matrices

  • Neighbour joining
  • Maximum likelihood
  • Parsimony
  • Bayesian

Please feel free to add to and edit this list!

It would be helpful to know if people (especially Dongying and Martin) have any thoughts on this list, including opinions about which methods should definitely be tested or how the experiments should be performed.