From OpenWetWare

Measuring phylogenetic diversity using metagenomic data

--Steven Kembel

  • The general question is how to measure phylogenetic diversity using metagenomic data. The general approach currently in use:
  1. Identify sequences from different gene families (using AMPHORA)
  2. Align metagenomic reads identified by AMPHORA to the reference sequence database of AMPHORA
  3. Build a tree from the combined metagenomic/reference sequences
    • Several tree inference methods are being tried (maximum likelihood, Bayesian, minimum evolution, etc.)
  4. Use the resulting phylogeny to estimate various measures of phylogenetic diversity and community structure

Project: Estimating phylogenetic diversity using AMPHORA marker genes

Run AMPHORA on a metagenomic data set (the HOT/ALOHA data set in this case)

Combine AMPHORA reference alignments with aligned metagenomic reads for each gene family
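
A minimal sketch of this combining step for a single gene family, in plain Python. It assumes the metagenomic reads have already been aligned to the same columns as the reference alignment, so the merge only needs to check that alignment widths agree and that no sequence ID appears in both sets. The function names, sequence IDs, and toy sequences are hypothetical and are not part of AMPHORA.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of sequence ID -> sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no sequence ID")
            current = fields[0]  # ID is the first token of the header
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


def combine_alignments(reference, reads):
    """Merge two aligned sequence sets that share the same columns."""
    widths = {len(seq) for seq in reference.values()}
    widths |= {len(seq) for seq in reads.values()}
    if len(widths) > 1:
        raise ValueError(f"alignment widths differ: {sorted(widths)}")
    clash = reference.keys() & reads.keys()
    if clash:
        raise ValueError(f"IDs present in both sets: {sorted(clash)}")
    return {**reference, **reads}


# Toy data (hypothetical): two reference sequences and one aligned read.
reference_text = ">ref_1\nMK-V\n>ref_2\nMK\nAV\n"
reads_text = ">read_1\n-KA-\n"

combined = combine_alignments(parse_fasta(reference_text),
                              parse_fasta(reads_text))
for seq_id, seq in combined.items():
    print(seq_id, seq)
```

Checking the widths up front catches reads that were aligned against a different version of the reference alignment before they reach tree inference.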

Build a phylogenetic tree from combined metagenomic reads and reference sequences

  • Two different approaches
  1. Build 31 separate trees (one for each gene family)
  2. Concatenate/tile sequences into a single large alignment/supermatrix and build one tree
  • Scripts to do this will be posted here.
  • Sample size issues
    • For individual gene family alignments, most genes have few sequences, making it hard to estimate phylogenetic diversity
    • The combined alignment with all 31 gene families is large (7381 sites, 1075 sequences) but can be analyzed in reasonable time using RAxML (~12-24 hours on 8 cores).
  • Then prune the reference sequences, leaving only the metagenomic reads on the tree (the tree itself was built with the reference sequences as a phylogenetic scaffold)
  • This seems to be working well: most metagenomic reads are placed fairly close to some sequence in the reference tree. Phylogenies will be posted here.
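
The concatenation approach above can be sketched as follows. This is an illustration rather than the project's actual script (which has not been posted yet): it assumes each gene family alignment is held as a dict of sequence ID to aligned sequence, and that a sequence missing from a gene family is filled with gap characters across that family's columns. Under that assumption a metagenomic read, which comes from a single gene family, ends up as gaps everywhere except its own block. The function name, gene names, and toy sequences are hypothetical.

```python
def build_supermatrix(gene_alignments, gap="-"):
    """Concatenate per-gene alignments into a single supermatrix.

    gene_alignments maps gene name -> {sequence ID -> aligned sequence}.
    Sequences absent from a gene family are padded with `gap`.
    Genes are concatenated in sorted name order.
    """
    genes = sorted(gene_alignments)

    # Record each alignment's width and check that it is rectangular.
    widths = {}
    for gene in genes:
        lengths = {len(seq) for seq in gene_alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: empty or ragged alignment")
        widths[gene] = lengths.pop()

    all_ids = sorted({seq_id
                      for alignment in gene_alignments.values()
                      for seq_id in alignment})

    supermatrix = {}
    for seq_id in all_ids:
        blocks = [gene_alignments[gene].get(seq_id, gap * widths[gene])
                  for gene in genes]
        supermatrix[seq_id] = "".join(blocks)
    return supermatrix


# Toy data (hypothetical): one reference genome present in both gene
# families, and two reads that each hit a single gene family.
toy = {
    "rpoB": {"ref_1": "MKV-", "read_001": "MRV-"},
    "rplA": {"ref_1": "AG", "read_002": "AA"},
}

matrix = build_supermatrix(toy)
for seq_id, row in matrix.items():
    print(f"{seq_id:10s} {row}")
```

Because reads from different gene families share no columns with each other, their placement in the combined tree depends on the reference sequences that span all blocks, which is the scaffold role described above.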

Calculate measures of phylogenetic diversity, turnover, etc.

  • Total/mean branch length within/among samples
  • Look at whether samples that are close together in space/environment are more phylogenetically similar
    • i.e. cluster samples based on community phylodiversity similarity
  • Look at whether alpha diversity/phylodiversity changes in space/environment
  • Compare taxonomic (16S OTU) and phylogenetic diversity
  • Compare taxonomic assignments (from AMPHORA) to taxonomic assignments from COGs/etc.
  • Identify nodes on the tree that are over- or underrepresented in different samples (model habitat evolution)
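
As a rough illustration of the "total branch length within/among samples" measures, the sketch below computes Faith's PD for one sample and an unweighted UniFrac distance between two samples. These two metrics are used here as common stand-ins; the notes above do not commit to a specific metric. The tree is assumed to be stored as a hypothetical child -> (parent, branch length) map, PD is assumed to include the path to the root, and only metagenomic reads are listed as sample members, so reference tips contribute nothing (the same effect as pruning them). All names and branch lengths are made up.

```python
def branches_to_root(tip, parent):
    """Return the nodes whose subtending branches lie between tip and root."""
    nodes = set()
    node = tip
    while node in parent:  # the root has no entry in `parent`
        nodes.add(node)
        node = parent[node][0]
    return nodes


def covered_branches(tips, parent):
    """Union of root paths for all tips in a sample."""
    covered = set()
    for tip in tips:
        covered |= branches_to_root(tip, parent)
    return covered


def faith_pd(tips, parent):
    """Total branch length spanned by a sample's tips, root path included."""
    return sum(parent[n][1] for n in covered_branches(tips, parent))


def unweighted_unifrac(tips_a, tips_b, parent):
    """Fraction of spanned branch length found in only one of two samples."""
    a = covered_branches(tips_a, parent)
    b = covered_branches(tips_b, parent)
    total = sum(parent[n][1] for n in a | b)
    if total == 0:
        return 0.0
    unique = sum(parent[n][1] for n in a ^ b)
    return unique / total


# Toy tree (hypothetical): child -> (parent, branch length).
tree = {
    "n1": ("root", 0.5),
    "n2": ("root", 0.25),
    "read_a": ("n1", 1.0),
    "read_b": ("n1", 1.0),
    "ref_1": ("n2", 0.5),   # reference tip, never listed in a sample
    "read_c": ("n2", 0.75),
}
sample_1 = {"read_a", "read_c"}
sample_2 = {"read_b"}

pd_1 = faith_pd(sample_1, tree)
pd_2 = faith_pd(sample_2, tree)
distance = unweighted_unifrac(sample_1, sample_2, tree)
print(pd_1, pd_2, round(distance, 3))
```

A matrix of such pairwise distances could then be fed to any standard clustering or ordination method to ask whether samples that are close in space or environment are also phylogenetically similar.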