Moore Notes 5 17 11

Group Call

  • Open access
  • Protein families in GOS
    • Sam: protein family phylogenetics simulation project
      • read length, reference database, phylogenetic method
      • rpoB, lolC, 16S - rpoB and 16S are very similar
      • UniFrac ROC curves on enterotypes (weighted UniFrac is more sensitive to tree error)
      • could run more proteins, e.g., one or two from each of Steve's clusters
    • Steve: built trees and estimated taxonomic composition for each AMPHORA gene (plus 16S) separately
      • compared patterns of PD across genes
      • different genes give different PD patterns across GOS samples - clustered into 6 groups
      • correlations with environment are different for the 6 groups
      • JE: are all of these still more similar to each other than other genes?
        • KP: scale would change, but negative correlation
      • JE: compare UniFrac dissimilarity to other beta-diversity measures to make sure it isn't an issue with UniFrac per se
      • JG: how does relative abundance play into results? May depend on gene length
      • MEGAN (whole metagenome + pplacer) vs. AMPHORA approach - some taxa are very different in terms of presence/absence and 5-10% different in terms of relative abundance
        • should look at MEGAN with a subset of genes to eliminate this variable (vs. methods)
    • Overlap with AMPHORA2 paper
    • Synergy between Sam's and Steve's approaches
      • Sam will test if she can get more gene families running, plus pplacer in simulations
      • Can we quantify which gene families are better for phylogeny-based analyses?
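
Possible sketch for JE's suggestion above (comparing UniFrac dissimilarity to other beta-diversity measures). This is only an illustration, not anything run for the call. It computes two non-phylogenetic measures (Bray-Curtis and Jaccard) on per-sample abundance vectors and correlates the resulting pairwise distances. The sample names, counts, and function names are all made up. UniFrac distances computed elsewhere could be passed to `pearson` in the same pair order.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0


def jaccard(a, b):
    """Jaccard dissimilarity on presence/absence of each taxon."""
    pa = {i for i, x in enumerate(a) if x > 0}
    pb = {i for i, x in enumerate(b) if x > 0}
    union = pa | pb
    return 1 - len(pa & pb) / len(union) if union else 0.0


def pairwise(samples, metric):
    """Distances for all sample pairs, in a fixed (sorted-name) order."""
    names = sorted(samples)
    return [metric(samples[p], samples[q]) for p, q in combinations(names, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of distances."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical taxon counts per sample (not GOS data).
samples = {
    "s1": [10, 0, 5, 1],
    "s2": [8, 2, 0, 0],
    "s3": [0, 7, 7, 3],
    "s4": [1, 1, 9, 0],
}

bc = pairwise(samples, bray_curtis)
jc = pairwise(samples, jaccard)
print("Bray-Curtis vs Jaccard correlation:", pearson(bc, jc))
```

If the environmental patterns hold up under a non-phylogenetic measure as well, that would suggest they are not specific to UniFrac. A proper comparison would use a Mantel test with permutations rather than a bare correlation, since pairwise distances are not independent.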