Moore Notes 10 30 13

From OpenWetWare

Group Call

  • Participants: Katie, Jonathan, Tom, Patrick, Guillaume, Stephen, Ladan
  • PICRUSt analysis (Tom and Guillaume)
    • English Channel metagenomes and 16S, both annotated to KEGG Orthology Groups
      • PICRUSt run used the Greengenes 16S tree
      • JE: Which genomes were used? This happens under the hood; probably the same set as in the PICRUSt manuscript, which is not appropriate for marine samples
    • Plots:
      • Weak correlation using abundance estimates
      • Many families with zero or low abundance in 16S/PICRUSt but high abundance in shotgun metagenomes
      • Large Hamming distances (>0.7) on presence/absence estimates
    • What is different from Jack Gilbert's analysis?
      • FIGfams vs. KEGG
      • Similar to, but not the same algorithm as, PICRUSt
  • JE will ping Tara Oceans folks again re: data release
  • Stephen: Average genome size estimation project
    • Taxon-specific? Maybe possible with taxon-specific markers
    • JE: Compare to normalizing by the number of reads hitting a single-copy protein-coding gene (e.g., recA)
      • recA is not one of Dongying's markers used in this analysis, because it is too divergent between bacteria and archaea
    • Applications:
      • Protein family abundance estimate normalization
      • Ecological differences in average genome size, e.g., IBD significantly different from healthy in MetaHIT
    • Estimates between different libraries on the same sample don't agree as well as one would hope (different lanes of same prep and different simulation runs do agree)
      • KP: Could this be due to duplicate sequences? Probably not, but can check with FastQC
  • Next call: Nov 13
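
Two of the calculations discussed on the call can be sketched in a few lines of Python. This is a minimal illustration, not the code used by Tom, Guillaume, or Stephen. It assumes that the Hamming distance is the fraction of protein families whose presence/absence call differs between the two profiles, that a family counts as present when its abundance is above zero, and that the single-copy normalization divides length-corrected read counts for a family by those of the marker gene. All function names, the threshold, and the example numbers are hypothetical.

```python
def presence_absence(abundances, families, threshold=0.0):
    """Return True/False per family; 'present' means abundance > threshold (assumed rule)."""
    return [abundances.get(family, 0.0) > threshold for family in families]


def normalized_hamming(profile_a, profile_b, threshold=0.0):
    """Fraction of families whose presence/absence call differs between two profiles.

    Profiles are dicts mapping a family ID (e.g. a KEGG Orthology ID) to abundance.
    Families missing from one profile are treated as absent there.
    """
    families = sorted(set(profile_a) | set(profile_b))
    if not families:
        raise ValueError("both profiles are empty")
    calls_a = presence_absence(profile_a, families, threshold)
    calls_b = presence_absence(profile_b, families, threshold)
    mismatches = sum(a != b for a, b in zip(calls_a, calls_b))
    return mismatches / len(families)


def copies_per_genome(family_reads, family_length, marker_reads, marker_length):
    """Estimate family copies per genome equivalent using a single-copy marker.

    Read counts are divided by gene length to get a coverage-like value, then the
    family value is divided by the marker value (assumed normalization scheme).
    """
    if marker_reads <= 0 or family_length <= 0 or marker_length <= 0:
        raise ValueError("marker reads and gene lengths must be positive")
    family_coverage = family_reads / family_length
    marker_coverage = marker_reads / marker_length
    return family_coverage / marker_coverage


# Made-up example profiles: predicted from 16S vs. observed in shotgun data.
predicted = {"K00001": 5, "K00002": 0, "K00003": 2}
shotgun = {"K00001": 3, "K00002": 7, "K00004": 1}
distance = normalized_hamming(predicted, shotgun)

# Made-up counts: 200 reads on a 1000 bp family, 100 reads on a 1000 bp marker.
copies = copies_per_genome(200, 1000, 100, 1000)
```

With these toy inputs, three of the four families disagree on presence/absence, so the distance is 0.75, and the family is estimated at about two copies per genome equivalent. Averaging over several single-copy markers instead of one would likely give a more stable denominator.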