Moore Notes 10 30 13
From OpenWetWare
Group Call
- Participants: Katie, Jonathan, Tom, Patrick, Guillaume, Stephen, Ladan
- PICRUSTs analysis (Tom and Guillaume)
- English Channel metagenomes and 16S, both annotated to KEGG Orthology Groups
- The PICRUSTs run used the GreenGenes 16S tree
- JE: What reference genomes were used? This happens under the hood; probably the same set as in the PICRUSTs manuscript, and likely not appropriate for marine samples
- Plots: http://edhar.genomecenter.ucdavis.edu/~gjospin/picrust_test/picrust_test/
- Weak correlation using abundance estimates
- Many families with zero or low abundance in 16S/PICRUSTs but high abundance in shotgun metagenomes
- Large Hamming distances (>0.7) on presence/absence estimates
- What is different from Jack Gilbert's analysis?
- Figfams vs. KEGG
- Similar but not same algorithm as PICRUSTs
- English Channel metagenomes and 16S, both annotated to KEGG Orthology Groups
- JE will ping Tara Oceans folks again re: data release
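The two comparisons mentioned in the PICRUSTs item above (correlation of abundance estimates, and Hamming distance on presence/absence) can be sketched roughly as follows. This is a minimal illustration, not the code behind the linked plots: it assumes each profile is a mapping from KEGG Orthology ID to abundance, that families missing from one profile count as zero, and that the reported Hamming distance is normalized to the fraction of mismatching families (which the ">0.7" figure suggests). The function names and the toy values are hypothetical.

```python
from math import sqrt


def align(profile_a, profile_b):
    """Return two abundance lists over the union of family IDs (missing = 0)."""
    families = sorted(set(profile_a) | set(profile_b))
    return (
        [profile_a.get(f, 0.0) for f in families],
        [profile_b.get(f, 0.0) for f in families],
    )


def pearson(x, y):
    """Pearson correlation between two equal-length abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def hamming_presence_absence(x, y):
    """Fraction of families present in one profile but absent in the other."""
    mismatches = sum((a > 0) != (b > 0) for a, b in zip(x, y))
    return mismatches / len(x)


# Toy profiles (made-up numbers, for illustration only).
predicted = {"K00001": 10, "K00002": 0, "K00003": 5, "K00004": 0}
shotgun = {"K00001": 8, "K00002": 12, "K00003": 0, "K00004": 0, "K00005": 3}

pred_vec, shot_vec = align(predicted, shotgun)
print("Pearson r:", pearson(pred_vec, shot_vec))
print("Hamming distance:", hamming_presence_absence(pred_vec, shot_vec))
```

Families that are zero in the 16S-based prediction but abundant in the shotgun data (K00002 and K00005 in the toy example) pull down the correlation and raise the Hamming distance at the same time, which matches the pattern described in the notes.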
- Stephen: Average genome size estimation project
- Taxon-specific? Maybe possible with taxon-specific markers
- JE: Compare to normalizing by the number of reads hitting a single-copy protein-coding gene (e.g., recA)
- recA is not one of Dongying's markers used in this analysis, because it is too divergent between bacteria and archaea
- Applications:
- Protein family abundance estimate normalization
- Ecological differences in average genome size, e.g., IBD samples differ significantly from healthy ones in MetaHIT
- Estimates between different libraries on the same sample don't agree as well as one would hope (different lanes of same prep and different simulation runs do agree)
- KP: Could this be due to duplicate sequences? Probably not, but can check with FastQC
- Next call: Nov 13
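The single-copy-gene normalization JE suggested in the genome size discussion above can be sketched as below. This is not Stephen's estimator, whose details are not recorded in these notes; it is a simple baseline under stated assumptions: the marker is present exactly once per genome, reads are of uniform length, and read counts are length-normalized before comparison. All function names, parameters, and numbers are hypothetical.

```python
def marker_coverage(marker_reads, read_length, marker_length):
    """Mean depth over a single-copy marker, i.e. genome equivalents sampled."""
    return marker_reads * read_length / marker_length


def average_genome_size(total_reads, read_length, marker_reads, marker_length):
    """Total sequenced bases divided by the number of genome equivalents."""
    genome_equivalents = marker_coverage(marker_reads, read_length, marker_length)
    if genome_equivalents <= 0:
        raise ValueError("no reads hit the marker; estimate is undefined")
    return total_reads * read_length / genome_equivalents


def copies_per_genome(family_reads, family_length, marker_reads, marker_length):
    """Length-normalized family abundance relative to the single-copy marker."""
    if marker_reads <= 0:
        raise ValueError("no reads hit the marker; normalization is undefined")
    return (family_reads / family_length) / (marker_reads / marker_length)


# Toy numbers (made up): 1M reads of 100 bp, 250 reads on a 1000 bp marker.
ags = average_genome_size(
    total_reads=1_000_000, read_length=100, marker_reads=250, marker_length=1000
)
print("Estimated average genome size (bp):", ags)

# A 1000 bp family with 500 reads, normalized against the same marker.
print("Family copies per genome:", copies_per_genome(500, 1000, 250, 1000))
```

With a single marker the estimate inherits all of that marker's sampling noise, so averaging over several single-copy markers would be one way to stabilize it before comparing against the project's estimates.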