Moore Notes 4 7 10

From OpenWetWare

Group Call

  • Steve will check with PLoS about our paper
  • Update from Steve
    • GOS analysis
      • Guillaume and Dongying produced new marker gene families for GOS
        • Now have good archaeal markers (old ones were mostly hitting bacteria)
      • Steve will run PD analyses on GOS with these markers
      • Can now search, align, mask other data sets
    • Does phylogenetic distance in the tree predict how similar a pair of genomes is in 16S copy number?
      • KP: take a look at Matt Hahn's CAFE program
      • Can adjust relative abundance estimates to account for copy numbers
      • JE: would be interesting to do a phylum-level analysis of normalized vs. non-normalized vs. protein marker genes (single copy)
        • Did this on HOT/ALOHA and human distal gut, and saw a big difference in HOT/ALOHA (Prochlorococcus goes from rare to abundant)
        • Will do GOS, shotgun (better) and PCR
      • JE: relative abundance could also be adjusted for genome size (since genome size may affect the probability that a read falls in the 16S gene)
      • JL: Is Jenna still thinking about estimating abundance? JE: paper is in press, could use her data to test the corrections
        • KP: Good idea to test the method on this or some other gold standard data set or simulation
  • Update from Guillaume
    • Generating new protein family HMMs
      • HMM search of Dongying's 100 families (universal or big, but not markers per se) vs. IMG
      • BLAST all vs. all on all sequences that didn't hit a family
      • MCL on these BLAST hits to cluster them into families
      • Just finished the 20-day process of generating these results
        • 350,000 families (6,500 with >75 members)
        • MCL may need to be fine-tuned
      • Next: will build HMMs for these new protein families that we can use in various analyses
        • Will search back against data sets to see if the new HMMs look robust
      • Morgan will look at clusters that don't hit any previously described family (or in PFAM but no known function)
      • Tom will subdivide existing families into subfamilies, looking for novel subfamilies in metagenomic data
        • This subfamily approach could be applied to Morgan's families also
      • Goal: data freeze in the next month or so
  • Dongying: What deep-coverage data sets are out there besides GOS?
    • JE: Beijing's human gut data is available in the Short Read Archive at NCBI (maybe BioTorrents - coming soon), and much of it is on their website
  • Steve: Beijing gut paper assembled Illumina reads into contigs - what do we think about that?
    • TS: used Sanger scaffolds to do some testing of the process
    • JE: checked some against known assemblies
    • AD: the Beijing assembler has not been documented
      • Variable coverage breaks most assemblers
    • JE: generally not a good idea to use assembled reads for metagenomic analyses (because they generate hybrids)
      • OK to leverage contigs/alignments in population level analyses
      • Tile reads vs. a reference assembly if you want to compare them to each other
      • Best not to use the consensus for downstream analyses
    • JE: first Illumina-based study that looks good - Illumina might be more promising for metagenomics than originally thought
      • Uses much less DNA (~1 microgram) than 454 (20-40 micrograms, though this may be coming down)
      • Need to check diversity measures vs. other data sets
      • This genome-guided approach gets better as more human microbiome genomes are sequenced
    • Data here http://gutmeta.genomics.org.cn/
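
The copy-number adjustment raised in Steve's update (adjusting relative abundance estimates to account for 16S copy numbers) might be sketched roughly as below. This is only an illustration, not the group's actual method: the function name `normalize_by_copy_number`, the taxon labels, the read counts, and the copy numbers are all hypothetical.

```python
def normalize_by_copy_number(read_counts, copy_numbers):
    """Divide each taxon's 16S read count by its 16S copy number,
    then rescale so the adjusted values sum to 1."""
    adjusted = {}
    for taxon, count in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        adjusted[taxon] = count / copies
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical example: taxon_A has 6 copies of 16S, taxon_B has 1.
read_counts = {"taxon_A": 600, "taxon_B": 400}
copy_numbers = {"taxon_A": 6, "taxon_B": 1}
abundances = normalize_by_copy_number(read_counts, copy_numbers)
```

In this made-up case the single-copy taxon accounts for 40% of the 16S reads but 80% of the adjusted abundance. The genome-size adjustment JE suggested could presumably reuse the same division with a per-taxon genome size in place of the copy number, though that is an assumption rather than something worked out on the call.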