Moore Notes 4 7 10

From OpenWetWare


Group Call

Steve will check with PLoS about our paper

Update from Steve
- GOS analysis
  - Guillaume and Dongying produced new marker gene families for GOS
    - Now have good archaeal markers (old ones were mostly hitting bacteria)
  - Steve will run PD analyses on GOS with these markers
  - Can now search, align, mask other data sets
- Does phylogenetic distance in the tree predict how similar a pair of genomes is in 16S copy number?
  - KP: take a look at Matt Hahn's CAFE program
  - Can adjust relative abundance estimates to account for copy numbers
  - JE: would be interesting to do a phylum-level analysis of normalized vs. non-normalized 16S vs. protein marker genes (single copy)
    - Did this on HOT/ALOHA and human distal gut, and saw a big difference in HOT/ALOHA (Prochlorococcus goes from rare to abundant)
    - Will do GOS, shotgun (better) and PCR
  - JE: relative abundance could also be adjusted for genome size (since genome size may affect probability of getting a read in 16S)
    - AD: could estimate genome size by proportion of reads hitting genome assemblies (compare single vs. multiple copy genes)
    - SK: recent ISME paper about correcting for genome size (on citeulike): http://www.citeulike.org/group/6072/article/6912281
  - JL: Is Jenna still thinking about estimating abundance? JE: paper is in press, could use her data to test the corrections
    - KP: Good idea to test the method on this or some other gold standard data set or simulation
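
A minimal sketch of the copy-number adjustment discussed above, assuming per-taxon 16S read counts and known 16S copy numbers are already in hand. The function name, taxon labels, and numbers are hypothetical, chosen only for illustration; the genome-size correction raised by JE is not modeled here.

```python
def adjust_for_copy_number(read_counts, copy_numbers):
    """Convert per-taxon 16S read counts to copy-number-adjusted relative abundances.

    read_counts  : dict mapping taxon -> number of 16S reads (hypothetical input)
    copy_numbers : dict mapping taxon -> 16S gene copies per genome (assumed known)
    """
    # Dividing by copy number turns "16S genes observed" into "genomes observed".
    adjusted = {}
    for taxon, count in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        adjusted[taxon] = count / copies

    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")

    # Rescale so the adjusted values sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


if __name__ == "__main__":
    # Made-up example: equal read counts, but taxon_B carries four 16S copies.
    counts = {"taxon_A": 100, "taxon_B": 100}
    copies = {"taxon_A": 1, "taxon_B": 4}
    print(adjust_for_copy_number(counts, copies))
```

With equal raw read counts, the single-copy taxon ends up with the larger adjusted share, which is the direction of the Prochlorococcus shift noted for HOT/ALOHA. As KP suggests, a correction like this should be checked against a gold standard data set or simulation before it is trusted.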

Update from Guillaume
- Generating new protein family HMMs
  - HMM search of Dongying's 100 families (universal or big, but not markers per se) vs. IMG
  - BLAST all vs. all on all sequences that didn't hit a family
  - MCL on these BLAST hits to cluster them into families
  - Just finished 20 day process of generating these results
    - 350,000 families (6,500 with >75 members)
    - MCL may need to be fine-tuned
  - Next: will build HMMs for these new protein families that we can use in various analyses
    - Will search back against data sets to see if the new HMMs look robust
  - Morgan will look at clusters that don't hit any previously described family (or in PFAM but no known function)
  - Tom will subdivide existing families into subfamilies, looking for novel subfamilies in metagenomic data
    - This subfamily approach could be applied to Morgan's families also
  - Goal: data freeze in the next month or so
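
The family counts above (total families, and families with more than 75 members) can be tallied from the clustering output with a short script. This is a sketch only, assuming MCL was run in label mode so that each output line is one cluster with tab-separated sequence IDs; the function name and the example IDs are hypothetical.

```python
def summarize_families(cluster_lines, min_members=75):
    """Count clusters, and clusters with more than `min_members` members.

    cluster_lines : iterable of strings, one cluster per line, with
                    whitespace/tab-separated sequence IDs (assumed MCL label output)
    Returns (total_families, large_families).
    """
    sizes = []
    for line in cluster_lines:
        members = line.split()
        if members:  # skip blank lines
            sizes.append(len(members))

    large = sum(1 for size in sizes if size > min_members)
    return len(sizes), large


if __name__ == "__main__":
    # Tiny made-up example; a real run would iterate over the MCL output file.
    example = ["seq1\tseq2\tseq3\n", "seq4\n", "seq5\tseq6\n"]
    total, large = summarize_families(example, min_members=2)
    print(f"{total} families, {large} with more than 2 members")
```

Re-running the same tally after changing MCL settings gives a quick view of how the family size distribution shifts while the clustering is being fine-tuned.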

Dongying: What deep-coverage data sets are out there besides GOS?
- JE: Beijing's human gut data is available in the Short Read Archive at NCBI (maybe BioTorrents - coming soon), much on their website

Steve: Beijing gut paper assembled Illumina reads into contigs - what do we think about that?
- TS: used Sanger scaffolds to do some testing of the process
- JE: checked some against known assemblies
- AD: the Beijing assembler has not been documented
  - Variable coverage breaks most assemblers
- JE: generally not a good idea to use assembled reads for metagenomic analyses (because assembly can generate hybrid contigs)
  - OK to leverage contigs/alignments in population level analyses
  - Tile reads vs. a reference assembly if you want to compare them to each other
  - Best not to use the consensus for downstream analyses
- JE: first Illumina based study that looks good - Illumina might be more promising for metagenomics than originally thought
  - Uses much less DNA (~1 microgram) than 454 (20-40 micrograms, though this may be coming down)
  - Need to check diversity measures vs. other data sets
  - This genome-guided approach gets better as more human microbiome genomes are sequenced
- Data here http://gutmeta.genomics.org.cn/
