Moore Notes 4 7 10
From OpenWetWare
				
				
				
				
Group Call
- Steve will check with PLoS about our paper
 
- Update from Steve
- GOS analysis
- Guillaume and Dongying produced new marker gene families for GOS
- Now have good archaeal markers (old ones were mostly hitting bacteria)
 
 - Steve will run PD analyses on GOS with these markers
 - Can now search, align, mask other data sets
 
 - Does phylogenetic distance in the tree predict how similar a pair of genomes is in 16S copy number?
- KP: take a look at Matt Hahn's CAFE program
 - Can adjust relative abundance estimates to account for copy numbers
 - JE: would be interesting to do a phylum-level analysis of normalized vs. non-normalized 16S vs. protein marker genes (single copy)
- Did this on HOT/ALOHA and human distal gut, and saw a big difference in HOT/ALOHA (Prochlorococcus goes from rare to abundant)
 - Will do GOS, shotgun (better) and PCR
 
 - JE: relative abundance could also be adjusted for genome size (since genome size may affect probability of getting a read in 16S)
- AD: could estimate genome size by proportion of reads hitting genome assemblies (compare single vs. multiple copy genes)
 - SK: recent ISME paper about correcting for genome size (on citeulike): http://www.citeulike.org/group/6072/article/6912281
 
 - JL: Is Jenna still thinking about estimating abundance? JE: paper is in press, could use her data to test the corrections
- KP: Good idea to test the method on this or some other gold standard data set or simulation
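
The copy-number adjustment discussed above can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: the function names, taxon labels, read counts, and copy numbers are all hypothetical, and it assumes a 16S copy number is known for every taxon.

```python
def relative_abundance(read_counts):
    """Uncorrected relative abundance: each taxon's share of all 16S reads."""
    total = sum(read_counts.values())
    return {taxon: count / total for taxon, count in read_counts.items()}


def normalize_by_copy_number(read_counts, copy_numbers):
    """Relative abundance after dividing reads by 16S copies per genome."""
    # Reads / copies-per-genome approximates the number of genomes sampled.
    genomes = {
        taxon: read_counts[taxon] / copy_numbers[taxon]
        for taxon in read_counts
    }
    total = sum(genomes.values())
    return {taxon: value / total for taxon, value in genomes.items()}


# Hypothetical numbers, for illustration only.
read_counts = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
copy_numbers = {"taxon_A": 6, "taxon_B": 3, "taxon_C": 1}

raw = relative_abundance(read_counts)
corrected = normalize_by_copy_number(read_counts, copy_numbers)

for taxon in read_counts:
    print(f"{taxon}: raw={raw[taxon]:.3f} corrected={corrected[taxon]:.3f}")
```

In this toy case the single-copy taxon looks rare before correction and equally abundant afterwards, which is the kind of shift noted for Prochlorococcus in HOT/ALOHA. A genome-size correction could be layered on in the same way, given per-taxon genome size estimates.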
 
 
 
 
- Update from Guillaume
- Generating new protein family HMMs
- HMM search of Dongying's 100 families (universal or big, but not markers per se) vs. IMG
 - BLAST all vs. all on all sequences that didn't hit a family
 - MCL on these BLAST hits to cluster them into families
 - Just finished 20 day process of generating these results
- 350,000 families (6,500 with >75 members)
 - MCL may need to be fine-tuned
 
 - Next: will build HMMs for these new protein families that we can use in various analyses
- Will search back against data sets to see if the new HMMs look robust
 
 - Morgan will look at clusters that don't hit any previously described family (or in PFAM but no known function)
 - Tom will subdivide existing families into subfamilies, looking for novel subfamilies in metagenomic data
- This subfamily approach could be applied to Morgan's families also
 
 - Goal: data freeze in the next month or so
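
The step of grouping all-vs-all BLAST hits into families and then counting the large ones can be sketched as below. Note that this uses simple single-linkage clustering (union-find over hits above a score cutoff) as a stand-in, not MCL itself; the function name, the `min_score` cutoff, the sequence IDs, and the scores are hypothetical.

```python
from collections import defaultdict


def cluster_hits(hits, min_score=50.0):
    """Group sequences into families by single linkage over BLAST hits.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    A simplified stand-in for MCL: any hit at or above min_score joins
    the two sequences' families.
    """
    parent = {}

    def find(x):
        # Register unseen sequences, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, score in hits:
        root_q, root_s = find(query), find(subject)
        if score >= min_score and root_q != root_s:
            parent[root_q] = root_s

    families = defaultdict(set)
    for seq in list(parent):
        families[find(seq)].add(seq)
    # Largest families first.
    return sorted(families.values(), key=len, reverse=True)


# Hypothetical hits, for illustration only.
hits = [
    ("s1", "s2", 120.0),
    ("s2", "s3", 95.0),
    ("s4", "s5", 80.0),
    ("s3", "s4", 20.0),  # weak hit, below the cutoff
]

families = cluster_hits(hits)
large = [family for family in families if len(family) >= 3]
print(len(families), "families;", len(large), "with at least 3 members")
```

Single linkage tends to chain unrelated families together through weak or multi-domain hits, which is one reason a flow-based method such as MCL, with a tunable inflation parameter, is preferred for real data.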
 
 
 
- Dongying: What deep-coverage data sets are out there besides GOS?
- JE: Beijing's human gut data is available in the Short Read Archive at NCBI (maybe BioTorrents - coming soon), much of it on their website
 
 
- Steve: Beijing gut paper assembled Illumina reads into contigs - what do we think about that?
- TS: used Sanger scaffolds to do some testing of the process
 - JE: checked some against known assemblies
 - AD: the Beijing assembler has not been documented
- Variable coverage breaks most assemblers
 
 - JE: generally not a good idea to use assembled reads for metagenomic analyses (because they generate hybrids)
- OK to leverage contigs/alignments in population level analyses
 - Tile reads vs. a reference assembly if you want to compare them to each other
 - Best not to use the consensus for downstream analyses
 
 - JE: first Illumina based study that looks good - Illumina might be more promising for metagenomics than originally thought
- Uses much less DNA (~1 microgram) than 454 (20-40 micrograms, though this may be coming down)
 - Need to check diversity measures vs. other data sets
 - This genome-guided approach gets better as more human microbiome genomes are sequenced
 
 - Data here http://gutmeta.genomics.org.cn/