Moore Notes 1 22 14
From OpenWetWare
				
				
				
				
Participants
- Tom, Jonathan, Stephen, Guillaume, Josh, Dongying, Sarah, Patrick
 
Stephen's update on MicrobeCensus
- Estimating average genome size from metagenomic data and application to human microbiome
 - Avg. Genome Size (AGS) = expected size of genome in microbial community, an abundance weighted average of size across all taxa
- says nothing about the distribution of genome sizes, and doesn't account for extrachromosomal DNA (plasmids)
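A minimal sketch of the AGS definition above, assuming abundances are per-cell fractions of each taxon. The taxon names and genome sizes are made-up illustration values, not data from the talk.

```python
def average_genome_size(abundance, genome_size_bp):
    """Abundance-weighted mean genome size (bp) across all taxa."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Dividing by the total lets callers pass unnormalized abundances.
    return sum(abundance[t] * genome_size_bp[t] for t in abundance) / total

# Hypothetical three-taxon community (illustration only).
abundance = {"taxon_a": 0.5, "taxon_b": 0.25, "taxon_c": 0.25}
genome_size_bp = {"taxon_a": 2_000_000, "taxon_b": 4_000_000, "taxon_c": 6_000_000}

ags = average_genome_size(abundance, genome_size_bp)
print(f"AGS = {ags:.0f} bp")
```

As the notes say, a single number like this hides the spread: very different mixtures of small and large genomes can give the same AGS.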
 
 - Important for comparative analyses between two communities. Genome size could bias metrics.
- probability of sampling a given read with a given copy number will be affected by genome size
 
 - Understanding the relationship between genome size and the environment may reveal how the environment shapes microbial evolution
 - Estimating average genome size using a set of single copy, universally distributed gene families in Bacteria and Archaea (Dongying)
- If you align metagenomic reads to these families, the rate of alignment will be inversely proportional to genome size
- JAE: these genes tend to be clustered in the genome, so they may not be independent samples, may want to account for this. Regional biases and sequencing biases could be problematic
 
 
 - The Method:
- Using RAPsearch to align metagenomic reads to these genes.
 - Apply classification parameters to determine whether a read is a homolog of a family
 - Calculate the rate at which reads are assigned to these families
 - Take a weighted estimate across these 30 genes.
- Weights: calculated via simulation, proportional to the accuracy of the family in predicting genome size
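The steps above can be sketched roughly as follows. This is a simplified illustration, not the MicrobeCensus implementation: it assumes the fraction of reads hitting a single-copy family is about (gene length / AGS), ignoring read-length effects and classification sensitivity. The function name, family names, gene lengths, hit counts, and weights are all hypothetical.

```python
def ags_from_marker_hits(total_reads, hits, gene_length_bp, weights):
    """Weighted AGS estimate from hit counts to single-copy gene families.

    Assumption: hit rate ~= gene_length / AGS, so each family gives
    AGS ~= gene_length * total_reads / hits.
    """
    estimates = {}
    for family, n_hits in hits.items():
        if n_hits == 0:
            continue  # a family with no hits carries no information
        estimates[family] = gene_length_bp[family] * total_reads / n_hits
    if not estimates:
        raise ValueError("no marker family received any hits")
    weight_total = sum(weights[f] for f in estimates)
    return sum(weights[f] * estimates[f] for f in estimates) / weight_total

# Hypothetical inputs: two families instead of the 30 used in the method.
total_reads = 1_000_000
hits = {"fam_a": 250, "fam_b": 500}
gene_length_bp = {"fam_a": 1000, "fam_b": 1500}
weights = {"fam_a": 1.0, "fam_b": 3.0}  # stand-ins for simulation-derived weights

estimate = ags_from_marker_hits(total_reads, hits, gene_length_bp, weights)
print(f"Estimated AGS = {estimate:.0f} bp")
```

Fewer hits per read means a larger estimate, which is the inverse relationship between alignment rate and genome size noted earlier.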
 
 - JL: This seems similar to mark and recapture analysis in ecology. Might check that out.
 
 - Classification parameters are specific to read length
 - MicrobeCensus performs well compared to GAAS, which uses a database of reference genomes to predict size
- MC is robust to situations where organisms in your community are not present in the reference dataset
 - MC is also substantially faster b/c RAPsearch, only 30 genome markers, don't need all reads (accurate estimates at 1-5 M reads)
 
 - Also see good performance on real metagenomes (isolate genome sequencing projects)
 - Consistent measurements of AGS across replicates
 - AGS varies across HMP microbiome body sites
- JAE: if you have a lot of reads from, say, viruses, would that make it look like genome size is large?
- SN: Yes, but I've looked into this and it doesn't seem to be a major factor
 
 - AGS is larger in gut compared to mouth, for example
 
 - Built a linear model to test potential sources of change in AGS.
- Used MetaPhLAn to calculate lineage specific abundance
 - Average genome size by different taxa
 - Suggests that differences in taxon abundance between communities drive the observed differences in AGS between sites
 
 - Also found that human gut AGS varies between clinical studies
- Seems that Bacteroides may drive most of the differences
 - May be some evidence that there are differences in AGS with clinical parameters
- Looking to see if host phenotype can explain any of the residuals after accounting for taxonomic variation
 
 
 - Looked at a database of reference genomes of human microbiome body sites
- Do we see big differences in genome size between closely related taxa between different sites relative to two closely related taxa from the same site?
- Pairs of taxa within the gut span a wide range of genome-size differences relative to the differences between gut and other-site taxa
 
 
 - Even within a body site, there may be pressure to functionally specialize
- Concern from group that lumping communities into high-scale types might fail to resolve microniche variation (e.g., difference between foot and forehead could be huge). Recommend looking at specific subsites.
 
 - Does AGS affect our analysis of functional differences between communities?
- Looked at functional markers between gut and mouth (KOs, modules, pathways) with and without genome size normalization.
- prenormalization has enrichment in mouth, postnormalization has enrichment in gut
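A rough sketch of how genome size normalization can flip the direction of an enrichment, assuming normalization means converting hit counts to gene copies per genome equivalent (the notes do not spell out the exact procedure). All numbers are invented to illustrate the effect and are not results from the talk.

```python
def copies_per_genome(hits, total_reads, read_length_bp, gene_length_bp, ags_bp):
    """Gene copies per genome equivalent (one assumed form of AGS normalization)."""
    # Number of average-sized genomes represented by the sequenced bases.
    genome_equivalents = total_reads * read_length_bp / ags_bp
    # Approximate depth of coverage of the gene of interest.
    gene_coverage = hits * read_length_bp / gene_length_bp
    return gene_coverage / genome_equivalents

# Hypothetical samples with equal depth; gut AGS is set larger than mouth AGS.
mouth = copies_per_genome(hits=600, total_reads=1_000_000, read_length_bp=100,
                          gene_length_bp=1000, ags_bp=2_500_000)
gut = copies_per_genome(hits=500, total_reads=1_000_000, read_length_bp=100,
                        gene_length_bp=1000, ags_bp=4_000_000)

print(f"raw hits: mouth=600, gut=500")
print(f"copies per genome: mouth={mouth}, gut={gut}")
```

With these made-up values the raw counts favor the mouth sample, while the per-genome values favor the gut sample, because each gut read is drawn from a larger average genome.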
 
 - How does this affect biomarker detection?
- no discovery at FDR corrected p-values, but possibly an enrichment
 
 