Moore Notes 1 22 14
From OpenWetWare
Participants
- Tom, Jonathan, Stephen, Guillaume, Josh, Dongying, Sarah, Patrick
Stephen's update on MicrobeCensus
- Estimating average genome size from metagenomic data and application to human microbiome
- Average genome size (AGS) = expected size of a genome in a microbial community, i.e., an abundance-weighted average of genome size across all taxa
- says nothing about the distribution of genome sizes, and doesn't account for extrachromosomal DNA (plasmids)
- Important for comparative analyses between two communities. Genome size could bias metrics.
- the probability of sampling a read from a gene with a given copy number is affected by genome size
- Understanding the relationship between genome size and the environment may reveal how the environment shapes microbial evolution
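As a minimal sketch of the AGS definition above (not the MicrobeCensus code itself), the abundance-weighted average can be written as follows. The function name, the two-taxon community, and its genome sizes are hypothetical values chosen only for illustration.

```python
def average_genome_size(abundances, genome_sizes):
    """Abundance-weighted mean genome size (bp) across taxa.

    abundances: relative (or absolute) abundance of each taxon
    genome_sizes: genome size in bp of each taxon, same order
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    weighted_sum = sum(a * s for a, s in zip(abundances, genome_sizes))
    return weighted_sum / total


# Hypothetical community: a 2 Mb taxon at 75% and a 6 Mb taxon at 25%.
ags = average_genome_size([0.75, 0.25], [2_000_000, 6_000_000])
print(ags)
```

Note that the result is a single number, so two communities with very different genome size distributions can share the same AGS.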
- Estimating average genome size using a set of single copy, universally distributed gene families in Bacteria and Archaea (Dongying)
- If you align metagenomic reads to these families, the rate of alignment will be inversely proportional to genome size
- JAE: these genes tend to be clustered in the genome, so they may not be independent samples; may want to account for this. Regional biases and sequencing biases could be problematic
- The Method:
- Using RAPsearch to align metagenomic reads to these genes.
- Apply classification parameters to determine if a read is a homolog of the family
- Calculate the rate at which reads are assigned to these families
- Take a weighted estimate across these 30 genes.
- Weights: calculated via simulation, proportional to the accuracy of the family in predicting genome size
- JL: This seems similar to mark and recapture analysis in ecology. Might check that out.
- Classification parameters are specific to read length
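The steps above can be sketched roughly as follows. This is a simplification under stated assumptions, not the actual MicrobeCensus implementation: it assumes reads are sampled uniformly along genomes, so the fraction of reads hitting a single-copy family of length L is about L / AGS, and it ignores read-length effects and classification error. The function name, hit counts, gene lengths, and weights are all hypothetical.

```python
def estimate_ags(hits, gene_lengths, weights, total_reads):
    """Weighted AGS estimate (bp) from per-family hit counts.

    Assumption: for a single-copy gene of length L, the expected hit
    rate is hits / total_reads ~= L / AGS, so AGS ~= L * total_reads / hits.
    Families with zero hits are skipped.
    """
    estimates, used_weights = [], []
    for n, length, w in zip(hits, gene_lengths, weights):
        if n == 0:
            continue  # no information from this family
        estimates.append(length * total_reads / n)
        used_weights.append(w)
    if not estimates:
        raise ValueError("no marker family received any hits")
    weighted_sum = sum(w * e for w, e in zip(used_weights, estimates))
    return weighted_sum / sum(used_weights)


# Hypothetical input: 1 M reads, two marker families, made-up weights.
ags_hat = estimate_ags(
    hits=[250, 375],
    gene_lengths=[1000, 1500],
    weights=[1.0, 3.0],
    total_reads=1_000_000,
)
print(ags_hat)
```

The inverse relationship is visible directly: doubling the hit count for every family halves the estimate.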
- MicrobeCensus performs well compared to GAAS, which uses a database of reference genomes to predict size
- MC is robust to situations where organisms in your community are not present in the reference dataset
- MC is also substantially faster because it uses RAPsearch, needs only 30 marker genes, and doesn't need all reads (accurate estimates at 1-5 M reads)
- Also see good performance on real metagenomes (isolate genome sequencing projects)
- Consistent measurements of AGS in replicate
- AGS varies across HMP microbiome body sites
- JAE: if you have a lot of reads from, say, viruses, would that make it look like genome size is large?
- SN: Yes, but I've looked into this and it doesn't seem to be a major factor
- AGS is larger in gut compared to mouth, for example
- Built a linear model to test potential sources of change in AGS.
- Used MetaPhLAn to calculate lineage specific abundance
- Average genome size by different taxa
- Suggests that differences in taxon abundance between communities drive the observed differences in AGS between sites
- Also found that human gut AGS varies between clinical studies
- Seems that Bacteroides may drive most of the differences
- May be some evidence that there are differences in AGS with clinical parameters
- Looking to see if host phenotype can explain any of the residuals after accounting for taxonomic variation
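A toy illustration of the residual idea above: fit AGS against the abundance of one lineage, then ask whether what is left over tracks anything else. This is only a sketch with a single predictor and ordinary least squares; the notes do not say which model form was actually used, and the sample values below are invented, not data from the study.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented toy samples: relative abundance of one lineage vs. AGS in Mb.
lineage_abundance = [0.0, 0.25, 0.5, 0.75]
ags_mb = [3.1, 3.4, 4.1, 4.4]

slope, intercept = fit_line(lineage_abundance, ags_mb)
resid = residuals(lineage_abundance, ags_mb, slope, intercept)
print(slope, intercept, resid)
```

The residuals are what a host phenotype would have to explain once the taxonomic signal has been removed.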
- Looked at a database of reference genomes of human microbiome body sites
- Do we see big differences in genome size between closely related taxa between different sites relative to two closely related taxa from the same site?
- Pairs of taxa within the gut have a wide range of genome size differences relative to the differences between gut and other sites
- Even within a body site, there may be pressure to functionally specialize
- Concern from group that lumping communities into coarse, high-level types might fail to resolve microniche variation (e.g., difference between foot and forehead could be huge). Recommend looking at specific subsites.
- Does AGS affect our analysis of functional differences between communities?
- Looked at functional markers between gut and mouth (KOs, modules, pathways) with and without genome size normalization.
- prenormalization has enrichment in mouth, postnormalization has enrichment in gut
- How does this affect biomarker detection?
- no discovery at FDR corrected p-values, but possibly an enrichment
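To show why genome size normalization can change the direction of an apparent enrichment, here is a small sketch. It assumes one plausible form of the normalization (gene coverage divided by average genome coverage, giving copies per genome); the notes do not specify the exact formula used, and all numbers and names below are hypothetical.

```python
def raw_fraction(hits, total_reads):
    """Fraction of reads assigned to a gene family (no normalization)."""
    return hits / total_reads


def copies_per_genome(hits, gene_length, total_reads, ags):
    """Gene coverage divided by average genome coverage.

    Coverage of the gene is proportional to hits / gene_length and
    coverage of an average genome to total_reads / ags, so the ratio
    is hits * ags / (gene_length * total_reads).
    """
    return hits * ags / (gene_length * total_reads)


# Hypothetical 1 kb gene present once per genome in both communities.
total_reads = 1_000_000
gene_length = 1000
gut = {"hits": 250, "ags": 4_000_000}    # larger genomes, fewer hits
mouth = {"hits": 500, "ags": 2_000_000}  # smaller genomes, more hits

gut_raw = raw_fraction(gut["hits"], total_reads)
mouth_raw = raw_fraction(mouth["hits"], total_reads)
gut_norm = copies_per_genome(gut["hits"], gene_length, total_reads, gut["ags"])
mouth_norm = copies_per_genome(mouth["hits"], gene_length, total_reads, mouth["ags"])
print(gut_raw, mouth_raw, gut_norm, mouth_norm)
```

In this toy case the gene looks twice as abundant in the small-genome community before normalization, even though both communities carry one copy per genome.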