Moore Notes 1 22 14

From OpenWetWare


  • Tom, Jonathan, Stephen, Guillaume, Josh, Dongying, Sarah, Patrick

Stephen's update on MicrobeCensus

  • Estimating average genome size from metagenomic data and application to human microbiome
  • Avg. Genome Size (AGS) = expected size of genome in microbial community, an abundance weighted average of size across all taxa
    • says nothing about the distribution of genome sizes, and doesn't account for extrachromosomal DNA (plasmids)
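A minimal sketch of the AGS definition above, as an abundance-weighted mean. The taxon names, abundances, and genome sizes are made-up illustrative values, not data from the talk, and `average_genome_size` is a hypothetical helper name. Abundances are assumed to be relative cell (genome) counts rather than read counts.

```python
def average_genome_size(abundances, genome_sizes):
    """Abundance-weighted mean genome size (bp) across taxa.

    abundances:   taxon -> relative abundance (assumed: fraction of cells)
    genome_sizes: taxon -> genome size in bp
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    weighted = sum(abundances[t] * genome_sizes[t] for t in abundances)
    return weighted / total


# Hypothetical three-taxon community (illustrative numbers only).
abundances = {"taxon_a": 0.5, "taxon_b": 0.3, "taxon_c": 0.2}
genome_sizes = {"taxon_a": 2_000_000, "taxon_b": 4_000_000, "taxon_c": 6_000_000}

ags = average_genome_size(abundances, genome_sizes)
print(f"AGS = {ags / 1e6:.2f} Mbp")
```

Note that two communities with very different size distributions can share the same AGS, which is the limitation flagged in the bullet above.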
  • Important for comparative analyses between two communities. Genome size could bias metrics.
    • the probability of sampling a read from a gene with a given copy number is affected by genome size
  • Understanding the relationship between genome size and the environment may reveal how the environment shapes microbial evolution
  • Estimating average genome size using a set of single copy, universally distributed gene families in Bacteria and Archaea (Dongying)
    • If you align metagenomic reads to these families, the rate of alignment will be inversely proportional to genome size
      • JAE: these genes tend to be clustered in the genome, so they may not be independent samples; may want to account for this. Regional and sequencing biases could be problematic.
  • The Method:
    • Using RAPsearch to align metagenomic reads to these genes.
    • Apply classification parameters to determine whether a read is a homolog of a family
    • Calculate the rate at which reads are assigned to these families
    • Take a weighted estimate across these 30 genes.
      • Weights: calculated via simulation, proportional to the accuracy of the family in predicting genome size
    • JL: This seems similar to mark and recapture analysis in ecology. Might check that out.
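The method steps above can be sketched roughly as follows. This is not MicrobeCensus code: it assumes the simplest form of the inverse relationship, where each family's estimate is a per-family constant divided by its hit rate. The names `estimate_ags`, `coeffs`, and `weights`, and all numbers, are hypothetical placeholders; in the talk the weights came from simulation.

```python
def estimate_ags(hits, total_reads, coeffs, weights):
    """Weighted AGS estimate from per-family read assignment rates.

    hits:        family -> number of reads classified into that family
    total_reads: number of reads searched
    coeffs:      family -> proportionality constant (assumed; roughly an
                 effective gene length in bp), so that AGS ~ coeff / rate
    weights:     family -> weight reflecting how well the family predicts
                 genome size (assumed to be supplied by the caller)
    """
    estimates = {}
    for family, n_hits in hits.items():
        if n_hits == 0:
            continue  # no information from this family
        rate = n_hits / total_reads
        estimates[family] = coeffs[family] / rate  # inverse proportionality
    if not estimates:
        raise ValueError("no marker family received any reads")
    weight_sum = sum(weights[f] for f in estimates)
    return sum(weights[f] * estimates[f] for f in estimates) / weight_sum


# Illustrative numbers only (two families instead of 30).
hits = {"fam1": 250, "fam2": 200}
coeffs = {"fam1": 1000.0, "fam2": 900.0}
weights = {"fam1": 0.6, "fam2": 0.4}

ags_estimate = estimate_ags(hits, total_reads=1_000_000, coeffs=coeffs, weights=weights)
print(f"Estimated AGS = {ags_estimate / 1e6:.2f} Mbp")
```

Because each family gives its own estimate, clustering of the marker genes (JAE's point) would show up here as correlated errors across families rather than independent ones.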
  • Classification parameters are specific to read length
  • MicrobeCensus performs well compared to GAAS, which uses a database of reference genomes to predict size
    • MC is robust to situations where organisms in your community are not present in your dataset
    • MC is also substantially faster because it uses RAPsearch, relies on only 30 marker genes, and doesn't need all reads (accurate estimates at 1-5 M reads)
  • Also see good performance on real metagenomes (isolate genome sequencing projects)
  • Consistent measurements of AGS across replicates
  • AGS varies across HMP microbiome body sites
    • JAE: if you have a lot of reads from, say, viruses, would that make it look like genome size is large?
      • SN: Yes, but I've looked into this and it doesn't seem to be a major factor
    • AGS is larger in gut compared to mouth, for example
  • Built a linear model to test potential sources of change in AGS.
    • Used MetaPhLAn to calculate lineage specific abundance
    • Average genome size by different taxa
    • Suggests that differences in taxon abundance between communities drive the observed differences in AGS between sites
  • Also found that human gut AGS varies between clinical studies
    • Seems that Bacteroides may drive most of the differences
    • May be some evidence that there are differences in AGS with clinical parameters
      • Looking to see if host phenotype can explain any of the residuals after accounting for taxonomic variation
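A rough sketch of the residual idea above, simplified to a single predictor: regress AGS on the relative abundance of one lineage (Bacteroides here), then keep the residuals as the part of AGS that taxonomy does not explain. The actual model in the talk used multiple lineage abundances; the data below are synthetic and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic samples: Bacteroides relative abundance and AGS in Mbp.
bacteroides = [0.1, 0.2, 0.3, 0.4]
ags_mbp = [3.6, 3.9, 4.6, 4.9]

intercept, slope = fit_line(bacteroides, ags_mbp)

# Residuals: AGS variation left over after accounting for taxonomy.
residuals = [y - (intercept + slope * x) for x, y in zip(bacteroides, ags_mbp)]
print(f"AGS ~ {intercept:.2f} + {slope:.2f} * Bacteroides")
print([round(r, 2) for r in residuals])
```

The residuals are what one would then compare against host phenotype or clinical parameters.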
  • Looked at a database of reference genomes of human microbiome body sites
    • Do we see big differences in genome size between closely related taxa between different sites relative to two closely related taxa from the same site?
      • Pairs of taxa within the gut show a wide range of genome size differences relative to the differences between gut and other sites
  • Even within a body site, there may be pressure to functionally specialize
    • Concern from group that lumping communities into high-scale types might fail to resolve microniche variation (e.g., difference between foot and forehead could be huge). Recommend looking at specific subsites.
  • Does AGS affect our analysis of functional differences between communities?
    • Looked at functional markers between gut and mouth (KOs, modules, pathways) with and without genome size normalization.
      • prenormalization has enrichment in mouth, postnormalization has enrichment in gut
    • How does this affect biomarker detection?
      • no discoveries at FDR-corrected p-values, but possibly an enrichment
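A small sketch of why genome size normalization can flip the direction of enrichment, as noted for gut versus mouth above. It assumes reads are sampled uniformly across community DNA, so the fraction of reads hitting a gene is about copies per genome × gene length / AGS. The function name and all numbers (including the AGS values) are illustrative assumptions, not results from the talk.

```python
def copies_per_genome(read_fraction, ags_bp, gene_length_bp):
    """Convert a gene's read fraction into estimated copies per genome.

    Assumes read_fraction ~ copies * gene_length / AGS, so
    copies ~ read_fraction * AGS / gene_length.
    """
    return read_fraction * ags_bp / gene_length_bp


gene_length_bp = 1000  # hypothetical functional marker (e.g. one KO)

# Hypothetical communities: smaller AGS in mouth, larger AGS in gut.
mouth_raw, mouth_ags = 4e-4, 2_500_000
gut_raw, gut_ags = 3e-4, 4_000_000

mouth_norm = copies_per_genome(mouth_raw, mouth_ags, gene_length_bp)
gut_norm = copies_per_genome(gut_raw, gut_ags, gene_length_bp)

print("raw read fraction:  mouth", mouth_raw, "gut", gut_raw)
print("copies per genome:  mouth", round(mouth_norm, 2), "gut", round(gut_norm, 2))
```

In this toy case the marker looks enriched in mouth by raw read fraction but enriched in gut per genome, the same pattern described in the pre- versus post-normalization bullet.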