Moore Notes 12 5 12

From OpenWetWare

Group Call

  • Participants: Stephen, Guillaume, Katie, Jonathan, Josh
  • Regular call time for next quarter: 10am Wednesdays
  • Data from MMI
    • Ian Joint from AMT cruises (ian.joint@gmail.com)
    • Working on contact at Tara (could be hard)
    • Do we want to be connected with MEGX? (maybe Nicole) - YES
  • GBMF investigators announced
  • Guillaume is working on automatic updating code
    • Will be ready to run when Tom is back in Jan
  • Update from Stephen
    • Recall is low for SFams with large family size
      • Common in genomes
      • Also more diverse and more connected in family network
    • LAST performs well and is fast (using non-stringent parameters)
      • Expanded simulation study to simulate 100 reads from every SFam to get family level performance statistics
      • About 12% of SFams have 0% recall (in leave-one-out analysis; recall is better when a perfect match is present)
        • For some due to strong similarity with a nearby SFam
        • For some due to low homology within SFam
      • Multiple best hits (ties) are probably playing a role
      • Similarity of consensus sequences does not predict misclassification of reads
      • Looking at all-v-all BLAST results to see if nearest neighbor sequence is a similar distance for sequences in own vs. other SFam
    • Will look at TIGRFAMs to see if results are specific to SFams or are also found with other HMM-based family dbs
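
The family-level statistics described in Stephen's update can be illustrated with a minimal sketch. It assumes each simulated read is recorded as a pair of (true family, assigned family), with `None` when the read gets no hit. The function names, the family labels, and the toy data are hypothetical and are not taken from the actual simulation code.

```python
from collections import defaultdict


def per_family_recall(assignments):
    """Return recall per family from (true_family, predicted_family) pairs.

    predicted_family may be None when a read has no hit at all.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for true_fam, pred_fam in assignments:
        total[true_fam] += 1
        if pred_fam == true_fam:
            correct[true_fam] += 1
    return {fam: correct[fam] / total[fam] for fam in total}


def fraction_zero_recall(recall):
    """Fraction of families for which no read was classified correctly."""
    zero = sum(1 for r in recall.values() if r == 0.0)
    return zero / len(recall)


# Toy data (hypothetical): three families, six simulated reads.
toy_assignments = [
    ("SFam_A", "SFam_A"),
    ("SFam_A", "SFam_A"),
    ("SFam_A", "SFam_B"),  # misclassified into a nearby family
    ("SFam_B", "SFam_A"),  # misclassified
    ("SFam_B", None),      # no hit
    ("SFam_C", "SFam_C"),
]

recall = per_family_recall(toy_assignments)
zero_fraction = fraction_zero_recall(recall)
```

With 100 simulated reads per family, the same dictionary would give the per-SFam recall values, and `fraction_zero_recall` would give the share of families with 0% recall.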
  • JE: if/when we build new families, we should think about what genomes go into the pipeline
    • Should we include fragments from metagenomes?
    • What is the seed set?
    • GBMF investigators might have some single-cell genomes and other new genomes that are not in dbs
    • Metaproteomics data?
  • Update from Josh
    • Working on code to do beta-diversity niche mapping
    • Here, this means predicting community similarity as a function of distance between samples
    • Can make maps, e.g.,
      • Classify regions into ecological types (eco-regions)
      • Visualize community turnover in space
    • Well-developed methods for macroorganisms can be applied to microorganisms pretty directly
    • Could do cross-validation on marine macroorganism data (from grid cells where we have metagenomics data) to test the idea
    • Model selection is computationally intensive; modifying the code to get it running on the cluster
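
The idea in Josh's update of predicting community similarity from the distance between samples can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Bray-Curtis similarity, straight-line distance between sample coordinates, and an ordinary least-squares line, none of which are confirmed as the method used in the actual code. The sample names, coordinates, and abundances are made up.

```python
from itertools import combinations
import math


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two abundance vectors (1 = identical)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2.0 * shared / (sum(a) + sum(b))


def distance(p, q):
    """Straight-line distance between two (x, y) sample locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical samples: name -> ((x, y) location, taxon abundances).
samples = {
    "s1": ((0.0, 0.0), [10, 5, 0]),
    "s2": ((1.0, 0.0), [8, 6, 1]),
    "s3": ((5.0, 0.0), [2, 4, 9]),
}

# One (distance, similarity) point per pair of samples.
dists, sims = [], []
for (_, (loc_a, ab_a)), (_, (loc_b, ab_b)) in combinations(samples.items(), 2):
    dists.append(distance(loc_a, loc_b))
    sims.append(bray_curtis_similarity(ab_a, ab_b))

slope, intercept = fit_line(dists, sims)


def predict_similarity(d):
    """Predicted community similarity for a pair of sites d apart."""
    return intercept + slope * d
```

A negative slope corresponds to community turnover in space. Applying `predict_similarity` to pairs of unsampled grid cells is the kind of step that would feed a map, although a real analysis would likely use environmental distances and a more flexible model.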