Moore Notes 12 5 12

From OpenWetWare


Group Call

Participants: Stephen, Guillaume, Katie, Jonathan, Josh

Regular call time for next quarter: 10am Wednesdays

Data from MMI
- Ian Joint from AMT cruises (ian.joint@gmail.com)
- Working on contact at Tara (could be hard)
- Do we want to be connected with MEGX? (maybe Nicole) - YES

GBMF investigators announced

Guillaume is working on automatic updating code
- Will be ready to run when Tom is back in Jan

Update from Stephen
- Recall is low for SFams with large family size
  - Common in genomes
  - Also more diverse and more connected in family network
- LAST performs well and is fast (using non-stringent parameters)
  - Expanded simulation study to simulate 100 reads from every SFam to get family level performance statistics
  - About 12% of SFams have 0% recall (in the leave-one-out analysis; recall is better when a perfect match is present)
    - For some, due to strong similarity with a nearby SFam
    - For some, due to low homology within the SFam
  - Multiple best hits (ties) are probably playing a role
  - Similarity of consensus sequences does not predict misclassification of reads
  - Looking at all-v-all BLAST results to see whether a sequence's nearest neighbor in its own SFam is at a similar distance to its nearest neighbor in another SFam
- Will look at TIGRFAMs to see if results are specific to SFams or are also found with other HMM-based family dbs
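
The family-level recall statistics described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the simulation study: the input format (a true SFam plus a list of tied best-hit families per simulated read), the function names, and the toy data are all assumptions made for the example. It shows how counting or not counting ties changes per-family recall and the share of families with 0% recall.

```python
from collections import defaultdict


def per_family_recall(reads, count_ties=False):
    """Return {family: recall} for simulated reads.

    reads: iterable of (true_family, best_hits), where best_hits is the
    list of families tied for the best score (hypothetical format).
    count_ties=False: a read is correct only if its true family is the
    unique best hit. count_ties=True: correct if it is among the ties.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_family, best_hits in reads:
        totals[true_family] += 1
        if count_ties:
            hit = true_family in best_hits
        else:
            hit = list(best_hits) == [true_family]
        if hit:
            correct[true_family] += 1
    return {family: correct[family] / totals[family] for family in totals}


def zero_recall_fraction(recall):
    """Fraction of families whose recall is exactly zero."""
    if not recall:
        return 0.0
    return sum(1 for r in recall.values() if r == 0.0) / len(recall)


# Toy data (made up for illustration): an empty list means no hit at all.
simulated_reads = [
    ("SFam_A", ["SFam_A"]),
    ("SFam_A", ["SFam_A", "SFam_B"]),
    ("SFam_A", ["SFam_B"]),
    ("SFam_A", []),
    ("SFam_B", ["SFam_A"]),
    ("SFam_B", ["SFam_A", "SFam_B"]),
]

strict = per_family_recall(simulated_reads)
lenient = per_family_recall(simulated_reads, count_ties=True)
print("strict:", strict, "zero-recall fraction:", zero_recall_fraction(strict))
print("lenient:", lenient, "zero-recall fraction:", zero_recall_fraction(lenient))
```

Comparing the strict and lenient numbers is one simple way to check how much of the low recall could be attributed to ties rather than to outright misclassification.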

JE: if/when we build new families, we should think about what genomes go into the pipeline
- Should we include fragments from metagenomes?
- What is the seed set?
- GBMF investigators might have some single-cell genomes and other new genomes that are not in dbs
- Metaproteomics data?

Update from Josh
- Working on code to do beta-diversity niche mapping
- Predicts community similarity as a function of distance between samples
- Can make maps, e.g.,
  - Classify regions into ecological types (eco-regions)
  - Visualize community turnover in space
- Well-developed methods for macroorganisms can be applied to microorganisms pretty directly
- Could do cross-validation on marine macroorganism data (from grid cells where we have metagenomics data) to test the idea
- Model selection is computationally intensive - modifying to get it working on the cluster
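
The idea of predicting community similarity from the distance between samples can be illustrated with a simple distance-decay fit. This is a rough sketch and not Josh's code: it swaps in a plain least-squares line in place of whatever model is actually used, and it assumes that "distance" means great-circle geographic distance and that similarity is 1 minus Bray-Curtis dissimilarity. The sample names, coordinates, and counts are made up.

```python
import math
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} dicts."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical samples: coordinates are (lat, lon), counts are per taxon.
samples = {
    "s1": {"coords": (0.0, 0.0), "counts": {"t1": 10, "t2": 5}},
    "s2": {"coords": (0.0, 10.0), "counts": {"t1": 6, "t2": 5, "t3": 4}},
    "s3": {"coords": (0.0, 30.0), "counts": {"t2": 2, "t3": 13}},
}

distances, similarities = [], []
for name_a, name_b in combinations(sorted(samples), 2):
    a, b = samples[name_a], samples[name_b]
    distances.append(haversine_km(a["coords"], b["coords"]))
    similarities.append(1.0 - bray_curtis(a["counts"], b["counts"]))

slope, intercept = fit_line(distances, similarities)
print("similarity change per km:", slope)
print("predicted similarity at 1500 km:", intercept + slope * 1500.0)
```

A negative slope means similarity falls off with distance, which is the turnover that the maps would visualize. A real analysis would likely use environmental as well as geographic distances and a nonlinear model.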
