Moore Notes 12 5 12
Group Call
- Participants: Stephen, Guillaume, Katie, Jonathan, Josh
- Regular call time for next quarter: 10am Wednesdays
- Data from MMI
- Ian Joint from AMT cruises (ian.joint@gmail.com)
- Working on contact at Tara (could be hard)
- Do we want to be connected with MEGX? (maybe Nicole) - YES
- GBMF investigators announced
- Guillaume is working on automatic updating code
- Will be ready to run when Tom is back in Jan
- Update from Stephen
- Recall is low for SFams with large family size
- Common in genomes
- Also more diverse and more connected in family network
- LAST performs well and is fast (using non-stringent parameters)
- Expanded simulation study to simulate 100 reads from every SFam to get family level performance statistics
- About 12% of SFams have 0% recall in the leave-one-out analysis (recall is better when the perfect match is present)
- For some, due to strong similarity with a nearby SFam
- For some, due to low homology within the SFam
- Multiple best hits (ties) are probably playing a role
- Similarity of consensus sequences does not predict misclassification of reads
- Looking at all-v-all BLAST results to see if nearest neighbor sequence is a similar distance for sequences in own vs. other SFam
- Will look at TIGRFAMs to see if results are specific to SFams or are also found with other HMM-based family dbs
- JE: if/when we build new families, we should think about what genomes go into the pipeline
- Should we include fragments from metagenomes?
- What is the seed set?
- GBMF investigators might have some single-cell genomes and other new genomes that are not in dbs
- Metaproteomics data?
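As a rough illustration of the family-level statistics in Stephen's update, the sketch below computes per-family recall and the fraction of families with 0% recall from simulated-read classifications. The input format (one `(true_family, predicted_family)` pair per read, with `None` for no hit) and the function names are assumptions made for this example, not the actual pipeline code.

```python
from collections import defaultdict


def per_family_recall(assignments):
    """Return {family: recall} from (true_family, predicted_family) pairs.

    Hypothetical input format: one pair per simulated read;
    predicted_family is None when the read had no hit.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_fam, pred_fam in assignments:
        totals[true_fam] += 1
        if pred_fam == true_fam:
            correct[true_fam] += 1
    return {fam: correct[fam] / totals[fam] for fam in totals}


def fraction_zero_recall(recall_by_family):
    """Fraction of families for which no read was assigned correctly."""
    if not recall_by_family:
        return 0.0
    zero = sum(1 for r in recall_by_family.values() if r == 0.0)
    return zero / len(recall_by_family)


# Toy data: two reads from each of two made-up families.
toy = [("fam_A", "fam_A"), ("fam_A", "fam_B"), ("fam_B", "fam_A"), ("fam_B", None)]
recall = per_family_recall(toy)
zero_fraction = fraction_zero_recall(recall)
```

With 100 simulated reads per SFam, the same two functions would give the per-family recall distribution and the share of families at 0%.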
- Update from Josh
- Working on code to do beta-diversity niche mapping
- Here, this means predicting community similarity as a function of distance between samples
- Can make maps, e.g.,
- Classify regions into ecological types (eco-regions)
- Visualize community turnover in space
- Well-developed methods for macroorganisms can be applied to microorganisms pretty directly
- Could do cross-validation on marine macroorganism data (from grid cells where we have metagenomics data) to test the idea
- Model selection is computationally intensive; modifying the code to get it working on the cluster
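To make the idea in Josh's update concrete, here is a minimal sketch of relating community dissimilarity to distance between samples. It is a simplified stand-in, not the method used in the project: it assumes planar sample coordinates, Bray-Curtis dissimilarity, and an ordinary least-squares line, and every name in it is hypothetical.

```python
from itertools import combinations
from math import hypot


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pairwise(samples):
    """Return (distance, dissimilarity) for every pair of samples.

    Assumed input: a list of (x, y, abundances) tuples with planar coordinates.
    """
    pairs = []
    for (x1, y1, a1), (x2, y2, a2) in combinations(samples, 2):
        pairs.append((hypot(x1 - x2, y1 - y2), bray_curtis(a1, a2)))
    return pairs


def fit_line(pairs):
    """Ordinary least-squares fit: dissimilarity = slope * distance + intercept."""
    n = len(pairs)
    mean_d = sum(d for d, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pairs)
    if sxx == 0:
        raise ValueError("all pairwise distances are identical")
    sxy = sum((d - mean_d) * (b - mean_b) for d, b in pairs)
    slope = sxy / sxx
    return slope, mean_b - slope * mean_d


# Toy data: three samples along a transect, two taxa each.
toy_samples = [(0, 0, [10, 0]), (1, 0, [5, 5]), (2, 0, [0, 10])]
slope, intercept = fit_line(pairwise(toy_samples))
```

A fitted relationship of this kind could then be evaluated on a grid to map predicted turnover; real analyses would likely use environmental predictors and a nonlinear model rather than a straight line.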