Moore Notes 12 15 10
From OpenWetWare
Group Call
- Call logistics
- Will use all calls for research topics
- Announce to the group if you want to lead a discussion on a particular project
- PIs will grab the time slot if needed, but we'll drop the strict every-other-week schedule (any week can be a research group call)
- Taxa-area relationships for niche modeling - James
- Code up and running
- Not much pattern at coarser resolution (phylum)
- Will contact Josh about finer taxonomic resolution data (genus)
- Tom has MaxEnt up and running on Linux machines
- Can run the analyses in parallel on the cluster now
- Opens up computational validations (e.g., jackknife analysis)
- Will start writing if validation looks good
- MaxEnt has another set of tuning parameters that are being investigated and can be used if needed
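A minimal sketch of a generic leave-one-out jackknife, to illustrate the kind of validation mentioned above. It is not the group's pipeline: the `scores` values are made up, and the `mean` statistic is only a stand-in for whatever model score would actually be recomputed (for example, refitting a niche model with one record left out).

```python
import math


def jackknife(values, statistic):
    """Leave-one-out jackknife estimate and standard error of `statistic`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    # Standard jackknife variance: (n - 1) / n * sum of squared deviations.
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return mean_loo, math.sqrt(variance)


def mean(xs):
    """Placeholder statistic; a real run would refit and score a model."""
    return sum(xs) / len(xs)


scores = [0.81, 0.78, 0.85, 0.80, 0.76]  # made-up example values
estimate, std_err = jackknife(scores, mean)
print(f"jackknife estimate = {estimate:.3f}, standard error = {std_err:.4f}")
```

Because each leave-one-out refit is independent of the others, the loop is the part that could be spread across cluster nodes.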
- Protein families - Guillaume (performance plots)
- Tried mapping seed sequences back to the families they were used to build (1 month compute)
- Recall and precision are perfect for most families, but there are long tails
- Only the top hit was used; the second-best hit might be the right one and still a decent hit
- Sam: check how far off next best hit is
- Katie: could they actually belong in the other family and MCL put them in the wrong cluster?
- Tom: still a good validation for the well-differentiated families (when MCL clustering was clear)
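A rough sketch of how per-family precision and recall could be computed from top-hit assignments, as discussed above. The input format (pairs of true family and top-hit family) and the family names are assumptions for illustration, not the format of the actual results.

```python
from collections import Counter


def per_family_precision_recall(assignments):
    """Compute (precision, recall) per family.

    assignments: iterable of (true_family, predicted_family) pairs, one per
    seed sequence, where predicted_family is the top hit (None if no hit).
    """
    true_counts = Counter()       # sequences that belong to each family
    predicted_counts = Counter()  # sequences whose top hit is each family
    correct_counts = Counter()    # sequences mapped back to their own family
    for true_family, predicted_family in assignments:
        true_counts[true_family] += 1
        if predicted_family is not None:
            predicted_counts[predicted_family] += 1
        if predicted_family == true_family:
            correct_counts[true_family] += 1

    results = {}
    for family, n_true in true_counts.items():
        n_predicted = predicted_counts[family]
        precision = correct_counts[family] / n_predicted if n_predicted else 0.0
        recall = correct_counts[family] / n_true
        results[family] = (precision, recall)
    return results


# Hypothetical example: one famA seed has its top hit in famB.
pairs = [
    ("famA", "famA"),
    ("famA", "famA"),
    ("famA", "famB"),
    ("famB", "famB"),
    ("famB", "famB"),
]
metrics = per_family_precision_recall(pairs)
for family, (precision, recall) in sorted(metrics.items()):
    print(f"{family}: precision={precision:.2f} recall={recall:.2f}")
```

Running the same function on second-best hits would show how many of the long-tail misses are recovered one rank down.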
- Analysis of "bad" families (imperfect recall and precision) - Tom
- Can we grow the good families by adding in some of the sequences from bad families using hmmsearch
- Or by growing MCL clusters?
- Redo de novo MCL clustering of bad sequences only
- Dongying: hmmsearch (in HMMER3, not HMMER2) is local search only
- unless you increase the coverage threshold, so MCL or BLAST is better
- can draw an E-value cutoff from inside- versus outside-cluster hits
- Katie: Could you use the non-seed sequences from big families? Guillaume - no, all were used as seeds for most families
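One possible way to draw an E-value cutoff from inside- versus outside-cluster hits, sketched under assumptions: the function name, the choice of the log-scale midpoint, and the example E-values are all hypothetical, and no specific rule was settled on the call.

```python
import math


def evalue_cutoff(inside_evalues, outside_evalues):
    """Return a cutoff separating inside- from outside-cluster hits.

    Returns None when the two sets overlap, i.e. no single cutoff keeps
    every inside hit while excluding every outside hit.
    """
    worst_inside = max(inside_evalues)   # weakest hit to a cluster member
    best_outside = min(outside_evalues)  # strongest hit to a non-member
    if worst_inside >= best_outside:
        return None
    # Geometric mean of the two bounds, i.e. the midpoint on a log scale.
    return math.sqrt(worst_inside * best_outside)


# Made-up E-values for one family.
cutoff = evalue_cutoff([1e-50, 1e-30, 1e-20], [1e-8, 1e-3])
overlapping = evalue_cutoff([1e-5], [1e-10])
print(cutoff, overlapping)
```

Families for which the function returns None are the ones where inside and outside hits cannot be cleanly separated by E-value alone.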
- What is next?
- Modify AMPHORA2 code to work on this database
- In silico validations?
- Read classification for metagenomic libraries
- Qin et al. microbiome
- GOS (versus the peptide predictions)
- HOT/ALOHA depth series - relative differences, common families would probably work
- Might need to deal with genome size based bias
- there's a correction available, but might not be generalizable
- 16S copy number metric could be used perhaps
- Could build trees for some families (e.g., with pplacer) and then do a PD analysis rather than abundance
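A small sketch of a PD (phylogenetic diversity) calculation of the kind suggested in the last item, assuming the rooted form of Faith's PD: the total branch length on the paths from the observed leaves to the root, counting each branch once. The parent-pointer tree representation and the toy tree are assumptions for illustration; real trees with placed reads would need a proper tree parser.

```python
def faith_pd(parents, observed_leaves):
    """Rooted phylogenetic diversity of a set of observed leaves.

    parents: dict mapping node -> (parent_node, branch_length); the root
    has no entry. Each branch is counted at most once.
    """
    visited = set()
    total = 0.0
    for node in observed_leaves:
        # Walk toward the root until reaching it or an already-counted branch.
        while node in parents and node not in visited:
            visited.add(node)
            parent, length = parents[node]
            total += length
            node = parent
    return total


# Toy tree: ((A:1.0, B:2.0)n1:0.5, C:3.0)root
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
pd_ab = faith_pd(tree, ["A", "B"])
pd_all = faith_pd(tree, ["A", "B", "C"])
print(pd_ab, pd_all)
```

Because PD depends only on which lineages are present, not on how many reads hit each one, it sidesteps some of the abundance biases noted above.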