Moore Notes 12 15 10

Group Call

Call logistics
- Will use all calls for research topics
- Announce to the group if you want to lead a discussion on a particular project
- PIs will claim the time slot if needed, but we'll stop the strict every-other-week schedule (any week can be a research group call)

Taxa area relationships for niche modeling - James
- Code up and running
- Not much pattern at coarser resolution (phylum)
- Will contact Josh about finer taxonomic resolution data (genus)
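
A taxa-area relationship is often summarized by fitting a power law S = c * A^z on log-log axes. The sketch below is not James's code; it assumes that functional form and uses hypothetical names (`fit_taxa_area`, `areas`, `richness`) to show how the slope z could be compared across taxonomic resolutions.

```python
import math


def fit_taxa_area(areas, richness):
    """Least-squares fit of log10(S) = log10(c) + z * log10(A).

    areas    -- sampled areas (positive numbers); hypothetical input
    richness -- taxon counts observed at each area (positive numbers)
    Returns (c, z). The power-law form is an assumption, not from the notes.
    """
    if len(areas) != len(richness) or len(areas) < 2:
        raise ValueError("need at least two paired (area, richness) points")
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all areas are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                      # slope on log-log axes
    c = 10 ** (mean_y - z * mean_x)    # intercept back-transformed
    return c, z
```

A slope z near zero would match the "not much pattern" observation at phylum level; running the same fit on genus-level counts would show whether finer resolution changes it.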

Tom has MaxEnt up and running on Linux machines
- Can run the analyses in parallel on the cluster now
- Opens up computational validations (e.g., jackknife analysis)
- Will start writing if validation looks good
- MaxEnt has another set of tuning parameters that are being investigated and can be used if needed
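
One way to read "jackknife analysis" is a leave-one-out comparison: refit the model with each item held out and record how much a score drops. The sketch below assumes that reading and does not call MaxEnt; `score_fn` is a hypothetical stand-in for whatever fits a model and returns a score. Each held-out run is independent, so the runs could be submitted to the cluster as separate jobs.

```python
def leave_one_out(items):
    """Yield (held_out, remaining) pairs, one per item."""
    items = list(items)
    for i, held_out in enumerate(items):
        yield held_out, items[:i] + items[i + 1:]


def jackknife_drops(items, score_fn):
    """Return {held_out: full_score - score_without_it}.

    score_fn is a hypothetical callable that takes a list of items
    (e.g., predictor names) and returns a numeric score.
    """
    full_score = score_fn(list(items))
    return {
        held_out: full_score - score_fn(remaining)
        for held_out, remaining in leave_one_out(items)
    }
```

For example, with a toy `score_fn` that sums a fixed weight per predictor, each reported drop equals the weight of the predictor that was held out.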

Protein families - Guillaume (performance plots)
- Tried mapping seed sequences back to the families they were used to build (1 month compute)
- Recall and precision are perfect for most families, but there are long tails
  - Only the top hit was used; the second-best hit might be the correct family and still a decent hit
  - Sam: check how far off next best hit is
  - Katie: could they actually belong in the other family and MCL put them in the wrong cluster?
  - Tom: still a good validation for the well-differentiated families (when MCL clustering was clear)
- Analysis of "bad" families (imperfect recall and precision) - Tom
  - Can we grow the good families by adding in some of the sequences from bad families using hmmsearch?
  - Or by growing MCL clusters?
  - Redo de novo MCL clustering of bad sequences only
  - Dongying: hmmsearch (in HMMER3, not HMMER2) is local search only
    - unless you increase the coverage threshold, so MCL or BLAST is better
    - can set an E-value cutoff based on inside- versus outside-cluster hits
  - Katie: Could you use the non-seed sequences from big families? Guillaume - no, all were used as seeds for most families
- What is next?
  - Modify AMPHORA2 code to work on this database
  - In silico validations?
  - Read classification for metagenomic libraries
    - Qin et al. microbiome
    - GOS (versus the peptide predictions)
    - HOT/ALOHA depth series - relative differences, common families would probably work
    - Might need to deal with genome size based bias
      - there's a correction available, but might not be generalizable
      - 16S copy number metric could be used perhaps
    - Could build trees for some families (e.g., with pplacer) and then do a PD analysis rather than abundance
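
The seed-mapping validation above reports recall and precision per family from top hits. A minimal sketch of that bookkeeping follows; it is not Guillaume's pipeline. It assumes each seed sequence has been reduced to a hypothetical pair `(true_family, top_hit_family)`, with `None` when there was no hit.

```python
from collections import Counter


def per_family_precision_recall(assignments):
    """Compute {family: (precision, recall)} from top-hit assignments.

    assignments -- iterable of (true_family, top_hit_family) pairs;
                   top_hit_family is None if the sequence hit nothing.
    Precision = correct hits to a family / all hits to that family.
    Recall    = correct hits to a family / all seeds from that family.
    """
    correct = Counter()    # seeds whose top hit is their own family
    hits_to = Counter()    # all top hits landing in each family
    seeds_in = Counter()   # all seeds that came from each family
    for true_family, hit_family in assignments:
        seeds_in[true_family] += 1
        if hit_family is not None:
            hits_to[hit_family] += 1
            if hit_family == true_family:
                correct[true_family] += 1
    result = {}
    for family in seeds_in:
        precision = correct[family] / hits_to[family] if hits_to[family] else 0.0
        recall = correct[family] / seeds_in[family]
        result[family] = (precision, recall)
    return result
```

Families with both values below 1.0 would correspond to the "bad" families in the long tail. Repeating the calculation with the second-best hit in place of the top hit would show how many misassignments are near misses.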
