Moore Notes 12 15 10

From OpenWetWare

Group Call

  • Call logistics
    • Will use all calls for research topics
    • Announce to the group if you want to lead a discussion on a particular project
    • PIs will grab the time slot if needed, but we'll drop the strict every-other-week schedule (any week can be a research group call)
  • Taxa area relationships for niche modeling - James
    • Code up and running
    • Not much pattern at coarser resolution (phylum)
    • Will contact Josh about finer taxonomic resolution data (genus)
  • Tom has MaxEnt up and running on Linux machines
    • Can run the analyses in parallel on the cluster now
    • Opens up computational validations (e.g., jackknife analysis)
    • Will start writing if validation looks good
    • MaxEnt has another set of tuning parameters that are being investigated and can be used if needed
  • Protein families - Guillaume (performance plots)
    • Tried mapping seed sequences back to the families they were used to build (1 month compute)
    • Recall and precision are perfect for most families, but there are long tails
      • Only the top hit was used; the second-best hit might be the right family and still a decent hit
      • Sam: check how far off next best hit is
      • Katie: could they actually belong in the other family and MCL put them in the wrong cluster?
      • Tom: still a good validation for the well-differentiated families (when MCL clustering was clear)
    • Analysis of "bad" families (imperfect recall and precision) - Tom
      • Can we grow the good families by adding in some of the sequences from bad families using hmmsearch
      • Or by growing MCL clusters?
      • Redo de novo MCL clustering of bad sequences only
      • Dongying: hmmsearch (in HMMER3, not HMMER2) is local search only
        • unless you increase the coverage threshold, so MCL or BLAST is better
        • can draw an E-value cutoff based on inside- versus outside-cluster hits
      • Katie: Could you use the non-seed sequences from big families? Guillaume - no, all were used as seeds for most families
    • What is next?
      • Modify AMPHORA2 code to work on this database
      • In silico validations?
      • Read classification for metagenomic libraries
        • Qin et al microbiome
        • GOS (versus the peptide predictions)
        • HOT/ALOHA depth series - relative differences, common families would probably work
        • Might need to deal with genome size based bias
          • there's a correction available, but might not be generalizable
          • 16S copy number metric could be used perhaps
        • Could build trees for some families (e.g., with pplacer) and then do a PD analysis rather than abundance
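
For the seed-mapping validation under the protein families item, per-family recall and precision from top hits could be tallied roughly as below. This is a minimal sketch, not the code used for the performance plots: the function name, the input format (one `(true_family, top_hit_family)` pair per seed sequence), and the family labels are all assumptions made for illustration, and precision is set to 0.0 by convention when a family receives no top hits.

```python
from collections import Counter


def per_family_precision_recall(assignments):
    """Compute (precision, recall) per family from top-hit assignments.

    assignments: iterable of (true_family, top_hit_family) pairs, one per
    seed sequence. top_hit_family is None when the seed had no hit.
    (Hypothetical input format, assumed for this sketch.)
    """
    true_counts = Counter()       # seeds that belong to each family
    predicted_counts = Counter()  # seeds whose top hit is each family
    correct_counts = Counter()    # seeds whose top hit is their own family

    for true_family, top_hit in assignments:
        true_counts[true_family] += 1
        if top_hit is not None:
            predicted_counts[top_hit] += 1
            if top_hit == true_family:
                correct_counts[true_family] += 1

    metrics = {}
    for family, n_true in true_counts.items():
        tp = correct_counts[family]
        recall = tp / n_true
        n_pred = predicted_counts[family]
        # Convention assumed here: precision is 0.0 if nothing hit this family.
        precision = tp / n_pred if n_pred else 0.0
        metrics[family] = (precision, recall)
    return metrics


# Toy data with made-up family labels.
example = [
    ("famA", "famA"),
    ("famA", "famA"),
    ("famA", "famB"),  # famA seed whose top hit is the wrong family
    ("famB", "famB"),
    ("famC", None),    # seed with no hit at all
]
metrics = per_family_precision_recall(example)
for family in sorted(metrics):
    precision, recall = metrics[family]
    print(f"{family}\tprecision={precision:.2f}\trecall={recall:.2f}")
```

Families in the long tails would show up as entries with precision or recall below 1. Sam's check on how far off the next-best hit is would need the second-best hit score per seed, which this sketch does not model.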