Moore Notes 1 16 13

Group Call

  • Participants: Jonathan, Katie, Dongying, Guillaume, Stephen, Josh
  • Guillaume: updating SFams
    • Downloading new genomes (>2000, about 5000 total)
    • Doing sifting
    • All-vs-all LAST for MCL of new sequences next
      • Stephen: LAST designed for "best hit" searches
      • The m-parameter may produce a distance matrix that is too sparse or not symmetric
      • Katie: Compare results to BLAST for sequences that we had previously
  • Stephen: SFam clans
    • Currently use consensus sequence (regardless of family size)
    • Another distance might be better
      • PRC is a good profile-vs-profile method, but very slow
      • SCOOP looks like a good option
      • Dongying: Could compute multiple consensus sequences per family for larger families
  • Stephen: classification thresholds
    • Compare the minimum-scoring true positive with the maximum-scoring false positive
    • Best if these don't overlap, but sometimes they do
    • In PFAMs, the false positives above the min true positive threshold are all from a related family (by definition)
    • Stephen investigated this in SFams (5%), PFAMs (6%), TIGRFAMs (3%)
    • Round 2 genomes analysis shows that these problematic families are very common
    • Family-specific thresholds might be a good idea, though they make classification more complicated
  • Stephen: hmmsearch performance
    • More sensitive than BLAST for full-length sequences
    • But metagenomics data (i.e., short sequences) is better classified using BLAST/LAST
      • Not specific to SFams
      • Particularly true for large families
    • Need to test longer read lengths to find the inflection point
    • Sequencing error is not a major factor
  • Need a better and updated web presence
    • Jonathan: WordPress has website/blogging software
      • Open, self-install version (.org) needs to be hosted on a server (e.g., dreamhost)
      • Commercial version (.com) is hosted; free with a small amount of storage, $20/year for more
    • Will have point there
    • Keep notes and private pages on OWW for now
  • Dongying: Statistical question
    • How to compare clusters?
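
Sketch for the classification-threshold item above: a minimal Python example of checking, per family, whether the maximum-scoring false positive reaches the minimum-scoring true positive. This is only an illustration, not the analysis Stephen ran. The function name, the family names, and all scores are hypothetical; the input is assumed to be a list of (score, is_true_member) pairs for hits against one family model.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[float, bool]  # (score, is_true_member); assumed input format


def threshold_overlap(hits: List[Hit]) -> Dict[str, object]:
    """Summarize whether true- and false-positive scores overlap for one family."""
    tp = [score for score, is_true in hits if is_true]
    fp = [score for score, is_true in hits if not is_true]
    min_tp: Optional[float] = min(tp) if tp else None
    max_fp: Optional[float] = max(fp) if fp else None
    # A family is "problematic" when some false positive scores at least
    # as high as the lowest-scoring true positive.
    overlaps = min_tp is not None and max_fp is not None and max_fp >= min_tp
    fp_above = [s for s in fp if min_tp is not None and s >= min_tp]
    return {
        "min_tp": min_tp,
        "max_fp": max_fp,
        "overlaps": overlaps,
        "n_fp_above_min_tp": len(fp_above),
    }


# Hypothetical scores for two made-up families.
families: Dict[str, List[Hit]] = {
    "famA": [(120.0, True), (95.5, True), (40.2, False), (12.0, False)],
    "famB": [(88.0, True), (51.3, True), (60.7, False), (20.1, False)],
}

summary = {name: threshold_overlap(hits) for name, hits in families.items()}
problem_fraction = sum(s["overlaps"] for s in summary.values()) / len(summary)

for name, s in summary.items():
    print(name, s)
print("fraction of families with overlap:", problem_fraction)
```

For a family with no overlap, any cutoff between max_fp and min_tp separates members from non-members; for an overlapping family, no single cutoff does, which is the case where a family-specific threshold would involve a trade-off.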