Moore Notes 8 21 13

Group Call

  • Participants: Katie, Jonathan, Guillaume, Josh, Ladan, Stephen, Sarah, Tom, Dongying
  • EMP sequencing
    • Re-requested sample list from Gilbert lab (Sarah Owen will send it soon)
      • Need to select samples that complement the Tara Oceans and Gulf of Mexico projects
      • E.g., fill in global coverage to enable a global map, or pick a different ecosystem with interesting ecology
    • They are going to start doing rapid-seq on the HiSeq machine, which will give 250 bp paired-end reads (250PE)
    • Stephen: can we get short inserts? Jonathan: Yes, with a little variation in the insert length
    • Jonathan: showed BGI the Argonne price list
  • SFams updating issues
    • See last week's notes for summary of the issues
    • Proposed fix:
      • Wipe round 2 results
      • Do a fast BLAST-like search before the HMM search
    • How much time to invest in continuing to update the database?
      • How automated can we make this?
      • What about people who will come to depend on it?
      • Important to do it if we need it for our biogeographical studies
    • Any traction on getting someone else to take this on?
      • Jonathan: no one seems ready to take on a more rapid updating approach yet (need more money)
    • Stephen: lots of ideas for further improvements
      • Pangenome idea to ignore identical or similar genomes
      • Improving MCL step, or adding steps afterwards to filter, split, combine families
      • How to include metagenomic data in the build?
    • Tim will implement Stephen's solution to the duplicate sequence problem
      • Then reanalyze the round 2 genomes to see if this solves the problem
      • If the two-step search is used in QC, some other bad families might also improve
      • Think about an automated system for handling bad families
      • Stephen: this was probably also a problem between round 0 and round 1
      • Katie: Why is this happening?
        • Small families with a bogus member
        • Large families that are diverse and therefore have a low-information-content HMM
        • A sequence that the HMM doesn't classify back into its family is a bogus member (problem with MCL?)
        • Dongying: Connections can get added in inflation step of MCL
        • Ladan: What about using a lower inflation parameter? Tom: Trying to get larger clusters; also, this didn't help much overall
      • Stephen: What fast BLAST-like tool do people recommend?
        • USEARCH and CD-HIT
        • Already looked at RAPSearch, LAST
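
Sketch of the two-step search idea from the proposed fix (a fast search first, then the HMM search only on the candidates). This is a minimal illustration, not the pipeline's actual code: the k-mer prefilter stands in for whichever fast tool is chosen (USEARCH, CD-HIT, etc.), `slow_search` is a hypothetical callback standing in for the HMM search, and the names `family_reps`, `k`, and `min_shared` are assumptions made up for the example.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(family_reps, k=3):
    """Map each k-mer to the family IDs whose representative contains it.

    family_reps is a hypothetical dict: family ID -> representative sequence.
    """
    index = defaultdict(set)
    for fam_id, seq in family_reps.items():
        for km in kmers(seq, k):
            index[km].add(fam_id)
    return index


def candidate_families(query, index, k=3, min_shared=2):
    """Step 1 (fast): keep families sharing at least min_shared k-mers with the query."""
    counts = defaultdict(int)
    for km in kmers(query, k):
        for fam_id in index.get(km, ()):
            counts[fam_id] += 1
    return sorted(f for f, n in counts.items() if n >= min_shared)


def two_step_search(query, index, slow_search, k=3, min_shared=2):
    """Step 2 (slow): run the expensive search only against the candidates.

    slow_search(query, fam_id) is a placeholder for the HMM search.
    """
    return {
        fam_id: slow_search(query, fam_id)
        for fam_id in candidate_families(query, index, k, min_shared)
    }
```

The point of the sketch is only that the expensive step is called once per candidate family rather than once per family in the database; the thresholds would need tuning so that true members of diverse families are not dropped by the prefilter.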