Moore Notes 4 3 13

From OpenWetWare

Group Call

  • Participants: Katie, Tom, Guillaume, Dongying, Josh, Stephen
  • Marine niche mapping paper published in ISME Journal: http://www.nature.com/ismej/journal/vaop/ncurrent/abs/ismej201337a.html
    • Does the new data downloaded from CAMERA have a broad enough geographic/environmental distribution to do protein family niche modeling?
    • Josh & Guillaume will look into this
    • Tom: what data QC do we need to do?
    • Need to prioritize which data sets to process (at least until we can move over to Google)
  • Tom: MRC
    • Ready to process data
    • Infrastructural bottlenecks
      • 24 hours per Illumina data set if at full capacity on the QB3 cluster
      • Shared NFS mount between cluster and data server needs to be modified to allow more simultaneous hits
      • Other compute resources? Merlot, Google, new IHG cluster, OSU (next Fall, tentative)
    • Dongying: What method is being used for read search?
      • Default: RAPSEARCH
      • Others implemented: BLAST, LASTALL, HMMSEARCH, HMMSCAN
  • SFams updating
    • Guillaume will update repo
    • Tom will launch job
    • Tim L working on QC and functional annotation of families
    • Next: clan definitions
    • Dongying: How do I search only a subset of families? e.g.,
      • With >5 members
      • High quality
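One way to approach Dongying's question is to filter the family list before launching the search. The following is only a minimal sketch: the `Family` record, its field names, and the boolean quality flag are hypothetical and are not taken from the SFams database schema or from MRC.

```python
from dataclasses import dataclass


@dataclass
class Family:
    # Hypothetical record; field names are illustrative, not the SFams schema.
    fam_id: str
    n_members: int
    high_quality: bool


def select_families(families, more_than=5, require_high_quality=True):
    """Return IDs of families with more than `more_than` members.

    If `require_high_quality` is True, families not flagged as high
    quality are dropped as well.
    """
    selected = []
    for fam in families:
        if fam.n_members <= more_than:
            continue  # too few members
        if require_high_quality and not fam.high_quality:
            continue  # fails the (assumed) quality flag
        selected.append(fam.fam_id)
    return selected


if __name__ == "__main__":
    demo = [
        Family("fam_a", 12, True),
        Family("fam_b", 5, True),    # exactly 5 members, so excluded by ">5"
        Family("fam_c", 40, False),  # large but not flagged high quality
    ]
    print(select_families(demo))
```

The resulting ID list could then be used to restrict which family models are loaded for the read search; how that restriction is passed to the search tool is left open here.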
  • Katie: Janelia conference
    • Sean Eddy's idea about information content of profile models
      • Entropy scaling (default = 6) may be inappropriate for short reads
      • Stephen: the fragthresh parameter treats the query as a fragment if it covers <50% of the target model, but penalizes it if it covers >50% (solution: set the parameter to 1.0)
    • Tandy Warnow's SEPP
      • Divide-and-conquer approach to speed up pplacer and similar methods
      • Applications
        • Taxonomy identification of short reads (TIPP)
        • Building huge alignments (UPP)
    • Richard Durbin's suggestion: consider a Ferragina-Manzini (FM) index to speed up MRC
      • Based on the Burrows-Wheeler transform, related to suffix arrays
      • Previously used for aligning short reads to genomes (BWA, Bowtie)
      • Recently applied to assembly (extension of SGA assembler, really sped it up)
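To make the FM index idea concrete, here is a toy sketch of a Burrows-Wheeler transform built from a suffix array, plus the backward search used for exact matching. It is illustrative only: the function names are made up for this example, the construction is naive (sorting whole suffixes), the occurrence table is stored uncompressed, and the text is assumed not to contain `$`. It is not how BWA, Bowtie, SGA, or MRC implement it.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (bwt, suffix_array) for text + '$' using naive suffix sorting."""
    s = text + "$"  # '$' sorts before letters and marks the end of the text
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # character preceding each sorted suffix
    return bwt, sa


def build_fm_index(text):
    """Build a toy FM index: C table, full occurrence table, suffix array."""
    bwt, sa = bwt_via_suffix_array(text)
    counts = Counter(bwt)
    # C[c]: number of characters in the text that sort strictly before c
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return {"C": C, "occ": occ, "sa": sa, "n": len(bwt)}


def find(index, pattern):
    """Backward search: return sorted start positions of pattern in the text."""
    lo, hi = 0, index["n"]  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in index["C"]:
            return []
        lo = index["C"][ch] + index["occ"][ch][lo]
        hi = index["C"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return []
    return sorted(index["sa"][lo:hi])


if __name__ == "__main__":
    idx = build_fm_index("ACGTACGA")
    print(find(idx, "ACG"))
```

Each pattern character costs two table lookups regardless of text length, which is why the approach is attractive for matching many short reads against a fixed reference.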
  • Federico Lauro's Indian Ocean cruise
    • Ask Jonathan next time