Moore Notes 4 3 13
Group Call
- Participants: Katie, Tom, Guillaume, Dongying, Josh, Stephen
- Marine niche mapping paper published in ISME Journal: http://www.nature.com/ismej/journal/vaop/ncurrent/abs/ismej201337a.html
- Does the new data downloaded from CAMERA have a broad enough geographic/environmental distribution to do protein family niche modeling?
- Josh & Guillaume will look into this
- Tom: what data QC do we need to do?
- Prioritizing data sets to process (at least until we can go over to Google)
- Tom: MRC
- Ready to process data
- Infrastructural bottlenecks
- 24 hours per Illumina data set if at full capacity on the QB3 cluster
- Shared NFS mount between cluster and data server needs to be modified to allow more simultaneous hits
- Other compute resources? Merlot, Google, new IHG cluster, OSU (next Fall, tentative)
- Dongying: What method is being used for read search?
- Default: RAPSEARCH
- Others implemented: BLAST, LASTALL, HMMSEARCH, HMMSCAN
- SFams updating
- Guillaume will update repo
- Tom will launch job
- Tim L working on QC and functional annotation of families
- Next: clan definitions
- Dongying: How do I search only a subset of families? e.g.,
- With >5 members
- High quality
- Katie: Janelia conference
- Sean Eddy's idea about information content of profile models
- Entropy scaling (default = 6) may be inappropriate for short reads
- Stephen: the fragthresh parameter treats a query as a fragment if it is <50% of the target model length, but penalizes it as full-length if >50% (solution: set the parameter to 1.0)
- Tandy Warnow's SEPP
- Divide-and-conquer approach to speed up pplacer and similar methods
- Applications
- Taxonomy identification of short reads (TIPP)
- Building huge alignments (UPP)
- Richard Durbin's suggestion to think about Ferragina-Manzini (FM) index to speed up MRC
- Based on the Burrows-Wheeler transform, related to suffix arrays
- Previously used for aligning short reads to genomes (BWA, Bowtie)
- Recently applied to assembly (extension of SGA assembler, really sped it up)
- Federico Lauro's Indian Ocean cruise
- Ask Jonathan next time
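
For reference on the FM index item above, here is a minimal Python sketch of backward search over a Burrows-Wheeler transform. It is an illustration only and not MRC code: the function names (`build_fm_index`, `find`) are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, so it only suits small examples.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index for `text` (hypothetical helper, not from MRC)."""
    text = text + "$"  # sentinel; assumed absent from text and smaller than all symbols
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    # C[c] = number of characters in text strictly smaller than c.
    alphabet = sorted(set(text))
    counts = Counter(text)
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i] (uncompressed table).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)

    return {"sa": sa, "C": C, "occ": occ, "n": len(bwt)}


def find(index, pattern):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, index["n"]  # half-open range of suffix array rows
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in index["C"]:
            return []
        lo = index["C"][ch] + index["occ"][ch][lo]
        hi = index["C"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return []
    return sorted(index["sa"][lo:hi])


idx = build_fm_index("ACGTACGTTACG")
print(find(idx, "ACG"))
```

Each pattern character costs two table lookups regardless of text length, which is the property that makes the approach attractive for read search; real tools (BWA, Bowtie) use sampled, compressed versions of these tables.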