Moore Notes 8 21 13
From OpenWetWare
Group Call
- Participants: Katie, Jonathan, Guillaume, Josh, Ladan, Stephen, Sarah, Tom, Dongying
- EMP sequencing
- Re-requested sample list from Gilbert lab (Sarah Owen will send it soon)
- Need to select samples that complement the Tara Oceans and Gulf of Mexico projects
- E.g., fill in global coverage to enable a global map or pick a different ecosystem with interesting ecology
- They are going to start doing rapid-run sequencing on the HiSeq machine, which will give 250 bp paired-end (PE) reads
- Stephen: Can we keep the insert size short? Jonathan: Yes, with a little variation in insert length
- Jonathan: Showed BGI the Argonne price list
- SFams updating issues
- See last week's notes for a summary of the issues
- Proposed fix:
- Wipe round 2 results
- Do a fast BLAST search before the HMM search
- How much time to invest in continuing to update the database?
- How automated can we make this?
- What about people who will come to depend on it?
- It is important to keep updating it if we need it for our biogeographical studies
- Any traction on getting someone else to take this on?
- Jonathan: No one seems ready to take on a more rapid updating approach yet (it would need more money)
- Stephen: lots of ideas for further improvements
- Pangenome idea: ignore identical or similar genomes
- Improving the MCL step, or adding steps afterwards to filter, split, or combine families
- How to include metagenomic data in the build?
- Tim will implement Stephen's solution to the duplicate sequence problem
- Then reanalyze the round 2 genomes to see if this solves the problem
- If the two-step search is also used in QC, some other bad families might improve
- Think about an automated system for handling bad families
- Stephen: This was probably also a problem between round 0 and round 1
- Katie: Why is this happening?
- Small families with a bogus member
- Large families that are diverse and therefore have a low information content HMM
- A sequence that its family's HMM doesn't classify back into the family is a bogus member (a problem with MCL?)
- Dongying: Connections can get added in the inflation step of MCL
- Ladan: What about using a lower inflation parameter? Tom: We are trying to get larger clusters; also, this didn't help much overall
- Stephen: What fast BLAST-like tool do people recommend?
- USEARCH and CD-HIT
- Already looked at RAPSearch, LAST
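The proposed two-step search (a cheap first pass, then the expensive HMM search only on families that pass it) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the SFams pipeline: the shared-k-mer `prefilter` stands in for a real fast search tool such as USEARCH, and `smith_waterman` is a hypothetical placeholder for the HMM search step. All function names, the k-mer size, and the `min_shared` threshold are made up for the example.

```python
from typing import Dict, List, Optional, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query: str, families: Dict[str, List[str]], k: int = 3,
              min_shared: int = 2) -> List[str]:
    """Cheap first pass (stand-in for a fast search tool): keep families that
    share at least `min_shared` k-mers with the query (hypothetical threshold)."""
    q = kmers(query, k)
    hits = []
    for name, members in families.items():
        fam_kmers = set().union(*(kmers(m, k) for m in members))
        if len(q & fam_kmers) >= min_shared:
            hits.append(name)
    return hits


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score; placeholder for the expensive HMM step."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def two_step_search(query: str, families: Dict[str, List[str]], k: int = 3,
                    min_shared: int = 2) -> Tuple[Optional[str], Dict[str, int]]:
    """Score only the families that survive the prefilter; return the best one."""
    candidates = prefilter(query, families, k, min_shared)
    scores = {name: max(smith_waterman(query, m) for m in families[name])
              for name in candidates}
    best = max(scores, key=scores.get) if scores else None
    return best, scores
```

For example, `two_step_search("MKVLAAGIL", {"famA": ["MKVLAAGIV", "MKVLSAGIV"], "famB": ["WWHHPPQQR"]})` returns `("famA", {"famA": 16})`: `famB` shares no k-mers with the query, so the expensive scorer never runs on it. The saving comes from the same place in a real pipeline: most families are discarded before the slow step.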
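The notes don't describe Stephen's fix for the duplicate sequence problem, so the following shows only one generic approach in the spirit of the pangenome idea. It collapses exact-duplicate sequences to a single representative before clustering and keeps a map back to every ID the representative stands for. The function name and input format are hypothetical, and sequence IDs are assumed to be unique.

```python
import hashlib
from typing import Dict, List, Tuple


def collapse_duplicates(
    records: List[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Collapse exact-duplicate sequences (case-insensitive).

    records: (sequence_id, sequence) pairs; IDs are assumed unique.
    Returns (representatives, members):
      representatives: kept ID -> its sequence (first occurrence wins)
      members: kept ID -> every ID that carries the same sequence
    """
    first_id_for_hash: Dict[str, str] = {}
    representatives: Dict[str, str] = {}
    members: Dict[str, List[str]] = {}
    for seq_id, seq in records:
        # A fixed-size digest keeps the lookup table small for long sequences.
        digest = hashlib.sha1(seq.upper().encode()).hexdigest()
        if digest not in first_id_for_hash:
            first_id_for_hash[digest] = seq_id
            representatives[seq_id] = seq
        rep = first_id_for_hash[digest]
        members.setdefault(rep, []).append(seq_id)
    return representatives, members
```

With `[("g1_p1", "MKVL"), ("g2_p1", "mkvl"), ("g3_p7", "WWHH")]`, only `g1_p1` and `g3_p7` would go on to clustering, and `members["g1_p1"]` records that `g2_p1` was folded into it. This only handles identical sequences. Ignoring merely similar genomes would need identity-threshold clustering, for example with CD-HIT.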
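The bogus-member criterion in the notes says a member is suspect if its own family's HMM doesn't classify it back. A check for that could look roughly like the sketch below. It assumes a score table has already been computed, for example bit scores from searching every member against every family HMM. The `scores` dictionary layout and the function name are hypothetical.

```python
from typing import Dict, List


def find_bogus_members(
    families: Dict[str, List[str]],
    scores: Dict[str, Dict[str, float]],
) -> Dict[str, List[str]]:
    """Flag members that their own family's HMM does not classify back.

    families: family name -> member sequence IDs.
    scores: sequence ID -> {family name: score} (hypothetical input format).
    A member is flagged if it has no hit to its own family, or if some other
    family's HMM scores it strictly higher than its own (ties are not flagged).
    """
    bogus: Dict[str, List[str]] = {}
    for fam, members in families.items():
        for seq_id in members:
            hits = scores.get(seq_id, {})
            own = hits.get(fam)
            if own is None or any(s > own for f, s in hits.items() if f != fam):
                bogus.setdefault(fam, []).append(seq_id)
    return bogus
```

For instance, with `families = {"fam1": ["a", "b", "c"], "fam2": ["d"]}` and scores where `b` hits `fam2` (80.0) better than `fam1` (15.0) and `c` has no hits at all, the result is `{"fam1": ["b", "c"]}`. Flagged members could feed an automated rule for handling bad families. Note that a diverse family with a low-information-content HMM may fail this check for many members at once, so a high flag rate for one family may point to the family itself rather than to individual bogus members.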