Moore Notes 8 19 09
From OpenWetWare
Group Call
- Sam will email questions about simulations
- Update on svn or git server
- Srijak will follow up
- Problems with Skype
- gizmo
- Gladstone teleconference system
- GOS OTUs
- (1) identify 16S reads
- mpiblast on genbeo
- a large query set against a relatively small db isn't an optimal situation for mpiblast
- Tom is running regular BLAST on a single node for now
- use STAP / reduce the db size in the initial pass (use all of greengenes for classification later)
- Dongying: reduce greengenes to a small set of sequences (e.g. the ~300 from STAP)
- JE: there are two versions; use the smaller set of ~300 organisms (it spans the tree)
- switching roles of db and query
- how to deal with small fragments of 16S?
- mostly on ends of longer reads
- GOS is Sanger sequencing
- see paper from CAMERA (Bioinformatics, May 2009)
- (2) align small fragments
- Program to use?
- STAP
- mothur tools
- NAST alignment server or shorter fragment version (GAST?)
- infernal (via RDP?) - How automated?
- Small reads
- Dongying: reads <200 bp are hard
- most reads are longer
- ignore tiny fragments
- replace a small read with a longer sequence from the db if the read is a near-perfect match (only to assign the read, not to define an OTU)
- use gap penalties to stop reads from being split
- (3) find OTUs
- Katie: do we need a maybe/unknown category (besides present/absent)?
- Dongying looked at which parts of the molecule are most informative
- might want to use a minimum alignment length
- develop a reliability measure
- James: for abundance, don't want to throw away any data
- Jess: what about using other genes (i.e. protein-coding genes)?
- Hard to define similarity cutoffs
- Schloss paper is a first step
- Cuts on a tree, rather than percent identity
- Marcel (Eisen lab) has already compared cuts on a tree with percent identity on 16S
- monophyletic groups correspond with 99% or 97% OTUs
- but this wasn't tried on fragmentary data
- a tree might be better for non-overlapping reads
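The percent-identity cutoffs and the minimum alignment length raised under (3) can be sketched roughly as below. This is only an illustration, not the pipeline the group settled on and not the method from the Schloss paper: the function names `percent_identity` and `greedy_otus`, the greedy seed-based clustering, and the default `min_overlap` of 100 columns are all assumptions made for the example. Reads are assumed to be rows of one shared alignment, with `-` marking gaps.

```python
from typing import List, Optional

GAP = "-"  # assumed gap character in the shared alignment


def percent_identity(a: str, b: str, min_overlap: int = 100) -> Optional[float]:
    """Fraction of matching bases over columns where both reads have a base.

    Returns None when the two reads share fewer than `min_overlap` columns,
    i.e. the "maybe/unknown" case for fragmentary, barely overlapping reads.
    """
    if len(a) != len(b):
        raise ValueError("reads must be rows of the same alignment")
    overlap = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP or y == GAP:
            continue  # column not covered by both reads
        overlap += 1
        if x == y:
            matches += 1
    if overlap < min_overlap:
        return None
    return matches / overlap


def greedy_otus(aligned: List[str], cutoff: float = 0.97,
                min_overlap: int = 100) -> List[List[int]]:
    """Greedy clustering: each read joins the first OTU whose seed (first
    member) it matches at >= cutoff; otherwise it starts a new OTU.

    Reads with too little overlap to any seed also start their own OTU,
    which is exactly the weakness for non-overlapping reads noted above.
    """
    otus: List[List[int]] = []
    for i, seq in enumerate(aligned):
        for members in otus:
            ident = percent_identity(seq, aligned[members[0]], min_overlap)
            if ident is not None and ident >= cutoff:
                members.append(i)
                break
        else:
            otus.append([i])
    return otus


if __name__ == "__main__":
    # Toy alignment; min_overlap is lowered so the short rows are usable.
    reads = ["ACGTACGTAC", "ACGTACGTAC", "TTTTTTTTTT", "ACGT------", "AC--------"]
    print(greedy_otus(reads, cutoff=0.97, min_overlap=4))
```

The `None` return keeps "not enough overlap to say" separate from "different", which is one way to represent the maybe/unknown category Katie asked about. A tree-based cut would replace `greedy_otus` entirely and is not attempted here.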