Moore Notes 12 2 09
Subgroup Call
- OTUs
- version 1 of pipeline is done!
- Sam's simulation results might lead to modifications
- JE will talk to CAMERA; Tom will follow up Thurs
- JE's question about FastTree
- FastTree pseudo-counts method (with max=3.0) and mutual overlap with a third sequence account for non-overlapping reads
- Sam: we need to emulate this in the simulation analyses
- we use phylogenetic, not sequence, distance in MOTHUR
- Tom will follow up
- JG: Sogin and Welch talk
- did simulations to show that MOTHUR generates too many clusters with large pyrosequencing data sets
- new algorithm: pairwise alignment rather than multiple alignment
- James is using OTUs (computing metrics)
- posted raw and reformatted output of MOTHUR on edhar (see email)
- lots of noise: big variation in number of reads and number of OTUs across sites
- especially a few sites with few reads and/or OTUs
- maybe due to 1% cutoff, will try others
- idea from paper (Biers et al.): get most similar full-length sequence from Greengenes http://www.citeulike.org/group/6072/article/4095375
- Josh: do we still need the wrapper script?
- not essential right now
- good to have for pipeline
- publication on pipeline
- focus on the identification of OTUs?
- maybe interesting if we did a whole bunch of datasets
- probably need software (CAMERA? Just Perl module OK for pub?)
- and/or James's (Josh's?) analyses
- include simulations, e.g. just split up reference sequences
- comparison with Biers et al. results, Sogin paper? Jenna's data?
- Simulations
- Sam has simulated data on genbeo
- for a reference db with 20 sequences (5 chosen by maxPD, 15 randomly)
- all parameters on wiki
- 5 repeats of each combination
- this makes a lot of data sets
- directory name gives the details; file names within directories are the same (keep directories separate)
- will reorganize and put on edhar
- Tree building
- which methods? RAxML (two ways), FastTree, pplacer (?)
- which models?
- WAG (in AMPHORA) vs. JTT (more like FastTree)
- cat/cat+gamma
- Next steps
- Sam will make a wiki page with assignments
- Steve: let's set up naming conventions (Sam will do)
- Sam will keep track of data sets that have same parameter values due to rounding
- Steve: should share code/scripts over svn
- Tom: data should be in svn too
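Rough sketch for the simulation bookkeeping discussed above (5 repeats of each parameter combination, directory names that carry the details, and tracking combinations that end up with the same values after rounding). This is only an illustration: the parameter names and values (`read_len`, `coverage`), the directory-name pattern, and the rounding precision are all made up here; the real parameters are on the wiki and the real naming convention is still to be set.

```python
from collections import defaultdict
from itertools import product

# Hypothetical parameter grid (assumption); the real parameters are on the wiki.
PARAMS = {
    "read_len": [100, 250],
    "coverage": [0.5, 1.0],
}
N_REPEATS = 5  # "5 repeats of each combination"


def dir_name(combo, repeat, ndigits=2):
    """Build a directory name that encodes the parameter values and repeat.

    The name pattern is an assumption, not the group's agreed convention.
    """
    parts = [f"{key}-{round(value, ndigits)}" for key, value in sorted(combo.items())]
    return "_".join(parts) + f"_rep{repeat}"


def enumerate_datasets(params, n_repeats=N_REPEATS):
    """Return (combo, repeat) pairs for every combination and repeat."""
    keys = sorted(params)
    datasets = []
    for values in product(*(params[key] for key in keys)):
        combo = dict(zip(keys, values))
        for repeat in range(1, n_repeats + 1):
            datasets.append((combo, repeat))
    return datasets


def rounding_collisions(params, ndigits=2):
    """Group combinations that become identical once values are rounded."""
    keys = sorted(params)
    seen = defaultdict(list)
    for values in product(*(params[key] for key in keys)):
        rounded = tuple(round(value, ndigits) for value in values)
        seen[rounded].append(values)
    # Keep only rounded keys shared by more than one original combination.
    return {key: combos for key, combos in seen.items() if len(combos) > 1}


if __name__ == "__main__":
    for combo, repeat in enumerate_datasets(PARAMS):
        print(dir_name(combo, repeat))
```

Because the file names inside each directory are the same, a flat listing like this only stays unambiguous if the directory name is unique per combination and repeat; `rounding_collisions` flags the cases where rounding would break that.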