Moore Notes 2 16 10
- OTU pipeline
- Annual Report
- Update wiki, e.g. papers in progress
DESTINATION MANUSCRIPT on OTU pipeline: TOM'S DISCUSSION
Workflow has been completely packaged and modularized. Input is a raw FASTA file of reads. Output: sequences homologous to rRNA, alignments of those sequences to reference sequences, a distance matrix, and OTUs from Mothur.
ESPRIT does the first-round pass. It's fast because it doesn't calculate pairwise distances that fall outside the clustering threshold. The hierarchical algorithm is the same in ESPRIT and MOTHUR; ESPRIT just reduces the search space.
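A minimal sketch of the search-space reduction idea, not ESPRIT's actual code. It assumes a cheap k-mer distance is used as a pre-filter so the expensive distance is computed only for pairs that could fall within the clustering cutoff. The function names, the Jaccard-based k-mer distance, the normalized edit distance (a stand-in for an alignment-based distance), and both cutoffs are illustrative assumptions.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """All k-mers present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Cheap distance: 1 - Jaccard similarity of the two k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def edit_distance(a, b):
    """Levenshtein distance normalized by the longer length.
    Hypothetical stand-in for an alignment-based distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def sparse_distances(seqs, cutoff, kmer_cutoff, k=3):
    """Return ({(i, j): distance}, skipped) keeping only pairs within cutoff.
    Pairs failing the k-mer pre-filter never get the expensive distance."""
    kept, skipped = {}, 0
    for i, j in combinations(range(len(seqs)), 2):
        if kmer_distance(seqs[i], seqs[j], k) > kmer_cutoff:
            skipped += 1  # too dissimilar: skip the expensive comparison
            continue
        d = edit_distance(seqs[i], seqs[j])
        if d <= cutoff:
            kept[(i, j)] = d
    return kept, skipped


reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]  # toy reads, not real data
kept, skipped = sparse_distances(reads, cutoff=0.15, kmer_cutoff=0.5)
```

The result is a sparse distance list rather than a full matrix, which is what lets the downstream hierarchical clustering stay the same while touching far fewer pairs.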
We have a tree and deconvolute it to get OTUs. Along the way we build a phylogenetic tree; conventional methods do not.
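The notes do not say which rule is used to deconvolute the tree, so the following is only one possible reading: walk down from the root and report the largest clades whose longest leaf-to-leaf path is within a cutoff. The tree encoding (a leaf is a string, an internal node is a list of (child, branch_length) pairs), the function names, and the toy branch lengths are all assumptions for illustration.

```python
def clade_stats(node):
    """Return (leaf_names, height, diameter) for the clade rooted at node.
    height = longest node-to-leaf path, diameter = longest leaf-to-leaf path."""
    if isinstance(node, str):
        return [node], 0.0, 0.0
    names, heights, diameter = [], [], 0.0
    for child, length in node:
        c_names, c_height, c_diameter = clade_stats(child)
        names += c_names
        heights.append(c_height + length)
        diameter = max(diameter, c_diameter)
    top_two = sorted(heights, reverse=True)[:2]
    if len(top_two) == 2:
        # Longest path passing through this node joins its two deepest subtrees.
        diameter = max(diameter, sum(top_two))
    return names, max(heights), diameter


def tree_to_otus(node, cutoff):
    """Split the tree into maximal monophyletic groups with diameter <= cutoff."""
    names, _, diameter = clade_stats(node)
    if diameter <= cutoff:
        return [sorted(names)]
    otus = []
    for child, _ in node:
        otus += tree_to_otus(child, cutoff)
    return otus


# Toy tree, hypothetical branch lengths: ((A,B),(C,(D,E)))
toy_tree = [
    ([("A", 0.01), ("B", 0.01)], 0.10),
    ([("C", 0.02), ([("D", 0.01), ("E", 0.01)], 0.05)], 0.10),
]
tight = tree_to_otus(toy_tree, 0.03)
loose = tree_to_otus(toy_tree, 0.10)
```

Every OTU produced this way is a clade by construction, which is the property a distance-matrix clustering does not guarantee.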
We need to show that phylogeny-based clustering performs reasonably well compared to the traditional method.
Jonathan thinks that as we get to lower cutoffs, our method will differ from ESPRIT.
Furthest neighbor is the default algorithm used by MOTHUR. The choice of clustering algorithm, as Sogin's group predicts, makes a very big impact on OTU measures.
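To make the point about algorithm choice concrete, here is a small sketch of agglomerative clustering with a selectable linkage rule. It is not MOTHUR's implementation; the function name, the distance-dictionary layout, and the toy distances are assumptions. Four sequences are arranged in a chain where neighbors are 0.02 apart.

```python
def cluster(dist, n, cutoff, linkage="furthest"):
    """Agglomerative clustering of items 0..n-1.
    dist maps frozenset({i, j}) -> distance. Merging stops when the closest
    pair of clusters (under the chosen linkage) is farther apart than cutoff."""
    combine = {
        "furthest": max,                        # complete linkage
        "nearest": min,                         # single linkage
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    clusters = [[i] for i in range(n)]

    def between(a, b):
        return combine([dist[frozenset((i, j))] for i in a for j in b])

    while len(clusters) > 1:
        d, x, y = min(
            (between(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if d > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Toy chain of four sequences (hypothetical distances).
pairs = {(0, 1): 0.02, (1, 2): 0.02, (2, 3): 0.02,
         (0, 2): 0.04, (1, 3): 0.04, (0, 3): 0.06}
dist = {frozenset(p): d for p, d in pairs.items()}

furthest_otus = cluster(dist, 4, cutoff=0.03, linkage="furthest")
nearest_otus = cluster(dist, 4, cutoff=0.03, linkage="nearest")
```

On this toy input, furthest neighbor yields two OTUs while nearest neighbor chains everything into one, so the same distance matrix and cutoff give different OTU counts depending only on the linkage rule.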
Jonathan: is the lesson to use phylogeny and not clustering? He is suggesting a different way of pulling out OTUs, based on monophyletic groups. Why use a distance matrix?
Tom: let's talk about how the current implementation is working on specific data sets. GOS data from the Rusch paper. The GOS data has been dumped into the deprecated bin of CAMERA.
Metagenomic data appears to capture a slice of the biosphere that we can't see with the PCR data.
Steve: it would be useful to see the rarefaction curves [WGS & PCR] overlaid on one another, so we can really see whether the increased number of OTUs with metagenomic data is due to sampling effort.
Jonathan: if you think one of the two approaches is missing parts of the biosphere, it would be useful to map both phylogenies on top of one another. There is a resource at RDP that should help with this; Srijak knows about it.
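A sketch of how the two rarefaction curves could be computed so they can be compared at equal sampling depth. It uses the standard analytical expectation, E[S_n] = sum over OTUs of 1 - C(N - N_i, n) / C(N, n), where N is the total number of reads and N_i the reads in OTU i. The function names and the per-OTU read counts are made up for illustration, not taken from the GOS or gut data.

```python
from math import comb


def expected_otus(otu_counts, n):
    """Expected number of OTUs seen in a random subsample of n reads
    drawn without replacement from a sample with the given per-OTU counts."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


def rarefaction_curve(otu_counts, step=1):
    """List of (n, expected OTUs) for n = step, 2*step, ... up to the total."""
    total = sum(otu_counts)
    return [(n, expected_otus(otu_counts, n)) for n in range(step, total + 1, step)]


# Hypothetical per-OTU read counts for the two data types.
wgs_counts = [5, 3, 1, 1]   # 10 reads, 4 OTUs
pcr_counts = [12, 6, 2]     # 20 reads, 3 OTUs

wgs_curve = rarefaction_curve(wgs_counts)
pcr_curve = rarefaction_curve(pcr_counts)

# Compare at the depth of the smaller sample so effort is equal.
depth = min(sum(wgs_counts), sum(pcr_counts))
wgs_at_depth = expected_otus(wgs_counts, depth)
pcr_at_depth = expected_otus(pcr_counts, depth)
```

Plotting both curves on the same axes, or simply comparing the two values at the shared depth, separates a real difference in richness from a difference in the number of reads sampled.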
Distal Gut data: Metagenomic data and PCR data are getting binned into the same OTUs, so the method is working. We are seeing that many of the genes do not map onto anything currently known or identified in Greengenes.
Katie: if there really are things at the genus or family level that are not being identified, this is really important.
Dongying: Greengenes is missing a lot of information. Tom says it is important to show that something is novel with respect to both RDP and Greengenes.
Short Read Archive is the best place to find 454 data. That is where the Knight lab is posting their own data, in addition to their own database.
Tom: the tree structure of the phylogenies is looking pretty good. He could use some help; if you want to be an author, discuss with Tom what you can contribute.
Annual Report: Katie sent around a template. Jonathan wikified it. Fill it out.