Moore Notes 1 19 10
From OpenWetWare
Group Call: Steve, Tom, Morgan, Guillaume, Katie, Sam, Josh
- Reports
- Quarterly report - does Jonathan need anything from group?
- Annual report due end of Feb
- How to use conference call time?
- individual updates (1-2 topics per call)
- leave room for stuff that comes up
- project conversations (rather than separate time)
- Tracking system is set up - start using it!
- Protein family clustering of reference data (genomes)
- All-vs-all BLAST analysis is computationally hard (due to families with lots of paralogs)
- About 1000 genomes in microbeDb, extending to IMG (maybe unpublished GEBA genomes)
- Guillaume (w/ Morgan) is working through IMG genomes
- Could extend to eukaryotes
- Trying to deal with genomes that appear twice (draft and final) and other cleanup issues
- Which sequence identifier to use (IMG vs. GenBank etc.)?
- Dongying has done MCL clustering with 100 genomes, all proteins
- How to add more genomes?
- Use Dongying's families to search for more copies in other genomes
- Or start from scratch
- Plan: build HMMs and trees out of these for downstream analyses
- Tom is working on pipeline to add reads to Dongying's families
- Morgan is going to look for sequences that don't hit these families and cluster them
- What features do people want? Talk to Tom et al.
- Currently planned features: sequences, annotation, alignments, phylogeny (ref seqs only), scores/metrics, rates of evolution, HMM profiles
- RNA genes (e.g. rRNA for OTU projects, tRNAs)
- maybe separate db if just SSU rRNA
- not always well annotated
- Separate part of db/objects for specific read data set analyses?
- Get edhar MySQL logins from Morgan if you want to use it
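A minimal sketch of the duplicate-genome cleanup mentioned above (the same genome appearing as both draft and final). The record fields (`id`, `organism`, `status`) and the example entries are hypothetical, not the actual microbeDb or IMG schema; the rule here is simply to keep one record per organism and prefer the final version.

```python
def dedupe_genomes(records):
    """Keep one record per organism, preferring 'final' over 'draft'.

    `records` is a list of dicts with hypothetical keys:
    'id', 'organism', 'status' ('draft' or 'final').
    """
    rank = {"final": 0, "draft": 1}  # lower rank wins
    best = {}
    for rec in records:
        # Normalize the organism name so trivial differences do not split entries
        key = rec["organism"].strip().lower()
        current = best.get(key)
        if current is None or rank[rec["status"]] < rank[current["status"]]:
            best[key] = rec
    return list(best.values())


# Made-up example records, for illustration only
records = [
    {"id": "g1", "organism": "Example organism A", "status": "draft"},
    {"id": "g2", "organism": "Example organism A", "status": "final"},
    {"id": "g3", "organism": "Example organism B", "status": "draft"},
]
kept = dedupe_genomes(records)
print([r["id"] for r in kept])
```

Matching on organism name is the weakest part of this sketch; which identifier to match on depends on the IMG vs. GenBank question raised above.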
- OTU group update
- Tom: summary of pipeline
- What data (besides GOS) to analyze for manuscript?
- 56 projects in CAMERA (many samples per project), quite a few are new this year
- Story we'd like to tell: hit rare biome, find things you can't find without metagenomic data
- Considerations: complex/diverse community, 454 sequencing (vs. Sanger), PCR and metagenomic data for comparison (check Josh's table or try CAMERA SQL query), published (?)
- JL: Is a diverse data set needed to look at PCR bias? TS: competition for template is higher/recurring in a complex community
- Do we want a low diversity community (acid mine, selected GOS sample)? Or simulate different levels of diversity w/ and w/out PCR bias?
- ML: Total diversity or specific novel branches/clades? TS: Probably rarefaction type analysis, but will look for novel lineages
- ML: Could higher sequencing error in 454 lead to false signal of greater diversity? TS: try to compare metagenomic and PCR with same sequencing method. Also, we would use quality filtering.
- Tom and Steve will follow up on picking a data set or two
- Tom and Sam will follow up on simulations
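A minimal sketch of the "rarefaction type analysis" discussed above, assuming the standard hypergeometric rarefaction formula (expected number of OTUs in a random subsample of n reads drawn without replacement). The notes do not say which method the pipeline will use, so both the formula choice and the example counts are assumptions.

```python
from math import comb


def rarefy(otu_counts, n):
    """Expected number of OTUs observed in a subsample of n reads.

    otu_counts: reads per OTU in the full sample.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N = total reads.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when the OTU cannot be missed entirely
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


# Made-up counts for illustration: one dominant OTU and several rare ones
counts = [50, 20, 10, 5, 2, 1, 1, 1]
curve = [(n, rarefy(counts, n)) for n in range(0, sum(counts) + 1, 10)]
for n, expected in curve:
    print(n, round(expected, 2))
```

Computing the same curve for PCR and metagenomic data sets at equal subsample sizes would be one way to compare them on equal footing.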
- Next time:
- Josh
- ranges
- null models
- James
- others
- Josh