Moore Notes 1 19 10

From OpenWetWare

Group Call: Steve, Tom, Morgan, Guillaume, Katie, Sam, Josh

Reports
- Quarterly report - does Jonathan need anything from the group?
- Annual report due end of Feb

How to use conference call time?
- individual updates (1-2 topics per call)
- leave room for stuff that comes up
- project conversations (rather than separate time)

Tracking system is set up - start using it!

Protein family clustering of reference data (genomes)
- All-vs.-all BLAST analysis is computationally expensive (due to families with many paralogs)
- About 1000 genomes in microbeDb, extending to IMG (maybe unpublished GEBA genomes)
  - Guillaume (w/ Morgan) is working through IMG genomes
  - Could extend to eukaryotes
  - Trying to deal with genomes that appear twice (draft and final) and other cleanup issues
  - Which sequence identifier to use (IMG vs. GenBank, etc.)?
- Dongying has done MCL clustering with 100 genomes, all proteins
- How to add more genomes?
  - Use Dongying's families to search for more copies in other genomes
  - Or start from scratch
- Plan: build HMMs and trees out of these for downstream analyses
  - Tom is working on pipeline to add reads to Dongying's families
  - Morgan is going to look for sequences that don't hit these families and cluster them
- What features do people want? Talk to Tom et al.
  - Currently planned features: sequences, annotation, alignments, phylogeny (ref seqs only), scores/metrics, rates of evolution, HMM profiles
  - RNA genes (e.g. rRNA for OTU projects, tRNAs)
    - maybe separate db if just SSU rRNA
    - not always well annotated
  - Separate part of db/objects for specific read data set analyses?
- Get edhar MySQL logins from Morgan if you want to use it
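
One way the draft/final duplicate cleanup mentioned above could be sketched is shown here. This is only an illustration under assumptions: the record fields (`organism`, `status`, `genome_id`) and the rule "final beats draft" are hypothetical, not the actual microbeDb or IMG schema, and matching on organism name alone is likely too naive for real data.

```python
from typing import Dict, Iterable, List

# Hypothetical ranking: when an organism appears twice, the higher rank wins.
STATUS_RANK = {"draft": 0, "final": 1}


def dedupe_genomes(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one record per organism, preferring 'final' over 'draft'.

    Records are matched on a normalized organism name (an assumption;
    a taxon or project identifier would probably be more reliable).
    """
    best: Dict[str, Dict[str, str]] = {}
    for rec in records:
        key = rec["organism"].strip().lower()
        current = best.get(key)
        if current is None or STATUS_RANK[rec["status"]] > STATUS_RANK[current["status"]]:
            best[key] = rec
    return list(best.values())


# Made-up example records, for illustration only.
example = [
    {"organism": "Organism A", "status": "draft", "genome_id": "g1"},
    {"organism": "Organism B", "status": "final", "genome_id": "g2"},
    {"organism": "organism a ", "status": "final", "genome_id": "g3"},
]
kept = dedupe_genomes(example)
```

With the example records, the draft copy of "Organism A" is dropped in favor of its final version, and "Organism B" is kept unchanged.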

OTU group update
- Tom: summary of pipeline
- What data (besides GOS) to analyze for manuscript?
  - 56 projects in CAMERA (many samples per project), quite a few are new this year
  - Story we'd like to tell: hit rare biome, find things you can't find without metagenomic data
  - Considerations: complex/diverse community, 454 sequencing (vs. Sanger), PCR and metagenomic data for comparison (check Josh's table or try CAMERA SQL query), published (?)
  - JL: Is a diverse data set needed to look at PCR bias? TS: competition for template is higher/recurring in a complex community
  - Do we want a low diversity community (acid mine, selected GOS sample)? Or simulate different levels of diversity w/ and w/out PCR bias?
  - ML: Total diversity or specific novel branches/clades? TS: Probably rarefaction type analysis, but will look for novel lineages
  - ML: Could higher sequencing error in 454 lead to false signal of greater diversity? TS: try to compare metagenomic and PCR with same sequencing method. Also, we would use quality filtering.
- Tom and Steve will follow up on picking a data set or two
- Tom and Sam will follow up on simulations
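
The "rarefaction type analysis" mentioned above could look roughly like the following sketch. It is not the group's pipeline: the function name, the input format (a mapping from OTU label to read count), and the replicate count are all assumptions made for illustration. It subsamples reads without replacement at each depth and averages the number of distinct OTUs observed.

```python
import random
from typing import Dict, List, Tuple


def rarefaction_curve(
    otu_counts: Dict[str, int],
    depths: List[int],
    replicates: int = 100,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (depth, mean number of OTUs observed) for each sampling depth."""
    rng = random.Random(seed)
    # Expand counts into one entry per read, labeled by its OTU.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError("depth exceeds total number of reads")
        total = 0
        for _ in range(replicates):
            # Sample reads without replacement and count distinct OTUs.
            total += len(set(rng.sample(pool, depth)))
        curve.append((depth, total / replicates))
    return curve


# Made-up counts, for illustration only.
counts = {"otu_a": 5, "otu_b": 3, "otu_c": 2}
curve = rarefaction_curve(counts, depths=[1, 5, 10])
```

Comparing curves like this for a PCR-based and a metagenomic sample sequenced with the same method would be one way to frame the diversity comparison discussed above; quality filtering would happen before the counts are built.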

Next time:
- Josh
  - ranges
  - null models
- James
- others

