Moore Notes 1 19 10

From OpenWetWare

Group Call: Steve, Tom, Morgan, Guillaume, Katie, Sam, Josh

  • Reports
    • Quarterly report - does Jonathan need anything from group?
    • Annual report due end of Feb
  • How to use conference call time?
    • individual updates (1-2 topics per call)
    • leave room for stuff that comes up
    • project conversations (rather than separate time)
  • Tracking system is set up - start using it!
  • Protein family clustering of reference data (genomes)
    • All-vs-all BLAST analysis is computationally hard (due to families with many paralogs)
    • About 1000 genomes in microbeDb; extending to IMG (possibly including unpublished GEBA genomes)
      • Guillaume (w/ Morgan) is working through IMG genomes
      • Could extend to eukaryotes
      • Trying to deal with genomes that appear twice (draft and final) and other clean up issues
      • Which sequence identifier to use (IMG vs. GenBank, etc.)?
    • Dongying has done MCL clustering with 100 genomes, all proteins
    • How to add more genomes?
      • Use Dongying's families to search for more copies in other genomes
      • Or start from scratch
    • Plan: build HMMs and trees out of these for downstream analyses
      • Tom is working on a pipeline to add reads to Dongying's families
      • Morgan is going to look for sequences that don't hit these families and cluster them
    • What features do people want? Talk to Tom et al.
      • Currently planned features: sequences, annotation, alignments, phylogeny (ref seqs only), scores/metrics, rates of evolution, HMM profiles
      • RNA genes (e.g. rRNA for OTU projects, tRNAs)
        • maybe separate db if just SSU rRNA
        • not always well annotated
      • Separate part of db/objects for specific read data set analyses?
    • Get edhar MySQL logins from Morgan if you want to use it
  • OTU group update
    • Tom: summary of pipeline
    • What data (besides GOS) to analyze for manuscript?
      • 56 projects in CAMERA (many samples per project), quite a few are new this year
      • Story we'd like to tell: hit rare biome, find things you can't find without metagenomic data
      • Considerations: complex/diverse community, 454 sequencing (vs. Sanger), both PCR and metagenomic data for comparison (check Josh's table or try a CAMERA SQL query), published (?)
      • JL: Is a diverse data set needed to look at PCR bias? TS: competition for template higher/recurring in complex community
      • Do we want a low diversity community (acid mine, selected GOS sample)? Or simulate different levels of diversity w/ and w/out PCR bias?
      • ML: Total diversity or specific novel branches/clades? TS: Probably rarefaction type analysis, but will look for novel lineages
      • ML: Could higher sequencing error in 454 lead to false signal of greater diversity? TS: try to compare metagenomic and PCR with same sequencing method. Also, we would use quality filtering.
    • Tom and Steve will follow up on picking a data set or two
    • Tom and Sam will follow up on simulations
  • Next time:
    • Josh
      • ranges
      • null models
    • James
    • others
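The protein family clustering plan (all-vs-all BLAST, then MCL clustering as Dongying did for 100 genomes) can be illustrated with a toy Markov Cluster (MCL) sketch. This is not the group's pipeline: the notes do not say which MCL implementation was used, and at the scale of ~1000 genomes the standalone mcl program would be the practical choice. The function name `mcl_clusters` and the use of a BLAST-derived score as the edge weight are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def _normalize(m: Matrix) -> Matrix:
    """Column-normalize so each column sums to 1 (a stochastic matrix)."""
    size = len(m)
    col_sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(size)]
            for i in range(size)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mcl_clusters(edges: Dict[Tuple[str, str], float], inflation: float = 2.0,
                 max_iter: int = 100, tol: float = 1e-6) -> List[List[str]]:
    """Toy MCL over a symmetric similarity graph.

    `edges` maps (protein_a, protein_b) -> similarity weight; here we assume
    (hypothetically) a score derived from all-vs-all BLAST hits.
    """
    nodes = sorted({n for pair in edges for n in pair})
    idx = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    # Self-loops are standard MCL practice; they damp oscillation.
    m = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for (a, b), w in edges.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = w
    m = _normalize(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                # expansion step
        inflated = _normalize([[v ** inflation for v in row]    # inflation step
                               for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(size) for j in range(size))
        m = inflated
        if delta < tol:
            break
    # Attractor rows keep non-zero mass; their non-zero columns form a cluster.
    seen, clusters = set(), []
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return sorted(clusters)
```

The inflation parameter controls granularity: higher values split the graph into more, tighter families. The pure-Python matrix product is O(n^3) per iteration, which is fine for a toy example but not for a real all-vs-all run.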
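For the OTU manuscript, TS suggested a "rarefaction type analysis". A minimal sketch of analytic (Hurlbert-style) rarefaction follows: given read counts per OTU, it computes the expected number of OTUs observed in a random subsample of n reads, E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], where N is the total number of reads and N_i the reads in OTU i. The function names and the input format (a plain list of per-OTU counts) are assumptions, not part of the group's pipeline.

```python
from math import comb
from typing import List, Sequence, Tuple


def rarefy(otu_counts: Sequence[int], n: int) -> float:
    """Expected number of OTUs in a subsample of n reads (drawn without replacement)."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = comb(total, n)
    # 1 - P(OTU i is absent from the subsample), summed over OTUs.
    # math.comb returns 0 when n > total - c, i.e. the OTU must be present.
    return sum(1 - comb(total - c, n) / denom for c in otu_counts if c > 0)


def rarefaction_curve(otu_counts: Sequence[int], step: int) -> List[Tuple[int, float]]:
    """(subsample size, expected OTUs) pairs at regular intervals."""
    total = sum(otu_counts)
    return [(n, rarefy(otu_counts, n)) for n in range(step, total + 1, step)]
```

Comparing curves for matched metagenomic and PCR data sets sequenced with the same method (as TS proposed) would show whether one approach keeps accumulating OTUs where the other saturates.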
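Tom and Sam's follow-up on simulating "different levels of diversity w/ and w/out PCR bias" could start from something like the toy model below. Everything in it is a hypothetical modeling choice, not a decision from the call: true OTU abundances follow a geometric series (the `ratio` parameter controls evenness, `n_otus` controls richness), and PCR bias is modeled by giving each OTU a random per-cycle amplification efficiency in [0.5, 1.0]. With `pcr_cycles=0` the reads mimic unbiased metagenomic sampling.

```python
import random
from collections import Counter


def simulate_reads(n_otus: int, n_reads: int, ratio: float = 0.8,
                   pcr_cycles: int = 0, seed: int = 0) -> Counter:
    """Return read counts per OTU index for one simulated sample (toy model)."""
    rng = random.Random(seed)
    # Geometric abundances: a few dominant OTUs and a long rare tail.
    abundances = [ratio ** i for i in range(n_otus)]
    if pcr_cycles > 0:
        # Hypothetical bias: each OTU amplifies by (1 + efficiency) per cycle.
        efficiencies = [rng.uniform(0.5, 1.0) for _ in range(n_otus)]
        abundances = [a * (1 + e) ** pcr_cycles
                      for a, e in zip(abundances, efficiencies)]
    reads = rng.choices(range(n_otus), weights=abundances, k=n_reads)
    return Counter(reads)
```

For example, comparing `len(simulate_reads(200, 1000))` with `len(simulate_reads(200, 1000, pcr_cycles=30))` contrasts the number of OTUs observed with and without the simulated amplification step; varying `n_otus` and `ratio` gives the different diversity levels.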