Moore Notes 12 4 13

From OpenWetWare

Group Call

  • Participants: Katie, Jonathan, Tom, Stephen, Guillaume, Dongying, Ladan, Patrick
  • Jonathan: Microbial ecology in N Atlantic
    • Healy Hamilton (ex-Berkeley/Cal Academy)
    • Now at foundation that specializes in biodiversity databases
    • NOAA call for proposals for biodiversity databases in N Atlantic
    • She will try to convince NOAA to collect microbial samples on the N Atlantic ships
  • Tom/Guillaume: Talked to Morgan about PICRUSt
    • Morgan thinks that the PICRUSt failures are probably due to gaps in the reference database
    • Compare ShotMAP vs. MG-RAST (SEED annotations converted to KEGG) to confirm that ShotMAP abundances agree with MG-RAST
    • Will collapse KEGG orthology groups (KOs) into pathways
    • Will also check the software version
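
A minimal sketch of the comparison described above, assuming both tools' outputs have already been reduced to per-KO abundance tables. All identifiers and numbers here (`KO_a`, `pw_1`, the abundance values) are made up for illustration, and having a KO that sits in several pathways count fully toward each of them is just one possible convention. Agreement is summarized with a Spearman rank correlation, which is an assumed choice and not something decided on the call.

```python
from collections import defaultdict


def collapse_to_pathways(ko_abundance, ko_to_pathways):
    """Sum KO abundances into pathway abundances.

    A KO that maps to several pathways contributes its full abundance
    to each of them (an assumed convention).
    """
    pathway_abundance = defaultdict(float)
    for ko, abundance in ko_abundance.items():
        for pathway in ko_to_pathways.get(ko, []):
            pathway_abundance[pathway] += abundance
    return dict(pathway_abundance)


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_tools(abund_a, abund_b):
    """Rank-correlate two abundance tables over the pathways they share."""
    shared = sorted(set(abund_a) & set(abund_b))
    return spearman([abund_a[p] for p in shared], [abund_b[p] for p in shared])


# Toy data, invented for illustration only.
ko_to_pathways = {"KO_a": ["pw_1"], "KO_b": ["pw_1", "pw_2"], "KO_c": ["pw_3"]}
shotmap_kos = {"KO_a": 2.0, "KO_b": 1.0, "KO_c": 5.0}
mgrast_kos = {"KO_a": 4.0, "KO_b": 3.0, "KO_c": 20.0}

shotmap_pw = collapse_to_pathways(shotmap_kos, ko_to_pathways)
mgrast_pw = collapse_to_pathways(mgrast_kos, ko_to_pathways)
agreement = compare_tools(shotmap_pw, mgrast_pw)
```

A rank correlation only checks that the two tools order pathways the same way. If absolute abundances matter, the tables would need to be normalized to a common scale first.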
  • Josh: Any news on Tara Oceans?
  • Stephen: Relative abundance of plasmids across samples
    • Could they explain differences in average genome coverage?
      • JE: Yes, because copy number of plasmids can be very high (varies a lot)
    • JE: Plasmids are very hard to identify, though there are a couple of marker genes
      • Chromosome partitioning factors for plasmids (parA, parB) tend to be phylogenetically clustered, so they can be identified
      • In some cases, there are genes that consistently occur on plasmids
      • Some people look for assemblies that circularize (this requires high coverage)
    • SN: What about mapping reads to plasmids in IMG?
      • JE: Thinks HMP did this
      • JE: Similar to problem of finding free phage DNA
  • Patrick: Correcting functional signals for taxonomic abundance
    • To what extent are differences in gene family abundance driven by taxonomic variation?
    • JE: PICRUSt gives the expected distribution of functional genes for a given taxonomic composition
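
One way to picture the correction discussed above: compute the gene family abundances expected from taxonomy alone, then look at what is left over in the observed data. The sketch below is a simplified illustration and not PICRUSt's implementation. It assumes a per-taxon gene copy table is already available, and the taxa, genes, and numbers are hypothetical. Using a plain difference as the residual is also an assumption; a log ratio would be another option.

```python
def expected_gene_abundance(taxon_abundance, gene_copies):
    """Expected abundance of each gene family given the taxa present.

    expected[gene] = sum over taxa of (taxon abundance * gene copies in taxon)
    """
    expected = {}
    for taxon, abundance in taxon_abundance.items():
        for gene, copies in gene_copies.get(taxon, {}).items():
            expected[gene] = expected.get(gene, 0.0) + abundance * copies
    return expected


def taxonomy_adjusted_signal(observed, expected):
    """Observed minus expected abundance for every gene seen in either table."""
    genes = set(observed) | set(expected)
    return {g: observed.get(g, 0.0) - expected.get(g, 0.0) for g in genes}


# Toy data, invented for illustration only.
taxon_abundance = {"taxon_A": 0.5, "taxon_B": 0.5}
gene_copies = {
    "taxon_A": {"gene_x": 2, "gene_y": 1},
    "taxon_B": {"gene_x": 0, "gene_y": 1},
}
observed = {"gene_x": 1.0, "gene_y": 2.0}

expected = expected_gene_abundance(taxon_abundance, gene_copies)
residual = taxonomy_adjusted_signal(observed, expected)
```

In this toy case `gene_x` is fully explained by taxonomy (residual of zero), while `gene_y` is more abundant than the taxonomic composition predicts.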
  • Ladan: slides
    • Extreme set clustering of halophile protein families
    • Predicting GO annotations of genes based on clustering with annotated genes
    • Comparison of extreme sets vs. MCL, with two different rules for assigning GO to nodes within a cluster
    • How to determine precision and recall?
      • KP: Treat each annotation of a GO term to a gene (yes vs. no) as an independent test
      • TS/KP: Could partition genes based on characteristics (e.g., number of GO terms, extreme set properties) and check if performance differs across partitions
    • Stephen: Try extreme sets on a benchmark where MCL has already been validated
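
A minimal sketch of the precision/recall suggestion above, treating every (gene, GO term) pair as its own yes/no test and then repeating the calculation within partitions of genes. The gene names, term labels, and the choice to partition by number of known GO terms are hypothetical examples. Note that a missing annotation in the reference set is counted as a "no" here, which may undercount correct predictions when the reference annotations are incomplete.

```python
def to_pairs(annotations):
    """Flatten {gene: set of GO terms} into a set of (gene, term) pairs."""
    return {(gene, term) for gene, terms in annotations.items() for term in terms}


def precision_recall(predicted, truth):
    """Precision and recall over (gene, GO term) pairs."""
    predicted_pairs = to_pairs(predicted)
    true_pairs = to_pairs(truth)
    true_positives = len(predicted_pairs & true_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    return precision, recall


def precision_recall_by_partition(predicted, truth, partition_of):
    """Compute precision and recall separately for each partition of genes.

    partition_of maps a gene name to a partition label.
    """
    partitions = {}
    for gene in set(predicted) | set(truth):
        partitions.setdefault(partition_of(gene), set()).add(gene)
    results = {}
    for label, genes in partitions.items():
        sub_predicted = {g: predicted.get(g, set()) for g in genes}
        sub_truth = {g: truth.get(g, set()) for g in genes}
        results[label] = precision_recall(sub_predicted, sub_truth)
    return results


# Toy data, invented for illustration only.
predicted = {"gene1": {"term_A", "term_B"}, "gene2": {"term_A"}}
truth = {"gene1": {"term_A"}, "gene2": {"term_A", "term_C"}}

overall = precision_recall(predicted, truth)

# Example partition: number of known GO terms per gene.
by_term_count = precision_recall_by_partition(
    predicted, truth, lambda gene: len(truth.get(gene, set()))
)
```

The same partition function could instead return an extreme set property or a cluster size bin, which would show whether performance differs across those groups.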