Moore Notes 1 15 14

From OpenWetWare

Group Call

  • Participants: Jonathan, Katie, Tom, Josh, Sarah, Dongying, Guillaume, Stephen, Patrick
  • PICRUSt update http://edhar.genomecenter.ucdavis.edu/~gjospin/picrust_test/picrust_test/Jan_13_2014_modules/
    • Found a bug in the mapping from KEGG Orthology groups (KOs) to modules, but fixing it did not change the results
      • Modules are better correlated than individual KOs, but still not highly correlated between PICRUSt and ShotMAP
    • The fix also made it run faster (it now searches only by module ID instead of across all family members)
    • Ran full data set plus top 100
      • Correlations a bit higher for top 100
      • However residuals are pretty large
    • How bad would PICRUSt output be for niche modeling?
      • Josh recommends looking at logit-transformed relative abundances
    • Patrick: could be useful to look at each KO (or module) across samples to see if it is correlated between PICRUSt and ShotMAP
    • Let's put this on hold for now, and see how long it takes for metagenomic data to come
    • Patrick might want to apply PICRUSt to estimate protein family abundances in mammalian microbiomes with metabolomics data
  • EFI collaboration
    • They do all analyses based on UniProt IDs (vs. genomes)
      • KB database as source of input sequences for current analysis pipeline
    • How many Sfams have no Pfam annotation?
      • Annotated Pfams: ~61% of Sfams have no Pfam
      • Pfam-B families (unannotated): TBD
    • What percent of InterPro has an Sfam?
      • For comparison, only ~19% of InterPro (UniProtKB) has no Pfam
    • Should annotate and score families
      • Jonathan: should use metrics to score these (phylogenetic breadth), not just family size
      • Tom: also how connected is Sfam in family network space
      • Stephen: from gut (genomes plus assembled proteins), consistently present, maybe correlated with a phenotype
      • Jonathan: antibiotic resistance and synthesis genes
      • This framework could be useful for multiple environments and future grants
      • Call it the "most wanted" list
    • Downstream analyses they can do
      • Genome neighborhood based functional prediction
      • For pathway analysis, start with input (e.g., solute carriers) and try to annotate the rest of the pathway
      • Synthesize/clone proteins
      • Crystal structures
      • In vitro enzyme activity assays
    • Blue Waters: 20 million integer processor hours
    • To do: get list of InterPro entries with no Pfam (or other) annotation
  • Tim Laurent is leaving
    • Get him to document what he did with Sfams several months ago (build 2 families)
    • Ad posted
    • Jonathan will follow up with Katie re: hire and support from his lab on EFI project
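
Sketch for the PICRUSt item above: a minimal Python illustration of the two suggestions from the call, logit-transforming relative abundances (Josh) and correlating each KO or module across samples between the predicted and observed profiles (Patrick). Nothing here comes from the actual pipeline. The function names, the pseudocount `eps`, the choice of Pearson correlation, and the dict-of-lists input layout are all assumptions made for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Logit of a relative abundance, clipped to (eps, 1 - eps).

    The clipping value eps is an assumed pseudocount so that
    abundances of exactly 0 or 1 stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlate_across_samples(predicted, observed):
    """Per-feature correlation across samples on the logit scale.

    predicted, observed: dict mapping a feature ID (KO or module) to a
    list of relative abundances, one per sample, in the same sample
    order. Only features present in both inputs are compared.
    """
    shared = predicted.keys() & observed.keys()
    return {
        feature: pearson(
            [logit(p) for p in predicted[feature]],
            [logit(p) for p in observed[feature]],
        )
        for feature in shared
    }
```

Usage would look like `correlate_across_samples(picrust_profile, shotmap_profile)`, where both arguments are hypothetical dicts built from the two tools' outputs. A rank-based correlation may be a better fit if the large residuals noted above are driven by a few outlying samples.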