Moore Notes 1 15 14
From OpenWetWare
Group Call
- Participants: Jonathan, Katie, Tom, Josh, Sarah, Dongying, Guillaume, Stephen, Patrick
- PICRUSt update: http://edhar.genomecenter.ucdavis.edu/~gjospin/picrust_test/picrust_test/Jan_13_2014_modules/
- Found a bug in the mapping from KEGG Orthology (KO) groups to modules, but fixing it did not change the results
- Modules are better correlated than individual KOs, but still not highly correlated between PICRUSt and ShotMAP
- Fixing the mapping bug also made it run faster (it now searches only the module ID, not all family members)
- Ran full data set plus top 100
- Correlations a bit higher for top 100
- However, residuals are pretty large
- How bad would PICRUSt output be for niche modeling?
- Josh recommends looking at logit-transformed relative abundances
- Patrick: could be useful to look at each KO (or module) across samples to see whether it is correlated between PICRUSt and ShotMAP
- Let's put this on hold for now, and see how long it takes for metagenomic data to come
- Patrick might want to apply PICRUSt to estimate protein family abundances in mammalian microbiomes with metabolomics data
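A minimal sketch of the two suggestions above (logit-transforming relative abundances, then correlating each KO or module across samples between predicted and observed profiles). This is an illustration only, not the analysis that was run: the function names, the clipping value `eps`, and the dict-of-lists input layout are all assumptions.

```python
import math


def logit(p, eps=1e-6):
    """Logit of a relative abundance, clipped to [eps, 1 - eps] so 0 and 1 stay finite.

    The clipping value eps is an arbitrary assumption, not a value from the call.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_feature_correlation(predicted, observed):
    """Correlate each feature (KO or module) across samples.

    predicted, observed: hypothetical dicts mapping a feature ID to a list of
    relative abundances, one per sample, with samples in the same order.
    Only features present in both inputs are compared.
    """
    result = {}
    for feature in sorted(set(predicted) & set(observed)):
        result[feature] = pearson(
            [logit(p) for p in predicted[feature]],
            [logit(p) for p in observed[feature]],
        )
    return result
```

Here `predicted` would hold the PICRUSt estimates and `observed` the ShotMAP estimates; features whose correlation is low or NaN would be the ones to inspect first.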
- EFI collaboration
- They do all analyses based on UniProt IDs (vs. genomes)
- KB database as source of input sequences for current analysis pipeline
- How many Sfams have no Pfam annotation?
- Annotated Pfams: ~61% of Sfams have no Pfam
- Pfam-B families (unannotated): TBD
- What percent of InterPro has an Sfam?
- For comparison, only ~19% of InterPro (UniProtKB) has no Pfam
- Should annotate and score families
- Jonathan: should use metrics to score these (phylogenetic breadth), not just family size
- Tom: also how connected is Sfam in family network space
- Stephen: from gut (genomes plus assembled proteins), consistently present, maybe correlated with a phenotype
- Jonathan: antibiotic resistance and synthesis genes
- This framework could be useful for multiple environments and future grants
- Call it the "most wanted" list
- Downstream analyses they can do
- Genome neighborhood based functional prediction
- For pathway analysis, start with input (e.g., solute carriers) and try to annotate the rest of the pathway
- Synthesize/clone proteins
- Crystal structures
- In vitro enzyme activity assays
- Blue Waters: 20 million integer processor hours
- To do: get list of InterPro entries with no Pfam (or other annotation)
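One way the "most wanted" list discussed above could be scored, as a rough sketch: rank families that lack a Pfam annotation by a combination of family size, phylogenetic breadth, and how consistently they are present across samples. Everything here is hypothetical, including the `Family` fields, the use of a phylum count as a breadth proxy, and the equal weighting of the three terms; network connectivity (Tom's suggestion) is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Family:
    name: str          # family identifier (hypothetical)
    size: int          # number of member sequences, assumed >= 1
    n_phyla: int       # assumed breadth proxy: distinct phyla among members
    prevalence: float  # fraction of samples in which the family is detected
    has_pfam: bool     # True if the family already has a Pfam annotation


def most_wanted(families, top_n=10):
    """Return up to top_n unannotated families, highest score first.

    Score = mean of three terms, each scaled to [0, 1]:
    log-scaled size, breadth relative to the broadest family, and prevalence.
    The equal weighting is an arbitrary assumption.
    """
    unannotated = [f for f in families if not f.has_pfam]
    if not unannotated:
        return []
    max_size = max(f.size for f in unannotated)
    max_phyla = max(1, max(f.n_phyla for f in unannotated))

    def score(f):
        size_term = math.log1p(f.size) / math.log1p(max_size)
        breadth_term = f.n_phyla / max_phyla
        return (size_term + breadth_term + f.prevalence) / 3.0

    return sorted(unannotated, key=score, reverse=True)[:top_n]
```

With this weighting, a mid-sized family that is broad and consistently present outranks a large family confined to one phylum, which matches the point that family size alone should not drive the ranking.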
- Tim Laurent is leaving
- Get him to document what he did with Sfams several months ago (build 2 families)
- Ad posted
- Jonathan will follow up with Katie re: hire and support from his lab on EFI project