Moore Notes 1 15 14
From OpenWetWare
				
				
				
				
Group Call
- Participants: Jonathan, Katie, Tom, Josh, Sarah, Dongying, Guillaume, Stephen, Patrick
 
- PICRUSt update: http://edhar.genomecenter.ucdavis.edu/~gjospin/picrust_test/picrust_test/Jan_13_2014_modules/
- Found a bug in the mapping from KEGG Orthology groups (KOs) to modules, but fixing it did not change the results
- Modules are better correlated than individual KOs, but still not highly correlated between PICRUSt and ShotMAP
 
 - Fixing the bug made it run faster (it now searches only the module ID instead of all family members)
 - Ran full data set plus top 100
- Correlations a bit higher for top 100
 - However residuals are pretty large
 
 - How bad would PICRUSt output be for niche modeling?
- Josh recommends looking at logit-transformed relative abundances
 
 - Patrick: could be useful to look at each KO (or module) across samples to see if it is correlated between PICRUSt and ShotMAP
 - Let's put this on hold for now, and see how long it takes for the metagenomic data to arrive
 - Patrick might want to apply PICRUSt to estimate protein family abundances in mammalian microbiomes with metabolomics data
 
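The comparison discussed above (correlating module abundances between PICRUSt and ShotMAP, looking at residuals, and using logit-transformed relative abundances) could be sketched roughly as follows. This is only an illustration, not the code used for the linked results: the function names, the `eps` clamp for zero abundances, and the choice of Pearson correlation with a least-squares line are all assumptions.

```python
import math


def logit(p, eps=1e-6):
    """Logit-transform a relative abundance.

    Values are clamped to [eps, 1 - eps] so that 0 and 1 stay finite;
    eps is an assumed pseudocount, not a value from the notes.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_and_correlate(predicted, observed):
    """Compare two abundance profiles on the logit scale.

    predicted, observed: dicts mapping a module (or KO) ID to its
    relative abundance, e.g. predicted and metagenome-derived profiles.
    Returns (pearson_r, residuals), where residuals maps each shared ID
    to its residual from a least-squares line of observed on predicted.
    """
    shared = sorted(set(predicted) & set(observed))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared IDs")

    xs = [logit(predicted[m]) for m in shared]
    ys = [logit(observed[m]) for m in shared]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("abundances are constant; correlation undefined")

    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = {
        m: y - (intercept + slope * x) for m, x, y in zip(shared, xs, ys)
    }
    return r, residuals
```

The same function could be applied per sample (all modules in one sample) or per module (one module across samples, as Patrick suggested) by changing what the dictionary keys represent.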
 
- EFI collaboration
- They do all analyses based on UniProt IDs (vs. genomes)
- KB database as source of input sequences for current analysis pipeline
 
 - How many Sfams have no Pfam annotation?
- Annotated Pfams: ~61% of Sfams have no Pfam
 - Pfam-B families (unannotated): TBD
 
 - What percent of InterPro has an Sfam?
- For comparison, only ~19% of InterPro (UniProtKB) has no Pfam
 
 - Should annotate and score families
- Jonathan: should use metrics to score these (phylogenetic breadth), not just family size
 - Tom: also how connected is Sfam in family network space
 - Stephen: from gut (genomes plus assembled proteins), consistently present, maybe correlated with a phenotype
 - Jonathan: antibiotic resistance and synthesis genes
 - This framework could be useful for multiple environments and future grants
 - Call it the "most wanted" list
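One way the "most wanted" ranking could be prototyped is a weighted score over the metrics mentioned above (phylogenetic breadth, family size, connectivity in the family network). Everything in this sketch is hypothetical: the `Family` fields, the weights, the log scaling of size, and the assumption that more network neighbors should raise a family's rank were not decided on the call.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    family_id: str     # hypothetical identifier for an Sfam
    size: int          # number of member sequences
    n_phyla: int       # phylogenetic breadth (distinct phyla represented)
    n_neighbors: int   # connections to other families in the network


def rank_most_wanted(families, w_breadth=0.5, w_size=0.3, w_conn=0.2):
    """Return families sorted from highest to lowest combined score.

    Each metric is scaled to [0, 1] by the maximum over the candidate
    set; size is log-scaled so very large families do not dominate.
    The weights are placeholders chosen only for illustration.
    """
    if not families:
        return []

    max_phyla = max(f.n_phyla for f in families) or 1
    max_log_size = max(math.log1p(f.size) for f in families) or 1.0
    max_neighbors = max(f.n_neighbors for f in families) or 1

    def score(f):
        return (
            w_breadth * f.n_phyla / max_phyla
            + w_size * math.log1p(f.size) / max_log_size
            + w_conn * f.n_neighbors / max_neighbors
        )

    return sorted(families, key=score, reverse=True)
```

Other criteria raised on the call (consistent presence in gut samples, correlation with a phenotype) could be added as further terms once they are quantified.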
 
 - Downstream analyses they can do
- Genome neighborhood based functional prediction
 - For pathway analysis, start with input (e.g., solute carriers) and try to annotate the rest of the pathway
 - Synthesize/clone proteins
 - Crystal structures
 - In vitro enzyme activity assays
 
 - Blue Waters: 20 million integer processor hours
 - To do: get a list of InterPro entries with no Pfam (or other) annotation
 
 
- Tim Laurent is leaving
- Get him to document what he did with Sfams several months ago (build 2 families)
 - Ad posted
 - Jonathan will follow up with Katie re: hire and support from his lab on EFI project