Moore Notes 7 6 11
From OpenWetWare
				
				
Group Call
- Protein DB
- Update from Guillaume
- Re-running crashed jobs
 - PFAM clustering
- What distance and algorithm?
 - Compare to what? Homology similarities and families
 - Maybe save for a later paper
 - Need to compare to PFAM: provides functional info and comparison of amount of clustering
 - Should also compare to COGs (have phylogenetic context)
 
 
 - Why is this database/paper different from existing protein databases?
- Full length gene families
 - Derived from bacterial genomes
 - High-throughput, automated, easily updated with new genomes, open
 
 - Generation of full-length protein families and models
- Description of workflow
 - Description of database
 - Database accessibility
 
 - Statistical assessment of the families
- Family size distribution
 - Family PD distribution
 - Precision and recall distributions (local vs. global)
 
 - The relationship between families in homology space
- Which families have models that recruit the same sequences?
 - Cytoscape-like network map of family homology (see attached image)
 - Clusters may represent superfamilies
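
One possible way to build the family homology network is sketched below. It assumes that each family's model has already been searched against the sequences and that the hits are available as a set of sequence IDs per family. The function names, the Jaccard threshold, and the toy data are all hypothetical and are not part of the group's workflow. Two families are linked when their models recruit overlapping sequences, and connected groups of families are reported as candidate superfamilies.

```python
from itertools import combinations


def family_overlap_edges(recruited, min_jaccard=0.1):
    """Return (family_a, family_b, jaccard) for family pairs sharing recruited sequences.

    recruited: dict mapping family ID -> set of sequence IDs its model recruits.
    min_jaccard: hypothetical cutoff; pairs below it are not linked.
    """
    edges = []
    for a, b in combinations(sorted(recruited), 2):
        shared = recruited[a] & recruited[b]
        if not shared:
            continue
        jaccard = len(shared) / len(recruited[a] | recruited[b])
        if jaccard >= min_jaccard:
            edges.append((a, b, jaccard))
    return edges


def connected_components(nodes, edges):
    """Group families linked by edges; each group is a candidate superfamily."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, _ in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy data (made up for illustration only).
recruited = {
    "fam1": {"s1", "s2", "s3"},
    "fam2": {"s2", "s3", "s4"},
    "fam3": {"s9"},
}
edges = family_overlap_edges(recruited)
clusters = connected_components(recruited, edges)
```

The edge list can be written out as a simple three-column table (source, target, weight) for import into a network viewer such as Cytoscape.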
 
 - The relationship between families in functional space
- Hierarchical clustering of families by their PFAM annotations
 - Can clusters be partitioned into broad-based functional groups?
 
 - The overlap between these relationships
- Can we quantify the amount of overlap between the homology clusters and the functional clusters?
 - What does this tell us about the evolution of function across superfamilies?
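
One candidate measure for the overlap question is the adjusted Rand index (ARI), which compares two partitions of the same set of families and corrects for chance agreement. The notes do not settle on a metric, so this is only a sketch of one option. It assumes every family has exactly one homology-cluster label and one functional-cluster label; the label lists below are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items.

    labels_a[i] and labels_b[i] are the cluster labels of item i in each
    clustering. 1.0 means identical partitions; values near 0 mean the
    agreement is about what chance would give.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to compare clusterings")

    # Count pairs of items placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put everything in one cluster.
        return 1.0
    return (together_both - expected) / (max_index - expected)


# Hypothetical labels for five families, in the same order in both lists.
homology_clusters = ["H1", "H1", "H2", "H2", "H3"]
functional_clusters = ["F1", "F1", "F2", "F2", "F2"]
ari = adjusted_rand_index(homology_clusters, functional_clusters)
```

ARI gives a single global score; per-cluster overlap (for example, which functional groups each homology cluster spans) would need a contingency table of the two label sets.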
 
 - To do
- Make an outline (Tom, Katie)
 - Introduction (Morgan)
 - Describe workflow and metrics (Dongying, Guillaume, Jonathan)
 - Compare to PFAM
 - Compare to COGs or describe differences
 - Finish statistical analyses
 - Search vs. metagenomes and/or new genomes (compare to PFAM or COGs?)
 
 
 - Update from Guillaume
 
- GBMF proposal request
- Overhead issue
 - Katie will start outlining, Jonathan back next week