Moore Notes 8 3 11
From OpenWetWare
				
				
Group Call - Protein Db Authors
- Sensitivity/Specificity analysis
  - Results look much better now that inappropriate sequences have been removed from the analysis
  - Precision is better in the global (coverage threshold) analysis than in the local one, probably due to shared domains
 
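For reference, the three measures discussed here can be computed from confusion-matrix counts. This is a minimal sketch only: the function name and the counts below are made up for illustration and are not results from the call.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and precision from confusion-matrix counts."""
    def ratio(num, den):
        # Avoid dividing by zero when a class is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true members recovered
        "specificity": ratio(tn, tn + fp),  # fraction of non-members rejected
        "precision": ratio(tp, tp + fp),    # fraction of reported hits that are correct
    }


# Hypothetical counts, not data from the analysis.
metrics = classification_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on counts from the global and the local analysis would give a side-by-side comparison of precision.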
- Family Relationships
  - Parent = longer consensus sequence, child = shorter one
  - Corresponds essentially to super- and sub-families
  - This is an annotation, not a change to the family definitions
  - Most families have 0 or 1 relationships, but the histogram has a long tail out to >2000 relationships
    - Put the histogram on a log scale
 
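One way to get a log-scale view of the relationship histogram is to bin the counts by order of magnitude before plotting. The sketch below assumes the per-family relationship counts are available as a list of integers; the function name and the sample values are hypothetical.

```python
from collections import Counter


def log10_histogram(counts):
    """Bin non-negative integers into 0, 1-9, 10-99, ... and count each bin."""
    bins = Counter()
    for c in counts:
        if c < 0:
            raise ValueError("relationship counts must be non-negative")
        if c == 0:
            bins["0"] += 1  # zero has no logarithm, so it gets its own bin
        else:
            k = len(str(c)) - 1  # order of magnitude from the digit count
            bins[f"{10 ** k}-{10 ** (k + 1) - 1}"] += 1
    return dict(bins)


# Hypothetical relationship counts per family, not data from the analysis.
sample_counts = [0, 0, 0, 1, 1, 1, 2, 5, 12, 40, 350, 2100]
histogram = log10_histogram(sample_counts)
for label, n in histogram.items():
    print(f"{label:>10}: {'#' * n}")
```

Binning this way keeps the few families in the long tail visible next to the many with 0 or 1 relationships.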
- Cytoscape visualization
  - Try other renderings that separate out subnetworks
  - Calculate summary statistics (betweenness, degree/degree centrality)
  - Look for subnetworks, maybe show one or two as examples
  - Do we have phylogenetic relationships represented here?
 
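Some of these summary statistics can also be computed outside Cytoscape. The sketch below assumes the family relationships are exported as an undirected edge list (the family names are hypothetical) and computes degree centrality and connected components (subnetworks) with the standard library only. Betweenness is left out to keep the example short; a graph library would normally be used for it.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def connected_components(adj):
    """Return the subnetworks as sets of nodes, largest first (breadth-first search)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical parent/child relationships between families.
edges = [("F1", "F2"), ("F1", "F3"), ("F1", "F4"), ("F5", "F6")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
subnetworks = connected_components(graph)
print(centrality["F1"], [sorted(s) for s in subnetworks])
```

The component list gives a direct way to pick one or two subnetworks to show as examples.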
 
- Fragment analysis
  - Fragments = everything that was filtered out of the MCL analysis, plus singletons
  - Compare fragments to families
  - Do we need a coverage threshold on the fragment? (Originally had one on both the fragment and the family)
    - Cuts the number of hits in half
  - Goal: have we kicked out data that should be in the families?
  - Probably do want a coverage threshold to ensure family membership
 
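The effect of requiring coverage on the fragment as well as on the family can be checked by filtering the same hit list both ways. This is a sketch under stated assumptions: the `Hit` fields, the 0.8 threshold, and the sample hits are all hypothetical, and coverage is taken to be aligned length divided by sequence length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One fragment-vs-family match (hypothetical record layout)."""
    fragment_id: str
    family_id: str
    fragment_len: int
    family_len: int
    fragment_aligned: int  # aligned residues on the fragment
    family_aligned: int    # aligned residues on the family consensus


def passes_coverage(hit, min_fragment_cov=0.0, min_family_cov=0.0):
    """True if the hit meets both coverage thresholds (0.0 disables a threshold)."""
    fragment_cov = hit.fragment_aligned / hit.fragment_len
    family_cov = hit.family_aligned / hit.family_len
    return fragment_cov >= min_fragment_cov and family_cov >= min_family_cov


# Made-up hits, not data from the analysis.
hits = [
    Hit("frag1", "famA", 100, 110, 95, 95),    # covers both well
    Hit("frag2", "famB", 60, 300, 58, 58),     # short fragment, small part of family
    Hit("frag3", "famC", 400, 100, 90, 90),    # long fragment, only partly aligned
    Hit("frag4", "famD", 250, 260, 240, 240),  # covers both well
]

THRESHOLD = 0.8  # assumed value for illustration
family_only = [h for h in hits if passes_coverage(h, min_family_cov=THRESHOLD)]
both = [h for h in hits if passes_coverage(h, THRESHOLD, THRESHOLD)]
print(len(hits), len(family_only), len(both))
```

Hits that pass the family threshold but fail the fragment threshold (like `frag3` here) are the ones worth inspecting when deciding whether the fragment threshold is needed.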
  - How does the number of hits correlate with fragment length? E.g., short fragments might cover common domains or motifs
  - How does the number of hits correlate with family coverage?
 
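A first pass at the correlation questions could count hits per fragment and compute a Pearson correlation against fragment length. The sketch below uses made-up fragment names, lengths, and hit pairs; a rank-based correlation may suit skewed hit counts better, and the same function applies to family coverage in place of length.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (fragment, family) hit pairs and fragment lengths.
hit_pairs = [
    ("frag1", "famA"), ("frag1", "famB"), ("frag1", "famC"), ("frag1", "famD"),
    ("frag2", "famA"), ("frag2", "famB"),
    ("frag3", "famC"),
]
fragment_lengths = {"frag1": 40, "frag2": 120, "frag3": 300}

hits_per_fragment = Counter(fragment for fragment, _ in hit_pairs)
fragments = sorted(fragment_lengths)
lengths = [fragment_lengths[f] for f in fragments]
hit_counts = [hits_per_fragment[f] for f in fragments]

r = pearson(lengths, hit_counts)
print(f"r = {r:.3f}")
```

A clearly negative value would be consistent with short fragments matching common domains or motifs across many families.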
- GOS analysis
  - Used annotated peptides from CAMERA (hard to track down!)
  - Could use all predicted ORFs, or the peptides assembled from these ORF peptides (more manageable: 6 million proteins)
    - Should take about 1 week on the cluster to search against all families
 
 
- In progress:
  - PFAM analysis
  - Scan of newly sequenced genomes
 
 - Next call: 2 weeks from now