Moore Notes 8 3 11


Group Call - Protein Db Authors

  • Sensitivity/Specificity analysis
    • Results look much better now that inappropriate sequences have been removed from analysis
    • Precision better in global (coverage threshold) vs. local analysis - probably due to shared domains
  • Family Relationships
    • parent = longer consensus sequence, child = shorter one
    • Corresponds essentially to super- and sub-families
    • This is an annotation, not a change to the family definitions
    • Most have 0 or 1 relationship, but there is a long tail in the histogram out to >2000 relationships
      • Put histogram on log scale
    • Cytoscape visualization
      • Try other renderings that separate out subnetworks
      • Calculate summary statistics (betweenness, degree/degree centrality)
      • Look for subnetworks, maybe show one or two as examples
      • Do we have phylogenetic relationships represented here?
  • Fragment analysis
    • Everything that was filtered out of the MCL analysis, plus singletons
    • Compare fragments to families
    • Do we need a coverage threshold on the fragment? (had one on fragment and on family originally)
      • Cuts number of hits in half
      • Goal: have we kicked out data that should be in the families?
      • Probably do want coverage threshold to ensure family membership
    • How does number of hits correlate with fragment length? e.g., short fragments might cover common domains or motifs
    • How does number of hits correlate with family coverage?
  • GOS analysis
    • Used annotated peptides from CAMERA (hard to track down!)
    • Could use all predicted ORFs, or the peptides assembled from those ORFs (more manageable - 6 million proteins)
      • Should take about 1 week on cluster to search vs. all families
  • In progress:
    • PFAM analysis
    • Newly sequenced genomes scan
  • Next call: 2 weeks from now
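
Sketch for the fragment coverage question above: one way to compare "coverage threshold on the family only" against "coverage threshold on both fragment and family". This is only an illustration, not the pipeline that was actually run. The `Hit` record, its field names, the example hits, and the 0.8 cutoffs are all hypothetical; the notes do not record the real thresholds or the search output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One fragment-vs-family alignment (hypothetical record layout)."""
    fragment_id: str
    family_id: str
    aligned_fragment_len: int  # fragment residues covered by the alignment
    fragment_len: int          # full fragment length
    aligned_family_len: int    # family consensus residues covered
    family_len: int            # full family consensus length


def coverage(aligned: int, total: int) -> float:
    """Fraction of a sequence covered by the alignment (0.0 if length is 0)."""
    return aligned / total if total > 0 else 0.0


def passes(hit: Hit, min_fragment_cov: float = 0.8,
           min_family_cov: float = 0.8) -> bool:
    """True if the hit meets both coverage cutoffs (0.8 is an assumed value)."""
    return (coverage(hit.aligned_fragment_len, hit.fragment_len) >= min_fragment_cov
            and coverage(hit.aligned_family_len, hit.family_len) >= min_family_cov)


# Made-up hits for illustration only.
hits = [
    Hit("frag1", "famA", 90, 100, 270, 300),  # covers most of both sequences
    Hit("frag2", "famA", 95, 100, 95, 300),   # short fragment on one domain
    Hit("frag3", "famB", 40, 200, 180, 200),  # family covered, fragment not
]

# Threshold on the family side only (fragment cutoff disabled).
family_only = [h.fragment_id for h in hits if passes(h, min_fragment_cov=0.0)]
# Threshold on both fragment and family.
both = [h.fragment_id for h in hits if passes(h)]

print("family-only threshold:", family_only)
print("fragment + family threshold:", both)
```

Running both filters on the same hit list makes it easy to see which fragments drop out when the fragment-side cutoff is added, which is the group to inspect when asking whether data that belongs in the families was kicked out.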
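
Sketch for the family relationship items above (log-scale histogram, degree statistics, subnetworks). It uses only the Python standard library and treats parent/child relationships as undirected edges. The family names and edges are invented for illustration, and the power-of-two binning is just one possible choice of log scale. Betweenness is left out to keep the example short.

```python
from collections import Counter


def build_adjacency(families, edges):
    """Undirected adjacency sets; families with no relationships are kept."""
    adj = {f: set() for f in families}
    for parent, child in edges:
        adj.setdefault(parent, set()).add(child)
        adj.setdefault(child, set()).add(parent)
    return adj


def log2_degree_histogram(adj):
    """Count families per degree bin: 0, 1, 2-3, 4-7, 8-15, ..."""
    hist = Counter()
    for neighbors in adj.values():
        d = len(neighbors)
        if d == 0:
            hist[(0, 0)] += 1
        else:
            k = d.bit_length()  # d falls in [2**(k-1), 2**k - 1]
            hist[(2 ** (k - 1), 2 ** k - 1)] += 1
    return dict(hist)


def degree_centrality(adj):
    """Degree divided by the number of other nodes in the network."""
    n = len(adj)
    return {f: (len(nb) / (n - 1) if n > 1 else 0.0) for f, nb in adj.items()}


def connected_components(adj):
    """Subnetworks as sets of family ids, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


# Invented (parent, child) pairs; F10 has no relationships.
families = [f"F{i}" for i in range(1, 11)]
edges = [("F1", "F2"), ("F1", "F3"), ("F1", "F4"),
         ("F5", "F6"), ("F7", "F8"), ("F8", "F9")]

adj = build_adjacency(families, edges)
hist = log2_degree_histogram(adj)
components = connected_components(adj)

for (lo, hi), count in sorted(hist.items()):
    print(f"degree {lo}-{hi}: {count} families")
print("subnetwork sizes:", [len(c) for c in components])
```

With real data the long tail (families with >2000 relationships) would land in the high bins, and the component list gives candidate subnetworks to pull out as examples.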