Moore Notes 8 24 11

From OpenWetWare

Group Call - Protein Db project

  • Dongying ran all new genomes vs. all HMMs
    • Guillaume is parsing the data (takes 6 hours)
    • Found a bug in his code, but fixed now. Re-running today.
  • Guillaume working on annotation
    • Distance matrix is too large for clustering families based on shared Pfams
    • Fuzzy clustering or network based on shared Pfams should be good enough
    • Pfam-to-GO mapping versus InterPro (better? Slower, but should be OK with one sequence per family)
      • Use consensus sequence for each family for annotation mapping
    • Goal: establish a rough guess at number of distinct functional groups
  • Tom ran GOS translated reads (~6 million) vs. all HMMs
    • About 75% are classified into a family
    • Next: compare to Rusch et al.
      • Numbers of families
      • How reads are distributed across the meta-proteome
  • Parent-child relationships computed
    • Still need to compute betweenness and other graph summary statistics
  • Meet next Tuesday in Davis
  • Ready to start writing manuscript
    • Morgan has input on introduction (other protein databases)
    • Outline methods and results (what are the top results for figures)
  • Where will data be hosted?
    • Edhar would work
    • Do we want a domain name? Probably only if we make the user interface later
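
One possible sketch of the network idea from the annotation item above (families linked when they share a Pfam), written so that no all-vs-all distance matrix is ever built. This is not Guillaume's code: the function name, the input dictionary, and the family and Pfam identifiers are made up for illustration, and plain connected components stand in here for the fuzzy clustering mentioned in the notes. The number of components gives only a rough lower bound on the number of distinct functional groups, since a single widespread domain can chain unrelated families together.

```python
from collections import defaultdict


def group_families_by_shared_pfams(family_pfams):
    """Group families into connected components of the shared-Pfam network.

    family_pfams: dict mapping a family ID to the set of Pfam IDs found in it
    (hypothetical input format). Two families end up in the same group if a
    chain of shared Pfams connects them.
    """
    # Union-find structure: every family starts as its own group.
    parent = {fam: fam for fam in family_pfams}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Inverted index: Pfam ID -> families containing it. This replaces the
    # family-by-family distance matrix.
    pfam_to_families = defaultdict(list)
    for fam, pfams in family_pfams.items():
        for pfam in pfams:
            pfam_to_families[pfam].append(fam)

    # All families sharing a Pfam are merged with the first one in the list,
    # so the work is linear in the number of (family, Pfam) pairs.
    for fams in pfam_to_families.values():
        first = fams[0]
        for other in fams[1:]:
            union(first, other)

    # Collect the members of each component.
    groups = defaultdict(list)
    for fam in family_pfams:
        groups[find(fam)].append(fam)

    # Sort for a deterministic result.
    return sorted(sorted(members) for members in groups.values())


# Toy input with made-up family and Pfam identifiers.
example = {
    "fam1": {"PF00001", "PF00002"},
    "fam2": {"PF00002"},
    "fam3": {"PF00003"},
    "fam4": {"PF00003", "PF00004"},
    "fam5": set(),  # no Pfam hit: stays in its own group
}
groups = group_families_by_shared_pfams(example)
print(len(groups), "groups:", groups)
```

If chaining through common domains turns out to be a problem, one option would be to require more than one shared Pfam before linking two families, at the cost of counting shared domains per family pair.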