Group Call

  • Protein db
    • Data deliverables
      • Updates (e.g., as new genomes are sequenced)
      • Could be an SQL dump with an archive of flat files
      • Flat files (e.g., on Biotorrents)
        • alignments
        • mappings of sequences to families, which were used to build the HMMs
        • organize into meaningful subsets (e.g., all families with high universality; representative sequences for fast searches)
      • User interface?
        • Try to get CAMERA to do this, as proposed originally
    • Morgan's observations about other databases: what makes ours unique?
    • Journal
      • PLoS ONE
      • NAR database issue - slower and the issue is getting huge
      • BMC Bioinformatics
      • Briefings in Bioinformatics
    • Do we need to build trees? Seems like a good idea (Morgan)
      • Method? PhyML or FastTree
      • Should be pretty fast if we only build trees with the representative sequences
      • How to measure phylogenetic diversity (PD) of a family from the tree? Needs to be comparable across families
    • Analyses to do
      • Finish cleaning up precision/recall analyses (Guillaume & Tom)
      • Scanning new genomes - 2500 genomes now vs. 1900 used to make db (Dongying)
      • Scanning GOS metagenomic data set - test compute time (Guillaume & Tom)
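
Possible starting point for the PD question above (a sketch only, not a decision from the call): take PD as the total branch length of the family tree read from a Newick string, then divide by the number of tips so that large and small families can be compared. The per-tip normalization is an assumption, one option among several. The function names are hypothetical, and the parsing assumes unquoted labels and no bracketed comments in the Newick text.

```python
import re

# Matches a non-negative branch length after a colon, e.g. ":0.05" or ":1e-3".
# Assumption: labels are unquoted, so ":" only ever introduces a branch length.
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_branch_length(newick: str) -> float:
    """Sum of all branch lengths in the tree (PD over all tips)."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


def tip_count(newick: str) -> int:
    """Number of tips; a Newick tree with L tips contains L - 1 commas."""
    return newick.count(",") + 1


def pd_per_tip(newick: str) -> float:
    """Total branch length divided by tip count (hypothetical normalization)."""
    return total_branch_length(newick) / tip_count(newick)


if __name__ == "__main__":
    # Toy tree, not real data from the database.
    tree = "((A:0.1,B:0.2):0.05,C:0.3);"
    print(tip_count(tree), total_branch_length(tree), pd_per_tip(tree))
```

The same functions should work on trees written by PhyML or FastTree as long as the output is plain Newick; support values on internal nodes are ignored because only numbers following a colon are summed.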