Moore Notes 7 13 11

From OpenWetWare


Group Call

Protein db
- Data deliverables
  - Updates (e.g., as new genomes are sequenced)
  - Could be an SQL dump with an archive of flat files
  - Flat files (e.g., on Biotorrents)
    - alignments
    - mappings of sequences to families, which were used to build the HMMs
    - organize into meaningful subsets (e.g., all families with high universality, representative sequences for fast searches)
  - User interface?
    - Try to get CAMERA to do this, as proposed originally
- Morgan's observations about other databases: what makes ours unique?
- Journal
  - PLoS ONE
  - NAR database issue - slower and the issue is getting huge
  - BMC Bioinformatics
  - Briefings in Bioinformatics
- Do we need to build trees? Seems like a good idea (Morgan)
  - Method? PhyML, FastTree
  - Should be pretty fast if we only build trees with the representative sequences
  - How to measure the phylogenetic diversity (PD) of a family from the tree? Needs to be comparable across families
- Analyses to do
  - Finish cleaning up precision/recall analyses (Guillaume & Tom)
  - Scanning new genomes - 2500 genomes now vs. 1900 used to make db (Dongying)
  - Scanning GOS metagenomic data set - test compute time (Guillaume & Tom)
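
On the open question of measuring PD from a family tree, here is a minimal sketch, assuming each tree is available as a Newick string with branch lengths (as PhyML and FastTree both write). It sums all branch lengths as one common way to express PD over all tips. Dividing by the number of tips is only one hypothetical way to make values comparable across families; the notes do not settle on a method. The function names are illustrative, and the regex parsing assumes labeled leaves and no quoted labels or comments.

```python
import re


def branch_lengths(newick):
    """Return every branch length (the number after ':') in a Newick string."""
    return [float(x) for x in re.findall(r":([0-9.eE+-]+)", newick)]


def count_leaves(newick):
    """Count leaves: a leaf label starts right after '(' or ','."""
    # Assumption: every leaf has a label and labels are unquoted.
    return len(re.findall(r"[(,]\s*[^(),:;\s]", newick))


def total_pd(newick):
    """Total branch length of the tree (PD over all tips)."""
    return sum(branch_lengths(newick))


def pd_per_leaf(newick):
    """Hypothetical normalization: total branch length divided by tip count."""
    n = count_leaves(newick)
    if n == 0:
        raise ValueError("no labeled leaves found in tree")
    return total_pd(newick) / n


if __name__ == "__main__":
    # Toy tree for illustration only, not real family data.
    tree = "((A:0.1,B:0.2):0.05,C:0.3);"
    print(count_leaves(tree), total_pd(tree), pd_per_leaf(tree))
```

Normalizing by tip count is sensitive to how representatives were chosen for each family, so any cross-family comparison would need the same representative-selection rule applied everywhere.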

