Moore Notes 7 13 11
Group Call
- Protein db
  - Data deliverables
    - Updates (e.g., as new genomes are sequenced)
    - Could be an SQL dump with an archive of flat files
    - Flat files (e.g., on BioTorrents)
      - alignments
      - mappings of sequences to families, which were used to build the HMMs
      - organize into meaningful subsets (e.g., all families with high universality, representative sequences for fast searches)
  - User interface?
    - Try to get CAMERA to do this, as proposed originally
  - Morgan's observations about other databases: why are we unique?
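A "high universality" subset could be built by scoring each family by the fraction of genomes that contain at least one member and keeping families above a cutoff. The following is a minimal sketch under that assumption. The definition of universality, the 0.9 default cutoff, and the family and genome IDs are all hypothetical; the notes fix none of them.

```python
from typing import Dict, List, Set


def universality(family_to_genomes: Dict[str, Set[str]], n_genomes: int) -> Dict[str, float]:
    """Fraction of all genomes in which each family has at least one member."""
    return {fam: len(genomes) / n_genomes for fam, genomes in family_to_genomes.items()}


def high_universality_subset(
    family_to_genomes: Dict[str, Set[str]], n_genomes: int, cutoff: float = 0.9
) -> List[str]:
    """Families whose universality meets the cutoff, sorted by name.

    The 0.9 default is a hypothetical threshold, not one chosen in the notes.
    """
    scores = universality(family_to_genomes, n_genomes)
    return sorted(fam for fam, score in scores.items() if score >= cutoff)


if __name__ == "__main__":
    # Toy data: hypothetical family IDs mapped to the genomes they occur in.
    families = {
        "fam_A": {"g1", "g2", "g3", "g4"},
        "fam_B": {"g1", "g2"},
        "fam_C": {"g1", "g2", "g3"},
    }
    print(high_universality_subset(families, n_genomes=4, cutoff=0.75))
```

Using a cutoff parameter rather than a fixed value makes it cheap to release several subsets (e.g., strict and relaxed) from the same mapping.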
- Journal
  - PLoS ONE
  - NAR database issue: slower, and the issue is getting huge
  - BMC Bioinformatics
  - Briefings in Bioinformatics
- Do we need to build trees? Seems like a good idea (Morgan)
  - Method? PhyML, FastTree
  - Should be pretty fast if we only build trees with the representative sequences
  - How to measure the phylogenetic diversity (PD) of a family from its tree? Needs to be comparable across families
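One way to approach the PD question is to take PD to mean Faith's phylogenetic diversity. For a tree containing all of a family's sequences, that is simply the sum of all branch lengths. Raw PD grows with the number of sequences, so the sketch below also divides by the leaf count as one hypothetical normalization. This is an illustration, not a method the group settled on. It assumes plain Newick input with branch lengths, and no comments or quoted labels containing ':' or ','.

```python
import re
from typing import Tuple

# Matches the numeric branch length that follows each ':' in a Newick string.
# Internal-node support values (e.g. ")0.95:0.3") are not preceded by ':' and are skipped.
_BRANCH_LENGTH = re.compile(r":\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")


def faith_pd(newick: str) -> float:
    """Faith's PD of the whole tree: the sum of all branch lengths."""
    return sum(float(x) for x in _BRANCH_LENGTH.findall(newick))


def leaf_count(newick: str) -> int:
    """Leaves in a valid Newick tree = commas + 1 (assumes no commas inside labels)."""
    return newick.count(",") + 1


def pd_per_leaf(newick: str) -> Tuple[float, float]:
    """Return (PD, PD / number of leaves); the per-leaf ratio is a hypothetical normalization."""
    pd = faith_pd(newick)
    return pd, pd / leaf_count(newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
    print(pd_per_leaf(tree))
```

Per-leaf scaling is only one option. Rarefying every family to the same number of representative sequences before building trees would be another way to make PD comparable.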
- Analyses to do
  - Finish cleaning up precision/recall analyses (Guillaume & Tom)
  - Scanning new genomes: 2,500 genomes now vs. 1,900 used to make the db (Dongying)
  - Scanning GOS metagenomic data set: test compute time (Guillaume & Tom)
  - Data deliverables
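The notes do not say how the precision/recall analyses are set up. A common framing compares the sequences a family's HMM search reports as hits against a reference set of known family members. The sketch below assumes that framing and uses hypothetical sequence IDs. It computes precision = TP / (TP + FP) and recall = TP / (TP + FN) for one family.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], true_members: Set[str]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    predicted: sequence IDs reported as hits for the family (assumed HMM search output).
    true_members: sequence IDs known to belong to the family (assumed reference set).
    A ratio whose denominator is zero is reported as 0.0.
    """
    tp = len(predicted & true_members)   # hits that are real members
    fp = len(predicted - true_members)   # hits that are not members
    fn = len(true_members - predicted)   # members the search missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if true_members else 0.0
    return precision, recall


if __name__ == "__main__":
    hits = {"s1", "s2", "s3", "s4"}
    members = {"s1", "s2", "s3", "s5", "s6"}
    print(precision_recall(hits, members))
```

Running it per family and then averaging (or pooling TP/FP/FN across families) would give a database-wide summary. Which of the two is preferable depends on whether large families should dominate the result.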