Moore Notes 7 13 11
From OpenWetWare
				
				
Group Call
- Protein db
  - Data deliverables
    - Updates (e.g., as new genomes are sequenced)
    - Could be an SQL dump with an archive of flat files
    - Flat files (e.g., on Biotorrents)
      - Alignments
      - Mappings of sequences to families, which were used to build the HMMs
      - Organize into meaningful subsets (e.g., all with high universality, representative sequences for fast searches)
 
  - User interface?
    - Try to get CAMERA to do this, as proposed originally
 
 
  - Morgan's observations about other databases: why are we unique?
  - Journal
    - PLoS ONE
    - NAR database issue - slower, and the issue is getting huge
    - BMC Bioinformatics
    - Briefings in Bioinformatics
 
  - Do we need to build trees? Seems like a good idea (Morgan)
    - Method? PhyML, FastTree
    - Should be pretty fast if we only build trees with the representative sequences
    - How to measure PD (phylogenetic diversity) of a family from the tree? Needs to be comparable across families
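
One possible way to approach the PD question above, as a rough sketch and not an agreed method: take PD to be the total branch length of the family's Newick tree (the format PhyML and FastTree write), then divide by the number of tips so that families of different sizes can be compared. The per-tip normalization and the function names are assumptions for illustration. The sketch also assumes unquoted labels containing no commas or colons.

```python
import re

# A branch length is the number that follows a colon in Newick, e.g. ":0.12".
# Support values written as internal node labels (before the colon) are skipped.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def total_branch_length(newick: str) -> float:
    """Sum every branch length in the tree (PD over all tips)."""
    return sum(float(x) for x in _BRANCH_LENGTH.findall(newick))


def count_tips(newick: str) -> int:
    """Count leaves, assuming labels are unquoted and contain no commas."""
    return newick.count(",") + 1


def pd_per_tip(newick: str) -> float:
    """Hypothetical size-normalized PD: total branch length / number of tips."""
    return total_branch_length(newick) / count_tips(newick)


# Toy tree with four representative sequences (made-up branch lengths).
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"
pd_total = total_branch_length(tree)
pd_norm = pd_per_tip(tree)
```

Other normalizations (e.g., restricting every family to the same set of genomes before summing) may be more appropriate; this only shows the bookkeeping.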
 
  - Analyses to do
    - Finish cleaning up precision/recall analyses (Guillaume & Tom)
    - Scanning new genomes - 2500 genomes now vs. 1900 used to make db (Dongying)
    - Scanning GOS metagenomic data set - test compute time (Guillaume & Tom)
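
For the precision/recall item, a minimal sketch of the standard definitions applied to a single family, assuming the predicted and true members are available as sets of sequence IDs. The function name and the IDs are hypothetical, and this is not the group's actual pipeline.

```python
def precision_recall(predicted: set, truth: set) -> tuple:
    """Return (precision, recall) for one family's predicted membership."""
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted members that really belong to the family.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true members that were recovered.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: 4 sequences assigned to the family, 5 true members, 3 shared.
predicted = {"seq1", "seq2", "seq3", "seq4"}
truth = {"seq2", "seq3", "seq4", "seq5", "seq6"}
precision, recall = precision_recall(predicted, truth)
```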
 
 
  - Data deliverables