Moore Notes 8 24 11
From OpenWetWare
Group Call - Protein Db project
- Dongying ran all new genomes vs. all HMMs
- Guillaume is parsing the data (takes 6 hours)
- Found a bug in his code; it is now fixed. Re-running today.
- Guillaume working on annotation
- Distance matrix is too large to cluster families based on shared Pfams
- Fuzzy clustering or network based on shared Pfams should be good enough
- Pfam-to-GO mapping versus InterPro (better? InterPro is slow, but should be OK with one sequence per family)
- Use consensus sequence for each family for annotation mapping
- Goal: establish a rough estimate of the number of distinct functional groups
- Tom ran GOS translated reads (~6 million) vs. all HMMs
- About 75% of reads are classified into a family
- Next: compare to Rusch et al.
- Numbers of families
- How reads are distributed across the meta-proteome
- Parent-child relationships computed
- Still need to compute betweenness and other graph summary statistics
- Meet next Tuesday in Davis
- Ready to start writing manuscript
- Morgan has input on introduction (other protein dbs)
- Outline methods and results (what are the top results for figures?)
- Where will data be hosted?
- Edhar would work
- Do we want a domain name? Probably only if we make the user interface later
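
A minimal sketch of the network idea for grouping families by shared Pfams, assuming the annotation step yields a mapping from family ID to a set of Pfam accessions. The input data, the function names, and the `min_shared` threshold are all hypothetical, and the grouping shown is plain connected components (single linkage), not fuzzy clustering. The point is only that edges can be built by inverting the mapping, so no full family-by-family distance matrix is needed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: family ID -> set of Pfam accessions seen in that family.
family_pfams = {
    "fam1": {"PF00001", "PF00002"},
    "fam2": {"PF00002", "PF00003"},
    "fam3": {"PF00003"},
    "fam4": {"PF00010"},
    "fam5": {"PF00010", "PF00011"},
    "fam6": set(),
}


def shared_pfam_edges(family_pfams, min_shared=1):
    """Return {(fam_a, fam_b): n_shared} for pairs sharing >= min_shared Pfams.

    Only families that co-occur under at least one Pfam are ever compared,
    so the full all-vs-all matrix is never materialised.
    """
    # Invert the mapping: Pfam accession -> families containing it.
    pfam_to_fams = defaultdict(set)
    for fam, pfams in family_pfams.items():
        for pfam in pfams:
            pfam_to_fams[pfam].add(fam)

    # Count shared Pfams per family pair.
    counts = defaultdict(int)
    for fams in pfam_to_fams.values():
        for a, b in combinations(sorted(fams), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


edges = shared_pfam_edges(family_pfams)
groups = connected_components(family_pfams, edges)
print(len(groups), "groups:", groups)
```

One caveat with this sketch: a Pfam present in very many families still generates a quadratic number of pairs for that Pfam, so very common domains may need to be down-weighted or excluded, and raising `min_shared` gives sparser, stricter groups.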