Moore Notes 8 24 11

From OpenWetWare


Group Call - Protein Db project

Dongying ran all new genomes vs. all HMMs
- Guillaume is parsing the data (takes 6 hours)
- Found a bug in his code, but fixed now. Re-running today.

Guillaume working on annotation
- Distance matrix is too large to cluster families based on shared Pfams
- Fuzzy clustering or network based on shared Pfams should be good enough
- Pfam-to-GO mapping versus InterPro (InterPro may be better; it is slow, but should be OK with one sequence per family)
  - Use consensus sequence for each family for annotation mapping
- Goal: establish a rough guess at number of distinct functional groups
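
The network idea above can be sketched without ever building the full distance matrix. This is a minimal illustration, not the group's actual code: it assumes each family is already summarized as a set of Pfam accessions (the `family_pfams` dict and both function names are hypothetical), links families that share at least `min_shared` Pfams, and reports connected components as a crude stand-in for functional groups. Fuzzy clustering would need a different method.

```python
from collections import defaultdict
from itertools import combinations


def shared_pfam_edges(family_pfams, min_shared=1):
    """Return {(fam_a, fam_b): n_shared} for family pairs sharing Pfams."""
    # Inverted index: Pfam accession -> families that contain it.
    pfam_to_families = defaultdict(set)
    for family, pfams in family_pfams.items():
        for pfam in pfams:
            pfam_to_families[pfam].add(family)

    # Only pairs that co-occur under some Pfam are ever visited,
    # so the all-vs-all matrix is never materialized.
    shared = defaultdict(int)
    for families in pfam_to_families.values():
        for a, b in combinations(sorted(families), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


def connected_groups(family_pfams, min_shared=1):
    """Group families into connected components of the shared-Pfam network."""
    parent = {family: family for family in family_pfams}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in shared_pfam_edges(family_pfams, min_shared):
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for family in family_pfams:
        groups[find(family)].add(family)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

The number of groups returned gives one rough guess at the number of distinct functional groups. A very common Pfam would link many families at once, so raising `min_shared` or excluding such domains may be needed in practice.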

Tom ran GOS translated reads (~6 million) vs. all HMMs
- About 75% are classified into a family
- Next: compare to Rusch et al.
  - Numbers of families
  - How reads are distributed across the meta-proteome
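
For the comparison above, the two quantities of interest (fraction of reads classified, and how reads spread across families) can be tallied as in the sketch below. It assumes the HMM search output has already been reduced to one best family per read, with `None` for unclassified reads; the `assignments` dict and the function name are hypothetical, not part of the existing pipeline.

```python
from collections import Counter


def summarize_read_assignments(assignments):
    """Return (classified_fraction, [(family, read_count), ...]).

    assignments maps read ID -> family ID, or None if the read hit no family.
    Families are listed from most to least populated.
    """
    total = len(assignments)
    counts = Counter(f for f in assignments.values() if f is not None)
    classified = sum(counts.values())
    fraction = classified / total if total else 0.0
    return fraction, counts.most_common()
```

The length of the returned list is the number of families with at least one read, and the counts give the read distribution across families.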

Parent-child relationships computed
- Still need to compute betweenness and other graph summary statistics
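
As a starting point for the betweenness calculation, here is a small pure-Python sketch of Brandes' algorithm. It assumes the parent-child relationships are stored as a directed, unweighted graph in a dict mapping each parent to its children; that representation and the function name are assumptions for illustration. Scores are unnormalized.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes) for a directed graph.

    graph maps each node to an iterable of its child nodes.
    """
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)

    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

For a graph of the size discussed here, an established graph library would likely be faster; this version is mainly useful for checking results on small subgraphs.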

Meet next Tuesday in Davis

Ready to start writing manuscript
- Morgan has input on introduction (other protein dbs)
- Outline methods and results (which top results should become figures?)

Where will data be hosted?
- Edhar would work
- Do we want a domain name? Probably only if we make the user interface later
