Moore Notes 6 4 14
From OpenWetWare
Group Call
- Participants: Tom, Katie, Stacia, Stephen, Josh, Ladan, Sarah, Dongying, Guillaume
- Pre-call discussion about shotmap
- Stacia: software problems mostly resolved, but one bug regarding use of /scratch remains
- To do:
- Re-run L4 data (Tom: SFams, KEGG; Stacia: Figfam) with unclassified ORF rarefaction
- Maybe run it with all-reads rarefaction
- MetaHIT reclassification (Tom: SFams; coordinate with Stacia re: KEGG and Figfam)
- Guillaume found some samples on SRA
- Sequenced with Illumina
- In same area as a 2009 study with 16S sampling
- Also, Titus Brown has some soil shotgun metagenomic samples with 16S
- Ladan's project slides
- Protein sequence similarity network (weighted edges)
- Discovering "roles" of nodes (i.e., proteins) in the network
- Complementary to previous work discovering communities (i.e., sets of close proteins)
- Roles are based on structural features
- Local, neighborhood, regional
- Some features are computed recursively over neighbors
- Grouped to reduce dimensionality
- Roles are discovered by non-negative matrix factorization of genes x features matrix
- Resulting matrices are lower dimension (fewer roles than features) and typically sparse
- Complexity of algorithm is linear in number of protein network edges
- Hard to run with ~13 million edges in haloarchaea protein network (80 genomes)
- How to make sense of the roles that are output?
- Try overlaying known functions on the role-annotated network (or a projection thereof)
- Think about promiscuous domains (e.g., proteins with them might be cliquey)
- May want to look at many roles for functional annotation (if this is even possible)
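
A minimal sketch of the role-discovery step described above (non-negative matrix factorization of a nodes x features matrix), for illustration only. The notes do not say which NMF algorithm or implementation was used; this sketch swaps in the standard multiplicative-update rule for the Frobenius-norm objective. The function name `discover_roles`, its parameters, and the toy feature matrix are all hypothetical, not taken from Ladan's slides.

```python
import numpy as np


def discover_roles(V, n_roles, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative (nodes x features) matrix V as G @ F.

    G (nodes x roles) holds each node's role memberships.
    F (roles x features) describes each role in terms of the features.
    Assumption: multiplicative updates minimizing ||V - G @ F||_F.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    n_nodes, n_features = V.shape
    # Random strictly positive starting point.
    G = rng.random((n_nodes, n_roles)) + eps
    F = rng.random((n_roles, n_features)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative; eps avoids division by zero.
        F *= (G.T @ V) / (G.T @ G @ F + eps)
        G *= (V @ F.T) / (G @ F @ F.T + eps)
    return G, F


# Hypothetical toy input: 6 proteins x 4 structural features.
# Proteins 0-2 score on features 0-1, proteins 3-5 on features 2-3.
V = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 0.0, 4.0, 4.0],
])

G, F = discover_roles(V, n_roles=2)
dominant_role = G.argmax(axis=1)  # strongest role per protein
relative_error = np.linalg.norm(V - G @ F) / np.linalg.norm(V)
```

Here `G` plays the part of the lower-dimension node-by-role matrix mentioned in the notes (fewer roles than features). Dense updates like these would likely not scale to the ~13 million edge haloarchaea network without sparse matrices or a more specialized solver.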