Moore Notes 6/4/14

Group Call

  • Participants: Tom, Katie, Stacia, Stephen, Josh, Ladan, Sarah, Dongying, Guillaume
  • Pre-call discussion about ShotMAP
    • Stacia's software problems; one bug regarding use of /scratch still remains
    • To do:
      • Re-run L4 data (Tom: SFams, KEGG; Stacia: FIGfams) with unclassified-ORF rarefaction
      • Possibly also run with all-reads rarefaction
      • MetaHIT reclassification (Tom: SFams; coordinate with Stacia on KEGG and FIGfams)
  • Guillaume found some samples on SRA
    • Sequenced with Illumina
    • From the same area as a 2009 study with 16S sampling
    • Also, Titus Brown has some soil shotgun metagenomic samples with 16S data
  • Ladan's project slides
    • Protein sequence similarity network (weighted edges)
    • Discovering "roles" of nodes (i.e., proteins) in the network
    • Complementary to previous work discovering communities (i.e., sets of close proteins)
    • Roles are based on structural features
      • Local, neighborhood, regional
      • Some computed recursively over neighbors
      • Grouped to reduce dimensionality
    • Roles are discovered by non-negative matrix factorization (NMF) of the genes × features matrix
    • Resulting matrices are lower-dimensional (fewer roles than features) and typically sparse
    • Algorithm's complexity is linear in the number of edges in the protein network
      • Hard to run on the haloarchaea protein network (80 genomes, ~13 million edges)
    • How to interpret the output roles?
      • Try overlaying known functions on the role-annotated network (or a projection thereof)
      • Consider promiscuous domains (e.g., proteins containing them might form cliques)
      • May want to look at many roles for functional annotation (if this is even possible)
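
The structural features in Ladan's slides (local, neighborhood, regional, some recursive over neighbors) can be illustrated with a minimal sketch. The specific features here are assumptions: degree and weighted degree as local features, edge counts inside and leaving each node's egonet as neighborhood features, and the sum and mean of neighbors' features as one level of recursive (regional) features, loosely in the style of the ReFeX features used by RolX. The notes don't say which features Ladan actually computes, and the grouping step that reduces dimensionality is omitted.

```python
from collections import defaultdict


def base_features(edges):
    """Local and egonet (neighborhood) features for an undirected weighted graph.

    edges: iterable of (u, v, weight) with u != v, each pair listed once.
    Returns (adj, feats) where feats[node] is
    [degree, weighted_degree, egonet_internal_edges, egonet_boundary_edges].
    This feature set is an illustrative assumption, not the project's list.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    feats = {}
    for node, nbrs in adj.items():
        ego = set(nbrs) | {node}  # the node plus its direct neighbors
        internal = boundary = 0
        for a in ego:
            for b in adj[a]:
                if b in ego:
                    internal += 1  # counted once from each endpoint
                else:
                    boundary += 1  # edge leaving the egonet
        feats[node] = [len(nbrs), sum(nbrs.values()), internal // 2, boundary]
    return adj, feats


def add_recursive(adj, feats):
    """Append the sum and mean of each neighbor feature (one recursion level)."""
    out = {}
    for node, f in feats.items():
        nbr_rows = [feats[n] for n in adj[node]]
        sums = [sum(row[i] for row in nbr_rows) for i in range(len(f))]
        means = [s / len(nbr_rows) for s in sums]
        out[node] = f + sums + means
    return out


# Toy network: triangle a-b-c plus d hanging off c (made-up weights)
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0), ("c", "d", 0.5)]
adj, feats = base_features(edges)
rec = add_recursive(adj, feats)
print(feats["c"])     # [3, 2.5, 4, 0]
print(len(rec["d"]))  # 12
```

Calling `add_recursive` again on its own output adds a further level of recursion, at the cost of tripling the feature count each time, which is one reason some grouping or pruning step is needed.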
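
The role-discovery step factors a non-negative genes × features matrix V into a genes × roles matrix G and a roles × features matrix F, with fewer roles than features. A minimal pure-Python sketch follows, using Lee and Seung's multiplicative updates for squared (Frobenius) error; the notes don't say which NMF variant is used or how the number of roles r is chosen, so the solver, the fixed r, the iteration count, and the seed are all assumptions.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, G, F):
    """Frobenius norm of V - G F."""
    GF = matmul(G, F)
    return sum((v - w) ** 2 for rv, rw in zip(V, GF) for v, w in zip(rv, rw)) ** 0.5


def nmf(V, r, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x f) as G (n x r) times F (r x f).

    Lee-Seung multiplicative updates for squared error; eps guards against
    division by zero. r, iters and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    n, f = len(V), len(V[0])
    G = [[rng.random() for _ in range(r)] for _ in range(n)]
    F = [[rng.random() for _ in range(f)] for _ in range(r)]
    for _ in range(iters):
        # F <- F * (G^T V) / (G^T G F), elementwise
        Gt = transpose(G)
        num, den = matmul(Gt, V), matmul(matmul(Gt, G), F)
        F = [[F[i][j] * num[i][j] / (den[i][j] + eps) for j in range(f)] for i in range(r)]
        # G <- G * (V F^T) / (G F F^T), elementwise
        Ft = transpose(F)
        num, den = matmul(V, Ft), matmul(G, matmul(F, Ft))
        G = [[G[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(n)]
    return G, F


# Toy 4 genes x 3 features matrix (made-up numbers)
V = [[1, 2, 0], [0, 1, 3], [1, 3, 3], [2, 4, 0]]
G, F = nmf(V, r=2)
print(frob_error(V, G, F))
```

Each row of G gives one gene's weights on the r roles, and each row of F describes a role in terms of the original features. Note that each dense iteration here costs on the order of n·f·r operations, separate from the edge-linear cost of computing the features.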
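
One simple way to start overlaying known functions on the role output: assign each protein its dominant role (the largest weight in its row of the node-by-role matrix) and count known function labels per role. The protein names, role weights, function labels, and the dominant-role simplification are all hypothetical; since functional annotation may depend on several roles at once, keeping the full role-weight rows may turn out to be more informative.

```python
from collections import Counter, defaultdict


def dominant_roles(nodes, G):
    """Map each node to the index of the largest weight in its row of G."""
    return {node: max(range(len(row)), key=row.__getitem__) for node, row in zip(nodes, G)}


def role_function_table(roles, annotations):
    """Count known function labels among the nodes assigned to each role.

    Nodes without a known function are counted as 'unannotated'.
    """
    table = defaultdict(Counter)
    for node, role in roles.items():
        table[role][annotations.get(node, "unannotated")] += 1
    return table


# Hypothetical proteins, role weights, and function labels
nodes = ["p1", "p2", "p3", "p4"]
G = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.0], [0.1, 0.6]]
annotations = {"p1": "transporter", "p2": "kinase", "p3": "transporter"}
roles = dominant_roles(nodes, G)
table = role_function_table(roles, annotations)
for role in sorted(table):
    print(role, dict(table[role]))
```

A role whose proteins share one label, as role 0 does in this toy example, would be a candidate for functional interpretation; mixed or mostly unannotated roles would need other evidence.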