Moore Notes 10 2 13

Group Call

  • Participants: Katie, Stephen, Josh, Tom, Ladan, Sarah, Guillaume, Dongying, Patrick
  • Guillaume: Jenna has been working with PICRUSt and QIIME
    • She will be helping Guillaume run analyses
  • Josh: Tara Oceans data
    • Shotgun metagenomic samples from the Pacific and Atlantic
      • 243 samples, 68 locations
      • 5 m depth
      • 40-450 million reads per sample
    • Will be available at the end of November
    • Katie: Any gaps in their sampling locations?
      • Josh: hard to tell; sampling looks well distributed, with some bias towards the Mediterranean
      • Might have to wait until released to figure this out
    • Should get in touch with the senior PIs: Bork, Acinas, Hingamp, Raes, Follows, Sullivan
      • Tom will drop Sullivan a line copying Josh, Jonathan, Katie
  • Sarah: phylogeography background, has focused on species in vertebrate studies
    • Phylogeography looks at distribution of variants
    • How phylogenetic distribution of genetic variants scales across space and time
    • For bacteria, she would do phylogeography of function using SFams database
    • BLAST samples against SFams, focusing on families where we know the function (e.g., proteorhodopsin)
    • Tom: concerned about having enough resolution in a metagenomic read to do this
    • Katie: look at Sam's paper, maybe do some additional simulations
    • Looking only at reads that cluster together in the tree might increase confidence
    • Tom: might be good to work with nucleotide sequences (for phylogeny, not for read classification)
    • Sarah needs to think about how to account for different sources of variants, in a family-specific way, in order to set expectations/null distributions
    • How to synergize with Josh?
      • Use the same ShotMAP runs (e.g., on Tara Oceans)
      • Niche modeling plus phylogeography
        • To predict distributions of variants, which might be more important functionally than distributions of families
        • Example: nitrate reductase subclade that alters its enzymatic reaction (and output of the metabolic pathway)
  • Ladan: Predicting functions of SFams with no GO annotations
    • Mapped SFams into a network (weighted, undirected edges) based on Pearson correlation of presence-absence across genomes
    • Tried to find tightly connected subnetworks with an extreme sets algorithm
      • Finds groups of nodes where removing a piece has a cost greater than removing the whole group
      • There may be extreme sets within extreme sets (hierarchical, tree-like structure)
    • Could potentially map functions from annotated members of the extreme set
      • Most sets are nearly all annotated or mostly not annotated
    • Stephen: how does the phylogenetic distribution of the genomes impact the results?
      • Dongying: tried a "phylogenetically independent correlation" (from the 80s), but it didn't make sense and eliminated some good information
    • More info here: https://docs.google.com/file/d/0B5MwVN20vJJzZjJJZW9nOFIzems/edit?usp=sharing
    • Stephen: could evaluate results based on known annotations
      • Ladan: tried this to compare her algorithm to MCL
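
Illustration of Ladan's network idea: the sketch below is not her code, only a minimal toy version of the two steps described above. It builds a weighted, undirected network from Pearson correlations of family presence-absence across genomes, then checks by brute force whether a group of nodes is an extreme set, taken here to mean that every proper nonempty subset of the group has a strictly larger cut weight than the whole group. All names (`correlation_network`, `cut_weight`, `is_extreme_set`, `min_weight`) and the toy matrix are hypothetical; dropping non-positive correlations is an assumption, and the exhaustive subset check is exponential, so it is only usable on tiny examples and stands in for whatever extreme sets algorithm was actually used.

```python
from itertools import combinations

import numpy as np


def correlation_network(presence, min_weight=0.0):
    """Return {(i, j): r} for family pairs with Pearson r above min_weight.

    presence: 2-D 0/1 array, rows = families, columns = genomes.
    Keeping only positive correlations is an assumption of this sketch.
    """
    corr = np.corrcoef(presence)  # each row is treated as one variable
    edges = {}
    for i, j in combinations(range(corr.shape[0]), 2):
        r = corr[i, j]
        # Skip NaN (a family present in all or no genomes) and weak pairs.
        if np.isfinite(r) and r > min_weight:
            edges[(i, j)] = float(r)
    return edges


def cut_weight(edges, group):
    """Total weight of edges with exactly one endpoint inside group."""
    group = set(group)
    return sum(w for (i, j), w in edges.items() if (i in group) != (j in group))


def is_extreme_set(edges, group):
    """Brute force: every proper nonempty subset must cost more to cut out."""
    members = sorted(group)
    whole = cut_weight(edges, members)
    for size in range(1, len(members)):
        for subset in combinations(members, size):
            if cut_weight(edges, subset) <= whole:
                return False
    return True


# Toy presence-absence matrix: 4 families (rows) x 6 genomes (columns).
presence = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
])
edges = correlation_network(presence)
print(edges)
print(is_extreme_set(edges, {0, 1}), is_extreme_set(edges, {0, 1, 2}))
```

In this toy case families 0 and 1 co-occur perfectly and form an extreme set, while adding family 2 breaks the property because family 2 alone can be cut out at no greater cost than the group of three.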