Tara Oceans Analysis Plans

From OpenWetWare

Tara Oceans Analysis Ideas

  • Prevalence of novel SFams (diverse families with no annotated function) across Tara sampling locations.
    • Potentially correlate with annotated families and/or environmental data, location, etc.
    • Team: Aram, Stephen, Stacia
    • Approach: Pipeline of scripts, written by Stacia, using DIAMOND and the novel SFams db
    • Status: Complete - Stacia is writing up the manuscript
  • Global niche modeling of distributions of specific gene families and pathways.
    • Genes/functions of interest for focused analyses:
      • Antibiotic resistance and synthesis
      • CRISPRs and related proteins
      • Photobiology
        • Photosynthesis
        • Light receptors
        • UV DNA damage protection and repair (Eisen)
        • Proteorhodopsins (Lizzy Wilbanks, Sarah Hird)
        • Circadian rhythms (Eisen)
      • Carbon fixation
      • Iron scavenging
      • Nitrogen cycle - nitrogen fixation, nitrification/denitrification, ammonification, anammox (Adrienne)
      • Biosynthetic pathways of biomedical relevance (e.g., http://elifesciences.org/content/4/e05048)
      • Petroleum hydrocarbon and plastics degradation proteins
    • Starch utilization system operons (selfish bacteria; Carol Arnosti) - do the genes in the operon co-occur? Do other genes co-occur with them?
    • Potentially explore historical environmental data versus current to look for lags similar to what we see in soil and to predict extinctions.
    • Possibly look at overall diversity metrics to predict hotspots.
    • Team: Stephen, Josh, Patrick?, Adrienne, Carol
    • Approach: ShotMAP with KEGG db, model selection, predictions/maps, interpretation
    • Status: ShotMAP done, rest to do
  • Strain-level analysis (copy number variants, single nucleotide variants) of prevalent species across Tara sampling locations.
    • Build phylogenies.
    • Potentially look at gene prevalence within species and correlate with environmental data, location, etc.
    • Team: Stephen, ?
    • Approach: PhyloCNV
    • Status: Started - Stephen ran PhyloCNV but probably needs help assimilating and interpreting results
  • Ecological annotation of protein families
    • How do proteins interact with the environment? Use KEGGs and SFams from the annotation frameworks above and correlate each family with environmental data
    • Is ecological covariation a predictor of pathway interaction? Quantify the correlation across sites between KEGGs that are linked through pathways.
    • These ideas are related to several listed above and integrated analyses may be more efficient
    • Team: Tom
    • Status: To do
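
The correlation step in the ecological annotation idea could be sketched roughly as follows. This is only an illustration, not the team's pipeline: it assumes per-site relative abundances for each family and one environmental measurement per site, and it uses Spearman rank correlation as one reasonable choice (the plan above does not name a statistic). The family names, abundance values, and temperatures are all made-up placeholders.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: one value per sampling site, same site order throughout.
abundance = {
    "family_a": [0.1, 0.4, 0.3, 0.9],
    "family_b": [0.8, 0.5, 0.6, 0.1],
}
temperature = [4.0, 12.0, 10.0, 25.0]  # placeholder environmental variable

# Family vs. environment: one coefficient per family.
family_vs_env = {fam: spearman(vals, temperature) for fam, vals in abundance.items()}

# Family vs. family: covariation across sites for a pair assumed to share a pathway.
family_vs_family = spearman(abundance["family_a"], abundance["family_b"])

print(family_vs_env)
print(family_vs_family)
```

With real data, the same two calls would be repeated over every family and over every pair of KEGGs linked in a pathway, and the resulting coefficients would need multiple-testing correction before interpretation.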