Tara Oceans Analysis Plans
From OpenWetWare
Tara Oceans Analysis Ideas
- Prevalence of novel SFams (diverse families with no annotated function) across Tara sampling locations.
- Potentially correlate with annotated families and/or environmental data, location, etc.
- Team: Aram, Stephen, Stacia
- Approach: Pipeline of scripts using DIAMOND and the novel SFams database, written by Stacia
- Status: Complete - Stacia is writing up the manuscript
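The prevalence idea above can be illustrated with a minimal sketch. It assumes a per-sample abundance table has already been produced (for example from the DIAMOND-based pipeline); the function name `family_prevalence`, the station labels, the family IDs, and the abundance values are all hypothetical and are not taken from the actual pipeline.

```python
from collections import defaultdict


def family_prevalence(abundance, min_abundance=0.0):
    """Return the fraction of samples in which each family is detected.

    abundance: dict mapping sample name -> {family ID: abundance}.
    A family counts as detected when its abundance exceeds min_abundance.
    Families missing from a sample are treated as absent there.
    """
    n_samples = len(abundance)
    if n_samples == 0:
        return {}
    counts = defaultdict(int)
    for families in abundance.values():
        for family, value in families.items():
            if value > min_abundance:
                counts[family] += 1
    return {family: count / n_samples for family, count in counts.items()}


# Hypothetical toy table: four sampling stations, two novel families.
toy_abundance = {
    "station_A": {"SFam_1": 3.0, "SFam_2": 0.0},
    "station_B": {"SFam_1": 1.5, "SFam_2": 2.0},
    "station_C": {"SFam_1": 0.0, "SFam_2": 0.5},
    "station_D": {"SFam_1": 4.0},
}
prevalence = family_prevalence(toy_abundance)
print(prevalence)
```

The detection threshold is a design choice: a value above zero may be preferable when low counts are likely to be noise.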
- Global niche modeling of distributions of specific gene families and pathways.
- Genes/functions of interest for focused analyses:
- Antibiotic resistance and synthesis
- CRISPRs and related proteins
- Photobiology
- Photosynthesis
- Light receptors
- UV DNA damage protection and repair (Eisen)
- Proteorhodopsins (Lizzy Wilbanks, Sarah Hird)
- Circadian rhythms (Eisen)
- Carbon fixation
- Iron scavenging
- Nitrogen cycle - nitrogen fixation, nitrification/denitrification, ammonification, anammox (Adrienne)
- Biosynthetic pathways of biomedical relevance (e.g., http://elifesciences.org/content/4/e05048)
- Petroleum hydrocarbon and plastic degradation proteins
- Starch utilization system operons (selfish bacteria; Carol Arnosti) - do the genes in the operon co-occur? Do other genes co-occur with them?
- Potentially compare historical and current environmental data to look for lags similar to those we see in soil, and to predict extinctions.
- Possibly look at overall diversity metrics to predict hotspots.
- Team: Stephen, Josh, Patrick?, Adrienne, Carol
- Approach: ShotMAP with KEGG db, model selection, predictions/maps, interpretation
- Status: ShotMAP done, rest to do
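The co-occurrence question raised above for the starch utilization system operons could be approached, as one simple option, with a Jaccard index on presence/absence across samples. This is only a sketch: the function name `jaccard_cooccurrence`, the gene labels, and the sample sets are illustrative assumptions, not ShotMAP output.

```python
from itertools import combinations


def jaccard_cooccurrence(presence):
    """Pairwise Jaccard index between genes.

    presence: dict mapping gene name -> set of samples where it was detected.
    Returns {(gene_a, gene_b): shared samples / samples with either gene},
    with gene pairs in alphabetical order.
    """
    scores = {}
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]
        shared = presence[a] & presence[b]
        scores[(a, b)] = len(shared) / len(union) if union else 0.0
    return scores


# Hypothetical presence sets: two operon genes and one unrelated gene.
toy_presence = {
    "susC": {"s1", "s2", "s3"},
    "susD": {"s1", "s2", "s3"},
    "other": {"s3", "s4"},
}
cooccurrence = jaccard_cooccurrence(toy_presence)
for pair, score in sorted(cooccurrence.items()):
    print(pair, round(score, 2))
```

Genes from the same operon would be expected to score close to 1, and unrelated genes lower. A permutation test or a correlation on abundances would be needed to judge significance.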
- Strain-level analysis (copy number variants, single nucleotide variants) of prevalent species across Tara sampling locations.
- Build phylogenies.
- Potentially look at gene prevalence within species and correlate with environmental data, location, etc.
- Team: Stephen, ?
- Approach: PhyloCNV
- Status: Started - Stephen ran PhyloCNV but probably needs help assimilating and interpreting results
- Ecological annotation of protein families
- How do proteins interact with the environment? Use the KEGG and SFam families from the annotation frameworks above and correlate each family with environmental data.
- Is ecological covariation a predictor of pathway interaction? Quantify the correlation across sites between KEGG families that are linked through pathways.
- These ideas are related to several listed above, so integrated analyses may be more efficient.
- Team: Tom
- Status: To do
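As a rough illustration of correlating a family with environmental data, the sketch below computes a Spearman rank correlation between one family's abundance and one environmental variable across sites. Spearman is an assumption here, since the plan does not name a statistic. The helper names and the toy values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values at four sites (same site order in both lists).
temperature = [2.0, 10.0, 18.0, 25.0]
family_abundance = [0.1, 0.4, 0.9, 3.0]
rho = spearman(temperature, family_abundance)
print(round(rho, 3))
```

The same `spearman` function could be applied to the abundances of two KEGG families across sites to quantify their ecological covariation. With many families and variables, the resulting p-values would need multiple-testing correction.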