User:Morgan G. I. Langille/Notebook/Project management
From OpenWetWare
Halophiles
Need to make a list of things to be done for the Roche genome paper.
- Outline paper
- Organize files for the Roche genomes and the new NCBI completed halophile genomes
- Re-do CRISPR analysis for the new NCBI genomes.
- Look at homologs of genes identified in the new Science metabolic paper.
Darpa
- map Pfams from Xingpeng's analysis to GO cellular components to figure out which component is the most represented (are they membrane bound?)
- OR pull out the proteins that had these Pfams and run them through PSORTb
Erebus
- need to think about ways to identify Pfams whose counts differ from each other and from whole genomes.
- take Pfam counts from all completed genomes, get a distribution, then ask whether a single count is normal or not, taking into account multiple test correction
- Do we see pathways that are over/under-represented that are not expected based on:
- genomes that are predicted from the metagenomics sample by taxon assignment (e.g. MEGAN, AMPHORA, etc.). This lets us know if something is missing/different from the information provided by looking only at taxonomy assignment.
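A minimal sketch of the "is a single count normal or not" idea above, using only the Python standard library. It assumes the reference counts for each Pfam are roughly normally distributed (probably too strong for real count data, so treat the z-score as a placeholder for a better test) and uses Benjamini-Hochberg as one possible multiple test correction. The function names and the toy Pfam labels are made up for illustration.

```python
import math
from statistics import mean, stdev


def two_sided_normal_p(z):
    """Two-sided p-value for a z-score under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_unusual_pfams(reference_counts, sample_counts, alpha=0.05):
    """Compare each sample count to the distribution across completed genomes.

    reference_counts: {pfam: [count in genome 1, count in genome 2, ...]}
    sample_counts:    {pfam: count in the sample of interest}
    Returns {pfam: adjusted p-value} for Pfams passing the alpha cutoff.
    """
    pfams = sorted(sample_counts)
    pvals = []
    for pfam in pfams:
        ref = reference_counts[pfam]
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            # No spread in the reference: any deviation counts as unusual.
            pvals.append(1.0 if sample_counts[pfam] == mu else 0.0)
            continue
        pvals.append(two_sided_normal_p((sample_counts[pfam] - mu) / sd))
    adjusted = benjamini_hochberg(pvals)
    return {p: q for p, q in zip(pfams, adjusted) if q <= alpha}
```

Called with a dictionary of per-genome counts and a dictionary of counts from one sample, `flag_unusual_pfams` returns only the families whose counts look unexpected after correction.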
Pfam Subtraction Pipeline
- Obtain taxon assignments for the metagenomics sample
- Retrieve the taxon id from the name
- Look up Pfam assignments for each taxon (precalculated) and multiply by the number of assignments to that taxon
- Somehow scale taxon assignments if they seem too large (this might happen with SEED or MEGAN predictions where each protein is counted as a taxon hit)
- Subtract those Pfams from the total metagenomic Pfam counts
- Look at leftover Pfams and see what is interesting
- Possibly search through the pre-computed Pfam genomes to find a genome with similar Pfam composition
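The pipeline steps above could be sketched roughly as follows. This is only an illustration under assumptions: the inputs are plain dictionaries (a name-to-id lookup, per-taxon hit counts, and precalculated per-genome Pfam counts), the `scale` factor stands in for the still undecided scaling step, and cosine similarity is just one possible way to compare Pfam compositions. All function names and the toy identifiers are hypothetical.

```python
import math
from collections import Counter


def hits_by_taxon_id(name_hits, name_to_id):
    """Convert {taxon name: hits} to {taxon id: hits}, skipping unknown names."""
    hits = Counter()
    for name, n in name_hits.items():
        if name in name_to_id:
            hits[name_to_id[name]] += n
    return hits


def expected_pfam_counts(taxon_hits, pfam_by_taxon, scale=1.0):
    """Sum precalculated per-genome Pfam counts, weighted by taxon hits."""
    expected = Counter()
    for taxon_id, n_hits in taxon_hits.items():
        for pfam, per_genome in pfam_by_taxon.get(taxon_id, {}).items():
            expected[pfam] += per_genome * n_hits * scale
    return expected


def subtract_pfams(observed, expected):
    """Observed minus expected; keeps only Pfams with a non-zero difference."""
    leftover = {}
    for pfam in set(observed) | set(expected):
        diff = observed.get(pfam, 0) - expected.get(pfam, 0)
        if diff != 0:
            leftover[pfam] = diff
    return leftover


def cosine_similarity(a, b):
    """Cosine similarity between two {pfam: count} profiles."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_genome(profile, pfam_by_taxon):
    """Return the taxon id whose Pfam profile is closest to the given one."""
    return max(pfam_by_taxon, key=lambda t: cosine_similarity(profile, pfam_by_taxon[t]))
```

Positive leftovers are Pfams seen more often than the taxonomy predicts, negative ones are Pfams that the predicted genomes carry but the sample lacks. The positive part of the leftover can be passed to `most_similar_genome` to look for a genome that might explain it.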
Protein family stuff with Steve
- chat with Steve on Monday (ask about rarefaction curves)
Rough Ideas
Starting with Pfam counts across all GOS samples
- Looking at samples
- alpha diversity of GOS samples (measure total protein diversity in each sample)
- provide a listing of the most diverse samples and indicate if those are environmentally related
- beta diversity of GOS samples (are the samples related... presumably yes)
- show a tree and possibly a network describing the relatedness of the samples
- estimate the total number of different Pfams in the ocean by generating a rarefaction curve and using the Chao estimator
- Looking at families
- alpha diversity: which families are the richest (not that interesting) and which are the most diverse (interesting and informative)?
- provide a list of the most diverse families and maybe suggest why those are so diverse?
- beta diversity -> do the groupings tell us anything (e.g. do they share similar function, similar localization, etc.)
- map to GO terms to see if they have similar function
- Chao index
- estimate the total number of proteins for each family in the ocean (which is the most prevalent?)
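For the rarefaction curve and Chao estimator ideas above, a small standard-library sketch might look like this. It assumes the input is a list of abundance counts (one per Pfam, or one per protein within a family), uses the classic Chao1 formula with the bias-corrected form when there are no doubletons, and computes the analytical (Hurlbert) rarefaction expectation rather than random subsampling. Function names are illustrative only; `math.comb` needs Python 3.8 or later.

```python
from math import comb


def chao1(counts):
    """Chao1 richness estimate from a list of abundance counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def rarefy(counts, n):
    """Expected number of distinct categories in a random subsample of size n."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # For each category: 1 minus the probability that it is missed entirely.
    return sum(1 - comb(total - c, n) / denom for c in counts)


def rarefaction_curve(counts, step=1):
    """List of (subsample size, expected richness) points."""
    total = sum(c for c in counts if c > 0)
    sizes = list(range(0, total + 1, step))
    if sizes[-1] != total:
        sizes.append(total)
    return [(n, rarefy(counts, n)) for n in sizes]
```

If the curve is still climbing at the full sample size, the sample has not saturated, and the Chao1 value gives a rough lower-bound style estimate of how many categories remain unseen.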
Collaboration with Steve would be a comparison of diversity measurements based on taxonomic vs phylogenetic vs functional data
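One way such a comparison could start is to run the same alpha and beta diversity measures over different count tables (taxon counts vs Pfam counts per sample). The sketch below assumes the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, which are common choices but not ones fixed anywhere in these notes. Phylogenetic measures need a tree and are not covered here.

```python
import math


def shannon(counts):
    """Shannon diversity (natural log) of a list of abundance counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {category: count} samples."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.values()) + sum(b.values())
    return num / den if den else 0.0


def pairwise_dissimilarity(samples):
    """Return {(name1, name2): Bray-Curtis} for every pair of samples."""
    names = sorted(samples)
    return {
        (x, y): bray_curtis(samples[x], samples[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

The pairwise values from `pairwise_dissimilarity` are the kind of input that a tree or network of samples would be built from, and the same call works whether the categories are taxa or Pfams.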