Dec09 EisenNotes Eisen Notes
- Could do RNA families
- Some question as to which clustering methods to use to build families, because Dongying, Morgan, and others are not all doing things the same way
- Sam says we might want to use a phylogeny-driven clustering method
- See MCL paper here
- Sam: could flag things that hit a superfamily but not any of the known subfamilies
- Finding protein families w/ no similarity to known families
- Then ranking these in various ways
- Should look at
- Marcotte methods
- Community profiling
Josh 2 Microbial Ranges
- Using distance decay relationships to measure microbial ranges
- Eisen notes: also see some material on "Field Guides to Microbes"
- Josh says need random sampling of space
- Eisen says need random sampling of niche
- Can randomize in lots of different ways
- For tree randomization, can have all the tips be individual reads or can collapse them into OTUs
- Katie had some good comments about WHY do various null models - fixing one variable, or fixing both ...
- General benefit of a null model: it gives you a default to test against
- Josh suggests that having an explicit alternative model to test against is very powerful
- Eisen says it would be good to have someone write a review about these issues in the context of metagenomics
- Josh proposes a likelihood ratio test for assessing models
- Can build a model where one assumes sampling from all taxa is independent, but that is probably not the case
- Possibly related papers worth looking at
- Can we predict what organisms are missing? Are there parts of the tree we have not sampled?
- Multiple things being varied
- Using AMPHORA as a backbone to some components of the simulations
- METASIM used as part of the simulation
- Possibly doing some weird things w/ which regions of the gene are covered, but may not matter much
- Comparing trees w/ FastTree but not yet using bootstraps
- Many ways to compare trees
- Robinson-Foulds (partition) metric
- Path difference (nodal distance) metric
- Disagreement metric
- Absolute or normalized
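A minimal sketch of the Robinson-Foulds (partition) metric listed above, in both absolute and normalized form. The nested-tuple tree encoding, the function names, and the two example trees are assumptions made for illustration; they are not from the meeting or from any particular tree-comparison package. Trees are treated as unrooted, and the normalization by 2(n - 3) assumes fully resolved (binary) trees on n tips.

```python
def _collect_clades(tree, out):
    """Return the frozenset of tips under `tree`, appending every internal clade to `out`."""
    if isinstance(tree, str):          # a tip is just its name
        return frozenset([tree])
    tips = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.append(tips)
    return tips


def splits(tree):
    """Return (nontrivial bipartitions, all tips) for a nested-tuple tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference tip, so the same split gets the same key however the tree
    happens to be rooted.
    """
    clades = []
    all_tips = _collect_clades(tree, clades)
    ref = min(all_tips)
    result = set()
    for clade in clades:
        side = all_tips - clade if ref in clade else clade
        # keep only splits with at least two tips on each side
        if 1 < len(side) < len(all_tips) - 1:
            result.add(side)
    return result, all_tips


def rf_distance(tree_a, tree_b, normalized=False):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    splits_a, tips_a = splits(tree_a)
    splits_b, tips_b = splits(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must have the same set of tips")
    distance = len(splits_a ^ splits_b)
    if not normalized:
        return distance
    max_distance = 2 * (len(tips_a) - 3)   # binary unrooted trees only
    return distance / max_distance if max_distance > 0 else 0.0


# Hypothetical five-tip trees that disagree on where B and C attach.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(t1, t2), rf_distance(t1, t2, normalized=True))
```

The absolute value depends on the number of tips, so the normalized form is the easier one to compare across simulations with different numbers of reads or OTUs.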
What do we need to know about protein families?
- Phylogenetic informativeness
- Taxa ID
- PD calculation
- Relative abundance informativeness
- Evenness in copy # is key
- OTU identification value
- Josh suggests a statistical model of how well we have sampled genomes and how likely they are to predict future data
- can do this by taxonomy
- or ecology
- Predictability is important not just "evenness" for example
- Aaron suggests integrating gene and species tree
- Possible statistical studies of copy number variation
- Do all parts of a gene give the same answer?
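As a rough illustration of the "PD calculation" item above, here is a sketch of Faith's phylogenetic diversity for a set of taxa on a rooted tree. The child-to-parent dictionary encoding, the node names, and the branch lengths are all hypothetical. The sketch assumes the convention that PD includes the path back to the root; other conventions (minimum spanning path only) would give smaller values for small taxon sets.

```python
def faith_pd(parent, taxa):
    """Sum the branch lengths on the union of root-to-tip paths for `taxa`.

    `parent` maps each non-root node to a (parent_node, branch_length) pair.
    Each branch is counted once, however many of the taxa sit below it.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # walk toward the root, stopping once we reach a branch already counted
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical tree: root -> n1 (1.0), root -> C (3.0), n1 -> A (1.0), n1 -> B (2.0)
tree = {
    "n1": ("root", 1.0),
    "C": ("root", 3.0),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
}

for sample in (["A"], ["A", "B"], ["A", "B", "C"]):
    print(sample, faith_pd(tree, sample))
```

Running the same function on trees built from different gene families is one simple way to ask whether the families give the same diversity answer.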
James, Microbial Diversity
- Spatially structured community assembly
- Distance decay
- Taxa area
- Do the same rules apply at small and large scales?
- Relative abundance is important in the model in terms of sampling individuals
- Some sources of noise
- Doing too wide a breadth of taxa at once
- Doing too many types of environments at once
- Can measure same diversity parameters as w/ taxonomic diversity
- But use phylogenetic tree as guide
- Many reasons
- Measuring PD in GOS
- All genes give same answer?
- What variables correlated?
- Current distance?
- What is an OTU?
- Mostly looked at w/ ss-rRNA (16s bacteria, 18s eukaryotes)
- Problems w/ PCR
- Tom shows outline of new pipeline
- What is the purpose of the OTUs? Does that affect our workflow?
- Issues w/ chimeras
- Is FastTree the best way to build these trees? Are there issues w/ branch length?
- Sam's simulation
- Known data sets
- To Do
- How do you calculate cutoffs, and should you use cutoffs?
- Can we use some statistical approach to determine what parameters to use for each gene, assuming everything should give the same answer?
- Incorporating ecotypes
- Diversity gets purged by periodic selection events
- Most of the time this is w/in niches/ecotypes
- Occasionally you get a new ecotype
- Could run ecotype analysis and find OTUs consistent w/ being ecotypes
- Also see http://www.ncbi.nlm.nih.gov/pubmed/18435746?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=6
- How does barcoding compare?
- Barcoding folks are dealing w/ similar issues
- See http://www.ncbi.nlm.nih.gov/pubmed/18522916?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=6
- See also http://www.ncbi.nlm.nih.gov/pubmed/19900305?itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum&ordinalpos=1
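To make the cutoff question in the To Do items above concrete, here is a small sketch of how the number of OTUs changes with the distance cutoff. It uses single-linkage clustering via union-find, which is only one of several possible linkage choices and is an assumption here, not the pipeline discussed at the meeting. The sequence names and pairwise distances are made up for illustration.

```python
def cluster_otus(names, distances, cutoff):
    """Single-linkage OTUs: join any two sequences whose distance is <= cutoff.

    `distances` maps (name_a, name_b) pairs to a pairwise distance.
    Returns a sorted list of OTUs, each a sorted list of sequence names.
    """
    parent = {name: name for name in names}

    def find(x):
        # follow parent links to the cluster representative, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise distances among four reads.
names = ["s1", "s2", "s3", "s4"]
distances = {
    ("s1", "s2"): 0.01,
    ("s1", "s3"): 0.04,
    ("s2", "s3"): 0.03,
    ("s1", "s4"): 0.20,
    ("s2", "s4"): 0.21,
    ("s3", "s4"): 0.19,
}

for cutoff in (0.01, 0.03, 0.25):
    otus = cluster_otus(names, distances, cutoff)
    print(cutoff, len(otus), otus)
```

With single linkage, s1 and s3 end up in the same OTU at a 0.03 cutoff even though they are 0.04 apart, because both are close to s2. That chaining behavior is one reason the choice of cutoff and linkage method can change downstream diversity estimates.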