Bodega Phylogenetics Workshop Day 1
Lecture notes for the phylogenetics workshop. These are taken in real time, so they may contain errors. I'll see if I keep this up throughout the course.
See course website.
Huelsenbeck
9a-11a
 Parameters $\theta$
 Observation $X$
 $p(X \mid \theta)$ is known as the likelihood. Adjust the parameters to maximize the likelihood.
 Coin toss example
$\theta$ = probability of heads on a single toss of a coin, X is the number of heads observed in n tosses (observed value x). Choose a model: binomial:
$p(x \mid \theta) = {n \choose x} \theta^x (1 - \theta)^{n - x}$
 Maximize likelihood, find $\hat\theta = \frac{x}{n}$
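A quick numerical check of the coin toss example. This is only a sketch: the data (7 heads in 10 tosses) and the grid search are my own illustrative choices, not from the lecture. It evaluates the binomial likelihood on a grid of theta values and confirms that the maximum sits at x/n.

```python
import math

def binomial_likelihood(theta, x, n):
    """p(x | theta): probability of x heads in n tosses."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def grid_mle(x, n, steps=1000):
    """Return the grid value of theta with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda theta: binomial_likelihood(theta, x, n))

# Illustrative data (an assumption for this sketch).
x, n = 7, 10
theta_hat = grid_mle(x, n)
print(theta_hat, x / n)
```

A grid search is overkill here since the maximum has a closed form, but the same pattern applies when it does not.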
 Bayesian
$p(\theta \mid x) = \frac{p(x \mid \theta) p(\theta)}{p(x)}$
likelihood times prior over marginal equals the posterior. The prior is the source of controversy, though mathematically required if you want a posterior probability.
The marginal can be hard to solve.
$p(x) = \int_0^1 p(x \mid \theta) p(\theta) \, d\theta$
So Bayesians solved trivial problems or just estimated moments (mean, etc), until along came MCMC and they could do these integrals...
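For the coin toss the marginal is a one-dimensional integral, so it can be done by brute force. A minimal sketch, assuming a uniform prior on theta and the same illustrative data as before (7 heads in 10 tosses); the midpoint rule and all function names are my own choices. With a uniform prior the exact answers are p(x) = 1/(n+1) and posterior mean (x+1)/(n+2), which gives something to compare against.

```python
import math

def likelihood(theta, x, n):
    """Binomial likelihood p(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def prior(theta):
    """Uniform prior on [0, 1] (an assumption for this sketch)."""
    return 1.0

def integrate(f, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

x, n = 7, 10  # illustrative data

# Marginal p(x): the normalizing constant in Bayes' rule.
p_x = integrate(lambda th: likelihood(th, x, n) * prior(th))

def posterior(theta):
    """Posterior density p(theta | x)."""
    return likelihood(theta, x, n) * prior(theta) / p_x

post_mean = integrate(lambda th: th * posterior(th))
print(p_x, post_mean)
```

This only works because theta is a single number; with a tree and many parameters the integral has no such shortcut.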
Phylogenetic Model
 Parameters: Tree. (comments on notation). Life is simpler than you think: all phylogenetic models are continuous-time Markov models
 "Observations:" could be:
 Alignment (note that it's not actually observed; it's an inference from the chromatogram from the sequencer! We'll ignore that.)
 Character matrix
How do we calculate the probability of an alignment given a tree?
 (Lecture moves to slides, will probably be available on Bodega Wiki?)
 Summing literally over all possible combinations of ancestral states is crazy. Felsenstein's pruning algorithm (the sum-product algorithm from graphical models) makes this much faster by reusing the partial results shared between subtrees.
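To make the pruning idea concrete, here is a small sketch for a single site. The assumptions are mine, not the lecture's: a Jukes-Cantor (JC69) substitution model with equal base frequencies, branch lengths in expected substitutions per site, and a made-up three-tip tree encoded as nested tuples. Each node passes up a vector of partial likelihoods, one per state, so no combination of ancestral states is ever enumerated explicitly.

```python
import math

STATES = "ACGT"

def jc69(i, j, t):
    """JC69 probability of going from state i to state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial_likelihoods(node):
    """Return, for each state s, the probability of the tips below node given s at node.

    A tip is a one-letter string; an internal node is a tuple of
    (child, branch_length) pairs (a hypothetical encoding for this sketch).
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:
        below = partial_likelihoods(child)
        for a, parent_state in enumerate(STATES):
            result[a] *= sum(
                jc69(parent_state, child_state, t) * below[b]
                for b, child_state in enumerate(STATES)
            )
    return result

def site_likelihood(tree):
    """Sum the root partial likelihoods weighted by the stationary frequencies (1/4 each)."""
    return sum(0.25 * value for value in partial_likelihoods(tree))

# Made-up example: ((A:0.1, C:0.2):0.05, A:0.3)
inner = (("A", 0.1), ("C", 0.2))
tree = ((inner, 0.05), ("A", 0.3))
print(site_likelihood(tree))
```

For an alignment, the site likelihoods would be multiplied across columns (or their logs summed), assuming sites evolve independently.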
 "This is the exponential distribution, it is the only distribution you'll see/need in this course"
moments later...
 "This is the gamma distribution, it is the only distribution you'll need"
 "Oh, and this is the Poisson distribution."
Dice activity: Exact simulation of finite-state continuous-time Markov process
 Interesting hands-on activity to demonstrate Gillespie algorithm simulation.
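The dice activity can be mimicked in a few lines. This is a sketch under my own assumptions: a four-state process with every off-diagonal rate equal to 1/3 (a Jukes-Cantor-like rate matrix), with Python's random number generator standing in for the dice. At each step, draw an exponential waiting time with the total exit rate of the current state, then pick the next state in proportion to the individual rates.

```python
import random

def simulate_ctmc(Q, start, t_end, rng):
    """Simulate a finite-state continuous-time Markov chain on [0, t_end).

    Q is a rate matrix (rows sum to zero). Returns a list of
    (time, state) pairs, one per jump, starting with (0.0, start).
    """
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        exit_rate = -Q[state][state]
        if exit_rate <= 0:
            break  # absorbing state: nothing more happens
        t += rng.expovariate(exit_rate)  # waiting time until the next jump
        if t >= t_end:
            break
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path

# Illustrative rate matrix (an assumption): all substitutions equally likely.
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
path = simulate_ctmc(Q, start=0, t_end=10.0, rng=random.Random(1))
print(len(path) - 1, "jumps; final state", path[-1][1])
```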
Codon models.
 (Note, this wiki entry should be expanded).
 Sequence model, sparse matrices with arbitrary correlations. Approximate pruning via MCMC.
 Rate variation across sites.
 Approximate the gamma distribution with a small number of equal-probability rate categories, using the mean rate of each category.
$l_i = \frac{1}{K} \sum_{k=1}^{K} p(x_i \mid r_k)$
where $r_k$ is the mean rate of category $k$ and $K$ is the number of categories.
 Heterotachy: different models on different parts of the tree.
 Nielsen & Yang 1998
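A rough sketch of the rate-category idea for rate variation across sites. Everything here is an illustrative assumption: I estimate the category means by sorting Monte Carlo draws from a mean-one gamma and averaging within equal-sized bins (real software computes them from gamma quantiles instead), and the conditional likelihood is a toy two-tip JC69 calculation with a made-up path length.

```python
import math
import random

def discrete_gamma_rates(alpha, k, draws=100_000, seed=0):
    """Approximate mean rates of k equal-probability categories of a gamma
    distribution with shape alpha and mean 1, via sorted Monte Carlo draws."""
    rng = random.Random(seed)
    samples = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws))
    size = draws // k
    return [sum(samples[c * size:(c + 1) * size]) / size for c in range(k)]

def p_identical(rate, t=0.5):
    """Toy conditional likelihood p(x_i | r): JC69 probability that two tips
    both show 'A' when separated by a path of length t * rate."""
    return 0.25 * (0.25 + 0.75 * math.exp(-4.0 * rate * t / 3.0))

def site_likelihood(conditional, rates):
    """Average the conditional likelihood over equally weighted rate categories."""
    return sum(conditional(r) for r in rates) / len(rates)

rates = discrete_gamma_rates(alpha=0.5, k=4)
lik = site_likelihood(p_identical, rates)
print(rates, lik)
```

With a small shape parameter most categories have rates well below 1 and one category is fast, which is the point of the approximation.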
Jeremy Brown, MrBayes
Using MrBayes
 Can't calculate the marginal probability. Luckily, in a ratio of posterior probabilities that normalizing constant cancels out, so we can compare any two parameter values without ever computing it. This is where MCMC comes in.
See the slides
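The cancellation in the posterior ratio is easiest to see in a toy Metropolis sampler. This sketch is not MrBayes and makes several assumptions of my own: it reuses the coin toss model with a uniform prior, illustrative data of 7 heads in 10 tosses, and a symmetric uniform proposal. Only the unnormalized posterior is ever evaluated; the marginal p(x) never appears.

```python
import math
import random

def log_unnormalized_posterior(theta, x, n):
    """log of likelihood * prior, dropping constants; uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return x * math.log(theta) + (n - x) * math.log(1.0 - theta)

def metropolis(x, n, steps, step_size, rng):
    """Sample theta with a symmetric random-walk Metropolis sampler."""
    theta = 0.5
    current = log_unnormalized_posterior(theta, x, n)
    samples = []
    for _ in range(steps):
        proposal = theta + rng.uniform(-step_size, step_size)
        candidate = log_unnormalized_posterior(proposal, x, n)
        # Accept with probability min(1, posterior ratio); p(x) cancels in the ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

rng = random.Random(42)
samples = metropolis(x=7, n=10, steps=20_000, step_size=0.2, rng=rng)
kept = samples[2_000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(post_mean)  # should be near the exact posterior mean (x + 1) / (n + 2)
```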
Rannala, Divergence Times
 Molecular clock and fossil dating
 branch lengths as estimates
 Bayesian approaches to account for this uncertainty, given sequences, times, and fossil calibrations:
$f(\theta, t, u \mid X) = \frac{f(X \mid t, u, \theta) f(u \mid \theta) f(t \mid \theta) f(\theta)}{f(X)}$
substitution rate (clock) u, branch length v, age t.
Prior for the divergence times. Priors from fossils. Dangers of hard bounds (will confidently predict the wrong answer).
When do you have enough DNA sequence that your divergence estimate becomes limited only by fossil age uncertainty? Regress width of posterior distribution for divergence times against the mean. Once that regression is sufficiently tight, more DNA data won't help.
 Irregular Molecular clocks
Tom Near, Rocks for Gel Jocks
(a.k.a. BEAST tutorial)
slides
Ideas/Topics
 Reaching the developers
 Help lists.
 uservoice.
 stackoverflow for phylogenetics?
 Poll faculty for useful software, tools, or tricks. Attempt a talk: 5 minutes, 5 tools to change your life.
 Mendeley: an iTunes for your PDFs, Pandora for the literature.
Thoughts/things to investigate or share
 When, if ever, can you actually fit a model with trending rate changes over time? I.e., can we really detect accelerating BM in character evolution?
 Common intuitive mechanisms that generate the familiar distributions. Exponential, Poisson, Gamma, Normal, Log Normal, Binomial, Uniform, Dirichlet.
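One such mechanism, as a sketch: events arriving at a constant rate tie together three of these distributions. The rate, window, and trial counts below are arbitrary choices of mine. Waiting times between events are exponential, the number of events in a fixed window is Poisson (mean and variance both rate * window), and the waiting time until the k-th event is gamma (mean k / rate).

```python
import random

def poisson_counts(rate, window, trials, rng):
    """Count events in [0, window) when gaps between events are Exponential(rate)."""
    counts = []
    for _ in range(trials):
        t, events = rng.expovariate(rate), 0
        while t < window:
            events += 1
            t += rng.expovariate(rate)
        counts.append(events)
    return counts

def gamma_waits(rate, k, trials, rng):
    """Waiting time until the k-th event: a sum of k exponential gaps."""
    return [sum(rng.expovariate(rate) for _ in range(k)) for _ in range(trials)]

rng = random.Random(0)
rate, window, k, trials = 2.0, 3.0, 3, 20_000  # illustrative values

counts = poisson_counts(rate, window, trials, rng)
count_mean = sum(counts) / trials
count_var = sum((c - count_mean) ** 2 for c in counts) / (trials - 1)

waits = gamma_waits(rate, k, trials, rng)
wait_mean = sum(waits) / trials

print(count_mean, count_var, wait_mean)
```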
