Drummond:PopGen

Introduction
Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

Per-generation and instantaneous growth rates
What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth? Let $$n_i(t)$$ be the number of organisms of type $$i$$ at time $$t$$, and let $$R$$ be the per-capita reproductive rate per generation. If $$t$$ counts generations, then
 * $$n_i(t+1) = n_i(t)R\!$$

and
 * $$n_i(t) = n_i(0)R^t.\!$$

Now we wish to move to the case where $$t$$ is continuous and real-valued. As before,
 * $$n_i(t+1) = n_i(t)R\!$$

but now
 * $$n_i(t+\Delta t) = n_i(t)R^{\Delta t}\!$$
 * $$n_i(t+\Delta t) - n_i(t) = n_i(t)R^{\Delta t} - n_i(t)\!$$
 * $$\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} = \frac{n_i(t)R^{\Delta t} - n_i(t)}{\Delta t}$$
 * $$\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} = n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}$$
 * $$\lim_{\Delta t \to 0} \left[{n_i(t+\Delta t) - n_i(t) \over \Delta t}\right] = \lim_{\Delta t \to 0} \left[ n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\right]$$
 * $$\frac{d n_i(t)}{dt} = n_i(t) \lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right]$$
 * $$\frac{d n_i(t)}{dt} = n_i(t) \ln R\!$$

where the last simplification follows from L'Hôpital's rule. Explicitly, let $$\epsilon=\Delta t$$. Then
 * $$\lim_{\Delta t \to 0} \left[{R^{\Delta t} - 1 \over \Delta t}\right]$$
 * $$= \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon} - 1}{\epsilon}\right]$$
 * $$=\lim_{\epsilon \to 0} \left[\frac{\frac{d}{d\epsilon}\left(R^{\epsilon} - 1\right)}{\frac{d}{d\epsilon}\epsilon}\right]$$
 * $$=\lim_{\epsilon \to 0} \left[\frac{R^{\epsilon}\ln R}{1}\right]$$
 * $$=\ln R \lim_{\epsilon \to 0} \left[R^{\epsilon}\right]$$
 * $$=\ln R\!$$

The solution to the equation
 * $$\frac{d n_i(t)}{dt} = n_i(t) \ln R$$

is
 * $$n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\!$$

Note that the continuous case and the original discrete-generation case agree for all integer values of $$t$$. We can define the instantaneous growth rate $$r = \ln R$$ for convenience.
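The agreement between the two cases, and the limit used in the derivation, can be checked numerically. The following is a minimal sketch; the function names and the values $$R=2$$ and $$n_i(0)=100$$ are illustrative assumptions, not part of the derivation.

```python
import math


def discrete_growth(n0, R, generations):
    """Iterate n(t+1) = n(t) * R for a whole number of generations."""
    n = n0
    for _ in range(generations):
        n *= R
    return n


def continuous_growth(n0, R, t):
    """Closed-form solution n(t) = n(0) * exp(t * ln R) for real-valued t."""
    return n0 * math.exp(t * math.log(R))


def difference_quotient(R, dt):
    """(R**dt - 1) / dt, which should approach ln R as dt -> 0."""
    return (R ** dt - 1.0) / dt


R = 2.0     # illustrative: the population doubles each generation
n0 = 100.0  # illustrative initial population size

# The two descriptions agree at every integer time.
for t in range(6):
    assert math.isclose(discrete_growth(n0, R, t), continuous_growth(n0, R, t))

# The difference quotient converges to r = ln R as the time step shrinks.
for dt in (1.0, 0.1, 0.001, 1e-6):
    print(dt, difference_quotient(R, dt), math.log(R))
```

The printed quotient moves toward $$\ln R$$ as the step shrinks, which is the content of the L'Hôpital step.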

Continuous rate of change
If two types of organism grow at different rates, how do their proportions in the population change over time? Let $$r_1$$ and $$r_2$$ be the instantaneous rates of increase of type 1 and type 2, respectively. Then
 * $${dn_i(t) \over dt} = r_i n_i(t).$$

With the total population size
 * $$n(t) = n_1(t) + n_2(t)\!$$

we have the proportion of type 1
 * $$p(t) = {n_1(t) \over n(t)}$$

Define the fitness advantage
 * $$s \equiv s_{12} = r_1 - r_2\!$$

Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of $$p(t)$$.

 * $${\partial p(t) \over \partial t}$$
 * $$= {\partial \over \partial t}\left({n_1(t) \over n(t)}\right)$$
 * $$= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t}$$
 * $$= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right)$$
 * $$= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right)$$
 * $$= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right)$$
 * $$= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) -s n(t) + s n_1(t)\right)$$
 * $$= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right)$$
 * $$= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right)$$
 * $$= s p(t)(1-p(t))\!$$

This result says that the proportion of type 1, $$p$$, changes most rapidly when $$p=0.5$$ and most slowly when $$p$$ is very close to 0 or 1.
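As a sanity check on the relation $$\partial p/\partial t = s p(1-p)$$, one can integrate it numerically and compare against the proportion computed directly from the two exponentially growing populations. This is a minimal sketch using forward Euler steps; the function names, growth rates, and starting counts are illustrative assumptions.

```python
import math


def p_exact(n1_0, n2_0, r1, r2, t):
    """Proportion of type 1 computed directly from n_i(t) = n_i(0) * exp(r_i * t)."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)


def p_euler(p0, s, t_end, steps=100_000):
    """Integrate dp/dt = s * p * (1 - p) with forward Euler steps."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * s * p * (1.0 - p)
    return p


# Illustrative values (assumptions): type 1 starts rare with a growth advantage.
r1, r2 = 0.7, 0.5
n1_0, n2_0 = 10.0, 990.0
s = r1 - r2
p0 = n1_0 / (n1_0 + n2_0)
t_end = 30.0

print(p_exact(n1_0, n2_0, r1, r2, t_end), p_euler(p0, s, t_end))

# The rate of change s * p * (1 - p) is largest at p = 0.5 on a grid of proportions.
grid = [k / 100 for k in range(1, 100)]
fastest = max(grid, key=lambda p: s * p * (1.0 - p))
print(fastest)
```

Only the difference $$s = r_1 - r_2$$ enters the Euler integration, while the direct calculation uses both rates separately; their agreement reflects that the proportion depends on the rates only through $$s$$.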

Evolution is linear on a log-odds scale
The logit function $$\mathrm{logit} (p) = \ln {p \over 1-p}$$, which maps $$p \in (0,1)$$ to $$\mathbb{R}$$, induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with $$L_p(t) \equiv \mathrm{logit} (p(t))\!$$,




 * $${\partial L_p(t) \over \partial t} $$
 * $$= {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right)$$
 * $$= {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right)$$
 * $$= {\partial \over \partial t}\left(\ln \left[{n_1(0) \over n_2(0)} e^{st}\right]\right)$$
 * $$= {\partial \over \partial t}\left(\ln {n_1(0) \over n_2(0)} + st\right)$$
 * $$= s. \!$$

This differential equation $$L_p'(t) = s$$ has the solution


 * $$L_p(t) = L_p(0) + st\!$$

showing that the log-odds of finding type 1 changes linearly in time, increasing if $$s>0$$ and decreasing if $$s<0$$.
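The linear solution makes some calculations one-liners, for example the time needed for type 1 to move between two frequencies. The following is a minimal sketch; the helper names (`logit`, `inv_logit`, `p_at`, `time_to_reach`) and the numerical values are illustrative assumptions.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def p_at(p0, s, t):
    """Proportion at time t from L_p(t) = L_p(0) + s * t."""
    return inv_logit(logit(p0) + s * t)


def time_to_reach(p_start, p_end, s):
    """Time for the log odds to move from logit(p_start) to logit(p_end)."""
    return (logit(p_end) - logit(p_start)) / s


# Illustrative values (assumptions): a rare type with advantage s = 0.1.
s = 0.1
t_sweep = time_to_reach(0.01, 0.99, s)
print(t_sweep)                 # equals 2 * ln(99) / s
print(p_at(0.01, s, t_sweep))  # recovers a proportion close to 0.99
```

Because the log odds advance at the constant rate $$s$$, the time to go from 1% to 99% scales as $$1/s$$.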

Diffusion approximation
Insert math here.

Statistical analysis of relative growth rates
We have three strains, $$i$$, $$j$$ and $$r$$, where $$r$$ is a reference strain. (As a subscript, $$r$$ labels the reference strain; it should not be confused with the growth rates $$r_i$$ and $$r_j$$.) Strains $$i$$ and $$j$$ have fitness $$w_i = e^{r_i}$$ and $$w_j=e^{r_j}$$. Define the selection coefficient $$s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j$$ as usual. The data are triples ($$g=$$number of generations, $$n_i=$$number of cells of type $$i$$, $$n_r=$$number of cells of type $$r$$), which reduce to pairs ($$g$$, $$p_{ir}= n_i/n_r$$).

What is the best estimate of $$s_{ij}$$, and what is the error on that estimate?

Model
Assuming exponential growth, $$\ln p_{ir} = $$

Let $$\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})$$.
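One simple way to get an estimate and an error from such data is ordinary least squares on the log ratio. The sketch below is an illustration under assumptions that the text does not state: that $$\ln p_{ir}$$ is linear in $$g$$ with slope $$s_{ir}$$ (with $$g$$ as the time unit), that the noise on $$\ln p_{ir}$$ is independent with equal variance, and that the competitions of $$i$$ and $$j$$ against $$r$$ are independent. The function names and the data are hypothetical.

```python
import math


def fit_log_ratio_slope(generations, log_ratios):
    """Least-squares slope of ln(n_i / n_r) against g, with its standard error.

    Assumption: ln p_ir = intercept + s_ir * g + noise, with independent,
    equal-variance noise. Needs at least three points to estimate the error.
    """
    n = len(generations)
    if n < 3 or n != len(log_ratios):
        raise ValueError("need at least three (g, ln p) pairs of equal length")
    g_mean = sum(generations) / n
    y_mean = sum(log_ratios) / n
    sxx = sum((g - g_mean) ** 2 for g in generations)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(generations, log_ratios))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    rss = sum((y - (intercept + slope * g)) ** 2
              for g, y in zip(generations, log_ratios))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err


def selection_difference(s_ir, se_ir, s_jr, se_jr):
    """s_ij = s_ir - s_jr; errors add in quadrature if the fits are independent."""
    return s_ir - s_jr, math.sqrt(se_ir ** 2 + se_jr ** 2)


# Hypothetical data: generations and cell counts (n_i, n_r) and (n_j, n_r).
g = [0, 5, 10, 15, 20]
counts_i = [(500, 500), (610, 390), (740, 270), (820, 170), (890, 110)]
counts_j = [(500, 500), (560, 440), (630, 380), (680, 310), (740, 260)]

ln_p_ir = [math.log(ni / nr) for ni, nr in counts_i]
ln_p_jr = [math.log(nj / nr) for nj, nr in counts_j]

s_ir, se_ir = fit_log_ratio_slope(g, ln_p_ir)
s_jr, se_jr = fit_log_ratio_slope(g, ln_p_jr)
print(selection_difference(s_ir, se_ir, s_jr, se_jr))
```

The difference of slopes is used because $$s_{ij} = r_i - r_j = (r_i - r_r) - (r_j - r_r)$$, where $$r_r$$ denotes the growth rate of the reference strain. Under the Gaussian-noise assumption above, the least-squares slope coincides with the maximum-likelihood estimate of the slope.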

Maximum-likelihood approach
Add text.

Bayesian approach
Add text.