Drummond:PopGen

From OpenWetWare
Revision as of 09:50, 22 April 2010 by Dadrummond (talk | contribs) (Statistical analysis of relative growth rates)

Introduction

Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

Per-generation and instantaneous growth rates

What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth?

Let [math]n_i(t)[/math] be the number of organisms of type [math]i[/math] at time [math]t[/math], and let [math]R[/math] be the per-capita reproductive rate per generation. If [math]t[/math] counts generations, then

[math]n_i(t+1) = n_i(t)R\![/math]
and
[math]n_i(t) = n_i(0)R^t.\![/math]

Now we wish to move to the case where [math]t[/math] is continuous and real-valued. As before,

[math]n_i(t+1) = n_i(t)R\![/math]
but now
[math]n_i(t+\Delta t)\![/math] [math]=n_i(t)R^{\Delta t}\![/math]
[math]n_i(t+\Delta t) - n_i(t)\![/math] [math]= n_i(t)R^{\Delta t} - n_i(t)\![/math]
[math]\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t}[/math] [math]=\frac{n_i(t)R^{\Delta t} - n_i(t)}{\Delta t}[/math]
[math]\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t}[/math] [math]=n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}[/math]
[math]\lim_{\Delta t \to 0} \left[{n_i(t+\Delta t) - n_i(t) \over \Delta t}\right][/math] [math]=\lim_{\Delta t \to 0} \left[ n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\right][/math]
[math]\frac{d n_i(t)}{dt}[/math] [math]=n_i(t) \lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right][/math]
[math]\frac{d n_i(t)}{dt}[/math] [math]=n_i(t) \ln R\![/math]

where the last simplification follows from L'Hôpital's rule. Explicitly, let [math]\epsilon=\Delta t[/math]. Then

[math]\lim_{\Delta t \to 0} \left[{R^{\Delta t} - 1 \over \Delta t}\right][/math] [math]= \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon} - 1}{\epsilon}\right][/math]
[math]=\lim_{\epsilon \to 0} \left[\frac{\frac{d}{d\epsilon}\left(R^{\epsilon} - 1\right)}{\frac{d}{d\epsilon}\epsilon}\right][/math]
[math]=\lim_{\epsilon \to 0} \left[\frac{R^{\epsilon}\ln R}{1}\right][/math]
[math]=\ln R \lim_{\epsilon \to 0} \left[R^{\epsilon}\right][/math]
[math]=\ln R\![/math]

The solution to the equation

[math]\frac{d n_i(t)}{dt} = n_i(t) \ln R[/math]
is
[math]n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\![/math]
Note that the continuous case and the original discrete-generation case agree for all integer values of [math]t[/math]. We can define the instantaneous growth rate [math]r = \ln R[/math] for convenience.
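As a quick numerical check of the derivation above, the following minimal Python sketch evaluates the difference quotient for shrinking step sizes and compares the discrete and continuous growth laws at integer times. The value of R, the initial population size, and the function names are illustrative assumptions, not part of the original text.

```python
import math

R = 2.0          # assumed per-generation reproductive rate (one doubling per generation)
r = math.log(R)  # instantaneous growth rate r = ln R


def finite_difference_rate(R, dt):
    """Return (R**dt - 1) / dt, which should approach ln R as dt -> 0."""
    return (R ** dt - 1.0) / dt


def n_discrete(n0, R, generations):
    """Iterate n(t+1) = n(t) * R for a whole number of generations."""
    n = n0
    for _ in range(generations):
        n *= R
    return n


def n_continuous(n0, r, t):
    """Solution of dn/dt = r * n, valid for any real t."""
    return n0 * math.exp(r * t)


# The difference quotient converges to ln R as the step shrinks.
for dt in (1.0, 0.1, 0.001, 1e-6):
    print(f"dt={dt:g}  quotient={finite_difference_rate(R, dt):.6f}  ln R={r:.6f}")

# The two growth laws agree at integer times.
for g in range(5):
    print(g, n_discrete(100.0, R, g), n_continuous(100.0, r, g))
```

The continuous form can also be evaluated between generations, for example n_continuous(100.0, r, 0.5), which the discrete recursion cannot do directly.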

Continuous rate of change

If two organisms grow at different rates, how do their proportions in the population change over time?

Let [math]r_1[/math] and [math]r_2[/math] be the instantaneous rates of increase of type 1 and type 2, respectively. Then

[math]{dn_i(t) \over dt} = r_i n_i(t).[/math]
With the total population size
[math]n(t) = n_1(t) + n_2(t)\![/math]
we have the proportion of type 1
[math]p(t) = {n_1(t) \over n(t)}[/math]
Define the fitness advantage
[math]s \equiv s_{12} = r_1 - r_2\![/math]
Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of [math]p(t)[/math].
[math]{\partial p(t) \over \partial t}[/math] [math]= {\partial \over \partial t}\left({n_1(t) \over n(t)}\right)[/math]
[math]= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t}[/math]
[math]= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) -s n(t) + s n_1(t)\right)[/math]
[math]= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right)[/math]
[math]= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right)[/math]
[math]= s p(t)(1-p(t))\![/math]

This result says that the proportion of type 1 [math]p[/math] changes most rapidly when [math]p=0.5[/math] and most slowly when [math]p[/math] is very close to 0 or 1.
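To see this behaviour numerically, here is a minimal sketch that integrates the equation for p(t) with a standard fourth-order Runge-Kutta scheme and compares the result with the logistic closed form p(t) = p(0) e^{st} / (1 - p(0) + p(0) e^{st}). The parameter values, step count, and function names are assumptions chosen only for illustration.

```python
import math


def dp_dt(p, s):
    """Right-hand side of dp/dt = s * p * (1 - p)."""
    return s * p * (1.0 - p)


def integrate_frequency(p0, s, t_end, steps=10000):
    """Integrate dp/dt = s p (1 - p) from 0 to t_end with classical RK4."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        k1 = dp_dt(p, s)
        k2 = dp_dt(p + 0.5 * dt * k1, s)
        k3 = dp_dt(p + 0.5 * dt * k2, s)
        k4 = dp_dt(p + dt * k3, s)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def logistic_frequency(p0, s, t):
    """Closed-form solution of the same equation."""
    growth = math.exp(s * t)
    return p0 * growth / (1.0 - p0 + p0 * growth)


s = 0.1    # assumed fitness advantage per unit time
p0 = 0.01  # assumed initial proportion of type 1

for t in (0.0, 25.0, 50.0, 100.0):
    print(t, integrate_frequency(p0, s, t) if t > 0 else p0, logistic_frequency(p0, s, t))

# The rate of change is largest at p = 0.5 and small near 0 or 1.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, dp_dt(p, s))
```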

Evolution is linear on a log-odds scale

The logit function [math]\mathrm{logit} (p) = \ln {p \over 1-p}[/math], which maps [math]p \in (0,1)[/math] onto [math]\mathbb{R}[/math], induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with [math]L_p(t) \equiv \mathrm{logit} (p(t))\![/math],

[math]{\partial L_p(t) \over \partial t} [/math] [math]= {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right)[/math]
[math]= {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right)[/math]
[math]= {\partial \over \partial t}\left(\ln \left[{n_1(0) \over n_2(0)} e^{st}\right]\right)[/math]
[math]= s. \![/math]

This differential equation [math]L_p'(t) = s[/math] has the solution

[math]L_p(t) = L_p(0) + st\![/math]

showing that the log-odds of finding type 1 changes linearly in time, increasing if [math]s\gt 0[/math] and decreasing if [math]s\lt 0[/math].
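The linearity can be checked directly. The sketch below grows two types exponentially, computes the log-odds of type 1 at several times, and confirms that successive differences give the same slope s. The growth rates, initial counts, and sampling times are assumed values for illustration only.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) for p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


r1, r2 = 0.7, 0.6          # assumed instantaneous growth rates of types 1 and 2
s = r1 - r2                # fitness advantage of type 1
n1_0, n2_0 = 10.0, 990.0   # assumed initial numbers of each type


def frequency(t):
    """Proportion of type 1 at time t under independent exponential growth."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)


times = [0, 10, 20, 30, 40]
log_odds = [logit(frequency(t)) for t in times]

# Slope of the log-odds between consecutive sample times; each should equal s.
slopes = [
    (log_odds[k + 1] - log_odds[k]) / (times[k + 1] - times[k])
    for k in range(len(times) - 1)
]

for t, p_t, l_t in zip(times, map(frequency, times), log_odds):
    print(f"t={t:2d}  p={p_t:.4f}  logit={l_t:.4f}")
print("slopes:", slopes, "s:", s)
```

Although p(t) itself follows an S-shaped curve, the log-odds advance by the same amount in every interval, which is why the logit scale is convenient for estimating s.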

Diffusion approximation

Insert math here.

Statistical analysis of relative growth rates

We have three strains, [math]i[/math], [math]j[/math] and [math]r[/math], where [math]r[/math] is a reference strain. Strains [math]i[/math] and [math]j[/math] have fitness [math]w_i = e^{r_i}[/math] and [math]w_j=e^{r_j}[/math]. Define the selection coefficient [math]s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j[/math] as usual. We have data consisting of pairs [math]\left(g, p_{ir}\right)[/math], where [math]g[/math] is the number of generations and [math]p_{ir}= \frac{n_i}{n_r}[/math] is the ratio of [math]n_i[/math], the number of cells of type [math]i[/math], to [math]n_r[/math], the number of cells of type [math]r[/math].

What is the best estimate, and error, on [math]s_{ij}[/math]?

Model

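Assuming exponential growth, [math]\ln p_{ir}(g) = \ln p_{ir}(0) + s_{ir}\,g[/math], which is the log-odds result derived earlier applied to strains [math]i[/math] and [math]r[/math]. Let [math]\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})[/math].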
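One simple way to get a point estimate and a rough error, sketched here under stated assumptions and not necessarily the analysis intended for this page, follows from the log-odds result derived earlier: under exponential growth the log ratio of a strain to the reference changes linearly with the number of generations, so an ordinary least-squares line through the points (g, ln p_ir) has slope s_ir. Because s_ir = r_i - r_r and s_jr = r_j - r_r, the difference s_ir - s_jr equals s_ij. The data below are synthetic, generated from the model with assumed parameter values and Gaussian noise on the log scale; the function name and the assumption of independent errors are illustrative.

```python
import math
import random


def fit_log_ratio_slope(generations, ratios):
    """Least-squares fit of ln(ratio) = intercept + slope * g.

    Returns (slope, intercept, standard error of the slope).
    """
    y = [math.log(p) for p in ratios]
    n = len(generations)
    g_mean = sum(generations) / n
    y_mean = sum(y) / n
    sxx = sum((g - g_mean) ** 2 for g in generations)
    sxy = sum((g - g_mean) * (yi - y_mean) for g, yi in zip(generations, y))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    residual_ss = sum(
        (yi - (intercept + slope * g)) ** 2 for g, yi in zip(generations, y)
    )
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_se


# Synthetic data only: assumed true coefficients and noise level.
rng = random.Random(0)
true_s_ir, true_s_jr = 0.05, 0.02
noise_sd = 0.05
generations = [0, 5, 10, 15, 20, 25, 30]
ratios_ir = [math.exp(true_s_ir * g + rng.gauss(0.0, noise_sd)) for g in generations]
ratios_jr = [math.exp(true_s_jr * g + rng.gauss(0.0, noise_sd)) for g in generations]

s_ir, _, se_ir = fit_log_ratio_slope(generations, ratios_ir)
s_jr, _, se_jr = fit_log_ratio_slope(generations, ratios_jr)

# s_ij = s_ir - s_jr; the combined error assumes the two fits are independent.
s_ij = s_ir - s_jr
se_ij = math.sqrt(se_ir ** 2 + se_jr ** 2)
print(f"s_ij estimate = {s_ij:.4f} +/- {se_ij:.4f}")
```

Least squares on the log ratio coincides with maximum likelihood only if the errors in ln p_ir are independent and Gaussian with constant variance; with counting noise in n_i and n_r, a weighted fit would likely be more appropriate.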

Maximum-likelihood approach

Add text.

Bayesian approach

Add text.