Difference between revisions of "Drummond:PopGen"

From OpenWetWare

Latest revision as of 19:40, 28 March 2011

Introduction

Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

Per-generation and instantaneous growth rates

What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth?

Let [math]n_i(t)[/math] be the number of organisms of type [math]i[/math] at time [math]t[/math], and let [math]R[/math] be the per-capita reproductive rate per generation. If [math]t[/math] counts generations, then

[math]n_i(t+1) = n_i(t)R\![/math]
and
[math]n_i(t) = n_i(0)R^t.\![/math]

Now we wish to move to the case where [math]t[/math] is continuous and real-valued. As before,

[math]n_i(t+1) = n_i(t)R\![/math]
but now
[math]n_i(t+\Delta t)\![/math] [math]=n_i(t)R^{\Delta t}\![/math]
[math]n_i(t+\Delta t) - n_i(t)\![/math] [math]= n_i(t)R^{\Delta t} - n_i(t)\![/math]
[math]\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t}[/math] [math]=\frac{n_i(t)R^{\Delta t} - n_i(t)}{\Delta t}[/math]
[math]\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t}[/math] [math]=n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}[/math]
[math]\lim_{\Delta t \to 0} \left[{n_i(t+\Delta t) - n_i(t) \over \Delta t}\right][/math] [math]=\lim_{\Delta t \to 0} \left[ n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\right][/math]
[math]\frac{d n_i(t)}{dt}[/math] [math]=n_i(t) \lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right][/math]
[math]\frac{d n_i(t)}{dt}[/math] [math]=n_i(t) \ln R\![/math]

where the last simplification follows from L'Hôpital's rule. Explicitly, let [math]\epsilon=\Delta t[/math]. Then

[math]\lim_{\Delta t \to 0} \left[{R^{\Delta t} - 1 \over \Delta t}\right][/math] [math]= \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon} - 1}{\epsilon}\right][/math]
[math]=\lim_{\epsilon \to 0} \left[\frac{\frac{d}{d\epsilon}\left(R^{\epsilon} - 1\right)}{\frac{d}{d\epsilon}\epsilon}\right][/math]
[math]=\lim_{\epsilon \to 0} \left[\frac{R^{\epsilon}\ln R}{1}\right][/math]
[math]=\ln R \lim_{\epsilon \to 0} \left[R^{\epsilon}\right][/math]
[math]=\ln R\![/math]
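
The limit can also be checked numerically. The following is only a sanity-check sketch: the function name `difference_quotient` and the value [math]R = 2[/math] are illustrative choices, not part of the derivation.

```python
import math

def difference_quotient(R, eps):
    """Return (R**eps - 1) / eps, the quantity whose eps -> 0 limit is ln R."""
    return (R ** eps - 1.0) / eps

R = 2.0  # illustrative value: the population doubles each generation
for eps in (1e-1, 1e-3, 1e-6):
    # Print the step size, the difference quotient, and the claimed limit ln R.
    print(eps, difference_quotient(R, eps), math.log(R))
```

As the step size shrinks, the second printed column should approach the third.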

The solution to the equation

[math]\frac{d n_i(t)}{dt} = n_i(t) \ln R[/math]
is
[math]n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\![/math]
Note that the continuous case and the original discrete-generation case agree for all integer values of [math]t[/math]. We can define the instantaneous growth rate [math]r = \ln R[/math] for convenience.
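
To make the agreement at integer times concrete, here is a minimal sketch that compares repeated multiplication by [math]R[/math] with the exponential solution. The function names and the values of [math]n_i(0)[/math] and [math]R[/math] are assumptions chosen for illustration.

```python
import math

def discrete_growth(n0, R, generations):
    """Apply n(t+1) = n(t) * R once per generation."""
    n = n0
    for _ in range(generations):
        n *= R
    return n

def continuous_growth(n0, R, t):
    """Evaluate n(t) = n(0) * exp(r * t) with r = ln R, for any real t."""
    r = math.log(R)
    return n0 * math.exp(r * t)

n0, R = 100.0, 1.5  # illustrative starting size and per-generation rate
for t in range(6):
    # The two columns should match up to floating-point rounding.
    print(t, discrete_growth(n0, R, t), continuous_growth(n0, R, t))
```

The continuous form is additionally defined between generations; for example, `continuous_growth(n0, R, 0.5)` evaluates [math]n_i(0)\sqrt{R}[/math].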

Continuous rate of change

If two organisms grow at different rates, how do their proportions in the population change over time?

Let [math]r_1[/math] and [math]r_2[/math] be the instantaneous rates of increase of type 1 and type 2, respectively. Then

[math]{dn_i(t) \over dt} = r_i n_i(t).[/math]
With the total population size
[math]n(t) = n_1(t) + n_2(t)\![/math]
we have the proportion of type 1
[math]p(t) = {n_1(t) \over n(t)}[/math]
Define the fitness advantage
[math]s \equiv s_{12} = r_1 - r_2\![/math]
Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of [math]p(t)[/math].
[math]{\partial p(t) \over \partial t}[/math] [math]= {\partial \over \partial t}\left({n_1(t) \over n(t)}\right)[/math]
[math]= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t}[/math]
[math]= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right)[/math]
[math]= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) - s n(t) + s n_1(t)\right)[/math]
[math]= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right)[/math]
[math]= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right)[/math]
[math]= s p(t)(1-p(t))\![/math]

This result says that, for a given [math]s[/math], the proportion of type 1, [math]p[/math], changes most rapidly when [math]p=0.5[/math], where [math]p(1-p)[/math] attains its maximum of [math]1/4[/math], and most slowly when [math]p[/math] is very close to 0 or 1.
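
As a rough check on the algebra, one can integrate [math]dp/dt = s p(1-p)[/math] numerically and compare the result with the proportion computed directly from the two exponentially growing populations. The sketch below uses a simple forward-Euler scheme; the step count, growth rates and starting sizes are arbitrary illustrative values.

```python
import math

def rate_of_change(p, s):
    """Right-hand side of dp/dt = s * p * (1 - p)."""
    return s * p * (1.0 - p)

def exact_proportion(n1_0, n2_0, r1, r2, t):
    """Proportion of type 1 computed directly from exponential growth of both types."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

def integrate_proportion(p0, s, t_end, steps=100_000):
    """Forward-Euler integration of dp/dt = s * p * (1 - p) from 0 to t_end."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += rate_of_change(p, s) * dt
    return p

r1, r2 = 0.7, 0.6       # illustrative instantaneous growth rates
s = r1 - r2             # fitness advantage of type 1
n1_0, n2_0 = 10.0, 990.0  # illustrative starting sizes, so p(0) = 0.01
p_exact = exact_proportion(n1_0, n2_0, r1, r2, 50.0)
p_euler = integrate_proportion(n1_0 / (n1_0 + n2_0), s, 50.0)
print(p_exact, p_euler)
```

The two printed values should agree closely, with the small remaining gap coming from the Euler discretization.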

Evolution is linear on a log-odds scale

The logit function [math]\mathrm{logit} (p) = \ln {p \over 1-p}[/math], which maps [math]p \in (0,1) \to \mathbb{R}[/math] (it diverges at [math]p=0[/math] and [math]p=1[/math]), induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with [math]L_p(t) \equiv \mathrm{logit} (p(t))\![/math],

[math]{\partial L_p(t) \over \partial t} [/math] [math]= {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right)[/math]
[math]= {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right)[/math]
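[math]= {\partial \over \partial t}\left(\ln {n_1(0) e^{r_1 t} \over n_2(0) e^{r_2 t}}\right)[/math]
[math]= {\partial \over \partial t}\left(\ln \left[{n_1(0) \over n_2(0)} e^{st}\right]\right)[/math]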
[math]= s. \![/math]

This differential equation [math]L_p'(t) = s[/math] has the solution

[math]L_p(t) = L_p(0) + st\![/math]

showing that the log-odds of finding type 1 changes linearly in time, increasing if [math]s\gt 0[/math] and decreasing if [math]s\lt 0[/math].
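
Inverting the logit gives the familiar logistic curve, [math]p(t) = 1/\left(1 + e^{-(L_p(0) + st)}\right)[/math]. The following sketch evaluates this closed form and compares it with the proportion obtained directly from the two population sizes. Function names and parameter values are illustrative assumptions.

```python
import math

def logit(p):
    """Log odds of p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def proportion_at(p0, s, t):
    """p(t) from the linear log-odds solution L_p(t) = L_p(0) + s * t."""
    return inverse_logit(logit(p0) + s * t)

def direct_proportion(n1_0, n2_0, r1, r2, t):
    """p(t) computed from the two exponentially growing populations."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

r1, r2 = 0.7, 0.6  # illustrative growth rates
s = r1 - r2
for t in (0.0, 10.0, 50.0):
    # Both routes to p(t) should agree up to rounding.
    print(t, proportion_at(0.01, s, t), direct_proportion(10.0, 990.0, r1, r2, t))
```

Working on the log-odds scale reduces the whole trajectory to a straight line with slope [math]s[/math], which is why only the starting proportion and [math]s[/math] are needed here.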

Diffusion approximation

Insert math here.

Statistical analysis of relative growth rates

We have three strains, [math]i[/math], [math]j[/math] and [math]r[/math], where [math]r[/math] is a reference strain (as a subscript, [math]r[/math] labels the reference strain and should not be confused with the growth rates [math]r_i[/math]). Strains [math]i[/math] and [math]j[/math] have fitness [math]w_i = e^{r_i}[/math] and [math]w_j=e^{r_j}[/math]. Define the selection coefficient [math]s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j[/math] as usual.

The raw data are triples ([math]g=[/math]number of generations, [math]n_i=[/math]number of cells of type [math]i[/math], [math]n_r=[/math]number of cells of type [math]r[/math]). For the analysis, each triple is reduced to a pair ([math]g[/math], [math]p_{ir}= n_i/n_r[/math]).

What is the best estimate, and error, on [math]s_{ij}[/math]?

Model

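Assuming exponential growth, [math]\ln p_{ir}(g) = \ln p_{ir}(0) + (r_i - r_r)\,g = \ln p_{ir}(0) + s_{ir}\,g[/math], where [math]p_{ir}(0)[/math] is the ratio at [math]g=0[/math] and [math]r_r[/math] is the growth rate of the reference strain.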

Let [math]\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})[/math].
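
One possible way to get an estimate and an error bar is sketched below. It rests on assumptions that are not fixed by the text above: that the log ratio [math]\ln p_{ir}[/math] is linear in [math]g[/math] with slope [math]s_{ir} = r_i - r_r[/math] (as the log-odds result suggests), that measurement noise on the log ratio is independent Gaussian with equal variance (in which case ordinary least squares coincides with maximum likelihood), and that the fits for strains [math]i[/math] and [math]j[/math] are independent. The function name and the synthetic, noise-free data are hypothetical.

```python
import math

def fit_log_ratio_slope(generations, ratios):
    """Least-squares fit of ln(ratio) = intercept + slope * g.

    Returns (slope, standard error of the slope). Needs at least 3 points.
    """
    y = [math.log(p) for p in ratios]
    n = len(generations)
    g_mean = sum(generations) / n
    y_mean = sum(y) / n
    sxx = sum((g - g_mean) ** 2 for g in generations)
    sxy = sum((g - g_mean) * (yi - y_mean) for g, yi in zip(generations, y))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    residual_ss = sum((yi - intercept - slope * g) ** 2
                      for g, yi in zip(generations, y))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, std_err

# Hypothetical, noise-free data: strain i gains on the reference, strain j loses.
generations = [0, 5, 10, 15, 20]
p_ir = [0.5 * math.exp(0.05 * g) for g in generations]
p_jr = [2.0 * math.exp(-0.02 * g) for g in generations]

s_ir, se_ir = fit_log_ratio_slope(generations, p_ir)
s_jr, se_jr = fit_log_ratio_slope(generations, p_jr)

# s_ij = r_i - r_j = (r_i - r_r) - (r_j - r_r) = s_ir - s_jr
s_ij = s_ir - s_jr
# Errors combined in quadrature, valid only if the two fits are independent.
se_ij = math.sqrt(se_ir ** 2 + se_jr ** 2)
print(s_ij, se_ij)
```

With real, noisy measurements the standard error would be nonzero, and the equal-variance assumption should be checked against how the counts were obtained.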

Maximum-likelihood approach

Add text.

Bayesian approach

Add text.