
Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).


  • Data: let's assume that we obtained data from N individuals. We denote by [math]y_1,\ldots,y_N[/math] the (quantitative) phenotypes (e.g. expression levels at a given gene), and by [math]g_1,\ldots,g_N[/math] the genotypes at a given SNP (coded as allele dose: 0, 1 or 2).


  • Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.


  • Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there are no hidden confounding factors in the phenotypes.


  • Likelihood:

[math]\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i[/math]

with: [math]\epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{-1})[/math]

where [math]\beta_1[/math] is the additive effect of the SNP, denoted [math]a[/math] from now on, and [math]\beta_2[/math] is its dominance effect, [math]d = a k[/math].

Let's now write in matrix notation:

[math]Y = X B + E[/math]

where [math]Y[/math] is the [math]N \times 1[/math] vector of phenotypes, [math]X[/math] is the [math]N \times 3[/math] design matrix whose [math]i[/math]-th row is [math][1 \; g_i \; \mathbf{1}_{g_i=1}][/math], [math]B = [ \mu \; a \; d ]^T[/math] and [math]E[/math] is the [math]N \times 1[/math] vector of errors

which gives the following conditional distribution for the phenotypes:

[math]Y | X, B, \tau \sim \mathcal{N}(XB, \tau^{-1} I_N)[/math]
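
To make this model concrete, here is a minimal simulation sketch in Python/NumPy; the sample size, the allele frequency and the values of [math]\mu[/math], [math]a[/math], [math]d[/math] and [math]\tau[/math] are arbitrary placeholders chosen only for illustration:

<pre>
# minimal simulation sketch of the likelihood above (placeholder values)
import numpy as np

rng = np.random.default_rng(seed=1)
N = 100                                  # number of individuals
g = rng.binomial(2, 0.3, size=N)         # genotypes as allele dose (0, 1 or 2), assuming MAF = 0.3

# design matrix X: intercept, additive dose g_i, dominance indicator 1_{g_i=1}
X = np.column_stack([np.ones(N), g, (g == 1).astype(float)])

mu, a, d, tau = 4.0, 0.5, 0.2, 1.0       # arbitrary parameter values
B = np.array([mu, a, d])
Y = X @ B + rng.normal(0.0, 1.0 / np.sqrt(tau), size=N)   # Y = XB + E with epsilon_i ~ N(0, tau^{-1})
</pre>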


  • Priors: conjugate

[math]\tau \sim \Gamma(\kappa/2, \, \lambda/2)[/math]

[math]B | \tau \sim \mathcal{N}(\vec{0}, \, \tau^{-1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)[/math]
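
For illustration, one draw from these conjugate priors can be obtained as follows (the hyper-parameter values are arbitrary placeholders; note that NumPy parameterizes the Gamma distribution by shape and scale, hence the [math]2/\lambda[/math]):

<pre>
# sketch of one draw from the conjugate priors (placeholder hyper-parameters)
import numpy as np

rng = np.random.default_rng(seed=2)
kappa, lam = 2.0, 1.0
sigma_mu, sigma_a, sigma_d = 10.0, 1.0, 1.0

tau = rng.gamma(shape=kappa / 2.0, scale=2.0 / lam)        # tau ~ Gamma(kappa/2, lambda/2)

Sigma_B = np.diag([sigma_mu**2, sigma_a**2, sigma_d**2])
B = rng.multivariate_normal(np.zeros(3), Sigma_B / tau)    # B | tau ~ N(0, tau^{-1} Sigma_B)
</pre>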


  • Joint posterior:

[math]\mathsf{P}(\tau, B | Y, X) = \mathsf{P}(\tau | Y, X) \mathsf{P}(B | Y, X, \tau)[/math]


  • Conditional posterior of B:

[math]\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B, Y | X, \tau)}{\mathsf{P}(Y | X, \tau)}[/math]

[math]\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, B, \tau)}{\int \mathsf{P}(B | \tau) \mathsf{P}(Y | X, B, \tau) \mathsf{d}B}[/math]

Here and in the following, we neglect all multiplicative factors that do not depend on [math]B[/math] (e.g. the normalization constant, the factor involving [math]Y^TY[/math], etc.):

[math]\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)[/math]

We use the prior and likelihood and keep only the terms in [math]B[/math]:

[math]\mathsf{P}(B | Y, X, \tau) \propto \exp(-\frac{\tau}{2} B^T \Sigma_B^{-1} B) \, \exp(-\frac{\tau}{2} (Y-XB)^T(Y-XB))[/math]

We expand:

[math]\mathsf{P}(B | Y, X, \tau) \propto \exp(-\frac{\tau}{2} (B^T \Sigma_B^{-1} B - Y^TXB - B^TX^TY + B^TX^TXB))[/math]

We factorize some terms:

[math]\mathsf{P}(B | Y, X, \tau) \propto \exp(-\frac{\tau}{2} (B^T (\Sigma_B^{-1} + X^TX) B - Y^TXB - B^TX^TY))[/math]

Let's define [math]\Omega = (\Sigma_B^{-1} + X^TX)^{-1}[/math]. Since [math]\Sigma_B^{-1}[/math] and [math]X^TX[/math] are both symmetric, [math]\Omega^T=\Omega[/math], i.e. [math]\Omega[/math] is a symmetric matrix. This is particularly useful here because it allows us to insert the identity [math]\Omega\Omega^{-1}=I[/math] into the cross terms:

[math]\mathsf{P}(B | Y, X, \tau) \propto \exp(-\frac{\tau}{2} (B^T \Omega^{-1} B - (X^TY)^T\Omega\Omega^{-1}B - B^T\Omega^{-1}\Omega X^TY))[/math]

This can now be factorized completely (completing the square introduces an extra term, [math]Y^TX \Omega X^TY[/math], which does not depend on [math]B[/math] and is therefore neglected):

[math]\mathsf{P}(B | Y, X, \tau) \propto \exp(-\frac{\tau}{2} (B - \Omega X^TY)^T\Omega^{-1}(B - \Omega X^TY))[/math]

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:

[math]B | Y, X, \tau \sim \mathcal{N}(\Omega X^TY, \tau^{-1} \Omega)[/math]
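
As a quick numerical illustration of this result, here is a sketch in Python/NumPy computing the posterior mean and covariance on simulated data (same kind of placeholder values as in the sketches above):

<pre>
# sketch computing the conditional posterior B | Y, X, tau derived above
import numpy as np

rng = np.random.default_rng(seed=3)
N = 100
g = rng.binomial(2, 0.3, size=N)
X = np.column_stack([np.ones(N), g, (g == 1).astype(float)])
Y = X @ np.array([4.0, 0.5, 0.2]) + rng.normal(size=N)   # data simulated with tau = 1

tau = 1.0
Sigma_B = np.diag([10.0**2, 1.0**2, 1.0**2])             # placeholder hyper-parameters

Omega = np.linalg.inv(np.linalg.inv(Sigma_B) + X.T @ X)  # Omega = (Sigma_B^{-1} + X^T X)^{-1}
post_mean = Omega @ X.T @ Y                              # E[B | Y, X, tau] = Omega X^T Y
post_cov = Omega / tau                                   # Var[B | Y, X, tau] = tau^{-1} Omega

B_draw = rng.multivariate_normal(post_mean, post_cov)    # one draw from the conditional posterior
</pre>

Note that with vague priors (large [math]\sigma_\mu, \sigma_a, \sigma_d[/math]), the posterior mean [math]\Omega X^TY[/math] gets close to the ordinary least-squares estimate [math](X^TX)^{-1}X^TY[/math], as expected.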