User:Timothee Flutre/Notebook/Postdoc/2011/11/10
Revision as of 19:24, 21 November 2012


Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).


  • Data: let's assume that we obtained data from N individuals. We note <math>y_1,\ldots,y_N</math> the (quantitative) phenotypes (e.g. expression levels at a given gene), and <math>g_1,\ldots,g_N</math> the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).


  • Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.


  • Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there are no hidden confounding factors in the phenotypes.


  • Likelihood: we start by writing the usual linear regression for one individual:

<math>\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{-1})</math>

where <math>\beta_1</math> is in fact the additive effect of the SNP, noted <math>a</math> from now on, and <math>\beta_2</math> is the dominance effect of the SNP, <math>d = a k</math>.

Let's now write the model in matrix notation:

<math>Y = X B + E \text{ where } B = [ \mu \; a \; d ]^T</math>

This gives the following multivariate Normal distribution for the phenotypes:

<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math>

The likelihood of the parameters given the data is therefore:

<math>\mathcal{L}(\tau, B) = \mathsf{P}(Y | X, \tau, B)</math>

<math>\mathcal{L}(\tau, B) = \left(\frac{\tau}{2 \pi}\right)^{\frac{N}{2}} \exp \left( -\frac{\tau}{2} (Y - XB)^T (Y - XB) \right)</math>
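As a sanity check, the model and its likelihood can be simulated numerically. Below is a minimal sketch in Python with NumPy; the sample size, the "true" parameter values, the allele frequency and the function name are illustrative assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(1859)

# Illustrative truth (mu, a, d, tau are hypothetical values, not from the text)
N = 100
mu, a, d, tau = 4.0, 0.5, 0.2, 1.0

# Genotypes as allele dose (0, 1 or 2) at a SNP with allele frequency 0.3
g = rng.binomial(2, 0.3, size=N)

# Design matrix X with rows [1, g_i, 1_{g_i=1}], so that B = [mu a d]^T
X = np.column_stack([np.ones(N), g, (g == 1).astype(float)])
B = np.array([mu, a, d])

# Phenotypes: Y = XB + E with E_i ~ N(0, 1/tau)
Y = X @ B + rng.normal(0.0, tau ** -0.5, size=N)

def log_likelihood(tau, B, Y, X):
    """log L(tau, B) = (N/2) log(tau/(2 pi)) - (tau/2) (Y-XB)^T (Y-XB)."""
    resid = Y - X @ B
    return 0.5 * len(Y) * np.log(tau / (2 * np.pi)) - 0.5 * tau * resid @ resid
```

Evaluating `log_likelihood` at the simulated truth versus at a far-away value of <math>B</math> gives a quick check that the formula behaves as expected.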


  • Priors: we use the usual conjugate priors:

<math>\mathsf{P}(\tau, B) = \mathsf{P}(\tau) \mathsf{P}(B | \tau)</math>

A Gamma distribution for <math>\tau</math>:

<math>\tau \sim \Gamma(\kappa/2, \, \lambda/2)</math>

which means:

<math>\mathsf{P}(\tau) = \frac{\left(\frac{\lambda}{2}\right)^{\frac{\kappa}{2}}}{\Gamma(\frac{\kappa}{2})} \tau^{\frac{\kappa}{2}-1} e^{-\frac{\lambda}{2} \tau}</math>

And a multivariate Normal distribution for <math>B</math>:

<math>B | \tau \sim \mathcal{N}(\vec{0}, \, \tau^{-1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)</math>

which means:

<math>\mathsf{P}(B | \tau) = \left(\frac{\tau}{2 \pi}\right)^{\frac{3}{2}} |\Sigma_B|^{-\frac{1}{2}} \exp \left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B \right)</math>
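Drawing from these priors is straightforward. The sketch below assumes hypothetical hyperparameter values (the text leaves <math>\kappa</math>, <math>\lambda</math> and the <math>\sigma^2</math>'s unspecified); note that NumPy parameterizes the Gamma by shape and scale, so the rate <math>\lambda/2</math> becomes a scale of <math>2/\lambda</math>.

```python
import numpy as np

rng = np.random.default_rng(2007)

# Hypothetical hyperparameter values, for illustration only
kappa, lam = 2.0, 1.0
Sigma_B = np.diag([10.0, 1.0, 1.0])  # diag(sigma_mu^2, sigma_a^2, sigma_d^2)

# tau ~ Gamma(kappa/2, lambda/2); NumPy uses shape and scale = 1/rate
tau = rng.gamma(shape=kappa / 2, scale=2.0 / lam)

# B | tau ~ N(0, tau^{-1} Sigma_B)
B = rng.multivariate_normal(np.zeros(3), Sigma_B / tau)
```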


  • Joint posterior:

<math>\mathsf{P}(\tau, B | Y, X) \propto \mathsf{P}(\tau) \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)</math>


  • Conditional posterior of <math>B</math>:

Here and in the following, we neglect all constants (e.g. normalization constant, etc):

<math>\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)</math>

We use the prior and likelihood and keep only the terms in <math>B</math>:

<math>\mathsf{P}(B | Y, X, \tau) \propto \exp \left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B \right) \exp \left(-\frac{\tau}{2} (Y - XB)^T (Y - XB) \right)</math>

We expand:

<math>\mathsf{P}(B | Y, X, \tau) \propto \exp \left[-\frac{\tau}{2} \left( B^T \Sigma_B^{-1} B + Y^T Y - Y^T X B - B^T X^T Y + B^T X^T X B \right) \right]</math>

We factorize some terms:

<math>\mathsf{P}(B | Y, X, \tau) \propto \exp \left[-\frac{\tau}{2} \left( B^T (\Sigma_B^{-1} + X^T X) B - Y^T X B - B^T X^T Y + Y^T Y \right) \right]</math>

Let's define <math>\Omega = (X^T X + \Sigma_B^{-1})^{-1}</math>. We can see that <math>\Omega^T = \Omega</math>, which means that <math>\Omega</math> is a symmetric matrix. This is particularly useful here because we can use the following equality: <math>B^T \Omega^{-1} \Omega X^T Y = (\Omega X^T Y)^T \Omega^{-1} B = Y^T X B</math>.

This now becomes easy to factorize totally:

<math>\mathsf{P}(B | Y, X, \tau) \propto \exp \left[-\frac{\tau}{2} (B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y) \right]</math>

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:

<math>B | Y, X, \tau \sim \mathcal{N}(\Omega X^T Y, \tau^{-1} \Omega)</math>
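Since this conditional posterior is available in closed form, it can be computed directly. A minimal sketch, with an assumed function name:

```python
import numpy as np

def cond_posterior_B(Y, X, tau, Sigma_B):
    """Mean and covariance of B | Y, X, tau ~ N(Omega X^T Y, tau^{-1} Omega),
    with Omega = (X^T X + Sigma_B^{-1})^{-1}."""
    Omega = np.linalg.inv(X.T @ X + np.linalg.inv(Sigma_B))
    return Omega @ X.T @ Y, Omega / tau
```

A quick numerical check: with a very diffuse prior (large <math>\sigma^2</math>'s), <math>\Sigma_B^{-1}</math> vanishes and the posterior mean approaches the ordinary least-squares estimate <math>(X^TX)^{-1}X^TY</math>.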


  • Posterior of <math>\tau</math>:

Similarly to the equations above:

<math>\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau)</math>

But now, to handle the second term, we need to integrate over <math>B</math>, thus effectively taking into account the uncertainty in <math>B</math>:

<math>\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \int \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B) \mathsf{d}B</math>

Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on <math>B</math>!):

<math>\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} \tau^{N/2} \exp \left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B \right) \exp \left(-\frac{\tau}{2} (Y - XB)^T (Y - XB) \right) \mathsf{d}B</math>

As we used a conjugate prior for <math>\tau</math>, we know that we expect a Gamma distribution for the posterior. Therefore, we can take <math>\tau^{N/2}</math> out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:

<math>\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N+\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} \exp \left[-\frac{\tau}{2} \left( (B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y) - Y^T X \Omega X^T Y + Y^T Y \right) \right] \mathsf{d}B</math>

We recognize the conditional posterior of <math>B</math>. This allows us to use the fact that the pdf of the Normal distribution integrates to one:

<math>\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N+\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \exp \left[-\frac{\tau}{2} (Y^T Y - Y^T X \Omega X^T Y) \right]</math>

We finally recognize a Gamma distribution, allowing us to write the posterior as:

<math>\tau | Y, X \sim \Gamma \left( \frac{N+\kappa}{2}, \; \frac{1}{2} (Y^T Y - Y^T X \Omega X^T Y + \lambda) \right)</math>
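Together, the marginal Gamma posterior of <math>\tau</math> and the conditional Normal posterior of <math>B</math> allow exact (non-MCMC) sampling from the joint posterior: draw <math>\tau</math> first, then <math>B | \tau</math>. A sketch, with an assumed function name and the rate written as <math>\frac{1}{2}(\lambda + Y^TY - Y^TX\Omega X^TY)</math> as derived above:

```python
import numpy as np

def sample_joint_posterior(Y, X, kappa, lam, Sigma_B, n_draws, rng):
    """Exact draws from P(tau, B | Y, X): tau from its marginal Gamma
    posterior, then B | tau from its conditional Normal posterior."""
    N = len(Y)
    Omega = np.linalg.inv(X.T @ X + np.linalg.inv(Sigma_B))
    shape = (N + kappa) / 2
    rate = 0.5 * (lam + Y @ Y - Y @ X @ Omega @ X.T @ Y)
    taus = rng.gamma(shape, 1.0 / rate, size=n_draws)  # NumPy scale = 1/rate
    mean_B = Omega @ X.T @ Y
    Bs = np.array([rng.multivariate_normal(mean_B, Omega / t) for t in taus])
    return taus, Bs
```

On data simulated under the model, the posterior draws should concentrate around the values used in the simulation, which gives a direct end-to-end check of the derivation.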