User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: finish posterior tau) 
(→Bayesian model of univariate linear regression for QTL detection: fix typo) 

Line 12:  Line 12:  
  * '''Data''': let's assume that we obtained data from N individuals. We note <math>y_1,\ldots,y_N</math> the (quantitative) phenotypes (e.g. expression  +  * '''Data''': let's assume that we obtained data from N individuals. We note <math>y_1,\ldots,y_N</math> the (quantitative) phenotypes (e.g. expression levels at a given gene), and <math>g_1,\ldots,g_N</math> the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2). 
Line 21:  Line 21:  
  * '''Likelihood''': <math>\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{1})</math>  +  * '''Likelihood''': we start by writing the usual linear regression for one individual 
+  
+  <math>\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{1})</math>  
where <math>\beta_1</math> is in fact the additive effect of the SNP, noted <math>a</math> from now on, and <math>\beta_2</math> is the dominance effect of the SNP, <math>d = a k</math>.  where <math>\beta_1</math> is in fact the additive effect of the SNP, noted <math>a</math> from now on, and <math>\beta_2</math> is the dominance effect of the SNP, <math>d = a k</math>.  
  Let's now write in matrix notation:  +  Let's now write the model in matrix notation: 
<math>Y = X B + E \text{ where } B = [ \mu \; a \; d ]^T</math>  <math>Y = X B + E \text{ where } B = [ \mu \; a \; d ]^T</math>  
  +  This gives the following [http://en.wikipedia.org/wiki/Multivariate_normal_distribution multivariate Normal distribution] for the phenotypes:  
  <math>Y  X  +  <math>Y  X, \tau, B \sim \mathcal{N}(XB, \tau^{1} I_N)</math> 
The likelihood of the parameters given the data is therefore:  The likelihood of the parameters given the data is therefore:  
Line 37:  Line 39:  
<math>\mathcal{L}(\tau, B) = \mathsf{P}(Y  X, \tau, B)</math>  <math>\mathcal{L}(\tau, B) = \mathsf{P}(Y  X, \tau, B)</math>  
  <math>\mathcal{L}(\tau, B) = \left(\frac{\tau}{2 \pi}\right)^{N  +  <math>\mathcal{L}(\tau, B) = \left(\frac{\tau}{2 \pi}\right)^{\frac{N}{2}} exp \left( \frac{\tau}{2} (Y  XB)^T (Y  XB) \right)</math> 
Line 43:  Line 45:  
<math>\mathsf{P}(\tau, B) = \mathsf{P}(\tau) \mathsf{P}(B  \tau)</math>  <math>\mathsf{P}(\tau, B) = \mathsf{P}(\tau) \mathsf{P}(B  \tau)</math>  
+  
+  A [http://en.wikipedia.org/wiki/Gamma_distribution Gamma distribution] for <math>\tau</math>:  
<math>\tau \sim \Gamma(\kappa/2, \, \lambda/2)</math>  <math>\tau \sim \Gamma(\kappa/2, \, \lambda/2)</math>  
+  
+  which means:  
+  
+  <math>\mathsf{P}(\tau) = \frac{\frac{\lambda}{2}^{\kappa/2}}{\Gamma(\frac{\kappa}{2})} \tau^{\frac{\kappa}{2}1} e^{\frac{\lambda}{2} \tau}</math>  
+  
+  And a multivariate Normal distribution for <math>B</math>:  
<math>B  \tau \sim \mathcal{N}(\vec{0}, \, \tau^{1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)</math>  <math>B  \tau \sim \mathcal{N}(\vec{0}, \, \tau^{1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)</math>  
+  
+  which means:  
+  
+  <math>\mathsf{P}(B  \tau) = \left(\frac{\tau}{2 \pi}\right)^{\frac{3}{2}} \Sigma_B^{\frac{1}{2}} exp \left(\frac{\tau}{2} B^T \Sigma_B^{1} B \right)</math>  
Line 104:  Line 118:  
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on <math>B</math>!):  Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on <math>B</math>!):  
  <math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} \int \tau^{  +  <math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} \int \tau^{3/2} \tau^{N/2} exp(\frac{\tau}{2} B^T \Sigma_B^{1} B) exp(\frac{\tau}{2} (Y  XB)^T (Y  XB)) \mathsf{d}B</math> 
As we used a conjugate prior for <math>\tau</math>, we know that we expect a Gamma distribution for the posterior.  As we used a conjugate prior for <math>\tau</math>, we know that we expect a Gamma distribution for the posterior.  
Line 110:  Line 124:  
We also factorize inside the exponential:  We also factorize inside the exponential:  
  <math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{N+\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} \int \tau^{  +  <math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{N+\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} \int \tau^{3/2} exp \left[\frac{\tau}{2} \left( (B  \Omega X^T Y)^T \Omega^{1} (B  \Omega X^T Y)  Y^T X \Omega X^T Y + Y^T Y \right) \right] \mathsf{d}B</math> 
We recognize the conditional posterior of <math>B</math>.  We recognize the conditional posterior of <math>B</math>.  
Line 117:  Line 131:  
<math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{N+\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} exp\left[\frac{\tau}{2} (Y^T X \Omega X^T Y + Y^T Y) \right]</math>  <math>\mathsf{P}(\tau  Y, X) \propto \tau^{\frac{N+\kappa}{2}  1} e^{\frac{\lambda}{2} \tau} exp\left[\frac{\tau}{2} (Y^T X \Omega X^T Y + Y^T Y) \right]</math>  
  We finally recognize  +  We finally recognize a Gamma distribution, allowing us to write the posterior as: 
<math>\tau  Y, X \sim \Gamma \left( \frac{N+\kappa}{2}, \; \frac{1}{2} (Y^T X \Omega X^T Y + Y^T Y + \lambda) \right)</math>  <math>\tau  Y, X \sim \Gamma \left( \frac{N+\kappa}{2}, \; \frac{1}{2} (Y^T X \Omega X^T Y + Y^T Y + \lambda) \right)</math> 
Revision as of 22:24, 21 November 2012
Project name  Main project page Previous entry Next entry 
Bayesian model of univariate linear regression for QTL detectionSee Servin & Stephens (PLoS Genetics, 2007).
where β_{1} is in fact the additive effect of the SNP, noted a from now on, and β_{2} is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
The likelihood of the parameters given the data is therefore:
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Here and in the following, we neglect all constants (e.g. normalization constant, Y^{T}Y, etc):
We use the prior and likelihood and keep only the terms in B:
We expand:
We factorize some terms:
Let's define . We can see that Ω^{T} = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω^{ − 1}Ω^{T} = I.
This now becomes easy to factorizes totally:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we know that we expect a Gamma distribution for the posterior. Therefore, we can take τ^{N / 2} out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
