User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: start adding marginal posterior of B) 
(→Bayesian model of univariate linear regression for QTL detection: add what rest to be done) 

(One intermediate revision not shown.)  
Line 34:  Line 34:  
<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math>
+  
+  Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of <math>Y | X, \tau, B</math> is determined by a single real number, <math>\tau</math>.
The likelihood of the parameters given the data is therefore:
Line 42:  Line 44:  
  * '''Priors''': we use the usual conjugate prior  +  * '''Priors''': we use the usual [http://en.wikipedia.org/wiki/Conjugate_prior conjugate prior] 
<math>\mathsf{P}(\tau, B) = \mathsf{P}(\tau) \mathsf{P}(B | \tau)</math>
Line 92:  Line 94:  
<math>\mathsf{P}(B | Y, X, \tau) \propto \exp(B^T (\Sigma_B^{-1} + X^TX) B - Y^TXB - B^TX^TY)</math>
  +  Importantly, let's define:  
+  
+  <math>\Omega = (\Sigma_B^{-1} + X^TX)^{-1}</math>
+  
+  We can see that <math>\Omega^T=\Omega</math>, which means that <math>\Omega</math> is a [http://en.wikipedia.org/wiki/Symmetric_matrix symmetric matrix].  
This is particularly useful here because we can use the following equality: <math>\Omega^{-1}\Omega^T=I</math>.
Line 138:  Line 144:  
* '''Joint posterior (2)''': sometimes it is said that the joint posterior follows a Normal Inverse Gamma distribution:
  <math>B, \tau | Y, X \sim \mathcal{N}IG(\Omega X^TY, \tau^{-1}\Omega, \frac{N+\kappa}{2}, \frac{\lambda^\ast}{2})</math>  +  <math>B, \tau | Y, X \sim \mathcal{N}IG(\Omega X^TY, \; \tau^{-1}\Omega, \; \frac{N+\kappa}{2}, \; \frac{\lambda^\ast}{2})</math>
where <math>\lambda^\ast = (Y^T Y - Y^T X \Omega X^T Y + \lambda)</math>
Line 152:  Line 158:  
<math>\mathsf{P}(B | Y, X) = \frac{\left(\frac{\lambda^\ast}{2}\right)^{\frac{N+\kappa}{2}} \Gamma(\frac{N+\kappa+3}{2})}{(2\pi)^\frac{3}{2} |\Omega|^{\frac{1}{2}} \Gamma(\frac{N+\kappa}{2})} \left( \frac{\lambda^\ast}{2} + \frac{1}{2} (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right)^{-\frac{N+\kappa+3}{2}}</math>
+  
+  And we now recognize a [http://en.wikipedia.org/wiki/Multivariate_t-distribution multivariate Student's t-distribution]:
+  
+  <math>\mathsf{P}(B | Y, X) = \frac{\Gamma(\frac{N+\kappa+3}{2})}{\Gamma(\frac{N+\kappa}{2}) \pi^\frac{3}{2} |\lambda^\ast \Omega|^{\frac{1}{2}} } \left( 1 + \frac{(B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY)}{\lambda^\ast} \right)^{-\frac{N+\kappa+3}{2}}</math>
+  
+  We hence can write:  
+  
+  <math>B | Y, X \sim \mathcal{S}_{N+\kappa}(\Omega X^TY, \; (Y^T Y - Y^T X \Omega X^T Y + \lambda) \Omega)</math>
+  
+  
+  * '''Bayes Factor''': to do  
+  
+  
+  * '''In practice''': to do  
+  
+  invariance properties motivate the use of limits for some "unimportant" hyperparameters  
+  
+  average BF over grid  
+  
+  
+  * '''R code''': to do  
Revision as of 01:34, 22 November 2012
Bayesian model of univariate linear regression for QTL detection
See Servin & Stephens (PLoS Genetics, 2007).
where β_{1} is in fact the additive effect of the SNP, denoted a from now on, and β_{2} is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of Y | X, τ, B is determined by a single real number, τ.
The likelihood of the parameters given the data is therefore:
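For reference, the distribution and the likelihood referred to here can be written out as below. The first line is the formula given in the wiki source of this entry; the second is the standard multivariate Normal density evaluated at that mean and covariance, so it is a restatement rather than a new result.

```latex
Y \mid X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)

\mathcal{L}(\tau, B) = \mathsf{P}(Y \mid X, \tau, B)
  = \left( \frac{\tau}{2\pi} \right)^{N/2}
    \exp\left( -\frac{\tau}{2} (Y - XB)^T (Y - XB) \right)
```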
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Here and in the following, we neglect all constants (e.g. normalization constant, Y^{T}Y, etc):
We use the prior and likelihood and keep only the terms in B:
We expand:
We factorize some terms:
Importantly, let's define:
We can see that Ω^{T} = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω^{ − 1}Ω^{T} = I.
It now becomes easy to factorize completely:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
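The completion of the square behind this step can be sketched as follows. It assumes the prior B | τ ~ N(0, τ^{-1} Σ_B), which is the prior consistent with the definition Ω = (Σ_B^{-1} + X^T X)^{-1} but is not reproduced in this rendering of the page. Terms that do not depend on B (Y^T Y, then Y^T X Ω X^T Y) are dropped along the way, and the symmetry of Ω is used in the last step.

```latex
\mathsf{P}(B \mid Y, X, \tau)
  \propto \exp\left( -\frac{\tau}{2} \left[ B^T \Sigma_B^{-1} B + (Y - XB)^T (Y - XB) \right] \right)
  \propto \exp\left( -\frac{\tau}{2} \left[ B^T \Omega^{-1} B - Y^T X B - B^T X^T Y \right] \right)
  \propto \exp\left( -\frac{\tau}{2} (B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y) \right)

B \mid Y, X, \tau \sim \mathcal{N}(\Omega X^T Y, \tau^{-1} \Omega)
```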
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we know that we expect a Gamma distribution for the posterior. Therefore, we can take τ^{N / 2} out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
where
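Written out, and assuming the prior τ ~ Γ(κ/2, λ/2) in the shape-rate parameterization (an assumption here, chosen because it yields the shape (N+κ)/2 and rate λ*/2 that appear in the Normal Inverse Gamma statement of this entry), the posterior of τ and the quantity λ* would read:

```latex
\tau \mid Y, X \sim \Gamma\left( \frac{N + \kappa}{2}, \frac{\lambda^\ast}{2} \right)

\lambda^\ast = Y^T Y - Y^T X \Omega X^T Y + \lambda
```

The minus sign in front of Y^T X Ω X^T Y comes from completing the square in B: it is the term left over once the Normal kernel of B has been integrated out.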
Here we recognize the formula to integrate the Gamma function:
And we now recognize a multivariate Student's t-distribution:
We hence can write:
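The quantities that appear in these posteriors can be computed numerically with a short script. The following is a minimal sketch in Python using only the standard library, not the R code planned for this entry. The function names (`inverse`, `posterior_summaries`), the diagonal form of Σ_B passed as `prior_var`, and the toy genotype coding are all assumptions made for illustration.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def posterior_summaries(X, y, prior_var, kappa, lam):
    """Return Omega, the posterior mean Omega X^T Y, lambda*, and the Gamma
    shape and rate of tau | Y, X.

    X         : list of N rows, each with one entry per coefficient in B
    y         : list of N phenotypes
    prior_var : diagonal of Sigma_B (assumed diagonal in this sketch)
    kappa, lam: hyperparameters of the Gamma prior on tau (shape kappa/2, rate lam/2)
    """
    n, p = len(y), len(X[0])
    cols = list(zip(*X))  # columns of X, i.e. rows of X^T
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    # Omega^{-1} = Sigma_B^{-1} + X^T X
    omega_inv = [[xtx[i][j] + (1.0 / prior_var[i] if i == j else 0.0)
                  for j in range(p)] for i in range(p)]
    omega = inverse(omega_inv)
    xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
    mean = [sum(omega[i][j] * xty[j] for j in range(p)) for i in range(p)]
    yty = sum(v * v for v in y)
    # lambda* = Y^T Y - Y^T X Omega X^T Y + lambda
    lam_star = yty - sum(a * b for a, b in zip(xty, mean)) + lam
    return {"omega": omega, "mean": mean, "lambda_star": lam_star,
            "shape": (n + kappa) / 2.0, "rate": lam_star / 2.0}


if __name__ == "__main__":
    # Toy data: genotypes coded 0/1/2, with a heterozygote indicator for dominance.
    genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2]
    X = [[1.0, float(g), 1.0 if g == 1 else 0.0] for g in genotypes]
    y = [1.1, 0.9, 1.8, 1.6, 2.1, 1.9, 1.0, 1.7, 2.0]
    result = posterior_summaries(X, y, prior_var=[100.0, 1.0, 0.25], kappa=1.0, lam=1.0)
    print("posterior mean of B:", result["mean"])
    print("lambda*:", result["lambda_star"])
```

With a very diffuse prior (large entries in `prior_var`) the posterior mean approaches the ordinary least-squares estimate, which gives a simple sanity check of the formulas.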
invariance properties motivate the use of limits for some "unimportant" hyperparameters
average BF over grid
