User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: add what rest to be done) |
(→Bayesian model of univariate linear regression for QTL detection: fix typo + simplify) |
||
| Line 35: | Line 35: | ||
<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math> | <math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math> | ||
| - | Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of <math>Y | X, \tau, B</math> | + | Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of <math>Y | X, \tau, B</math> is in fact parametrized by a single real number, <math>\tau</math>. |
The likelihood of the parameters given the data is therefore: | The likelihood of the parameters given the data is therefore: | ||
| Line 71: | Line 71: | ||
* '''Conditional posterior of B''': | * '''Conditional posterior of B''': | ||
| - | |||
| - | |||
<math>\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B, Y | X, \tau)}{\mathsf{P}(Y | X, \tau)}</math> | <math>\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B, Y | X, \tau)}{\mathsf{P}(Y | X, \tau)}</math> | ||
| - | + | Let's neglect the normalization constant for now: | |
| - | + | ||
| - | + | ||
<math>\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)</math> | <math>\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)</math> | ||
| - | + | Similarly, let's keep only the terms in <math>B</math> for the moment: | |
<math>\mathsf{P}(B | Y, X, \tau) \propto exp(B^T \Sigma_B^{-1} B) exp((Y-XB)^T(Y-XB))</math> | <math>\mathsf{P}(B | Y, X, \tau) \propto exp(B^T \Sigma_B^{-1} B) exp((Y-XB)^T(Y-XB))</math> | ||
| Line 151: | Line 147: | ||
* '''Marginal posterior of B''': we can now integrate out <math>\tau</math>: | * '''Marginal posterior of B''': we can now integrate out <math>\tau</math>: | ||
| - | <math>\mathsf{P}(B | Y, X) = \int \mathsf{P}( | + | <math>\mathsf{P}(B | Y, X) = \int \mathsf{P}(\tau) \mathsf{P}(B | Y, X, \tau) \mathsf{d}\tau</math> |
<math>\mathsf{P}(B | Y, X) = \frac{\frac{\lambda^\ast}{2}^{\frac{N+\kappa}{2}}}{(2\pi)^\frac{3}{2} |\Omega|^{\frac{1}{2}} \Gamma(\frac{N+\kappa}{2})} \int \tau^{\frac{N+\kappa+3}{2}-1} exp \left[-\tau \left( \frac{\lambda^\ast}{2} + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right) \right] \mathsf{d}\tau</math> | <math>\mathsf{P}(B | Y, X) = \frac{\frac{\lambda^\ast}{2}^{\frac{N+\kappa}{2}}}{(2\pi)^\frac{3}{2} |\Omega|^{\frac{1}{2}} \Gamma(\frac{N+\kappa}{2})} \int \tau^{\frac{N+\kappa+3}{2}-1} exp \left[-\tau \left( \frac{\lambda^\ast}{2} + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right) \right] \mathsf{d}\tau</math> | ||
Revision as of 13:34, 22 November 2012
Bayesian model of univariate linear regression for QTL detection
See Servin & Stephens (PLoS Genetics, 2007).
where β1 is in fact the additive effect of the SNP, noted a from now on, and β2 is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
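As a minimal sketch of the matrix notation, the design matrix X can be built from the genotypes as below. It assumes the coding of Servin & Stephens, where each row is (1, g, 1 if g = 1 else 0), so that B = (μ, a, d) and the dominance effect applies to heterozygotes only. The function name `design_matrix` is hypothetical.

```python
def design_matrix(genotypes):
    """Build the N x 3 design matrix X, one row per individual.

    Assumed column coding (after Servin & Stephens):
      column 0: intercept (mu)
      column 1: allele dose g in {0, 1, 2} (additive effect a)
      column 2: indicator that g == 1 (dominance effect d)
    """
    rows = []
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError("genotype must be an allele dose of 0, 1 or 2")
        rows.append([1.0, float(g), 1.0 if g == 1 else 0.0])
    return rows


# One individual per genotype class
X = design_matrix([0, 1, 2])
```

With this coding, the expected phenotypes of the three genotype classes are μ, μ + a + d and μ + 2a.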
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of Y | X,τ,B is in fact parametrized by a single real number, τ. The likelihood of the parameters given the data is therefore:
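Because the covariance is τ⁻¹ times the identity, the multivariate Normal log-likelihood reduces to a sum of squared residuals scaled by τ. The sketch below evaluates it in plain Python. The function name `log_likelihood` and the argument `fitted` (the vector XB, computed beforehand) are assumptions made for illustration.

```python
import math


def log_likelihood(y, fitted, tau):
    """Log of N(y | fitted, tau^{-1} I_N), where fitted stands for XB.

    log L = (N/2) log(tau) - (N/2) log(2 pi) - (tau/2) * sum of squared residuals
    """
    if tau <= 0:
        raise ValueError("the precision tau must be positive")
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 0.5 * n * math.log(tau) - 0.5 * n * math.log(2 * math.pi) - 0.5 * tau * rss


# Two phenotypes, fitted values from some B, and precision tau = 2
value = log_likelihood([1.0, 3.0], [1.5, 2.5], 2.0)
```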
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Let's neglect the normalization constant for now:
Similarly, let's keep only the terms in B for the moment:
We expand:
We factorize some terms:
Importantly, let's define:
We can see that Ωᵀ = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω⁻¹Ωᵀ = I.
It now becomes easy to factorize completely:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
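The result can be sketched numerically. The code below assumes the prior B | τ ~ N(0, τ⁻¹Σ_B) and the definition Ω = (Σ_B⁻¹ + XᵀX)⁻¹ obtained by completing the square, so that the conditional posterior of B has mean ΩXᵀY and covariance τ⁻¹Ω. All function names are hypothetical, and the small matrix helpers are written in plain Python only to keep the example self-contained.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditional_posterior_of_B(X, y, sigma_B_inv, tau):
    """Return (mean, covariance) of B | Y, X, tau under the assumptions above."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    # Omega^{-1} = Sigma_B^{-1} + X^T X
    omega_inv = [[s + x for s, x in zip(rs, rx)] for rs, rx in zip(sigma_B_inv, XtX)]
    omega = inverse(omega_inv)
    Xty = matmul(Xt, [[v] for v in y])
    mean = [row[0] for row in matmul(omega, Xty)]   # Omega X^T Y
    cov = [[v / tau for v in row] for row in omega]  # tau^{-1} Omega
    return mean, cov


# Toy case with an intercept only: two phenotypes, prior precision factor 2, tau = 2
mean, cov = conditional_posterior_of_B([[1.0], [1.0]], [1.0, 3.0], [[2.0]], 2.0)
```

In the toy case the posterior mean is shrunk from the sample mean (2) towards the prior mean (0), as expected from the Σ_B⁻¹ term in Ω⁻¹.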
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we expect a Gamma distribution for the posterior. Therefore, we can take τ^(N/2) out of the integral and look for terms that match the form of a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
where
Here we recognize the formula to integrate the Gamma function:
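The identity used at this step is the standard one, ∫₀^∞ τ^(a−1) exp(−bτ) dτ = Γ(a) / b^a for a > 0 and b > 0. The sketch below checks it numerically with a simple midpoint rule. The function names, the truncation point `upper` and the number of steps are arbitrary choices made for illustration.

```python
import math


def gamma_integral_closed_form(a, b):
    """Gamma(a) / b**a, the value of the integral of tau^(a-1) exp(-b tau) over (0, inf)."""
    return math.gamma(a) / b ** a


def gamma_integral_numeric(a, b, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the same integral, truncated at `upper`."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * math.exp(-b * t)
    return total * h


# The two values should agree closely for, e.g., a = 3.5 and b = 2
difference = abs(gamma_integral_closed_form(3.5, 2.0) - gamma_integral_numeric(3.5, 2.0))
```

In the derivation, a plays the role of (N+κ+3)/2 and b is the bracketed term multiplying τ inside the exponential.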
And we now recognize a multivariate Student's t-distribution:
We hence can write:
invariance properties motivate the use of limits for some "unimportant" hyperparameters
average BF over grid

the (quantitative) phenotypes (e.g. expression levels at a given gene), and
the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).


