User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: start adding formula for BF) |
(→Bayesian model of univariate linear regression for QTL detection: finish BF) |
||
| Line 200: | Line 200: | ||
<math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math> | <math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math> | ||
| + | |||
| + | We can use this expression also under the null. | ||
| + | In this case, as we need neither <math>a</math> nor <math>d</math>, <math>B</math> is simply <math>\mu</math>, <math>\Sigma_B</math> is <math>\sigma_{\mu}^2</math> and <math>X</math> is a vector of 1's. | ||
| + | We can also defines <math>\Omega_0 = ((\sigma_{\mu}^2)^{-1} + N)^{-1}</math>. | ||
| + | In the end, this gives: | ||
| + | |||
| + | <math>\mathsf{P}_0(Y) = (2\pi)^{-\frac{N}{2}} \frac{|\Omega_0|^{\frac{1}{2}}}{\sigma_{\mu}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - \Omega_0 N^2 \bar{Y}^2 + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math> | ||
| + | |||
| + | We can therefore write the Bayes factor: | ||
| + | |||
| + | <math>BF = \left( \frac{|\Omega|}{\Omega_0} \right)^{\frac{1}{2}} \frac{1}{\sigma_a \sigma_d} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{Y^TY - \Omega_0 N^2 \bar{Y}^2 + \lambda} \right)^{-\frac{N+\kappa}{2}}</math> | ||
Revision as of 17:47, 22 November 2012
Main project page Previous entry Next entry
| |
Bayesian model of univariate linear regression for QTL detectionSee Servin & Stephens (PLoS Genetics, 2007).
where β1 is in fact the additive effect of the SNP, noted a from now on, and β2 is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, Y. It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes). The likelihood of the parameters given the data is therefore:
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Let's neglect the normalization constant for now:
Similarly, let's keep only the terms in B for the moment:
We expand:
We factorize some terms:
Importantly, let's define:
We can see that ΩT = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω − 1ΩT = I.
This now becomes easy to factorizes totally:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we know that we expect a Gamma distribution for the posterior. Therefore, we can take τN / 2 out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
where
Here we recognize the formula to integrate the Gamma function:
And we now recognize a multivariate Student's t-distribution:
We hence can write:
We want to test the following null hypothesis:
In Bayesian modeling, hypothesis testing is performed with a Bayes factor, which in our case can be written as:
We can shorten this into:
Note that, compare to frequentist hypothesis testing which focuses on the null, the Bayes factor requires to explicitly model the data under the alternative. Let's start with the numerator:
First, let's calculate what is inside the integral:
Using the formula obtained previously and doing some algebra gives:
Now we can integrate out τ (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens):
Inside the integral, we recognize the almost-complete pdf of a Gamma distribution. As it has to integrate to one, we get:
We can use this expression also under the null.
In this case, as we need neither a nor d, B is simply μ, ΣB is
We can therefore write the Bayes factor:
invariance properties motivate the use of limits for some "unimportant" hyperparameters average BF over grid
to do | |

the (quantitative) phenotypes (e.g. expression levels at a given gene), and
the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).
and
.
In the end, this gives:


