# User:Timothee Flutre/Notebook/Postdoc/2011/11/10


## Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).

• Data: let's assume that we obtained data from N individuals. We note $y_1,\ldots,y_N$ the (quantitative) phenotypes (e.g. expression levels at a given gene), and $g_1,\ldots,g_N$ the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).

• Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.

• Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there are no hidden confounding factors in the phenotypes.

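• Likelihood: we start by writing the usual linear regression for one individual

$$\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \tau^{-1})$$

where $\beta_1$ is in fact the additive effect of the SNP, noted $a$ from now on, and $\beta_2$ is the dominance effect of the SNP, noted $d$.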
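To make the coding of the two effects concrete, here is a small worked step. It assumes that the dominance term is coded with an indicator of heterozygosity ($g_i = 1$), and writes $\mu$ for the intercept, $a$ for the additive effect and $d$ for the dominance effect. The expected phenotype in each genotype class is then:

$$
\mathbb{E}[y_i \mid g_i = 0] = \mu, \qquad
\mathbb{E}[y_i \mid g_i = 1] = \mu + a + d, \qquad
\mathbb{E}[y_i \mid g_i = 2] = \mu + 2a
$$

Under this coding, $d$ measures how far the heterozygotes deviate from the midpoint $\mu + a$ of the two homozygous classes.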

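Let's now write the model in matrix notation:

$$Y = XB + E \text{ where } B = [\mu \; a \; d]^T$$

Here $Y$ is the $N \times 1$ vector of phenotypes, $X$ is the $N \times 3$ design matrix whose $i$-th row is $[1 \; g_i \; \mathbf{1}_{g_i=1}]$, and $E$ is the $N \times 1$ vector of errors.

This gives the following multivariate Normal distribution for the phenotypes:

$$Y | X, \tau, B \sim \mathcal{N}_N(XB, \tau^{-1} I_N)$$

Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of $Y | X, \tau, B$ is governed by a single real number, $\tau^{-1}$.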
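The matrix form can be illustrated with a minimal sketch in Python (standard library only). It assumes a design matrix with one row `[1, g, 1{g == 1}]` per individual and Normal errors with precision `tau`. The function names, the minor allele frequency `maf`, the seed and all numeric values are hypothetical choices made for illustration; they do not come from the notebook.

```python
import random

def design_matrix(genotypes):
    """One row [1, g, 1{g == 1}] per individual: intercept, additive, dominance."""
    return [[1.0, float(g), 1.0 if g == 1 else 0.0] for g in genotypes]

def simulate(n, mu, a, d, tau, maf=0.3, seed=1):
    """Simulate allele doses and phenotypes under Y = XB + E (illustrative values)."""
    rng = random.Random(seed)
    # allele dose = number of minor alleles among two independent draws
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    X = design_matrix(genotypes)
    sd = tau ** -0.5  # the error standard deviation is tau^(-1/2)
    y = [mu * row[0] + a * row[1] + d * row[2] + rng.gauss(0.0, sd) for row in X]
    return genotypes, X, y

genotypes, X, y = simulate(n=10, mu=5.0, a=1.0, d=0.3, tau=4.0)
```

Each row of `X` multiplies the same coefficient vector `[mu, a, d]`, which is what lets the N individual regressions collapse into a single matrix equation.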

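The likelihood of the parameters given the data is therefore:

$$\mathcal{L}(\tau, B) = \mathsf{P}(Y | X, \tau, B) = \left(\frac{\tau}{2\pi}\right)^{N/2} \exp\left(-\frac{\tau}{2} (Y - XB)^T (Y - XB)\right)$$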
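As a rough illustration, the logarithm of this likelihood can be evaluated as follows, assuming independent Normal errors with precision `tau`. The function name and the three-individual toy data set are made up for the example.

```python
import math

def log_likelihood(y, X, B, tau):
    """log L(tau, B) = (N/2) log(tau / (2 pi)) - (tau/2) (Y - XB)^T (Y - XB)."""
    n = len(y)
    rss = 0.0  # residual sum of squares (Y - XB)^T (Y - XB)
    for y_i, row in zip(y, X):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(row, B))
        rss += (y_i - fitted) ** 2
    return 0.5 * n * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * rss

# toy data (made up): three individuals with genotypes 0, 1 and 2
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 2.0, 0.0]]
y = [1.0, 3.5, 5.0]
print(log_likelihood(y, X, B=[1.0, 2.0, 0.5], tau=1.0))
```

Working on the log scale avoids numerical underflow when N is large, which is why the sketch returns the log-likelihood rather than the likelihood itself.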

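• Priors: we use the usual conjugate prior, $\mathsf{P}(\tau, B) = \mathsf{P}(\tau) \, \mathsf{P}(B | \tau)$.

A Gamma distribution for $\tau$ (shape $\kappa/2$, rate $\lambda/2$):

$$\tau \sim \Gamma(\kappa/2, \; \lambda/2)$$

which means:

$$\mathsf{P}(\tau) = \frac{(\lambda/2)^{\kappa/2}}{\Gamma(\kappa/2)} \, \tau^{\frac{\kappa}{2} - 1} \, e^{-\frac{\lambda}{2} \tau}$$

And a multivariate Normal distribution for $B$:

$$B | \tau \sim \mathcal{N}_3(\vec{0}, \; \tau^{-1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)$$

which means:

$$\mathsf{P}(B | \tau) = \left(\frac{\tau}{2\pi}\right)^{3/2} |\Sigma_B|^{-1/2} \exp\left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B\right)$$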
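One way to get a feel for such a prior is to draw from it. The sketch below assumes a Gamma prior on the precision $\tau$ with shape $\kappa/2$ and rate $\lambda/2$, and, given $\tau$, independent Normal priors on the three coefficients with variances $\sigma^2/\tau$ (a diagonal $\Sigma_B$). The function name and the hyperparameter values are hypothetical.

```python
import math
import random

def sample_prior(kappa, lam, sigma2, rng):
    """Draw (tau, B) with tau ~ Gamma(kappa/2, rate lam/2) and B | tau ~ N(0, Sigma_B / tau)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate = 2 / lam
    tau = rng.gammavariate(kappa / 2.0, 2.0 / lam)
    # Sigma_B is diagonal, so the coefficients are independent given tau
    B = [rng.gauss(0.0, math.sqrt(s2 / tau)) for s2 in sigma2]
    return tau, B

rng = random.Random(42)
draws = [sample_prior(kappa=4.0, lam=2.0, sigma2=(100.0, 1.0, 0.25), rng=rng)
         for _ in range(20000)]
mean_tau = sum(t for t, _ in draws) / len(draws)  # prior mean of tau is kappa / lam
```

Tying the prior variance of the coefficients to $\tau^{-1}$ is what makes the prior conjugate: effect sizes are expressed relative to the residual standard deviation.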

• Joint posterior (1):

$$\mathsf{P}(\tau, B | Y, X) = \mathsf{P}(\tau | Y, X) \, \mathsf{P}(B | Y, X, \tau)$$

• Conditional posterior of B:

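Here and in the following, we neglect all constants (e.g. normalization constant, $Y^TY$, etc):

$$\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \, \mathsf{P}(Y | X, \tau, B)$$

We use the prior and likelihood and keep only the terms in $B$:

$$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B\right) \exp\left(-\frac{\tau}{2} (Y - XB)^T (Y - XB)\right)$$

We expand:

$$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} \left[ B^T \Sigma_B^{-1} B - Y^TXB - B^TX^TY + B^TX^TXB \right]\right)$$

We factorize some terms:

$$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} \left[ B^T (\Sigma_B^{-1} + X^TX) B - Y^TXB - B^TX^TY \right]\right)$$

Importantly, let's define:

$$\Omega = (\Sigma_B^{-1} + X^TX)^{-1}$$

We can see that $\Omega^T = \Omega$, which means that $\Omega$ is a symmetric matrix. This is particularly useful here because we can use the following equality: $\Omega^{-1} \Omega^T = I$.

It now becomes easy to complete the factorization (the leftover term $Y^TX\Omega X^TY$ does not depend on $B$ and is dropped):

$$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY)\right)$$

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:

$$B | Y, X, \tau \sim \mathcal{N}_3(\Omega X^TY, \; \tau^{-1} \Omega)$$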
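The conditional posterior can be computed numerically with a few lines of plain Python. This is only a sketch: it assumes a diagonal prior matrix $\Sigma_B$ (passed as `sigma2`), defines $\Omega$ as the inverse of $\Sigma_B^{-1} + X^TX$, and takes the posterior mean to be $\Omega X^TY$. The helper names and the toy data are made up; a real analysis would more likely rely on a linear algebra library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, C):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*C)] for row in A]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [v - factor * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def conditional_posterior(y, X, sigma2):
    """Return (Omega, Omega X^T Y) for the prior Sigma_B = diag(sigma2)."""
    p = len(sigma2)
    XtX = matmul(transpose(X), X)
    precision = [[XtX[i][j] + (1.0 / sigma2[i] if i == j else 0.0) for j in range(p)]
                 for i in range(p)]
    omega = inverse(precision)
    XtY = [sum(row[j] * y_i for row, y_i in zip(X, y)) for j in range(p)]
    mean = [sum(omega[i][j] * XtY[j] for j in range(p)) for i in range(p)]
    return omega, mean

# toy data (made up): noise-free phenotypes with mu = 1, a = 2, d = 0.5
genotypes = [0, 0, 1, 1, 2, 2]
X = [[1.0, float(g), 1.0 if g == 1 else 0.0] for g in genotypes]
y = [1.0 + 2.0 * g + (0.5 if g == 1 else 0.0) for g in genotypes]
omega, mean = conditional_posterior(y, X, sigma2=(1e6, 1e6, 1e6))
```

With a very diffuse prior, as here, the posterior mean is close to the ordinary least-squares estimate. Smaller values in `sigma2` shrink it towards zero. The conditional posterior covariance is `omega` divided by `tau`.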

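• Posterior of $\tau$:

Similarly to the equations above:

$$\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \, \mathsf{P}(Y | X, \tau)$$

But now, to handle the second term, we need to integrate over $B$, thus effectively taking into account the uncertainty in $B$:

$$\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \int \mathsf{P}(B | \tau) \, \mathsf{P}(Y | X, \tau, B) \, \mathsf{d}B$$

Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on $B$!):

$$\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} \, \tau^{N/2} \exp\left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B\right) \exp\left(-\frac{\tau}{2} (Y - XB)^T (Y - XB)\right) \mathsf{d}B$$

As we used a conjugate prior for $\tau$, we know that we expect a Gamma distribution for the posterior. Therefore, we can take $\tau^{N/2}$ out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:

$$\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N + \kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} \exp\left[-\frac{\tau}{2} \left( (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) - Y^TX\Omega X^TY + Y^TY \right)\right] \mathsf{d}B$$

We recognize the conditional posterior of $B$. This allows us to use the fact that the pdf of the Normal distribution integrates to one:

$$\mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N + \kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \exp\left[-\frac{\tau}{2} \left( Y^TY - Y^TX\Omega X^TY \right)\right]$$

We finally recognize a Gamma distribution, allowing us to write the posterior as:

$$\tau | Y, X \sim \Gamma\left(\frac{N + \kappa}{2}, \; \frac{1}{2} \left( Y^TY - Y^TX\Omega X^TY + \lambda \right)\right)$$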
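The two parameters of this Gamma posterior are cheap to compute. The sketch below assumes the shape is $(N + \kappa)/2$ and the rate is $(Y^TY - Y^TX\Omega X^TY + \lambda)/2$, with $\Omega$ the inverse of $\Sigma_B^{-1} + X^TX$ and a diagonal $\Sigma_B$. It solves a linear system instead of forming $\Omega$ explicitly. Function names, hyperparameter values and data are invented for the example.

```python
def solve(A, b):
    """Solve A x = b for a small square system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    M = [[float(v) for v in row] + [float(b_i)] for row, b_i in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [v - factor * w for v, w in zip(M[r], M[c])]
    return [row[n] for row in M]

def tau_posterior(y, X, sigma2, kappa, lam):
    """Return (shape, rate) of the Gamma posterior of the precision tau."""
    p = len(sigma2)
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    XtY = [sum(r[j] * y_i for r, y_i in zip(X, y)) for j in range(p)]
    A = [[XtX[i][j] + (1.0 / sigma2[i] if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    m = solve(A, XtY)                          # m = Omega X^T Y
    YtY = sum(v * v for v in y)
    quad = sum(u * v for u, v in zip(XtY, m))  # Y^T X Omega X^T Y
    lam_star = YtY - quad + lam
    return (len(y) + kappa) / 2.0, lam_star / 2.0

# toy data (made up): two individuals per genotype class
genotypes = [0, 0, 1, 1, 2, 2]
X = [[1.0, float(g), 1.0 if g == 1 else 0.0] for g in genotypes]
y = [0.9, 1.1, 3.4, 3.6, 5.2, 4.8]
shape, rate = tau_posterior(y, X, sigma2=(1e6, 1e6, 1e6), kappa=2.0, lam=1.0)
```

The posterior mean of the precision is `shape / rate`. With a diffuse prior on the coefficients, the data-dependent part of the rate is essentially half the residual sum of squares of the least-squares fit.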

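• Joint posterior (2): sometimes it is said that the joint posterior follows a Normal Inverse Gamma distribution:

$$B, \tau | Y, X \sim \mathcal{N}IG\left(\Omega X^TY, \; \tau^{-1} \Omega, \; \frac{N + \kappa}{2}, \; \frac{\lambda^\ast}{2}\right)$$

where

$$\lambda^\ast = Y^TY - Y^TX\Omega X^TY + \lambda$$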

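• Marginal posterior of B: we can now integrate out $\tau$, recalling that $\lambda^\ast = Y^TY - Y^TX\Omega X^TY + \lambda$:

$$\mathsf{P}(B | Y, X) = \int \mathsf{P}(\tau | Y, X) \, \mathsf{P}(B | Y, X, \tau) \, \mathsf{d}\tau$$

$$\mathsf{P}(B | Y, X) = \frac{(\lambda^\ast/2)^{\frac{N + \kappa}{2}}}{(2\pi)^{3/2} \, |\Omega|^{1/2} \, \Gamma\left(\frac{N + \kappa}{2}\right)} \int \tau^{\frac{N + \kappa + 3}{2} - 1} \exp\left[-\frac{\tau}{2} \left( \lambda^\ast + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right)\right] \mathsf{d}\tau$$

Here we recognize the formula to integrate the Gamma function, $\int_0^\infty \tau^{\alpha - 1} e^{-\beta \tau} \mathsf{d}\tau = \Gamma(\alpha) / \beta^{\alpha}$:

$$\mathsf{P}(B | Y, X) = \frac{\Gamma\left(\frac{N + \kappa + 3}{2}\right) (\lambda^\ast/2)^{\frac{N + \kappa}{2}}}{(2\pi)^{3/2} \, |\Omega|^{1/2} \, \Gamma\left(\frac{N + \kappa}{2}\right)} \left( \frac{\lambda^\ast + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY)}{2} \right)^{-\frac{N + \kappa + 3}{2}}$$

And we now recognize a multivariate Student's t-distribution:

$$\mathsf{P}(B | Y, X) = \frac{\Gamma\left(\frac{N + \kappa + 3}{2}\right)}{\Gamma\left(\frac{N + \kappa}{2}\right) \pi^{3/2} \, |\lambda^\ast \Omega|^{1/2}} \left( 1 + \frac{(B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY)}{\lambda^\ast} \right)^{-\frac{N + \kappa + 3}{2}}$$

Hence we can write, for a multivariate Student's t-distribution with $N + \kappa$ degrees of freedom parametrized by its location and scale matrix:

$$B | Y, X \sim \mathcal{S}_{N + \kappa}\left(\Omega X^TY, \; \frac{\lambda^\ast}{N + \kappa} \Omega\right)$$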
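As a consistency check on this marginal, its first two moments can be obtained without the Student's t density, by averaging the conditional Normal moments over the Gamma posterior of the precision. The step below is a sketch under the following assumptions: $\Omega = (\Sigma_B^{-1} + X^TX)^{-1}$, $\lambda^\ast = Y^TY - Y^TX\Omega X^TY + \lambda$, a Gamma posterior for $\tau$ with shape $(N + \kappa)/2$ and rate $\lambda^\ast/2$, and $N + \kappa > 2$ so that the variance exists.

$$
\mathbb{E}[B \mid Y, X] = \Omega X^TY, \qquad
\mathrm{Cov}[B \mid Y, X] = \mathbb{E}[\tau^{-1} \mid Y, X] \; \Omega = \frac{\lambda^\ast / 2}{\frac{N + \kappa}{2} - 1} \; \Omega = \frac{\lambda^\ast}{N + \kappa - 2} \; \Omega
$$

This agrees with the covariance of a multivariate Student's t-distribution with $\nu = N + \kappa$ degrees of freedom and scale matrix $\frac{\lambda^\ast}{N + \kappa} \Omega$, namely $\frac{\nu}{\nu - 2}$ times the scale matrix.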

• Bayes Factor: to do

• In practice: to do

invariance properties motivate the use of limits for some "unimportant" hyperparameters

average BF over grid

• R code: to do