Difference between revisions of "User:Timothee Flutre/Notebook/Postdoc/2011/11/10"

From OpenWetWare
Jump to: navigation, search
(Bayesian model of univariate linear regression for QTL detection: fix typo sign lambda*)
(Bayesian model of univariate linear regression for QTL detection: start adding formula for BF)
Line 35: Line 35:
 
<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math>
 
<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math>
  
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of <math>Y | X, \tau, B</math> is in fact parametrized by a single real number, <math>\tau</math>.
+
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, <math>Y</math>.
 +
It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes).
  
 
The likelihood of the parameters given the data is therefore:
 
The likelihood of the parameters given the data is therefore:
Line 164: Line 165:
  
  
* '''Bayes Factor''': to do
+
* '''Bayes Factor''': one way to answer our goal above ("is there an effect of the genotype on the phenotype?") is to do [http://en.wikipedia.org/wiki/Hypothesis_testing hypothesis testing].
 +
We want to test the following [http://en.wikipedia.org/wiki/Null_hypothesis null hypothesis]:
  
 +
<math>H_0: \; a = d = 0</math>
  
* '''In practice''': to do
+
In Bayesian modeling, hypothesis testing is performed with a [http://en.wikipedia.org/wiki/Bayes_factor Bayes factor], which in our case can be written as:
 +
 
 +
<math>BF = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a = 0, d = 0)}</math>
 +
 
 +
We can shorten this into:
 +
 
 +
<math>BF = \frac{\mathsf{P}(Y | X)}{\mathsf{P}_0(Y)}</math>
 +
 
 +
Note that, compare to frequentist hypothesis testing which focuses on the null, the Bayes factor requires to explicitly model the data under the alternative.
 +
 
 +
Let's start with the numerator:
 +
 
 +
<math>\mathsf{P}(Y | X) = \int \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau) \mathsf{d}\tau</math>
 +
 
 +
First, let's calculate what is inside the integral:
 +
 
 +
<math>\mathsf{P}(Y | X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)}{\mathsf{P}(B | Y, X, \tau)}</math>
 +
 
 +
Using the formula obtained previously and doing some algebra gives:
 +
 
 +
<math>\mathsf{P}(Y | X, \tau) = \left( \frac{\tau}{2 \pi} \right)^{\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} exp\left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY) \right)</math>
 +
 
 +
Now we can integrate out <math>\tau</math> (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens):
 +
 
 +
<math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \frac{\frac{\lambda}{2}^{\frac{\kappa}{2}}}{\Gamma(\frac{\kappa}{2})} \int \tau^{\frac{N+\kappa}{2}-1} exp \left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY + \lambda) \right)</math>
 +
 
 +
Inside the integral, we recognize the almost-complete pdf of a Gamma distribution.
 +
As it has to integrate to one, we get:
 +
 
 +
<math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math>
 +
 
 +
 
 +
* '''Choosing the hyperparameters''':
  
 
invariance properties motivate the use of limits for some "unimportant" hyperparameters
 
invariance properties motivate the use of limits for some "unimportant" hyperparameters
Line 174: Line 209:
  
  
* '''R code''': to do
+
* '''R code''':
 +
 
 +
to do
  
 
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
 
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->

Revision as of 11:09, 22 November 2012

Owwnotebook icon.png Project name <html><img src="/images/9/94/Report.png" border="0" /></html> Main project page
<html><img src="/images/c/c3/Resultset_previous.png" border="0" /></html>Previous entry<html>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</html>Next entry<html><img src="/images/5/5c/Resultset_next.png" border="0" /></html>

Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).


  • Data: let's assume that we obtained data from N individuals. We note Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y_1,\ldots,y_N} the (quantitative) phenotypes (e.g. expression levels at a given gene), and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle g_1,\ldots,g_N} the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).


  • Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.


  • Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there is no hidden confounding factors in the phenotypes.


  • Likelihood: we start by writing the usual linear regression for one individual

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{-1})}

where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \beta_1} is in fact the additive effect of the SNP, noted Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle a} from now on, and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \beta_2} is the dominance effect of the SNP, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle d = a k} .

Let's now write the model in matrix notation:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle Y = X B + E \text{ where } B = [ \mu \; a \; d ]^T}

This gives the following multivariate Normal distribution for the phenotypes:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)}

Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle Y} . It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes).

The likelihood of the parameters given the data is therefore:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathcal{L}(\tau, B) = \mathsf{P}(Y | X, \tau, B)}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathcal{L}(\tau, B) = \left(\frac{\tau}{2 \pi}\right)^{\frac{N}{2}} exp \left( -\frac{\tau}{2} (Y - XB)^T (Y - XB) \right)}


Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau, B) = \mathsf{P}(\tau) \mathsf{P}(B | \tau)}

A Gamma distribution for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau \sim \Gamma(\kappa/2, \, \lambda/2)}

which means:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau) = \frac{\frac{\lambda}{2}^{\kappa/2}}{\Gamma(\frac{\kappa}{2})} \tau^{\frac{\kappa}{2}-1} e^{-\frac{\lambda}{2} \tau}}

And a multivariate Normal distribution for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B | \tau \sim \mathcal{N}(\vec{0}, \, \tau^{-1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)}

which means:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | \tau) = \left(\frac{\tau}{2 \pi}\right)^{\frac{3}{2}} |\Sigma_B|^{-\frac{1}{2}} exp \left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B \right)}


  • Joint posterior (1):

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau, B | Y, X) = \mathsf{P}(\tau | Y, X) \mathsf{P}(B | Y, X, \tau)}


  • Conditional posterior of B:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B, Y | X, \tau)}{\mathsf{P}(Y | X, \tau)}}

Let's neglect the normalization constant for now:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)}

Similarly, let's keep only the terms in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} for the moment:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto exp(B^T \Sigma_B^{-1} B) exp((Y-XB)^T(Y-XB))}

We expand:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto exp(B^T \Sigma_B^{-1} B - Y^TXB -B^TX^TY + B^TX^TXB)}

We factorize some terms:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto exp(B^T (\Sigma_B^{-1} + X^TX) B - Y^TXB -B^TX^TY)}

Importantly, let's define:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \Omega = (\Sigma_B^{-1} + X^TX)^{-1}}

We can see that Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \Omega^T=\Omega} , which means that Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \Omega} is a symmetric matrix. This is particularly useful here because we can use the following equality: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \Omega^{-1}\Omega^T=I} .

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto exp(B^T \Omega^{-1} B - (X^TY)^T\Omega^{-1}\Omega^TB -B^T\Omega^{-1}\Omega^TX^TY)}

This now becomes easy to factorizes totally:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X, \tau) \propto exp((B^T - \Omega X^TY)^T\Omega^{-1}(B - \Omega X^TY))}

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B | Y, X, \tau \sim \mathcal{N}(\Omega X^TY, \tau^{-1} \Omega)}


  • Posterior of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau} :

Similarly to the equations above:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau)}

But now, to handle the second term, we need to integrate over Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} , thus effectively taking into account the uncertainty in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \int \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B) \mathsf{d}B}

Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} !):

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau | Y, X) \propto \tau^{\frac{\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} \tau^{N/2} exp(-\frac{\tau}{2} B^T \Sigma_B^{-1} B) exp(-\frac{\tau}{2} (Y - XB)^T (Y - XB)) \mathsf{d}B}

As we used a conjugate prior for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau} , we know that we expect a Gamma distribution for the posterior. Therefore, we can take Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau^{N/2}} out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N+\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} \int \tau^{3/2} exp \left[-\frac{\tau}{2} \left( (B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y) - Y^T X \Omega X^T Y + Y^T Y \right) \right] \mathsf{d}B}

We recognize the conditional posterior of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B} . This allows us to use the fact that the pdf of the Normal distribution integrates to one:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(\tau | Y, X) \propto \tau^{\frac{N+\kappa}{2} - 1} e^{-\frac{\lambda}{2} \tau} exp\left[-\frac{\tau}{2} (Y^T Y - Y^T X \Omega X^T Y) \right]}

We finally recognize a Gamma distribution, allowing us to write the posterior as:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau | Y, X \sim \Gamma \left( \frac{N+\kappa}{2}, \; \frac{1}{2} (Y^T Y - Y^T X \Omega X^T Y + \lambda) \right)}


  • Joint posterior (2): sometimes it is said that the joint posterior follows a Normal Inverse Gamma distribution:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B, \tau | Y, X \sim \mathcal{N}IG(\Omega X^TY, \; \tau^{-1}\Omega, \; \frac{N+\kappa}{2}, \; \frac{\lambda^\ast}{2})}

where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda^\ast = Y^T Y - Y^T X \Omega X^T Y + \lambda}


  • Marginal posterior of B: we can now integrate out Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X) = \int \mathsf{P}(\tau) \mathsf{P}(B | Y, X, \tau) \mathsf{d}\tau}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X) = \frac{\frac{\lambda^\ast}{2}^{\frac{N+\kappa}{2}}}{(2\pi)^\frac{3}{2} |\Omega|^{\frac{1}{2}} \Gamma(\frac{N+\kappa}{2})} \int \tau^{\frac{N+\kappa+3}{2}-1} exp \left[-\tau \left( \frac{\lambda^\ast}{2} + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right) \right] \mathsf{d}\tau}

Here we recognize the formula to integrate the Gamma function:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X) = \frac{\frac{\lambda^\ast}{2}^{\frac{N+\kappa}{2}} \Gamma(\frac{N+\kappa+3}{2})}{(2\pi)^\frac{3}{2} |\Omega|^{\frac{1}{2}} \Gamma(\frac{N+\kappa}{2})} \left( \frac{\lambda^\ast}{2} + (B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY) \right)^{-\frac{N+\kappa+3}{2}}}

And we now recognize a multivariate Student's t-distribution:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(B | Y, X) = \frac{\Gamma(\frac{N+\kappa+3}{2})}{\Gamma(\frac{N+\kappa}{2}) \pi^\frac{3}{2} |\lambda^\ast \Omega|^{\frac{1}{2}} } \left( 1 + \frac{(B - \Omega X^TY)^T \Omega^{-1} (B - \Omega X^TY)}{\lambda^\ast} \right)^{-\frac{N+\kappa+3}{2}}}

We hence can write:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle B | Y, X \sim \mathcal{S}_{N+\kappa}(\Omega X^TY, \; (Y^T Y - Y^T X \Omega X^T Y + \lambda) \Omega)}


  • Bayes Factor: one way to answer our goal above ("is there an effect of the genotype on the phenotype?") is to do hypothesis testing.

We want to test the following null hypothesis:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle H_0: \; a = d = 0}

In Bayesian modeling, hypothesis testing is performed with a Bayes factor, which in our case can be written as:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle BF = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a = 0, d = 0)}}

We can shorten this into:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle BF = \frac{\mathsf{P}(Y | X)}{\mathsf{P}_0(Y)}}

Note that, compare to frequentist hypothesis testing which focuses on the null, the Bayes factor requires to explicitly model the data under the alternative.

Let's start with the numerator:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(Y | X) = \int \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau) \mathsf{d}\tau}

First, let's calculate what is inside the integral:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(Y | X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)}{\mathsf{P}(B | Y, X, \tau)}}

Using the formula obtained previously and doing some algebra gives:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(Y | X, \tau) = \left( \frac{\tau}{2 \pi} \right)^{\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} exp\left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY) \right)}

Now we can integrate out Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \tau} (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens):

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \frac{\frac{\lambda}{2}^{\frac{\kappa}{2}}}{\Gamma(\frac{\kappa}{2})} \int \tau^{\frac{N+\kappa}{2}-1} exp \left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY + \lambda) \right)}

Inside the integral, we recognize the almost-complete pdf of a Gamma distribution. As it has to integrate to one, we get:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}}


  • Choosing the hyperparameters:

invariance properties motivate the use of limits for some "unimportant" hyperparameters

average BF over grid


  • R code:

to do