User:Timothee Flutre/Notebook/Postdoc/2011/06/28
Revision as of 15:17, 28 March 2012


Calculate OLS estimates with summary statistics for simple linear regression

We obtained data from [math]\displaystyle{ n }[/math] individuals. Let [math]\displaystyle{ y_1,\ldots,y_n }[/math] be the (quantitative) phenotypes (e.g. expression levels at a given gene), and [math]\displaystyle{ g_1,\ldots,g_n }[/math] the genotypes at a given SNP.

We want to assess the linear relationship between phenotype and genotype. For this we use a simple linear regression:

[math]\displaystyle{ y_i = \mu + \beta g_i + \epsilon_i }[/math] with [math]\displaystyle{ \epsilon_i \sim N(0,\sigma^2) }[/math] and for [math]\displaystyle{ i \in \{1,\ldots,n\} }[/math]

In vector-matrix notation:

[math]\displaystyle{ y = X \theta + \epsilon }[/math] with [math]\displaystyle{ \epsilon \sim N_n(0,\sigma^2 I) }[/math] and [math]\displaystyle{ \theta^T = (\mu, \beta) }[/math]

Here is the ordinary-least-square (OLS) estimator of [math]\displaystyle{ \theta }[/math]:

[math]\displaystyle{ \hat{\theta} = (X^T X)^{-1} X^T y }[/math]

[math]\displaystyle{ \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \left( \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix} \begin{bmatrix} 1 & g_1 \\ \vdots & \vdots \\ 1 & g_n \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} }[/math]

[math]\displaystyle{ \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} n & \sum_i g_i \\ \sum_i g_i & \sum_i g_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix} }[/math]

[math]\displaystyle{ \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2} \begin{bmatrix} \sum_i g_i^2 & - \sum_i g_i \\ - \sum_i g_i & n \end{bmatrix} \begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix} }[/math]

[math]\displaystyle{ \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2} \begin{bmatrix} \sum_i g_i^2 \sum_i y_i - \sum_i g_i \sum_i g_i y_i \\ - \sum_i g_i \sum_i y_i + n \sum_i g_i y_i \end{bmatrix} }[/math]
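The last expression can be checked numerically. Below is a minimal pure-Python sketch of the closed-form solution; the simulated genotypes and phenotypes, and all variable names, are illustrative choices of mine, not from the notebook:

```python
import random

random.seed(1)
n = 100
# Hypothetical simulated data: genotypes coded 0/1/2, phenotypes with a true effect of 0.3
g = [random.choice([0, 1, 2]) for _ in range(n)]
y = [0.5 + 0.3 * gi + random.gauss(0, 1) for gi in g]

# Entries of X^T X and X^T y for the design matrix X = [1 | g]
sum_g = sum(g)
sum_y = sum(y)
sum_gg = sum(gi * gi for gi in g)
sum_gy = sum(gi * yi for gi, yi in zip(g, y))

# Closed-form OLS estimates, following the last matrix equation above
det = n * sum_gg - sum_g ** 2
mu_hat = (sum_gg * sum_y - sum_g * sum_gy) / det
beta_hat = (n * sum_gy - sum_g * sum_y) / det
```

A quick way to convince oneself the formulas are right: the OLS residuals must be orthogonal to both columns of [math]\displaystyle{ X }[/math] (the normal equations), which holds up to floating-point error for the estimates above.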

Let's now define four summary statistics that are very easy to compute:

[math]\displaystyle{ \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i }[/math]

[math]\displaystyle{ \bar{g} = \frac{1}{n} \sum_{i=1}^n g_i }[/math]

[math]\displaystyle{ g^T g = \sum_{i=1}^n g_i^2 }[/math]

[math]\displaystyle{ g^T y = \sum_{i=1}^n g_i y_i }[/math]

This allows us to obtain the estimate of the effect size from the summary statistics alone:

[math]\displaystyle{ \hat{\beta} = \frac{g^T y - n \bar{g} \bar{y}}{g^T g - n \bar{g}^2} }[/math]
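The summary-statistic formula is algebraically identical to the matrix solution above (multiply numerator and denominator by [math]\displaystyle{ n }[/math]), which a short sketch can confirm; the simulated data and variable names here are mine, not from the notebook:

```python
import random

random.seed(1)
n = 100
# Hypothetical simulated genotypes and phenotypes
g = [random.choice([0, 1, 2]) for _ in range(n)]
y = [0.5 + 0.3 * gi + random.gauss(0, 1) for gi in g]

# The four summary statistics
y_bar = sum(y) / n
g_bar = sum(g) / n
gTg = sum(gi * gi for gi in g)
gTy = sum(gi * yi for gi, yi in zip(g, y))

# Effect-size estimate computed from the summary statistics only
beta_hat = (gTy - n * g_bar * y_bar) / (gTg - n * g_bar ** 2)
```

If the raw data were available, the full closed-form expression would give the same value up to floating-point error.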

The same works for the estimate of the error variance, where [math]\displaystyle{ r }[/math] is the rank of [math]\displaystyle{ X }[/math] (here [math]\displaystyle{ r = 2 }[/math]):

[math]\displaystyle{ \hat{\sigma}^2 = \frac{1}{n-r}(y - X\hat{\theta})^T(y - X\hat{\theta}) }[/math]

We can also benefit from the summary statistics for the standard errors of the parameter estimates:

[math]\displaystyle{ V(\hat{\theta}) = \hat{\sigma}^2 (X^T X)^{-1} }[/math]

[math]\displaystyle{ V(\hat{\theta}) = \hat{\sigma}^2 \frac{1}{n g^T g - n^2 \bar{g}^2} \begin{bmatrix} g^Tg & -n\bar{g} \\ -n\bar{g} & n \end{bmatrix} }[/math]

[math]\displaystyle{ V(\hat{\beta}) = \frac{\hat{\sigma}^2}{g^Tg - n\bar{g}^2} }[/math]
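Putting the pieces together, the standard error of [math]\displaystyle{ \hat{\beta} }[/math] follows from the residual variance and the summary statistics. A minimal sketch, again on hypothetical simulated data of my own choosing:

```python
import random

random.seed(1)
n = 100
# Hypothetical simulated genotypes and phenotypes
g = [random.choice([0, 1, 2]) for _ in range(n)]
y = [0.5 + 0.3 * gi + random.gauss(0, 1) for gi in g]

# Summary statistics and OLS estimates
y_bar = sum(y) / n
g_bar = sum(g) / n
gTg = sum(gi * gi for gi in g)
gTy = sum(gi * yi for gi, yi in zip(g, y))
beta_hat = (gTy - n * g_bar * y_bar) / (gTg - n * g_bar ** 2)
mu_hat = y_bar - beta_hat * g_bar  # intercept from the normal equations

# Residual variance with r = rank(X) = 2
r = 2
rss = sum((yi - mu_hat - beta_hat * gi) ** 2 for gi, yi in zip(g, y))
sigma2_hat = rss / (n - r)

# Variance and standard error of beta_hat from the last formula above
var_beta = sigma2_hat / (gTg - n * g_bar ** 2)
se_beta = var_beta ** 0.5
```

The scalar [math]\displaystyle{ V(\hat{\beta}) }[/math] is just the bottom-right entry of the [math]\displaystyle{ 2 \times 2 }[/math] matrix [math]\displaystyle{ \hat{\sigma}^2 (X^T X)^{-1} }[/math], which the sketch reproduces.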