User:Timothee Flutre/Notebook/Postdoc/2011/06/28

From OpenWetWare


Revision as of 17:27, 28 March 2012


==Calculate OLS estimates with summary statistics for simple linear regression==

We obtained data from <math>n</math> individuals. Let <math>y_1,\ldots,y_n</math> be the (quantitative) phenotypes (e.g. the expression level of a given gene), and <math>g_1,\ldots,g_n</math> the genotypes at a given SNP.

We want to assess the linear relationship between phenotype and genotype. For this we use a simple linear regression:

<math>y_i = \mu + \beta g_i + \epsilon_i</math> with <math>\epsilon_i \sim N(0,\sigma^2)</math> for <math>i \in \{1,\ldots,n\}</math>

In vector-matrix notation:

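<math>y = X \theta + \epsilon</math> with <math>\epsilon \sim N_n(0,\sigma^2 I)</math> and <math>\theta^T = (\mu, \beta)</math>, where <math>X</math> is the <math>n \times 2</math> design matrix whose first column is all ones and whose second column holds the genotypes.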

Here is the ordinary least squares (OLS) estimator of <math>\theta</math>:

<math>\hat{\theta} = (X^T X)^{-1} X^T y</math>

<math>\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} =
\left( \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix}
\begin{bmatrix} 1 & g_1 \\ \vdots & \vdots \\ 1 & g_n \end{bmatrix} \right)^{-1}
\begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
</math>

<math>\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} =
\begin{bmatrix} n & \sum_i g_i \\ \sum_i g_i & \sum_i g_i^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix}
</math>

<math>\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} =
\frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2}
\begin{bmatrix} \sum_i g_i^2 & - \sum_i g_i \\ - \sum_i g_i & n \end{bmatrix}
\begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix}
</math>

<math>\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} =
\frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2}
\begin{bmatrix} \sum_i g_i^2 \sum_i y_i - \sum_i g_i \sum_i g_i y_i \\ - \sum_i g_i \sum_i y_i + n \sum_i g_i y_i \end{bmatrix}
</math>
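
As a sanity check, the closed-form solution of the 2x2 normal equations can be sketched in a few lines of Python. This is only an illustration and not part of the original entry: the function name <code>ols_closed_form</code> and the toy data are made up here, and genotypes are assumed to be coded numerically (e.g. 0, 1, 2).

```python
def ols_closed_form(g, y):
    """Return (mu_hat, beta_hat) by inverting the 2x2 matrix X^T X explicitly.

    g: genotypes, y: phenotypes (two sequences of equal length).
    """
    n = len(g)
    if n != len(y):
        raise ValueError("g and y must have the same length")

    # Entries of X^T X and X^T y
    s_g = sum(g)                                  # sum_i g_i
    s_gg = sum(gi * gi for gi in g)               # sum_i g_i^2
    s_y = sum(y)                                  # sum_i y_i
    s_gy = sum(gi * yi for gi, yi in zip(g, y))   # sum_i g_i y_i

    # Determinant of X^T X; it is zero when all genotypes are identical
    det = n * s_gg - s_g ** 2
    if det == 0:
        raise ValueError("X^T X is singular: need at least two distinct genotypes")

    mu_hat = (s_gg * s_y - s_g * s_gy) / det
    beta_hat = (-s_g * s_y + n * s_gy) / det
    return mu_hat, beta_hat


# Hypothetical toy data generated without noise from mu = 1 and beta = 2
genotypes = [0, 1, 2, 1, 0]
phenotypes = [1.0, 3.0, 5.0, 3.0, 1.0]
mu_hat, beta_hat = ols_closed_form(genotypes, phenotypes)
print(mu_hat, beta_hat)
```

Because the toy phenotypes lie exactly on the line <math>y = 1 + 2g</math>, the estimates should recover the intercept 1 and the slope 2.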

Let's now define 4 summary statistics:

<math>\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i</math>

<math>\bar{g} = \frac{1}{n} \sum_{i=1}^n g_i</math>

<math>g^T g = \sum_{i=1}^n g_i^2</math>

<math>g^T y = \sum_{i=1}^n g_i y_i</math>

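Since <math>\sum_i g_i = n \bar{g}</math> and <math>\sum_i y_i = n \bar{y}</math>, dividing the numerator and denominator of the second row above by <math>n</math> gives:

<math>\hat{\beta} = \frac{g^T y - n \bar{g} \bar{y}}{g^T g - n \bar{g}^2}</math>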
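
The same substitution applied to the first row gives the intercept, <math>\hat{\mu} = \frac{\bar{y} \, g^T g - \bar{g} \, g^T y}{g^T g - n \bar{g}^2} = \bar{y} - \hat{\beta} \bar{g}</math>. The sketch below computes both estimates from the four summary statistics and <math>n</math> alone, which may be handy when the individual-level data are not at hand. The function names and the toy data are hypothetical and chosen only for illustration.

```python
def summary_stats(g, y):
    """Return (n, y_bar, g_bar, gtg, gty) for genotypes g and phenotypes y."""
    n = len(g)
    y_bar = sum(y) / n
    g_bar = sum(g) / n
    gtg = sum(gi * gi for gi in g)               # g^T g
    gty = sum(gi * yi for gi, yi in zip(g, y))   # g^T y
    return n, y_bar, g_bar, gtg, gty


def ols_from_summary(n, y_bar, g_bar, gtg, gty):
    """Return (mu_hat, beta_hat) using only the summary statistics."""
    denom = gtg - n * g_bar ** 2
    if denom == 0:
        # Happens when all genotypes are identical: the slope is not identifiable
        raise ValueError("g^T g - n * g_bar^2 is zero")
    beta_hat = (gty - n * g_bar * y_bar) / denom
    mu_hat = y_bar - beta_hat * g_bar
    return mu_hat, beta_hat


# Hypothetical toy data generated without noise from mu = 1 and beta = 2
genotypes = [0, 1, 2, 1, 0, 2]
phenotypes = [1.0, 3.0, 5.0, 3.0, 1.0, 5.0]
stats = summary_stats(genotypes, phenotypes)
mu_hat, beta_hat = ols_from_summary(*stats)
print(mu_hat, beta_hat)
```

The exact comparison <code>denom == 0</code> is enough for integer-coded genotypes. With real-valued predictors a tolerance would be a safer choice.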

