# User:Timothee Flutre/Notebook/Postdoc/2011/06/28


## Revision as of 17:27, 28 March 2012


## Calculate OLS estimates with summary statistics for simple linear regression

We obtained data from $n$ individuals. Let $y_1,\ldots,y_n$ be the (quantitative) phenotypes (e.g. the expression level of a given gene), and $g_1,\ldots,g_n$ the genotypes at a given SNP.

We want to assess the linear relationship between phenotype and genotype. For this we use a simple linear regression:

$y_i = \mu + \beta g_i + \epsilon_i$ with $\epsilon_i \sim N(0,\sigma^2)$ for $i \in \{1,\ldots,n\}$

In vector-matrix notation:

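$y = X \theta + \epsilon$ with $\epsilon \sim N_n(0,\sigma^2 I)$ and $\theta^T = (\mu, \beta)$, where $X$ is the $n \times 2$ matrix whose first column is all ones and whose second column holds the genotypes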

Here is the ordinary least squares (OLS) estimator of $\theta$:

$\hat{\theta} = (X^T X)^{-1} X^T y$

$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \left( \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix} \begin{bmatrix} 1 & g_1 \\ \vdots & \vdots \\ 1 & g_n \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_n \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$

$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} n & \sum_i g_i \\ \sum_i g_i & \sum_i g_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix}$

$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2} \begin{bmatrix} \sum_i g_i^2 & - \sum_i g_i \\ - \sum_i g_i & n \end{bmatrix} \begin{bmatrix} \sum_i y_i \\ \sum_i g_i y_i \end{bmatrix}$

$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{n \sum_i g_i^2 - (\sum_i g_i)^2} \begin{bmatrix} \sum_i g_i^2 \sum_i y_i - \sum_i g_i \sum_i g_i y_i \\ - \sum_i g_i \sum_i y_i + n \sum_i g_i y_i \end{bmatrix}$
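As a sanity check, the closed form above can be evaluated directly from the four sums. The following is only a minimal sketch in plain Python; the function name `ols_from_sums` and the toy data are hypothetical, not part of the original entry. The toy phenotypes are built as exactly $1 + 2 g_i$, so the estimates should recover $\mu = 1$ and $\beta = 2$.

```python
def ols_from_sums(g, y):
    """Return (mu_hat, beta_hat) from the explicit 2x2 inverse of X^T X.

    Hypothetical helper for illustration: g holds genotypes, y phenotypes.
    """
    if len(g) != len(y):
        raise ValueError("g and y must have the same length")
    n = len(g)
    sum_g = sum(g)
    sum_y = sum(y)
    sum_gg = sum(gi * gi for gi in g)
    sum_gy = sum(gi * yi for gi, yi in zip(g, y))
    # Determinant of X^T X; it is zero when all genotypes are identical.
    det = n * sum_gg - sum_g ** 2
    if det == 0:
        raise ValueError("all genotypes are identical: X^T X is singular")
    mu_hat = (sum_gg * sum_y - sum_g * sum_gy) / det
    beta_hat = (-sum_g * sum_y + n * sum_gy) / det
    return mu_hat, beta_hat


# Toy data (made up): genotypes coded 0/1/2, phenotypes exactly 1 + 2*g.
g = [0, 1, 2, 1, 0, 2]
y = [1.0, 3.0, 5.0, 3.0, 1.0, 5.0]
mu_hat, beta_hat = ols_from_sums(g, y)
print(mu_hat, beta_hat)
```

The explicit check on the determinant reflects that the inverse only exists when the genotypes are not all equal, for example when the SNP is not monomorphic in the sample.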

Let's now define 4 summary statistics:

$\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$

$\bar{g} = \frac{1}{n} \sum_{i=1}^n g_i$

$g^T g = \sum_{i=1}^n g_i^2$

$g^T y = \sum_{i=1}^n g_i y_i$

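Dividing the numerator and the denominator of the second row by $n$ and substituting these statistics gives the slope estimate:

$\hat{\beta} = \frac{g^T y - n \bar{g} \bar{y}}{g^T g - n \bar{g}^2}$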
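The point of the summary statistics is that the estimates can be computed without the individual-level data. Here is a minimal sketch in plain Python, assuming only $n$, $\bar{y}$, $\bar{g}$, $g^T g$ and $g^T y$ are available; the function name `ols_from_summary_stats` and the numbers are hypothetical. The intercept line uses $\hat{\mu} = \bar{y} - \hat{\beta} \bar{g}$, which is not written out in this entry but follows from the first row of the matrix expression for $\hat{\mu}$.

```python
def ols_from_summary_stats(n, y_bar, g_bar, gtg, gty):
    """Return (mu_hat, beta_hat) from the four summary statistics.

    Hypothetical helper for illustration:
      y_bar = mean phenotype, g_bar = mean genotype,
      gtg = sum of g_i^2, gty = sum of g_i * y_i.
    """
    denom = gtg - n * g_bar ** 2
    if denom == 0:
        raise ValueError("genotypes have no variation: slope is undefined")
    beta_hat = (gty - n * g_bar * y_bar) / denom
    # Assumption stated in the lead-in: intercept from the first row
    # of the OLS solution, mu_hat = y_bar - beta_hat * g_bar.
    mu_hat = y_bar - beta_hat * g_bar
    return mu_hat, beta_hat


# Summary statistics of made-up data with g = [0, 1, 2, 1, 0, 2]
# and y = 1 + 2*g, so the true line has intercept 1 and slope 2.
mu_hat, beta_hat = ols_from_summary_stats(
    n=6, y_bar=3.0, g_bar=1.0, gtg=10.0, gty=26.0
)
print(mu_hat, beta_hat)
```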