Difference between revisions of "User:Timothee Flutre/Notebook/Postdoc/2011/06/28"

From OpenWetWare
Jump to: navigation, search
(Simple linear regression: add simulation with PVE and R code)
m (Simple linear regression)
Line 88: Line 88:
 
set.seed(1859)
 
set.seed(1859)
 
N <- 100
 
N <- 100
 +
mu <- 5
 
g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2)) # MAF=0.2
 
g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2)) # MAF=0.2
mu <- 5
 
 
beta <- 0.5
 
beta <- 0.5
 
pve <- 0.8
 
pve <- 0.8
 
beta.g.bar <- mean(beta * g)
 
beta.g.bar <- mean(beta * g)
sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.778
+
sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.18
 
y <- mu + beta * g + rnorm(n=N, mean=0, sd=sigma)
 
y <- mu + beta * g + rnorm(n=N, mean=0, sd=sigma)
 
plot(x=0, type="n", xlim=range(g), ylim=range(y),
 
plot(x=0, type="n", xlim=range(g), ylim=range(y),

Revision as of 06:10, 30 July 2012

Owwnotebook icon.png Project name <html><img src="/images/9/94/Report.png" border="0" /></html> Main project page
Next entry<html><img src="/images/5/5c/Resultset_next.png" border="0" /></html>

Simple linear regression

  • Data: Let's assume that we obtained data from Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle N} individuals. We note Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y_1,\ldots,y_N} the (quantitative) phenotypes (eg. expression level at a given gene), and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle g_1,\ldots,g_N} the genotypes at a given SNP. We want to assess their linear relationship.
  • Model: for this we use a simple linear regression (univariate phenotype, single predictor).

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \forall n \in {1,\ldots,N}, \; y_n = \mu + \beta g_n + \epsilon_n \text{ with } \epsilon_n \sim N(0,\sigma^2)}

In matrix notation:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y = X \theta + \epsilon} with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \epsilon \sim N_N(0,\sigma^2 I_N)} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \theta^T = (\mu, \beta)}

  • Use only summary statistics: most importantly, we want the following estimates: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \hat{\beta}} , Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle se(\hat{\beta})} (its standard error) and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \hat{\sigma}} . In the case where we don't have access to the original data (eg. because genotypes are confidential) but only to some summary statistics (see below), it is still possible to calculate the estimates.

Here is the ordinary-least-square (OLS) estimator of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \theta} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \hat{\theta} = (X^T X)^{-1} X^T Y}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \left( \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_N \end{bmatrix} \begin{bmatrix} 1 & g_1 \\ \vdots & \vdots \\ 1 & g_N \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_N \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} }

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} N & \sum_n g_n \\ \sum_n g_n & \sum_n g_n^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_n y_n \\ \sum_n g_n y_n \end{bmatrix} }

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{N \sum_n g_n^2 - (\sum_n g_n)^2} \begin{bmatrix} \sum_n g_n^2 & - \sum_n g_n \\ - \sum_n g_n & N \end{bmatrix} \begin{bmatrix} \sum_n y_n \\ \sum_n g_n y_n \end{bmatrix} }

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{N \sum_n g_n^2 - (\sum_n g_n)^2} \begin{bmatrix} \sum_n g_n^2 \sum_n y_n - \sum_n g_n \sum_n g_n y_n \\ - \sum_n g_n \sum_n y_n + N \sum_n g_n y_n \end{bmatrix} }

Let's now define 4 summary statistics, very easy to compute:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \bar{y} = \frac{1}{N} \sum_{n=1}^N y_n}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \bar{g} = \frac{1}{N} \sum_{n=1}^N g_n}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle g^T g = \sum_{n=1}^N g_n^2}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle g^T y = \sum_{n=1}^N g_n y_n}

This allows to obtain the estimate of the effect size only by having the summary statistics available:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \hat{\beta} = \frac{g^T y - N \bar{g} \bar{y}}{g^T g - N \bar{g}^2}}

The same works for the estimate of the standard deviation of the errors:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \hat{\sigma}^2 = \frac{1}{N-r}(y - X\hat{\theta})^T(y - X\hat{\theta})}

We can also benefit from this for the standard error of the parameters:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle V(\hat{\theta}) = \hat{\sigma}^2 (X^T X)^{-1}}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle V(\hat{\theta}) = \hat{\sigma}^2 \frac{1}{N g^T g - N^2 \bar{g}^2} \begin{bmatrix} g^Tg & -N\bar{g} \\ -N\bar{g} & N \end{bmatrix} }

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle V(\hat{\beta}) = \frac{\hat{\sigma}^2}{g^Tg - N\bar{g}^2}}

  • Simulation with a given PVE: when testing an inference model, the first step is usually to simulate data. However, how do we choose the parameters? In our case (linear regression: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y = \mu + \beta g + \epsilon} ), it is frequent to fix the proportion of variance in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y} explained by Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \beta g} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle PVE = \frac{V(\beta g)}{V(y)} = \frac{V(\beta g)}{V(\beta g) + V(\epsilon)}} with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle V(\beta g) = \frac{1}{N}\sum_{n=1}^N (\beta g_n - \bar{\beta g})^2} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle V(\epsilon) = \sigma^2}

This way, by also fixing Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \beta} , it is easy to calculate the corresponding Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sigma} :

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sigma = \sqrt{\frac{1}{N}\sum_{n=1}^N (\beta g_n - \bar{\beta g})^2 \frac{1 - PVE}{PVE}}}

Here is some R code implementing this:

set.seed(1859)
N <- 100
mu <- 5
g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2)) # MAF=0.2
beta <- 0.5
pve <- 0.8
beta.g.bar <- mean(beta * g)
sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.18
y <- mu + beta * g + rnorm(n=N, mean=0, sd=sigma)
plot(x=0, type="n", xlim=range(g), ylim=range(y),
     xlab="genotypes (allele dose)", ylab="phenotypes",
     main="Simple linear regression")
for(i in unique(g))
  points(x=jitter(g[g == i]), y=y[g == i], col=i+1, pch=19)
ols <- lm(y ~ g)
summary(ols) # muhat=5.01, betahat=0.46, R2=0.779
abline(a=coefficients(ols)[1], b=coefficients(ols)[2])