Difference between revisions of "User:Timothee Flutre/Notebook/Postdoc/2011/06/28"
(→Simple linear regression: add simulation with PVE and R code) 
m (→Simple linear regression) 

Line 88:
  set.seed(1859)
  N <- 100
+ mu <- 5
  g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2)) # MAF=0.2
−
  beta <- 0.5
  pve <- 0.8
  beta.g.bar <- mean(beta * g)
− sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.
+ sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.18
  y <- mu + beta * g + rnorm(n=N, mean=0, sd=sigma)
  plot(x=0, type="n", xlim=range(g), ylim=range(y),
Revision as of 06:10, 30 July 2012
Simple linear regression
$$\forall n \in \{1,\ldots,N\}, \; y_n = \mu + \beta g_n + \epsilon_n \text{ with } \epsilon_n \sim N(0,\sigma^2)$$

In matrix notation:

$$y = X \theta + \epsilon$$

with $\epsilon \sim N_N(0,\sigma^2 I_N)$ and $\theta^T = (\mu, \beta)$.
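As a minimal sketch of the matrix notation, assuming a small made-up genotype vector (the values of `g`, `mu` and `beta` below are illustrative and not taken from the notebook), the design matrix has a column of ones for the intercept and a column of genotypes for the effect:

```r
# Illustrative values only (assumptions, not from the notebook)
g <- c(0, 1, 2, 1, 0)        # allele doses for N = 5 individuals
mu <- 5
beta <- 0.5

X <- cbind(1, g)             # N x 2 design matrix: intercept column, genotype column
theta <- c(mu, beta)         # theta^T = (mu, beta)

# Noise-free part of the model, written in matrix and in scalar form
fitted.matrix <- as.vector(X %*% theta)
fitted.scalar <- mu + beta * g
stopifnot(all.equal(fitted.matrix, fitted.scalar))
```

Both forms give the same vector, which is all the matrix notation claims: row n of X times theta is mu + beta * g_n.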
Here is the ordinary least squares (OLS) estimator of $\theta$:

$$\hat{\theta} = (X^T X)^{-1} X^T y$$

$$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \left( \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_N \end{bmatrix} \begin{bmatrix} 1 & g_1 \\ \vdots & \vdots \\ 1 & g_N \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & \ldots & 1 \\ g_1 & \ldots & g_N \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$$

$$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} N & \sum_n g_n \\ \sum_n g_n & \sum_n g_n^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_n y_n \\ \sum_n g_n y_n \end{bmatrix}$$

$$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{N \sum_n g_n^2 - (\sum_n g_n)^2} \begin{bmatrix} \sum_n g_n^2 & - \sum_n g_n \\ - \sum_n g_n & N \end{bmatrix} \begin{bmatrix} \sum_n y_n \\ \sum_n g_n y_n \end{bmatrix}$$

$$\begin{bmatrix} \hat{\mu} \\ \hat{\beta} \end{bmatrix} = \frac{1}{N \sum_n g_n^2 - (\sum_n g_n)^2} \begin{bmatrix} \sum_n g_n^2 \sum_n y_n - \sum_n g_n \sum_n g_n y_n \\ - \sum_n g_n \sum_n y_n + N \sum_n g_n y_n \end{bmatrix}$$

Let's now define 4 summary statistics, all very easy to compute:

$$\bar{y} = \frac{1}{N} \sum_{n=1}^N y_n$$

$$\bar{g} = \frac{1}{N} \sum_{n=1}^N g_n$$

$$g^T g = \sum_{n=1}^N g_n^2$$

$$g^T y = \sum_{n=1}^N g_n y_n$$

With these, the estimate of the effect size can be obtained from the summary statistics alone:

$$\hat{\beta} = \frac{g^T y - N \bar{g} \bar{y}}{g^T g - N \bar{g}^2}$$

The same works for the estimate of the variance of the errors, where $r$ is the rank of $X$, that is, the number of parameters (here $r = 2$):

$$\hat{\sigma}^2 = \frac{1}{N-r}(y - X\hat{\theta})^T(y - X\hat{\theta})$$

The summary statistics also give the variance of the estimators, and hence their standard errors:

$$V(\hat{\theta}) = \hat{\sigma}^2 (X^T X)^{-1}$$

$$V(\hat{\theta}) = \hat{\sigma}^2 \frac{1}{N g^T g - N^2 \bar{g}^2} \begin{bmatrix} g^T g & -N\bar{g} \\ -N\bar{g} & N \end{bmatrix}$$

$$V(\hat{\beta}) = \frac{\hat{\sigma}^2}{g^T g - N\bar{g}^2}$$
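The closed-form expressions above can be checked numerically against `lm()`. The following is a minimal sketch under stated assumptions: the data are simulated with an arbitrary error standard deviation of 0.2, and an extra summary statistic `yty` (for $y^T y$), which is not among the four listed above, is introduced because the residual sum of squares needs it. The intercept uses the identity $\hat{\mu} = \bar{y} - \hat{\beta}\bar{g}$, which follows from the first row of the matrix solution.

```r
set.seed(1859)
N <- 100
g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2))
y <- 5 + 0.5 * g + rnorm(n=N, mean=0, sd=0.2)  # sd chosen arbitrarily for this sketch

# Summary statistics
y.bar <- mean(y)
g.bar <- mean(g)
gtg <- sum(g^2)
gty <- sum(g * y)
yty <- sum(y^2)  # extra statistic (assumption), needed only for sigma2.hat

# Effect size and intercept from the summary statistics
beta.hat <- (gty - N * g.bar * y.bar) / (gtg - N * g.bar^2)
mu.hat <- y.bar - beta.hat * g.bar

# Residual sum of squares, error variance (r = 2) and standard error of beta.hat
rss <- (yty - N * y.bar^2) - beta.hat * (gty - N * g.bar * y.bar)
sigma2.hat <- rss / (N - 2)
se.beta.hat <- sqrt(sigma2.hat / (gtg - N * g.bar^2))

# Compare with the estimates returned by lm()
ols <- lm(y ~ g)
stopifnot(all.equal(unname(coefficients(ols)), c(mu.hat, beta.hat)))
stopifnot(all.equal(summary(ols)$sigma^2, sigma2.hat))
stopifnot(all.equal(summary(ols)$coefficients["g", "Std. Error"], se.beta.hat))
```

If the three `stopifnot()` calls pass silently, the summary-statistic formulas agree with `lm()` up to numerical tolerance.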
To simulate data, it is convenient to fix the proportion of variance explained (PVE) by the genotype:

$$PVE = \frac{V(\beta g)}{V(y)} = \frac{V(\beta g)}{V(\beta g) + V(\epsilon)}$$

with $V(\beta g) = \frac{1}{N}\sum_{n=1}^N (\beta g_n - \overline{\beta g})^2$ and $V(\epsilon) = \sigma^2$.

This way, by also fixing $\beta$, it is easy to calculate the corresponding $\sigma$. Solving the definition of the PVE for $\sigma^2$ gives $\sigma^2 = V(\beta g) (1 - PVE) / PVE$, hence:

$$\sigma = \sqrt{\frac{1}{N}\sum_{n=1}^N (\beta g_n - \overline{\beta g})^2 \frac{1 - PVE}{PVE}}$$

Here is some R code implementing this:

    set.seed(1859)
    N <- 100
    mu <- 5
    # genotype probabilities 0.5/0.3/0.2 for doses 0/1/2, i.e. expected allele frequency 0.35
    g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2))
    beta <- 0.5
    pve <- 0.8
    beta.g.bar <- mean(beta * g)
    sigma <- sqrt((1/N) * sum((beta * g - beta.g.bar)^2) * (1-pve) / pve) # 0.18
    y <- mu + beta * g + rnorm(n=N, mean=0, sd=sigma)

    # empty plot, then one color per genotype class
    plot(x=0, type="n", xlim=range(g), ylim=range(y),
         xlab="genotypes (allele dose)", ylab="phenotypes",
         main="Simple linear regression")
    for(i in unique(g))
      points(x=jitter(g[g == i]), y=y[g == i], col=i+1, pch=19)

    # fit by OLS and add the regression line
    ols <- lm(y ~ g)
    summary(ols) # muhat=5.01, betahat=0.46, R2=0.779
    abline(a=coefficients(ols)[1], b=coefficients(ols)[2])
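As a quick sanity check on the formula for sigma, the sketch below plugs the computed value back into the definition of the PVE. The names `var.beta.g`, `pve.check` and `pve.realized` are introduced here for illustration only and are not part of the original code.

```r
set.seed(1859)
N <- 100
g <- sample(x=0:2, size=N, replace=TRUE, prob=c(0.5, 0.3, 0.2))
beta <- 0.5
pve <- 0.8

# Variance of the genetic component, with the 1/N convention used above
var.beta.g <- mean((beta * g - mean(beta * g))^2)
sigma <- sqrt(var.beta.g * (1 - pve) / pve)

# Plugging sigma back into the definition recovers the target PVE
pve.check <- var.beta.g / (var.beta.g + sigma^2)
stopifnot(all.equal(pve.check, pve))

# In one simulated data set, the realized proportion only approximates the target
y <- 5 + beta * g + rnorm(n=N, mean=0, sd=sigma)
pve.realized <- var.beta.g / mean((y - mean(y))^2)
```

The first check holds by construction. The realized value depends on the sampled errors, which is consistent with the R2 of 0.779 reported above being close to, but not exactly, the target of 0.8.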
