User:Timothee Flutre/Notebook/Postdoc/2012/03/04
From OpenWetWare
"Advanced Data Analysis from an Elementary Point of View" by Cosma Shalizi

(This page summarizes my notes on this great course. The whole course is available online, so you will probably prefer to refer to it directly.)
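
1.1 Statistics, Data Analysis, Regression

1.2 Guessing the Value of a Random Variable

Use the mean squared error (MSE) to measure how badly we do when guessing the value of Y by a single number a:

$MSE(a) = E[(Y - a)^2]$
$MSE(a) = (E[Y - a])^2 + V[Y - a]$
$MSE(a) = (E[Y] - a)^2 + V[Y]$

The second line uses $E[U^2] = (E[U])^2 + V[U]$, and the third uses the fact that a is a constant. The MSE is therefore smallest at a = E[Y], where it equals V[Y].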
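
A quick numerical check of this decomposition, as a minimal sketch: the simulated Gaussian sample, the seed and the helper name `mse` are assumptions made for illustration and do not come from the course.

```python
import random
import statistics


def mse(ys, a):
    """Empirical mean squared error of guessing every y by the constant a."""
    return sum((y - a) ** 2 for y in ys) / len(ys)


random.seed(0)  # arbitrary seed, only for reproducibility
# Hypothetical sample: 10,000 draws from a Gaussian with mean 3 and sd 2.
ys = [random.gauss(3.0, 2.0) for _ in range(10_000)]

y_bar = statistics.fmean(ys)
var_y = statistics.pvariance(ys)  # variance with divisor n, matching V[Y]

for a in (0.0, 2.0, y_bar, 5.0):
    decomposed = (y_bar - a) ** 2 + var_y
    print(f"a={a:6.3f}  MSE={mse(ys, a):8.4f}  (mean-a)^2+var={decomposed:8.4f}")
```

The two printed columns should agree for every a, and the smallest MSE should appear on the row where a is the sample mean.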
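
1.2.1 Estimating the Expected Value

Sample mean: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

If the $(y_i)$ are iid, the law of large numbers says $\bar{y} \rightarrow E[Y]$ as $n \rightarrow \infty$, and the central limit theorem indicates how fast the convergence is (the squared error is about $V[Y] / n$).

1.3 The Regression Function

Use X (predictor, independent variable, covariate or input) to predict Y (dependent variable, output or response). How badly do we do when using f(X) to predict Y?

$MSE(f(X)) = E[(Y - f(X))^2]$

Use the law of total expectation ($E[U] = E[E[U | V]]$):

$MSE(f(X)) = E[E[(Y - f(X))^2 | X]]$
$MSE(f(X)) = E[V[Y | X] + (E[Y - f(X) | X])^2]$

The first term does not depend on f, and the second vanishes when f is the regression function:

$r(x) = E[Y | X = x]$

1.3.1 Some Disclaimers

Usually we observe $Y | X = r(X) + \eta(X)$, i.e. the noise variable $\eta$ (with mean 0 and variance $\sigma^2_X$) depends on X...

1.4 Estimating the Regression Function

Use conditional sample means: $\hat{r}(x) = \frac{1}{\#\{i : x_i = x\}} \sum_{i : x_i = x} y_i$

This works only when X is discrete.

1.4.1 The Bias-Variance Tradeoff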
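
As a hedged reconstruction of the decomposition this section relies on: the same argument as in section 1.2, applied at a fixed x, splits the error of an estimate into noise and approximation error. The symbols $\hat{r}$ (a fixed estimate of the regression function) and $\sigma^2_x = V[Y | X = x]$ are notation assumed here.

```latex
\begin{align}
MSE(\hat{r}(x)) &= E[(Y - \hat{r}(x))^2 \mid X = x] \\
  &= E[(Y - r(x) + r(x) - \hat{r}(x))^2 \mid X = x] \\
  &= E[(Y - r(x))^2 \mid X = x]
     + 2 (r(x) - \hat{r}(x)) \, E[Y - r(x) \mid X = x]
     + (r(x) - \hat{r}(x))^2 \\
  &= \sigma^2_x + (r(x) - \hat{r}(x))^2
\end{align}
```

The cross term vanishes because $E[Y - r(x) | X = x] = 0$. The first remaining term is noise that no estimate can remove; the second is the squared error made by using $\hat{r}$ instead of r at x.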
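
In fact, we have analyzed $MSE(\hat{R}_n(x) | \hat{R}_n = \hat{r})$, where $\hat{R}_n$ is a random regression function estimated using n random pairs $(x_i, y_i)$. Averaging over the data as well gives

$MSE(\hat{R}_n(x)) = \sigma^2_x + (r(x) - E[\hat{R}_n(x)])^2 + V[\hat{R}_n(x)]$

that is, the noise variance, plus the squared approximation bias, plus the estimation variance.

Even if our method is unbiased ($r(x) = E[\hat{R}_n(x)]$, no approximation bias), we can still have a lot of variance in our estimates ($V[\hat{R}_n(x)]$ large).

A method is consistent (for r) when both the approximation bias and the estimation variance go to 0 as we get more and more data.

1.4.2 The Bias-Variance Trade-Off in Action

1.4.3 Ordinary Least Squares Linear Regression as Smoothing

Assume X is one-dimensional and both X and Y are centered. Choose to approximate r(x) by α + βx. We need to find the values a and b of α and β that minimize the MSE.

$MSE(\alpha, \beta) = E[(Y - \alpha - \beta X)^2]$
$MSE(\alpha, \beta) = E[E[(Y - \alpha - \beta X)^2 | X]]$
$MSE(\alpha, \beta) = E[V[Y | X] + (E[Y - \alpha - \beta X | X])^2]$
$MSE(\alpha, \beta) = E[V[Y | X]] + E[(E[Y - \alpha - \beta X | X])^2]$

Setting the derivatives with respect to α and β to zero gives $a = E[Y] - b E[X] = 0$ and $b = Cov[X, Y] / V[X] = E[XY] / E[X^2]$.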
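
Now, estimate a and b from the data (replacing population values by sample values, or minimizing the residual sum of squares):

$\hat{a} = 0$ and $\hat{b} = \frac{\sum_i y_i x_i}{\sum_i x_i^2}$

Least-squares linear regression is thus a smoothing of the data:

$\hat{r}(x) = \hat{b} x = \sum_i y_i \frac{x_i}{n s_X^2} x$, where $s_X^2 = \frac{1}{n} \sum_i x_i^2$ is the sample variance of X.

Indeed, the prediction is a weighted average of the observed values $y_i$, where the weights are proportional to how far $x_i$ is from the center of the data, relative to the variance, and proportional to the magnitude of x. Note that the weight of a data point depends on how far it is from the center of all the data, not how far it is from the point at which we are trying to predict.

1.5 Linear Smoothers

A linear smoother predicts with a weighted average of the observed responses: $\hat{r}(x) = \sum_i y_i \hat{w}(x_i, x)$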
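
To see least squares written as a weighted average of the $y_i$, here is a minimal sketch. The simulated data, the seed, the prediction point `x0` and the helper names `ols_slope` and `ols_weights` are all assumptions made for illustration.

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility
n = 200
# Hypothetical data: y = 1.5 x + Gaussian noise, then both variables centered.
xs_raw = [random.uniform(-2.0, 2.0) for _ in range(n)]
ys_raw = [1.5 * x + random.gauss(0.0, 0.5) for x in xs_raw]
x_mean = sum(xs_raw) / n
y_mean = sum(ys_raw) / n
xs = [x - x_mean for x in xs_raw]
ys = [y - y_mean for y in ys_raw]


def ols_slope(xs, ys):
    """Least-squares slope for centered data (the intercept is then 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ols_weights(xs, x0):
    """Weights x_i * x0 / (n * s_X^2), using n * s_X^2 = sum of x_i^2."""
    sxx = sum(x * x for x in xs)
    return [x * x0 / sxx for x in xs]


x0 = 0.7  # arbitrary prediction point
pred_regression = ols_slope(xs, ys) * x0
pred_smoother = sum(w * y for w, y in zip(ols_weights(xs, x0), ys))
print(pred_regression, pred_smoother)
```

Both numbers should coincide up to rounding. The weights returned by `ols_weights` depend on `x0` only through a common scale factor, which matches the remark that a point's weight reflects its distance from the center of the data and not from the prediction point.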
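
Sample mean: $\hat{w}(x_i, x) = \frac{1}{n}$

Ordinary linear regression: $\hat{w}(x_i, x) = \frac{x_i}{n s_X^2} x$ (with $s_X^2$ the sample variance of the centered X)

1.5.1 k-Nearest-Neighbor Regression

$\hat{w}(x_i, x) = \frac{1}{k}$ if $x_i$ is one of the k nearest neighbors of x, and 0 otherwise.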
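
A minimal sketch of k-nearest-neighbor regression as a linear smoother, assuming a one-dimensional X and absolute distance. The function names and the toy data are hypothetical, and ties in distance are broken by index order, which is an arbitrary choice.

```python
def knn_weights(xs, x0, k):
    """Weight 1/k for the k points of xs closest to x0, 0 for the others."""
    # sorted() is stable, so equidistant points keep their index order.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(xs))]


def knn_predict(xs, ys, x0, k):
    """Prediction at x0: weighted average of the observed ys."""
    return sum(w * y for w, y in zip(knn_weights(xs, x0, k), ys))


# Toy data: y = x^2 on a small grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

for k in (1, 3, 5):
    print(k, knn_predict(xs, ys, 2.1, k))
```

With k = 1 the prediction is the response of the single closest point, and with k = n every weight is 1/n, so the prediction falls back to the sample mean of the $y_i$.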
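
1.5.2 Kernel Smoothers

For instance use a kernel function $K(x_i, x)$, such as the Gaussian kernel $K(x_i, x) = \exp(-(x_i - x)^2 / (2 h^2))$, where h is the bandwidth, so that $\hat{w}(x_i, x) = \frac{K(x_i, x)}{\sum_j K(x_j, x)}$

1.6 Exercises

What minimizes the mean absolute error?

$MAE(a) = E[|Y - a|]$

Assuming Y has a density p, $MAE(a) = \int_{-\infty}^{a} (a - y) p(y) dy + \int_{a}^{+\infty} (y - a) p(y) dy$

Using the Leibniz rule for differentiation under the integral:

$\frac{dMAE}{da}(a) = \int_{-\infty}^{a} p(y) dy - \int_{a}^{+\infty} p(y) dy = 2 P(Y \le a) - 1$

This derivative is zero when $P(Y \le a) = 1/2$. The median minimizes the MAE.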
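
A small numerical illustration of this result, as a sketch under assumptions: the exponential sample, the seed, the grid and the helper name `mae` are chosen only for illustration.

```python
import random
import statistics


def mae(ys, a):
    """Empirical mean absolute error of guessing every y by the constant a."""
    return sum(abs(y - a) for y in ys) / len(ys)


random.seed(2)  # arbitrary seed, only for reproducibility
# Hypothetical skewed sample (exponential with rate 1), so that the mean
# and the median differ noticeably. Odd size gives a unique sample median.
ys = [random.expovariate(1.0) for _ in range(5001)]

med = statistics.median(ys)
mean = statistics.fmean(ys)

# Crude grid search for the minimizer of the empirical MAE on [0, 3].
grid = [i / 100 for i in range(301)]
best = min(grid, key=lambda a: mae(ys, a))

print(f"median={med:.3f}  mean={mean:.3f}  grid minimizer={best:.2f}")
print(f"MAE(median)={mae(ys, med):.4f}  MAE(mean)={mae(ys, mean):.4f}")
```

The grid minimizer should land within one grid step of the sample median, and the MAE at the median should be no larger than the MAE at the mean.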



