- Prepare journal club on "Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis" by Engelhardt & Stephens (PLoS Genetics 2010).
- From "Inference of population structure using multilocus genotype data" by Pritchard, Stephens & Donnelly (Genetics 2000):
- data: genotypes at L loci for N individuals (matrix X: N x L) from several populations (K, unknown)
- aim: jointly assign individuals to populations while estimating population allele frequencies P, allow admixture, use MCMC
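To make the data layout concrete, here is a minimal simulation sketch of an admixture-style generative model. It is not the MCMC inference from the paper; it only generates a matrix X (N x L) given allele frequencies P and per-individual admixture proportions Q. The function name, the restriction to biallelic loci coded as 0/1/2 allele counts, the uniform prior on P, and the Dirichlet parameter `alpha` are all assumptions made for illustration.

```python
import numpy as np

def simulate_admixed_genotypes(n_individuals, n_loci, n_populations,
                               alpha=1.0, seed=0):
    """Simulate biallelic genotypes under a simple admixture model (illustrative)."""
    rng = np.random.default_rng(seed)
    # P[k, l]: frequency of the reference allele at locus l in population k
    # (assumed uniform prior, kept away from 0 and 1)
    P = rng.uniform(0.05, 0.95, size=(n_populations, n_loci))
    # Q[i, k]: proportion of individual i's genome drawn from population k
    Q = rng.dirichlet(alpha * np.ones(n_populations), size=n_individuals)
    # Each allele copy comes from population k with probability Q[i, k],
    # so marginally it is the reference allele with probability (Q @ P)[i, l]
    allele_freq = Q @ P
    # Diploid genotype = number of reference alleles among two independent copies
    X = rng.binomial(2, allele_freq)
    return X, Q, P

X, Q, P = simulate_admixed_genotypes(n_individuals=50, n_loci=200, n_populations=3)
```

Under these assumptions the expected genotype matrix is 2QP, a product of two low-rank matrices, which is the kind of structure a factor-analysis view of population structure works with.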
- From "Applied Multivariate Statistical Analysis" (Johnson & Wichern):
- Let $X$ be a vector of $p$ observed variables with mean vector $\mu$ and covariance matrix $\Sigma$.
- A principal component analysis is concerned with explaining the variance-covariance structure of $X$ through a few linear (and uncorrelated) combinations of these variables. Although $p$ components are required to reproduce the total variability, often much of this variability can be accounted for by a small number $k$ of the principal components, which depend solely on $\Sigma$.
- A factor analysis attempts to describe the covariance relationships among the X's in terms of a few underlying, but unobservable, random quantities called factors. It postulates that $X$ is linearly dependent upon $k$ random variables $F_1, F_2, \ldots, F_k$ called factors, and $p$ additional sources of variation $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p$ called errors. A $p \times k$ matrix $L$ contains the loadings $l_{ij}$ of the $i$th variable on the $j$th factor: $X - \mu = L F + \varepsilon$
- The difference between the factor analysis model above and the multivariate linear regression model, $Y = Z\beta + \varepsilon$, is that in the latter both $Y$ and $Z$ are observed, whereas in the former $F$ is not.
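The contrast between the two methods can be sketched numerically. The example below simulates data from a factor model and then runs a PCA on it. It assumes the standard orthogonal factor model (factors with identity covariance, independent errors with diagonal covariance, factors independent of errors), under which the covariance of X is $L L^\top + \Psi$. The loadings, error variances, sample size, and function names are made up for illustration.

```python
import numpy as np

def simulate_factor_model(n, loadings, error_var, mu, seed=0):
    """Draw n observations from X = mu + L F + eps (orthogonal factor model)."""
    rng = np.random.default_rng(seed)
    p, k = loadings.shape
    F = rng.standard_normal((n, k))                          # unobserved factors, Cov(F) = I
    eps = rng.standard_normal((n, p)) * np.sqrt(error_var)   # independent errors, Cov = diag(error_var)
    return mu + F @ loadings.T + eps

def pca_explained_variance(X):
    """Proportion of total variance carried by each principal component."""
    S = np.cov(X, rowvar=False)              # sample covariance matrix (p x p)
    eigvals = np.linalg.eigvalsh(S)[::-1]    # eigenvalues, largest first
    return eigvals / eigvals.sum()

# Hypothetical loadings: p = 5 observed variables, k = 2 factors
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.8],
                     [0.0, 0.9]])
error_var = np.array([0.2, 0.3, 0.4, 0.3, 0.2])
mu = np.zeros(5)

X_sim = simulate_factor_model(20000, loadings, error_var, mu)

# Covariance implied by the factor model versus the sample covariance
implied_cov = loadings @ loadings.T + np.diag(error_var)
sample_cov = np.cov(X_sim, rowvar=False)
max_abs_diff = np.abs(sample_cov - implied_cov).max()

# PCA depends only on the covariance matrix of the observed variables
ratios = pca_explained_variance(X_sim)
print("largest covariance discrepancy:", max_abs_diff)
print("variance explained per component:", ratios)
```

Note that the simulation uses `F` to generate the data, but neither the sample covariance nor the PCA ever sees it: only X is observed, which is the point of the comparison with regression.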