Revision as of 20:51, 16 October 2013 (→Notes from Meeting) ← Previous diff
Current revision (20:25, 20 October 2013) (→Notes from Meeting)
(18 intermediate revisions not shown.)
Line 6:
| colspan="2"|
- ==Notes from Meeting==
- Consider a single lineage for now.
-
- $X_j$ = frequency of the "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)
-
- *'''Data:'''
- $(n_j^0, n_j^1)$ = number of "0", "1" alleles observed at SNP j ($n_j = n_j^0 + n_j^1$)
-
-
- *'''Normal approximation'''
- $n_j^1 \sim Bin(n_j, X_j) \approx N(n_j X_j, n_j X_j(1-X_j))$ (normal approximation to the binomial)
-
- $\frac{n_j^1}{n_j} \approx N(X_j, \frac{X_j(1-X_j)}{n_j})$
- The variance of this distribution reflects the error due to binomial sampling.
-
- To simplify, we plug in $\hat{X}_j = \frac{n_j^1}{n_j}$ for $X_j$ in the variance:
-
- $\implies \frac{n_j^1}{n_j} | X_j \approx N(X_j, \frac{\hat{X}_j(1-\hat{X}_j)}{n_j})$
-
- *'''Notation'''
-
- $f_{i,k,j} =$ frequency of the reference allele in group i, replicate k, and SNP j.
-
- $\vec{f_{i,k}} =$ vector of frequencies $(f_{i,k,1}, \ldots, f_{i,k,p})$
-
- Without loss of generality, we assume that the putative selected site is site $j = 1$.
-
- * '''Model'''
- We assume a prior on our vector of frequencies based on our panel of SNPs $(M)$, of dimension $2m \times p$:
-
- $\vec{f_{i,k}} \sim MVN(\mu, \Sigma)$
-
- $\mu = (1-\theta)f^{panel} + \frac{\theta}{2} \mathbf{1}$
-
- $\Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I$
-
- where $S_{i,j} = \Sigma_{i,j}^{panel}$ if $i = j$, and $S_{i,j} = e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel}$ if $i \neq j$
-
- $\theta = \frac{(\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}{2m + (\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}$
-
-
- * '''At the selected site'''
- $\log \frac{f_{i,k,1}}{1-f_{i,k,1}} = \mu + \beta g_i + \epsilon_{i,k}$
-
- * '''Conditional distribution'''
- $(f_{i,k,2}, \ldots, f_{i,k,p}) | f_{i,k,1}, M \sim MVN(\bar{\mu}, \bar{\Sigma})$
- The conditional distribution follows from a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].
-
- Let $X_2 = (f_{i,k,2}, \ldots, f_{i,k,p})$ and $X_1 = f_{i,k,1}$.
-
- $X_2 | X_1 = x_1, M \sim N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$
-
- Thus $\bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1)$ and $\bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.
-
- *'''Likelihood for the frequency at the test SNP t given all data'''
- $L(f_{i,k,t}^{true}) = P(\prod_{j \neq t} f_{i,k,j} | y_j, M)$
-
|}
__NOTOC__
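The prior moments $(\mu, \Sigma)$ and the conditional distribution given the selected site, as written in the notes, can be checked numerically with a short sketch. This is only an illustration under stated assumptions, not the code used for the analysis: the function names (`theta_hat`, `prior_moments`, `condition_on_first`) are hypothetical, the panel is assumed to be a $2m \times p$ matrix of 0/1 haplotypes, $\Sigma^{panel}$ is assumed to be the (biased) sample covariance of its columns, and `rho` is assumed to be a $p \times p$ matrix holding $\rho_{i,j}$.

```python
import numpy as np


def theta_hat(m):
    """theta for a panel of m diploid individuals (2m haplotypes)."""
    # Inverse of the partial harmonic sum 1 + 1/2 + ... + 1/(2m - 1).
    h = 1.0 / np.sum(1.0 / np.arange(1, 2 * m))
    return h / (2 * m + h)


def prior_moments(panel, rho):
    """Prior mean and covariance of the frequency vector.

    panel : (2m, p) array of 0/1 haplotypes (assumed layout).
    rho   : (p, p) array of rho_{i,j} values (assumed input).
    """
    n_hap, p = panel.shape
    m = n_hap // 2
    theta = theta_hat(m)

    f_panel = panel.mean(axis=0)
    # Assumption: Sigma^panel is the biased sample covariance of the columns.
    sigma_panel = np.cov(panel, rowvar=False, bias=True)

    # Off-diagonal entries are shrunk by exp(-rho_ij / 2m); the diagonal is kept.
    S = np.exp(-rho / (2 * m)) * sigma_panel
    np.fill_diagonal(S, np.diag(sigma_panel))

    mu = (1 - theta) * f_panel + theta / 2
    Sigma = (1 - theta) ** 2 * S + (theta / 2) * (1 - theta / 2) * np.eye(p)
    return mu, Sigma


def condition_on_first(mu, Sigma, x1):
    """Mean and covariance of sites 2..p given that site 1 equals x1."""
    mu1, mu2 = mu[0], mu[1:]
    s11 = Sigma[0, 0]          # Sigma_11 is a scalar here
    s21 = Sigma[1:, 0]         # Sigma_21 (and Sigma_12 by symmetry)
    s22 = Sigma[1:, 1:]
    mu_bar = mu2 + s21 * (x1 - mu1) / s11
    Sigma_bar = s22 - np.outer(s21, s21) / s11
    return mu_bar, Sigma_bar


if __name__ == "__main__":
    # Toy panel: m = 2 individuals (4 haplotypes), p = 3 SNPs, made-up values.
    panel = np.array([[0, 1, 1],
                      [1, 1, 0],
                      [0, 0, 1],
                      [1, 0, 0]])
    rho = np.zeros((3, 3))     # no recombination, for illustration only
    mu, Sigma = prior_moments(panel, rho)
    mu_bar, Sigma_bar = condition_on_first(mu, Sigma, x1=0.7)
    print(mu_bar)
    print(Sigma_bar)
```

Because the selected site is a single SNP, $\Sigma_{11}$ is a scalar and no matrix inverse is needed; conditioning on several sites at once would require a linear solve instead of the division.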