

| colspan="2"|
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
  ==Notes from Meeting==
 
  Consider a single lineage for now.
 
 
 
  <math>X_j</math> = true frequency of the "1" allele at SNP <math>j</math> in the pool
 
 
 
  *'''Data:'''
 
  <math> (n_j^0, n_j^1) </math> = observed counts of the "0" and "1" alleles at SNP <math>j</math>, with <math> n_j = n_j^0 + n_j^1 </math>
 
 
 
 
 
  *'''Normal approximation'''
 
  <math> n_j^1</math> ~ <math>Bin(n_j, X_j) \approx N(n_j X_j, n_j X_j(1-X_j))</math>, by the normal approximation to the binomial.
 
 
 
  <math> \frac{n_j^1}{n_j} \approx N\left(X_j, \frac{X_j(1-X_j)}{n_j}\right) </math>
 
  The variance of this distribution arises solely from binomial sampling error.
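As a quick sanity check of the variance claim above, the sketch below simulates <math>n_j^1 \sim Bin(n_j, X_j)</math> with numpy and compares the empirical variance of <math>n_j^1/n_j</math> against <math>X_j(1-X_j)/n_j</math>. The helper name <code>simulated_variance</code> and the values <math>n_j = 100</math>, <math>X_j = 0.3</math> are illustrative assumptions, not part of the notes.

```python
import numpy as np

def simulated_variance(n: int, x: float, reps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo variance of n^1/n when n^1 ~ Bin(n, x) (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Draw many binomial allele counts and convert them to sample frequencies.
    freqs = rng.binomial(n, x, size=reps) / n
    return float(np.var(freqs))

n, x = 100, 0.3
# Empirical variance next to the binomial formula X(1 - X)/n.
print(simulated_variance(n, x), x * (1 - x) / n)
```

With a large number of replicates the two numbers should agree closely.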
 
 
 
  To simplify, we plug in <math>\hat{X_j} = \frac{n_j^1}{n_j}</math> for <math> X_j </math> in the variance:
 
 
 
  <math> \implies \frac{n_j^1}{n_j} \mid X_j \approx N\left(X_j, \frac{\hat{X_j}(1-\hat{X_j})}{n_j}\right) </math>
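A minimal sketch of the plug-in step: compute <math>\hat{X_j}</math> and the approximate variance <math>\hat{X_j}(1-\hat{X_j})/n_j</math> from the allele counts at one SNP. The function name <code>plugin_frequency</code> and the counts (30, 70) are hypothetical; the 1.96 multiplier is simply the usual normal-approximation 95% interval and is not something the notes specify.

```python
import math

def plugin_frequency(n0: int, n1: int):
    """Return (X_hat, approximate variance) for the "1" allele at one SNP.

    X_hat = n1 / n and Var ~ X_hat * (1 - X_hat) / n (plug-in normal approximation).
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("no alleles observed at this SNP")
    x_hat = n1 / n
    return x_hat, x_hat * (1 - x_hat) / n

x_hat, var = plugin_frequency(30, 70)
se = math.sqrt(var)
# Approximate 95% interval under the normal approximation.
print(f"X_hat = {x_hat:.3f}, SE = {se:.4f}, "
      f"95% CI ~ ({x_hat - 1.96 * se:.3f}, {x_hat + 1.96 * se:.3f})")
```

Note that the plug-in variance collapses to zero when <math>n_j^1 = 0</math> or <math>n_j^1 = n_j</math>, so monomorphic counts get no sampling variance under this simplification.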
 
 
 
  *'''Notation'''
 
 
 
  <math>f_{i,k,j}</math> = frequency of the reference allele in group <math>i</math>, replicate <math>k</math>, at SNP <math>j</math>.
 
 
 
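  <math> \vec{f_{i,k}} = (f_{i,k,1}, \ldots, f_{i,k,p}) </math>, the vector of frequencies across all <math>p</math> SNPs for group <math>i</math>, replicate <math>k</math>.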
 
 
 
  Without loss of generality, we assume that the putative selected site is SNP <math> j = 1 </math>.
 
 
 
  * '''Model'''
 
  We place a prior on the vector of frequencies based on our reference panel <math>M</math> of dimension <math>2m \times p</math> (2m haplotypes at p SNPs):
 
 
 
  <math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \Sigma) </math>
 
 
 
  <math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} \mathbf{1} </math>
 
 
 
  <math> \Sigma = (1-\theta)^2 S + \frac{\theta}{2}\left(1 - \frac{\theta}{2}\right)I </math>
 
 
 
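  where <math> S_{i,j} = \Sigma^{panel}_{i,j} </math> if <math>i = j</math> and <math> S_{i,j} = e^{-\frac{\rho_{i,j}}{2m}} \Sigma^{panel}_{i,j} </math> if <math>i \neq j</math>. Here <math>\Sigma^{panel}</math> is the sample covariance matrix of the panel haplotypes and <math>\rho_{i,j}</math> is the population-scaled recombination rate between SNPs <math>i</math> and <math>j</math>.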
 
 
 
  <math> \theta = \frac{\left(\sum_{i=1}^{2m-1} \frac{1}{i}\right)^{-1}}{2m + \left(\sum_{i=1}^{2m-1} \frac{1}{i}\right)^{-1}} </math>
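A numpy sketch of how the prior mean <math>\mu</math> and covariance <math>\Sigma</math> could be assembled from a panel, following the formulas above. Assumptions: the panel is a <math>2m \times p</math> array of 0/1 haplotypes; the matrix of <math>\rho_{i,j}</math> values is supplied by the caller (how it is computed is not covered here); and <math>\Sigma^{panel}</math> uses numpy's default <math>n-1</math> denominator, which the notes do not specify. The function names <code>li_stephens_theta</code> and <code>panel_prior</code> are hypothetical.

```python
import numpy as np

def li_stephens_theta(m: int) -> float:
    """theta = a^{-1} / (2m + a^{-1}) with a = sum_{i=1}^{2m-1} 1/i."""
    inv_a = 1.0 / sum(1.0 / i for i in range(1, 2 * m))
    return inv_a / (2 * m + inv_a)

def panel_prior(panel, rho):
    """Prior mean mu and covariance Sigma for the p-vector of frequencies.

    panel: 2m x p array of 0/1 haplotypes (the panel M).
    rho:   p x p array of rho_{i,j} values (ASSUMED given by the caller).
    """
    panel = np.asarray(panel, dtype=float)
    two_m, p = panel.shape
    theta = li_stephens_theta(two_m // 2)
    f_panel = panel.mean(axis=0)
    # Sample covariance of the panel columns (SNPs); np.cov uses n - 1 by default.
    sigma_panel = np.atleast_2d(np.cov(panel, rowvar=False))
    # Off-diagonal entries are shrunk by exp(-rho_{i,j} / 2m); diagonal kept as-is.
    S = np.exp(-np.asarray(rho, dtype=float) / two_m) * sigma_panel
    np.fill_diagonal(S, np.diag(sigma_panel))
    mu = (1 - theta) * f_panel + theta / 2
    Sigma = (1 - theta) ** 2 * S + (theta / 2) * (1 - theta / 2) * np.eye(p)
    return mu, Sigma

# Toy panel: 4 haplotypes (m = 2) at 2 SNPs, with a made-up rho matrix.
panel = [[0, 1], [1, 1], [0, 0], [1, 0]]
rho = [[0.0, 1.0], [1.0, 0.0]]
mu, Sigma = panel_prior(panel, rho)
print(mu)
print(Sigma)
```

The <math>\frac{\theta}{2}(1-\frac{\theta}{2})I</math> term keeps <math>\Sigma</math> positive definite even when the panel covariance is singular (e.g. a SNP that is monomorphic in the panel).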
 
 
 
 
 
  * '''At the selected site'''
 
  <math> \log \frac{f_{i,k,1}}{1-f_{i,k,1}} = \mu + \beta g_i + \epsilon_{i,k} </math>
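To make the selected-site model concrete, here is a sketch that fits <math>\mu</math> and <math>\beta</math> by ordinary least squares on the logit scale. This assumes the <math>\epsilon_{i,k}</math> are i.i.d. normal (so least squares is the maximum-likelihood fit), that <math>g_i</math> is a numeric group code such as 0/1, and that the frequencies are observed without sampling error; none of these choices come from the notes. Here <math>\mu</math> is a scalar intercept, not the prior mean vector. The function name <code>fit_selected_site</code> is hypothetical.

```python
import numpy as np

def logit(f):
    return np.log(f / (1 - f))

def expit(x):
    return 1 / (1 + np.exp(-x))

def fit_selected_site(f_sel, g):
    """Least-squares fit of logit(f_{i,k,1}) = mu + beta * g_i + eps_{i,k}.

    f_sel: selected-site frequencies, one per (group, replicate).
    g:     the matching g_i values (ASSUMED numeric, e.g. 0/1).
    Returns (mu_hat, beta_hat).
    """
    f_sel = np.asarray(f_sel, dtype=float)
    g = np.asarray(g, dtype=float)
    # Design matrix: intercept column plus the group covariate.
    X = np.column_stack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(X, logit(f_sel), rcond=None)
    return coef[0], coef[1]

# Hypothetical data: two groups (g = 0, 1), three replicates each.
rng = np.random.default_rng(1)
g = np.repeat([0.0, 1.0], 3)
f = expit(0.2 + 1.5 * g + rng.normal(0, 0.1, size=g.size))
print(fit_selected_site(f, g))
```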
 
 
 
  * '''Conditional distribution'''
 
  <math> (f_{i,k,2}, \ldots, f_{i,k,p}) \mid f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\Sigma}) </math>
 
  The conditional distribution follows from the standard result for conditioning a multivariate normal, derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].
 
 
 
  Let <math> X_2 = (f_{i,k,2}, \ldots, f_{i,k,p}) </math> and <math> X_1 = f_{i,k,1} </math>.
 
 
 
  <math> X_2 \mid X_1, M </math> ~ <math> N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) </math>
 
 
 
  Thus <math> \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1) </math> and <math> \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} </math>.
 
 
 
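  Equivalently, we can derive the distribution of <math> X_1 \mid X_2, M </math>, i.e. <math> f_{i,k,1} \mid (f_{i,k,2}, \ldots, f_{i,k,p}), M </math>, by swapping the roles of the two blocks: <math> X_1 \mid X_2, M </math> ~ <math> N(\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \vec{\mu_2}), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}) </math>.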
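The conditioning formula can be sketched in numpy as below; <code>conditional_mvn</code> is a hypothetical helper, and the 3-SNP <math>\mu</math>, <math>\Sigma</math> in the example are made up. Passing the index of <math>f_{i,k,1}</math> as the conditioning block gives <math>\bar{\mu}, \bar{\Sigma}</math>; passing the indices of the other SNPs gives the reverse conditional <math>X_1 \mid X_2</math>. It uses linear solves rather than forming <math>\Sigma_{11}^{-1}</math> explicitly, which is the numerically safer choice.

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx_given, x_given):
    """Mean and covariance of X_rest | X_given = x_given for X ~ MVN(mu, Sigma).

    Block "1" is the conditioning set (idx_given), block "2" is everything else:
    mean = mu_2 + S21 S11^{-1} (x_1 - mu_1),  cov = S22 - S21 S11^{-1} S12.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    idx_given = np.atleast_1d(idx_given)
    idx_rest = np.setdiff1d(np.arange(mu.size), idx_given)
    S11 = Sigma[np.ix_(idx_given, idx_given)]
    S12 = Sigma[np.ix_(idx_given, idx_rest)]
    S21 = Sigma[np.ix_(idx_rest, idx_given)]
    S22 = Sigma[np.ix_(idx_rest, idx_rest)]
    diff = np.atleast_1d(np.asarray(x_given, dtype=float)) - mu[idx_given]
    mean = mu[idx_rest] + S21 @ np.linalg.solve(S11, diff)
    cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return mean, cov

mu = [0.5, 0.4, 0.6]
Sigma = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]]
# (f_2, f_3) | f_1 = 0.55, then f_1 | (f_2, f_3) = (0.42, 0.58)
print(conditional_mvn(mu, Sigma, 0, 0.55))
print(conditional_mvn(mu, Sigma, [1, 2], [0.42, 0.58]))
```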
 
 
 
  *'''Likelihood for the frequency at the test SNP t given all data'''
 
 
 
  Let <math>f^{obs} = (f_{i,k,j})_{j \neq t} </math> denote the observed frequencies at all SNPs other than the test SNP <math>t</math>.
 
 
 
  <math> L(f_{i,k,t}^{true}) = P(f^{obs} \mid f_{i,k,t}^{true}, M) = \frac{P(f_{i,k,t}^{true} \mid M, f^{obs}) \, P(f^{obs} \mid M)}{P(f_{i,k,t}^{true} \mid M)}</math> (by Bayes' rule)
 
 
 
  where <math> f_{i,k,t}^{true} \mid M </math> ~ <math> N(\mu_1, \sigma^2 \Sigma_{11}) </math>, taking the test SNP as block 1. The parameter <math> \sigma^2 </math> allows for overdispersion,
 
 
 
  and <math> f^{obs} \mid M </math> ~ <math> N_{p-1} (\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I) </math>, where <math> \epsilon^2 </math> allows for measurement error.
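The Bayes decomposition of <math>L(f_{i,k,t}^{true})</math> can be checked numerically. This sketch assumes a joint Gaussian for <math>(f_{i,k,t}^{true}, f^{obs})</math> with the test SNP first, covariance <math>\sigma^2 \Sigma</math>, and <math>\epsilon^2 I</math> added only to the observed block, so that its marginals match the two distributions above. Under that assumption the three-term Bayes form and a direct <math>P(f^{obs} \mid f_{i,k,t}^{true}, M)</math> should coincide. All function names and numeric values are hypothetical.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal (numpy only)."""
    diff = np.atleast_1d(np.asarray(x, dtype=float)) - np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + quad)

def joint_cov(Sigma, sigma2, eps2):
    """ASSUMED joint covariance of (f_t^true, f^obs), test SNP first:
    sigma^2 * Sigma, with eps^2 added only on the observed block."""
    C = sigma2 * np.asarray(Sigma, dtype=float)
    C[1:, 1:] += eps2 * np.eye(C.shape[0] - 1)
    return C

def log_lik_bayes(f_t, f_obs, mu, Sigma, sigma2, eps2):
    """log L(f_t) = log P(f_t | M, f^obs) + log P(f^obs | M) - log P(f_t | M)."""
    mu, f_obs = np.asarray(mu, dtype=float), np.asarray(f_obs, dtype=float)
    C = joint_cov(Sigma, sigma2, eps2)
    C11, C12, C22 = C[0, 0], C[0, 1:], C[1:, 1:]
    w = np.linalg.solve(C22, C12)  # C22^{-1} C21 (C is symmetric)
    cond_mean = mu[0] + w @ (f_obs - mu[1:])
    cond_var = C11 - C12 @ w
    return (mvn_logpdf(f_t, cond_mean, cond_var)
            + mvn_logpdf(f_obs, mu[1:], C22)
            - mvn_logpdf(f_t, mu[0], C11))

def log_lik_direct(f_t, f_obs, mu, Sigma, sigma2, eps2):
    """log P(f^obs | f_t, M), conditioning the observed block on the test SNP."""
    mu, f_obs = np.asarray(mu, dtype=float), np.asarray(f_obs, dtype=float)
    C = joint_cov(Sigma, sigma2, eps2)
    C11, C21, C22 = C[0, 0], C[1:, 0], C[1:, 1:]
    cond_mean = mu[1:] + C21 / C11 * (f_t - mu[0])
    cond_cov = C22 - np.outer(C21, C21) / C11
    return mvn_logpdf(f_obs, cond_mean, cond_cov)

mu = np.array([0.5, 0.4, 0.6])
Sigma = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]])
# Log-likelihood profile over a few candidate true frequencies at the test SNP.
for f_t in (0.45, 0.55, 0.65):
    print(f_t, log_lik_bayes(f_t, [0.42, 0.58], mu, Sigma, 0.02, 0.001))
```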
 
 
 
  I don't understand how <math> f^{obs} \mid f_{i,k,t}^{true}, M </math> is obtained. Shouldn't it come from (2.12) rather than (2.13)? Ask Matthew.
 
 
 
 
 
 
 
 
 
 
 
 
 
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
|}

__NOTOC__