User:Hussein Alasadi/Notebook/stephens/2013/10/03
From OpenWetWare
(→Notes from Meeting) 

(22 intermediate revisions not shown.)  
Line 36:
We assume a prior on our vector of frequencies based on our panel of SNPs <math> (M) </math> of dimension <math> 2m \times p </math>
<math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \Sigma) </math>
<math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 </math>
<math> \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I </math>
where <math> S_{i,j} = \Sigma_{i,j}^{panel} </math> if i = j or <math> e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} </math> if i not equal to j
Line 51:
* '''conditional distribution'''
<math> (f_{i,k,2}, \ldots , f_{i,k,p}) \mid f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\Sigma}) </math>
The conditional distribution is easily obtained when we use a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].

let <math> X_2 = (f_{i,k,2}, \ldots , f_{i,k,p}) </math> and <math> X_1 = f_{i,k,1} </math>

<math> X_2 \mid X_1, M </math> ~ <math> N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) </math>

Thus <math> \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} </math>

And equivalently we could derive the distribution <math> X_1 \mid X_2, M </math>, that is, <math> f_{i,k,1} \mid (f_{i,k,2}, \ldots , f_{i,k,p}), M </math>

* '''Likelihood for the frequency at the test SNP t given all data'''

let <math> f^{obs} = (f_{i,k,j})_{j \neq t} </math>, the frequencies at all SNPs other than t

<math> L(f_{i,k,t}^{true}) = P(f^{obs} \mid f_{i,k,t}^{true}, M) = \frac{P( f_{i,k,t}^{true} \mid M, f^{obs}) P(f^{obs} \mid M)}{P(f_{i,k,t}^{true} \mid M)} </math>

Confused here: can we just use the expression derived above for <math> P( f_{i,k,t}^{true} \mid M, f^{obs}) </math>? Also, isn't <math> f_{i,k,t}^{true} \mid M </math> ~ <math> N(\mu_1, \Sigma_{11}) </math> and <math> f^{obs} \mid M </math> ~ <math> N(\mu_2, \Sigma_{22}) </math>? But how do we then incorporate <math> \beta </math> into the likelihood calculation?

But maybe we want to incorporate dispersion and measurement error parameters.

Then:

<math> f_{i,k,t}^{true} \mid M </math> ~ <math> N(\mu, \sigma^2 \Sigma) </math>. The parameter <math> \sigma^2 </math> allows for overdispersion.

<math> f^{obs} \mid M </math> ~ <math> N_{p-1} (\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I) </math>, where <math> \epsilon^2 </math> allows for measurement error.

I don't understand <math> f^{obs} \mid f_{i,k,t}^{true}, M </math>. Shouldn't it come from (2.12) and not (2.13)? Ask Matthew.

Revision as of 22:04, 16 October 2013
analyzing pooled sequenced data with selection
Notes from Meeting

Consider a single lineage for now.

<math> X_j </math> = frequency of the "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)

The numbers of "0" and "1" alleles at SNP j are modeled with a normal approximation to the binomial. The variance of this distribution results from error due to binomial sampling. To simplify, we just plug in an estimate for <math> X_j </math>.
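
A minimal Python sketch of the plug-in step, assuming that the value plugged in for <math> X_j </math> is the observed "1"-allele frequency. The names <code>n0</code>, <code>n1</code> and <code>plug_in_normal_approx</code> are hypothetical and do not come from the meeting notes.

```python
def plug_in_normal_approx(n0, n1):
    """Mean and variance of the normal approximation to the sampled
    "1"-allele frequency at one SNP.

    n0, n1: counts of "0" and "1" alleles (hypothetical argument names).
    Assumption: the observed frequency is plugged in for the true X_j.
    """
    n = n0 + n1
    if n <= 0:
        raise ValueError("need at least one observed allele")
    x_hat = n1 / n                        # observed "1"-allele frequency
    variance = x_hat * (1.0 - x_hat) / n  # binomial sampling variance of a frequency
    return x_hat, variance
```

With 30 "0" alleles and 70 "1" alleles, the mean is 0.7 and the variance is 0.7 * 0.3 / 100.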

<math> f_{i,k,j} </math> = frequency of the reference allele in group i, replicate k and SNP j.

<math> \vec{f_{i,k}} </math> = vector of frequencies

Without loss of generality, we assume that the putative selected site is site j = 1.
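
We assume a prior on our vector of frequencies based on our panel of SNPs <math> (M) </math> of dimension <math> 2m \times p </math>

<math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \Sigma) </math>

<math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 </math>

<math> \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I </math>

where <math> S_{i,j} = \Sigma_{i,j}^{panel} </math> if i = j or <math> e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} </math> if i not equal to j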
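
As a rough illustration, the prior mean and covariance could be assembled as follows. This is a sketch in plain Python, not the notebook's own code. It assumes a minus sign in the exponent of the off-diagonal term (so that covariance shrinks as <math> \rho_{i,j} </math> grows) and that <math> \theta </math>, <math> m </math>, <math> \rho </math> and the panel mean and covariance are already available. The function and argument names are hypothetical.

```python
import math

def prior_moments(f_panel, sigma_panel, rho, theta, m):
    """Build (mu, Sigma) of the MVN prior from panel summaries.

    f_panel:     list of p panel allele frequencies
    sigma_panel: p x p panel covariance (list of lists)
    rho:         p x p matrix of rho_{i,j} values (assumed given)
    theta, m:    scalars from the notes (2m = number of panel haplotypes)
    """
    p = len(f_panel)
    # mu = (1 - theta) * f_panel + (theta / 2) * 1
    mu = [(1.0 - theta) * f + theta / 2.0 for f in f_panel]
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                s_ij = sigma_panel[i][j]
            else:
                # Assumed sign: exp(-rho_ij / (2m)) shrinks off-diagonal terms.
                s_ij = math.exp(-rho[i][j] / (2.0 * m)) * sigma_panel[i][j]
            sigma[i][j] = (1.0 - theta) ** 2 * s_ij
            if i == j:
                # (theta / 2) * (1 - theta / 2) * I adds to the diagonal only.
                sigma[i][j] += (theta / 2.0) * (1.0 - theta / 2.0)
    return mu, sigma
```

Plain lists are used to keep the sketch dependency-free; for a realistic number of SNPs a matrix library would be the more natural choice.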
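
* '''conditional distribution'''

<math> (f_{i,k,2}, \ldots , f_{i,k,p}) \mid f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\Sigma}) </math>

The conditional distribution is easily obtained when we use a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].

let <math> X_2 = (f_{i,k,2}, \ldots , f_{i,k,p}) </math> and <math> X_1 = f_{i,k,1} </math>

<math> X_2 \mid X_1, M </math> ~ <math> N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) </math>

Thus <math> \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} </math>

And equivalently we could derive the distribution <math> X_1 \mid X_2, M </math>, that is, <math> f_{i,k,1} \mid (f_{i,k,2}, \ldots , f_{i,k,p}), M </math>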
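
The conditioning step can be sketched in a few lines of Python. Because <math> X_1 = f_{i,k,1} </math> is a single frequency, <math> \Sigma_{11} </math> is a scalar and its inverse is an ordinary division, so no matrix inversion is needed. The function name <code>condition_on_first</code> is hypothetical, and the sketch assumes the first coordinate is the one conditioned on.

```python
def condition_on_first(mu, sigma, x1):
    """Mean and covariance of X_2 | X_1 = x1 for a multivariate normal,
    where X_1 is the first coordinate and X_2 is everything else.

    mu:    list of p means
    sigma: p x p covariance (list of lists)
    """
    mu1, mu2 = mu[0], mu[1:]
    s11 = sigma[0][0]                    # Sigma_11 (scalar)
    s21 = [row[0] for row in sigma[1:]]  # Sigma_21 (column, length p - 1)
    s12 = sigma[0][1:]                   # Sigma_12 (row, length p - 1)
    # mu_bar = mu_2 + Sigma_21 * Sigma_11^{-1} * (x1 - mu_1)
    mu_bar = [m2 + s * (x1 - mu1) / s11 for m2, s in zip(mu2, s21)]
    # Sigma_bar = Sigma_22 - Sigma_21 * Sigma_11^{-1} * Sigma_12
    sigma_bar = [
        [sigma[i + 1][j + 1] - s21[i] * s12[j] / s11 for j in range(len(s12))]
        for i in range(len(s21))
    ]
    return mu_bar, sigma_bar
```

For a bivariate example with zero means, unit variances and covariance 0.5, conditioning on <code>x1 = 2</code> gives a conditional mean of 1.0 and a conditional variance of 0.75.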
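
* '''Likelihood for the frequency at the test SNP t given all data'''

let <math> f^{obs} = (f_{i,k,j})_{j \neq t} </math>, the frequencies at all SNPs other than t

<math> L(f_{i,k,t}^{true}) = P(f^{obs} \mid f_{i,k,t}^{true}, M) = \frac{P( f_{i,k,t}^{true} \mid M, f^{obs}) P(f^{obs} \mid M)}{P(f_{i,k,t}^{true} \mid M)} </math>

Confused here: can we just use the expression derived above for <math> P( f_{i,k,t}^{true} \mid M, f^{obs}) </math>? Also, isn't <math> f_{i,k,t}^{true} \mid M </math> ~ <math> N(\mu_1, \Sigma_{11}) </math> and <math> f^{obs} \mid M </math> ~ <math> N(\mu_2, \Sigma_{22}) </math>? But how do we then incorporate <math> \beta </math> into the likelihood calculation?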
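
But maybe we want to incorporate dispersion and measurement error parameters.

Then:

<math> f_{i,k,t}^{true} \mid M </math> ~ <math> N(\mu, \sigma^2 \Sigma) </math>. The parameter <math> \sigma^2 </math> allows for overdispersion.

<math> f^{obs} \mid M </math> ~ <math> N_{p-1} (\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I) </math>, where <math> \epsilon^2 </math> allows for measurement error.

I don't understand <math> f^{obs} \mid f_{i,k,t}^{true}, M </math>. Shouldn't it come from (2.12) and not (2.13)? Ask Matthew.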
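
For reference, the covariance of the observed frequencies with overdispersion and measurement error can be sketched as below. It only builds <math> \sigma^2 \Sigma_{22} + \epsilon^2 I </math> and does not address the open question about which conditional the likelihood should use. The function and argument names are hypothetical.

```python
def observed_covariance(sigma22, sigma_sq, eps_sq):
    """Return sigma_sq * Sigma_22 + eps_sq * I as a list of lists.

    sigma22:  (p - 1) x (p - 1) prior covariance of the non-test SNPs
    sigma_sq: overdispersion parameter (sigma^2 in the notes)
    eps_sq:   measurement-error variance (epsilon^2 in the notes)
    """
    n = len(sigma22)
    return [
        [sigma_sq * sigma22[i][j] + (eps_sq if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
```

Setting <code>sigma_sq = 1</code> and <code>eps_sq = 0</code> recovers the plain <math> \Sigma_{22} </math> from the earlier formulation.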
