User:Hussein Alasadi/Notebook/stephens/2013/10/03

From OpenWetWare

Revision as of 22:04, 16 October 2013

Analyzing pooled sequenced data with selection

Notes from Meeting

Consider a single lineage for now.

X_j = true frequency of the "1" allele at SNP j in the pool

  • Data:

 (n_j^0, n_j^1) = numbers of "0" and "1" alleles observed at SNP j, with total n_j = n_j^0 + n_j^1


  • Normal approximation

 n_j^1 ~ Bin(n_j, X_j) \approx N(n_j X_j, n_j X_j (1 - X_j)) (normal approximation to the binomial)

 \frac{n_j^1}{n_j} | X_j \approx N(X_j, \frac{X_j(1-X_j)}{n_j}). The variance of this distribution comes only from binomial sampling error.

To simplify, we plug in the estimate \hat{X_j} = \frac{n_j^1}{n_j} for X_j in the variance:

 \implies \frac{n_j^1}{n_j} | X_j \approx N(X_j, \frac{\hat{X_j}(1-\hat{X_j})}{n_j})
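A minimal sketch of the plug-in approximation above, assuming the allele counts for one SNP are available as plain integers. The function names are hypothetical and used for illustration only. The sketch returns the estimate \hat{X_j} and its plug-in sampling variance, and evaluates the approximate normal log-density of the observed frequency for a candidate true frequency X_j.

```python
import math


def plug_in_frequency(n0: int, n1: int):
    """Return (x_hat, var_hat) for one SNP from its allele counts.

    x_hat   = n1 / n, the observed frequency of the "1" allele
    var_hat = x_hat * (1 - x_hat) / n, the binomial sampling variance
              with x_hat plugged in for the unknown true frequency X_j
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("SNP has no observed alleles")
    x_hat = n1 / n
    return x_hat, x_hat * (1.0 - x_hat) / n


def approx_loglik(n0: int, n1: int, x_true: float) -> float:
    """Approximate log P(n1/n | X_j = x_true) under the normal approximation.

    Because the variance uses x_hat, it does not change with x_true.
    """
    x_hat, var_hat = plug_in_frequency(n0, n1)
    if var_hat == 0.0:
        # x_hat is exactly 0 or 1, so the plug-in variance degenerates
        raise ValueError("plug-in variance is zero")
    return -0.5 * (math.log(2 * math.pi * var_hat) + (x_hat - x_true) ** 2 / var_hat)


if __name__ == "__main__":
    print(plug_in_frequency(n0=60, n1=40))
    print(approx_loglik(60, 40, 0.4), approx_loglik(60, 40, 0.5))
```

A drawback of the plug-in is visible in the code: at monomorphic observations (\hat{X_j} = 0 or 1) the approximate variance is zero. Such SNPs would need separate handling.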

  • notation

f_{i,k,j} = frequency of the reference allele in group i, replicate k, at SNP j.

 \vec{f_{i,k}} = (f_{i,k,1}, \dots, f_{i,k,p}), the vector of frequencies across the p SNPs

Without loss of generality, we assume that the putative selected site is site j = 1

  • Model

We assume a prior on the vector of frequencies based on our reference panel M (2m haplotypes by p SNPs, i.e. of dimension 2m \times p):

 \vec{f_{i,k}} ~ MVN(\mu, \Sigma)

 \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1

 \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I

where S_{i,j} = \Sigma_{i,j}^{panel} if i = j, and S_{i,j} = e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} if i \neq j

 \theta = \frac{(\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}{2m + (\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}
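The prior can be assembled directly from a reference panel. Below is a minimal pure-Python sketch under several assumptions. The panel M is a list of 2m haplotypes, each a list of p alleles coded 0/1. The pairwise recombination parameters \rho_{i,j} are precomputed and passed as a p x p matrix. \Sigma^{panel} is the sample covariance of the haplotype columns with denominator 2m; the notes do not state the denominator. All function names are hypothetical.

```python
import math


def harmonic_theta(m: int) -> float:
    """theta = a^{-1} / (2m + a^{-1}), where a = sum_{i=1}^{2m-1} 1/i."""
    a = sum(1.0 / i for i in range(1, 2 * m))
    return (1.0 / a) / (2 * m + 1.0 / a)


def panel_prior(panel, rho):
    """Return (mu, Sigma) for the MVN prior on the p SNP frequencies.

    panel: list of 2m haplotypes, each a list of p alleles coded 0/1
    rho:   p x p nested list of pairwise recombination parameters rho_{i,j}
    """
    n_hap = len(panel)
    if n_hap % 2:
        raise ValueError("panel must contain 2m haplotypes")
    m = n_hap // 2
    p = len(panel[0])
    theta = harmonic_theta(m)

    # f^panel: allele frequency of each SNP in the panel
    f_panel = [sum(h[j] for h in panel) / n_hap for j in range(p)]

    # Sigma^panel: sample covariance of the SNP columns (denominator 2m assumed)
    cov = [[sum((h[i] - f_panel[i]) * (h[j] - f_panel[j]) for h in panel) / n_hap
            for j in range(p)] for i in range(p)]

    # S: keep the diagonal, shrink off-diagonal entries by exp(-rho_ij / 2m)
    S = [[cov[i][j] if i == j else math.exp(-rho[i][j] / (2 * m)) * cov[i][j]
          for j in range(p)] for i in range(p)]

    # mu = (1 - theta) f^panel + (theta / 2) 1
    mu = [(1 - theta) * f + theta / 2 for f in f_panel]
    # Sigma = (1 - theta)^2 S + (theta / 2)(1 - theta / 2) I
    diag = (theta / 2) * (1 - theta / 2)
    sigma = [[(1 - theta) ** 2 * S[i][j] + (diag if i == j else 0.0)
              for j in range(p)] for i in range(p)]
    return mu, sigma
```

For example, a toy panel with 2m = 4 haplotypes gives \theta = (6/11) / (4 + 6/11) = 0.12. The (\theta/2)(1 - \theta/2) I term keeps every diagonal entry of \Sigma strictly positive, even for SNPs that are monomorphic in the panel.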


  • at the selected site (j = 1)

 log \frac{f_{i,k,1}}{1-f_{i,k,1}} = \mu + \beta g_i + \epsilon_{i,k}
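To make the selection model concrete, here is a small sketch that maps the linear predictor on the logit scale back to a frequency. It assumes g_i is a numeric group covariate, for example 0 for a control group and 1 for a selected group; that coding is an assumption. It also treats \epsilon_{i,k} as a normal replicate effect with standard deviation tau, a hypothetical parameter the notes do not name. In this equation \mu is the scalar intercept of the logistic model, not the prior mean vector.

```python
import math
import random


def expit(x: float) -> float:
    """Inverse of the logit transform log(f / (1 - f))."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_selected_site(mu, beta, groups, n_rep, tau, rng=None):
    """Draw f_{i,k,1} = expit(mu + beta * g_i + eps_{i,k}) with eps ~ N(0, tau^2).

    groups: list of group covariates g_i (hypothetical 0/1 coding)
    n_rep:  number of replicates k per group
    Returns a nested list freqs[i][k].
    """
    rng = rng or random.Random()
    return [[expit(mu + beta * g + rng.gauss(0.0, tau)) for _ in range(n_rep)]
            for g in groups]
```

With tau = 0, \mu = 0 and \beta = \log 3, group 0 sits at frequency 1/2 and group 1 at 3/4. So \beta is the log odds ratio of the selected-site frequency between groups.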

  • conditional distribution

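(f_{i,k,2}, \dots, f_{i,k,p}) | f_{i,k,1}, M ~ MVN(\bar{\mu}, \bar{\Sigma})

The conditional distribution follows from the standard result for conditioning a multivariate normal, derived here: http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14

Let X_2 = (f_{i,k,2}, \dots, f_{i,k,p}) and X_1 = f_{i,k,1}, and partition \mu and \Sigma conformably. Then

 X_2 | X_1 = x_1, M ~ MVN(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})

Thus \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1) and \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.

Equivalently, swapping the roles of the two blocks gives the distribution of X_1 | X_2, M, i.e. f_{i,k,1} | (f_{i,k,2}, \dots, f_{i,k,p}), M:

 X_1 | X_2 = x_2, M ~ N(\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \vec{\mu_2}), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})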
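The conditioning step can be checked numerically. Below is a minimal pure-Python sketch; the helper names are hypothetical, and a real analysis would more likely use a linear-algebra library. It returns \bar{\mu} and \bar{\Sigma} for any split of the coordinates, so the same function gives X_2 | X_1, M (condition on index 0, the selected site) and X_1 | X_2, M (condition on all other indices). It solves linear systems in \Sigma_{bb} rather than forming the inverse explicitly.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A: n x n nested list, B: n x r nested list. Returns X as an n x r nested list.
    """
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(M[row][col]))
        if abs(M[piv][col]) < 1e-15:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for row in range(n):
            if row != col:
                factor = M[row][col]
                M[row] = [a - factor * b for a, b in zip(M[row], M[col])]
    return [row[n:] for row in M]


def condition_mvn(mu, sigma, given, values):
    """Mean and covariance of X_a | X_b = values for X ~ MVN(mu, sigma).

    given:  indices b of the conditioning coordinates
    values: observed values x_b, in the same order as `given`
    Returns (mu_bar, sigma_bar) for the remaining coordinates a, in index order.
    """
    a = [i for i in range(len(mu)) if i not in given]
    b = list(given)
    s_ab = [[sigma[i][j] for j in b] for i in a]
    s_bb = [[sigma[i][j] for j in b] for i in b]
    s_ba = [[sigma[i][j] for j in a] for i in b]
    # w = Sigma_bb^{-1} (x_b - mu_b), k = Sigma_bb^{-1} Sigma_ba
    w = solve(s_bb, [[x - mu[j]] for x, j in zip(values, b)])
    k = solve(s_bb, s_ba)
    # mu_bar = mu_a + Sigma_ab w
    mu_bar = [mu[i] + sum(s_ab[r][c] * w[c][0] for c in range(len(b)))
              for r, i in enumerate(a)]
    # sigma_bar = Sigma_aa - Sigma_ab k
    sigma_bar = [[sigma[a[r]][a[s]] - sum(s_ab[r][c] * k[c][s] for c in range(len(b)))
                  for s in range(len(a))] for r in range(len(a))]
    return mu_bar, sigma_bar
```

For example, `condition_mvn([0, 0], [[1, 0.5], [0.5, 1]], given=[0], values=[2])` conditions the second coordinate on the first. It returns mean 0.5 * 2 = 1 and variance 1 - 0.5^2 = 0.75, matching the \bar{\mu} and \bar{\Sigma} formulas with \Sigma_{11} = 1.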

  • Likelihood for the frequency at the test SNP t given all data

let f_{obs} = (f_{i,k,j})_{j \neq t} denote the vector of the p - 1 observed frequencies at all SNPs other than t. By Bayes' rule,

 L(f_{i,k,t}^{true}) = P(f_{obs} | f_{i,k,t}^{true}, M) = \frac{P(f_{i,k,t}^{true} | M, f_{obs}) P(f_{obs} | M)}{P(f_{i,k,t}^{true} | M)}

Confused here: can we just use the expression derived above for P(f_{i,k,t}^{true} | M, f_{obs})? Also, isn't f_{i,k,t}^{true} | M ~ N(\mu_1, \Sigma_{11}) and f_{obs} | M ~ N(\mu_2, \Sigma_{22})? But then how do we incorporate \beta into the likelihood calculation?
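As a sanity check on the Bayes-rule expression, take the simplest case: one other SNP (p = 2, so f_{obs} is a scalar) with \beta ignored. The minimal sketch below assumes a bivariate normal joint prior with means mu_t, mu_o, variances s_tt, s_oo and covariance s_to; these names are hypothetical. It computes the likelihood two ways, directly from the conditional normal of f_{obs} given f_{i,k,t}^{true}, and via the ratio P(f_t | M, f_{obs}) P(f_{obs} | M) / P(f_t | M). This illustrates only the identity; it does not show how \beta should enter.

```python
import math


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_direct(f_t, f_obs, mu_t, mu_o, s_tt, s_oo, s_to):
    """log P(f_obs | f_t, M) from the conditional normal of f_obs given f_t."""
    mean = mu_o + s_to / s_tt * (f_t - mu_t)
    var = s_oo - s_to ** 2 / s_tt
    return log_normal_pdf(f_obs, mean, var)


def loglik_bayes(f_t, f_obs, mu_t, mu_o, s_tt, s_oo, s_to):
    """Same quantity via P(f_t | M, f_obs) P(f_obs | M) / P(f_t | M)."""
    post_mean = mu_t + s_to / s_oo * (f_obs - mu_o)
    post_var = s_tt - s_to ** 2 / s_oo
    return (log_normal_pdf(f_t, post_mean, post_var)
            + log_normal_pdf(f_obs, mu_o, s_oo)
            - log_normal_pdf(f_t, mu_t, s_tt))
```

The two functions agree up to floating-point error for any valid parameters. With \beta ignored, either route gives the same likelihood.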


We may also want to incorporate over-dispersion and measurement-error parameters.

Then: f_{i,k,t}^{true} | M ~ N(\mu, \sigma^2 \Sigma), where the parameter \sigma^2 allows for over-dispersion, and

f_{obs} | M ~ N_{p-1}(\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I), where \epsilon^2 allows for measurement error.
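To see what the two extra parameters do, here is a sketch for the same two-SNP case. It assumes the observed frequency equals the true frequency plus independent N(0, \epsilon^2) error. The cross-covariance between f_{i,k,t}^{true} and f_{obs} is then \sigma^2 \Sigma_{12}, while the variance of f_{obs} is \sigma^2 \Sigma_{22} + \epsilon^2. The independence assumption and all names here are hypothetical, not taken from the notes.

```python
def conditional_true_given_obs(f_obs, mu_t, mu_o, s_tt, s_oo, s_to, sigma2, eps2):
    """Mean and variance of f_t^true | f_obs, M in the two-SNP case.

    Assumes f_obs = f_o^true + e with e ~ N(0, eps2) independent of the truth,
    and the true frequencies jointly normal with covariance sigma2 * Sigma.
    """
    var_obs = sigma2 * s_oo + eps2   # sigma^2 Sigma_22 + epsilon^2
    cov_to = sigma2 * s_to           # sigma^2 Sigma_12 (independent errors add no covariance)
    gain = cov_to / var_obs
    mean = mu_t + gain * (f_obs - mu_o)
    var = sigma2 * s_tt - gain * cov_to
    return mean, var
```

As \epsilon^2 grows, the gain shrinks, so the conditional mean moves back toward the prior mean and the conditional variance grows. Noisier observed frequencies carry less information about the test SNP.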

I don't understand f_{obs} | f_{i,k,t}^{true}, M. Shouldn't it come from (2.12) rather than (2.13)? (Ask Matthew.)




