Difference between revisions of "User:Hussein Alasadi/Notebook/stephens/2013/10/03"

(Notes from Meeting)
(22 intermediate revisions by the same user not shown)
Line 36: Line 36:
 
We assume a prior on our vector of frequencies based on our panel of SNPs <math> (M) </math> of dimension <math> 2mxp </math>  
 
We assume a prior on our vector of frequencies based on our panel of SNPs <math> (M) </math> of dimension <math> 2mxp </math>  
  
<math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \sum) </math>
+
<math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \Sigma) </math>
  
 
<math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 </math>
 
<math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 </math>
  
<math> \sum = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I </math>
+
<math> \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I </math>
  
 
where <math> S_{i,j} = \Sigma_{i,j}^{panel}</math> if i = j or <math> e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} </math> if i not equal to j
 
where <math> S_{i,j} = \Sigma_{i,j}^{panel}</math> if i = j or <math> e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} </math> if i not equal to j
Line 51: Line 51:
  
 
* '''conditional distribution'''
 
* '''conditional distribution'''
<math> (f_{i,k,2}, .... , f_{i,k,p}) | f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\sum}) </math>
+
<math> (f_{i,k,2}, .... , f_{i,k,p}) | f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\Sigma}) </math>
The conditional distribution is easily obtained when we use a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].  
+
The conditional distribution is easily obtained when we use a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].
 +
 
 +
let <math> X_2 = (f_{i,k,2}, .... , f_{i,k,p}) </math> and <math> X_1 = f_{i,k,1} </math>
 +
 
 +
<math> X_2 | X_1, M </math> ~ <math> N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) </math>
 +
 
 +
Thus <math>  \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} </math>
 +
 
 +
And equivalently we could derive the distribution <math> X_1 | X_2, M </math>, that is, <math> f_{i,k,1} | (f_{i,k,2}, .... , f_{i,k,p}),  M </math>
 +
 
 +
*'''Likelihood for frequency at the test SNP t given all data'''
 +
 
 +
let <math>f^{obs} = (f_{i,k,j})_{j \not= t} </math>
 +
 
 +
<math> L(f_{i,k,t}^{true}) = P(f_{obs} | f_{i,k,t}^{true}, M) = \frac{P( f_{i,k,t}^{true}  | M, f_{obs}) P(f^{obs}|M)}{P(f_{i,k,t}^{true} | M)}</math>
 +
 
 +
Confused here, can we just use the expression derived above for <math>P( f_{i,k,t}^{true}  | M, f_{obs})  </math>. Also, isn't <math> f_{i,k,t}^{true} | M </math> ~
 +
<math> N(\mu_1, \Sigma_{11}) </math> and <math> f^{obs} | M </math> ~ <math> N(\mu_2, \Sigma_{22}) </math>. But, how do we then incorporate <math> \beta </math> into the likelihood calculation?
 +
 
 +
 
 +
But maybe we want to incorporate dispersion and measurement error parameters
 +
 
 +
Then:
 +
<math> f_{i,k,t}^{true}  | M </math> ~ <math> N(\mu, \sigma^2 \Sigma) </math> The parameter <math> \sigma^2 </math> allows for over-dispersion
 
   
 
   
 +
<math> f^{obs}| M </math> ~ <math> N_{p-1} (\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I) </math> where <math> \epsilon^2 </math> allows for measurement error.
 +
 +
and I don't understand <math> f_{obs} | f_{i,k,t}^{true}, M </math>. Shouldn't it come from (2.12) and not (2.13) - ask Matthew
 +
 +
 +
  
  

Revision as of 19:04, 16 October 2013

analyzing pooled sequenced data with selection

Notes from Meeting

Consider a single lineage for now.

<math> X_j </math> = frequency of "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)

  • Data:

<math> (n_j^0, n_j^1) </math> = number of "0", "1" alleles at SNP j (<math> n_j = n_j^0 + n_j^1 </math>)


  • Normal approximation

<math> n_j^1 </math> ~ <math> Bin(n_j, X_j) \approx N(n_jX_j, n_jX_j(1-X_j)) </math> Normal approximation to binomial

<math> \frac{n_j^1}{n_j} \approx N(X_j, \frac{X_j(1-X_j)}{n_j}) </math> The variance of this distribution results from error due to binomial sampling.

To simplify, we just plug in <math> \hat{X_j} = \frac{n_j^1}{n_j} </math> for <math> X_j </math> in the variance

<math> \implies \frac{n_j^1}{n_j} | X_j \approx N(X_j, \frac{\hat{X_j}(1-\hat{X_j})}{n_j}) </math>
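A minimal sketch of the plug-in approximation above, assuming the data at one SNP are just the two allele counts. The function name `plugin_normal_approx` and the example counts are made up for illustration; they are not from the meeting notes.

```python
def plugin_normal_approx(n0, n1):
    """Return (x_hat, variance) for the plug-in normal approximation.

    n0, n1: counts of "0" and "1" alleles at one SNP.
    x_hat = n1 / n estimates X_j; the variance is x_hat * (1 - x_hat) / n.
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("need at least one observed allele at this SNP")
    x_hat = n1 / n
    variance = x_hat * (1.0 - x_hat) / n
    return x_hat, variance


# Hypothetical counts: 30 "0" alleles and 70 "1" alleles.
x_hat, var = plugin_normal_approx(n0=30, n1=70)
print(x_hat, var)
```

Note that the variance collapses to 0 when only one allele is observed, so the approximation is least trustworthy for frequencies near 0 or 1.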

  • notation

<math> f_{i,k,j} = </math> frequency of reference allele in group i, replicate k, and SNP j.

<math> \vec{f_{i,k}} = </math> vector of frequencies

Without loss of generality, we assume that the putative selected site is site <math> j = 1 </math>

  • Model

We assume a prior on our vector of frequencies based on our panel of SNPs <math> (M) </math> of dimension <math> 2m \times p </math>

<math> \vec{f_{i,k}} </math> ~ <math> MVN(\mu, \Sigma) </math>

<math> \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 </math>

<math> \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I </math>

where <math> S_{i,j} = \Sigma_{i,j}^{panel} </math> if i = j or <math> e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} </math> if i not equal to j

<math> \theta = \frac{(\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}{2m + (\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}} </math>
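The prior moments can be sketched in plain Python as below. Several things here are assumptions rather than part of the notes: the panel M is taken to be a 2m x p list of 0/1 haplotype rows, the panel covariance uses divisor 2m, the off-diagonal entries of S are read as the panel covariance multiplied by exp(-rho/(2m)), and `rho` is a user-supplied p x p matrix of recombination terms. All function names are hypothetical.

```python
import math


def theta(m):
    """theta = h / (2m + h), where h = 1 / sum_{i=1}^{2m-1} 1/i."""
    h = 1.0 / sum(1.0 / i for i in range(1, 2 * m))
    return h / (2 * m + h)


def panel_moments(M):
    """Column means and covariance (divisor 2m, an assumption) of the panel."""
    n, p = len(M), len(M[0])
    f = [sum(row[j] for row in M) / n for j in range(p)]
    cov = [[sum((row[a] - f[a]) * (row[b] - f[b]) for row in M) / n
            for b in range(p)] for a in range(p)]
    return f, cov


def prior_moments(M, rho):
    """Return (mu, Sigma) of the MVN prior built from panel M."""
    m, p = len(M) // 2, len(M[0])
    th = theta(m)
    f_panel, sigma_panel = panel_moments(M)
    mu = [(1 - th) * f + th / 2 for f in f_panel]
    diag_term = (th / 2) * (1 - th / 2)
    Sigma = []
    for a in range(p):
        row = []
        for b in range(p):
            s = sigma_panel[a][b]
            if a != b:
                # Shrink off-diagonal panel covariance by exp(-rho / 2m).
                s *= math.exp(-rho[a][b] / (2 * m))
            row.append((1 - th) ** 2 * s + (diag_term if a == b else 0.0))
        Sigma.append(row)
    return mu, Sigma


# Toy panel: 2m = 4 haplotypes, p = 2 SNPs, no recombination (rho = 0).
M = [[1, 1], [1, 1], [0, 0], [1, 0]]
rho = [[0.0, 0.0], [0.0, 0.0]]
mu, Sigma = prior_moments(M, rho)
print(theta(2), mu, Sigma)
```

For this toy panel theta works out to 0.12, so the prior mean is pulled slightly from the panel frequencies toward 1/2 and a small constant is added to each variance.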


  • at selected site

<math> \log \frac{f_{i,k,1}}{1-f_{i,k,1}} = \mu + \beta g_i + \epsilon_{i,k} </math>
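As a small illustration of the logit model at the selected site, the sketch below inverts the log-odds to recover a frequency. It assumes g is a numeric group covariate and treats the noise term as an optional input; the notes do not define g_i further, and the function name is hypothetical.

```python
import math


def selected_site_frequency(mu, beta, g, eps=0.0):
    """Invert log(f / (1 - f)) = mu + beta * g + eps to get f in (0, 1)."""
    eta = mu + beta * g + eps
    return 1.0 / (1.0 + math.exp(-eta))


# With beta > 0, a larger covariate value gives a higher frequency.
f_low = selected_site_frequency(mu=0.0, beta=0.5, g=0.0)
f_high = selected_site_frequency(mu=0.0, beta=0.5, g=1.0)
print(f_low, f_high)
```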

  • conditional distribution

<math> (f_{i,k,2}, .... , f_{i,k,p}) | f_{i,k,1}, M </math> ~ <math> MVN(\bar{\mu}, \bar{\Sigma}) </math>

The conditional distribution is easily obtained when we use a result derived [http://openwetware.org/wiki/User:Hussein_Alasadi/Notebook/stephens/2013/10/14 here].

let <math> X_2 = (f_{i,k,2}, .... , f_{i,k,p}) </math> and <math> X_1 = f_{i,k,1} </math>

<math> X_2 | X_1, M </math> ~ <math> N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) </math>

Thus <math> \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} </math>

And equivalently we could derive the distribution <math> X_1 | X_2, M </math>, that is, <math> f_{i,k,1} | (f_{i,k,2}, .... , f_{i,k,p}), M </math>
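Because X_1 is a scalar here, Sigma_11 is a single number and the conditional moments need no matrix inverse. The sketch below computes the conditional mean and covariance of X_2 given X_1 = x1 using plain lists; the function name and the numbers are made up for illustration.

```python
def condition_on_first(mu, Sigma, x1):
    """Mean and covariance of (X_2, ..., X_p) given X_1 = x1 for an MVN.

    mu: length-p list; Sigma: p x p symmetric list of lists.
    Uses mu_2 + Sigma_21 (x1 - mu_1) / Sigma_11 and
    Sigma_22 - Sigma_21 Sigma_12 / Sigma_11 (Sigma_12 is Sigma_21 transposed).
    """
    mu1, s11 = mu[0], Sigma[0][0]
    s21 = [Sigma[j][0] for j in range(1, len(mu))]
    k = len(s21)
    mu_bar = [mu[j + 1] + s21[j] * (x1 - mu1) / s11 for j in range(k)]
    Sigma_bar = [[Sigma[a + 1][b + 1] - s21[a] * s21[b] / s11
                  for b in range(k)] for a in range(k)]
    return mu_bar, Sigma_bar


# Hypothetical prior for p = 3 SNPs, conditioning on x1 = 0.7.
mu = [0.5, 0.4, 0.3]
Sigma = [[0.04, 0.02, 0.01],
         [0.02, 0.09, 0.03],
         [0.01, 0.03, 0.16]]
mu_bar, Sigma_bar = condition_on_first(mu, Sigma, x1=0.7)
print(mu_bar, Sigma_bar)
```

Conditioning the other way (X_1 given X_2) does require solving against the (p-1) x (p-1) block Sigma_22, which is where a linear algebra library would normally come in.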

  • Likelihood for frequency at the test SNP t given all data

let <math> f^{obs} = (f_{i,k,j})_{j \not= t} </math>, the vector of frequencies at all SNPs other than t

<math> L(f_{i,k,t}^{true}) = P(f^{obs} | f_{i,k,t}^{true}, M) = \frac{P( f_{i,k,t}^{true} | M, f^{obs}) P(f^{obs}|M)}{P(f_{i,k,t}^{true} | M)} </math>
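One way to sanity-check the Bayes' rule form of the likelihood is to compare it numerically with the direct conditional density. The sketch below does this for the simplest case p = 2 (one test SNP and one observed SNP) under a joint bivariate normal with made-up parameters. It only verifies the Gaussian identity; it does not address how <math> \beta </math> enters the likelihood.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_direct(f_obs, f_true, mu, Sigma):
    """log P(f_obs | f_true, M) from the conditional normal (p = 2 only)."""
    mean = mu[1] + Sigma[1][0] / Sigma[0][0] * (f_true - mu[0])
    var = Sigma[1][1] - Sigma[1][0] ** 2 / Sigma[0][0]
    return normal_logpdf(f_obs, mean, var)


def loglik_bayes(f_obs, f_true, mu, Sigma):
    """Same quantity via P(f_true | f_obs) P(f_obs) / P(f_true)."""
    post_mean = mu[0] + Sigma[0][1] / Sigma[1][1] * (f_obs - mu[1])
    post_var = Sigma[0][0] - Sigma[0][1] ** 2 / Sigma[1][1]
    return (normal_logpdf(f_true, post_mean, post_var)
            + normal_logpdf(f_obs, mu[1], Sigma[1][1])
            - normal_logpdf(f_true, mu[0], Sigma[0][0]))


# Index 0 is the test SNP, index 1 the single observed SNP (toy values).
mu = [0.5, 0.4]
Sigma = [[0.04, 0.02], [0.02, 0.09]]
direct = loglik_direct(0.45, 0.6, mu, Sigma)
via_bayes = loglik_bayes(0.45, 0.6, mu, Sigma)
print(direct, via_bayes)
```

The two values agree up to floating-point error, as they must for any joint Gaussian.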

Confused here: can we just use the expression derived above for <math> P( f_{i,k,t}^{true} | M, f^{obs}) </math>? Also, isn't <math> f_{i,k,t}^{true} | M </math> ~ <math> N(\mu_1, \Sigma_{11}) </math> and <math> f^{obs} | M </math> ~ <math> N(\mu_2, \Sigma_{22}) </math>? But how do we then incorporate <math> \beta </math> into the likelihood calculation?


But maybe we want to incorporate dispersion and measurement error parameters.

Then: <math> f_{i,k,t}^{true} | M </math> ~ <math> N(\mu, \sigma^2 \Sigma) </math>. The parameter <math> \sigma^2 </math> allows for over-dispersion.

<math> f^{obs}| M </math> ~ <math> N_{p-1} (\mu_2, \sigma^2 \Sigma_{22} + \epsilon^2 I) </math> where <math> \epsilon^2 </math> allows for measurement error.

I also don't understand <math> f^{obs} | f_{i,k,t}^{true}, M </math>. Shouldn't it come from (2.12) and not (2.13)? Ask Matthew.
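The covariance of the observed frequencies with over-dispersion and measurement error can be sketched as follows. The helper name and the parameter values are hypothetical; the code simply scales the (p-1) x (p-1) block by sigma^2 and adds epsilon^2 on the diagonal.

```python
def observed_covariance(Sigma22, sigma2, eps2):
    """Return sigma2 * Sigma22 + eps2 * I as a list of lists.

    sigma2 inflates the whole prior covariance (over-dispersion);
    eps2 adds independent measurement error on the diagonal only.
    """
    k = len(Sigma22)
    return [[sigma2 * Sigma22[a][b] + (eps2 if a == b else 0.0)
             for b in range(k)] for a in range(k)]


# Toy 2 x 2 block with sigma^2 = 2 and epsilon^2 = 0.01.
Sigma22 = [[0.09, 0.03], [0.03, 0.16]]
cov_obs = observed_covariance(Sigma22, sigma2=2.0, eps2=0.01)
print(cov_obs)
```

Setting sigma2 = 1 and eps2 = 0 recovers the original block, so the earlier model is the special case without dispersion or measurement error.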