User:Hussein Alasadi/Notebook/stephens/2013/10/03

From OpenWetWare
Analyzing pooled sequenced data with selection

Notes from Meeting

Consider a single lineage for now.

[math]\displaystyle{ X_j }[/math] = frequency of "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)

  • Data:

[math]\displaystyle{ (n_j^0, n_j^1) }[/math] = number of "0", "1" alleles at SNP j ([math]\displaystyle{ n_j = n_j^0 + n_j^1 }[/math])


  • Normal approximation

[math]\displaystyle{ n_j^1 }[/math] ~ [math]\displaystyle{ Bin(n_j, X_j) \approx N(n_jX_j, n_jX_j(1-X_j)) }[/math] Normal approximation to binomial

[math]\displaystyle{ \frac{n_j^1}{n_j} \approx N(X_j, \frac{X_j(1-X_j)}{n_j}) }[/math] The variance of this distribution results from error due to binomial sampling.

To simplify, we just plug in [math]\displaystyle{ \hat{X_j} = \frac{n_j^1}{n_j} }[/math] for [math]\displaystyle{ X_j }[/math]

[math]\displaystyle{ \implies \frac{n_j^1}{n_j} | X_j \approx N(X_j, \frac{\hat{X_j}(1-\hat{X_j})}{n_j}) }[/math]
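As a minimal sketch of the plug-in approximation above, the following computes the estimated frequency and its sampling variance from the allele counts at one SNP. The function name `plugin_normal_approx` and the example counts are illustrative assumptions, not part of the original notes.

```python
def plugin_normal_approx(n0, n1):
    """Return (x_hat, var) for the plug-in normal approximation at one SNP.

    n0, n1: observed counts of the "0" and "1" alleles (n_j^0, n_j^1).
    x_hat = n1 / n estimates X_j; var = x_hat * (1 - x_hat) / n is the
    binomial sampling variance with x_hat plugged in for X_j.
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("no reads observed at this SNP")
    x_hat = n1 / n
    var = x_hat * (1.0 - x_hat) / n
    return x_hat, var


# Hypothetical counts: 30 reads carry the "0" allele, 70 carry the "1" allele.
x_hat, var = plugin_normal_approx(n0=30, n1=70)
```

Note that the plug-in variance is exactly zero when all reads carry the same allele, so a SNP with few reads may need separate handling.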

  • notation

[math]\displaystyle{ f_{i,k,j} = }[/math] frequency of the reference allele in group i, replicate k, and SNP j.

[math]\displaystyle{ \vec{f_{i,k}} = (f_{i,k,1}, \ldots, f_{i,k,p}) = }[/math] vector of frequencies across the p SNPs in group i, replicate k.

Without loss of generality, we assume that the putative selected site is site [math]\displaystyle{ j = 1 }[/math]

  • Model

We assume a prior on our vector of frequencies based on our panel of SNPs [math]\displaystyle{ (M) }[/math] of dimension [math]\displaystyle{ 2m \times p }[/math]

[math]\displaystyle{ \vec{f_{i,k}} }[/math] ~ [math]\displaystyle{ MVN(\mu, \Sigma) }[/math]

[math]\displaystyle{ \mu = (1-\theta)f^{panel} + \frac{\theta}{2} 1 }[/math]

[math]\displaystyle{ \Sigma = (1-\theta)^2 S + \frac{\theta}{2}(1 - \frac{\theta}{2})I }[/math]

where [math]\displaystyle{ S_{i,j} = \Sigma_{i,j}^{panel} }[/math] if i = j, and [math]\displaystyle{ S_{i,j} = e^{-\frac{\rho_{i,j}}{2m}} \Sigma_{i,j}^{panel} }[/math] if i ≠ j (here i and j index SNPs)

[math]\displaystyle{ \theta = \frac{(\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}}{2m + (\sum_{i=1}^{2m-1} \frac{1}{i})^{-1}} }[/math]
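The prior above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the notebook's own implementation: the panel M is taken to be a 0/1 matrix with one row per haplotype (2m rows) and one column per SNP, the panel covariance is assumed to be the empirical covariance of its columns, and `rho` is a hypothetical symmetric p-by-p matrix holding the values [math]\displaystyle{ \rho_{i,j} }[/math]. The function names are likewise invented for this example.

```python
import numpy as np


def theta_from_panel(n_hap):
    """theta for a panel of n_hap = 2m haplotypes (sum runs over i = 1..2m-1)."""
    inv_harmonic = 1.0 / sum(1.0 / i for i in range(1, n_hap))
    return inv_harmonic / (n_hap + inv_harmonic)


def prior_mean_and_cov(M, rho):
    """Build (mu, Sigma, theta) of the MVN prior from a panel.

    M   : (2m, p) array of 0/1 alleles, rows = haplotypes, columns = SNPs
          (ASSUMED layout).
    rho : (p, p) symmetric array of rho_{i,j} values (hypothetical input).
    """
    M = np.asarray(M, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n_hap, p = M.shape                      # n_hap plays the role of 2m
    theta = theta_from_panel(n_hap)

    f_panel = M.mean(axis=0)                # panel allele frequencies
    # ASSUMPTION: Sigma^panel is the empirical (1/2m-normalised) covariance.
    sigma_panel = np.atleast_2d(np.cov(M, rowvar=False, bias=True))

    # Off-diagonal entries are shrunk by exp(-rho_ij / 2m); the diagonal is kept.
    S = np.exp(-rho / n_hap) * sigma_panel
    np.fill_diagonal(S, np.diag(sigma_panel))

    mu = (1.0 - theta) * f_panel + theta / 2.0
    Sigma = (1.0 - theta) ** 2 * S + (theta / 2.0) * (1.0 - theta / 2.0) * np.eye(p)
    return mu, Sigma, theta
```

The identity term added to the diagonal keeps [math]\displaystyle{ \Sigma }[/math] positive definite even when a SNP is monomorphic in the panel.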


  • at selected site

[math]\displaystyle{ \log \frac{f_{i,k,1}}{1-f_{i,k,1}} = \mu + \beta g_i + \epsilon_{i,k} }[/math]
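To make the logit model at the selected site concrete, here is a small simulation sketch. Several details are assumptions not stated in the notes: [math]\displaystyle{ g_i }[/math] is treated as a numeric group-level covariate, the noise [math]\displaystyle{ \epsilon_{i,k} }[/math] is drawn from a normal distribution with standard deviation `sigma_eps`, and the function name is hypothetical. The scalar intercept is named `intercept` in the code to avoid confusion with the prior mean vector.

```python
import numpy as np


def simulate_selected_site_freq(intercept, beta, g, n_reps, sigma_eps, rng):
    """Simulate f_{i,k,1} for every group i and replicate k.

    logit(f_{i,k,1}) = intercept + beta * g_i + eps_{i,k},
    with eps_{i,k} ~ N(0, sigma_eps^2)  (ASSUMED noise distribution).
    Returns an array of shape (number of groups, n_reps).
    """
    g = np.asarray(g, dtype=float)
    eps = rng.normal(0.0, sigma_eps, size=(g.size, n_reps))
    logit_f = intercept + beta * g[:, None] + eps
    return 1.0 / (1.0 + np.exp(-logit_f))   # inverse logit


# Hypothetical setting: two groups (g = 0 and g = 1), three replicates each.
rng = np.random.default_rng(0)
f_sel = simulate_selected_site_freq(0.0, 1.0, [0, 1], 3, 0.1, rng)
```

Working on the logit scale keeps every simulated frequency strictly between 0 and 1.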

  • conditional distribution

[math]\displaystyle{ (f_{i,k,2}, \ldots , f_{i,k,p}) | f_{i,k,1}, M }[/math] ~ [math]\displaystyle{ MVN(\bar{\mu}, \bar{\Sigma}) }[/math] The conditional distribution follows from the standard result for conditioning within a multivariate normal.

Let [math]\displaystyle{ X_2 = (f_{i,k,2}, \ldots , f_{i,k,p}) }[/math] and [math]\displaystyle{ X_1 = f_{i,k,1} }[/math]

[math]\displaystyle{ X_2 | X_1, M }[/math] ~ [math]\displaystyle{ N(\vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}) }[/math]

Thus [math]\displaystyle{ \bar{\mu} = \vec{\mu_2} + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1), \bar{\Sigma} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} }[/math]
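The conditional mean and covariance above can be computed directly. The sketch below assumes the selected site is the first coordinate, so [math]\displaystyle{ \Sigma_{11} }[/math] is a scalar and no matrix inversion is needed. The function name `condition_on_first` is an illustrative choice.

```python
import numpy as np


def condition_on_first(mu, Sigma, x1):
    """Condition an MVN(mu, Sigma) on its first coordinate taking the value x1.

    Returns (mu_bar, Sigma_bar) for the remaining p - 1 coordinates:
      mu_bar    = mu_2 + Sigma_21 Sigma_11^{-1} (x1 - mu_1)
      Sigma_bar = Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mu_1, mu_2 = mu[0], mu[1:]
    s11 = Sigma[0, 0]          # scalar because X_1 is one-dimensional
    s21 = Sigma[1:, 0]         # covariance of X_2 with X_1
    S22 = Sigma[1:, 1:]
    mu_bar = mu_2 + s21 * (x1 - mu_1) / s11
    Sigma_bar = S22 - np.outer(s21, s21) / s11
    return mu_bar, Sigma_bar


# Toy example: two SNPs with unit variances and covariance 0.5.
mu_bar, Sigma_bar = condition_on_first([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], x1=2.0)
```

In the toy example, observing a value two units above the mean at the first SNP shifts the mean at the second SNP by one unit and reduces its variance from 1 to 0.75.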

  • Likelihood for frequency at the test SNP t given all data

Let [math]\displaystyle{ f_{obs} = (f_{i,k,j})_{j \neq t} }[/math] denote the observed frequencies at all SNPs other than t.

[math]\displaystyle{ L(f_{i,k,t}^{true}) = P(f_{obs} | f_{i,k,t}^{true}, M) = \frac{P( f_{i,k,t}^{true} | M, f_{obs}) P(f_{obs}|M)}{P(f_{i,k,t}^{true} | M)} }[/math]

where