# User:Hussein Alasadi/Notebook/stephens/2013/10/03

Project: analyzing pooled sequenced data with selection

## Notes from Meeting

Consider a single lineage for now.

${\displaystyle X_{j}}$ = frequency of "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)

• Data:

${\displaystyle (n_{j}^{0},n_{j}^{1})}$ = numbers of "0" and "1" alleles observed at SNP j, with total ${\displaystyle n_{j}=n_{j}^{0}+n_{j}^{1}}$

• Normal approximation

${\displaystyle n_{j}^{1}}$ ~ ${\displaystyle Bin(n_{j},X_{j})\approx N(n_{j}X_{j},n_{j}X_{j}(1-X_{j}))}$ Normal approximation to binomial

${\displaystyle {\frac {n_{j}^{1}}{n_{j}}}\approx N(X_{j},{\frac {X_{j}(1-X_{j})}{n_{j}}})}$ The variance of this distribution results from error due to binomial sampling.

To simplify, we plug in the estimate ${\displaystyle {\hat {X_{j}}}={\frac {n_{j}^{1}}{n_{j}}}}$ for ${\displaystyle X_{j}}$ in the variance term:

${\displaystyle \implies {\frac {n_{j}^{1}}{n_{j}}}|X_{j}\approx N(X_{j},{\frac {{\hat {X_{j}}}(1-{\hat {X_{j}}})}{n_{j}}})}$
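A minimal NumPy sketch of the plug-in approximation, assuming the allele counts at each SNP are held in arrays; the function `plug_in_normal` and the example counts are hypothetical. It returns the estimate `x_hat` and the plug-in variance, and a short simulation checks that under binomial sampling the variance of n_j^1/n_j is X_j(1 - X_j)/n_j.

```python
import numpy as np

def plug_in_normal(n1, n):
    """Plug-in normal approximation for the observed frequency n1/n at each SNP.

    Returns x_hat = n1/n and the approximate variance x_hat*(1 - x_hat)/n,
    i.e. X_j(1 - X_j)/n_j with the estimate substituted for X_j.
    """
    n1 = np.asarray(n1, dtype=float)
    n = np.asarray(n, dtype=float)
    x_hat = n1 / n
    var = x_hat * (1.0 - x_hat) / n
    return x_hat, var

# Hypothetical counts of "1" alleles (n1) and totals (n) at three SNPs.
n1 = np.array([30, 5, 120])
n = np.array([100, 50, 200])
x_hat, var = plug_in_normal(n1, n)
print(x_hat)          # [0.3 0.1 0.6]
print(np.sqrt(var))   # approximate standard errors

# Check: under Bin(n_j, X_j), the variance of n1/n_j is X_j(1 - X_j)/n_j.
rng = np.random.default_rng(0)
X, nj = 0.3, 100
draws = rng.binomial(nj, X, size=100_000) / nj
print(draws.var(), X * (1 - X) / nj)   # both close to 0.0021
```

The plug-in step is what makes the variance a known constant per SNP, so the likelihood stays Gaussian in X_j.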

• notation

${\displaystyle f_{i,k,j}=}$ frequency of the reference allele in group i, replicate k, at SNP j.

${\displaystyle {\vec {f_{i,k}}}=(f_{i,k,1},\ldots ,f_{i,k,p})}$, the vector of frequencies across the p SNPs for group i and replicate k.

Without loss of generality, we assume that the putative selected site is site ${\displaystyle j=1}$

• Model

We place a prior on the frequency vector based on our reference panel ${\displaystyle M}$, a ${\displaystyle 2m\times p}$ matrix (2m haplotypes at p SNPs):

${\displaystyle {\vec {f_{i,k}}}}$ ~ ${\displaystyle MVN(\mu ,\Sigma )}$

${\displaystyle \mu =(1-\theta )f^{panel}+{\frac {\theta }{2}}\mathbf {1} }$

${\displaystyle \Sigma =(1-\theta )^{2}S+{\frac {\theta }{2}}(1-{\frac {\theta }{2}})I}$

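where ${\displaystyle f^{panel}}$ and ${\displaystyle \Sigma ^{panel}}$ are the allele frequencies and sample covariance matrix of the panel, and ${\displaystyle S_{i,j}=\Sigma _{i,j}^{panel}}$ if ${\displaystyle i=j}$ or ${\displaystyle S_{i,j}=e^{-{\frac {\rho _{i,j}}{2m}}}\Sigma _{i,j}^{panel}}$ if ${\displaystyle i\neq j}$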

${\displaystyle \theta ={\frac {(\sum _{i=1}^{2m-1}{\frac {1}{i}})^{-1}}{2m+(\sum _{i=1}^{2m-1}{\frac {1}{i}})^{-1}}}}$
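The prior above can be assembled numerically as in the sketch below. It is a minimal illustration under stated assumptions, not the project's code: the panel `M` is assumed to be a 0/1 array with 1 marking the tracked allele, the ρ values are assumed to be supplied as a symmetric matrix, and Σ^panel is taken as the covariance with divisor 2m (an assumption; the notes do not specify). The function names `panel_theta` and `prior_mean_cov` and the example panel are hypothetical.

```python
import numpy as np

def panel_theta(two_m):
    """theta = H^{-1} / (2m + H^{-1}), with H = sum_{i=1}^{2m-1} 1/i."""
    if two_m < 2:
        raise ValueError("need at least 2 panel haplotypes")
    inv_h = 1.0 / sum(1.0 / i for i in range(1, two_m))
    return inv_h / (two_m + inv_h)

def prior_mean_cov(M, rho):
    """Prior mean mu and covariance Sigma of the p-vector of pool frequencies.

    M   : (2m, p) array of 0/1 panel haplotypes (assumed coding: 1 = tracked allele)
    rho : (p, p) symmetric array of rho_{i,j} values between SNPs (assumed given)
    """
    M = np.asarray(M, dtype=float)
    two_m, p = M.shape
    theta = panel_theta(two_m)
    f_panel = M.mean(axis=0)                           # panel allele frequencies
    sigma_panel = np.cov(M, rowvar=False, bias=True)   # divisor 2m (assumption)
    # Off-diagonal entries shrunk by exp(-rho_{i,j} / 2m); diagonal left unchanged.
    S = np.exp(-np.asarray(rho, dtype=float) / two_m) * sigma_panel
    np.fill_diagonal(S, np.diag(sigma_panel))
    mu = (1 - theta) * f_panel + theta / 2
    Sigma = (1 - theta) ** 2 * S + (theta / 2) * (1 - theta / 2) * np.eye(p)
    return mu, Sigma, theta

# Hypothetical panel: 2m = 4 haplotypes at p = 3 SNPs, with made-up rho values.
M = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [1, 1, 0]])
rho = np.array([[0.0, 1.0, 4.0],
                [1.0, 0.0, 3.0],
                [4.0, 3.0, 0.0]])
mu, Sigma, theta = prior_mean_cov(M, rho)
print(theta)   # approximately 0.12 for 2m = 4
print(mu)      # approximately [0.72 0.5 0.28]
```

The θ/2 shift and the θ/2(1 - θ/2) ridge keep every prior mean away from 0 and 1 and keep the diagonal of Σ strictly positive even for SNPs that are monomorphic in the panel.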

• At the selected site

${\displaystyle \log {\frac {f_{i,k,1}}{1-f_{i,k,1}}}=\mu +\beta g_{i}+\epsilon _{i,k}}$
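To make the selected-site model concrete, here is a small simulation sketch. The notes leave several pieces unspecified, so the following are assumptions: μ is treated as a scalar intercept (`mu0`), g_i is a numeric group covariate (e.g. 0/1), and ε_{i,k} is i.i.d. N(0, σ²). The function name `simulate_selected_site` is hypothetical.

```python
import numpy as np

def simulate_selected_site(mu0, beta, g, n_rep, sigma, rng):
    """Draw f_{i,k,1} for each group i and replicate k from the logit model.

    mu0   : intercept mu of the logit model (scalar here; assumption)
    g     : g_i for each group (hypothetical numeric coding, e.g. 0/1)
    sigma : sd of epsilon_{i,k}, assumed i.i.d. N(0, sigma^2)
    """
    g = np.asarray(g, dtype=float)
    eps = rng.normal(0.0, sigma, size=(g.size, n_rep))
    eta = mu0 + beta * g[:, None] + eps      # log(f / (1 - f)), shape (groups, reps)
    return 1.0 / (1.0 + np.exp(-eta))        # inverse logit gives f in (0, 1)

rng = np.random.default_rng(1)
f = simulate_selected_site(mu0=0.0, beta=np.log(3.0), g=[0, 1], n_rep=3, sigma=0.5, rng=rng)
print(f.shape)                 # (2, 3)
logit = np.log(f / (1 - f))    # recovers mu0 + beta*g_i + epsilon_{i,k}
```

With σ = 0 and μ = 0, a group with g_i = 1 and β = log 3 has f = 3/4, so β is the log odds ratio of the allele between groups.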

• conditional distribution

${\displaystyle (f_{i,k,2},\ldots ,f_{i,k,p})|f_{i,k,1},M}$ ~ ${\displaystyle MVN({\bar {\mu }},{\bar {\Sigma }})}$. This conditional distribution follows from the standard result for conditioning a multivariate normal on a subset of its components.

Let ${\displaystyle X_{2}=(f_{i,k,2},\ldots ,f_{i,k,p})}$ and ${\displaystyle X_{1}=f_{i,k,1}}$. Then

${\displaystyle X_{2}|X_{1},M}$ ~ ${\displaystyle N({\vec {\mu _{2}}}+\Sigma _{21}\Sigma _{11}^{-1}(x_{1}-\mu _{1}),\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma _{12})}$

Thus ${\displaystyle {\bar {\mu }}={\vec {\mu _{2}}}+\Sigma _{21}\Sigma _{11}^{-1}(x_{1}-\mu _{1})}$ and ${\displaystyle {\bar {\Sigma }}=\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma _{12}}$
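The conditioning formulas above can be sketched in NumPy as follows, assuming μ and Σ are the prior mean and covariance with the putative selected site stored first (index 0). Because X_1 is a single SNP, Σ_11 is a scalar and its inverse is a division. The function name `condition_on_selected_site` is hypothetical.

```python
import numpy as np

def condition_on_selected_site(mu, Sigma, x1):
    """Conditional MVN of X_2 = (f_2, ..., f_p) given X_1 = f_1 = x1.

    SNP 1 is the putative selected site, so Sigma_11 is a scalar and
    Sigma_11^{-1} reduces to division.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mu1, mu2 = mu[0], mu[1:]
    s11 = Sigma[0, 0]        # Sigma_11
    s21 = Sigma[1:, 0]       # Sigma_21, shape (p-1,)
    s22 = Sigma[1:, 1:]      # Sigma_22
    mu_bar = mu2 + s21 * (x1 - mu1) / s11
    Sigma_bar = s22 - np.outer(s21, s21) / s11   # Sigma_12 = Sigma_21^T by symmetry
    return mu_bar, Sigma_bar

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
mu_bar, Sigma_bar = condition_on_selected_site(mu, Sigma, x1=1.0)
print(mu_bar, Sigma_bar)   # [0.5] [[0.75]]
```

In the bivariate example, observing x1 one unit above its mean shifts the other SNP by the correlation (0.5) and shrinks its variance from 1 to 1 - 0.5² = 0.75.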