Notes from Meeting
Consider a single lineage for now.
$X_{j}$ = frequency of "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)
$(n_{j}^{0},n_{j}^{1})$ = number of "0", "1" alleles at SNP j ($n_{j}=n_{j}^{0}+n_{j}^{1}$)
$n_{j}^{1}$ ~ $Bin(n_{j},X_{j})\approx N(n_{j}X_{j},n_{j}X_{j}(1-X_{j}))$ (normal approximation to the binomial)
${\frac {n_{j}^{1}}{n_{j}}}\approx N(X_{j},{\frac {X_{j}(1-X_{j})}{n_{j}}})$
The variance of this distribution results from error due to binomial sampling.
To simplify, we just plug in ${\hat {X_{j}}}={\frac {n_{j}^{1}}{n_{j}}}$ for $X_{j}$
$\implies {\frac {n_{j}^{1}}{n_{j}}}\mid X_{j}\approx N(X_{j},{\frac {{\hat {X_{j}}}(1-{\hat {X_{j}}})}{n_{j}}})$
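The plug-in approximation above can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequency_estimate` and the example counts are assumptions, not part of the notes.

```python
import math

def allele_frequency_estimate(n0, n1):
    """Return (x_hat, se) for one SNP from counts of "0" and "1" alleles.

    x_hat = n1 / n is the plug-in estimate of X_j, and
    se = sqrt(x_hat * (1 - x_hat) / n) is the binomial-sampling standard error.
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("no reads at this SNP")
    x_hat = n1 / n
    se = math.sqrt(x_hat * (1 - x_hat) / n)
    return x_hat, se

# Hypothetical counts: 60 reads carry the "0" allele, 40 carry the "1" allele.
x_hat, se = allele_frequency_estimate(n0=60, n1=40)
```

Note that the plug-in variance collapses to zero when $n_{j}^{1}=0$ or $n_{j}^{1}=n_{j}$, so the approximation is least trustworthy for alleles near fixation.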
$f_{i,k,j}=$ frequency of reference allele in group i, replicate k and SNP j.
${\vec {f_{i,k}}}=$ vector of frequencies across the p SNPs for group i, replicate k
Without loss of generality, we assume that the putative selected site is site $j=1$
We assume a prior on our vector of frequencies based on our panel of SNPs $(M)$ of dimension $2m\times p$
${\vec {f_{i,k}}}$ ~ $MVN(\mu ,\Sigma )$
$\mu =(1-\theta )f^{panel}+{\frac {\theta }{2}}\mathbf {1}$
$\Sigma =(1-\theta )^{2}S+{\frac {\theta }{2}}(1-{\frac {\theta }{2}})I$
where $S_{i,j}=\Sigma _{i,j}^{panel}$ if i = j or $e^{-{\frac {\rho _{i,j}}{2m}}}\Sigma _{i,j}^{panel}$ if i not equal to j
$\theta ={\frac {(\sum _{i=1}^{2m-1}{\frac {1}{i}})^{-1}}{2m+(\sum _{i=1}^{2m-1}{\frac {1}{i}})^{-1}}}$
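As a rough sketch, the prior mean and covariance can be computed from a panel as follows. Several things here are assumptions rather than anything stated in the notes: the panel is a list of 2m haplotypes coded 0/1, $\Sigma ^{panel}$ is taken to be the empirical covariance of those haplotypes, and `rho` is a hypothetical p x p matrix holding the $\rho _{i,j}$ values.

```python
import math

def theta_hat(m):
    """theta for a panel of 2m haplotypes (harmonic sum over i = 1..2m-1)."""
    inv_h = 1.0 / sum(1.0 / i for i in range(1, 2 * m))
    return inv_h / (2 * m + inv_h)

def prior_moments(panel, rho):
    """Return (mu, sigma) of the MVN prior from a 2m x p panel of 0/1 alleles."""
    n_hap, p = len(panel), len(panel[0])  # n_hap = 2m
    theta = theta_hat(n_hap // 2)
    f_panel = [sum(h[j] for h in panel) / n_hap for j in range(p)]
    # Assumption: Sigma^panel is the empirical covariance of the panel haplotypes.
    cov = [[sum((h[a] - f_panel[a]) * (h[b] - f_panel[b]) for h in panel) / n_hap
            for b in range(p)] for a in range(p)]
    mu = [(1 - theta) * f + theta / 2 for f in f_panel]
    sigma = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            # Off-diagonal entries of S are shrunk by exp(-rho_ab / (2m)).
            shrink = 1.0 if a == b else math.exp(-rho[a][b] / n_hap)
            sigma[a][b] = (1 - theta) ** 2 * shrink * cov[a][b]
            if a == b:
                sigma[a][b] += (theta / 2) * (1 - theta / 2)
    return mu, sigma

# Toy panel: 4 haplotypes (m = 2) at p = 2 SNPs, with made-up rho values.
panel = [[0, 1], [1, 1], [0, 0], [1, 0]]
rho = [[0.0, 0.5], [0.5, 0.0]]
mu, sigma = prior_moments(panel, rho)
```

The $\theta$ terms pull the mean toward 1/2 and add a small amount to the diagonal of the covariance, so the prior stays proper even for SNPs that are monomorphic in the panel.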
$\log {\frac {f_{i,k,1}}{1-f_{i,k,1}}}=\mu +\beta g_{i}+\epsilon _{i,k}$
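To make the logit model concrete, here is a small sketch that maps the linear predictor back to a frequency. The function names and the numeric values are illustrative assumptions; `intercept` stands for the scalar $\mu$ in the equation above.

```python
import math

def logit(f):
    """Log-odds of a frequency f in (0, 1)."""
    return math.log(f / (1 - f))

def expit(z):
    """Inverse of logit: maps a real number back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def selected_site_frequency(intercept, beta, g, eps=0.0):
    """Frequency at the selected site implied by logit(f) = intercept + beta * g + eps."""
    return expit(intercept + beta * g + eps)

# With no group effect (beta = 0) and a zero intercept the frequency is 1/2.
f_null = selected_site_frequency(intercept=0.0, beta=0.0, g=1)
# A positive beta raises the frequency in the group with g = 1.
f_alt = selected_site_frequency(intercept=0.0, beta=0.8, g=1)
```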
$(f_{i,k,2},\ldots ,f_{i,k,p})\mid f_{i,k,1},M$ ~ $MVN({\bar {\mu }},{\bar {\Sigma }})$
The conditional distribution follows from the standard result for conditioning a partitioned multivariate normal.
Let $X_{2}=(f_{i,k,2},\ldots ,f_{i,k,p})$ and $X_{1}=f_{i,k,1}$
$X_{2}\mid X_{1},M$ ~ $N({\vec {\mu _{2}}}+\Sigma _{21}\Sigma _{11}^{-1}(x_{1}-\mu _{1}),\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma _{12})$
Thus ${\bar {\mu }}={\vec {\mu _{2}}}+\Sigma _{21}\Sigma _{11}^{-1}(x_{1}-\mu _{1}),{\bar {\Sigma }}=\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma _{12}$
And equivalently we could derive the distribution $X_{1}\mid X_{2},M$ $(f_{i,k,1}\mid f_{i,k,2},\ldots ,f_{i,k,p},M)$
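Because $X_{1}$ is a single site, $\Sigma _{11}$ is a scalar and the conditioning formula needs no matrix inversion. A minimal sketch, assuming `mu` and `sigma` are plain Python lists with the selected site in position 0 (the numbers below are made up for illustration):

```python
def condition_on_first(mu, sigma, x1):
    """Mean and covariance of (X_2 | X_1 = x1) for an MVN with X_1 scalar."""
    mu1, mu2 = mu[0], mu[1:]
    s11 = sigma[0][0]                      # Sigma_11 (scalar)
    s12 = sigma[0][1:]                     # Sigma_12 (row vector)
    s21 = [row[0] for row in sigma[1:]]    # Sigma_21 (column vector)
    s22 = [row[1:] for row in sigma[1:]]   # Sigma_22
    w = [s / s11 for s in s21]             # Sigma_21 Sigma_11^{-1}
    mu_bar = [m + wi * (x1 - mu1) for m, wi in zip(mu2, w)]
    sigma_bar = [[s22[a][b] - w[a] * s12[b] for b in range(len(s12))]
                 for a in range(len(w))]
    return mu_bar, sigma_bar

# Hypothetical 3-SNP example: site 1 observed at frequency 0.6.
mu = [0.5, 0.3, 0.2]
sigma = [[0.04, 0.02, 0.01],
         [0.02, 0.09, 0.03],
         [0.01, 0.03, 0.16]]
mu_bar, sigma_bar = condition_on_first(mu, sigma, x1=0.6)
```

The reverse conditioning, $X_{1}\mid X_{2}$, requires inverting the $(p-1)\times (p-1)$ block $\Sigma _{22}$, so in practice a linear-algebra library would be used for that direction.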
Likelihood for the frequency at the test SNP t given all data
let $f_{obs}=\prod _{j\not =t}f_{i,k,j}$
$L(f_{i,k,t}^{true})=P(f_{obs}\mid f_{i,k,t}^{true},M)={\frac {P(f_{i,k,t}^{true}\mid M,f_{obs})P(f^{obs}\mid M)}{P(f_{i,k,t}^{true}\mid M)}}$
Confused here: can we just use the expression derived above for $P(f_{i,k,t}^{true}\mid M,f_{obs})$? Also, isn't $f_{i,k,t}^{true}\mid M$ ~ $N(\mu _{1},\Sigma _{11})$ and $f^{obs}\mid M$ ~ $N(\mu _{2},\Sigma _{22})$?
But how do we then incorporate $\beta$ into the likelihood calculation?
But maybe we want to incorporate overdispersion and measurement error parameters.
Then:
$f_{i,k,t}^{true}\mid M$ ~ $N(\mu ,\sigma ^{2}\Sigma )$ The parameter $\sigma ^{2}$ allows for overdispersion
$f^{obs}\mid M$ ~ $N_{p-1}(\mu _{2},\sigma ^{2}\Sigma _{22}+\epsilon ^{2}I)$ where $\epsilon ^{2}$ allows for measurement error.
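The covariance of the observed frequencies under this extended model is just a scaled copy of $\Sigma _{22}$ with $\epsilon ^{2}$ added on the diagonal. A small sketch, with a hypothetical helper name and made-up parameter values:

```python
def observed_covariance(sigma22, sigma2, eps2):
    """Return sigma2 * Sigma_22 + eps2 * I as a nested list."""
    q = len(sigma22)
    return [[sigma2 * sigma22[a][b] + (eps2 if a == b else 0.0)
             for b in range(q)] for a in range(q)]

# Hypothetical values: overdispersion sigma^2 = 2, measurement error eps^2 = 0.01.
sigma22 = [[0.09, 0.03],
           [0.03, 0.16]]
cov_obs = observed_covariance(sigma22, sigma2=2.0, eps2=0.01)
```

Setting `sigma2 = 1` and `eps2 = 0` recovers the earlier case $f^{obs}\mid M$ ~ $N(\mu _{2},\Sigma _{22})$.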
I also don't understand $f_{obs}\mid f_{i,k,t}^{true},M$. Shouldn't it come from (2.12) and not (2.13)? Ask Matthew.
