Analyzing pooled sequenced data with selection
Notes from Meeting
Consider a single lineage for now.
$X_j$ = frequency of the "1" allele at SNP $j$ in the pool (i.e. the true frequency of the "1" allele in the pool)
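$n_{0,j}$, $n_{1,j}$ = number of "0" and "1" alleles observed at SNP $j$, with $n_j = n_{0,j} + n_{1,j}$
$\hat{X}_j = n_{1,j}/n_j \sim N\left(X_j, \frac{X_j(1-X_j)}{n_j}\right)$ (normal approximation to the binomial)
The variance of this distribution comes from binomial sampling error.
To simplify, we plug in $\hat{X}_j$ for $X_j$ in the variance, so that $\hat{X}_j \sim N\left(X_j, \frac{\hat{X}_j(1-\hat{X}_j)}{n_j}\right)$.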
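A minimal Python sketch of this approximation for a single SNP, assuming only the two allele counts are available. The names `pooled_frequency_approx`, `approx_log_likelihood`, `n0` and `n1` are hypothetical and chosen for illustration. Because of the plug-in step the variance no longer depends on $X_j$, so the approximate likelihood is a Gaussian function of $X_j$; it breaks down when the estimate is exactly 0 or 1, since the plug-in variance is then zero.

```python
import math


def pooled_frequency_approx(n0: int, n1: int) -> tuple[float, float]:
    """Normal approximation for the observed "1"-allele frequency at one SNP.

    Returns (x_hat, var): x_hat = n1 / (n0 + n1), and var is the binomial
    variance X_j (1 - X_j) / n with x_hat plugged in for the unknown X_j.
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("no alleles observed at this SNP")
    x_hat = n1 / n
    var = x_hat * (1.0 - x_hat) / n
    return x_hat, var


def approx_log_likelihood(x_true: float, n0: int, n1: int) -> float:
    """Approximate log-likelihood of a candidate true frequency X_j.

    After the plug-in, var does not depend on x_true, so this is a
    Gaussian log-density in x_true centred at x_hat.
    """
    x_hat, var = pooled_frequency_approx(n0, n1)
    if var == 0.0:
        raise ValueError("plug-in variance is zero (x_hat is 0 or 1)")
    return -0.5 * (math.log(2.0 * math.pi * var) + (x_hat - x_true) ** 2 / var)


if __name__ == "__main__":
    # Toy counts: 60 "0" alleles and 40 "1" alleles at one SNP.
    print(pooled_frequency_approx(n0=60, n1=40))
    print(approx_log_likelihood(0.45, n0=60, n1=40))
```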
$f_{i,k,j}$ = frequency of the reference allele in group $i$, replicate $k$ and SNP $j$.
$(f_{i,k,1}, \ldots, f_{i,k,p})$ = vector of frequencies across the $p$ SNPs
Without loss of generality, we assume that the putative selected site is site $j = 1$.
We assume a multivariate normal prior on our vector of frequencies, $(f_{i,k,1}, \ldots, f_{i,k,p}) \mid M \sim N_p(\mu, \Sigma)$, based on our panel of SNPs $M$ of dimension $2m \times p$,
where … if $i = j$, or … if $i \neq j$.
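The conditional distribution $(f_{i,k,2}, \ldots, f_{i,k,p}) \mid f_{i,k,1}, M$ is easily obtained using the standard result for conditioning a partitioned multivariate normal.
Let $X_2 = (f_{i,k,2}, \ldots, f_{i,k,p})$ and $X_1 = f_{i,k,1}$, and partition $\mu$ and $\Sigma$ conformably (index 1 for site 1, index 2 for sites $2, \ldots, p$). Then
$X_2 \mid X_1, M \sim N_{p-1}\left(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right)$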
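Equivalently, we could derive the distribution of $X_1 \mid X_2, M$, i.e. $f_{i,k,1} \mid (f_{i,k,2}, \ldots, f_{i,k,p}), M$, which is $N\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.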
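A minimal NumPy sketch of the conditioning result for a partitioned multivariate normal, assuming $\mu$ and $\Sigma$ are already available as arrays (how they are built from the panel $M$ is not shown). The function name `condition_mvn` and the toy numbers are illustrative only; site 1 is index 0. The same function covers both directions: condition on index `[0]` for $X_2 \mid X_1$, or on indices `[1, ..., p-1]` for $X_1 \mid X_2$.

```python
import numpy as np


def condition_mvn(mu, sigma, idx_given, x_given):
    """Mean and covariance of X_rest | X_given = x_given when X ~ N(mu, sigma).

    mean = mu_r + S_rg S_gg^{-1} (x_g - mu_g)
    cov  = S_rr - S_rg S_gg^{-1} S_gr
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx_given = np.atleast_1d(np.asarray(idx_given, dtype=int))
    idx_rest = np.setdiff1d(np.arange(mu.size), idx_given)
    x_given = np.atleast_1d(np.asarray(x_given, dtype=float))

    s_gg = sigma[np.ix_(idx_given, idx_given)]
    s_rg = sigma[np.ix_(idx_rest, idx_given)]
    s_rr = sigma[np.ix_(idx_rest, idx_rest)]

    # Solve linear systems rather than forming S_gg^{-1} explicitly.
    cond_mean = mu[idx_rest] + s_rg @ np.linalg.solve(s_gg, x_given - mu[idx_given])
    cond_cov = s_rr - s_rg @ np.linalg.solve(s_gg, s_rg.T)
    return cond_mean, cond_cov


if __name__ == "__main__":
    # Toy prior for p = 3 sites (hypothetical values).
    mu = [0.5, 0.4, 0.3]
    sigma = [[0.04, 0.02, 0.01],
             [0.02, 0.05, 0.015],
             [0.01, 0.015, 0.03]]
    print(condition_mvn(mu, sigma, [0], [0.6]))             # X_2 | X_1
    print(condition_mvn(mu, sigma, [1, 2], [0.45, 0.325]))  # X_1 | X_2
```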
Confused here: can we just use the expression derived above for …? Also, isn't $f_{i,k,1} \mid M \sim N(\mu_1, \Sigma_{11})$ and $\mathbf{f}_{obs} \mid M \sim N_{p-1}(\mu_2, \Sigma_{22})$? But then how do we incorporate $\beta$ into the likelihood calculation?
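One way to see how the marginal and conditional pieces relate: under a multivariate normal prior with site 1 first, the marginal of $X_1$ is $N(\mu_1, \Sigma_{11})$, and the joint density factors as $p(X_1 \mid M)\,p(X_2 \mid X_1, M)$. The sketch below checks this identity numerically with NumPy. It does not address how $\beta$ enters the likelihood, and `mvn_logpdf`, `factorized_logpdf` and the toy values are illustrative assumptions.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, computed via slogdet and solve."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)


def factorized_logpdf(x, mu, sigma):
    """log N(x; mu, sigma) computed as log p(x_1) + log p(x_2 | x_1).

    Site 1 (the putative selected site) is index 0.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s11, s12 = sigma[:1, :1], sigma[:1, 1:]
    s21, s22 = sigma[1:, :1], sigma[1:, 1:]
    # Marginal of X_1: N(mu_1, Sigma_11).
    log_marginal = mvn_logpdf(x[:1], mu[:1], s11)
    # Conditional of X_2 given X_1 (partitioned-normal formula).
    cond_mean = mu[1:] + s21 @ np.linalg.solve(s11, x[:1] - mu[:1])
    cond_cov = s22 - s21 @ np.linalg.solve(s11, s12)
    return log_marginal + mvn_logpdf(x[1:], cond_mean, cond_cov)
```

With toy inputs such as `mu = [0.5, 0.4, 0.3]`, `factorized_logpdf(x, mu, sigma)` and `mvn_logpdf(x, mu, sigma)` should agree up to rounding.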
Then $(f_{i,k,1}, \ldots, f_{i,k,p}) \mid M \sim N_p(\mu, \sigma^2\Sigma)$. The parameter $\sigma^2$ allows for over-dispersion.
$\mathbf{f}_{obs} \mid M \sim N_{p-1}(\mu_2, \sigma^2\Sigma_{22} + \epsilon^2 I)$, where $\epsilon^2$ allows for measurement error.
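A minimal sketch of the log-likelihood of the observed frequencies at sites $2, \ldots, p$ under this model: $\sigma^2$ scales the prior covariance (over-dispersion) and $\epsilon^2$ is added on the diagonal (measurement error). It assumes $\mu_2$ and $\Sigma_{22}$ are already computed; the function name and toy inputs are hypothetical, and how $\sigma^2$, $\epsilon^2$ (and $\beta$) would be estimated is left open, as in the notes.

```python
import numpy as np


def fobs_log_likelihood(f_obs, mu2, sigma22, sigma2, eps2):
    """log N_{p-1}(f_obs; mu2, sigma2 * Sigma22 + eps2 * I).

    sigma2 scales the panel-based covariance (over-dispersion);
    eps2 adds independent measurement error at each SNP.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma22 = np.asarray(sigma22, dtype=float)
    if sigma2 <= 0 or eps2 < 0:
        raise ValueError("need sigma2 > 0 and eps2 >= 0")
    cov = sigma2 * sigma22 + eps2 * np.eye(f_obs.size)
    resid = f_obs - mu2
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance is not positive definite")
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (f_obs.size * np.log(2.0 * np.pi) + logdet + quad)


if __name__ == "__main__":
    # Toy values for p - 1 = 2 observed sites (hypothetical).
    mu2 = [0.4, 0.3]
    sigma22 = [[0.05, 0.015], [0.015, 0.03]]
    print(fobs_log_likelihood([0.42, 0.28], mu2, sigma22, sigma2=1.5, eps2=0.001))
```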
I also don't understand … here. Shouldn't it come from (2.12) rather than (2.13)? Ask Matthew.