User:Hussein Alasadi/Notebook/stephens/2013/10/03

From OpenWetWare
Revision as of 21:02, 16 October 2013 by Hussein Alasadi (talk | contribs) (Notes from Meeting)
Project: analyzing pooled sequenced data with selection

Notes from Meeting

Consider a single lineage for now.

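$f_j^{true}$ = frequency of "1" allele at SNP j in the pool (i.e. the true frequency of the 1 allele in the pool)

  • Data:

$(n_j^0, n_j^1)$ = number of "0", "1" alleles at SNP j ($n_j = n_j^0 + n_j^1$)

  • Normal approximation

$n_j^1 \mid f_j^{true} \sim \mathrm{Bin}(n_j, f_j^{true})$, so $f_j^{obs} = n_j^1 / n_j$ is approximately $N\left(f_j^{true}, \frac{f_j^{true}(1 - f_j^{true})}{n_j}\right)$ (normal approximation to binomial)

The variance of this distribution results from error due to binomial sampling.

To simplify, we just plug in $f_j^{obs}$ for $f_j^{true}$ in the variance.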
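A minimal sketch of the plug-in normal approximation described above, in Python. The function name `normal_approx` and the argument names `n0`, `n1` are hypothetical. The sketch assumes the observed frequency is the count of "1" alleles divided by the total count, and that the binomial sampling variance is evaluated at that observed frequency.

```python
def normal_approx(n0, n1):
    """Return (observed frequency, plug-in variance) for one SNP.

    n0, n1: read counts of the "0" and "1" alleles (hypothetical names).
    The variance is the binomial sampling variance f(1 - f)/n with the
    observed frequency plugged in for the unknown true frequency.
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("no reads observed at this SNP")
    f_obs = n1 / n
    var = f_obs * (1.0 - f_obs) / n
    return f_obs, var


# Example: 60 reads carrying "0" and 40 reads carrying "1".
f_obs, var = normal_approx(n0=60, n1=40)
```

Note that the plug-in variance is zero when the observed frequency is 0 or 1, so SNPs with no reads for one allele may need separate handling.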

  • notation

$f_{i,r,j}$ = frequency of reference allele in group i, replicate r and SNP j.

$f_{i,r}$ = vector of frequencies $f_{i,r,j}$ across SNPs

Without loss of generality, we assume that the putative selected site is site

  • Model

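We assume a prior on our vector of frequencies based on our panel of SNPs $M$ of dimension $2m \times p$:

$f^{true} \mid M \sim N_p(\hat{\mu}, \hat{\Sigma})$, with $\hat{\mu} = (1 - \theta) f^{panel} + \frac{\theta}{2} \mathbf{1}$ and $\hat{\Sigma} = (1 - \theta)^2 S + \frac{\theta}{2}\left(1 - \frac{\theta}{2}\right) I$

where $S_{ij} = \Sigma^{panel}_{ij}$ if $i = j$ or $S_{ij} = e^{-\rho_{ij}/(2m)} \Sigma^{panel}_{ij}$ if $i \neq j$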

  • at selected site

  • conditional distribution

~ The conditional distribution is easily obtained when we use a result derived here.

let and



And equivalently we could derive the distribution
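For reference, the standard conditioning result for a multivariate normal can be sketched in Python with NumPy. This is an illustration only: the notes do not say which block of SNPs is conditioned on, so the function name `conditional_mvn` and the arguments `obs_idx` and `x_obs` are hypothetical. If x ~ N(mu, Sigma) and the entries indexed by `obs_idx` are fixed at `x_obs`, the remaining entries are normal with mean mu_r + S_ro S_oo^{-1} (x_obs - mu_o) and covariance S_rr - S_ro S_oo^{-1} S_or.

```python
import numpy as np


def conditional_mvn(mu, Sigma, obs_idx, x_obs):
    """Condition a multivariate normal N(mu, Sigma) on some of its entries.

    obs_idx: indices of the conditioned-on entries (hypothetical name).
    x_obs:   their values, in the same order as obs_idx.
    Returns the remaining indices, the conditional mean and the
    conditional covariance of the remaining entries.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    obs_idx = np.asarray(obs_idx, dtype=int)
    x_obs = np.asarray(x_obs, dtype=float)

    rest = np.setdiff1d(np.arange(mu.shape[0]), obs_idx)
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_ro = Sigma[np.ix_(rest, obs_idx)]
    S_rr = Sigma[np.ix_(rest, rest)]

    # K = S_ro S_oo^{-1}, computed with a linear solve rather than an
    # explicit inverse (S_oo is symmetric).
    K = np.linalg.solve(S_oo, S_ro.T).T
    cond_mean = mu[rest] + K @ (x_obs - mu[obs_idx])
    cond_cov = S_rr - K @ S_ro.T
    return rest, cond_mean, cond_cov


# Example with two SNPs: condition SNP 0 on an observed value at SNP 1.
rest, cond_mean, cond_cov = conditional_mvn(
    mu=[0.5, 0.5],
    Sigma=[[0.04, 0.02], [0.02, 0.04]],
    obs_idx=[1],
    x_obs=[0.7],
)
```

Swapping the roles of the two index sets gives the conditional distribution in the other direction.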

  • Likelihood for the frequency at the test SNP t given all data


Confused here, can we just use the expression derived above for . Also, isn't ~ and ~ . But, how do we then incorporate into the likelihood calculation?

But maybe we want to incorporate dispersion and measurement error parameters

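Then: $f^{true} \mid M \sim N_p(\hat{\mu}, \sigma^2 \hat{\Sigma})$, where $\hat{\mu}$ and $\hat{\Sigma}$ are the prior mean and covariance from the panel. The parameter $\sigma^2$ allows for over-dispersion.

$f^{obs} \mid f^{true} \sim N_p(f^{true}, \epsilon^2 I)$, where $\epsilon^2$ allows for measurement error.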
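One way to see how the two extra parameters enter a likelihood is sketched below in Python. It assumes (this is not confirmed by the notes) that the true frequencies are normal with mean `mu` and covariance `sigma2 * Sigma`, and that the observed frequencies equal the true ones plus independent normal noise of variance `eps2`. Under those assumptions the observed vector is marginally normal with mean `mu` and covariance `sigma2 * Sigma + eps2 * I`. All names here are hypothetical.

```python
import numpy as np


def log_lik_obs(f_obs, mu, Sigma, sigma2, eps2):
    """Marginal Gaussian log-likelihood of observed frequencies.

    Assumes f_true ~ N(mu, sigma2 * Sigma) (over-dispersion) and
    f_obs | f_true ~ N(f_true, eps2 * I) (measurement error), so that
    f_obs ~ N(mu, sigma2 * Sigma + eps2 * I).
    """
    f_obs = np.asarray(f_obs, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = f_obs.shape[0]

    V = sigma2 * Sigma + eps2 * np.eye(p)
    L = np.linalg.cholesky(V)              # V = L L^T
    z = np.linalg.solve(L, f_obs - mu)     # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + z @ z)


# One-SNP check: V = 1 * 1 + 1 = 2 and the residual is zero.
ll = log_lik_obs(f_obs=[0.5], mu=[0.5], Sigma=[[1.0]], sigma2=1.0, eps2=1.0)
```

In such a setup `sigma2` and `eps2` could be estimated by maximizing this function over the two parameters, though the notes do not say how they are meant to be fitted.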

and I don't understand . Shouldn't it come from (2.12) and not (2.13) - ask Matthew