User:Hussein Alasadi/Notebook/stephens/2013/10/13
From OpenWetWare
Analyzing pooled sequenced data with selection
Intro to Wen & Stephens in 2D
Suppose we have only summary-level data for haplotypes h_{1},h_{2},...,h_{2n}. Specifically, let the summary-level data be denoted by y = (y_{1}, y_{2}). We assume that in this two-locus model the first locus is typed and the second locus is untyped. We hope to predict the allele frequency at the untyped SNP using information from panel data (perhaps this can be interpreted as our prior). Formally, let y_{1} denote the allele frequency at the typed SNP and y_{2} the allele frequency at the untyped SNP. We assume that h_{1},h_{2},...,h_{2n} are independent and identically distributed draws from P(M) (our prior).
The prediction for y_{2} (in general, the vector of untyped SNPs) is a function of both the panel data (μ_{2}) and the typed SNPs (y_{1}).

Li & Stephens in 2D
We describe the Li & Stephens haplotype copying model. Let h_{1},h_{2},...,h_{k} denote the k sampled haplotypes at 2 biallelic loci, so there are 4 possible haplotypes. The first haplotype is chosen at random, with equal probability, from the four possible haplotypes. Consider now the conditional distribution of h_{k + 1} given h_{1},h_{2},...,h_{k}. Recall that the intuition is that h_{k + 1} is a mosaic of h_{1},h_{2},...,h_{k}. Let X_{j} denote which haplotype h_{k + 1} copies at site j (so X_{j} ∈ {1,...,k}). We model X_{j} as a Markov chain on {1,...,k} with P(X_{1} = x) = 1/k. The transition probabilities are:
P(X_{j + 1} = x' | X_{j} = x) = exp(-ρ_{j}d_{j}/k) + (1 - exp(-ρ_{j}d_{j}/k))(1/k) if x' = x, and (1 - exp(-ρ_{j}d_{j}/k))(1/k) otherwise.
Here ρ_{j} and d_{j} denote the recombination rate and the physical distance between sites j and j + 1, respectively.
Now in a hidden Markov model there is also the emission process. To mimic the effects of mutation, the copying process may be imperfect:
P(h_{k + 1,j} = a | X_{j} = x, h_{1},...,h_{k}) = k/(k + θ) + (1/2)θ/(k + θ) if h_{x,j} = a, and (1/2)θ/(k + θ) otherwise.
Here θ is the mutation parameter, so a mutation occurs with probability θ/(k + θ); the motivation is that the more haplotypes there are, the less frequently mutation occurs.
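To make the copying model concrete, here is a minimal Python sketch of the conditional probability of h_{k + 1} given h_{1},...,h_{k} at two loci. It assumes the standard Li & Stephens transition and emission probabilities, with alleles coded 0/1. The function name copy_prob and the values used for rho, d and theta are hypothetical, chosen only for illustration and not taken from this notebook.

```python
import math
from itertools import product


def copy_prob(new_hap, haps, rho, d, theta):
    """P(h_{k+1} = new_hap | h_1..h_k) under a two-locus copying model.

    new_hap : tuple of two alleles (0 or 1)
    haps    : list of k previously sampled two-locus haplotypes
    rho, d  : recombination rate and physical distance between the loci
    theta   : mutation parameter (treated here as a free input; an assumption)
    """
    k = len(haps)
    mut = 0.5 * theta / (k + theta)    # emit the allele that differs from the copied one
    match = k / (k + theta) + mut      # emit the copied allele

    def emit(x, j):
        return match if haps[x][j] == new_hap[j] else mut

    stay = math.exp(-rho * d / k)      # no recombination between the two loci
    jump = (1.0 - stay) / k            # recombine, then copy any haplotype uniformly

    # Forward step at site 1: uniform initial distribution over the k haplotypes.
    alpha = [emit(x, 0) / k for x in range(k)]
    total = sum(alpha)

    # Forward step at site 2, summing over the copied haplotype at both sites.
    return sum(emit(x2, 1) * (stay * alpha[x2] + jump * total) for x2 in range(k))


# Illustrative usage with made-up haplotypes and parameter values.
haps = [(0, 0), (0, 1), (1, 1)]
probs = {h: copy_prob(h, haps, rho=0.5, d=1.0, theta=1.0)
         for h in product((0, 1), repeat=2)}
for h, p in probs.items():
    print(h, round(p, 4))
```

Because the emission probabilities for the two alleles sum to one and each row of the transition matrix sums to one, the four conditional probabilities form a proper distribution over the possible haplotypes.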