Meeting with Jeong Eun
- Run simulations by sampling from 3 distributions: f0 (null), f11 (paralog, ingroup 1 replacement) and f12 (paralog, ingroup 2 replacement)
- Introduce the f1 (paralog) distribution at 5% and 25%, drawn either (a) solely from f11, (b) solely from f12, or (c) from a 50/50 mixture of both
- Measure q: the proportion of f1 density that falls in the region used to estimate f0
- Measure bias: the difference between the estimated and the true proportion of orthologs
- Measure the distance between distributions, d = sqrt(mean((f0(x) - f11(x))^2)); d is treated as a parameter.
- This was done for 2 datasets:
  - P.syr, P.put, P.aer
  - P.syr, P.put, E.coli
- The P.aer set has lower bias, but it overestimates the ortholog proportion (positive bias). Why is that considered better than the E.coli set, whose bias is negative throughout but larger in magnitude?
- The bias seems substantial. Is this consistent across other "reasonable" datasets?
- Bias was measured at the cutoff; what is it for ratios above the cutoff?
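A minimal sketch of the distance measure d = sqrt(mean((f0(x) - f11(x))^2)), assuming the mean is taken over an evenly spaced grid of x values. The notes do not say what f0 and f11 look like or over which range the mean is taken, so the normal densities, the [0, 2] range and the helper name `rms_distance` are all hypothetical.

```python
import math
from statistics import NormalDist


def rms_distance(f0, f1, lo, hi, n_points=1000):
    """Root-mean-square difference between two densities on an evenly spaced grid:
    d = sqrt(mean_x[(f0(x) - f1(x))^2]) for x in [lo, hi]."""
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    sq_diffs = [(f0(x) - f1(x)) ** 2 for x in xs]
    return math.sqrt(sum(sq_diffs) / n_points)


# Hypothetical densities: f0 (null) and f11 (paralog, ingroup 1 replacement) as normals.
f0 = NormalDist(mu=1.0, sigma=0.2).pdf
f11 = NormalDist(mu=0.5, sigma=0.2).pdf

d = rms_distance(f0, f11, lo=0.0, hi=2.0)
print(f"d = {d:.3f}")
```

Because d is a grid average, its value depends on the chosen range and grid, so the same grid should be used when comparing d across datasets.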
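A rough sketch of the simulation design (5% or 25% paralogs drawn from f11, f12 or a 50/50 mix; measure q and bias). The notes do not say how the ortholog proportion is estimated, so this assumes a Storey-style estimator: pi0_hat = P(x > cutoff) / P_f0(x > cutoff), where the region above the cutoff is taken to contain only f0. The normal component densities, the cutoff of 0.9 and the helper names (`f1_mass_above`, `expected_bias`, `simulate`) are hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical component densities; the notes do not specify their form.
F0 = NormalDist(mu=1.0, sigma=0.2)   # f0: null (orthologs)
F11 = NormalDist(mu=0.5, sigma=0.2)  # f11: paralog, ingroup 1 replacement
F12 = NormalDist(mu=0.7, sigma=0.2)  # f12: paralog, ingroup 2 replacement


def f1_mass_above(cutoff, mix_f12):
    """q: share of f1 density above `cutoff`, i.e. inside the region used to estimate f0.
    mix_f12 is the share of f1 drawn from f12 (0 = solely f11, 1 = solely f12, 0.5 = 50/50)."""
    return (1 - mix_f12) * (1 - F11.cdf(cutoff)) + mix_f12 * (1 - F12.cdf(cutoff))


def expected_bias(paralog_frac, mix_f12, cutoff):
    """Expected bias of the assumed estimator: (1 - pi0) * q / P_f0(x > cutoff)."""
    q = f1_mass_above(cutoff, mix_f12)
    return paralog_frac * q / (1 - F0.cdf(cutoff))


def simulate(n, paralog_frac, mix_f12, cutoff, seed=0):
    """Sample n values from the mixture and return (q, bias) for one run."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        if rng.random() >= paralog_frac:
            dist = F0   # ortholog
        elif rng.random() < mix_f12:
            dist = F12  # paralog drawn from f12
        else:
            dist = F11  # paralog drawn from f11
        xs.append(rng.gauss(dist.mean, dist.stdev))
    # Assumed Storey-style estimator: unbiased only when f1 has no mass above the cutoff (q = 0).
    frac_above = sum(x > cutoff for x in xs) / n
    pi0_hat = frac_above / (1 - F0.cdf(cutoff))
    bias = pi0_hat - (1 - paralog_frac)  # estimated minus true ortholog proportion
    return f1_mass_above(cutoff, mix_f12), bias


# 5% and 25% paralogs, drawn (a) solely from f11, (b) solely from f12, (c) 50/50 mixture.
for paralog_frac in (0.05, 0.25):
    for label, mix_f12 in (("a", 0.0), ("b", 1.0), ("c", 0.5)):
        q, bias = simulate(20_000, paralog_frac, mix_f12, cutoff=0.9)
        print(f"{paralog_frac:.0%} {label}: q={q:.3f} bias={bias:+.3f}")
```

Under this assumed estimator the expected bias is (1 - pi0) * q / P_f0(x > cutoff). It can only be positive, because paralog mass above the cutoff inflates the count. It therefore cannot by itself reproduce the negative bias seen for the E.coli set, and the method actually used must differ from this sketch in how f0 is estimated. Evaluating `expected_bias` or `simulate` at several cutoffs above the one used is one hypothetical way to explore the bias for ratios above the cutoff.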