Task1
Meeting with Jeong Eun
Points:
 Run simulation by sampling from 3 distributions: f0 = null; f11 = paralog, ingroup1 replacement; f12 = paralog, ingroup2 replacement
 Introduce the f1 (paralog) distribution at 5% and 25%, drawn either a) solely from f11, b) solely from f12, or c) as a 50% mixture of both
 Measure q = proportion of f1 density in the region used to estimate f0
 Measure bias = difference between the estimated proportion of orthologs and the true proportion of orthologs
 Measure distance between distributions as a parameter: d = sqrt(mean((f0(x) - f11(x))^2))
 This was done for 2 datasets:
 P.syr, P.put, P.aer
 P.syr, P.put, E.coli
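
A minimal sketch of the simulation design in the points above, to make the quantities q, bias and d concrete. Everything numeric here is an assumption for illustration and not taken from the meeting: the normal shapes in PARAMS, the REGION interval, the sample size, and the simple estimator (fraction of draws in the region divided by the f0 mass in that region). The estimator actually used in the study may differ, so the sign and size of the bias printed here say nothing about the real datasets.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# ASSUMPTION: placeholder normal shapes (mean, sd); the real f0, f11, f12 are not given in the notes.
PARAMS = {"f0": (0.0, 1.0), "f11": (2.5, 1.0), "f12": (-2.5, 1.0)}
# ASSUMPTION: hypothetical region used to estimate f0.
REGION = (-1.0, 1.0)


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def normal_mass(lo, hi, mean, sd):
    """Probability mass of N(mean, sd) on [lo, hi]."""
    def cdf(v):
        return 0.5 * (1.0 + erf((v - mean) / (sd * sqrt(2.0))))
    return cdf(hi) - cdf(lo)


def simulate(n, p1, w11):
    """Draw n values: fraction p1 from f1, of which w11 comes from f11 and the rest from f12."""
    probs = [1.0 - p1, p1 * w11, p1 * (1.0 - w11)]
    labels = rng.choice(3, size=n, p=probs)
    means = np.array([PARAMS[k][0] for k in ("f0", "f11", "f12")])
    sds = np.array([PARAMS[k][1] for k in ("f0", "f11", "f12")])
    return rng.normal(means[labels], sds[labels])


def q_in_region(w11):
    """q: proportion of f1 density falling in the region used to estimate f0."""
    return (w11 * normal_mass(*REGION, *PARAMS["f11"])
            + (1.0 - w11) * normal_mass(*REGION, *PARAMS["f12"]))


def estimate_ortholog_proportion(x):
    """ASSUMPTION: stand-in estimator that treats every draw in REGION as coming from f0."""
    in_region = np.mean((x >= REGION[0]) & (x <= REGION[1]))
    return in_region / normal_mass(*REGION, *PARAMS["f0"])


def distance(a, b, grid=np.linspace(-8.0, 8.0, 1601)):
    """d = sqrt(mean((fa(x) - fb(x))^2)) evaluated on a fixed grid."""
    fa = normal_pdf(grid, *PARAMS[a])
    fb = normal_pdf(grid, *PARAMS[b])
    return float(np.sqrt(np.mean((fa - fb) ** 2)))


results = []
for p1 in (0.05, 0.25):
    for name, w11 in (("f11 only", 1.0), ("f12 only", 0.0), ("50/50 mix", 0.5)):
        x = simulate(100_000, p1, w11)
        # bias = estimated ortholog proportion minus true ortholog proportion
        bias = estimate_ortholog_proportion(x) - (1.0 - p1)
        results.append((p1, name, q_in_region(w11), bias))
        print(f"p1={p1:.2f} {name:10s} q={q_in_region(w11):.4f} bias={bias:+.4f}")

print("d(f0, f11) =", distance("f0", "f11"))
print("d(f0, f12) =", distance("f0", "f12"))
```

With this stand-in estimator the expected bias is p1 * q / (f0 mass in the region), so it grows with both the paralog fraction and q. Swapping in the real distributions and estimator only requires replacing PARAMS, REGION and estimate_ortholog_proportion.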
Questions:
 The P.aer set has lower bias but overestimates the ortholog proportion (positive bias). Why is that considered better than the E.coli set, which is entirely negatively biased but to a larger degree?
 It seems that there is significant bias. Is this consistent for other "reasonable" datasets?
 Bias was measured at the cutoff; what is it for ratios above the cutoff?
