Meeting with Jeong Eun. Points:

  • Run simulation by sampling from 3 distributions: f0 (null), f11 (paralog, ingroup1 replacement), f12 (paralog, ingroup2 replacement)
  • Introduce the f1 (paralog) distribution at 5% and 25%, drawn either a) solely from f11, b) solely from f12, or c) as a 50% mixture of both
  • Measure q - proportion of f1 density in region used to estimate f0
  • Measure bias - difference between estimated proportion of orthologs and true proportion of orthologs
  • Measure distance between distributions (d = sqrt(mean((f0(x) - f11(x))^2))) - parameter.
  • This was done for 2 datasets:
    1. P.syr, P.put, P.aer
    2. P.syr, P.put, E.coli
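
The points above can be sketched in code. This is only a rough illustration, not the procedure actually used: the normal parameters for f0, f11 and f12, the region x >= MODE used to estimate f0, and the symmetric-null estimator of the ortholog proportion are all hypothetical stand-ins, since the notes do not specify them. q is read here as the share of f1 samples falling in the f0-estimation region, d as a root-mean-square difference of densities on a grid, and bias as estimated minus true ortholog proportion (so positive bias means overestimation).

```python
import numpy as np

# Hypothetical (mean, sd) parameters standing in for the three distributions.
F0 = (0.0, 1.0)     # null
F11 = (-2.0, 1.0)   # paralog, ingroup1 replacement
F12 = (-3.0, 1.5)   # paralog, ingroup2 replacement
MODE = 0.0          # assumed: the region x >= MODE is used to estimate f0


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def rms_distance(a, b, grid):
    """d = sqrt(mean((fa(x) - fb(x))^2)) evaluated on a grid of x values."""
    diff = normal_pdf(grid, *a) - normal_pdf(grid, *b)
    return float(np.sqrt(np.mean(diff ** 2)))


def sample_mixture(n, p_paralog, scheme, rng):
    """Draw n values: paralogs with probability p_paralog, otherwise f0.

    scheme selects the paralog source: 'f11', 'f12', or 'both' (50/50).
    Returns the samples and a boolean mask marking the paralog draws.
    """
    is_paralog = rng.random(n) < p_paralog
    if scheme == "f11":
        from_f11 = np.ones(n, dtype=bool)
    elif scheme == "f12":
        from_f11 = np.zeros(n, dtype=bool)
    elif scheme == "both":
        from_f11 = rng.random(n) < 0.5
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    paralog_x = np.where(from_f11,
                         rng.normal(*F11, size=n),
                         rng.normal(*F12, size=n))
    x = np.where(is_paralog, paralog_x, rng.normal(*F0, size=n))
    return x, is_paralog


def run_scenario(p_paralog, scheme, n=200_000, seed=0):
    """Simulate one setting and report q, the estimate, and its bias."""
    rng = np.random.default_rng(seed)
    x, is_paralog = sample_mixture(n, p_paralog, scheme, rng)
    in_region = x >= MODE
    # q: share of paralog (f1) samples that land in the f0-estimation region.
    q = float(in_region[is_paralog].mean()) if is_paralog.any() else 0.0
    # Placeholder estimator: assume f0 is symmetric about MODE, so the
    # ortholog proportion is twice the fraction of samples at or above MODE.
    estimated = min(1.0, 2.0 * float(in_region.mean()))
    true = 1.0 - float(is_paralog.mean())
    return {"q": q, "estimated": estimated, "true": true,
            "bias": estimated - true}


if __name__ == "__main__":
    grid = np.linspace(-8.0, 8.0, 1601)
    print("d(f0, f11) =", rms_distance(F0, F11, grid))
    for p in (0.05, 0.25):
        for scheme in ("f11", "f12", "both"):
            print(p, scheme, run_scenario(p, scheme))
```

With this placeholder estimator any f1 mass inside the region inflates the estimate, so the bias grows with q. A different estimator or region would change the sign and size of the bias.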


  1. The P.aer set has lower bias, but is overestimating the ortholog proportion (pos bias)? Why is that considered better than the E.coli set, which is negatively biased throughout, but to a larger degree?
  2. It seems that there is significant bias. Is this consistent for other "reasonable" datasets?
  3. Bias was measured at the cutoff; what is it for ratios above the cutoff?