User:Matthew Whiteside/Notebook/Ortholuge Development/2009/01/21

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Task1
Meeting with Jeong Eun

Points:
 * Run the simulation by sampling from 3 distributions: f0 (null), f11 (paralog, ingroup1 replacement), and f12 (paralog, ingroup2 replacement).
 * Introduce 5% and 25% of the f1 (paralog) distribution, drawn either a) solely from f11, b) solely from f12, or c) as a 50% mixture of both.
 * Measure q: the proportion of f1 density in the region used to estimate f0.
 * Measure bias: the difference between the estimated proportion of orthologs and the true proportion of orthologs.
 * Measure the distance between distributions, d = sqrt(mean((f0(x) - f11(x))^2)), as a parameter.
 * This was done for 2 datasets:
   * P.syr, P.put, P.aer
   * P.syr, P.put, E.coli

Questions:
 * 1) The P.aer set has lower bias but overestimates the ortholog proportion (positive bias). Why is that considered better than the E.coli set, which is entirely negatively biased but to a larger degree?
 * 2) There seems to be significant bias. Is this consistent for other "reasonable" datasets?
 * 3) Bias was measured at the cutoff. What is it for ratios above the cutoff?
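
A rough Python sketch of the simulation quantities discussed in the meeting (mixture sampling, q, bias, and the distance d). The real f0, f11 and f12 are not written down in these notes, so normal distributions with made-up parameters stand in for them. The names DISTS, sample_mixture, rms_distance, q_in_region and bias are hypothetical, and so is the interval [lo, hi] used as the "region used to estimate f0". q is read here as the fraction of paralog draws falling inside that region, which is only one possible reading. The ortholog-proportion estimator itself is not reproduced; bias simply takes an estimate as input.

```python
import math
import random

# ASSUMPTION: (mean, sd) of normal stand-ins; the real f0/f11/f12 are not given.
DISTS = {"f0": (0.0, 1.0), "f11": (3.0, 1.0), "f12": (4.0, 1.5)}


def normal_pdf(x, mu, sigma):
    """Normal density, used only as a placeholder for the real distributions."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sample_mixture(n, paralog_frac, scheme, rng):
    """Draw n (value, is_ortholog) pairs.

    scheme "a": paralogs solely from f11
    scheme "b": paralogs solely from f12
    scheme "c": 50% mixture of f11 and f12
    """
    if scheme not in ("a", "b", "c"):
        raise ValueError("scheme must be 'a', 'b' or 'c'")
    out = []
    for _ in range(n):
        if rng.random() < paralog_frac:
            if scheme == "a":
                name = "f11"
            elif scheme == "b":
                name = "f12"
            else:
                name = "f11" if rng.random() < 0.5 else "f12"
            is_ortholog = False
        else:
            name = "f0"
            is_ortholog = True
        mu, sigma = DISTS[name]
        out.append((rng.gauss(mu, sigma), is_ortholog))
    return out


def rms_distance(f, g, grid):
    """d = sqrt(mean((f(x) - g(x))^2)), with the mean taken over grid points."""
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in grid) / len(grid))


def q_in_region(samples, lo, hi):
    """Fraction of paralog (f1) draws inside [lo, hi] (assumed reading of q)."""
    paralogs = [x for x, is_orth in samples if not is_orth]
    if not paralogs:
        return 0.0
    return sum(1 for x in paralogs if lo <= x <= hi) / len(paralogs)


def bias(estimated_prop, true_prop):
    """Positive when the ortholog proportion is overestimated."""
    return estimated_prop - true_prop


# Example: 25% paralogs, 50/50 mixture of f11 and f12 (scheme c).
rng = random.Random(0)
samples = sample_mixture(10000, 0.25, "c", rng)
true_prop = sum(1 for _, is_orth in samples if is_orth) / len(samples)
grid = [i / 10 for i in range(-50, 101)]  # hypothetical evaluation grid
d = rms_distance(
    lambda x: normal_pdf(x, *DISTS["f0"]),
    lambda x: normal_pdf(x, *DISTS["f11"]),
    grid,
)
q = q_in_region(samples, -1.0, 1.0)  # hypothetical f0-estimation region
```

With an estimate from the actual method in hand, `bias(estimate, true_prop)` gives the signed error, so the P.aer and E.coli cases in question 1 would show up as positive and negative values respectively.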