Task 1.2: Emailed the group with some questions about the validation results:

Question 1: From the validation, you came to the conclusion that dataset 8 (P. syringae, P. putida and outgroup E. coli) performed better than dataset 7 (P. syringae, P. putida and outgroup P. aeruginosa). When I looked closer at the bias values for dataset 7 vs. dataset 8, in most cases the bias was smaller in magnitude for dataset 7 (although some biases were positive). How did dataset 8 outperform dataset 7? Did you focus on some other indicator besides bias?

(the plots are attached)

Question 2: Bias is the difference between the estimated proportion of orthologs and the true proportion of orthologs at the cutoff bin. What was the cutoff you used? What would the bias be like for other bins? I am curious because one of the things I do in Ortholuge is assign each ortholog pair above the cutoff a probability(non-ssd) value equal to the estimated proportion of non-ssds in the bin.

3) This is not really a question, just a comment. It seems from the bias values that the method is conservative (estimating fewer orthologs). One thing I would like to find out is whether the bias is reasonably consistent for other real datasets. I know dataset 7 was inappropriate, but I would like to see what the bias is like for other valid datasets. I am going to work on generating some more datasets, and then I will get the scripts from Jeong Eun and run them to see how these other datasets do.
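
Note to self: a minimal sketch of the per-bin bias asked about in Question 2, assuming bias is simply (estimated proportion of orthologs) minus (true proportion of orthologs) in each bin. The function names `per_bin_bias` and `is_conservative` are hypothetical, and the numbers are made-up placeholders, not values from the validation. This is not the group's script, only a way to check the sign convention (negative bias means fewer orthologs estimated, i.e. conservative).

```python
def per_bin_bias(estimated, true):
    """Return bias per bin: estimated proportion minus true proportion.

    Both arguments are sequences of proportions (0..1), one entry per bin,
    listed in the same bin order.
    """
    if len(estimated) != len(true):
        raise ValueError("estimated and true must have one value per bin")
    return [e - t for e, t in zip(estimated, true)]


def is_conservative(bias):
    """True if the estimate is below the true proportion (negative bias)."""
    return bias < 0


# Made-up placeholder proportions for two bins (NOT validation results).
estimated = [0.5, 0.75]
true = [0.75, 0.5]

for bin_index, bias in enumerate(per_bin_bias(estimated, true)):
    label = "conservative" if is_conservative(bias) else "not conservative"
    print(f"bin {bin_index}: bias = {bias:+.2f} ({label})")
```

Running the same calculation over every bin, not just the cutoff bin, would show whether the bias stays consistent across bins.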