Looked at Thea's approach. Solid ideas. Here are my comments:
- Pairwise focused. InParanoid's strategy is to be permissive initially, create clusters, and then divide and prune the clusters later. Sometimes this is better than making decisions as you parse your top BLAST hits. E.g. you have a close call between two candidate orthologs and choose one, but then in another species the one you chose is not the ortholog and the one you passed over is.
- Better strategy might be to build a linkage graph and look for ties. This might be more flexible when adding species.
- BLAST alignment rules? 50% coverage? Consider bit score as a tie breaker?
- Incorrect inparalog definition. Inparalogs don't have to be reciprocal: one gene can have multiple inparalogs, and they don't all have to be the gene's top hit.
- The InParanoid definition is permissive, probably intended as a first step. Need to resolve ownership afterwards (i.e. overlap between inparalogs).
- Like the synteny approach
- Adding genes that are only RBBH to one member of a group may be questionable. Should consider some kind of cluster trimming.
- Very aware of RBBH conflicts that arise.
- Ties only allowed between pairwise genes?? (see step B.c.ii in Thea's document)
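
To make the linkage-graph suggestion above concrete, here is a rough sketch. It is not InParanoid's algorithm and not Thea's method, just one way the idea could look. Assumptions (all mine): hits arrive as (query, subject, bit_score, coverage) tuples, gene IDs look like "species|gene", coverage is a fraction with a 0.5 cutoff, and equal bit scores are kept as ties rather than broken. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed hit record: (query, subject, bit_score, coverage).
# Assumed gene ID format: "species|gene".

def species_of(gene):
    return gene.split("|", 1)[0]

def best_hits(hits, min_coverage=0.5):
    """Map (query, subject species) -> set of top-scoring subjects, keeping ties."""
    top = {}  # (query, subject species) -> (best score, {subjects})
    for query, subject, score, coverage in hits:
        # Skip low-coverage hits and within-species hits.
        if coverage < min_coverage or species_of(query) == species_of(subject):
            continue
        key = (query, species_of(subject))
        best_score, subjects = top.get(key, (float("-inf"), set()))
        if score > best_score:
            top[key] = (score, {subject})
        elif score == best_score:
            subjects.add(subject)  # tie: keep both instead of choosing one
    return {key: subjects for key, (_, subjects) in top.items()}

def linkage_graph(hits, min_coverage=0.5):
    """Undirected graph with an edge for every reciprocal best hit (ties included)."""
    best = best_hits(hits, min_coverage)
    graph = defaultdict(set)
    for (query, _), subjects in best.items():
        for subject in subjects:
            if query in best.get((subject, species_of(query)), set()):
                graph[query].add(subject)
                graph[subject].add(query)
    return graph

def clusters(graph):
    """Connected components of the graph: permissive clusters to prune later."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(graph[gene] - component)
        seen |= component
        result.append(component)
    return result

def tied_genes(graph):
    """Genes linked to more than one gene in the same other species."""
    tied = set()
    for gene, neighbours in graph.items():
        counts = defaultdict(int)
        for neighbour in neighbours:
            counts[species_of(neighbour)] += 1
        if any(count > 1 for count in counts.values()):
            tied.add(gene)
    return tied

# Made-up example hits: a1 ties between b1 and b2; a2/b3 fail the coverage cutoff.
hits = [
    ("A|a1", "B|b1", 200, 0.9), ("B|b1", "A|a1", 200, 0.9),
    ("A|a1", "B|b2", 200, 0.8), ("B|b2", "A|a1", 200, 0.8),
    ("A|a2", "B|b3", 150, 0.3), ("B|b3", "A|a2", 150, 0.3),
]
graph = linkage_graph(hits)
print(clusters(graph))
print(tied_genes(graph))
```

Adding a species only adds nodes and edges, so no earlier pairwise decision has to be revisited, and the ties and cluster trimming can be resolved in a separate pass over the components.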