Tregwiki:Finding interspecies-orthologs

These tools simplify the search for all homologous genes across as many species as possible. Apart from Unigene, all of them are built from massive protein-to-genome or genome-to-genome BLAST runs:


 * THOR is designed for low-coverage genomes. It tries to infer the order of contigs by aligning them to the human genome. This of course only works for species close to human.
 * Inparanoid is a database of orthologous proteins between 26 model organisms, plus a downloadable Perl script that should run on any machine (beware of the BLAST version issue described in the README file)
 * Compara, the part of Ensembl that handles comparative genomics, can be used for this. Search for a gene in Ensembl and it will display all orthologs from all sequenced model species. You can use biomart.org to download the Compara data. As of this writing (01/2008) there is still no paper describing the Compara pipeline, and Compara still doesn't seem to include synteny. In addition, when using Compara, as with everything related to Ensembl, make sure that you clearly keep track of the Ensembl version you used. Compara data might suddenly change from one version to the next, and since Ensembl updates its complete database every few months (unlike UCSC, which only updates the programs), you might end up with "version issues" in your database.
 * Homologene covers 17 main model organisms
 * Unigene is not directly usable but can give some hints
 * UCSC Proteome Browser: the well-known browser imports data from Unigene and also points to similarities, but these seem to be deduced from gene names rather than from comprehensive searches
 * EGO from TIGR is yet another cross-BLAST clustering database
 * Metazome contains more than 26 genomes, which you can search for protein clusters, align, and display the domains of. What makes Metazome interesting is its "neighbors" view: type in a gene name and it immediately displays all 5 flanking genes. With this you can easily discover genes located in syntenic regions.
 * Comparative Plant Genomics at Berkeley lets you view two genomes and their conserved elements around homologous genes, using short non-coding BLAST hits
 * Multiparanoid is like Inparanoid, but outputs clusters. Yeah!
 * OrthoMCL uses MCL to build gene clusters and identify orthologs. The basic difference between OrthoMCL and Inparanoid is that Inparanoid gives you only pairwise results and OrthoMCL creates clusters across ALL genomes.
 * TribeMCL (http://www.ebi.ac.uk/research/cgg/tribe/) is similar to OrthoMCL but older; it may be used in Compara.
 * Other options include TreeFam, though it is not really meant for this task and includes only selected organisms, and TribeMCL, which is well integrated into BioPerl. However, in a recent comparison of various orthology-detection techniques, Inparanoid and OrthoMCL scored best.
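To make the difference between pairwise results (Inparanoid) and clusters across all genomes (OrthoMCL) concrete, here is a minimal Python sketch. It assumes all-vs-all BLAST output in tabular format (`-outfmt 6`, bitscore in column 12) and uses plain reciprocal best hits as a simplified stand-in for the pairwise step: Inparanoid starts from reciprocal best hits but also adds in-paralogs and confidence scores, which are omitted here. The clustering step uses simple connected components (single linkage), not MCL as in OrthoMCL. The function names and the toy gene IDs are hypothetical.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Return {query: best-scoring subject} from BLAST -outfmt 6 lines (bitscore = column 12)."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairwise step: pairs (a, b) where b is a's best hit AND a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def cluster_pairs(pairs):
    """Cluster step (hypothetical stand-in for MCL): connected components via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())


# Toy data (made-up IDs): human-vs-mouse and mouse-vs-human BLAST hits.
hs_vs_mm = [
    "hsA\tmmA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "hsA\tmmB\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-10\t80",
    "hsB\tmmB\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t150",
]
mm_vs_hs = [
    "mmA\thsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "mmB\thsA\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-10\t160",
    "mmB\thsB\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t150",
]
print(reciprocal_best_hits(hs_vs_mm, mm_vs_hs))  # {('hsA', 'mmA')}
```

In the toy data, hsB's best hit is mmB, but mmB's best hit is hsA, so only hsA/mmA survives as a reciprocal pair. Feeding pairs from several species pairs into `cluster_pairs` then merges them into groups spanning all genomes, e.g. `cluster_pairs({("hsA", "mmA"), ("mmA", "drA")})` gives one cluster containing all three genes.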

For closer model species, you can project features from one genome to the other, based on whole genome (nucleotide-based) alignments:
 * Genmapper seems to be one of the better programs (Projector and Genewise are two other options)
 * Net and chain tracks at UCSC can be used to map from one species to the other, and the UCSC liftOver tool does this automatically (it can be downloaded as part of Jim Kent's source tree)
 * Do not forget Ensembl's MultiContigView; there is nothing similar at UCSC. It displays several genomes one above the other, and elements that can be aligned (only with BLAT?) are connected by lines. Try a developmental gene with highly conserved flanking regions and follow the compaction of genomes down to fugu. Impressive visualization.
 * If you need a less cluttered visualization and more sensitivity, try Gata from the Eisen lab. It runs BLAST on your regions and then visualizes the results in a synteny plot. If your regions are longer, try syn-apollo from my toolbox.
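The idea behind chain-based projection, which is what liftOver does with UCSC chain files, can be sketched in a few lines of Python. This is a simplified illustration, not liftOver itself: it reads a single chain in the UCSC chain format (a `chain` header line, then `size dt dq` lines, then a final `size` line, all zero-based), handles only plus-strand query chains, and maps single positions. Positions that fall into alignment gaps return `None`. The function names and the toy chain are hypothetical.

```python
def parse_chain(text):
    """Parse one UCSC-format chain into ungapped blocks.

    Header fields: chain score tName tSize tStrand tStart tEnd
                   qName qSize qStrand qStart qEnd id
    Returns (t_name, q_name, blocks), each block = (t_start, q_start, size).
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    t_name, t_start = header[2], int(header[5])
    q_name, q_strand, q_start = header[7], header[9], int(header[10])
    if q_strand != "+":
        # Minus-strand chains need coordinate flipping; omitted in this sketch.
        raise ValueError("only plus-strand query chains are supported here")
    blocks = []
    t, q = t_start, q_start
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t, q, size))
        t += size
        q += size
        if len(fields) == 3:  # gap before the next block: dt (target), dq (query)
            t += int(fields[1])
            q += int(fields[2])
    return t_name, q_name, blocks


def lift_position(pos, blocks):
    """Map a zero-based target position to the query genome, or None if unaligned."""
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None


# Toy chain: two aligned blocks separated by a 5 bp target gap and a 10 bp query gap.
toy = """chain 1000 chr1 1000 + 100 130 chrX 2000 + 500 535 1
10 5 10
15
"""
_, _, blocks = parse_chain(toy)
print(lift_position(105, blocks))  # 505
print(lift_position(112, blocks))  # None: falls in a target gap
```

The real tool additionally handles minus-strand chains, whole intervals that may be split across blocks, and a minimum fraction of bases that must map; this sketch only shows the block-by-block coordinate arithmetic.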