Wilke:Accessing Eukaryote genome alignments

Ensembl (www.ensembl.org) is an amazing resource: it hosts over 30 eukaryote genomes that are heavily annotated and aligned. You can query its public database using a tool in pycogent (http://pycogent.sourceforge.net/). Much of this example follows the pycogent documentation at http://pycogent.sourceforge.net/examples/query_ensembl.html. That documentation also goes into detail about things I'm not interested in but that may be useful for your research, so it's probably a good idea to check it out.

I should probably compare and contrast this Ensembl interface with the UCSC interface available in pygr. When I looked, pygr seemed to be updated somewhat less often, and its documentation was somewhat unclear. Still, it might be a good alternative to the Ensembl database (downloading would certainly be much faster, because Ensembl is based in Europe).

Pycogent lets you do some very powerful searches and collections in a few lines of code. The main downside is that it is a huge and complex Python package with lots of non-standard dependencies (i.e., ones that probably won't be installed on your system by default). If I have time and remember, I'll list the things I had to do to get it working on my system. Another downside is that, while its overall scope is similar to that of Biopython (www.biopython.org; another valuable bioinformatics research tool), its API design and function-naming conventions are radically different, which causes a headache when switching back and forth between the two. I also find the function names excessively long, but at least they are explicit.

Anyway, let's assume you have gotten pycogent installed. You can now do lots of cool things. For instance, the following code will grab all known orthologs (i.e., the genes in other species that share a common ancestor with the gene you give it) of the human gene BRCA2 (I'm riffing off the pycogent tutorial, so I don't remember what this gene does or why it's interesting).

FINISH CODE
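
Here is a minimal sketch of what such a query might look like, adapted from the pycogent Ensembl tutorial linked above rather than taken from my own finished script. Several things are assumptions: the species list, the release number (use whichever release your pycogent version supports), and the choice of the 'ortholog_one2one' relationship, which returns only one-to-one orthologs rather than every homolog. The function name fetch_brca2_orthologs is just something made up for this example. It also assumes pycogent's Ensembl dependencies (a MySQL driver and SQLAlchemy) are installed and that the public Ensembl server is reachable.

```python
# Sketch only: assumes PyCogent (the `cogent` package) and its Ensembl
# dependencies are installed, and that the public Ensembl MySQL server
# is reachable from your machine.

BRCA2_STABLE_ID = "ENSG00000139618"  # Ensembl stable ID for human BRCA2
SPECIES = ["human", "mouse", "rat"]  # assumption: species to compare
RELEASE = 58  # assumption: choose a release your PyCogent supports


def fetch_brca2_orthologs(species=SPECIES, release=RELEASE):
    """Return the one-to-one orthologs of human BRCA2 among `species`.

    Hypothetical helper name; the calls inside follow the PyCogent
    Ensembl tutorial.
    """
    # Imported here so this file can be loaded without PyCogent installed.
    from cogent.db.ensembl import Compara

    # account=None falls back to the public Ensembl server.
    compara = Compara(species, Release=release, account=None)

    # Look up the human gene by its stable ID.
    brca2 = compara.Human.getGeneByStableId(StableId=BRCA2_STABLE_ID)

    # Ask the Compara database for genes related to it.
    return compara.getRelatedGenes(gene_region=brca2,
                                   Relationship="ortholog_one2one")


if __name__ == "__main__":
    orthologs = fetch_brca2_orthologs()
    # Each member is a gene record from one of the requested species.
    for gene in orthologs.Members:
        print(gene)
```

The query hits a server in Europe, so expect it to take a while. Adding more species to the list should widen the search, at the cost of more waiting.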