Wikiomics:Ensembl local install draft

From OpenWetWare

Local Ensembl install

There are several main configurations for installing Ensembl locally, listed here with increasing level of complexity.

Running virtual image from Eagle Genomics

The current versions (last: v59, August 2010) of EagleBrowser are here:

It is an Ubuntu Karmic Koala system image to be run inside a VMware virtual machine. You need to sign up and download VMware Workstation, plus have linux-headers-$version installed. EagleBrowser connects to the public Ensembl MySQL database, but stores user data locally in the MySQL database ensembl_web_user_db.

Pros: simplest to install, gives a chance to look at a working ENSEMBL setup. One can use ssh (user: ensembl, passwd: ensembl) to connect to it from the host machine.

After any host kernel update VMware will need some time to reconfigure itself.

remote MySQL Ensembl DB + local SQLite for ensembl_web_user_db (defunct)

So far not successfully tested. Potentially useful for testing all the components required by Ensembl except the connection to a local MySQL. There is a special Ensembl plugin (./public-plugins/sqlite/) but no information on how to make it work.

CAVEAT: According to the ENSEMBL news group this is a dead end, as there is a lot of MySQL-specific code that is not easily portable to SQLite.

remote MySQL Ensembl DB + local MySQL for ensembl_web_user_db

Tested on Ubuntu 9.10 and on a virtual Debian 5.04 32bit. Useful for testing all the components required by Ensembl, plus finding discrepancies between the local and remote MySQL DBs.

local MySQL Ensembl DB all the way for mirroring main Ensembl site

Probably the most common setup, allowing a possible speedup of Ensembl connections.

local MySQL Ensembl DB all the way for custom species

Default setup for groups annotating novel genomes.

any of the above (except EagleBrowser) inside a virtual machine

Greater flexibility, clean de novo installations of an OS, easy migration between machines. Cons: some CPU/networking(?) overhead, greater overall complexity.


Save for EagleBrowser, which comes as a ready-to-run system, all other kinds of installation require multiple programs, Perl modules, and at least some configuration. The following procedures have so far been executed on these systems (all workstation versions):

  • Fedora 8 64bit (workstation)
  • Ubuntu 9.10 64bit
  • Debian 5.04 32bit run inside VirtualBox 3.1
  • Debian 5.04 64bit run inside VirtualBox 3.1

Some Perl module versions have changed since the first installation.

The Debian 5.04 installs seemed to be the easiest, but this may simply be because I had already documented the Fedora/Ubuntu installs. Also, the Debian installs were done on otherwise clean systems.

The following procedures are records for "Debian 5.04 64bit run inside VirtualBox 3.1".


  • Fedora 8 64bit as host
  • VirtualBox 3.1.8
  • Debian 5.04 64bit with LXDE:

I installed only Debian inside VirtualBox. To use a 64bit guest or multiple cores/processors you have to switch VT-x on in: VirtualBox > Debian_5.04_64bit > System > Acceleration

On my system VT-x was not set up by default, so I had to enable it (reboot, enter Setup, etc.).
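Before rebooting into Setup it can help to check whether the host CPU advertises hardware virtualization at all. The following is a minimal sketch, not part of the original procedure: has_hw_virt is a hypothetical helper name, and it only looks for the vmx (Intel VT-x) or svm (AMD-V) flag in a cpuinfo-style file.

```shell
# Hypothetical helper: succeeds if the CPU flags list vmx (Intel VT-x) or svm (AMD-V).
# $1: optional path to a cpuinfo-style file (defaults to /proc/cpuinfo)
has_hw_virt() {
    grep -E -q '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}
```

Running "has_hw_virt && echo supported" on the host gives a quick answer. The flag only shows that the CPU supports the feature; it may still be disabled in the BIOS, which is what the reboot into Setup fixes. On the VirtualBox side, the same switch should also be reachable from the command line via "VBoxManage modifyvm <vm name> --hwvirtex on"; check the VBoxManage help of your version first.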

Debian packages

Divided into groups for clarity. A few of these may not be needed, but this was not tested.

apt-get install ssh bzip2 libbz2-dev unzip 

apt-get install gcc g++ make

apt-get install  cvs subversion git-core 

apt-get install expat libxmltok1 libxmltok1-dev zlib1g-dev

apt-get install mysql-server libmysqlclient15-dev 
#installs by default also libnet-daemon-perl libdbi-perl  libdbd-mysql-perl libhtml-template-perl 

apt-get install libgd2-xpm fontconfig libgd-tools
#libgd2-xpm-dev ??

apt-get install memcached

We will need Microsoft fonts down the line. On Debian edit: /etc/apt/sources.list Change:

deb lenny main 
deb-src lenny main 

to:

deb lenny main non-free contrib
deb-src lenny main non-free contrib
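The same edit can be sketched with sed instead of a text editor. This is only an illustration under assumptions: enable_nonfree is a hypothetical helper name, it prints the edited file to stdout rather than changing it in place, and it touches only "deb"/"deb-src" lines that end in "lenny main".

```shell
# Hypothetical helper: append "non-free contrib" to deb/deb-src lines ending in "lenny main".
# $1: path to a sources.list-style file; the edited content goes to stdout.
enable_nonfree() {
    sed -e '/^deb\(-src\)\{0,1\}[[:space:]].*lenny[[:space:]]\{1,\}main[[:space:]]*$/s/[[:space:]]*$/ non-free contrib/' "$1"
}
```

A cautious way to use it is to write the result to a new file (for example "enable_nonfree /etc/apt/sources.list > /tmp/sources.list.new"), compare the two with diff, and only then copy the new file over the original as root. Lines that already list non-free and contrib are left alone, so running it twice does no harm.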


apt-get update
apt-get install ttf-mscorefonts-installer
locate arial.ttf
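locate relies on a file database that may not have been refreshed since the fonts were installed, so a freshly installed arial.ttf can be missed. As an alternative, here is a small sketch that searches the file system directly; find_font is a hypothetical helper name, and /usr/share/fonts in the usage note is only an assumed location.

```shell
# Hypothetical helper: look for a font file by name (case-insensitive) below given directories.
# $1: font file name, remaining arguments: directories to search
find_font() {
    name=$1
    shift
    find "$@" -type f -iname "$name" 2>/dev/null
}
```

For example, "find_font arial.ttf /usr/share/fonts" prints the full path of every match, or nothing if the font is not there.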

Perl

Got perl-5.12.1:

tar xfvz perl-5.12.1.tar.gz
cd perl-5.12.1/
CFLAGS='-m64 -mtune=nocona'  ./Configure -des -A ccflags=-fPIC -Dprefix=/home/ensembl/local/ -Dusethreads
make test
make install

The "CFLAGS" line is required on a 64-bit Linux system to compile mod_perl later on. For the 32bit Debian

./Configure -Dprefix=/home/ensembl/local/ 

was enough.
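After "make install" it is worth confirming that the new perl really carries the options given to Configure. A minimal sketch, assuming the install prefix used above; perl_config is a hypothetical helper name that simply reads one key from Perl's own Config module.

```shell
# Hypothetical helper: print one build-time setting of a given perl binary.
# $1: path to the perl binary, $2: name of a %Config key (e.g. usethreads, ccflags)
perl_config() {
    "$1" -MConfig -e 'my $v = $Config{$ARGV[0]}; print defined $v ? $v : "", "\n"' "$2"
}
```

With the 64-bit build above, "perl_config /home/ensembl/local/bin/perl usethreads" should print "define", and "perl_config /home/ensembl/local/bin/perl ccflags" should include -fPIC. If either is missing, mod_perl is likely to fail later, so it is cheaper to rebuild Perl now.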

Apache httpd

Got httpd-2.2.15.tar.bz2:


tar xfvj httpd-2.2.15.tar.bz2
cd httpd-2.2.15/

./configure  --enable-deflate --enable-headers --enable-expires --prefix=/home/ensembl/local/apache2
make install

Checking that the specified modules are built in:

/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep deflate
/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep expires
/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep headers
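The three grep calls can be folded into one check that reports only what is missing. This is a sketch under assumptions: missing_modules is a hypothetical helper name, and it expects the module dump as text in which each module appears as "<name>_module".

```shell
# Hypothetical helper: print the names of modules that do NOT appear in a module dump.
# $1: text of the module dump, remaining arguments: module names without the "_module" suffix
missing_modules() {
    dump=$1
    shift
    for m in "$@"; do
        printf '%s\n' "$dump" | grep -q "${m}_module" || printf '%s\n' "$m"
    done
}
```

Usage would look like: missing_modules "$(/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES 2>&1)" deflate expires headers. No output means all three modules were compiled in.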

mod_perl for Apache 2.x

Got mod_perl-2.0.4:

tar xfvz  mod_perl-2.0-current.tar.gz

export PATH=/home/ensembl/local/bin/:$PATH

cd mod_perl-2.0.4
perl Makefile.PL MP_APXS=/home/ensembl/local/apache2/bin/apxs 
make test
make install

Perl modules required by ENSEMBL

This assumes that you installed Perl in /home/ensembl/local/ and have the perl binary in /home/ensembl/local/bin/

Check the list of modules here i.e:

There are several versions of this list of modules, but ultimately you may still be missing several unlisted modules; you will get their names (one by one) when trying to start your ENSEMBL site.
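Instead of discovering missing modules one at a time, a list of candidates can be tested in one go by asking perl to load each of them. A minimal sketch: missing_perl_modules is a hypothetical helper name, and the PERL variable is an assumed way to point it at the locally built perl.

```shell
# Hypothetical helper: print every module name that the chosen perl cannot load.
# Set PERL to pick a specific binary, e.g. PERL=/home/ensembl/local/bin/perl
missing_perl_modules() {
    perl_bin=${PERL:-perl}
    for mod in "$@"; do
        "$perl_bin" -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}
```

For example: PERL=/home/ensembl/local/bin/perl missing_perl_modules CGI::Session Hash::Merge PDF::API2. This only proves that a module loads, not that its version is the one Ensembl expects.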

Despite the advice to always install the newest module versions, there is one important exception: LWP. LWP version 5.812 is required by the latest (2.57) ParallelUserAgent. This will be covered in a separate section of this page.

Also, some modules do not install (at least on my machine) from Perl's CPAN shell. These may require installation by hand from sources (described later).

CPAN Shell

Easy things first (these should install automatically, e.g. from a script):

export PATH=/home/ensembl/local/bin/:$PATH
which perl
      # ~/local/bin/perl
perl -MCPAN -e shell

install Cache::Memcached 
        # Cache-Memcached-1.28.tar.gz
install CGI
install CGI::Ajax
        # CGI-Ajax-0.707.tar.gz
install CGI::Session
        # CGI-Session-4.42.tar.gz

install Class::Accessor
        # Class::Accessor is up to date (0.34). // checked after installation of all modules
install Class::Data::Inheritable 
        # Class::Data::Inheritable is up to date (0.08). // checked after inst. of all modules
install Class::Std 
	# Class-Std-0.011.tar.gz
install Class::Std::Utils 
        # Class-Std-Utils-v0.0.3.tar.gz

install Compress::Zlib
	# Compress::Zlib is up to date (2.027).
install Compress::Raw::Zlib
        # Compress-Raw-Zlib-2.027.tar.gz
install Compress::Bzip2
	# Compress-Bzip2-2.09.tar.gz

install Devel::StackTrace 
        # Devel-StackTrace-1.22.tar.gz
install Data::UUID
        # Update 2010-05-28: Data-UUID-1.215.tar.gz
install Digest::MD5
	#Digest::MD5 is up to date (2.39).
install Exception::Class 
        # Exception-Class-1.30.tar.gz

install File::Temp 
	# File::Temp is up to date (0.22)

install Hash::Merge
	# Hash-Merge-0.12.tar.gz

install Storable
	#Storable is up to date (2.22).

install	PDF::API2
	# PDF-API2-0.73.tar.gz
install Spreadsheet::WriteExcel	
	# Spreadsheet-WriteExcel-2.37.tar.gz
install OLE::Storage_Lite
	# OLE::Storage_Lite is up to date (0.19)

install	Mail::Mailer
	# MailTools-2.06.tar.gz
install Math::Bezier
	# Math-Bezier-0.01.tar.gz
install IO::String
	# IO-String-1.08.tar.gz
install Image::Size
	# Image-Size-3.221.tar.gz

install List::MoreUtils

install Number::Format
        # Number-Format-1.73.tar.gz
install Time::HiRes
        # Time-HiRes-1.9721.tar.gz

install BSD::Resource  
	# BSD-Resource-1.2904.tar.gz
install Sys::Hostname::Long
	# Sys-Hostname-Long-1.4.tar.gz

install MIME::Types 
install IPC::Run
install RTF::Writer
       # RTF-Writer-1.11.tar.gz
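Since the modules above are expected to install without intervention, the interactive session can be replaced by a small script. The sketch below is one possible way, not the procedure actually used here: install_modules and the DRY_RUN variable are hypothetical names, and it relies on CPAN.pm's programmatic CPAN::Shell->install call. It assumes CPAN has already been configured once interactively.

```shell
# Hypothetical helper: install each named module through CPAN.pm, one at a time.
# With DRY_RUN set to a non-empty value it only prints what it would do.
install_modules() {
    for mod in "$@"; do
        if [ -n "$DRY_RUN" ]; then
            echo "would install $mod"
        else
            perl -MCPAN -e 'CPAN::Shell->install($ARGV[0])' "$mod"
        fi
    done
}
```

After "export PATH=/home/ensembl/local/bin/:$PATH", a call such as "install_modules Cache::Memcached CGI CGI::Ajax CGI::Session" would work through the list in order. CPAN fetches whatever version is current, so the version numbers recorded above will not be reproduced.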

database work

create database test_beta_core_2_57_1;

mysql  test_beta_core_2_57_1  <   ENSEMBL_plant_dbs/

cd /home/ensembl/local/ensembl/ensembl/sql
for file in  patch_56_57_?.sql; do  mysql test_beta_core_2_57_1  < $file; done
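The loop above keeps going when a patch fails, which can leave the schema half-patched without an obvious error. A slightly more defensive sketch follows; apply_patches is a hypothetical helper name, and the MYSQL variable is an assumed hook for passing a different client command (it defaults to plain mysql).

```shell
# Hypothetical helper: apply SQL patch files to a database, stopping at the first failure.
# $1: database name, remaining arguments: patch files, applied in the order given
apply_patches() {
    db=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || { echo "no such patch file: $f" >&2; return 1; }
        echo "applying $f" >&2
        ${MYSQL:-mysql} "$db" < "$f" || { echo "failed on $f" >&2; return 1; }
    done
}
```

From the sql directory this would be called as: apply_patches test_beta_core_2_57_1 patch_56_57_?.sql. The "?" matches exactly one character, so the glob only picks up patches with single-letter suffixes; if nothing matches, the unexpanded pattern is caught by the file check.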

INSERT INTO test_beta_core_1_57_1.meta SELECT * FROM beta_fake_core_1_57_1.meta ;
#in case you have patch entries:
INSERT INTO test_beta_core_2_57_1.meta SELECT * FROM test_beta_core_1_57_1.meta WHERE species_id=1;

export PATH=/home/ensembl/local/bin/:$PATH

export PERL5LIB=/home/ensembl/local/ensembl/bioperl-live/:/home/ensembl/local/ensembl/ensembl/modules:$PERL5LIB

perl ensembl-pipeline/scripts/ \
-dbhost localhost  -dbuser ensembl -dbpass secret1 -dbname test_alpha_core_1_57_1 \
-coord_system_name scaffold -rank 4 -sequence_level -coord_system_version BV02  \
-fasta_file 454Scaffolds.fa

UPDATE test_beta_core_1_57_1.coord_system SET version='BV02', attrib='default_version,sequence_level';

#loading genes from GFF files:
 perl -e ./ensembl.registry.test_beta -s Test -l non_zeros  
 perl -e ./ensembl.registry.test_beta -s Test -l zeros 

#fixing the analysis_description table:

INSERT INTO test_beta_core_1_57_1.analysis_description select * from  analysis_description;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

#check the analysis_id in  test_beta_core_1_57_1.analysis
#UPDATE if necessary!
#I had to insert the values twice and change the analysis_id to "40" and "41"
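To see which analysis_id values still need a description, the analysis and analysis_description tables can be compared directly. The sketch below only generates the query; orphan_analysis_sql is a hypothetical helper name, and it assumes the standard core schema columns analysis.analysis_id, analysis.logic_name and analysis_description.analysis_id.

```shell
# Hypothetical helper: print a query listing analyses that have no analysis_description row.
# $1: name of the core database
orphan_analysis_sql() {
    cat <<EOF
SELECT a.analysis_id, a.logic_name
FROM $1.analysis a
LEFT JOIN $1.analysis_description ad ON ad.analysis_id = a.analysis_id
WHERE ad.analysis_id IS NULL;
EOF
}
```

Piping the result into the client, e.g. "orphan_analysis_sql test_beta_core_1_57_1 | mysql", lists the analysis_id values that the inserted description rows have to be updated to. An empty result means every analysis already has a description.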