Wikiomics:Ensembl local install draft

=Local Ensembl install=

There are several main configurations for installing Ensembl locally, listed here in order of increasing complexity.

Running virtual image from Eagle Genomics
The current versions (last: v59, August 2010) of EagleBrowser are here: http://www.eaglegenomics.com/downloads/eaglebrowser/

It is an Ubuntu Karmic Koala system image to be run inside a VMware virtual machine. You need to sign up and download VMware Workstation (from http://www.vmware.com) and have linux-headers-$version installed. EagleBrowser connects to the public Ensembl MySQL database, but stores user data locally in the MySQL database ensembl_web_user_db.

Pros: simplest to install; gives a chance to look at a working ENSEMBL setup. One can use ssh (user: ensembl, passwd: ensembl) to connect to it from the host machine.

After any host kernel update VMware will need some time to reconfigure itself.

remote MySQL Ensembl DB + local SQLite for ensembl_web_user_db (defunct)
So far not successfully tested. Potentially useful for testing all the components required by Ensembl except the connection to a local MySQL. There is a special Ensembl plugin (./public-plugins/sqlite/), but no information on how to make it work.

CAVEAT: According to the ENSEMBL newsgroup this is a dead end, as there is a lot of MySQL-specific code that is not easily portable to SQLite.

remote MySQL Ensembl DB + local MySQL for ensembl_web_user_db
Tested on Ubuntu 9.10 and on a virtual Debian 5.04 32bit. Useful for testing all the components required by Ensembl, plus finding discrepancies between the local and the remote MySQL DB.

local MySQL Ensembl DB all the way for mirroring main Ensembl site
Probably the most common setup, allowing a possible speedup of Ensembl connections.

local MySQL Ensembl DB all the way for custom species
Default setup for groups annotating novel genomes.

any of the above (except EagleBrowser) inside a virtual machine
Pros: greater flexibility, clean de novo installations of an OS, easy migration between machines. Cons: some CPU/networking(?) overhead, greater overall complexity.

=Installation=

Save for EagleBrowser, which comes as a ready-made system, all other kinds of installation require multiple programs, Perl modules, and at least some configuration. The following procedures have been executed so far on these systems (all workstation installs):
 * Fedora 8 64bit (workstation)
 * Ubuntu 9.10 64bit
 * Debian 5.04 32bit run inside VirtualBox 3.1
 * Debian 5.04 64bit run inside VirtualBox 3.1

Some Perl module versions have changed since the first installation.

The Debian 5.04 installs seemed to be the easiest, but this may simply be because I had already documented the Fedora/Ubuntu installs. Also, the Debian installs were done on otherwise clean systems.

The following procedures are the records for "Debian 5.04 64bit run inside VirtualBox 3.1".

System(s)
 * Fedora 8 64bit as host
 * VirtualBox 3.1.8
 * Debian 5.04 64bit with LXDE: http://cdimage.debian.org/debian-cd/5.0.4/amd64/iso-cd/debian-504-amd64-xfce+lxde-CD-1.iso

I installed only Debian inside VirtualBox. For a 64bit guest, or to use multiple cores/processors, you have to switch on VT-x in: VirtualBox > Debian_5.04_64bit > System > Acceleration

On my system VT-x was not enabled by default, so I had to enable it first (reboot, enter BIOS Setup, etc.).
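Before rebooting into Setup it may be worth checking from the host whether the CPU advertises hardware virtualization at all. The following is a minimal sketch assuming a Linux host; the function name `has_hw_virt` and its optional file argument are made up for illustration. The flag only shows CPU support, so the feature may still be switched off in the BIOS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the CPU flags include vmx (Intel VT-x) or svm (AMD-V).
# The optional argument is a cpuinfo-style file; it defaults to /proc/cpuinfo.
has_hw_virt() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eqw 'vmx|svm' "$cpuinfo"
}

if has_hw_virt; then
    echo "CPU reports hardware virtualization support"
else
    echo "no vmx/svm flag found"
fi
```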

Debian packages
Divided into groups for clarity. A few of these may not be needed, but this was not tested.

apt-get install ssh bzip2 libbz2-dev unzip

apt-get install gcc g++ make

apt-get install cvs subversion git-core

apt-get install expat libxmltok1 libxmltok1-dev zlib1g-dev

apt-get install mysql-server libmysqlclient15-dev
# installs by default also libnet-daemon-perl libdbi-perl libdbd-mysql-perl libhtml-template-perl

apt-get install libgd2-xpm fontconfig libgd-tools
# libgd2-xpm-dev ??

apt-get install memcached
# optional
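To see which of the packages above are still missing on a given machine, something like the following sketch could be used. The package names are copied from the apt-get lines; the function names `is_installed` and `list_missing` are illustrative, and the check assumes a Debian-style system where `dpkg-query` is available.

```bash
#!/usr/bin/env bash
# Package names copied from the apt-get lines above.
PACKAGES="ssh bzip2 libbz2-dev unzip gcc g++ make cvs subversion git-core
expat libxmltok1 libxmltok1-dev zlib1g-dev mysql-server libmysqlclient15-dev
libgd2-xpm fontconfig libgd-tools memcached"

# Succeeds if dpkg reports the package as fully installed.
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Print every package from the argument list that is not installed.
list_missing() {
    local pkg
    for pkg in "$@"; do
        is_installed "$pkg" || echo "$pkg"
    done
}

# Usage (word splitting of $PACKAGES is intended):
# list_missing $PACKAGES
```

The output can be passed straight to `apt-get install`.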

We will need Microsoft fonts down the line. On Debian, edit /etc/apt/sources.list. Change:

deb http://ftp.es.debian.org/debian/ lenny main
deb-src http://ftp.es.debian.org/debian/ lenny main

to:

deb http://ftp.es.debian.org/debian/ lenny main non-free contrib
deb-src http://ftp.es.debian.org/debian/ lenny main non-free contrib
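The same edit can be scripted. This is only a sketch: `enable_nonfree` is a made-up helper name, and it assumes that the relevant lines end in the single component `main`, as in the example above. A backup copy is written next to the file with a `.bak` suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "non-free contrib" to deb/deb-src lines ending in "main".
# Lines that already list further components do not match, so it is safe to re-run.
enable_nonfree() {
    local file="$1"
    sed -i.bak -E \
        's/^(deb(-src)?[[:space:]].*[[:space:]]main)[[:space:]]*$/\1 non-free contrib/' \
        "$file"
}

# Usage (as root); apt needs to re-read the package lists afterwards:
# enable_nonfree /etc/apt/sources.list && apt-get update
```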

run:

apt-get install ttf-mscorefonts-installer
updatedb
locate arial.ttf
# /usr/share/fonts/truetype/msttcorefonts/arial.ttf

Perl
http://www.perl.org/get.html got perl-5.12.1

wget http://www.cpan.org/src/5.0/perl-5.12.1.tar.gz
tar xfvz perl-5.12.1.tar.gz
cd perl-5.12.1/
CFLAGS='-m64 -mtune=nocona' ./Configure -des -A ccflags=-fPIC -Dprefix=/home/ensembl/local/ -Dusethreads
make
make test
make install

The "CFLAGS" line is required on a 64-bit Linux system to compile mod_perl. For the 32bit Debian this was enough:

./Configure -Dprefix=/home/ensembl/local/
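Since mod_perl depends on how Perl was built, it may help to confirm the relevant settings before moving on. The sketch below uses `perl -V:name`, which prints a single configuration value; the helper name `check_perl_build` is illustrative, and the default path assumes the prefix used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a perl binary was built with threads and -fPIC.
# "perl -V:name" prints one configuration value, e.g. usethreads='define';
check_perl_build() {
    local perl_bin="${1:-/home/ensembl/local/bin/perl}"
    local status=0
    "$perl_bin" -V:usethreads | grep -q "define" || { echo "built without threads"; status=1; }
    "$perl_bin" -V:ccflags | grep -q -e "-fPIC" || { echo "ccflags lack -fPIC"; status=1; }
    return "$status"
}

# Usage:
# check_perl_build /home/ensembl/local/bin/perl && echo "perl build looks fine"
```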

Apache httpd
http://httpd.apache.org/download.cgi got httpd-2.2.15.tar.bz2

installation:

wget http://apache.securedservers.com/httpd/httpd-2.2.15.tar.bz2
tar xfvj httpd-2.2.15.tar.bz2
cd httpd-2.2.15/
./configure --enable-deflate --enable-headers --enable-expires --prefix=/home/ensembl/local/apache2
make
make install

Checking what is built in (specified modules):

/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep deflate
/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep expires
/home/ensembl/local/apache2/bin/apachectl -t -D DUMP_MODULES | grep headers
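The three checks can be folded into one call that also reports a missing module explicitly. This is a sketch only; `check_apache_modules` is a made-up name, and it assumes that module names appear in the dump as `<name>_module`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given modules appear in the DUMP_MODULES output.
# First argument: path to apachectl; remaining arguments: module name stems.
check_apache_modules() {
    local apachectl="$1"
    shift
    local dump mod status=0
    # Capture stderr as well, in case the listing and "Syntax OK" use different streams.
    dump=$("$apachectl" -t -D DUMP_MODULES 2>&1) || true
    for mod in "$@"; do
        if printf '%s\n' "$dump" | grep -q "${mod}_module"; then
            echo "$mod: present"
        else
            echo "$mod: MISSING"
            status=1
        fi
    done
    return "$status"
}

# Usage:
# check_apache_modules /home/ensembl/local/apache2/bin/apachectl deflate expires headers
```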

mod_perl for Apache 2.x
http://perl.apache.org/download/index.html got mod_perl-2.0.4

wget http://perl.apache.org/dist/mod_perl-2.0-current.tar.gz
tar xfvz mod_perl-2.0-current.tar.gz

export PATH=/home/ensembl/local/bin/:$PATH

cd mod_perl-2.0.4
perl Makefile.PL MP_APXS=/home/ensembl/local/apache2/bin/apxs
make
make test
make install

Perl modules required by ENSEMBL
This assumes that you installed Perl in /home/ensembl/local/ and have the perl binary in /home/ensembl/local/bin/.

Check the list of modules, e.g. here: http://browser.1000genomes.org/info/docs/webcode/install/non-ensembl-code.html

There are several versions of this list, but you may still be missing modules that none of them mention; their names show up one by one when you try to start your ENSEMBL site.

Despite the advice to always install the newest module versions, there is one important exception: LWP. LWP version 5.812 is required by the latest (2.57) ParallelUserAgent. This will be covered in a separate section of this page.

Also, some modules do not install (at least on my machine) from Perl's CPAN shell. These may require installation by hand from sources (described later).

CPAN Shell
Easy things first (these should install automatically, e.g. from a script):

export PATH=/home/ensembl/local/bin/:$PATH
which perl
# ~/local/bin/perl
perl -MCPAN -e shell

install Cache::Memcached # Cache-Memcached-1.28.tar.gz
install CGI # CGI.pm-3.49.tar.gz
install CGI::Ajax # CGI-Ajax-0.707.tar.gz
install CGI::Session # CGI-Session-4.42.tar.gz

install Class::Accessor # Class::Accessor is up to date (0.34). // checked after installation of all modules
install Class::Data::Inheritable # Class::Data::Inheritable is up to date (0.08). // checked after installation of all modules
install Class::Std # Class-Std-0.011.tar.gz
install Class::Std::Utils # Class-Std-Utils-v0.0.3.tar.gz

install Compress::Zlib # Compress::Zlib is up to date (2.027).
install Compress::Raw::Zlib # Compress-Raw-Zlib-2.027.tar.gz
install Compress::Bzip2 # Compress-Bzip2-2.09.tar.gz

install Devel::StackTrace # Devel-StackTrace-1.22.tar.gz
install Data::UUID # Data-UUID-1.203.tar.gz; update 2010-05-28: Data-UUID-1.215.tar.gz
install Digest::MD5 # Digest::MD5 is up to date (2.39).
install Exception::Class # Exception-Class-1.30.tar.gz

install File::Temp # File::Temp is up to date (0.22)

install Hash::Merge # Hash-Merge-0.12.tar.gz

install Storable # Storable is up to date (2.22).

install PDF::API2 # PDF-API2-0.73.tar.gz
install Spreadsheet::WriteExcel # Spreadsheet-WriteExcel-2.37.tar.gz
install OLE::Storage_Lite # OLE::Storage_Lite is up to date (0.19)

install Mail::Mailer # MailTools-2.06.tar.gz
install Math::Bezier # Math-Bezier-0.01.tar.gz
install IO::String # IO-String-1.08.tar.gz
install Image::Size # Image-Size-3.221.tar.gz

install List::MoreUtils # List-MoreUtils-0.22.tar.gz

install Number::Format # Number-Format-1.73.tar.gz
install Time::HiRes # Time-HiRes-1.9721.tar.gz

install BSD::Resource # BSD-Resource-1.2904.tar.gz
install Sys::Hostname::Long # Sys-Hostname-Long-1.4.tar.gz

install MIME::Types # MIME-Types-1.29.tar.gz
install IPC::Run # IPC-Run-0.89.tar.gz
install RTF::Writer # RTF-Writer-1.11.tar.gz
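Because missing modules otherwise surface one by one at site startup, it can be quicker to test up front which ones the new perl can load. A rough sketch follows; `missing_perl_modules` and the `PERL_BIN` variable are names invented here, and the module names in the usage comment are just a few taken from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the modules that the chosen perl cannot load.
# PERL_BIN defaults to the first perl in PATH (expected: /home/ensembl/local/bin/perl).
missing_perl_modules() {
    local perl_bin="${PERL_BIN:-perl}" mod
    for mod in "$@"; do
        "$perl_bin" -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Usage: list what is still missing, then install each module without the interactive shell.
# for m in $(missing_perl_modules CGI CGI::Session Hash::Merge); do
#     perl -MCPAN -e "install('$m')"
# done
```

On a fresh installation CPAN may still ask its configuration questions the first time it runs.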

=Database work=

mysql
create database test_beta_core_2_57_1;
quit

mysql test_beta_core_2_57_1 < ENSEMBL_plant_dbs/ftp.ensemblgenomes.org/pub/plants/release-4/mysql/populus_trichocarpa_core_4_56_11/populus_trichocarpa_core_4_56_11.sql

cd /home/ensembl/local/ensembl/ensembl/sql
for file in patch_56_57_?.sql; do mysql test_beta_core_2_57_1 < "$file"; done
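The plain loop keeps going when one patch fails, which can leave the schema half-patched without any obvious sign. A slightly more defensive variant might look like this; `apply_patches` and the `MYSQL_BIN` override are illustrative names, and the patches are applied in the alphabetical order of the shell glob, as in the loop above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply schema patches in glob (alphabetical) order, stopping on error.
# Arguments: database name, directory with the patches, old release, new release.
apply_patches() {
    local db="$1" dir="$2" from="$3" to="$4" file
    for file in "$dir"/patch_"${from}"_"${to}"_?.sql; do
        # An unmatched glob stays literal, so check that the file really exists.
        [ -e "$file" ] || { echo "no patch files found in $dir" >&2; return 1; }
        echo "applying $(basename "$file")"
        "${MYSQL_BIN:-mysql}" "$db" < "$file" || {
            echo "patch failed: $file" >&2
            return 1
        }
    done
}

# Usage:
# apply_patches test_beta_core_2_57_1 /home/ensembl/local/ensembl/ensembl/sql 56 57
```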

# in case you have patch entries:
INSERT INTO test_beta_core_1_57_1.meta SELECT * FROM beta_fake_core_1_57_1.meta;
INSERT INTO test_beta_core_2_57_1.meta SELECT * FROM test_beta_core_1_57_1.meta WHERE species_id=1;

export PATH=/home/ensembl/local/bin/:$PATH

export PERL5LIB=/home/ensembl/local/ensembl/bioperl-live/:/home/ensembl/local/ensembl/ensembl/modules:$PERL5LIB

perl ensembl-pipeline/scripts/load_seq_region.pl \
  -dbhost localhost -dbuser ensembl -dbpass secret1 -dbname test_alpha_core_1_57_1 \
  -coord_system_name scaffold -rank 4 -sequence_level -coord_system_version BV02 \
  -fasta_file 454Scaffolds.fa
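A quick sanity check after loading is to compare the number of sequences in the FASTA file with the number of rows in seq_region. This sketch assumes that each FASTA entry ends up as one seq_region row and that nothing else was loaded into the database beforehand; `count_fasta_records` is a made-up helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the sequences (header lines starting with ">") in a FASTA file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage: the two numbers should agree under the assumptions stated above.
# count_fasta_records 454Scaffolds.fa
# mysql -N -e "SELECT COUNT(*) FROM seq_region" test_alpha_core_1_57_1
```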

UPDATE test_beta_core_1_57_1.coord_system SET version='BV02', attrib='default_version,sequence_level';

# loading genes from GFF files:
perl load_genes_from_jgi_gff3_02.pl -e ./ensembl.registry.test_beta -s Test -l non_zeros 00_names.gff3.new.non_zeros
perl load_genes_from_jgi_gff3_02.pl -e ./ensembl.registry.test_beta -s Test -l zeros 00_names.gff3.new.zeros


# fixing the analysis_description table:

INSERT INTO test_beta_core_1_57_1.analysis_description SELECT * FROM analysis_description;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0  Warnings: 0


# check the analysis_id in test_beta_core_1_57_1.analysis
# UPDATE if necessary!
# I had to insert the values twice and change the analysis_id to "40" and "41"
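To find out which analysis_id values still need a description, the two tables can be compared directly. The query below is a sketch that relies on the analysis_id and logic_name columns of the Ensembl core schema; `analyses_without_description` and the `MYSQL_BIN` override are names made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list rows of the analysis table that have no matching
# row in analysis_description in the given database.
analyses_without_description() {
    local db="$1"
    "${MYSQL_BIN:-mysql}" -N -e "
        SELECT a.analysis_id, a.logic_name
        FROM analysis a
        LEFT JOIN analysis_description ad ON ad.analysis_id = a.analysis_id
        WHERE ad.analysis_id IS NULL;" "$db"
}

# Usage:
# analyses_without_description test_beta_core_1_57_1
```

Each line of output is an analysis_id and logic_name pair for which a description row would still have to be inserted or updated.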