Wikiomics:RNA-Seq: Difference between revisions

From OpenWetWare
Latest revision as of 02:50, 30 September 2014

This list is intended mostly for de novo splice site / transcript / gene prediction in newly sequenced genomes. The tools listed below are also often used in other pipelines, such as transcript quantification or SNP discovery.


Mappers

Spliced Mappers (tested)

Tophat and Cufflinks

  • Tophat

web: http://ccb.jhu.edu/software/tophat/

current version: 2.0.12, released 2014-06-24

active development: yes

platforms:

  • Linux x86_64 binary
  • Mac OS X x86_64 binary
  • source code

requirements: SAMtools (http://samtools.sourceforge.net/)

base mapper: bowtie2/bowtie

input: fastq, fasta

output: BAM

Currently the most widely used program for RNA-Seq mapping. Its output is often processed with Cufflinks. The latest version supports detection of insertions and deletions from RNA-Seq data, as well as mixed-length reads (suitable e.g. for 454 data).


  • Cufflinks:

web: http://cufflinks.cbcb.umd.edu/

current version: 2.2.1, released 2014-05-05
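
The TopHat-then-Cufflinks hand-off mentioned above can be sketched as a small shell helper. This is only an illustration: the function names, the output directory names (tophat_out, cufflinks_out) and the DRY_RUN switch are assumptions made for this example, and the index prefix and reads file are placeholders. It relies on TopHat writing its alignments to accepted_hits.bam inside the output directory.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical
# convenience wrapper, not part of TopHat or Cufflinks).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Map reads with TopHat, then assemble transcripts with Cufflinks.
# Arguments: bowtie2 index prefix, reads file, number of threads.
tophat_then_cufflinks() {
    index=$1; reads=$2; threads=$3
    run tophat -p "$threads" -o tophat_out "$index" "$reads" &&
    run cufflinks -p "$threads" -o cufflinks_out tophat_out/accepted_hits.bam
}

# Usage sketch (index prefix and file names are placeholders):
#   tophat_then_cufflinks my_genome reads.fastq 4
```

Setting DRY_RUN=1 beforehand prints the two command lines without executing them, which is a cheap way to check paths before a long run.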

GMAP/GSNAP

http://research-pub.gene.com/gmap/

current version: 2014-09-29

active development: yes

FastA and FASTQ input, support for paired ends, gzipped fastq files, circular chromosomes


To compile:


./configure --prefix=/some/path/for_install/ --with-gmapdb=/path_to/gmap_DBs/ --with-samtools=/path_to/samtools_0.1.18/
make
make install

To run:

gsnap --dir=genome_directory --db=genome_database --batch=2 --localsplicedist=10000 --nthreads=4 --nofails my_reads.fasta

Check "gsnap --help" for more detailed options

GEM

web: http://algorithms.cnag.cat/wiki/The_GEM_library

current version: 2013-04-06

active development: yes

base mapper: gem-mapper and gem-split-mapper

Developed in OCaml and Python. Two-step mapping (unspliced mode first, then the unmapped reads are mapped with splicing).

Article: http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2221.html

HMMSplicer

http://derisilab.ucsf.edu/index.php?software=105

current version: 0.9.5 from 2010.11.25

active development: maybe (no new release in 2 years)

base mapper: bowtie

input: fastq (converts quality values to phred scale)

output: bed file of junctions

Developed in Python. Requirements:

  • OS: tested on Mac OS X (by the authors) and Linux Fedora 8
  • Python 2.6 (tested with 2.6.4)
  • numpy (tested by authors with version 1.3.0)
  • bowtie (works with 0.12.7)

The example run also completes with Python 2.7.1rc1, numpy-1.5.1 and bowtie 0.12.7 on in-house data.

Basic command:

python runHMM.py -o output_dir -i input_RNA-seq_data.qseq -q quality_type -g genome4mapping -j min_intron_size -k max_intron_size -p number_of_processors_to_use

Type "python runHMM.py --help" for more explanation.

Tip: you can map your reads first in non-spliced mode with a mapper of your choice, filter out all mapped reads and feed HMMSplicer only the unmapped reads.
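
The filtering step in this tip can be sketched as follows, assuming the unspliced mapper produced SAM/BAM output and the reads are single-end. The helper name unmapped_sam_to_fastq and the file names are made up for this example; the only facts it relies on are the standard SAM columns (QNAME, FLAG, SEQ, QUAL) and FLAG bit 0x4 marking an unmapped read.

```shell
#!/bin/sh
# Hypothetical helper: read SAM records on stdin and print the unmapped
# ones (FLAG bit 0x4 set) as FASTQ. Header lines starting with "@" are skipped.
unmapped_sam_to_fastq() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        int($2 / 4) % 2 == 1 {              # bit 0x4 set: read is unmapped
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }'
}

# Usage sketch (file names are placeholders):
#   samtools view unspliced.bam | unmapped_sam_to_fastq > unmapped.fastq
```

The bit test is written with integer arithmetic rather than a bitwise function so that it also works in awk implementations without one.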

Caveat: because of the training process, all reads must be of the same length.

SOAPsplice

http://soap.genomics.org.cn/soapsplice.html

current version: 1.9 from 2011.02.23

active development: yes

The SOAPals website provides detailed information on how to install and run it, plus performance evaluation data.

SOLiD data only

(untested)

SplitSeek

http://solidsoftwaretools.com/gf/project/splitseek/ (warning: dead link. solidsoftwaretools.com closed)

current version: 1.3.4

Ameur A, Wetterbom A, Feuk L, Gyllensten U. Global and unbiased detection of splice junctions from RNA-seq data. Genome Biol. 2010 Mar 17;11(3):R34.

Developed in Perl.

It requires the AB WT Analysis Pipeline (http://solidsoftwaretools.com/gf/project/transcriptome/), which fails to compile out of the box with gcc 4.4.x.

Solution:

  1. edit ./readsmap/src/simu.cxx file
  2. replace line 27:
char *s = strchr(seq, '.');

with this one:

const char *s = strchr(seq, '.');

Solved by Paolo Di Tommaso from CRG.
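
The same one-line edit can be applied non-interactively. The sketch below assumes the declaration really sits on line 27 of the file, as described above; the function name patch_simu_cxx is invented for this example. It writes to a temporary file and moves it back, which avoids relying on the non-portable "sed -i".

```shell
#!/bin/sh
# Hypothetical helper: add "const" to the declaration on line 27 of the
# given file. Lines that already start with "const" are left unchanged.
patch_simu_cxx() {
    file=$1
    sed "27s/^\([[:space:]]*\)char \*s = strchr/\1const char *s = strchr/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage sketch:
#   patch_simu_cxx ./readsmap/src/simu.cxx
```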

X-MATE

http://grimmond.imb.uq.edu.au/X-MATE/

current version: Oct 2010?

written in: Perl (with some optional Python, Java and C++)

Requires junction libraries (available from the X-MATE web site for human and mouse).

Spliced Mappers (in development)

Olego

web: http://ngs-olego.sourceforge.net/


PALMapper (fusion of GenomeMapper & QPALMA)

web: http://raetschlab.org//suppl/palmapper

current version: palmapper-0.4.tar.gz 2011.07.04

active development: yes(?)

Simple installation: run "make" in the installation directory (tested on Debian 6.0 with gcc). To check the install, go to "testcase" and run "make" again. This requires a fast Internet connection, as it downloads genome and FASTQ files.


MapSplice

web: http://www.netlab.uky.edu/p/bioinfo/MapSplice/

current version: MapSplice 2.1.8 from 07/01/2014

active development: yes

base mapper: bowtie (new version finally supports bowtie 0.12.7)

It does not use genome annotation; splice junctions are detected from the RNA-Seq data alone.


SpliceMap

http://www.stanford.edu/group/wonglab/SpliceMap/

current version: 3.3.5.2 2010.10.23

active development: maybe (no new release in 1.5 years)

base mapper (preferred): bowtie (others possible)

"Currently, only the canonical GT-AG splice sites are identified."


Requirements:

  • 8GB minimum for human genome, 16GB recommended
  • input formats: RAW, FASTQ or FASTA
  • Read >= 50bp

Base mappers:

  • Bowtie (preferred)
  • others: SeqMap, Eland

Features: "Support for arbitrarily long uneven read lengths"

Alexa-Seq

http://www.alexaplatform.org/alexa_seq/downloads.htm

Malachi Griffith, Griffith OL, Mwenifumbo J, Morin RD, Goya R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li I, McDonald H, Teague K, Zhao Y, Zeng T, Delaney AD, Hirst M, Morin GB, Jones SJM, Tai IT, Marco A. Marra. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847.

version: ALEXA-Seq_v.1.17 from Feb 2012

Available configured virtual machines (for VMware) ver. 1.12

GMORSE

www: http://www.genoscope.cns.fr/externe/gmorse/

Proper name: G-Mo.R-Se

current version: gmorse_02mar2011.tar.gz 2011.03.02 (new version)

An older version was used for the Vitis vinifera genome project.

ERANGE

www: http://woldlab.caltech.edu/rnaseq/

current version: Interim Erange3.3 release plus 4.0a new version

base mapper: bowtie or blat

requirements:

  1. Python 2.5+
  2. Cistematic 3.0 from http://cistematic.caltech.edu
  3. Cistematic version of the genomes, unless providing your own custom genome and annotations.
  4. You will need genomic sequences to build the expanded genome, as well as gene models from UCSC.
  5. (Optional) Python is very slow on large datasets. Use of the psyco module (psyco.sf.net) on 32-bit Linux or all Mac Intel machines to significantly speed up runtime is highly recommended.
  6. (Optional) Several of the plotting scripts also rely on Matplotlib, which is available at matplotlib.sf.net.

TAU

http://mocklerlab-tools.cgrb.oregonstate.edu/TAU.html

current version: 1.4 2010.09.06

Transcriptome Assembly Utility: requires already mapped input. Compatible mappers: Blat, Eland and HashMatch. Also accepts gff3 files.

Not spliced

Mapping short reads to a draft genome sequence made up of many contigs poses problems for current spliced mappers.

blat

http://genome.ucsc.edu/FAQ/FAQblat.html

Detailed description: http://genome.ucsc.edu/goldenPath/help/blatSpec.html

download from: http://users.soe.ucsc.edu/~kent/src/

last version: blatSrc34.zip 2007.04.20

Options used to produce hints for the Augustus gene prediction program (based on http://augustus.gobics.de/binaries/readme.rnaseq.html):

blat -noHead -stepSize=5 -minIdentity=93 genome.masked.fa rnaseq.fa ali.psl

Options from the Bähler lab protocol (Wilhelm BT, Marguerat S, Goodhead I, Bähler J. Defining transcribed regions using RNA-seq. Nature Protocols. http://www.bahlerlab.info/docs/nprot.2009.229.pdf):

blat -noHead  -out=psl -oneOff=1  -tileSize=8 FASTA_genome.txt FASTA_sequences.txt Output.bsl

Pash

Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 2010 Nov 23;11(1):572 Authors: Coarfa C, Yu F, Miller CA, Chen Z, Harris RA, Milosavljevic A

Download: http://www.brl.bcm.tmc.edu/pash/pashDownload.rhtml

current version: 3.0.6.2

last

www: http://last.cbrc.jp/

Latest: last-262.zip from 2012.10.23

paper: http://genome.cshlp.org/content/21/3/487.full

requirements: min 2GB RAM/mammalian genome, 16-20GB recommended for optimal performance

Installation:

cd src; make

Creating genomic database and short reads mapping:

#db creation
lastdb  -m1111110 -s20G -v my_genome_db my_genome.fasta

#mapping
lastal -Q3 -o reads_vs_my_genome_db.out -f 0 -v my_genome_db reads.fastq

where:

  • -Q3: FASTQ in Illumina format
  • -f 0: output in tabular format
  • -v: verbose (prints what it is doing)

last can map reads with indels and can truncate large parts of the reads (highly sensitive, but with lower specificity). For example, it can report matches only 30 nucleotides long for 54 nt queries. The output needs to be filtered to remove spurious matches.
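
One possible filter for the tabular output (-f 0) is sketched below: it keeps only alignments that cover at least a given fraction of the read. It assumes the LAST tabular column order (score, name1, start1, alnSize1, strand1, seqSize1, name2, start2, alnSize2, strand2, seqSize2, blocks), with the read as the second sequence; check this against the output of your LAST version. The function name filter_last_tab and the 0.9 threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep LAST tabular alignments whose aligned part of
# the query (column 9) is at least min_fraction of the query length
# (column 11). Comment lines starting with "#" are dropped.
filter_last_tab() {
    min_fraction=$1
    awk -F '\t' -v min="$min_fraction" '
        /^#/ { next }
        NF >= 11 && $11 > 0 && $9 / $11 >= min'
}

# Usage sketch (file names are placeholders):
#   filter_last_tab 0.9 < reads_vs_my_genome_db.out > reads_vs_my_genome_db.filtered.out
```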

It has no multi-processor option, so for faster mapping one has to split the fastq file(s), run last in parallel and combine the results (or use Hadoop).
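
A minimal sketch of the split-and-run-in-parallel approach, assuming FASTQ records of exactly four lines each (no wrapped sequences). The helper name split_fastq, the chunk prefix and the chunk size are placeholders; the lastal options in the usage comment are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: split a FASTQ file into chunks of N reads each.
# Assumes the common 4-lines-per-record FASTQ layout.
split_fastq() {
    infile=$1; prefix=$2; reads_per_chunk=$3
    split -l $((reads_per_chunk * 4)) "$infile" "$prefix"
}

# Usage sketch (chunk size and file names are placeholders):
#   split_fastq reads.fastq chunk_ 1000000
#   for f in chunk_??; do
#       lastal -Q3 -o "$f.out" -f 0 my_genome_db "$f" &
#   done
#   wait
#   cat chunk_??.out > reads_vs_my_genome_db.out
```

The loop starts one lastal process per chunk, so the chunk size should be chosen so that the number of chunks does not exceed the number of available cores.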

Since version 149 it is possible to get SAM output in a two-step procedure:

#get MAF output first
lastal -Q3 -o reads_vs_my_genome_db.maf -f 1 -v my_genome_db reads.fastq

#convert MAF to SAM using maf-convert.py from scripts directory 
maf-convert.py sam reads_vs_my_genome_db.maf > reads_vs_my_genome_db.sam

To convert it to BAM format, then sort and index it, use samtools:

samtools view -but my_genome.fasta.fai -o reads_vs_my_genome_db.unsorted.bam reads_vs_my_genome_db.sam

samtools sort reads_vs_my_genome_db.unsorted.bam reads_vs_my_genome_db.sorted

samtools index reads_vs_my_genome_db.sorted.bam

(tested with samtools ver 0.1.13; samtools 1.x releases changed the sort syntax to "samtools sort -o out.bam in.bam")


Caveat: with default options, homopolymeric runs, simple repeats etc. are mapped to multiple genome positions. To avoid this, the genome should be soft-masked first.

Obsolete / not available

RNA-mate

http://solidsoftwaretools.com/gf/project/rnamate (dead link)

current version: 1.01

No activity since 2009. Successor: X-MATE

SOAPals

http://soap.genomics.org.cn/soapals.html

current version: 1.1 , 05-05-2010

The SOAPals website provides detailed information on how to install and run it.

Note 2011.03.01

It seems that SOAPals has been replaced by SOAPsplice and is no longer available.


SAW (method no software yet)

Ning K, Fermin D (2010) SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints. PLoS ONE 5(8): e12047. doi:10.1371/journal.pone.0012047