Short read toolbox Botany2012

This page was created to provide an online resource for participants in the Next Generation Sequencing Workshop at Botany 2012.

Short Read Workshop, Botany 2012

Why open source software?

Rocchini and Neteler 2012 Four Freedoms - An article which explains the importance of open source software in science.


Currently available platforms:

  • Illumina - Illumina (formerly Solexa).
  • 454 - 454/Roche.

Sequence format information

  • Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
  • FASTQ - Wikipedia's FASTQ page.
  • FASTA - Wikipedia's FASTA page.
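
As a small illustration of the fastq format, the quality line stores one character per base, and each character encodes a Phred score as an ASCII code plus an offset. The sketch below assumes the Sanger convention (offset 33); older Illumina pipelines used an offset of 64. The sub name phred_scores and the record are made up for illustration.

```perl
use strict;
use warnings;

# Decode a FASTQ quality string into a list of Phred scores.
# Assumption: Sanger-style encoding with an ASCII offset of 33;
# pass 64 as the second argument for older Illumina files.
sub phred_scores {
    my ($qual, $offset) = @_;
    $offset = 33 unless defined $offset;
    return map { ord($_) - $offset } split //, $qual;
}

# A made-up four-line record: identifier, sequence, separator, quality.
my @record = ('@read1', 'ACGT', '+', 'II5!');
my @scores = phred_scores($record[3]);
print join(" ", @scores), "\n"; # prints "40 40 20 0"
```

Checking the range of decoded scores on a few reads is a quick way to guess which offset a file uses: negative values with offset 64 suggest the file is really offset 33.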

Alignment format information

Short-read quality control software

  • TileQC - Requires R, RMySQL and MySQL.
  • FastQC - A quality control tool for high throughput sequence data. A Java application.
  • Short Read Toolbox - Scripts for quality control of Illumina data.

Open source de novo genome assemblers

  • Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
  • ABySS - Multi-threaded de novo assembly.
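
To give a feel for the de Bruijn graph idea these assemblers build on, here is a toy sketch. It is not Velvet's or ABySS's implementation: it ignores reverse complements, sequencing errors and graph simplification, and the sub name de_bruijn and the reads are hypothetical.

```perl
use strict;
use warnings;

# Build a toy de Bruijn graph: nodes are (k-1)-mers, and every k-mer
# found in a read adds an edge from its prefix to its suffix.
sub de_bruijn {
    my ($k, @reads) = @_;
    my %graph;
    for my $read (@reads) {
        # Reads shorter than k give an empty range and are skipped.
        for my $i (0 .. length($read) - $k) {
            my $kmer   = substr($read, $i, $k);
            my $prefix = substr($kmer, 0, $k - 1);
            my $suffix = substr($kmer, 1);
            $graph{$prefix}{$suffix}++; # count how often the edge is seen
        }
    }
    return \%graph;
}

# Two overlapping made-up reads, k = 3.
my $graph = de_bruijn(3, 'ACGTA', 'CGTAC');
for my $node (sort keys %$graph) {
    print "$node -> ", join(",", sort keys %{ $graph->{$node} }), "\n";
}
```

An assembler reconstructs contigs by walking paths through such a graph; the edge counts give a rough measure of coverage.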

Open source de novo transcriptome assemblers

  • Trinity - De novo assembler designed specifically for transcriptomes.
  • Rnnotator - Uses multiple calls to Velvet (see de novo genome assemblers).
  • Trans-ABySS - Uses multiple calls to ABySS (see de novo genome assemblers).
  • Oases - Post-processes Velvet output (see de novo genome assemblers) for transcriptomic work.

Hybrid assemblers (reference guided & de novo)

Open source reference guided assemblers

  • SOAP - Short Oligonucleotide Analysis Package.
  • MAQ - Mapping and Assembly with Qualities.
  • Bowtie - An ultrafast, memory-efficient short read aligner.
  • BWA - Burrows-Wheeler aligner.

SNP discovery and calling

Assembly viewers

  • Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
  • SAMtools - Utilities for working with alignments in the SAM and BAM formats; includes a text-based alignment viewer (tview).

Sequence query programs

  • PLAN - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
  • BLAT - BLAST-Like Alignment Tool.


A very brief example to demonstrate file input/output.


use strict;
use warnings;
my (@temp, $in, $out);
my $inf = "data.fq";
my $outf = "data_out.fq";
open($in, "<", $inf) or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";
while (<$in>) { # Each pass reads one four-line FASTQ record.
  chomp($temp[0]=$_); # First line is an identifier.
  chomp($temp[1]=<$in>); # Second line is sequence.
  chomp($temp[2]=<$in>); # Third line is '+', optionally followed by the identifier.
  chomp($temp[3]=<$in>); # Fourth line is quality.
  print $out join("\t", @temp)."\n"; # Write the record as one tab-delimited line.
}
close $in or die "$inf: $!";
close $out or die "$outf: $!";
  • perlintro - Introduction to Perl with links to other documentation.
  • BioPerl beginners - Introduction to BioPerl (be prepared for object-oriented code).
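
The same read-and-print pattern as the script above can be adapted to translate between formats. Below is a minimal sketch that converts fastq to fasta, assuming strict four-line records (sequence and quality never wrapped across lines). The sub name fastq_to_fasta and the reads are hypothetical, and in-memory file handles stand in for real files so the example runs without any input data.

```perl
use strict;
use warnings;

# Convert four-line FASTQ records read from $in into FASTA records on $out.
# Returns the number of records converted.
# Assumption: sequence and quality each occupy exactly one line.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    my $n = 0;
    while (my $id = <$in>) {
        my $seq  = <$in>;
        my $plus = <$in>; # separator line, not needed for FASTA
        my $qual = <$in>; # quality line, not needed for FASTA
        die "Truncated record\n" unless defined $qual;
        chomp($id, $seq);
        # FASTQ identifiers start with '@', FASTA identifiers with '>'.
        $id =~ s/^\@/>/ or die "Record does not start with '\@': $id\n";
        print $out "$id\n$seq\n";
        $n++;
    }
    return $n;
}

# Example run on made-up reads held in a string.
my $fastq = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCA\n+\nII5!\n";
open(my $in, "<", \$fastq) or die "Can't open in-memory input: $!";
my $fasta = "";
open(my $out, ">", \$fasta) or die "Can't open in-memory output: $!";
my $count = fastq_to_fasta($in, $out);
close $in  or die "in-memory input: $!";
close $out or die "in-memory output: $!";
print $fasta;
```

To use it on real data, open the two handles on file names instead of scalar references, as in the script above.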

R project

Computing resources

  • Galaxy - Web-based front end for popular bioinformatic tools.
  • Atmosphere - Virtual computing at iPlant.
  • XSEDE portal - Extreme Science and Engineering Discovery Environment.

Useful links