Short read toolbox Botany2012
This page was created to provide an online resource for participants of the Next Generation Sequencing Workshop at Botany2012.
Short Read Workshop, Botany 2012
Why open source software?
Rocchini and Neteler 2012 Four Freedoms - An article that explains the importance of open source software in science.
Platforms
Currently available platforms:
Sequence format information
- Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
- FASTQ - Wikipedia's FASTQ page.
- FASTA - Wikipedia's FASTA page.
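As an illustration of how these formats relate, here is a minimal Python sketch that reads four-line FASTQ records and writes them as FASTA. It is not one of the Short Read Toolbox scripts; the function names `read_fastq` and `fastq_to_fasta` are hypothetical, and it assumes each record occupies exactly four lines (no wrapped sequence or quality lines).

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # End of input.
        header = header.rstrip("\r\n")
        if not header:
            continue  # Skip blank lines between records.
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        # Line 1 starts with '@'; line 3 starts with '+' and may repeat the identifier.
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("Malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("Sequence and quality lengths differ in: " + header)
        yield header[1:], sequence, quality


def fastq_to_fasta(in_handle, out_handle):
    """Write every FASTQ record as a two-line FASTA record; return the record count."""
    count = 0
    for identifier, sequence, _quality in read_fastq(in_handle):
        out_handle.write(">" + identifier + "\n" + sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory example data; real use would pass open file handles instead.
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+read2\nHHHH\n")
    fasta = io.StringIO()
    fastq_to_fasta(example, fasta)
    print(fasta.getvalue(), end="")
```

The quality line is discarded because FASTA has no place to store it, so the conversion cannot be reversed.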
Alignment format information
Short-read quality control software
- TileQC - Requires R, RMySQL and MySQL.
- FastQC - A Java application for quality control of high-throughput sequence data.
- Short Read Toolbox - Scripts for quality control of Illumina data.
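The tools above work from the quality line of each FASTQ record. The following Python sketch shows one simple check, decoding a quality string into Phred scores and filtering on the mean. It does not reproduce what TileQC or FastQC do internally. The function names and the mean-quality threshold of 20 are assumptions chosen for illustration. The default ASCII offset of 33 (Sanger encoding) is also an assumption; older Illumina pipelines used an offset of 64.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("Quality character below the assumed offset; wrong encoding?")
    return scores


def mean_quality(quality, offset=33):
    """Return the arithmetic mean Phred score of a read (0.0 for an empty string)."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(phred):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def passes_filter(quality, min_mean=20.0, offset=33):
    """Keep a read only if its mean quality reaches the (hypothetical) threshold."""
    return mean_quality(quality, offset) >= min_mean


if __name__ == "__main__":
    for qual in ("IIII", "II55", "!!!!"):
        print(qual, mean_quality(qual), passes_filter(qual))
```

Averaging Phred scores is a rough summary because the scores are logarithmic; averaging the error probabilities instead gives a stricter estimate.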
Open source de novo genome assemblers
- Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
- ABySS - Multi-threaded de novo assembly.
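To give a sense of what a de Bruijn graph is, here is a minimal Python sketch that builds one from a set of reads. It is a toy illustration only and not how Velvet or ABySS are implemented: real assemblers also handle reverse complements, sequencing errors, and graph simplification. The function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers. Every k-mer found in a read adds one edge from its
    (k-1)-base prefix to its (k-1)-base suffix. Repeated edges are kept, so
    the number of copies reflects how often that k-mer was observed.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Reads shorter than k contribute no k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads; following the edges AC -> CG -> GT -> TA
    # spells out the sequence ACGTA.
    graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
    for node in sorted(graph):
        print(node, "->", graph[node])
```

An assembler then looks for paths through such a graph that use the observed edges, which is how overlapping short reads are merged into contigs.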
Open source de novo transcriptome assemblers
- Trinity - De novo assembler designed specifically for transcriptomes.
- Rnnotator - Uses multiple calls to Velvet (see de novo genome assemblers).
- Trans-ABySS - Uses multiple calls to ABySS (see de novo genome assemblers).
- Oases - Post-processes Velvet output (see de novo genome assemblers) for transcriptomic work.
Hybrid assemblers (reference guided & de novo)
- YASRA - Yet Another Short Read Assembler.
- Aakrosh Ratan dissertation - Description of YASRA.
- Liston:Computer_Scripts - Scripts for post-processing of YASRA contigs.
Open source reference guided assemblers
- SOAP - Short Oligonucleotide Analysis Package.
- MAQ - Mapping and Assembly with Qualities.
- Bowtie - An ultrafast, memory-efficient short read aligner.
- BWA - Burrows-Wheeler aligner.
SNP discovery and calling
Assembly viewers
Sequence query programs
- BLAST - Basic Local Alignment Search Tool.
- PLAN - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
- BLAT - BLAST-Like Alignment Tool.
Perl
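A very brief example to demonstrate file input/output. It reads a FASTQ file and writes each four-line record as a single tab-delimited line.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my (@temp, $in, $out);
my $inf  = "data.fq";
my $outf = "data_out.fq";

open($in,  "<", $inf)  or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";

# Each pass through the loop consumes one four-line FASTQ record.
while (<$in>) {
    chomp($temp[0] = $_);     # First line is the identifier (starts with '@').
    chomp($temp[1] = <$in>);  # Second line is the sequence.
    chomp($temp[2] = <$in>);  # Third line is the '+' separator (may repeat the identifier).
    chomp($temp[3] = <$in>);  # Fourth line is the quality string.
    print $out join("\t", @temp) . "\n";
}

# Report the file name, not the filehandle, if closing fails.
close $in  or die "$inf: $!";
close $out or die "$outf: $!";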
- perlintro - Introduction to Perl with links to other documentation.
- BioPerl beginners - Introduction to BioPerl (be prepared for object oriented code).
R project
- R project - Statistical programming environment.
- Bioconductor - R for biologists (microarray and next-generation sequencing data).
- APE - Analyses of Phylogenetics and Evolution, an R package.
- HT Sequence Analysis with R and Bioconductor
Computing resources
- Galaxy - Web-based front end for popular bioinformatic tools.
- Atmosphere - Virtual computing at iPlant.
- XSEDE portal - Extreme Science and Engineering Discovery Environment.