Short read toolbox: Difference between revisions

From OpenWetWare
=Online short-read resources=
*[http://seqanswers.com/ SEQanswers] - Online forum for next generation sequencing.
*[http://seqanswers.com/forums/showthread.php?t=43 SEQanswers software post] - Post of software available for next generation sequence data.
*[http://seqanswers.com/wiki/Category:Bioinformatics_application SEQwiki] - SEQanswers wiki list of bioinformatics applications.
*[http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/ De novo tips] - Blog on de novo assembly.

Revision as of 16:33, 25 July 2010

Short read toolbox

This page lists resources for working with next-generation sequence data.


Online short-read resources

List of sequence format information

  • Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
  • FASTQ - Wikipedia's FASTQ page.
  • FASTA - Wikipedia's FASTA page.
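One common step when translating older Illumina output to the FASTQ standard is re-encoding the quality string. The following is a minimal sketch, not the Short Read Toolbox scripts themselves. It assumes the input qualities use the Phred+64 encoding (Illumina 1.3 to 1.7 style) and rewrites them as Phred+33 (Sanger style). The function names `illumina64_to_sanger` and `convert_record` are hypothetical.

```python
def illumina64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # Phred score under the assumed +64 offset
        if score < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 quality character")
        converted.append(chr(score + 33))  # same score, Sanger offset
    return "".join(converted)


def convert_record(record):
    """Convert one (identifier, sequence, separator, quality) record."""
    identifier, sequence, separator, quality = record
    return identifier, sequence, separator, illumina64_to_sanger(quality)


if __name__ == "__main__":
    example = ("@read1", "ACGT", "+", "hhhh")
    print("\n".join(convert_record(example)))
```

Older Solexa-scaled qualities can fall below the Phred+64 range, so this sketch raises an error on such characters instead of guessing at a conversion.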

List of alignment format information

List of short-read quality control software

  • TileQC - Requires R, RMySQL and MySQL.
  • FastQC - A quality control tool for high throughput sequence data. A Java application.
  • Short Read Toolbox - Scripts for quality control of Illumina data.

List of open source de novo assemblers

  • Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
  • Edena - 32- and 64-bit Linux.
  • ABySS - Multi-threaded de novo assembly.
  • Ray - Multi-threaded de novo assembly.
  • QSRA - Utilizes quality scores.
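For readers unfamiliar with the de Bruijn graphs mentioned for Velvet, the basic idea can be sketched in a few lines. This is an illustration of the general technique only, not how Velvet or any assembler listed here is implemented: real assemblers also handle reverse complements, sequencing errors and graph simplification. The function name `de_bruijn_graph` and the choice of `k` are assumptions made for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    graph = de_bruijn_graph(["ACGTAC", "CGTACG"], k=3)
    for node, successors in sorted(graph.items()):
        print(node, "->", ",".join(successors))
```

An assembler then looks for paths through such a graph that use the edges consistently; overlapping reads share k-mers, which is what joins them together.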

List of open source reference guided assemblers

  • SOAP - Short Oligonucleotide Analysis Package.
  • MAQ - Mapping and Assembly with Qualities.
  • Bowtie - An ultrafast, memory-efficient short read aligner.
  • BWA - Burrows-Wheeler aligner.
  • RGA - Perl script that calls BLAT to assemble short reads.

Hybrid assemblers (reference guided & de novo)

List of assembly viewers

  • Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
  • SAMtools - Utilities for working with SAM and BAM alignments, including a text-based alignment viewer.

List of alignment programs

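  • MAFFT - Multiple sequence alignment.
  • T-Coffee - Multiple sequence alignment.
  • Muscle - Multiple sequence alignment.
  • LASTZ - Pairwise DNA sequence alignment, hosted at the Miller lab.
  • MUMmer - Rapid alignment of large sequences such as whole genomes.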
  • Mulan - Multiple sequence alignment and visualization tool.
  • VISTA - Tools for comparative genomics.
  • Mauve - Multiple (bacterial) genome alignment.

List of nucleotide sequence query programs

Perl

A very brief example to demonstrate file input/output. It reads a FASTQ file four lines at a time and writes each record as a single tab-delimited line.

Code:

#!/usr/bin/perl
use strict;
use warnings;
my (@temp, $in, $out);
my $inf = "data.fq";
my $outf = "data_out.fq";
open($in, "<", $inf) or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";
# Read one four-line FASTQ record per iteration.
while(<$in>){
  chomp($temp[0]=$_); # First line is the identifier (starts with '@').
  chomp($temp[1]=<$in>); # Second line is the sequence.
  chomp($temp[2]=<$in>); # Third line is the '+' separator (may repeat the identifier).
  chomp($temp[3]=<$in>); # Fourth line is the quality string.
  print $out join("\t", @temp)."\n"; # Write the record as one tab-delimited line.
}
# Report the file name, not the filehandle, if closing fails.
close $in or die "Can't close $inf: $!";
close $out or die "Can't close $outf: $!";
  • perlintro - Introduction to Perl with links to other documentation.
  • BioPerl beginners - Introduction to BioPerl (be prepared for object-oriented code).

Python
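A very brief Python counterpart to the Perl file input/output example, offered as a sketch and not as part of the original page. It assumes each FASTQ record occupies exactly four lines (no wrapped sequences). The function name `fastq_to_tab` is hypothetical.

```python
import itertools


def fastq_to_tab(in_path, out_path):
    """Write each four-line FASTQ record as one tab-separated line.

    Returns the number of records written.
    """
    count = 0
    with open(in_path) as infile, open(out_path, "w") as outfile:
        while True:
            # Take the next four lines: identifier, sequence, separator, quality.
            record = [line.rstrip("\n") for line in itertools.islice(infile, 4)]
            if not record:
                break  # end of file
            if len(record) < 4:
                raise ValueError(f"truncated record after {count} complete records")
            outfile.write("\t".join(record) + "\n")
            count += 1
    return count
```

Calling `fastq_to_tab("data.fq", "data_out.fq")` would mirror the file names used in the Perl example. Unlike that script, this sketch stops with an error on a truncated final record.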

R project

Useful links