Wikiomics:DNA sequencing

=Base calling (ABI)=


 * phred gives more accurate calls in the less reliable parts of the sequence (such as the end of the run, at 600 bp and beyond). Phred also assigns a probability-based quality value to each base, allowing more accurate assembly. Quality scores range from 4 to about 60; the "high quality bases" are those with scores > 20. The latest beta version (0.071220.b) supports ABI_3730 as well as older ABI models (373, 377, and 3700), Molecular Dynamics MegaBACE and LI-COR 4000.

 * 1) To run it you need to set the PHRED_PARAMETER_FILE environment variable. In a Bash shell:
export PHRED_PARAMETER_FILE=/path/to/your/file/phredpar.dat

 * 2) To see all the options:
phred -doc | less

 * 3) To do simple basecalling on all files in input_directory and store the SCF files in scf_output_directory:
phred -id input_directory -cd scf_output_directory

Caveat: the new SCF files will have the same names as the input files.
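The steps above can be combined into a small Bash check that runs before basecalling. This is only a sketch: check_phred_env is a hypothetical helper name (not part of phred), and the directory names are the placeholders used above.

```shell
# Sketch: verify the phred prerequisites before base calling.
# check_phred_env is a hypothetical helper, not part of phred itself.
check_phred_env() {
    # The variable must be set and non-empty.
    if [ -z "${PHRED_PARAMETER_FILE:-}" ]; then
        echo "PHRED_PARAMETER_FILE is not set" >&2
        return 1
    fi
    # The parameter file it points to must be readable.
    if [ ! -r "$PHRED_PARAMETER_FILE" ]; then
        echo "cannot read $PHRED_PARAMETER_FILE" >&2
        return 2
    fi
    return 0
}

# Usage (placeholders as in the steps above):
# check_phred_env && phred -id input_directory -cd scf_output_directory
```

Because the SCF files keep the names of the input files, it is safest to use an output directory that is different from the input directory.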


 * Long Trace & Peak Trace from Nucleics. Claimed to increase the length of readable sequence by ca. 80 bp. A separate software module is offered for increasing the daily throughput of a capillary sequencer. (Not tested.)
 * A number of other papers describe algorithms supposedly superior to phred, but working software is not easily obtainable, if at all.
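Phred quality values are conventionally defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong, so the threshold of 20 mentioned above corresponds to an error rate of 1 in 100. A minimal sketch of the conversion in the shell follows; phred_to_prob is a hypothetical helper name, and awk is assumed to be available.

```shell
# Convert a phred quality score Q into the estimated error probability
# P = 10^(-Q/10). phred_to_prob is a hypothetical helper name.
phred_to_prob() {
    # 10^x is computed as exp(x * ln 10); LC_ALL=C keeps "." as decimal point.
    LC_ALL=C awk -v q="$1" 'BEGIN { printf "%.6f\n", exp(-q / 10 * log(10)) }'
}

phred_to_prob 20   # prints 0.010000 (1 error in 100 bases)
phred_to_prob 4    # prints 0.398107 (lowest score in the range)
```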

=Sequence assembly=

See and read: http://www.cbcb.umd.edu/software/

First generation

 * phrap
 * TIGR assembler
 * CAP3 WWW@Pasteur

Genome assemblers used in current genomic projects

 * Arachne @Broad Inst (largest number of citations)
 * Phusion @Sanger
 * Atlas @Baylor


 * JAZZ @JGI, in-house only
 * RAMEN (not published yet as of 10-02-12), used for the medaka and silkworm genome sequencing projects

New Programs

 * Minimus suitable for bacterial genomes, part of AMOS
 * AMOS A Modular Open-Source Assembler


 * EULER P. Pevzner's graph algorithm producing superior contigs. Requires phrap and a patched ReAligner.


 * MIRA latest version 2.9.25 enables true hybrid sequence assembly (454 data [GS20 or GS FLX], Solexa with Sanger reads).


 * SSAKE program for assembling millions of short sequences


 * Newbler Assembler software from 454 for de novo sequence assembly.

Experimental

 * SHARCGS, a DNA assembly program designed for de novo assembly of 25-40mer input fragments and deep sequence coverage.


 * ALLPATHS (HTML) algorithm only


 * Maq mapping short reads to an existing genomic sequence


 * SOAP suite of programs, no de novo assembly (yet).


 * Bowtie "fast and memory efficient"

The most complete list to date is @seqanswers

See also software from
 * GSC Software Centre at Canada's Michael Smith Genome Sciences Centre.

Sequence databases & formats

 * SRF a generic format for DNA sequence data
 * The Short Read Archive @NCBI

Short reads assembly (Solexa etc)

 * Velvet Paper (HTML) De Bruijn graph based assembler from EBI (Zerbino & Birney)
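To illustrate what "De Bruijn graph based" means: each read is cut into overlapping k-mers, and every k-mer becomes an edge from its first k-1 bases to its last k-1 bases. The sketch below only lists these edges for reads given one per line on standard input. It is a toy illustration, not how Velvet is implemented; debruijn_edges is a hypothetical helper name.

```shell
# Toy sketch: list the distinct de Bruijn edges of the reads on stdin.
# debruijn_edges is a hypothetical helper; $1 is the k-mer length.
debruijn_edges() {
    awk -v k="$1" '{
        # Slide a window of length k along the read.
        for (i = 1; i + k - 1 <= length($0); i++) {
            kmer = substr($0, i, k)
            # Edge: (k-1)-base prefix -> (k-1)-base suffix of the k-mer.
            print substr(kmer, 1, k - 1) " -> " substr(kmer, 2)
        }
    }' | LC_ALL=C sort -u
}

# Usage: printf 'ACGT\n' | debruijn_edges 3
```

A real assembler additionally tracks k-mer coverage, handles reverse complements and resolves errors and repeats in the graph, none of which is attempted here.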

Contig ordering/finishing

 * Hawkeye interactive visual analytics tool for genome assemblies

=Quality control=


 * TileQC R-based program for quality control of Solexa reads