User:Jarle Pahr/Sequencing

From OpenWetWare

http://nucleicacids.bitesizebio.com/articles/how-to-get-great-dna-sequencing-results/

http://barricklab.org/twiki/bin/view/Lab/ProceduresPrimerDesign

http://www.ki.se/kiseq/KIGene%20troubleshooting.pdf


Nature focus issue - sequencing technology: http://www.nature.com/nbt/journal/v30/n11/index.html

Technologies

For a comparison of next-generation sequencing methods, see http://en.wikipedia.org/wiki/Dna_sequencing#Next-generation_methods

See also:

SeqAnswers.com Tech summaries: http://seqanswers.com/index.php?pageid=summaries


Sanger sequencing (chain termination method)

Pyrosequencing ("454 sequencing")

Pyrosequencing is a "sequence by synthesis" method developed by Mostafa Ronaghi and Pål Nyrén at the Royal Institute of Technology, Stockholm. Sequences are determined by observation of light emission upon addition of a nucleotide complementary to the first unpaired nucleotide of the template.

Quote from Wikipedia:Pyrosequencing:

"ssDNA template is hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS) and luciferin."

Sequencing proceeds as follows:

  • One of the four dNTPs is added (dATPαS is substituted for dATP, because dATPαS is not a substrate for luciferase). If the dNTP is complementary, DNA polymerase incorporates the nucleotide, releasing pyrophosphate (PPi).
  • ATP sulfurylase catalyzes the reaction of PPi with adenosine 5' phosphosulfate to create ATP.
  • ATP fuels the luciferase-catalyzed conversion of luciferin to oxyluciferin, generating visible light.
  • Unincorporated nucleotides and ATP are degraded by apyrase.
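The cycle above can be sketched as a toy simulation. This is an idealised model under stated assumptions, not a description of any instrument's software: the flow order "TACG", the function name and the assumption that light signal is exactly proportional to the number of bases incorporated in a flow are all illustrative. The template is given in the order in which the polymerase traverses it.

```python
def pyrosequence(template, flow_order="TACG", max_cycles=50):
    """Simulate an idealised pyrosequencing run.

    Returns (flowgram, read). The flowgram is a list of
    (nucleotide, signal) pairs, where signal is the number of bases
    incorporated in that flow (0 means no light was emitted).
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # The synthesised strand is complementary to the template, so this
    # is the sequence the run should report.
    target = "".join(complement[base] for base in template)
    pos = 0
    flowgram = []
    for _ in range(max_cycles):
        for nt in flow_order:
            if pos >= len(target):
                break
            # A homopolymer run is incorporated within a single flow.
            run = 0
            while pos < len(target) and target[pos] == nt:
                run += 1
                pos += 1
            flowgram.append((nt, run))
        if pos >= len(target):
            break
    read = "".join(nt * signal for nt, signal in flowgram)
    return flowgram, read


flowgram, read = pyrosequence("AACGT")
```

In this model a homopolymer stretch produces one stronger signal rather than several separate ones, so the read is reconstructed from signal intensities.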

454 sequencing performs massively parallel pyrosequencing. Library DNA fragments carrying adapter sequences are adsorbed to DNA-capturing beads. The DNA bound to each bead is then amplified by emulsion PCR, in which the beads with bound DNA are mixed with PCR reagents and emulsion oil to create a water-in-oil emulsion containing many "microreactors", each consisting of a bead surrounded by water. Following PCR amplification, the DNA-binding beads are isolated and deposited into the wells of a microtiter plate. Beads carrying the pyrosequencing enzymes are then added to the plate. Finally, the plate is processed in a sequencing machine, which performs the pyrosequencing. More than 400 000 DNA fragments/beads can be processed per plate.

Using "multiplex identifiers", different genomic libraries can be bar-coded, facilitating sequencing of several libraries in the same sequencing run.
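A minimal sketch of how bar-coded reads from a pooled run might be sorted back into their libraries. The barcode sequences and library names are made up for illustration, and the sketch assumes exact matches at the start of each read; real demultiplexing tools usually tolerate sequencing errors in the tag.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a library by its leading barcode.

    barcodes maps library name -> barcode sequence (hypothetical values).
    Returns a dict of library name -> list of reads with the barcode removed.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for name, tag in barcodes.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])
                break
        else:
            # No barcode matched the start of this read.
            bins["unassigned"].append(read)
    return bins


example_barcodes = {"libA": "ACGAGT", "libB": "TCTACG"}  # illustrative only
binned = demultiplex(["ACGAGTTTGACC", "TCTACGGGAT", "GGGGGGAAAA"], example_barcodes)
```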

Platforms:

Platform | Throughput (bases/run) | Time per run | Average (a)/mode (m) read length (nt) | Accuracy | Introduced (year)
GS FLX+ | 700 Mbp | 23 h | Up to 1000 | 700 bp (m) |
GS Junior | 35 Mbp | 12 h | 400 | 400 bp (a) at Phred20/read |



GS FLX:

References:

Introductory paper, 454 sequencing: http://www.ncbi.nlm.nih.gov/pubmed/16056220?dopt=Abstract&holding=npg

http://www.wellcome.ac.uk/Education-resources/Education-and-learning/animations/dna/wtx056046.htm

The development and impact of 454 sequencing

Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample

Overview of 454 sequencing: http://classes.soe.ucsc.edu/bme215/Spring09/PPT/BME%20215-5.pdf

Illumina (Solexa) sequencing

http://www.illumina.com/technology/sequencing_technology.ilmn


Platform | Throughput (bases/run) (maximum) | Time per run | Read length (nt) | Accuracy | Features | Introduced (year)
MiSeq Personal Sequencer | Up to 8.5 Gbp | 4 - 48 h | 250 | >70 % of bases higher than Q30 at read length 2 x 300 bp | |
HiSeq 2500/1500 | 600 Gb | | 2 x 100 | >80 % higher than Q30 | |
HiSeq 2000/1000 | 300 Gb | | 2 x 100 | >80 % higher than Q30 | |
Genome Analyzer IIx | 95 Gb | | 2 x 150 | >80 % higher than Q30 | |

MiSeq datasheet: http://www.illumina.com/documents/products/datasheets/datasheet_miseq.pdf


Side by side comparison of Illumina sequencers: http://www.illumina.com/systems/sequencing.ilmn

Illumina - an introduction to NGS: http://www.illumina.com/Documents/products/Illumina_Sequencing_Introduction.pdf

Ion semiconductor sequencing

Ion Torrent: http://www.invitrogen.com/site/us/en/home/brands/Ion-Torrent.html?cid=fl-iontorrent

Platforms:

Platform | Throughput (bases/run) | Time per run | Typical read length | Accuracy | Introduced (year)
Ion PGM sequencer | 10 Mb to 1 Gb | 90 min+ | 35-400 bp | |
Ion Proton sequencer | 1 human genome | 2 h+ | 100 bp | |


http://www3.appliedbiosystems.com/cms/groups/applied_markets_marketing/documents/generaldocuments/cms_096460.pdf

Nanopore sequencing

Oxford Nanopore: http://www.nanoporetech.com/

Single molecule real time sequencing (Pacific Biosciences)

Microscopic wells on a chip (zero-mode waveguides, ZMWs) each contain a single DNA polymerase molecule bound to the bottom of the well, which accepts a single DNA molecule as template. Fluorescently labelled dNTPs are used for DNA synthesis. Upon incorporation of a dNTP, the fluorescent tag is cleaved from the nucleotide and diffuses out of the observation area within the ZMW. The sequence is determined optically by observing incorporation events.

http://www.pacificbiosciences.com/

Platforms:

PacBio RS:

http://www.pacificbiosciences.com/products/

http://www.pacificbiosciences.com/brochure

http://www.pacificbiosciences.com/pdf/Software_and_Analysis_Brochure.pdf

SOLiD sequencing (Applied Biosystems)

DNA nanoball sequencing

http://www.completegenomics.com/services/technology/

Concepts

De Bruijn graph

http://en.wikipedia.org/wiki/De_Bruijn_graph

See also Compeau et al. 2011, Nature Biotechnology - How to apply de Bruijn graphs to genome assembly: http://www.nature.com/nbt/journal/v29/n11/full/nbt.2023.html
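As a rough illustration of the idea, the sketch below builds a de Bruijn graph from a set of reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and the choice to keep repeated k-mers as parallel edges are assumptions made for this example; real assemblers typically also count k-mer multiplicities and handle reverse complements and sequencing errors.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Return an adjacency list: (k-1)-mer -> list of successor (k-1)-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


graph = de_bruijn(["ACGT", "CGTA"], k=3)
```

An assembly then corresponds to a path through the graph that uses the edges (ideally an Eulerian path).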

Bridge amplification

http://seq.molbiol.ru/sch_clon_ampl.html

RNA-Seq

Genotyping by Sequencing (GBS)

http://www.maizegenetics.net/gbs-overview

ROC

See http://en.wikipedia.org/wiki/Receiver_operating_characteristic

Edit distance

See http://en.wikipedia.org/wiki/Levenshtein_distance
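A minimal sketch of the Levenshtein distance (minimum number of single-character insertions, deletions and substitutions) using the standard dynamic-programming recurrence with two rows. The function name is arbitrary; aligners generally use more elaborate, banded or indexed variants.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]
```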

Color Space/2-base encoding

See

http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space.html

http://www.biostars.org/p/43855/

http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/CSHL_Fu.pdf

See also

http://en.wikipedia.org/wiki/2_Base_Encoding
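A small sketch of 2-base encoding: each colour (0-3) encodes a pair of adjacent bases, which is equivalent to XOR-ing 2-bit codes for the bases (A=0, C=1, G=2, T=3). For simplicity this example uses the first base of the sequence itself as the known leading base; in actual SOLiD reads the leading base comes from the sequencing primer/adapter. Function names are illustrative.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_color_space(seq):
    """Encode a base sequence as its first base followed by colour digits."""
    colors = (str(CODE[x] ^ CODE[y]) for x, y in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)


def from_color_space(cs):
    """Decode a colour-space string (leading base + colour digits)."""
    bases = [cs[0]]
    for color in cs[1:]:
        # Each colour plus the previous base determines the next base.
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)
```

Because each base is decoded from the previous one, a single miscalled colour changes every base after it when decoding naively, which is why colour-space reads are usually aligned in colour space.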

Targeted sequencing

Targeted "capture kits" may be used to sequence a subset of genomic DNA. The human exome (as defined by the Consensus CDS (CCDS) project) totals about 38 Mb, covering about 1.22 % of the human genome.

(The SureSelect Human All Exon Kit )
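A back-of-the-envelope check of the exome figures quoted above (38 Mb, 1.22 %), plus the amount of on-target sequence needed for a given mean depth. The 30x depth is an arbitrary illustrative value, and the calculation ignores off-target reads and uneven coverage.

```python
EXOME_MB = 38.0          # exome size quoted above
EXOME_FRACTION = 0.0122  # fraction of the genome quoted above


def implied_genome_size_mb(target_mb, fraction):
    """Genome size implied by a target size and the fraction it represents."""
    return target_mb / fraction


def on_target_bases_mb(target_mb, mean_depth):
    """Megabases of on-target sequence needed to reach a mean depth."""
    return target_mb * mean_depth


genome_mb = implied_genome_size_mb(EXOME_MB, EXOME_FRACTION)  # roughly 3.1 Gb
exome_30x_mb = on_target_bases_mb(EXOME_MB, 30)
genome_30x_mb = on_target_bases_mb(genome_mb, 30)
```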

See also: http://massgenomics.org/2011/10/major-exome-platforms-compared.html

Scaffolding

http://genome.jgi-psf.org/help/scaffolds.html

http://seqanswers.com/wiki/How-to/scaffolding

http://bioinformatics.oxfordjournals.org/content/early/2012/04/05/bioinformatics.bts175

http://www.scfbm.org/content/7/1/4

http://www.cbcb.umd.edu/research/assembly_primer.shtml

Paired-end reads

N50 Statistic

N50 length: For a collection of contigs, the largest length L such that the contigs of length L or longer together contain at least half of the total length of all contigs in the collection.

NG50: As N50, except that the threshold is half of the genome size rather than half of the total contig length.
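The two definitions can be sketched in a few lines. The function name and the optional `genome_size` argument are choices made for this example; passing a genome size gives NG50, omitting it gives N50.

```python
def n50(contig_lengths, genome_size=None):
    """Return N50, or NG50 if genome_size is given.

    Returns None if the contigs do not reach half of the reference total
    (possible for NG50 when the assembly is much smaller than the genome).
    """
    total = sum(contig_lengths) if genome_size is None else genome_size
    running = 0
    # Walk from the longest contig downwards until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return None


lengths = [80, 70, 50, 40, 30, 20]
assembly_n50 = n50(lengths)
assembly_ng50 = n50(lengths, genome_size=400)
```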

http://en.wikipedia.org/wiki/N50_statistic

http://seqanswers.com/forums/showthread.php?p=41420

Haplotypes

See also:

http://hapmap.ncbi.nlm.nih.gov/originhaplotype.html.en

http://en.wikipedia.org/wiki/Haplotype

Haploview

http://en.wikipedia.org/wiki/Haplogroup

Loss of Heterozygosity

http://en.wikipedia.org/wiki/Loss_of_heterozygosity

Copy number variants (CNVs)

Short Tandem Repeats (STRs)

Genotyping of STRs is used to produce forensic DNA profiles. See http://massgenomics.org/2013/01/identifying-samples-genomic-data.html

http://www.biology.arizona.edu/human_bio/activities/blackett2/str_codis.html

http://www.cstl.nist.gov/strbase/fbicore.htm

Databases

http://www.ncbi.nlm.nih.gov/gap

Sequence Read Archive: http://www.ncbi.nlm.nih.gov/sra

European Nucleotide Archive: http://www.ebi.ac.uk/ena/

Sequence alignment/Assembly

Compendium of HTS mappers: http://wwwdev.ebi.ac.uk/fg/hts_mappers/

Comparison of aligners (ROC curves): http://lh3lh3.users.sourceforge.net/alnROC.shtml

BWA: http://bio-bwa.sourceforge.net/

Bowtie - An ultrafast memory-efficient short read aligner: http://bowtie-bio.sourceforge.net/index.shtml

http://www.ncbi.nlm.nih.gov/pubmed/20211242

Primers and reviews:


http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_li.pdf

NCBI primer on genome assembly methods: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/assembly.shtml

Nature Biotechnology Primer - How to map billions of short reads onto genomes: http://www.nature.com/nbt/journal/v27/n5/full/nbt0509-455.html

Bioinformatics, 2012: Tools for mapping high-throughput sequencing data: http://bioinformatics.oxfordjournals.org/content/28/24/3169

A survey of sequence alignment algorithms for next-generation sequencing: http://bib.oxfordjournals.org/content/11/5/473.full

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0019175


De novo assembly:

Optimal Assembly for High Throughput Shotgun Sequencing: http://arxiv.org/abs/1301.0068

Counterintuitively, too high coverage can be problematic: http://seqanswers.com/forums/showthread.php?t=24965

Sequencing services

Service | Sample specification | Primer specification | Ship to | Link
GATC LightRun | Add 5 uL DNA (80-100 ng/uL plasmid or 20-80 ng/uL purified PCR product) + 5 uL 5 uM (5 pmol/uL) primer to the same tube. | Tm 52-58 C, 17-19 bp (8-9 G+C for an 18-mer), G or C at 3' end (max 3 Gs or Cs), maximum 4 bp run. | GATC Biotech AG, European Custom Sequencing Centre, Gottfried-Hagen-Strasse 20, 51105 Köln. | http://www.gatc-biotech.com/en/lp4/new-lightrun-sequencing.html
Macrogen Single-pass | Add 20 uL DNA (100 ng/uL plasmid or 50 ng/uL purified PCR product) to one tube. Add 20 uL primer (10 pmol/uL) to a separate tube. | 18-25 bp, 40-60 % GC, Tm 55-60 C | Macrogen Europe, IWO, Kamer IA3-195, Meibergdreef 39, 1105 AZ Amsterdam Zuid-oost, Netherlands. Attention: J.S. Park. | http://dna.macrogen.com/eng/support/seq/seq_submission.jsp

Sequencing-based techniques

ChIP-sequencing

Sequencing/genomics centres

BGI: http://www.genomics.cn/

New York Genome Center: http://nygenome.org/

JGI: http://www.jgi.doe.gov/


See also: http://omicsmaps.com/

Primers

Custom primers

Name | Length (bp) | Sequence | Tm (C) [calculated] | Tm (C) [Analytical] | GC (% / bp) | Comment
pJP-1_seq5 | 18 | CAGCGTGCGAGTGATTAT | 53.9 / 60.6 (2) / 52.6 (3) | | 50 | Binds upstream of XylS region in pSB-M1g
pJP-1_seq6 | 18 | AGACCACATGGTCCTTCT | 57.5 (2) / 52.8 (3) | 53.9 | 50 | Binds near end of GFPmut3 in pSB-M1g
SeqMG1 | | AGCAGATCCACATCCTTGAA | 62.7 (2) / 53.7 (3) | | | Binds at nt 5672 of pSB-M1g, upstream of AgeI site. Designed to Macrogen sequencing primer criteria.
pSB-SeqA | 18 | TGCAAGAAGCGGATACAG | 56 / 60.7 (2) / 52.3 (3) | | 50 | Binds at nt 7729 of pSB-M1g, upstream of Pm promoter and PciI site.
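A quick way to sanity-check a primer's length and GC content against a provider's criteria is sketched below. The default limits are the Macrogen values listed under Sequencing services (18-25 bp, 40-60 % GC). The Tm function uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough rule of thumb for short oligos and is not the method used by the tools that produced the Tm values in the table, so its output will differ from them. Function names are illustrative.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def meets_length_and_gc(seq, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Check length and GC content only; Tm is not checked here."""
    return min_len <= len(seq) <= max_len and min_gc <= gc_percent(seq) <= max_gc


primers = {
    "pJP-1_seq5": "CAGCGTGCGAGTGATTAT",
    "SeqMG1": "AGCAGATCCACATCCTTGAA",
}
report = {name: (len(s), gc_percent(s), wallace_tm(s), meets_length_and_gc(s))
          for name, s in primers.items()}
```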

Universal primers

http://www.generi-biotech.com/sequencing-universal-seguencing-primers/

http://www.synthesisgene.com/tools/Universal-Primers.pdf

http://www.genewiz.com/public/universalprimers.aspx

https://secure.eurogentec.com/product/research-universal-primers.html


Tm calculations: 1: CloneManager 2: Thermo Scientific 3: IDT Oligoanalyzer


A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers http://www.biomedcentral.com/1471-2164/13/341

Software

Chromatogram viewers: http://www.dnaseq.co.uk/chrom_view.html

CodonCode aligner: http://www.codoncode.com/aligner/

BioEdit: http://www.mbio.ncsu.edu/BioEdit/bioedit.html

FinchTV: http://www.geospiza.com/Products/finchtv.shtml

About SCF (sequence chromatogram format) files: http://staden.sourceforge.net/manual/formats_unix_2.html

https://wiki.nci.nih.gov/display/TCGA/Sequence+trace+files

http://code.google.com/p/seqtrace/

http://www.phrap.com/background.htm

http://en.wikipedia.org/wiki/Phrap

http://www.ncbi.nlm.nih.gov/books/NBK47537/

http://www.bio.net/bionet/mm/autoseq/1999-April/001368.html

High-throughput sequencing tools:

SAM tools: http://samtools.sourceforge.net/

Burrows-Wheeler Aligner (BWA): http://bio-bwa.sourceforge.net/

http://seqanswers.com/wiki/BWA

Maq: Mapping and Assembly with Qualities


See also http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

Sequencing quality and standards:

http://www.phrap.com/phred/

http://www.bio.net/bionet/mm/autoseq/1999-April/001366.html

http://en.wikipedia.org/wiki/Phred_quality_score
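For reference, a Phred quality score Q is related to the estimated base-calling error probability P by Q = -10 log10(P), so Q20 corresponds to a 1 in 100 error rate and Q30 to 1 in 1000. A minimal conversion sketch (function names are arbitrary):

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```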

Sequencing projects

File formats

Sequence Alignment/Map (SAM) format: "A generic format for storing large nucleotide sequence alignments". Tab-delimited text format consisting of a header section (optional) and an alignment section.

http://samtools.sourceforge.net/

http://samtools.sourceforge.net/SAM1.pdf
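A minimal sketch of reading SAM text: header lines start with "@", and each alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. The record class and function name are made up for this example, the example read and its NM tag are illustrative, and no validation is done; for real work a dedicated library or samtools is the safer choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple   # optional fields, left unparsed


def parse_sam(lines):
    """Split SAM text lines into (header_lines, alignment_records)."""
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        f = line.split("\t")
        records.append(SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                                 f[5], f[6], int(f[7]), int(f[8]), f[9], f[10],
                                 tuple(f[11:])))
    return header, records


example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:ref\tLN:45",
    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*\tNM:i:1",
]
header, records = parse_sam(example)
is_reverse_strand = bool(records[0].flag & 16)  # bit 0x10: read on reverse strand
```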

See also:

http://compbio.soe.ucsc.edu/sam.html

http://www.ncbi.nlm.nih.gov/pubmed/19505943

http://seqanswers.com/wiki/SAM


Binary Alignment/Map format (BAM): Binary, compressed file format containing the same information as SAM files.

From https://wiki.nci.nih.gov/display/TCGA/Binary+Alignment+Map : "Centers align sequence reads to a reference genome to produce a Sequence Alignment Map (SAM) format file. The SAM file is then converted into a binary form, or Binary-sequence Alignment Format (BAM) file"

See also http://genome.ucsc.edu/goldenPath/help/bam.html

Variant Call Format (VCF):

Standard created by the 1000 Genomes Project.

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
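A minimal sketch of parsing one VCF data line. Data lines are tab-delimited with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample. The function name and returned dictionary layout are choices made for this example, the example line is illustrative, and INFO values are left as strings rather than typed according to the file header.

```python
def parse_vcf_line(line):
    """Parse a single VCF data line (not a '#' header line) into a dict."""
    f = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = f[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without '=' are flags; store them as True.
            info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
        "format": f[8] if len(f) > 8 else None,
        "samples": f[9:],
    }


example_line = ("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
                "\tGT:GQ:DP:HQ\t0|0:48:1:51,51\t1|0:48:8:51,51\t1/1:43:5:.,.")
variant = parse_vcf_line(example_line)
```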

From http://www.ensembl.org/info/website/upload/large.html :

"The VCF format is a tab delimited format for storing variant calls and and individual genotypes. It is able to store all variant calls from single nucleotide variants to large scale insertions and deletions."

ABI (Applied Biosystems) format:


FASTQ:

FASTQ files encode identified nucleotides together with their corresponding quality scores. The interpretation of the quality scores may vary depending on the source of the sequence, but the most widely used is the "Sanger format" (Phred quality scores).
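A minimal sketch of reading FASTQ records and decoding Sanger-format qualities, where each quality character encodes a Phred score as its ASCII code minus 33. It assumes the common four-lines-per-record layout (no wrapped sequence lines), which the format does not strictly guarantee; other quality encodings use a different offset or scale. The function name and the example read are made up.

```python
def parse_fastq(lines, offset=33):
    """Yield (name, sequence, phred_scores) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        title, seq, plus, qual = lines[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Sanger format: Phred score = ASCII code - 33.
        yield title[1:], seq, [ord(c) - offset for c in qual]


example = ["@read1", "GATTACA", "+", "IIII5!#"]
records = list(parse_fastq(example))
```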

http://en.wikipedia.org/wiki/FASTQ_format

http://maq.sourceforge.net/fastq.shtml

http://www.bioperl.org/wiki/FASTQ_sequence_format

http://nar.oxfordjournals.org/content/38/6/1767.full


SRA:

Bibliography

nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e#ref-12


http://online.liebertpub.com/doi/full/10.1089/cmb.2011.0201

Comparison of variant-calling software


http://www.nature.com/nmeth/journal/v6/n11s/abs/nmeth.1376.html

http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html


http://assemblathon.org/

http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html

GAGE: A critical evaluation of genome assemblies and assembly algorithms: http://genome.cshlp.org/content/22/3/557

Commentary

The state of NGS variant calling - Don't panic: http://blog.goldenhelix.com/?p=1725

http://nxseq.bitesizebio.com/articles/which-way-forward-in-ultra-high-throughput-genomic-sequencing-reference-materials-and-performance-measurements/

Assemblies: The good, the bad and the ugly: http://www.nature.com/nmeth/journal/v8/n1/full/nmeth0111-59.html

http://ngs-expert.com/

http://bcbio.wordpress.com/

A tale of three next generation sequencers: http://www.biomedcentral.com/content/pdf/1471-2164-13-341.pdf

http://core-genomics.blogspot.no/2012_08_01_archive.html

Misc

http://nxseq.bitesizebio.com/

http://nxseq.bitesizebio.com/articles/a-short-history-of-sequencing-part-2-the-first-of-the-next/

http://nxseq.bitesizebio.com/articles/a-short-history-of-sequencing-part-3-personal-ngs-disposable-sequencers-and-what-the-future-holds/

http://genomeinabottle.org/

SEQanswers: http://seqanswers.com/

SEQanswers wiki: http://seqanswers.com/wiki/SEQanswers

SEQanswers - how to: http://seqanswers.com/wiki/How-to

Genome Reference Consortium: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/

List of NGS blogs: http://seqanswers.com/forums/showthread.php?t=5024

http://www.homolog.us/blogs/

NGS Necropolis: http://blueseq.com/knowledgebank/ngs-necropolis/

http://gnubio.com/