User:Jarle Pahr/Sequencing

From OpenWetWare

Current revision (14:53, 27 January 2014) (view source)
 
European Nucleotide Archive: http://www.ebi.ac.uk/ena/

=Assembly and mapping=

Alignment to multiple reference sequences: http://bioinformatics.oxfordjournals.org/content/29/13/i361.full

http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html

http://denovoassembler.sourceforge.net/

https://github.com/sebhtml/ray

http://dskernel.blogspot.no/2013/06/open-access-doctoral-theses-on-de-novo.html

Compendium of HTS mappers: http://wwwdev.ebi.ac.uk/fg/hts_mappers/

Comparison of assemblers: http://lh3lh3.users.sourceforge.net/alnROC.shtml

SeqAnswers: Software packages for next gen sequence analysis: http://seqanswers.com/forums/showthread.php?t=43 (Thread closed since 2009)

A5: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0042304

'''BWA:''' http://bio-bwa.sourceforge.net/

BWA-MEM: http://arxiv.org/abs/1303.3997

'''Bowtie - An ultrafast memory-efficient short read aligner:''' http://bowtie-bio.sourceforge.net/index.shtml

Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

http://www.ncbi.nlm.nih.gov/pubmed/20211242

Counter-intuitively, too high coverage can be problematic: http://seqanswers.com/forums/showthread.php?t=24965

https://github.com/lexnederbragt/denovo-assembly-tutorial/tree/master/scripts
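The remark above about excessive coverage can be made concrete with the usual estimate of mean coverage, C = N x L / G (number of reads times read length, divided by genome size). The following is a minimal sketch only; the function names and the example numbers are illustrative assumptions and do not come from any of the linked tools.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep in order to reach the target coverage.

    Returns 1.0 when the data set is already at or below the target.
    """
    if current_coverage <= target_coverage:
        return 1.0
    return target_coverage / current_coverage


if __name__ == "__main__":
    # Hypothetical run: 2 million reads of 100 nt on a 5 Mbp genome.
    coverage = mean_coverage(2_000_000, 100, 5_000_000)
    print(coverage, subsample_fraction(coverage, 30))
```

Randomly keeping the computed fraction of reads before assembly is one simple way of testing whether very deep coverage is hurting an assembly.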
=Sequencing services=

|| http://dna.macrogen.com/eng/support/seq/seq_submission.jsp
|}

=Variant calling=

=Sequencing-based techniques=

==ChIP-sequencing==

==RNA-seq==

==Single-cell sequencing==

http://trap.it/#!traps/id/f294f009-bb0f-4f14-9e59-e84cf36d2560/jump/6GoIaq61z002q6RDiaaY
=Sequencing/genomics centres=

http://openwetware.org/wiki/BioMicroCenter:Sequencing

BGI: http://www.genomics.cn/

JGI: http://www.jgi.doe.gov/

The Genome Analysis Centre (UK): http://jobs.tgac.ac.uk/

Norwegian Cancer Genomics Consortium: http://www.cancergenomics.no/

http://medicalgenomics.no/

See also: http://omicsmaps.com/

Sequencing facilities in Norway:
(Incomplete)

Oslo:

Akershus University Hospital (Ahus): 1 x Ion Torrent

Norwegian High-Throughput Sequencing Centre (NSC) Oslo, Norway:
2 x Roche/454, 1 x Illumina HiSeq, 1 x PacBio, 1 x Ion Torrent, 1 x Illumina MiSeq

Helse Sør-Øst/University of Oslo Genomics Core Facility Oslo, Norway:
1 x Illumina GA2, 1 x MiSeq, 1 x HiSeq

NTNU Genomics Core Facility Sør-Trøndelag, Norway:
1 x HiSeq

Telemark Hospital Telemark, Norway:
1 x Illumina HiSeq

Bergen:

Trondheim:

UNN:

http://www.unn.no/dna-sequencing/category11734.html

'''Contact persons:'''

Lex Nederbragt: http://contig.wordpress.com/about/

Dr. Leonardo A. Meza-Zepeda
Head Helse Sør-Øst/ Univ. of Oslo Genomics Core Facility

Kjetill S. Jakobsen

Professor, Group Leader (CEES node)

Dag Erik Undlien

Professor, Group Leader (IMG node)

Other groups which employ HTS:

CIGENE, UMB. See https://sites.google.com/site/seqomics
=Primers=

=Software=

Consed graphical editor: http://bioinformatics.oxfordjournals.org/content/early/2013/08/31/bioinformatics.btt515.abstract

HTseq. Sequencing analysis with Python: http://seqanswers.com/forums/showthread.php?t=4805

Bamformatics: http://sourceforge.net/projects/bamformatics/?source=directory

http://www.digitalbiologist.com/2013/06/python-next-gen-sequencing.html

DISCOVAR: http://www.broadinstitute.org/software/discovar/blog/

ALLPATHS-LG: http://www.broadinstitute.org/software/allpaths-lg/blog/

Ray Cloud demo: http://browser.cloud.boisvert.info/client/?map=0&section=3&region=41&location=0

http://debian-med.alioth.debian.org/tasks/bio-ngs

Isaac / Illumina Open source software: https://github.com/sequencing

https://www.broad.harvard.edu/crd/wiki/index.php/Main_Page

Chromatogram viewers: http://www.dnaseq.co.uk/chrom_view.html

BioEdit: http://www.mbio.ncsu.edu/BioEdit/bioedit.html

VCF view: http://www.easih.ac.uk/software.php

FinchTV: http://www.geospiza.com/Products/finchtv.shtml

See also http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

The Genome Analysis Centre - software: https://github.com/TGAC

Genome Analysis Toolkit (GATK): http://www.broadinstitute.org/gatk/

'''Sequencing quality and standards:'''

http://www.microbe.net/undergraduate-research-built-environment-genomes/

=File formats=

FASTG: http://fastg.sourceforge.net/

'''Sequence Alignment/Map (SAM) format:'''

http://www.bioperl.org/wiki/FASTQ_sequence_format

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants: http://nar.oxfordjournals.org/content/38/6/1767.full
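The FASTQ variants described in the paper above differ mainly in how quality scores are mapped to ASCII characters: the Sanger variant stores Phred scores with an offset of 33, while the older Illumina 1.3+ variant uses an offset of 64. The sketch below only illustrates that decoding step. It assumes simple four-line records (the format itself permits line-wrapped records), and the function names are made up for this example.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality_string) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record at line %d" % (i + 1))
        yield header[1:], sequence, quality


def decode_qualities(quality_string, offset=33):
    """Convert a quality string to integer scores (offset 33 = Sanger)."""
    return [ord(character) - offset for character in quality_string]


if __name__ == "__main__":
    record = ["@read1", "ACGT", "+", "II5!"]
    for name, sequence, quality in parse_fastq(record):
        print(name, sequence, decode_qualities(quality))
```

Decoding a file with the wrong offset shifts every score by 31, which is why it matters to know which variant a given file uses.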
=Bibliography=

What is next generation sequencing? http://ep.bmj.com/content/early/2013/08/28/archdischild-2013-304340.long

SeqAnswers Literature watch: http://seqanswers.com/forums/forumdisplay.php?f=10

nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e#ref-12

Assembly of large genomes using second-generation sequencing: http://www.ncbi.nlm.nih.gov/pubmed/20508146?dopt=Abstract&holding=npg

http://online.liebertpub.com/doi/full/10.1089/cmb.2011.0201

http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data: http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2474.html

http://assemblathon.org/

GAGE: A critical evaluation of genome assemblies and assembly algorithms: http://genome.cshlp.org/content/22/3/557

==2011==

Miller 2011 - Assembly Algorithms for Next-Generation Sequencing Data: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874646/

==2012==

An Integrated Pipeline for de Novo Assembly of Microbial Genomes: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0042304

==2013==

Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases: http://www.scirp.org/journal/PaperInformation.aspx?PaperID=30744

Harnessing Virtual Machines to simplify next generation DNA sequencing analysis: http://bioinformatics.oxfordjournals.org/content/early/2013/06/20/bioinformatics.btt352.abstract

High-throughput sequencing for biology and medicine: http://www.nature.com/msb/journal/v9/n1/full/msb201261.html

DNA sequencing using electrical conductance measurements of a DNA polymerase: http://www.nature.com/nnano/journal/vaop/ncurrent/full/nnano.2013.71.html

Li et al.: Memory Efficient Minimum Substring Partitioning: http://www.vldb.org/pvldb/vol6/p169-li.pdf

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data: http://www.ncbi.nlm.nih.gov/pubmed/23644548

Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062856
=Commentary=

http://core-genomics.blogspot.no/2012_08_01_archive.html

http://ivory.idyll.org/blog/thoughts-on-assemblathon-2.html

http://flxlexblog.wordpress.com/2013/02/26/on-assembly-uncertainty-inspired-by-the-assemblathon2-debate/

http://www.yuzuki.org/

http://tidsskriftet.no/article/2928729

Nextgenseek: http://nextgenseek.com/

=Slides=

=Courses=

http://www.forbio.uio.no

http://www.forbio.uio.no/events/courses/2013/radseq.html

http://genomics.no/oslo/index.php?page=courses

http://ged.msu.edu/angus/bioinformatics-courses.html

=Procedures and troubleshooting=

Results/success rates:

{| class="wikitable" border="1" cellpadding="5" cellspacing="0"
|-
! Sample!! DNA !! Primer !! Result
|-
| Barcode|| - || - ||
|}

'''Sanger sequencing:'''

http://seqcore.brcf.med.umich.edu/doc/dnaseq/trouble/badseq.html

Depending on budget and available sample amounts, consider sequencing each sample twice to more easily distinguish sequencing errors from actual mutations.

Suggested work procedure when receiving Sanger sequencing results (plasmids, etc.):

*First, open the chromatogram file to assess the read length and overall quality.
*If applicable, compare the automatically trimmed sequence (.fas) file with the expected sequence using BLAST or another sequence alignment tool. Alternatively, consider using raw sequence copied from a chromatogram viewer.
*If no hit is found, make sure that the most permissive algorithm (blastn or similar) is used. If still no hit is found, manually inspect the chromatogram (.abi) file using a chromatogram viewer. If the trimmed file is small compared to the raw sequence (low chromatogram quality) and the remainder appears sensible, redo the search using "raw" called bases (copied directly from the chromatogram viewer). '''When making notes on sequence results, always record which sequence (PHRED-generated, or "raw" sequence from a chromatogram viewer) was used for a given analysis (e.g. a BLAST search).''' Otherwise, confusion may ensue: the note says 100 % match, but a later BLAST search gives no match or a bad match, etc.
*As a quick check, the sequence file can be searched for a short portion of the expected sequence, while allowing for some mismatches (which may be present because of sequencing errors).
*If discrepancies occur, inspect the chromatogram at the relevant positions.
*If a hit is found for the desired sequence, '''check that the sequence is in the right position, and that the flanking sequences are correct'''.
*Be aware that alignment may produce suboptimal results (indicating a worse fit than is actually the case), especially when aligning to circular sequences.
*If the chromatogram yields no sequence, note/report this as "no usable data".
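The "quick check" in the list above (searching a read for a short stretch of the expected sequence while tolerating a few mismatches) can be sketched as follows. This is an illustrative assumption of how one might do it, not a replacement for BLAST: it only handles substitutions (no indels), all function names are hypothetical, and the circular case is handled by simply extending the reference past its end.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(text, probe, max_mismatches=2):
    """Return (position, mismatches) for every window within the limit."""
    hits = []
    for i in range(len(text) - len(probe) + 1):
        distance = hamming(text[i:i + len(probe)], probe)
        if distance <= max_mismatches:
            hits.append((i, distance))
    return hits


def reverse_complement(sequence):
    """Reverse complement, for checking reads from the opposite strand."""
    return sequence.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_on_circular(reference, probe, max_mismatches=2):
    """Search a circular reference (e.g. a plasmid) across its junction."""
    extended = reference + reference[:len(probe) - 1]
    return find_with_mismatches(extended, probe, max_mismatches)


if __name__ == "__main__":
    read = "TTGACCGTAGGCTAAC"
    print(find_with_mismatches(read, "CCGTTGGC", max_mismatches=1))
    print(find_on_circular("GGCAAATT", "TTGG", max_mismatches=0))
```

If nothing is found on the forward strand, running the same search with `reverse_complement(probe)` covers reads obtained with a reverse primer.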

Three main "concerns" may appear:
*Base differs from expected.
*Base is uncalled ("n")
*Indel/Gap

In all cases inspecting the chromatogram may resolve the issue. '''Automatically generated sequences should be considered a best guess by the computer'''.

'''Chromatogram interpretation:'''

http://peter.unmack.net/molecular/data/chromatogram.editing.html

http://www.sci.sdsu.edu/dnacore/InterpretData.html

http://cancer-seqbase.uchicago.edu/traces.html

http://seqcore.brcf.med.umich.edu/doc/dnaseq/interpret.html

http://www.roswellpark.edu/shared-resources/biomolecular-resource-facilities/dna-sequencing/interpreting-chromatograms

'''Common causes of bad data from Sanger sequencing:'''
*Salt/alcohol/other contamination
*GC-rich or palindromic regions.
*Double priming
*Suppression of signal after a strong signal: Happens most commonly for G's after A's, and often for G's after C's. Most often, weak G signals follow after multiple A's.

'''Common causes of mis-called bases:'''

*Unevenly spaced peaks in the chromatogram may lead the program to insert a non-existent, ambiguous base ("n"). Some sequencing machines (http://seqcore.brcf.med.umich.edu/doc/dnaseq/interpret.html) have been known to give excess spacing between the peaks in "GA".
*In the beginning portion of the sequence (~first 50 bases), two bases are often called as one (http://peter.unmack.net/molecular/data/chromatogram.editing.html).

'''Template preparation:'''

http://www.sci.sdsu.edu/dnacore/tempprep.html
=Misc=

Blueseq online sequencing guide: http://www.blueseq.com/

http://lycofs01.lycoming.edu/~gcat-seek/

Bitesize bio NGS channel: http://nxseq.bitesizebio.com/

http://nxseq.bitesizebio.com/articles/which-way-forward-in-ultra-high-throughput-genomic-sequencing-reference-materials-and-performance-measurements/

http://nxseq.bitesizebio.com/articles/a-short-history-of-sequencing-part-3-personal-ngs-disposable-sequencers-and-what-the-future-holds/

Genome in a bottle consortium: http://genomeinabottle.org/

SEQanswers: http://seqanswers.com/

http://gnubio.com/

Rob Carlson's blog: http://synthesis.cc/

http://titojankowski.com/the-500000-dna-sequencer-tear-down/

=Raw data=

=Guides and instructional material=

PRACTICAL:
Genome sequencing of Bacteroides isolates. http://www.nematodes.org/teaching/gg3/index.shtml

ANGUS: http://ged.msu.edu/angus/

See also http://ivory.idyll.org/blog/ngs-course-with-aws.html

MSU NGS Analysis workshop 2012: http://ged.msu.edu/angus/tutorials-2012/index.html

http://ged.msu.edu/angus/tutorials-2012/files/lecture3-mapping.pptx.pdf

MSU NGS analysis workshop 2013: http://ged.msu.edu/angus/tutorials-2013/index.html

Homolog.us tutorials: http://www.homolog.us/Tutorials/index.php?p=1.1&s=1

http://ged.msu.edu/angus/tutorials-2013/files/rayan-2013-june-18-msu.pdf

NGS WikiBook: http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29

GCAT SEEK: http://lycofs01.lycoming.edu/~gcat-seek/index.html

Mason lab NGS workshop: http://chagall.med.cornell.edu/NGScourse/

=Results=

=Quality control, error sources and error detection=

http://pathogenomics.bham.ac.uk/blog/2013/01/sequencing-data-i-want-the-truth-you-cant-handle-the-truth/

http://www.citeulike.org/user/cisevol/tag/sequencing_error

Churchill & Waterman 1991. The Accuracy of DNA Sequences: Estimating Sequence Quality: http://www.cmb.usc.edu/papers/msw_papers/msw-107.pdf

Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation: http://nar.oxfordjournals.org/content/early/2013/01/08/nar.gks1443.full.pdf?keytype=ref&ijkey=suYBLqdsrc7kH7G

10 rules of thumb in genomics: http://genomeinformatician.blogspot.co.uk/2011/07/10-rules-of-thumb-in-genomics.html

Bioplanet GCAT: http://www.bioplanet.com/gcat

QUAST: http://bioinf.spbau.ru/quast

HtSeq-Qa: http://www-huber.embl.de/users/anders/HTSeq/doc/qa.html

=Economy and costs=

http://www.genome.gov/sequencingcosts/

=Read simulation=

http://sourceforge.net/projects/readsim/?source=directory

=Links=

https://sites.google.com/site/seqomics/

Current revision

http://nucleicacids.bitesizebio.com/articles/how-to-get-great-dna-sequencing-results/

http://barricklab.org/twiki/bin/view/Lab/ProceduresPrimerDesign

http://www.ki.se/kiseq/KIGene%20troubleshooting.pdf

http://nextgenseek.com/2012/12/evolution-of-next-gen-sequencing-development/

Nature focus issue - sequencing technology: http://www.nature.com/nbt/journal/v30/n11/index.html


Companies

Applied Biosystems:

https://www2.appliedbiosystems.com/about/presskit/pdfs/celebrating_25_years_aln_article.pdf?


Pacific Biosciences: http://bio.pgp.jhu.edu/~jgreene/NextGen/presentations/PacBio_SMRT_Sequencing_Oct_2012.pdf

Technologies

For a comparison of next-generation sequencing methods, see http://en.wikipedia.org/wiki/Dna_sequencing#Next-generation_methods

See also:

SeqAnswers.com Tech summaries: http://seqanswers.com/index.php?pageid=summaries


Sanger sequencing (chain termination method)

http://users.ugent.be/~avierstr/principles/seq.html

http://www.ibt.lt/sc/files/DNASeqCG.pdf

Pyrosequencing ("454 sequencing")

Pyrosequencing is a "sequence by synthesis" method developed by Mostafa Ronaghi and Pål Nyrén at the Royal Institute of Technology, Stockholm. Sequences are determined by observation of light emission upon addition of a nucleotide complementary to the first unpaired nucleotide of the template.

Quote from Wikipedia:Pyrosequencing:

"ssDNA template is hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS) and luciferin."

Sequencing proceeds as follows:

  • Addition of one of the four dNTPs (dATPαS is substituted for dATP, as the former is not a substrate for luciferase). If the dNTP is complementary, DNA polymerase incorporates the nucleotide, releasing pyrophosphate (PPi).
  • ATP sulfurylase catalyzes the reaction of PPi and adenosine 5' phosphosulfate to create ATP.
  • ATP fuels the luciferase-catalyzed conversion of luciferin to oxyluciferin, generating visible light.
  • Unincorporated nucleotides and ATP are degraded by apyrase.
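The cycle above can be illustrated with an idealised, noise-free simulation: nucleotides are flowed one at a time in a fixed order, and the light signal of each flow is taken to be proportional to the number of bases incorporated (so a homopolymer run gives one stronger signal). The flow order "TACG", the integer signals and the function names are assumptions made for this sketch; real flowgrams are noisy.

```python
def simulate_flows(synthesized, flow_order="TACG"):
    """Return [(nucleotide, signal)] for the strand being synthesized.

    `synthesized` is the newly built strand (the complement of the
    template), read in the direction of synthesis.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError("bases not in the flow order: %s" % sorted(unknown))
    flows = []
    position = 0
    flow_index = 0
    while position < len(synthesized):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run = 0
        # All consecutive matching bases are incorporated in a single flow.
        while position < len(synthesized) and synthesized[position] == nucleotide:
            run += 1
            position += 1
        flows.append((nucleotide, run))
        flow_index += 1
    return flows


def call_bases(flows):
    """Convert idealised flow signals back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


if __name__ == "__main__":
    flows = simulate_flows("TTAGC")
    print(flows)
    print(call_bases(flows))
```

Because a run of identical bases is read as a single, proportionally stronger signal, judging the exact length of long homopolymers is the characteristic difficulty of this chemistry.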

454 sequencing performs massively parallel pyrosequencing. Library DNA fragments carrying adapter sequences are bound to DNA-capture beads. The DNA bound to each bead is then amplified by emulsion PCR, in which the beads with bound DNA are mixed with PCR reagents and emulsion oil to create a water-in-oil emulsion containing many "microreactors", each consisting of a bead surrounded by water. Following PCR amplification, the DNA-carrying beads are isolated and deposited into the wells of a microtiter plate. Beads with pyrosequencing enzymes are then added to the plate. Finally, pyrosequencing is performed by processing the plate in a sequencing machine. More than 400 000 DNA fragments/beads can be processed per plate.

Using "multiplex identifiers", different genomic libraries can be bar-coded, facilitating sequencing of several libraries in the same sequencing run.
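After a pooled run, reads are sorted back into their libraries by the barcode at the start of each read. The sketch below shows the idea with exact prefix matching; the barcode sequences and library names are invented for illustration, and real demultiplexers usually tolerate sequencing errors in the barcode.

```python
def demultiplex(reads, barcodes):
    """Sort reads into libraries by their leading barcode.

    `barcodes` maps barcode sequence -> library name. The barcode is
    trimmed from assigned reads; other reads go to "unassigned".
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    barcodes = {"ACGAG": "library_A", "TCGTC": "library_B"}  # hypothetical
    reads = ["ACGAGTTTGGA", "TCGTCAAAC", "GGGGGTTT"]
    print(demultiplex(reads, barcodes))
```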

Platforms:

Platform Throughput (bases/run) Time per run Average (a)/mode (m) read length (nt) Accuracy Introduced (year)
GS FLX+ 700 Mbp 23 h Up to 1000 700 bp (m)
GS Junior 35 Mbp 12 h 400 400 bp (a) at Phred20/read



GS FLX:

References:

Introductory paper, 454 sequencing: http://www.ncbi.nlm.nih.gov/pubmed/16056220?dopt=Abstract&holding=npg

http://www.wellcome.ac.uk/Education-resources/Education-and-learning/animations/dna/wtx056046.htm

The development and impact of 454 sequencing

Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample

Overview of 454 sequencing: http://classes.soe.ucsc.edu/bme215/Spring09/PPT/BME%20215-5.pdf

Illumina (Solexa) sequencing

http://www.illumina.com/technology/sequencing_technology.ilmn


{| class="wikitable" border="1" cellpadding="5" cellspacing="0"
|-
! Platform !! Throughput (bases/run, maximum) !! Time per run !! Read length (nt) !! Accuracy !! Features !! Introduced (year)
|-
| MiSeq Personal Sequencer || Up to 8.5 Gbp || 4 - 48 h || 250 || >70% of bases higher than Q30 at read length 2 x 300 bp || ||
|-
| HiSeq 2500/1500 || 600 Gb || || 2 x 100 || >80% higher than Q30 || ||
|-
| HiSeq 2000/1000 || 300 Gb || || 2 x 100 || >80% higher than Q30 || ||
|-
| Genome Analyzer IIx || 95 Gb || || 2 x 150 || >80% higher than Q30 || ||
|}
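The "Q30" figures in the accuracy column refer to Phred quality scores, where Q = -10 log10(P) and P is the estimated probability that a base call is wrong; Q30 therefore corresponds to an error probability of 1 in 1000. A small sketch of the conversion, with illustrative function names and made-up example scores:

```python
import math


def error_probability(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_score(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def fraction_at_least(qualities, threshold=30):
    """Fraction of base calls with a score at or above the threshold."""
    return sum(q >= threshold for q in qualities) / len(qualities)


if __name__ == "__main__":
    print(error_probability(30))                # about 0.001
    print(fraction_at_least([35, 31, 30, 12]))  # 3 of 4 bases are >= Q30
```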

MiSeq datasheet: http://www.illumina.com/documents/products/datasheets/datasheet_miseq.pdf


Side by side comparison of Illumina sequencers: http://www.illumina.com/systems/sequencing.ilmn

Illumina - an introduction to NGS: http://www.illumina.com/Documents/products/Illumina_Sequencing_Introduction.pdf

Ion semiconductor sequencing

Ion Torrent: http://www.invitrogen.com/site/us/en/home/brands/Ion-Torrent.html?cid=fl-iontorrent

Platforms:

{| class="wikitable" border="1" cellpadding="5" cellspacing="0"
|-
! Platform !! Throughput (bases/run) !! Time per run !! Typical read length !! Accuracy !! Introduced (year)
|-
| Ion PGM sequencer || 10 Mb to 1 Gb || 90 min+ || 35-400 bp || ||
|-
| Ion Proton sequencer || 1 human genome || 2 h+ || 100 bp || ||
|}


http://www3.appliedbiosystems.com/cms/groups/applied_markets_marketing/documents/generaldocuments/cms_096460.pdf

Nanopore sequencing

  • Two main nanopore types: Biological nanopores (lipid membranes) and solid-state nanopores.


Biological nanopores:


Solid-state nanopores:

  • Potentially easier shipping/handling (more robust) and integration with electronics.
  • Technology development less advanced than for biological nanopores


Oxford Nanopore: http://www.nanoporetech.com/


http://oldwww.phys.washington.edu/groups/nanopore/

Manrao et al. 2012. Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase: http://211.144.68.84:9998/91keshi/Public/File/49/30-4/pdf/nbt.2171.pdf


http://www.nature.com/nnano/journal/vaop/ncurrent/full/nnano.2013.71.html

  • Too good to be true? Violating laws of physics??


http://www.upenn.edu/pennnews/news/penn-research-makes-advance-nanotech-gene-sequencing-technique

Differentiation of Short, Single-Stranded DNA Homopolymers in Solid-State Nanopores: http://pubs.acs.org/doi/abs/10.1021/nn4014388

Single molecule real time sequencing (Pacific Biosciences)

Microscopic wells on a chip (zero-mode waveguides, ZMWs) each contain a single DNA polymerase enzyme bound to the bottom of the well, which accepts a single DNA molecule as template. Fluorescently labelled dNTPs are used for DNA synthesis. Upon incorporation of a dNTP, the fluorescent tag is cleaved from the nucleotide and diffuses out of the observation area within the ZMW. The sequence is determined optically by observing incorporation events.

http://www.pacificbiosciences.com/

Platforms:

PacBio RS:

http://www.pacificbiosciences.com/products/

http://www.pacificbiosciences.com/brochure

http://www.pacificbiosciences.com/pdf/Software_and_Analysis_Brochure.pdf

SOLiD sequencing (Applied Biosystems)

DNA nanoball sequencing

http://www.completegenomics.com/services/technology/

Platforms

Qiagen GeneReader


Opgen Argus: http://www.opgen.com/products-services/argus-system

Comparisons and reviews

http://seqbench.org/

http://link.springer.com/content/pdf/10.1007%2Fs00439-013-1321-4.pdf

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030087


http://www.molecularecologist.com/next-gen-table-3c/

Illumina HiSeq

HiSeq 2000


HiSeq 2500

Ion Torrent

MiSeq

Ion Proton

http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/Sequencing/Semiconductor-Sequencing/proton.html

Capillary sequencers

Applied Biosystems 3730xl : http://www.harlowscientific.com/Sequencers-ABI-3730xl-DNA-Sequencer-Harlow-Scientific

http://www6.appliedbiosystems.com/products/abi3730xlspecs.cfm

List price: $357,000.00


ABI Prism 3700: Released 1999.

http://www.ebay.com/itm/ABI-APPLIED-BIOSYSTEMS-PRISM-3700-DNA-ANALYZER-/251154269635?pt=LH_DefaultDomain_0&hash=item3a79f605c3

Lowest observed used price: $250


ABI Prism 310:

ABI Prism 377: Released in 1995.


See also http://en.wikipedia.org/wiki/Applied_Biosystems

Concepts

K-mer

High-throughput sequence assemblers often work with shorter sub-sequences (k-mers, of length k) of the produced reads. For example, a set of 100-nt reads cannot be expected to contain every 100-mer that occurs in the genome.

By breaking reads into shorter k-mers, the resulting k-mers often represent nearly all k-mers from the genome for sufficiently small k, a prerequisite for assembly using de Bruijn graphs. (http://www.nature.com/nbt/journal/v29/n11/full/nbt.2023.html#bx2).
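The point about k-mer size can be shown on a toy example: with reads that only partly overlap, most full-length read-sized k-mers of the genome are missed, while all short k-mers are recovered. The genome string, read positions and function names below are invented for illustration.

```python
def kmers(sequence, k):
    """All overlapping substrings of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_recovery(genome, reads, k):
    """Fraction of the genome's distinct k-mers present in the reads."""
    genome_kmers = set(kmers(genome, k))
    read_kmers = set()
    for read in reads:
        read_kmers.update(kmers(read, k))
    return len(genome_kmers & read_kmers) / len(genome_kmers)


if __name__ == "__main__":
    genome = "ACGTTGCATGCCGA"                      # toy genome
    reads = [genome[i:i + 8] for i in (0, 3, 6)]   # three 8-nt reads
    for k in (8, 4):
        print(k, kmer_recovery(genome, reads, k))
```

Here the three 8-nt reads contain only three of the seven 8-mers in the toy genome, but every one of its 4-mers.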


Automated K-mer selection: http://perso.eleves.bretagne.ens-cachan.fr/~chikhi/2013-july-20-hitseq.pdf

De Bruijn graph

http://en.wikipedia.org/wiki/De_Bruijn_graph

See also Compeaou et al. 2001, Nature Biotechnology - How to apply de Bruijn graphs to genome assembly: http://www.nature.com/nbt/journal/v29/n11/full/nbt.2023.html

  • Finding a Hamiltonian cycle (a cycle that visits every node of a graph exactly once) is computationally expensive (NP-complete).
  • Finding an Eulerian cycle (a cycle that traverses every edge of a graph exactly once) is much easier.
  • Ergo: Instead of assigning each k-mer to a node, we can assign each k-mer to an edge that connects its (k-1)-mer prefix to its (k-1)-mer suffix. This gives a De Bruijn graph, in which assembly corresponds to finding an Eulerian cycle or path (http://www.nature.com/nbt/journal/v29/n11/full/nbt.2023.html#bx2).
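A minimal sketch of the idea in the list above, assuming error-free k-mers from a single linear sequence: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and an Eulerian path is found with Hierholzer's algorithm. The function names are illustrative; real assemblers also handle errors, repeats, reverse complements and much larger graphs.

```python
from collections import defaultdict


def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a node list that uses every edge exactly once (Hierholzer's algorithm).

    Assumes such a path exists in the graph.
    """
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    nodes = set(graph) | set(in_degree)
    # A path (rather than a cycle) starts at the node with one extra outgoing edge.
    start = next(
        (n for n in nodes if len(graph.get(n, [])) - in_degree.get(n, 0) == 1),
        None,
    )
    if start is None:
        start = next(iter(graph))
    unused = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if unused.get(node):
            stack.append(unused[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell_path(path):
    """Reconstruct the sequence spelled by consecutive overlapping nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical toy example: all 4-mers of a 12 bp sequence, in arbitrary order.
toy_kmers = sorted("ATGGCGTGCAAT"[i:i + 4] for i in range(9))
print(spell_path(eulerian_path(de_bruijn_graph(toy_kmers))))
```

In this toy case every (k-1)-mer is unique, so the path is unambiguous; with repeats, several Eulerian paths may exist and the graph alone does not determine the genome.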

http://homolog.us/Tutorials/index.php?p=2.1&s=1

http://www.pnas.org/content/early/2012/07/25/1121464109.abstract


http://alexbowe.com/succinct-debruijn-graphs/

Bridge amplification

http://seq.molbiol.ru/sch_clon_ampl.html

RNA-Seq

http://blog.sbgenomics.com/history-of-rna-seq/

http://www.ncbi.nlm.nih.gov/pubmed/23716638?dopt=Abstract

http://en.wikipedia.org/wiki/RNA-Seq

http://seqanswers.com/forums/showpost.php?p=102911&postcount=60


SeqAnswers - posts tagged RNA seq: http://seqanswers.com/forums/tags.php?tag=rna-seq


http://nextgenseek.com/2013/05/large-scale-genetics-of-human-gene-expression-studies-turn-to-next-gen-sequencing/


http://genome.cshlp.org/content/early/2011/09/07/gr.124321.111


http://rna-seqsummit.com/

http://www.illumina.com/technology/mrna_seq.ilmn


RNA-Seq: a revolutionary tool for transcriptomics.: http://www.ncbi.nlm.nih.gov/pubmed/19015660


Direct RNA Sequencing:


Software:

Velvet: http://en.wikipedia.org/wiki/Velvet_%28algorithm%29

Tophat: http://tophat.cbcb.umd.edu/

Cufflinks: http://cufflinks.cbcb.umd.edu/

(See also Tuxedo suite)

Genotyping by Sequencing (GBS)

http://www.maizegenetics.net/gbs-overview

ROC

See http://en.wikipedia.org/wiki/Receiver_operating_characteristic

Edit distance

See http://en.wikipedia.org/wiki/Levenshtein_distance
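As a small illustration, the Levenshtein distance (minimum number of single-character substitutions, insertions and deletions) can be computed with the standard dynamic programming recurrence. This is a plain textbook sketch that keeps only two rows of the table, not the implementation of any particular aligner.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between strings a and b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # deletion from a
                    current[j - 1] + 1,                    # insertion into a
                    previous[j - 1] + (char_a != char_b),  # match or substitution
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("ACGT", "AGT"))
```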

Color Space/2-base encoding

See

http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space.html

http://www.biostars.org/p/43855/

http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/CSHL_Fu.pdf

See also

http://en.wikipedia.org/wiki/2_Base_Encoding

Targeted sequencing

Targeted capture kits may be used to sequence a subset of genomic DNA. The human exome (as defined by the Consensus CDS (CCDS) project) totals about 38 Mb, covering about 1.22 % of the human genome.

(The SureSelect Human All Exon Kit)

See also: http://massgenomics.org/2011/10/major-exome-platforms-compared.html

Scaffolding

http://genome.jgi-psf.org/help/scaffolds.html

http://seqanswers.com/wiki/How-to/scaffolding

http://bioinformatics.oxfordjournals.org/content/early/2012/04/05/bioinformatics.bts175

http://www.scfbm.org/content/7/1/4

http://www.cbcb.umd.edu/research/assembly_primer.shtml

Paired-end reads

N50 Statistic

N50 length: For a collection of contigs, the largest length L such that the contigs of length L or longer together contain at least half of the total length of all contigs in the collection.

NG50: As N50, except that the threshold is half of the genome size rather than half of the total contig length.
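A short sketch of both statistics, following the definitions above. The function name and the optional `genome_size` argument are illustrative choices, not taken from any assembly tool.

```python
def n50(contig_lengths, genome_size=None):
    """Return N50, or NG50 when genome_size is given.

    Contigs are added from longest to shortest until their summed length
    reaches half of the target (total contig length, or genome_size).
    Returns None if the contigs never reach half of the target.
    """
    target = sum(contig_lengths) if genome_size is None else genome_size
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if 2 * running_total >= target:
            return length
    return None


# Hypothetical contig lengths (total 300).
lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(lengths))                   # N50
print(n50(lengths, genome_size=400))  # NG50 for an assumed 400 bp genome
```

Because NG50 uses a fixed genome size, it stays comparable between assemblies of the same genome even when their total contig lengths differ.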

http://en.wikipedia.org/wiki/N50_statistic

http://seqanswers.com/forums/showthread.php?p=41420

Haplotypes

See also:

http://hapmap.ncbi.nlm.nih.gov/originhaplotype.html.en

http://en.wikipedia.org/wiki/Haplotype

Haploview

http://en.wikipedia.org/wiki/Haplogroup

Loss of Heterozygosity

http://en.wikipedia.org/wiki/Loss_of_heterozygosity

Copy number variants (CNVs)

Short Tandem Repeats (STRs)

Genotyping of STRs is used to produce forensic DNA profiles. See http://massgenomics.org/2013/01/identifying-samples-genomic-data.html

http://www.biology.arizona.edu/human_bio/activities/blackett2/str_codis.html

http://www.cstl.nist.gov/strbase/fbicore.htm

Databases

http://www.ncbi.nlm.nih.gov/gap

Sequence Read Archive: http://www.ncbi.nlm.nih.gov/sra

European Nucleotide Archive: http://www.ebi.ac.uk/ena/

Assembly and mapping

Alignment to multiple reference sequences: http://bioinformatics.oxfordjournals.org/content/29/13/i361.full

http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html

http://denovoassembler.sourceforge.net/

https://github.com/sebhtml/ray

http://dskernel.blogspot.no/2013/06/open-access-doctoral-theses-on-de-novo.html

Compendium of HTS mappers: http://wwwdev.ebi.ac.uk/fg/hts_mappers/

Comparison of aligners (ROC curves): http://lh3lh3.users.sourceforge.net/alnROC.shtml

SeqAnswers:Software packages for next gen sequence analysis: http://seqanswers.com/forums/showthread.php?t=43 (Thread closed since 2009)


A5: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0042304

BWA: http://bio-bwa.sourceforge.net/

BWA-MEM: http://arxiv.org/abs/1303.3997

Bowtie - An ultrafast memory-efficient short read aligner: http://bowtie-bio.sourceforge.net/index.shtml

Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

http://www.ncbi.nlm.nih.gov/pubmed/20211242

Primers and reviews:


http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_li.pdf

NCBI primer on genome assembly methods: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/assembly.shtml

Nature Biotechnology Primer - How to map billions of short reads onto genomes: http://www.nature.com/nbt/journal/v27/n5/full/nbt0509-455.html

Bioinformatics, 2012: Tools for mapping high-throughput sequencing data: http://bioinformatics.oxfordjournals.org/content/28/24/3169

A survey of sequence alignment algorithms for next-generation sequencing: http://bib.oxfordjournals.org/content/11/5/473.full

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0019175


De novo assembly:

Optimal Assembly for High Throughput Shotgun Sequencing: http://arxiv.org/abs/1301.0068

Counterintuitively, too high coverage can be problematic: http://seqanswers.com/forums/showthread.php?t=24965


https://github.com/lexnederbragt/denovo-assembly-tutorial/tree/master/scripts

Sequencing services

Service | Sample specification | Primer specification | Ship to | Link
GATC LightRun | Add 5 uL DNA (80-100 ng/uL plasmid or 20-80 ng/uL purified PCR product) + 5 uL 5 uM (5 pmol/uL) primer to the same tube. | Tm 52-58 C, 17-19 bp (8-9 G+C for an 18-mer), G or C at 3' end (max 3 Gs or Cs), maximum 4 bp run. | GATC Biotech AG, European Custom Sequencing Centre, Gottfried-Hagen-Strasse 20, 51105 Köln. | http://www.gatc-biotech.com/en/lp4/new-lightrun-sequencing.html
Macrogen Single-pass | Add 20 uL DNA (100 ng/uL plasmid or 50 ng/uL purified PCR product) to one tube. Add 20 uL primer (10 pmol/uL) to a separate tube. | 18-25 bp, 40-60 % GC, Tm 55-60 C | Macrogen Europe, IWO, Kamer IA3-195, Meibergdreef 39, 1105 AZ Amsterdam Zuid-oost, Netherlands. Attention: J.S. Park. | http://dna.macrogen.com/eng/support/seq/seq_submission.jsp


Variant calling

Sequencing-based techniques

ChIP-sequencing

RNA-seq

Single-cell sequencing

http://trap.it/#!traps/id/f294f009-bb0f-4f14-9e59-e84cf36d2560/jump/6GoIaq61z002q6RDiaaY

Sequencing/genomics centres

http://openwetware.org/wiki/BioMicroCenter:Sequencing

BGI: http://www.genomics.cn/

New York Genome Center: http://nygenome.org/

JGI: http://www.jgi.doe.gov/


The Genome Analysis Centre (UK): http://jobs.tgac.ac.uk/


Norwegian Cancer Genomics Consortium: http://www.cancergenomics.no/

http://medicalgenomics.no/

See also: http://omicsmaps.com/



Sequencing facilities in Norway: (Incomplete)


Oslo:

Akershus University Hospital (Ahus): 1 x Ion Torrent

Norwegian High-Throughput Sequencing Centre (NSC) Oslo, Norway: 2 x Roche/454, 1 x Illumina HiSeq, 1 x PacBio, 1 x Ion Torrent, 1 x Illumina MiSeq

Helse Sør-Øst/University of Oslo Genomics Core Facility Oslo, Norway: 1 x Illumina GA2, 1 x MiSeq, 1 x HiSeq

NTNU Genomics Core Facility Sør-Trøndelag, Norway: 1 x HiSeq

Telemark Hospital Telemark, Norway: 1 x Illumina HiSeq


Bergen:


Trondheim:


UNN:

http://www.unn.no/dna-sequencing/category11734.html

Contact persons:

Lex Nederbragt: http://contig.wordpress.com/about/

Dr. Leonardo A. Meza-Zepeda Head Helse Sør-Øst/ Univ. of Oslo Genomics Core Facility


Kjetill S. Jakobsen

Professor, Group Leader (CEES node)


Dag Erik Undlien

Professor, Group Leader (IMG node)


Other groups which employ HTS:

CIGENE, UMB. See https://sites.google.com/site/seqomics

Primers

Custom primers

Name | Length (bp) | Sequence | Tm (C) [calculated] | Tm (C) [Analytical] | GC (% / bp) | Comment
pJP-1_seq5 | 18 | CAGCGTGCGAGTGATTAT | 53.9 / 60.6 (2) / 52.6 (3) | | 50 | Binds upstream of XylS region in pSB-M1g
pJP-1_seq6 | 18 | AGACCACATGGTCCTTCT | 57.5 (2) / 52.8 (3) | 53.9 | 50 | Binds near end of GFPmut3 in pSB-M1g
SeqMG1 | | AGCAGATCCACATCCTTGAA | 62.7 (2) / 53.7 (3) | | | Binds at nt 5672 of pSB-M1g, upstream of AgeI site. Designed to Macrogen sequencing primer criteria.
pSB-SeqA | 18 | TGCAAGAAGCGGATACAG | 56 / 60.7 (2) / 52.3 (3) | | 50 | Binds at nt 7729 of pSB-M1g, upstream of Pm promoter and PciI site.
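Length and GC content of a candidate primer are easy to check programmatically. The sketch below uses the Macrogen criteria listed under Sequencing services (18-25 bp, 40-60 % GC) as default limits; the function names are made up for illustration, and Tm is deliberately left out because it depends on the calculation method used.

```python
def gc_percent(sequence):
    """GC content of a DNA sequence, in percent."""
    sequence = sequence.upper()
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def check_primer(sequence, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Return a list of problems; an empty list means length and GC are within limits."""
    problems = []
    if not min_len <= len(sequence) <= max_len:
        problems.append(f"length {len(sequence)} bp outside {min_len}-{max_len} bp")
    gc = gc_percent(sequence)
    if not min_gc <= gc <= max_gc:
        problems.append(f"GC content {gc:.1f} % outside {min_gc}-{max_gc} %")
    return problems


# Primer SeqMG1 from the table above.
print(len("AGCAGATCCACATCCTTGAA"), gc_percent("AGCAGATCCACATCCTTGAA"))
print(check_primer("AGCAGATCCACATCCTTGAA"))
```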

Universal primers

http://www.generi-biotech.com/sequencing-universal-seguencing-primers/

http://www.synthesisgene.com/tools/Universal-Primers.pdf

http://www.genewiz.com/public/universalprimers.aspx

https://secure.eurogentec.com/product/research-universal-primers.html


Tm calculation methods (referred to by the numbers in parentheses in the custom primers table): (1) CloneManager, (2) Thermo Scientific, (3) IDT OligoAnalyzer


A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers: http://www.biomedcentral.com/1471-2164/13/341

Software

Consed graphical editor: http://bioinformatics.oxfordjournals.org/content/early/2013/08/31/bioinformatics.btt515.abstract

HTseq. Sequencing analysis with Python: http://seqanswers.com/forums/showthread.php?t=4805

Bamformatics: http://sourceforge.net/projects/bamformatics/?source=directory

http://www.digitalbiologist.com/2013/06/python-next-gen-sequencing.html

DISCOVAR: http://www.broadinstitute.org/software/discovar/blog/

ALLPATHS-LG: http://www.broadinstitute.org/software/allpaths-lg/blog/

Ray Cloud demo: http://browser.cloud.boisvert.info/client/?map=0&section=3&region=41&location=0

http://debian-med.alioth.debian.org/tasks/bio-ngs

Isaac / Illumina Open source software: https://github.com/sequencing

https://www.broad.harvard.edu/crd/wiki/index.php/Main_Page

Chromatogram viewers: http://www.dnaseq.co.uk/chrom_view.html

CodonCode aligner: http://www.codoncode.com/aligner/

BioEdit: http://www.mbio.ncsu.edu/BioEdit/bioedit.html

VCF view: http://www.easih.ac.uk/software.php

FinchTV: http://www.geospiza.com/Products/finchtv.shtml

About SCF (sequence chromatogram format) files: http://staden.sourceforge.net/manual/formats_unix_2.html

https://wiki.nci.nih.gov/display/TCGA/Sequence+trace+files

http://code.google.com/p/seqtrace/

http://www.phrap.com/background.htm

http://en.wikipedia.org/wiki/Phrap

http://www.ncbi.nlm.nih.gov/books/NBK47537/

http://www.bio.net/bionet/mm/autoseq/1999-April/001368.html

High-throughput sequencing tools:

SAM tools: http://samtools.sourceforge.net/

Burrows-Wheeler Aligner (BWA): http://bio-bwa.sourceforge.net/

http://seqanswers.com/wiki/BWA

Maq: Mapping and Assembly with Qualities


See also http://en.wikipedia.org/wiki/List_of_sequence_alignment_software


The Genome Analysis Centre - software: https://github.com/TGAC

Genome Analysis Toolkit (GATK): http://www.broadinstitute.org/gatk/

Sequencing quality and standards:

http://www.phrap.com/phred/

http://www.bio.net/bionet/mm/autoseq/1999-April/001366.html

http://en.wikipedia.org/wiki/Phred_quality_score

Sequencing projects

http://www.microbe.net/undergraduate-research-built-environment-genomes/

File formats

FASTG: http://fastg.sourceforge.net/

Sequence Alignment/Map (SAM) format: "A generic format for storing large nucleotide sequence alignments". Tab-delimited text format consisting of a header section (optional) and an alignment section.

http://samtools.sourceforge.net/

http://samtools.sourceforge.net/SAM1.pdf
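To make the layout concrete, here is a minimal parsing sketch. It relies on the SAM specification's conventions that header lines start with "@" and that each alignment line has 11 mandatory tab-separated fields followed by optional tags; the function name and the returned dictionary layout are illustrative only. For real work a dedicated library or samtools is the safer choice.

```python
SAM_FIELDS = (
    "qname", "flag", "rname", "pos", "mapq",
    "cigar", "rnext", "pnext", "tlen", "seq", "qual",
)


def parse_sam(lines):
    """Split SAM text lines into (header_lines, alignment_records)."""
    header, alignments = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # header section (optional)
            header.append(line)
            continue
        columns = line.split("\t")
        record = dict(zip(SAM_FIELDS, columns[:11]))
        for key in ("flag", "pos", "mapq", "pnext", "tlen"):
            record[key] = int(record[key])
        record["tags"] = columns[11:]                    # optional fields
        record["unmapped"] = bool(record["flag"] & 0x4)  # FLAG bit 0x4
        record["reverse"] = bool(record["flag"] & 0x10)  # FLAG bit 0x10
        alignments.append(record)
    return header, alignments


# Hypothetical two-line header and one alignment on the reverse strand.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0",
]
header_lines, records = parse_sam(example)
print(len(header_lines), records[0]["rname"], records[0]["pos"], records[0]["reverse"])
```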

See also:

http://compbio.soe.ucsc.edu/sam.html

http://www.ncbi.nlm.nih.gov/pubmed/19505943

http://seqanswers.com/wiki/SAM


Binary Alignment/Map (BAM) format: Binary, compressed file format containing the same information as SAM files.

From https://wiki.nci.nih.gov/display/TCGA/Binary+Alignment+Map : "Centers align sequence reads to a reference genome to produce a Sequence Alignment Map (SAM) format file. The SAM file is then converted into a binary form, or Binary-sequence Alignment Format (BAM) file"

See also http://genome.ucsc.edu/goldenPath/help/bam.html

Variant Call Format (VCF):

Standard created by the 1000 Genomes Project.

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

From http://www.ensembl.org/info/website/upload/large.html :

"The VCF format is a tab delimited format for storing variant calls and and individual genotypes. It is able to store all variant calls from single nucleotide variants to large scale insertions and deletions."

ABI (Applied Biosystems) format:


FASTQ:

FASTQ files store base calls together with their corresponding per-base quality scores. The encoding of the quality scores varies with the source of the sequence, but the most widely used is the "Sanger format" (Phred quality scores, stored as ASCII characters with an offset of 33).
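A minimal sketch of reading Sanger-format FASTQ and decoding the qualities, assuming simple four-line records (no line-wrapped sequences) and the Phred+33 encoding. The function names are illustrative. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
def read_fastq(lines):
    """Yield (name, sequence, quality_string) from four-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows), 4):
        # Unpacking raises ValueError if the last record is truncated.
        header, sequence, separator, quality = rows[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at row {i}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], sequence, quality


def phred33_scores(quality):
    """Decode a Sanger-format (Phred+33) quality string into integer scores."""
    return [ord(char) - 33 for char in quality]


def error_probability(phred_score):
    """Probability that a base call with the given Phred score is wrong."""
    return 10 ** (-phred_score / 10)


# Hypothetical single-record example.
example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for name, sequence, quality in read_fastq(example):
    print(name, sequence, phred33_scores(quality))
```

Older Solexa/Illumina files use a different offset and, in some versions, a different score definition, so the decoding above should only be applied to files known to be in Sanger format.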

http://en.wikipedia.org/wiki/FASTQ_format

http://maq.sourceforge.net/fastq.shtml

http://www.bioperl.org/wiki/FASTQ_sequence_format

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants: http://nar.oxfordjournals.org/content/38/6/1767.full


SRA:

Bibliography

What is next generation sequencing? http://ep.bmj.com/content/early/2013/08/28/archdischild-2013-304340.long

SeqAnswers Literature watch: http://seqanswers.com/forums/forumdisplay.php?f=10

http://nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e#ref-12

Assembly of large genomes using second-generation sequencing.: http://www.ncbi.nlm.nih.gov/pubmed/20508146?dopt=Abstract&holding=npg

http://online.liebertpub.com/doi/full/10.1089/cmb.2011.0201

Comparison of variant-calling software


http://www.nature.com/nmeth/journal/v6/n11s/abs/nmeth.1376.html

http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data: http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2474.html

http://assemblathon.org/

http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html

GAGE: A critical evaluation of genome assemblies and assembly algorithms: http://genome.cshlp.org/content/22/3/557


2011

Miller 2011 - Assembly Algorithms for Next-Generation Sequencing Data: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874646/


2012

An Integrated Pipeline for de Novo Assembly of Microbial Genomes : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0042304

2013

Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases: http://www.scirp.org/journal/PaperInformation.aspx?PaperID=30744


Harnessing Virtual Machines to simplify next generation DNA sequencing analysis: http://bioinformatics.oxfordjournals.org/content/early/2013/06/20/bioinformatics.btt352.abstract

High-throughput sequencing for biology and medicine: http://www.nature.com/msb/journal/v9/n1/full/msb201261.html


DNA sequencing using electrical conductance measurements of a DNA polymerase: http://www.nature.com/nnano/journal/vaop/ncurrent/full/nnano.2013.71.html


Li et al.: Memory Efficient Minimum Substring Partitioning: http://www.vldb.org/pvldb/vol6/p169-li.pdf


Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.: http://www.ncbi.nlm.nih.gov/pubmed/23644548


Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062856

Commentary

The state of NGS variant calling - Don't panic: http://blog.goldenhelix.com/?p=1725

http://nxseq.bitesizebio.com/articles/which-way-forward-in-ultra-high-throughput-genomic-sequencing-reference-materials-and-performance-measurements/

Assemblies: The good, the bad and the ugly: http://www.nature.com/nmeth/journal/v8/n1/full/nmeth0111-59.html

http://ngs-expert.com/

http://bcbio.wordpress.com/

A tale of three next generation sequencers: http://www.biomedcentral.com/content/pdf/1471-2164-13-341.pdf

http://core-genomics.blogspot.no/2012_08_01_archive.html

http://ivory.idyll.org/blog/thoughts-on-assemblathon-2.html

http://flxlexblog.wordpress.com/2013/02/26/on-assembly-uncertainty-inspired-by-the-assemblathon2-debate/

http://www.yuzuki.org/

http://tidsskriftet.no/article/2928729

Nextgenseek: http://nextgenseek.com/

Slides

Courses

http://www.forbio.uio.no

http://www.forbio.uio.no/events/courses/2013/radseq.html

http://genomics.no/oslo/index.php?page=courses

http://ged.msu.edu/angus/bioinformatics-courses.html

Procedures and troubleshooting

Results/success rates:


Sample DNA Primer Result
Barcode - -


Sanger sequencing:

http://seqcore.brcf.med.umich.edu/doc/dnaseq/trouble/badseq.html


Depending on budget and available sample amounts, consider sequencing each sample twice to make it easier to distinguish sequencing errors from actual mutations.

Suggested work procedure when receiving Sanger sequencing results (plasmids, etc.):

  • First, open the chromatogram file to assess the read length and overall quality.
  • If applicable, compare the automatically trimmed sequence (.fas) file with the expected sequence using BLAST or another sequence alignment tool. Alternatively, consider using the raw sequence copied from a chromatogram viewer.
  • If no hit is found, make sure that the most permissive algorithm (blastn or similar) is used. If there is still no hit, manually inspect the chromatogram (.abi) file using a chromatogram viewer. If the trimmed file is short compared to the raw sequence (low chromatogram quality) and the remainder appears sensible, repeat the search using the "raw" called bases (copied directly from the chromatogram viewer). When making notes on sequence results, always record which sequence (PHRED-generated, or "raw" sequence from the chromatogram viewer) was used for a given analysis (e.g. a BLAST search). Otherwise, confusion may ensue: the note says 100 % match, a later BLAST search gives no match or a bad match, etc.
  • As a quick check, the sequence file can be searched for a short portion of the expected sequence, while allowing for some mismatches (which may be present because of sequencing errors).
  • If discrepancies occur, inspect the chromatogram at the relevant positions.
  • If a hit is found for the desired sequence, check that the sequence is in the right position, and that the flanking sequences are correct.
  • Be aware that alignment may produce suboptimal results (indicating a worse fit than is actually the case), especially when aligning to circular sequences.
  • If the chromatogram yields no sequence, note/report this as "no usable data".
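The quick check in the procedure above (searching for a short portion of the expected sequence while tolerating a few mismatches) could be sketched as follows. This is a simple sliding-window comparison with made-up names; it counts uncalled bases ("N") as mismatches and does not handle indels, for which an alignment tool is needed.

```python
def find_with_mismatches(sequence, probe, max_mismatches=1):
    """Return (position, mismatch_count) for every window matching the probe.

    Positions are 0-based. Only substitutions are tolerated, not indels.
    """
    sequence, probe = sequence.upper(), probe.upper()
    hits = []
    for start in range(len(sequence) - len(probe) + 1):
        window = sequence[start:start + len(probe)]
        mismatches = sum(1 for a, b in zip(window, probe) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical read with one uncalled base inside the expected motif.
read = "TTACGTNCGATT"
print(find_with_mismatches(read, "ACGTACGA", max_mismatches=1))
```

Any position reported with one or more mismatches is a natural place to inspect the chromatogram.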

Three main concerns may appear:

  • Base differs from expected.
  • Base is uncalled ("n")
  • Indel/Gap

In all cases inspecting the chromatogram may resolve the issue. Automatically generated sequences should be considered a best guess by the computer.


Chromatogram interpretation:

http://peter.unmack.net/molecular/data/chromatogram.editing.html

http://www.sci.sdsu.edu/dnacore/InterpretData.html

http://cancer-seqbase.uchicago.edu/traces.html

http://seqcore.brcf.med.umich.edu/doc/dnaseq/interpret.html

http://www.roswellpark.edu/shared-resources/biomolecular-resource-facilities/dna-sequencing/interpreting-chromatograms


Common causes of bad data from Sanger sequencing:

  • Salt/alcohol/other contamination
  • GC-rich or palindromic regions
  • Double priming
  • Suppression of signal after a strong signal: Happens most commonly for G's after A's, and often for G's after C's. Most often, weak G signals follow multiple A's.

Common causes of mis-called bases:

Template preparation:

http://www.sci.sdsu.edu/dnacore/tempprep.html

Misc

Blueseq online sequencing guide: http://www.blueseq.com/

http://lycofs01.lycoming.edu/~gcat-seek/

Bitesize bio NGS channel: http://nxseq.bitesizebio.com/

http://nxseq.bitesizebio.com/articles/which-way-forward-in-ultra-high-throughput-genomic-sequencing-reference-materials-and-performance-measurements/

http://nxseq.bitesizebio.com/articles/a-short-history-of-sequencing-part-2-the-first-of-the-next/

http://nxseq.bitesizebio.com/articles/a-short-history-of-sequencing-part-3-personal-ngs-disposable-sequencers-and-what-the-future-holds/

Genome in a bottle consortium: http://genomeinabottle.org/

SEQanswers: http://seqanswers.com/

SEQanswers wiki: http://seqanswers.com/wiki/SEQanswers

SEQanswers - how to: http://seqanswers.com/wiki/How-to

Genome Reference Consortium: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/

List of NGS blogs: http://seqanswers.com/forums/showthread.php?t=5024

http://www.homolog.us/blogs/

NGS Necropolis: http://blueseq.com/knowledgebank/ngs-necropolis/

http://gnubio.com/

Rob Carlson's blog: http://synthesis.cc/

http://titojankowski.com/the-500000-dna-sequencer-tear-down/

Raw data

Guides and instructional material

PRACTICAL: Genome sequencing of Bacteroides isolates. http://www.nematodes.org/teaching/gg3/index.shtml

ANGUS: http://ged.msu.edu/angus/

See also http://ivory.idyll.org/blog/ngs-course-with-aws.html

MSU NGS analysis workshop 2012: http://ged.msu.edu/angus/tutorials-2012/index.html

http://ged.msu.edu/angus/tutorials-2012/files/lecture3-mapping.pptx.pdf

MSU NGS analysis workshop 2013: http://ged.msu.edu/angus/tutorials-2013/index.html

Homolog.us tutorials: http://www.homolog.us/Tutorials/index.php?p=1.1&s=1

http://ged.msu.edu/angus/tutorials-2013/files/rayan-2013-june-18-msu.pdf


NGS WikiBook: http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29


GCAT SEEK: http://lycofs01.lycoming.edu/~gcat-seek/index.html


Mason lab NGS workshop: http://chagall.med.cornell.edu/NGScourse/

Results

Quality control, error sources and error detection

http://pathogenomics.bham.ac.uk/blog/2013/01/sequencing-data-i-want-the-truth-you-cant-handle-the-truth/

http://www.citeulike.org/user/cisevol/tag/sequencing_error

Churchill & Waterman 1991. The Accuracy of DNA Sequences: Estimating Sequence Quality: http://www.cmb.usc.edu/papers/msw_papers/msw-107.pdf

Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation: http://nar.oxfordjournals.org/content/early/2013/01/08/nar.gks1443.full.pdf?keytype=ref&ijkey=suYBLqdsrc7kH7G

10 rules of thumb in genomics: http://genomeinformatician.blogspot.co.uk/2011/07/10-rules-of-thumb-in-genomics.html


Bioplanet GCAT: http://www.bioplanet.com/gcat


QUAST: http://bioinf.spbau.ru/quast

HtSeq-Qa: http://www-huber.embl.de/users/anders/HTSeq/doc/qa.html

Economy and costs

http://www.genome.gov/sequencingcosts/

Read simulation

http://sourceforge.net/projects/readsim/?source=directory


Links

https://sites.google.com/site/seqomics/
