BioMicroCenter:Illumina Sequencing: Difference between revisions

From OpenWetWare
{{BioMicroCenter}}


The MIT BioMicro Center has six high-throughput Illumina sequencers: one NovaSeq 6000, one HiSeq 2000, two NextSeq 500s and two MiSeqs. We support a wide variety of applications, such as ChIP-Seq, miRNA sequencing and RNA-seq. Each lane can accommodate dozens of barcoded samples, depending on sequence complexity and desired coverage. [[BioMicroCenter:CoverageCalculations|Read lengths]] vary by project, from 20nt to 325nt per end. <br><br>


== Illumina Massively Parallel Sequencing ==
{|
|- style="vertical-align: top;"
|style="width: 400px;"|
{| class="wikitable" border=1
  !Service
  !Illumina Sequencing
  |-
  |INPUT || Illumina libraries
  |-
  |MIN VOLUME || 12 uL NextSeq/MiSeq per flow cell <br> 12 uL NovaSeq per lane
  |-
  |MIN CONCENTRATION || 2nM* (~0.4ng/uL for a 300bp library)<BR> Samples down to 0.2nM can be run, but read count is not guaranteed
  |-
  |INCLUDED SERVICES
  |
* Quality Control: Fragment Analyzer and qPCR
* Illumina Sequencing
* Demultiplexing
  |-
  |ADDITIONAL SERVICES ||
* Quality Control
* Illumina library preparation
  |-
  |DATA FORMATS
  |
*FASTQ
*BCL (stored 30d)
  |-
  |QUALITY CONTROL
  |
* [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ FASTQC]
* Basic run metrics (alignment rate, complexity)
* Basic RNAseq metrics (where applicable)
* Basic paired end metrics (where applicable)
* Contamination checks
  |-
  |PRICING || [[BioMicroCenter:Pricing#HISEQ2000_SEQUENCING|LINK]]
  |-
  |SUBMISSION
  |
* MIT - [https://mit.ilabsolutions.com/service_item/new/3381?spt_id=3863 ilabs]
* External Users - [[BioMicroCenter:Forms|form]]
  |-
|}
|
Illumina sequencing-by-synthesis is the primary workhorse of the BioMicro Center. These instruments produce millions of reads in parallel and are used across a very broad range of applications. The Center currently supports 1 NovaSeq 6000, 1 HiSeq2000, 2 NextSeq500s, and 2 MiSeqs. We also have access to additional sequencing instrumentation through a long-standing collaboration with the Whitehead Genome Technologies Core. <BR><BR>


===Typical Workflow===
Illumina sequencing at the core begins with library quality control, during which the Center verifies the anchor elements and insert size using [[BioMicroCenter:RTPCR|qPCR]] and the [[BioMicroCenter:QC#AATI_FRAGMENT_ANALYZER|Fragment Analyzer]]. Users may skip this step by providing the sample concentration and their desired loading concentration. Samples are then entered into the sequencing queue. Queues in the BioMicro Center are typically short: they rarely exceed 1-2 weeks, and samples are often run within a few days. <B>We guarantee a minimum number of reads per lane provided that: a) the BMC performed the QC, b) the samples are high-complexity, especially in the first few nucleotides, and c) the samples are at least 2nM.</B> <br>
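The 2nM minimum and the ~0.4ng/uL figure for a 300bp library in the service table are related through the library's molecular weight. The following is a minimal sketch of that conversion, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA. The function names are illustrative, and the Center's own QC measures molarity by qPCR rather than by this calculation.

```python
# Approximate mass of one double-stranded base pair (g/mol); a common rule of thumb.
BP_MASS_G_PER_MOL = 660.0
# Minimum library concentration listed in the service table (nM).
MIN_GUARANTEED_NM = 2.0


def ng_per_ul_to_nm(conc_ng_per_ul: float, library_size_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM."""
    # ng/uL equals 1e-3 g/L; dividing by (660 * size) g/mol gives mol/L; * 1e9 gives nM.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * library_size_bp)


def nm_to_ng_per_ul(conc_nm: float, library_size_bp: float) -> float:
    """Convert a dsDNA library concentration from nM to ng/uL."""
    return conc_nm * BP_MASS_G_PER_MOL * library_size_bp / 1e6


def meets_min_concentration(conc_nm: float) -> bool:
    """True if the library is at or above 2nM (only one of the three guarantee conditions)."""
    return conc_nm >= MIN_GUARANTEED_NM


if __name__ == "__main__":
    print(f"{nm_to_ng_per_ul(2.0, 300):.3f} ng/uL")  # 0.396 ng/uL
    print(meets_min_concentration(ng_per_ul_to_nm(0.25, 300)))  # False
```

Library size here means the average fragment length including adapters; an error in the size estimate shifts the computed molarity proportionally.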


Illumina sequencing through the BioMicro Center is available only in full lanes, not on a per-read basis. You are welcome to share a lane with other laboratories. However, please minimize any possibility of index cross-contamination as well as [https://www.illumina.com/science/education/minimizing-index-hopping.html index crosstalk.] <BR><BR>
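Before pooling with another lab, it can help to confirm that no two indexes in the shared lane are too similar to demultiplex unambiguously. The following is a minimal sketch, assuming the demultiplexer tolerates one mismatch per index (the default in Illumina's bcl2fastq); under that assumption every pair of indexes should differ at 3 or more positions. The sample names and sequences are hypothetical. This catches index collisions only and does not address index hopping, which is a separate effect described at the link above.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError(f"index lengths differ: {a!r} vs {b!r}")
    return sum(x != y for x, y in zip(a, b))


def index_conflicts(indexes: dict[str, str], min_distance: int = 3) -> list[tuple[str, str, int]]:
    """Return (sample_a, sample_b, distance) for every pair closer than min_distance.

    min_distance=3 assumes one tolerated mismatch per index: two indexes at
    distance 2 or less can both lie within one mismatch of the same read.
    """
    conflicts = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(indexes.items(), 2):
        d = hamming(seq_a.upper(), seq_b.upper())
        if d < min_distance:
            conflicts.append((name_a, name_b, d))
    return conflicts


if __name__ == "__main__":
    # Hypothetical pool combining samples from two labs in one lane.
    pool = {"labA_s1": "ATCACG", "labA_s2": "CGATGT", "labB_s1": "ATCACC"}
    print(index_conflicts(pool))  # [('labA_s1', 'labB_s1', 1)]
```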
===Data Handling===
Following sequencing, data are processed with a custom analytical pipeline. If indexes were provided, samples are demultiplexed by index and identified by DNA-ID. [https://en.wikipedia.org/wiki/FASTQ_format FASTQ] files and any other requested formats are placed in a delivery folder named after the project, together with several quality control checks on the sequencing data. We also provide an initial review of each project, in which we try to identify issues such as incorrect indexes or sample contaminants. Please note that our pipelines are built to find problems and are NOT designed to provide an initial analysis of your data: they are built for speed and simplicity and are not tuned in any way for your samples. We are always happy to discuss these results with you. Data remain available in your lab folder for 90 days after delivery, after which most of it is deleted. FASTQ files can still be retrieved for two years, but are delivered on a case-by-case basis.<BR><BR>
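Each FASTQ file stores one read per four lines: a header starting with "@", the sequence, a separator line starting with "+", and a per-base quality string. The following is a minimal sketch for a quick sanity check of a delivered file (read count and mean base quality), assuming the Phred+33 quality encoding used by current Illumina software. It is not the Center's pipeline, and the function names are illustrative.

```python
import gzip
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, qual


def summarize(handle: TextIO, phred_offset: int = 33) -> tuple[int, float]:
    """Return (number of reads, mean Phred quality over all bases)."""
    n_reads = total_bases = total_quality = 0
    for _header, _seq, qual in read_fastq(handle):
        n_reads += 1
        total_bases += len(qual)
        total_quality += sum(ord(ch) - phred_offset for ch in qual)
    mean_q = total_quality / total_bases if total_bases else 0.0
    return n_reads, mean_q


def summarize_path(path: str) -> tuple[int, float]:
    """Summarize a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize(handle)


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n#I\n")
    print(summarize(demo))  # two reads; mean Phred about 33.7
```

For a delivered file, summarize_path can be pointed at a hypothetical path such as "sample.fastq.gz".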
===Custom Sequencing===
Users may prepare their libraries with custom oligos. In that case, custom sequencing primers must be provided along with the samples at the time of submission: at least 15uL of each custom sequencing primer at 100uM per lane of HiSeq, MiSeq, or NextSeq, and at least 30uL of each custom sequencing primer at 100uM per NovaSeq flowcell. We strongly recommend contacting biomicro@mit.edu before planning a custom library preparation. <BR><BR>
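The primer minimums above can be turned into a per-submission quantity. The following is a small sketch, assuming the per-lane (HiSeq, MiSeq, NextSeq) and per-flowcell (NovaSeq) minimums scale linearly with the number of lanes or flowcells ordered. The dictionary and function names are illustrative.

```python
# Minimum volume (uL) of EACH custom primer at 100 uM, per run unit, from the text above.
# Run unit = one lane for HiSeq/MiSeq/NextSeq, one flowcell for NovaSeq.
MIN_UL_PER_UNIT = {"HiSeq": 15.0, "MiSeq": 15.0, "NextSeq": 15.0, "NovaSeq": 30.0}
STOCK_UM = 100.0


def primer_volume_ul(platform: str, run_units: int) -> float:
    """Minimum uL of each custom primer, assuming the minimum scales with run units."""
    if run_units < 1:
        raise ValueError("run_units must be at least 1")
    return MIN_UL_PER_UNIT[platform] * run_units


def picomoles(volume_ul: float, conc_um: float = STOCK_UM) -> float:
    """Amount of primer in pmol (1 uL at 1 uM = 1 pmol)."""
    return volume_ul * conc_um


if __name__ == "__main__":
    # Two HiSeq lanes, each needing a custom Read 1 primer:
    per_primer = primer_volume_ul("HiSeq", 2)
    print(per_primer, picomoles(per_primer))  # 30.0 3000.0
```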


|
<BR>
[[Image:cbot_left.jpg|left]]<BR><BR>
Additional services available:
* [[BioMicroCenter:Illumina_Library_Preparation|Illumina library preparation]]
* [[BioMicroCenter:BioInformaticsStaff|Bioinformatics Support]]
* [[BioMicroCenter:Servers|Data storage and Computation]]
|}


== Illumina Platforms ==
{| class="wikitable" border=1
!width=100| SPEC
!width=250| HiSeq2000
!width=250| MiSeq
!width=250| NextSeq500
!width=250| NovaSeq6000
|-
|'''SEQUENCER'''
|[[Image:hiseq_2000.jpg|center|200px]]
|[[Image:BMC_miseq.png|center|200px]]
|[[Image:BMC_Next500.png|center|200px]]
|[[Image:BMC_Nova6000.jpg|center|150px]]
|-
| '''READS/LANE'''<BR> The lower number is the minimum per lane for standard Illumina libraries.
|
* 1 Lane: 150-220M
|
* v2: 8-15M
* v3: 12-25M
* Nano (v2 Reagents only): 1M
|
* High output: 250-500M
|
* SP, S1, S2 (2 lanes each): 650M, 1.3B, 3.3B per flowcell
*S4 (4 lanes): 8B per flowcell
*Compatible with [https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/patterned-flow-cells.html ExAmp workflow]
|-
|'''NT/DAY'''
|
* 18 nt/day
* 3 days/run
|
* 288 nt/day
* 1-3 days/run
|
* 150 nt/day
* 1-2 days/run
|
* 150 nt/day
* 1-2 days/run
|-
|'''LENGTHS AVAILABLE'''
|
*40nt, single end only
|
*'''v2:''' 75nt, 300nt, 500nt
*'''v3:''' 150nt, 600nt
|
*75nt, 150nt, 300nt
|
*100nt, 200nt, 300nt, 500nt (SP only)
|-
|'''USES'''
|
* ChIPseq
* Bisulfite sequencing
* Copy Number Variation
* smRNA
* sgRNA screens
* siRNA screens
|
* Small genome resequencing
* Targeted resequencing
* 16S Metagenomes
* smRNA
* De novo sequencing
|
* SNP detection
* Exome sequencing
* Splicing analysis in RNAseq
* High coverage
* De novo sequencing
* Metagenomics
|
* scRNA sequencing for many cells
* Exome sequencing
* High coverage
|-
|'''KEY NOTES'''
|
*Useful for counting applications
|
*Best for low complexity libraries
*Suitable for barcode counting
|
*2-color chemistry: G is dark (no signal)
*Struggles with low complexity libraries
*Struggles with large libraries
|
*Patterned flowcell
*Struggles with low complexity libraries
*ExAmp Workflow option
*Platform still evolving
|-
|'''DONATED BY'''
|Drs. Penny Chisholm and Chris Burge and HHMI
|Drs. Chris Love, Michael Birnbaum and the Dept. of Biological Engineering.
|Drs. Penny Chisholm, Doug Lauffenburger, Myriam Heiman, Li-Huei Tsai and the Dept. of Biology and the Koch Institute.
|Drs. Manolis Kellis, Li-Huei Tsai and MIT, the MIT Stem Cell Initiative and the Dept. of Biology and the Koch Institute.
|}
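In the NextSeq 500 column, the 2-color note refers to Illumina's two-channel sequencing-by-synthesis chemistry, in which each base is identified by its signal in a red and a green channel: C in red only, T in green only, A in both, and G in neither. The following toy sketch shows that mapping, assuming normalized intensities and a single hypothetical threshold (real base callers are considerably more sophisticated). It illustrates why a cluster that stops emitting signal is read as G.

```python
# Two-channel SBS: each base is identified by which channels light up.
CHANNELS_TO_BASE = {
    (True, True): "A",    # signal in red and green
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal: G is "dark"
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalized red/green intensities (toy threshold model)."""
    return CHANNELS_TO_BASE[(red >= threshold, green >= threshold)]


def call_read(cycles: list[tuple[float, float]], threshold: float = 0.5) -> str:
    """Call a read from per-cycle (red, green) intensity pairs."""
    return "".join(call_base(red, green, threshold) for red, green in cycles)


if __name__ == "__main__":
    # Hypothetical cluster whose signal fades after three cycles.
    signals = [(0.9, 0.1), (0.1, 0.9), (0.9, 0.8), (0.05, 0.02), (0.0, 0.0)]
    print(call_read(signals))  # CTAGG
```

Because a missing signal and a true G look the same, runs of G at the ends of reads (poly-G tails) are a commonly reported artifact on two-color instruments.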


Information is also available about [[BioMicroCenter:Multiplex|'''multiplexing''']].
<BR><BR>
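The READS/LANE and NT/DAY rows of the platform table support two quick planning estimates: how many barcoded samples fit in a lane at a target depth, and roughly how many days of instrument time a given read length takes. The following is a minimal sketch using the lower read counts from the table, assuming reads are split evenly across samples (real pools are rarely perfectly balanced, so this is an upper bound) and ignoring cluster generation and queue time. The dictionary keys are illustrative labels, and NovaSeq flowcells are omitted.

```python
import math

# Lower ("minimum") read counts per lane from the READS/LANE row of the table.
MIN_READS_PER_LANE = {
    "HiSeq2000": 150e6,
    "MiSeq v2": 8e6,
    "MiSeq v3": 12e6,
    "MiSeq Nano": 1e6,
    "NextSeq500 high output": 250e6,
}

# Sequencing speed from the NT/DAY row of the table.
NT_PER_DAY = {"HiSeq2000": 18, "MiSeq": 288, "NextSeq500": 150, "NovaSeq6000": 150}


def max_samples_per_lane(lane_type: str, reads_per_sample: float) -> int:
    """Upper bound on samples per lane, assuming perfectly balanced pooling."""
    return math.floor(MIN_READS_PER_LANE[lane_type] / reads_per_sample)


def sequencing_days(platform: str, total_nt: int) -> float:
    """Approximate instrument days for total_nt cycles (e.g. 300 for a 150+150 run)."""
    return total_nt / NT_PER_DAY[platform]


if __name__ == "__main__":
    print(max_samples_per_lane("NextSeq500 high output", 20e6))  # 12
    print(sequencing_days("NextSeq500", 300))  # 2.0
```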


* [[BioMicroCenter:IlluminaDataPipeline#Basics|Illumina pipeline - How it works!]]
* [[BioMicroCenter:IlluminaDataPipeline#Output_Files|Illumina pipeline output formats]]
* [[BioMicroCenter:Computing#BioInformatics_Services|Bioinformatic Consulting]]
<BR>
 
== Protocols ==
 
Protocols for all supported technologies can be found [[BioMicroCenter:Protocols| here]].
 
<BR>
 
 
== Pricing and Priority ==
Full pricing information is available on [[BioMicroCenter:Pricing|our price list]].<br>
 
Priority for Illumina sequencing is currently given to labs associated with the BioMicro Center [[BioMicroCenter:CoreDeps|Core departments]]. We are able to offer our services to other MIT and [[BioMicroCenter:FAQ#NON_MIT_USERS|non-MIT]] users as space allows. <br><br><br>
<BR>
 
 
== MIT Core Collaboration ==
Because of the layout of Illumina flowcells, samples must be run in batches of 7 lanes (a pool of multiplexed samples counts as one lane). To ensure quick throughput, we have established a collaboration that allows us to move partial flowcells between the various centers at MIT. For users with fewer than 4 samples, samples may be moved between the BioMicro Center, the [http://jura.wi.mit.edu/genomecorewiki/index.php/Main_Page Whitehead Institute Center for Genome Technologies] and the [http://web.mit.edu/biopolymers/www/ Koch Institute Biopolymer Center]. Samples are moved only to fill out runs or to expedite processing. The Centers are committed to maintaining consistent quality across the cores, so you should see no difference whether your samples are run at the BioMicro Center or at one of our sister centers. Transfers are available only to members of the MIT community.
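The 7-lane batching constraint is why small requests are combined: a partially filled flowcell has to be completed before it runs. The following is a hypothetical first-fit sketch of packing lane requests into 7-lane flowcells; the request list and the greedy strategy are illustrative assumptions, not the Centers' actual scheduling.

```python
LANES_PER_FLOWCELL = 7  # runs are batched in groups of 7 lanes


def pack_first_fit(requests: list[tuple[str, int]],
                   capacity: int = LANES_PER_FLOWCELL) -> list[list[tuple[str, int]]]:
    """Assign each (lab, lanes) request to the first flowcell with enough free lanes."""
    flowcells: list[list[tuple[str, int]]] = []
    free: list[int] = []
    for lab, lanes in requests:
        if not 1 <= lanes <= capacity:
            raise ValueError(f"{lab}: request must be between 1 and {capacity} lanes")
        for i, room in enumerate(free):
            if lanes <= room:
                flowcells[i].append((lab, lanes))
                free[i] -= lanes
                break
        else:  # no existing flowcell has room: open a new one
            flowcells.append([(lab, lanes)])
            free.append(capacity - lanes)
    return flowcells


if __name__ == "__main__":
    queue = [("labA", 4), ("labB", 2), ("labC", 3), ("labD", 1)]
    for n, fc in enumerate(pack_first_fit(queue), start=1):
        used = sum(lanes for _, lanes in fc)
        print(f"flowcell {n}: {fc} ({used}/{LANES_PER_FLOWCELL} lanes)")
```

In this example the second flowcell holds only 3 of 7 lanes; lanes like these are the ones that benefit from being combined with samples at a sister center instead of waiting for the queue to fill.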
 
[[Media:QueueReport.pdf| View current samples queuing for Illumina]]
<BR>
 
 
 
''Initial page written by Summeet Gupta at the WI-CGT''

Latest revision as of 08:50, 12 March 2023
