BioMicroCenter:Illumina Sequencing: Difference between revisions

From OpenWetWare
(126 intermediate revisions by 8 users not shown)
{{BioMicroCenter}}
{{BioMicroCenter}}


The MIT BioMicro Center has five high-throughput Illumina sequencers, including a HiSeq 2000, three Genome Analyzers and one MiSeq. We support a wide variety of applications, such as ChIP-Seq, miRNA sequencing and RNA-seq.  Each lane can potentially accommodate dozens of barcoded samples (depending on sequence complexity and desired coverage). [[BioMicroCenter:CoverageCalculations|Read lengths]] vary, depending on users, between 36nt and 150nt per end. <br>
The MIT BioMicro Center has seven high-throughput Illumina sequencers, including two HiSeq 2000s, one Genome Analyzer, two NextSeqs and two MiSeqs. We support a wide variety of applications, such as ChIP-Seq, miRNA sequencing and RNA-seq.  Each lane can potentially accommodate dozens of barcoded samples (depending on sequence complexity and desired coverage). [[BioMicroCenter:CoverageCalculations|Read lengths]] vary, depending on users, between 36nt and 325nt per end. <br>
All questions about Illumina Sequencing can be directed to Kevin Thai at kthai@mit.edu.
Questions about Illumina Sequencing can be directed to [[BioMicroCenter:People|Noelani Kamelamela]]


==Illumina Massively Parallel Sequencing==
==Illumina Massively Parallel Sequencing==
[[Image:cbot_left.jpg|left]]Illumina sequencing works by binding randomly fragmented DNA to an optical flowcell. Fragments are sequenced by sequentially incorporating and imaging fluorescently labeled nucleotides in a [http://illumina.com/pages.ilmn?ID=203 “Sequencing-By-Synthesis”] reaction. Illumina recently rolled out its [http://www.illumina.com/truseq/about_truseq/truseq_sequencing_by_synthesis.ilmn TruSeq v3] reagent kits, improving read quality and reducing GC bias at high cluster densities. For an in-depth overview of the Illumina sequencing chemistry, please refer to [http://www.ncbi.nlm.nih.gov/pubmed/19682367 Kirchner et al 2009.]
[[Image:cbot_left.jpg|left]]Illumina sequencing works by binding randomly fragmented DNA to an optical flowcell. Fragments are sequenced by sequentially incorporating and imaging fluorescently labeled nucleotides in a [http://illumina.com/pages.ilmn?ID=203 “Sequencing-By-Synthesis”] reaction. The BioMicro Center uses Illumina's [http://www.illumina.com/truseq/about_truseq/truseq_sequencing_by_synthesis.ilmn TruSeq v3] reagent kits for HiSeq, v2 chemistry for NextSeq and v2 or v3 chemistry for MiSeq. For an in-depth overview of the Illumina sequencing chemistry, please refer to [http://www.ncbi.nlm.nih.gov/pubmed/19682367 Kirchner et al 2009.]


== Platforms ==
== Platforms ==
=== HiSeq 2000 ===
=== HiSeq 2000 ===
[[Image:hiseq_2000.jpg|right]] The Illumina HiSeq 2000 is the workhorse of BMC's Illumina fleet and is optimized for maximum yield and the lowest price per basepair. Each lane on the HiSeq typically produces between 160 and 220 million reads passing our quality filter (for high quality libraries). HiSeq flowcells have 8 lanes, one of which is committed to a control sample that is used for base normalization (lane 1).  Read lengths on the HiSeq vary between 40 and 100nt per side and nearly all flowcells use barcoding to run multiple samples in each lane. <BR><BR>
[[Image:hiseq_2000.jpg|right]] The Illumina HiSeq 2000 is the workhorse of BMC's Illumina fleet and is optimized for maximum yield and the lowest price per basepair. Each lane on the HiSeq typically produces between 150 and 220 million reads passing our quality filter (for high quality libraries). HiSeq flowcells have 8 lanes, one of which is committed to a control sample that is used for base normalization (lane 1).  Read lengths on the HiSeq vary between 40 and 100nt per side and nearly all flowcells use barcoding to run multiple samples in each lane. <BR><BR>
In order to optimize work flow and keep costs under control, only full flowcells are run. Since all 8 lanes of the flowcell must be run at equal lengths, submissions of single lanes must be grouped with other similar read lengths. This means that some read lengths move through our queue faster than others because more samples of that length are submitted to the BioMicro Center for sequencing. 40nt single end (SE) samples are by far the most common and move through the queue rapidly, followed by short paired end (40+40) runs. Many lengths are very unusual (e.g. 100nt single end) and can wait months for sequencing. We strongly recommend moving samples with unusual read requirements to one of the other platforms. If you have questions about this (or any other aspect of sequencing) please do not hesitate to contact us.<BR>
In order to optimize work flow and keep costs under control, only full flowcells are run. Since all 8 lanes of the flowcell must be run at equal lengths, submissions of single lanes must be grouped with other similar read lengths. This means that some read lengths move through our queue faster than others because more samples of that length are submitted to the BioMicro Center for sequencing. 40nt single end (SE) samples are by far the most common and move through the queue rapidly, followed by short paired end (40+40) runs. Many lengths are very unusual (e.g. 100nt single end) and should instead be submitted for the NextSeq unless you have a full flowcell. If you have questions about this (or any other aspect of sequencing) please do not hesitate to contact us.<BR>
<BR>
<BR>
The HiSeq2000 is ideal for:
The HiSeq2000 is ideal for:
* High numbers of multiplexed samples
* De novo sequencing
* SNP detection
* SNP detection
* ChIPseq
* ChIPseq
* Bisulfite sequencing
* Bisulfite sequencing
* RNAseq
* Gene Expression
* Exome sequencing
* Exome sequencing
* smallRNA
''The HiSeq2000s were donated to the BioMicro Center by Drs. Penny Chisholm and Chris Burge and HHMI ''


''The HiSeq2000 was donated to the BioMicro Center by Drs. Penny Chisholm and Chris Burge. ''
=== MiSeq ===
=== MiSeq ===
[[Image:BMC_miseq.png|right|200px]] The MiSeq is the newest sequencer in the BioMicro Center. Unlike the HiSeq, the MiSeq is optimized for speed. The MiSeq has a single lane that can produce approximately 5 million reads passing filter. The MiSeq does *not* have a control lane so having good base balance is critical for runs on this sequencer. Amplicons, such as 16S, can be run on the sequencer but should be constructed to have [[BioMicroCenter:PhasedSequencing|phased sequencing]]. Highly unbalanced libraries, such as RRBS, should not be run on the MiSeq. <BR><BR>
[[Image:BMC_miseq.png|right|200px]] The MiSeq is a newer sequencer in the BioMicro Center. Unlike the HiSeq, the MiSeq is optimized for speed. The MiSeq has a single lane that can produce up to 25 million reads passing filter (ideal cases). The MiSeq does *not* have a control lane so having good base balance is critical for runs on this sequencer. Amplicons, such as 16S, can be run on the sequencer but should be constructed to have complexity in the first several bases. Highly unbalanced libraries, such as RRBS, should not be run on the MiSeq. <BR><BR>
The strength of the MiSeq is its speed and read length. The MiSeq is able to sequence 12nt/hour which allows it to complete a 150+150nt paired end read, from cluster to fastq files, in a little over a day. This compares to 2-3 weeks of sequencing on the HiSeq. Because the chemistry is on the flowcell for less time, error rates are much lower for the MiSeq than the HiSeq. New kits should push read length even longer, with 250+250PE kits coming soon and the Broad Institute having reported 400+400PE runs. <BR><BR>
The strength of the MiSeq is its speed and read length. The MiSeq is able to sequence 14nt/hour which allows it to complete a 150+150nt paired end read, from cluster to fastq files, in a little less than a day. This compares to 2-3 weeks of sequencing on the HiSeq. Because the chemistry is on the flowcell for less time, error rates are much lower for the MiSeq than the HiSeq. MiSeq runs are available in 50, 150*, 300, 500 and 600*nt flavors. (*) - v3 kits can reach 25m reads. Other kits can only reach 15m.<BR><BR>
The 50 cycle kit can accommodate up to 70bp read length (single-end or 30+30 paired-end).  The 300 cycle kit can accommodate up to 350bp read length, while the 500 cycle kit can accommodate up to 520bp read length. Illumina read quality at long length has declined in recent years and reads longer than 250PE have had mixed results. <BR><BR>
The MiSeq is ideal for:
The MiSeq is ideal for:
* Small genome resequencing
* Small genome resequencing
* Metagenomics
* Metagenomics
* smRNA
* smRNA
* barcode sequencing
* de novo sequencing.
''The HiSeq2000 was donated to the BioMicro Center by Dr. Chris Love. ''
 
''MiSeqs were donated to the Biomicro Center by Drs. Chris Love, Michael Birnbaum and the Dept. of Biological Engineering. ''
 
=== NextSeq500 ===
[[Image:BMC_Next500.png|right|200px]] The NextSeq is the newest sequencer in the BioMicro Center. The NextSeq can be thought of as a MiSeq on steroids. Optimized for speed and yield, the NextSeq has a single lane that can produce up to 500 million reads passing filter (ideal cases). This yield does come at a slightly lower quality, and while most Illumina machines operate well above their specifications, the NextSeq has less margin. Like the MiSeq, the NextSeq does *not* have a control lane, so having good base balance is critical for runs on this sequencer. In addition, the NextSeq chemistry only uses 2 fluorophores instead of 4 which can complicate some experimental designs. Amplicons, such as 16S, have not yet been tested on the sequencer and may fail. Highly unbalanced libraries, such as RRBS, have not been run on the NextSeq. <BR><BR>
[[Image:BMC_2dye.jpg|left|200px]]The strength of the NextSeq is its speed and read length coupled with yield. The NextSeq is able to sequence ~10nt/hour which allows it to complete a 150+150nt paired end read, from cluster to fastq files, in two days. Kit sizes for NextSeq are 75, 150, and 300 nt. Currently, the BMC only stocks "High Output" flowcells. At this time, we discourage use of the NextSeq for short single end reads - those are better suited to the HiSeq2000s - and we will prioritize other read lengths before 50nt runs. <BR><BR><BR><BR><BR>
<br>
The NextSeq is ideal for:
* Whole genome sequencing
* Splicing analysis in RNAseq
* Metagenomics
<br>
'''The NextSeq is NOT ideal for:'''
* low complexity libraries such as PCR amplicons
 
<LI>75 kit has 91 cycles
<LI>150 kit has 166 cycles
<LI>300 kit has 316 cycles
 
<LI>This is to allow for an 8+8 dual index. However, if you want to put the extra reagents toward your sequencing read (such as 42+6+42 for the 75 kit), you are welcome to do so.
<BR><BR>
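The cycle budget described above can be checked with a few lines of Python. This is only a sketch: the kit totals are the ones listed above, while the function name <code>fits_kit</code> and its parameters are illustrative and not part of any BioMicro Center or Illumina tool.

```python
# Total sequencing cycles per NextSeq kit, as listed above.
KIT_CYCLES = {75: 91, 150: 166, 300: 316}


def fits_kit(kit, read1, index1=0, index2=0, read2=0):
    """Return (fits, spare_cycles) for a proposed run layout on a given kit.

    All arguments are cycle counts; unused reads default to 0.
    """
    total = read1 + index1 + index2 + read2
    spare = KIT_CYCLES[kit] - total
    return spare >= 0, spare


# A 75 nt single-end read with the 8+8 dual index.
print(fits_kit(75, read1=75, index1=8, index2=8))
# The 42+6+42 layout mentioned above (paired end with one 6 nt index).
print(fits_kit(75, read1=42, index1=6, read2=42))
```

A negative spare count means the layout asks for more cycles than the kit provides.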
<B>Data on NextSeq Runs (updated March 2, 2016):</B><BR>
We are closely monitoring the quality of NextSeq runs. With the newly released v2 chemistry for NextSeq, we see a marked increase in quality (pictured). We aim to get the highest number of reads without sacrificing data quality. If you have a preference for a high number of reads (slightly lower data quality) or high data quality (slightly lower number of reads), please let us know so that we can adjust your loading concentration accordingly.
[[Image:160302_nextseq_website.png|left|600px]]
<br><br>
<br><br>
<br><br>
<br><br>
<br><br>
<br><br>
<br><br>
<br><br>
<br><br>
'''Figure: Proportion of reads with zero mismatches at cycle 75, on camera 5, measured by aligning PhiX spike-in reads to the PhiX genome.'''
<br><br>
''NextSeq500s were donated to the BioMicro Center by Drs. Penny Chisholm, Doug Lauffenburger, Myriam Heiman, Li-Huei Tsai and the Dept. of Biology and the Koch Institute. ''
<br> <br>
 
=== Genome Analyzer IIx ===
=== Genome Analyzer IIx ===
[[Image:GAIIxcollage.jpg|right|200px]] The Genome Analyzer IIs (GAIIs) are the oldest sequencers in the BioMicro Center and remain the most flexible. The newer generations of Illumina sequencers have been designed with increasing focus on clinical applications and have removed some of the "hands on" aspects of the older GAIIs. The GAIIs remain the only sequencers where the actual images of the flowcell can be reprocessed, for example. The GAII/IIx can produce 20-40m reads per lane passing filter and typically runs read lengths of 36-150nt per side.<BR><BR>
[[Image:GAIIxcollage.jpg|right|200px]] The Genome Analyzer II (GAII) is the oldest sequencer in the BioMicro Center and remains the most flexible. The newer generations of Illumina sequencers have been designed with increasing focus on clinical applications and have removed some of the "hands on" aspects of the older GAIIs. The GAIIs remain the only sequencers where the actual images of the flowcell can be reprocessed, for example. The GAII/IIx can produce 20-40m reads per lane passing filter and typically runs read lengths of 36-150nt per side.<BR><BR>
With the addition of the MiSeq, we have reworked how we are processing GAII flowcells. We have been able to create partial flowcells on the GAII by altering recipes. This has allowed us to move from a model like the HiSeq where we need a full flowcell before we run to a model where we can run as soon as the samples pass quality control, more like the MiSeq. However, unlike the MiSeq, we can run multiple lanes at once. Some critical caveats: First, these methods are not supported by Illumina so we cannot offer to replace failed runs. Second, unlike the HiSeq, the PhiX lane is *not* included. You must choose to sequence a lane of PhiX if you want to do control normalization. Finally, this service is completely "a la carte" so the pricing schema is quite different. <BR><BR>
With the addition of the MiSeq, we have reworked how we are processing GAII flowcells. We have been able to create [[BioMicroCenter:PartialFlowcells|'''partial flowcells''']] on the GAII by altering recipes. This has allowed us to move from a model like the HiSeq where we need a full flowcell before we run to a model where we can run as soon as the samples pass quality control, more like the MiSeq. However, unlike the MiSeq, we can run multiple lanes at once. Some critical caveats: First, these methods are not supported by Illumina so we cannot offer to replace failed runs. Second, unlike the HiSeq, the PhiX lane is *not* included. You must choose to sequence a lane of PhiX if you want to do control normalization. Finally, this service is completely "a la carte" so the pricing schema is quite different. <BR><BR>
{| border=1 align="right"
{| border=1 align="right"
  ! # of Lanes
  ! # of Lanes
  !width=75| cycles per day
  !width=75| cycles per day
  !width=75| cycles per kit
  !width=75| cycles per kit
  |-
  |-align="center"
  | 8  
  | 8  
  | 24  
  | 24  
  | 45
  | 42
  |-
  |-align="center"
  | 4  
  | 4  
  | 36  
  | 36  
  | 72
  | 66/33*
  |-
  |-align="center"
  | 2  
  | 2  
  | 48  
  | 48  
  | 120
  | 106/54*
  |-
  |-align="center"
  | 1  
  | 1  
  | 72  
  | 72  
  | 180
  | 140/81*
|-
|-
|colspan="3" align=center |&#42;''Second number pertains to reads greater than 40 nt.''
|}
|}


Using fewer lanes on each flowcell has allowed us to decrease the cycle time by not imaging all the lanes. In a typical 8 lane run, 20 minutes is spent doing chemistry followed by 40 minutes of imaging (each lane takes ~5 minutes to image). Therefore, a 2 lane flowcell runs twice as fast as an 8 lane flowcell. Also, since the chemistry is not running into all of the lanes, each sequencing kit can go to a longer read length. The relationships are summarized in the chart on the left. Pricing is set on the number of lanes you are using, the number of days you are running the GAII, and the number of sequencing kits you are using. For example, if you wanted to run a 96+96 PE flowcell using 2 lanes, the cost would be the initial cost for the 2 lane PE flowcell plus an additional 3 days (one day is included in the original price) plus a second sequencing kit. That kit would not be completely used up (you would have an extra 48nt left that would be thrown away).<BR><BR>  
Using fewer lanes on each flowcell has allowed us to decrease the cycle time by not imaging all the lanes. In a typical 8 lane run, 20 minutes is spent doing chemistry followed by 40 minutes of imaging (each lane takes ~5 minutes to image). Therefore, a 2 lane flowcell runs twice as fast as an 8 lane flowcell. Also, since the chemistry is not running into all of the lanes, each sequencing kit can go to a longer read length. The relationships are summarized in the chart on the left. Pricing is set on the number of lanes you are using, the number of days you are running the GAII, and the number of sequencing kits you are using. For example, if you wanted to run a 75+75 PE flowcell using 2 lanes, the cost would be the initial cost for the 2 lane PE flowcell plus an additional 3 days (one day is included in the original price) plus two additional sequencing kits. The last kit would not be completely used up (you would have an extra 18nt left that would be thrown away).<BR><BR>  
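The day and kit counts in the pricing example can be estimated from the table above. The sketch below uses the values from the newer revision of the table and assumes that the starred number is the per-kit cycle count for any read longer than 40 nt (the 8 lane row lists a single value, which is used for both cases). The name <code>gaii_run_estimate</code> is illustrative only, and the sketch does not try to reproduce the leftover-cycle figure or any actual prices.

```python
import math

# lanes -> (cycles per day, cycles per kit for reads <= 40 nt,
#           cycles per kit for reads > 40 nt), from the table above.
GAII_TABLE = {
    8: (24, 42, 42),
    4: (36, 66, 33),
    2: (48, 106, 54),
    1: (72, 140, 81),
}


def gaii_run_estimate(lanes, read_length, ends=1):
    """Return (days, kits) needed for a partial-flowcell GAII run.

    Index reads and clustering time are ignored in this estimate.
    """
    cycles_per_day, kit_short, kit_long = GAII_TABLE[lanes]
    total_cycles = read_length * ends
    per_kit = kit_long if read_length > 40 else kit_short
    days = math.ceil(total_cycles / cycles_per_day)
    kits = math.ceil(total_cycles / per_kit)
    return days, kits


# The 75+75 paired-end, 2 lane example from the text.
days, kits = gaii_run_estimate(lanes=2, read_length=75, ends=2)
print(f"{days} days (1 included + {days - 1} additional), {kits} kits")
```

For the 75+75 example this gives 4 days and 3 kits, which agrees with the "additional 3 days" and "two additional sequencing kits" quoted above.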


The GAII/GAIIx is ideal for:
The GAII/GAIIx is ideal for:
* Non-standard assays such as HITS-FLIP
* Non-standard assays such as HITS-FLIP


''The Genome Analyzer IIs were donated to the BioMicro Center by Drs. Penny Chisholm, Chris Burge, Ernest Fraenkel and the Dept of Biology with contributions from many others ''


  This GAII pricing model is an experimental model and is subject to change
== Platform Comparison ==
 
''The Genome Analyzer IIs were donated to the BioMicro Center by Drs. Penny Chisholm, Chris Burge, Ernest Fraenkel and the Dept of Biology with contributions from many others ''


=== Platform Comparison ===
{| border=1  
{| border=1 align="left"
  !width=200| SPEC
  !width=200| SPEC
  !width=200| HiSeq2000
  !width=200| HiSeq2000
  !width=200| GAII/IIx
  !width=200| GAII/IIx
  !width=200| MiSeq
  !width=200| MiSeq
!width=200| NextSeq
  |- align="center"
  |- align="center"
  | '''Machine Names'''
  | '''Machine Names'''
  | FonZie
  | SamAdams<BR>JackDaniels
  | Ryland<BR>Boris<BR>Preston
  | Boris
  | MiAmore
| MiAmore<BR>Tobias<BR>Whitehead
  | Lucille<BR>George
  |- align="center"
  |- align="center"
  | '''# reads / lane'''
  | '''# reads / lane'''
  | 160-220m
  | 150-220m
  | 20-40m
  | 20-40m
  | 4-6m
  | 8m-25m
| 200m-500m
  |- align="center"
  |- align="center"
  | ''' # lanes coprocessed '''
  | ''' # lanes coprocessed '''
  | 7+PhiX
  | 7+PhiX
  | 1 to 8
  | 1 to 8
| 1
  | 1
  | 1
  |- align="center"
  |- align="center"
  | ''' nt / day '''
  | ''' nt / day '''
  | 24
  | 18
  | 24-72
  | 24-72
  | 288
  | 288
| 150
|- align="center"
| ''' Max Read Length '''
| 100+100
| 80+80 in lane by lane
| 300+300
| 150+150
|}
|}
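
The "nt / day" row can be turned into a rough sequencing-time estimate. This is a back-of-the-envelope sketch using the newer values from the table above; it ignores index reads, clustering and queue time, and the function name <code>sequencing_days</code> is illustrative. The GAII is left out because its speed depends on the number of lanes run.

```python
# Approximate sequencing speed in nt (cycles) per day, from the table above.
NT_PER_DAY = {"HiSeq2000": 18, "MiSeq": 288, "NextSeq": 150}


def sequencing_days(platform, read_length, ends=1):
    """Estimate days on the instrument for a run of read_length nt per end."""
    return read_length * ends / NT_PER_DAY[platform]


for platform, read_length, ends in [
    ("HiSeq2000", 40, 1),
    ("HiSeq2000", 100, 2),
    ("MiSeq", 150, 2),
    ("NextSeq", 150, 2),
]:
    days = sequencing_days(platform, read_length, ends)
    print(f"{platform} {read_length}x{ends}: about {days:.1f} days")
```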


== Applications ==
== Additional Information ==
Illumina currently provides reagents and support for four major sequencing applications:
 
=== Submission guidelines ===


* [[BioMicroCenter:ChIP_Seq|'''ChIP Seq''']]
Submission guidelines can be found [[BioMicroCenter:FAQ#HOW_LONG_WILL_IT_TAKE_FOR_MY_HISEQ_SAMPLE_TO_BE_SEQUENCED| on our FAQ.]]
* [[BioMicroCenter:Expression_Seq|'''RNA Seq''']]
* [[BioMicroCenter:Small_RNA_Seq|'''Small RNA Sequencing''']]
* [[BioMicroCenter:Genome_Seq|'''Genome sequencing and resequencing''']]


The following application has been published but does not yet have a kit from Illumina:
=== Pricing and Priority ===


* Genotyping: Protocols are being developed for detection of SNPs, chromosomal rearrangements and other genotyping applications. <br>
Full pricing information is available on [[BioMicroCenter:Pricing|our price list]].<br>
<BR>


==Sample Preparation==
Priority for Illumina sequencing is given to labs associated with the BioMicro Center [[BioMicroCenter:CoreDeps|Core departments]] on a first-come first-served basis. We are able to offer our services to other MIT and [[BioMicroCenter:FAQ#NON_MIT_USERS|non-MIT]] users as space allows. A full description of priority and queue time expectations can be found [[BioMicroCenter:FAQ#HOW_LONG_WILL_IT_TAKE_FOR_MY_HISEQ_SAMPLE_TO_BE_SEQUENCED| on our FAQ.]]
Illumina sequencing requires the input of libraries which have been properly fragmented, ligated to specific adapters, and, in the case of RNA inputs, converted into complementary DNA. The BioMicro Center accepts fully prepared libraries from users and also offers a variety of [[BioMicroCenter:Illumina Library Preparation|'''sample preparation services''']] for different applications.


Information is also available about [[BioMicroCenter:Multiplex|'''multiplexing''']].
=== Library Preparation ===
<BR><BR>


== QC ==
[[Image:BMC_IlluminaFlowcell.png|Right|300px]] Illumina sequencing requires the input of libraries that have inserts between 10 and 1000bp in length and [[specific adapters]] attached to the 5' and 3' ends. The BioMicro Center accepts custom samples of all types provided the user also submits sequencing primers (though we do not assume responsibility if the samples fail). Samples submitted for Illumina sequencing should be at ~10ng/ul and the user should provide at least 10 µL of each sample. This is an ideal situation but we do have protocols available to help users with much less concentrated samples. Please submit your sample along with a completed [[BioMicroCenter:Forms|Illumina sequencing form.]]<BR><BR>


Quality control is very important for optimizing the number of reads and the quality of data produced. We run Bioanalyzer and RT-PCR for all submitted cDNA libraries for Illumina sequencing. For more information on QC methods and protocols please visit the [[BioMicroCenter:Sequencing_Quality_Control|Sequencing Quality Control]] page.
In addition to accepting finished libraries, the Biomicro Center supports a number of different [[BioMicroCenter:Illumina Library Preparation|'''sample preparation methodologies''']] for different applications including RNAseq, ChIPseq and genome sequencing. All samples prepped in the BioMicro Center are barcoded for [[BioMicroCenter:Multiplex|'''multiplexing''']].
<BR><BR>


=== Quality Control ===
[[Image:BMC_fastqc.png|thumb|right|300px|screenshot from the fastqc package (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)]]
The BioMicro Center undertakes a number of quality assurance methods to ensure that we produce high quality data for you. All samples submitted for Illumina sequencing are checked for size distribution, presence of proper 5' and 3' adapters, and actual concentration using the [[BioMicroCenter:2100BioAnalyzer|Agilent Bioanalyzer]] and [[BioMicroCenter:RTPCR#Light_Cycler_480_II_Real-time_PCR_Machinesq|qPCR]]. More information on library quality can be found on the [[BioMicroCenter:Sequencing_Quality_Control|Sequencing Quality Control]] page. <BR>
<BR>
We will skip pre-sequencing QC if the user supplies us with concentration and average fragment length information for each sample submitted. '''However''', different labs often vary substantially in their quantifications and our methods are optimized for our own instruments and operators. We cannot guarantee optimal data output and quality for samples which are quantified outside of the BioMicro Center.<br><br>
Additional quality metrics are done during all sequencing runs as part of the standard Illumina process. All samples are spiked with ~0.5% of the bacteriophage [http://en.wikipedia.org/wiki/Phi_X_174 ΦX174]. The ΦX library is primed off the standard Illumina sequencing primers and is used to both ensure the quality of the reagents used in the run and to measure the background sequencing error rates. ΦX reads will not be detected on non-standard libraries using custom priming. <BR>
<BR>
<BR>
Finally, several additional quality metrics are included in the [[BioMicroCenter:IlluminaPipeline|automated analysis pipeline]] currently under active development in the Center. These include standard metrics of base composition, GC%, library complexity and overrepresented reads that are in the TagCount and [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ Fastqc] files. In addition, we now evaluate libraries for contamination from common laboratory species (human, mouse, yeast and E.coli). More information can be found on the [http://bmc-pipeline.mit.edu/flowcell_data_guide.html Flowcell data guide page.]


== Data Analysis ==
=== Pooling considerations ===
Each flowcell lane should produce between 10 and 120 million DNA fragments as of September 2011. Understanding this data often requires a significant investment in informatics and many applications require entirely different interpretations of the data. As part of our sequencing service we provide many of the early steps of bioinformatics for different applications. Further data processing can be arranged on a collaborative basis as resources are available. For more information, check out the links below:
When determining how many samples should be combined together in a single lane, the following equations are useful:<br>
 
<UL>
* [[BioMicroCenter:IlluminaDataPipeline#Basics|Illumina pipeline - How it works!]]
<LI>'''# of lanes = (genomesize x coverage x #samples) / (#readsperlane x readlength x ends)'''<br>
* [[BioMicroCenter:IlluminaDataPipeline#Output_Files|Illumina pipeline output formats]]
* [[BioMicroCenter:Computing#BioInformatics_Services|Bioinformatic Consulting]]
<BR>


== Protocols ==
<LI>'''#samplesperlane = (#readsperlane x readlength x ends) / (genomesize x coverage)'''<br>
</UL>
where,<br>


Protocols for all supported technologies can be found [[BioMicroCenter:Protocols| here]].
<UL>
<LI>''# of lanes'' is the total number of lanes that are required to achieve the specified coverage given the other variables<br>
<LI>''#samplesperlane'' is the total number of samples that can be combined into a single lane to achieve the specified coverage given the other variables<br>
<LI>''genomesize'' is the size, in nt, of the library to be sequenced<br>
<LI>''coverage'' is the desired multiplicity of coverage for the library<br>
<LI>''#samples'' is the number of samples needing to be sequenced<br>
<LI>''#readsperlane'' is the number of reads produced by a lane on the sequencer. (See "Platform Comparison" table above for the typical outputs from each platform.)
<LI>''readlength'' is the length, in nt, of each separate read of the run<br>
<LI>''ends'' is the number of insert reads for the run. For single-end, it is 1, and for paired-end, it is 2.<br><br>
</UL>
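
The two pooling equations above translate directly into code. The sketch below is a minimal transcription: the function and argument names mirror the variables defined above, and the example numbers (a 4.6 Mb genome, 30x coverage, 96 samples, 150 million reads per lane, 40nt single end) are hypothetical inputs chosen only for illustration.

```python
def lanes_needed(genomesize, coverage, n_samples, readsperlane, readlength, ends=1):
    """# of lanes = (genomesize x coverage x #samples) / (#readsperlane x readlength x ends)"""
    return (genomesize * coverage * n_samples) / (readsperlane * readlength * ends)


def samples_per_lane(genomesize, coverage, readsperlane, readlength, ends=1):
    """#samplesperlane = (#readsperlane x readlength x ends) / (genomesize x coverage)"""
    return (readsperlane * readlength * ends) / (genomesize * coverage)


# Hypothetical example: 4.6 Mb genome, 30x coverage, 40nt single-end reads,
# 150 million reads per lane (low end of the HiSeq range in the table above).
print(samples_per_lane(4.6e6, 30, 150e6, 40, ends=1))
print(lanes_needed(4.6e6, 30, 96, 150e6, 40, ends=1))
```

Both functions return fractional values. In practice you would round the number of samples per lane down and the number of lanes up, and leave some margin for uneven pooling.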


<BR>
=== Custom Primers ===
== Pricing and Priority ==
Many non-standard Illumina protocols require custom sequencing primers. The design of these oligos is critical for the success of the experiment and we have observed several experimental failures due to improper oligo design. There are a few critical parameters in oligo design.<br>
Full pricing information is available on [[BioMicroCenter:Pricing|our price list]].<br>
<UL>
<LI>First, the oligo must only occur once in each sequence. Multiple binding will result in low quality reads.<br>
<LI>On reverse or index reads, we cannot separate the oligos by lane and so the construct must be compatible with having a cocktail of standard Illumina oligonucleotides in the mix.<br>
<LI>The Tm of the oligo '''must''' match the Tm of the sequencing primer they are designed to replace. Being even a couple degrees below the Tm can result in experimental failure. Any online Tm calculator can be used. The standard Illumina sequences are: <br>
Forward read: 5'ACACTCTTTCCCTACACGACGCTCTTCCGATCT<br>
Reverse read: 5'CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT<br>
Multiplexing Index Read: 5'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<br>
</UL>
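
As a quick first check before using a full Tm calculator, a candidate primer can be compared against the standard primer it replaces. The sketch below uses the basic GC-content rule of thumb for oligos longer than 13 nt, Tm = 64.9 + 41 x (G + C - 16.4) / N. That formula is an assumption of this example, not the method used by the BioMicro Center; nearest-neighbor calculators give different absolute values and should be preferred for a real design. The helper names are illustrative.

```python
# Standard Illumina sequencing primers, as listed above (5' to 3').
STANDARD_PRIMERS = {
    "forward": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "reverse": "CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT",
    "index": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
}


def basic_tm(seq):
    """Rough Tm in deg C from GC content (rule of thumb for oligos > 13 nt)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_difference(custom, replaces):
    """Tm of the custom oligo minus Tm of the standard primer it replaces."""
    return basic_tm(custom) - basic_tm(STANDARD_PRIMERS[replaces])


def binding_sites(oligo, construct):
    """Count exact occurrences of the oligo sequence in one strand of a construct."""
    return construct.upper().count(oligo.upper())


for name, seq in STANDARD_PRIMERS.items():
    print(f"{name}: {len(seq)} nt, basic Tm about {basic_tm(seq):.1f} C")
```

A negative <code>tm_difference</code> flags a candidate whose estimated Tm falls below that of the standard primer, and <code>binding_sites</code> gives a simple exact-match check for the "only occur once" rule (it does not detect partial or mismatched binding).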


Priority for Illumina sequencing is currently given to labs associated with the BioMicro Center [[BioMicroCenter:CoreDeps|Core departments]]. We are able to offer our services to other MIT and [[BioMicroCenter:FAQ#NON_MIT_USERS|non-MIT]] users as space allows.


<BR>
Custom oligonucleotides should be submitted at 100 µM with at least 20 µL provided. At the time of submission, please diagram your primer design for the sequencing technician, who can verify its compatibility and help to avoid any unforeseen issues.


== Turnaround Time ==
=== Data Analysis ===


Each Genome Analyzer processes 8 samples per run, or 7 samples plus a control. Multiplexed pools of samples count as one for this purpose. The control is typically used to improve sequence quality. Full flowcells can usually be run within two weeks of submission. Partial submissions of less than eight samples (or 7 with control) are put into a project queue, where they join existing samples or await others before processing. Wait times for partial submissions vary depending on demand from other users. Once processing begins, approximately six days are required for clustering, sequencing, and data analysis for a 36-base-pair read. Longer reads add approximately one hour per additional base-pair including a second read and the barcode read, if either are present.
Illumina sequencing at the BioMicro Center includes basic informatic analysis of the data. These steps include:
<br><br>
* Image analysis to locate clusters
* Basecalling
* Demultiplexing of lanes
* Alignment of sequences to a reference genome
* Quality control
* Delivery of the data to a user accessible folder
All of these steps are run by our [http://bmc-pipeline.mit.edu/flowcell_data_guide.html automated analysis pipeline], currently in active development. For users requiring further analysis, we have a staff of [[BioMicroCenter:BioInformaticsStaff|bioinformaticists]] that can assist you in analyzing your data.  
<BR><BR>


== MIT Core Collaboration ==
== MIT Core Collaboration ==
Because of the layout of Illumina flowcells, samples must be run in batches of 7 lanes (a pool of multiplexed samples counts as one lane). In order to ensure quick throughput, we have established a collaboration that allows us to move partial flowcells between the various centers at MIT. For users with fewer than 4 samples, their samples may be moved between the BioMicro Center, the [http://jura.wi.mit.edu/genomecorewiki/index.php/Main_Page Whitehead Institute Center for Genome Technologies] and the [http://web.mit.edu/biopolymers/www/ Koch Institute Biopolymer Center]. Samples will be moved only to fill out runs or to expedite processing. The Centers are committed to working together to maintain consistent quality between the different cores, so you should see no difference whether your samples are run in BioMicro or at one of our sister centers. Transfers are only available for members of the MIT community.  
Because of the layout of Illumina flowcells, samples must be run in batches of 7 lanes (a pool of multiplexed samples counts as one lane). In order to ensure quick throughput, we have established a collaboration that allows us to move partial flowcells between the various centers at MIT. For users with fewer than 4 samples, their samples may be moved between the BioMicro Center and the [http://jura.wi.mit.edu/genomecorewiki/index.php/Main_Page Whitehead Institute Center for Genome Technologies]. Samples will be moved only to fill out runs or to expedite processing. The Centers are committed to working together to maintain consistent quality between the different cores, so you should see no difference whether your samples are run in BioMicro or at one of our sister centers. Transfers are only available for members of the MIT community.  
 
[[Media:QueueReport.pdf| '''View current samples queuing for Illumina''']]
<BR>
<BR>




 
'''OLDE LINKS'''
''Initial page written by Summeet Gupta at the WI-CGT''
* [[BioMicroCenter:ChIP|'''ChIP Seq''']]
* [[BioMicroCenter:Small_RNA_Seq|'''Small RNA Sequencing''']]
* [[BioMicroCenter:Genome_Seq|'''Genome sequencing and resequencing''']]
* [[BioMicroCenter:IlluminaDataPipeline#Basics|Illumina pipeline - How it works!]]
* [[BioMicroCenter:IlluminaDataPipeline#Output_Files|Illumina pipeline output formats]]
* [[BioMicroCenter:Computing#BioInformatics_Services|Bioinformatic Consulting]]
* Protocols for all supported technologies can be found [[BioMicroCenter:Protocols| here]]

Revision as of 09:07, 29 June 2016


The MIT BioMicro Center has seven high-throughput Illumina sequencers, including two HiSeq 2000s, one Genome Analyzer, two NextSeqs and two MiSeqs. We support a wide variety of applications, such as ChIP-Seq, miRNA sequencing and RNA-seq. Each lane can potentially accommodate dozens of barcoded samples (depending on sequence complexity and desired coverage). Read lengths vary, depending on users, between 36nt and 325nt per end.
Questions about Illumina Sequencing can be directed to Noelani Kamelamela.

Illumina Massively Parallel Sequencing

Illumina sequencing works by binding randomly fragmented DNA to an optical flowcell. Fragments are sequenced by sequentially incorporating and imaging fluorescently labeled nucleotides in a “Sequencing-By-Synthesis” reaction. The BioMicro Center uses Illumina's TruSeq v3 reagent kits for HiSeq, v2 chemistry for NextSeq and v2 or v3 chemistry for MiSeq. For an in-depth overview of the Illumina sequencing chemistry, please refer to Kirchner et al 2009.

Platforms

HiSeq 2000

The Illumina HiSeq 2000 is the workhorse of BMC's Illumina fleet and is optimized for maximum yield and the lowest price per basepair. Each lane on the HiSeq typically produces between 150 and 220 million reads passing our quality filter (for high quality libraries). HiSeq flowcells have 8 lanes, one of which (lane 1) is committed to a control sample that is used for base normalization. Read lengths on the HiSeq vary between 40 and 100nt per side, and nearly all flowcells use barcoding to run multiple samples in each lane.

In order to optimize work flow and keep costs under control, only full flowcells are run. Since all 8 lanes of the flowcell must be run at equal lengths, submissions of single lanes must be grouped with other similar read lengths. This means that some read lengths move through our queue faster than others because more samples of that length are submitted to the BioMicro Center for sequencing. 40nt single end (SE) samples are by far the most common and move through the queue rapidly, followed by short paired end (40+40) runs. Many lengths are very unusual (e.g. 100nt single end) and should instead be submitted for the NextSeq unless you have a full flowcell. If you have questions about this (or any other aspect of sequencing), please do not hesitate to contact us.

The HiSeq2000 is ideal for:

  • SNP detection
  • ChIPseq
  • Bisulfite sequencing
  • Gene Expression
  • Exome sequencing
  • smallRNA

The HiSeq2000s were donated to the BioMicro Center by Drs. Penny Chisholm and Chris Burge and HHMI.

MiSeq

The MiSeq is a newer sequencer in the BioMicro Center. Unlike the HiSeq, the MiSeq is optimized for speed. The MiSeq has a single lane that can produce up to 25 million reads passing filter (ideal cases). The MiSeq does *not* have a control lane so having good base balance is critical for runs on this sequencer. Amplicons, such as 16S, can be run on the sequencer but should be constructed to have complexity in the first several bases. Highly unbalanced libraries, such as RRBS, should not be run on the MiSeq.

The strength of the MiSeq is its speed and read length. The MiSeq is able to sequence 14nt/hour, which allows it to complete a 150+150nt paired end read, from cluster to fastq files, in a little less than a day. This compares to 2-3 weeks of sequencing on the HiSeq. Because the chemistry is on the flowcell for less time, error rates are much lower for the MiSeq than the HiSeq. MiSeq runs are available in 50, 150*, 300, 500 and 600*nt flavors. (*) - v3 kits can reach 25m reads. Other kits can only reach 15m.

The 50 cycle kit can accommodate up to 70bp read length (single-end or 30+30 paired-end). The 300 cycle kit can accommodate up to 350bp read length, while the 500 cycle kit can accommodate up to 520bp read length. Illumina read quality at long length has declined in recent years and reads longer than 250PE have had mixed results.

The MiSeq is ideal for:

  • Small genome resequencing
  • Targeted resequencing
  • Metagenomics
  • smRNA
  • de novo sequencing.

MiSeqs were donated to the BioMicro Center by Drs. Chris Love, Michael Birnbaum and the Dept. of Biological Engineering.

NextSeq500

The NextSeq is the newest sequencer in the BioMicro Center. The NextSeq can be thought of as a MiSeq on steroids. Optimized for speed and yield, the NextSeq has a single lane that can produce up to 500 million reads passing filter (ideal cases). This yield does come at a slightly lower quality, and while most Illumina machines operate well above their specifications, the NextSeq has less margin. Like the MiSeq, the NextSeq does *not* have a control lane, so having good base balance is critical for runs on this sequencer. In addition, the NextSeq chemistry only uses 2 fluorophores instead of 4, which can complicate some experimental designs. Amplicons, such as 16S, have not yet been tested on the sequencer and may fail. Highly unbalanced libraries, such as RRBS, have not been run on the NextSeq.

The strength of the NextSeq is its speed and read length coupled with yield. The NextSeq is able to sequence ~10nt/hour, which allows it to complete a 150+150nt paired end read, from cluster to fastq files, in two days. Kit sizes for NextSeq are 75, 150, and 300 nt. Currently, the BMC only stocks "High Output" flowcells. At this time, we discourage use of the NextSeq for short single end reads (those are better suited to the HiSeq2000s), and we will prioritize other read lengths before 50nt runs.





The NextSeq is ideal for:

  • Whole genome sequencing
  • Splicing analysis in RNAseq
  • Metagenomics


The NextSeq is NOT ideal for:

  • low complexity libraries such as PCR amplicons

NextSeq kits contain more cycles than their nominal size:

  • 75 kit has 91 cycles
  • 150 kit has 166 cycles
  • 300 kit has 316 cycles

The extra cycles allow for an 8+8 dual index. However, if you want to put the extra reagents toward your sequencing read (such as 42+6+42 for the 75 kit), you are welcome to do so.
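The cycle counts listed for the NextSeq kits can be used to check whether a proposed read layout fits in a kit. The following is a minimal sketch, not a BioMicro Center tool: the function name `cycles_left` and its parameters are hypothetical, and it assumes a run simply consumes one cycle per sequenced base across all reads and index reads.

```python
# Total cycles per NextSeq kit, as listed on this page.
NEXTSEQ_KIT_CYCLES = {75: 91, 150: 166, 300: 316}


def cycles_left(kit, read1, index1=0, index2=0, read2=0):
    """Return the cycles remaining in a kit after the requested layout.

    A negative result means the layout does not fit in the kit.
    Assumption: one cycle per base, with no other overhead.
    """
    total = NEXTSEQ_KIT_CYCLES[kit]
    used = read1 + index1 + index2 + read2
    return total - used


# 75 kit, single read of 75 nt with an 8+8 dual index: uses all 91 cycles.
print(cycles_left(75, read1=75, index1=8, index2=8))
# 75 kit, 42+6+42 layout from the text: 90 of 91 cycles.
print(cycles_left(75, read1=42, index1=6, read2=42))
# 300 kit, 150+150 paired end with an 8+8 dual index.
print(cycles_left(300, read1=150, index1=8, index2=8, read2=150))
```

A layout is usable under these assumptions whenever the returned value is zero or greater.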

    Data on NextSeq Runs (updated March 2, 2016):
    We are closely monitoring the quality of NextSeq runs. With the newly released v2 chemistry for NextSeq, we see a marked increase in quality (pictured). We aim to get the highest number of reads without sacrificing data quality. If you have a preference for high number of reads (slightly lower data quality) or high data quality (slightly lower number of reads), please let us know so that we can adjust your loading concentration accordingly.

    Figure: Proportion of reads with zero mismatches at cycle 75, on camera 5, measured by aligning PhiX spike-in reads to the PhiX genome.

    NextSeq500s were donated to the BioMicro Center by Drs. Penny Chisholm, Doug Lauffenburger, Myriam Heiman, Li-Huei Tsai and the Dept. of Biology and the Koch Institute.

    Genome Analyzer IIx

    The Genome Analyzer II (GAII) is the oldest sequencer in the BioMicro Center and remains the most flexible. The newer generations of Illumina sequencers have been designed with increasing focus on clinical applications and have removed some of the "hands on" aspects of the older GAIIs. For example, the GAIIs remain the only sequencers where the actual images of the flowcell can be reprocessed. The GAII/IIx can produce 20-40m reads per lane passing filter and typically runs read lengths of 36-150nt per side.

    With the addition of the MiSeq, we have reworked how we are processing GAII flowcells. We have been able to create partial flowcells on the GAII by altering recipes. This has allowed us to move from a model like the HiSeq where we need a full flowcell before we run to a model where we can run as soon as the samples pass quality control, more like the MiSeq. However, unlike the MiSeq, we can run multiple lanes at once. Some critical caveats: First, these methods are not supported by Illumina so we cannot offer to replace failed runs. Second, unlike the HiSeq, the PhiX lane is *not* included. You must choose to sequence a lane of PhiX if you want to do control normalization. Finally, this service is completely "a la carte" so the pricing schema is quite different.

    # of Lanes   Cycles per day   Cycles per kit
    8            24               42
    4            36               66/33*
    2            48               106/54*
    1            72               140/81*
    *Second number pertains to reads greater than 40 nt.

    Using fewer lanes on each flowcell has allowed us to decrease the cycle time by not imaging all the lanes. In a typical 8 lane run, 20 minutes is spent doing chemistry followed by 40 minutes of imaging (each lane takes ~5 minutes to image). Therefore, a 2 lane flowcell runs twice as fast as an 8 lane flowcell. Also, since the chemistry is not running into all of the lanes, each sequencing kit can go to a longer read length. The relationships are summarized in the table above. Pricing is set on the number of lanes you are using, the number of days you are running the GAII, and the number of sequencing kits you are using. For example, if you wanted to run a 75+75 PE flowcell using 2 lanes, the cost would be the initial cost for the 2 lane PE flowcell plus an additional 3 days (one day is included in the original price) plus two additional sequencing kits. The last kit would not be completely used up (you would have an extra 18nt left that would be thrown away).
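The day and kit counts in the 75+75 example can be reproduced from the lanes table. The sketch below is illustrative only: `gaii_run_units` is a hypothetical helper, index-read cycles are ignored, the "greater than 40 nt" threshold is assumed to apply to the length of each individual read, and the 8 lane row is assumed to use 42 cycles per kit at any read length because the table gives a single number there.

```python
import math

# lanes -> (cycles per day, cycles per kit for reads <= 40 nt,
#           cycles per kit for reads > 40 nt), taken from the table.
# The 8 lane row lists a single kit value; reusing it for long reads
# is an assumption.
GAII_TABLE = {
    8: (24, 42, 42),
    4: (36, 66, 33),
    2: (48, 106, 54),
    1: (72, 140, 81),
}


def gaii_run_units(lanes, read_length, ends):
    """Return (days, kits) needed for a partial-flowcell GAII run.

    ends is 1 for single end and 2 for paired end.
    Assumption: only insert reads count, one cycle per base.
    """
    per_day, kit_short, kit_long = GAII_TABLE[lanes]
    total_cycles = read_length * ends
    per_kit = kit_long if read_length > 40 else kit_short
    days = math.ceil(total_cycles / per_day)
    kits = math.ceil(total_cycles / per_kit)
    return days, kits


days, kits = gaii_run_units(lanes=2, read_length=75, ends=2)
# If one day and one kit are covered by the base price (the text states
# this for the day; for the kit it is an assumption), the extras are:
print("additional days:", days - 1)
print("additional kits:", kits - 1)
```

For the 2 lane 75+75 run this gives 4 days and 3 kits in total, which matches the 3 additional days and two additional kits described in the text.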

    The GAII/GAIIx is ideal for:

    • Unusual read lengths
    • Protocol Prototyping
    • Non-standard assays such as HITS-FLIP

    The Genome Analyzer IIs were donated to the BioMicro Center by Drs. Penny Chisholm, Chris Burge, Ernest Fraenkel and the Dept of Biology with contributions from many others.

    Platform Comparison

    SPEC                  HiSeq2000   GAII/IIx                MiSeq     NextSeq
    # reads / lane        150-220m    20-40m                  8m-25m    200m-500m
    # lanes coprocessed   7+PhiX      1 to 8                  1         1
    nt / day              18          24-72                   288       150
    Max Read Length       100+100     80+80 in lane by lane   300+300   150+150

    Machine Names: SamAdams, JackDaniels, Boris, MiAmore, Tobias, Whitehead, Lucille, George

    Additional Information

    Submission guidelines

    Submission guidelines can be found on our FAQ.

    Pricing and Priority

    Full pricing information is available on our price list.

    Priority for Illumina sequencing is given to labs associated with the BioMicro Center Core departments on a first-come first-served basis. We are able to offer our services to other MIT and non-MIT users as space allows. A full description of priority and queue time expectations can be found on our FAQ.

    Library Preparation

    Illumina sequencing requires the input of libraries with inserts between 10 and 1000bp in length that have specific adapters attached to the 5' and 3' ends. The BioMicro Center accepts custom samples of all types provided the user also submits sequencing primers (though we do not assume responsibility if the samples fail). Samples submitted for Illumina sequencing should be at ~10ng/µL and the user should provide at least 10 µL of sample. This is an ideal situation, but we do have protocols available to help users with much less concentrated samples. Please submit your sample along with a completed Illumina sequencing form.

    In addition to accepting finished libraries, the BioMicro Center supports a number of different sample preparation methodologies for different applications including RNAseq, ChIPseq and genome sequencing. All samples prepped in the BioMicro Center are barcoded for multiplexing.

    Quality Control

    screenshot from the fastqc package (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

    The BioMicro Center undertakes a number of quality assurance methods to ensure that we produce high quality data for you. All samples submitted for Illumina sequencing are checked for size distribution, presence of proper 5' and 3' adapters, and actual concentration using the Agilent Bioanalyzer and qPCR. More information on library quality can be found on the Sequencing Quality Control page.

    We will skip pre-sequencing QC if the user supplies us with concentration and average fragment length information for each sample submitted. However, different labs often vary substantially in their quantifications and our methods are optimized for our own instruments and operators. We cannot guarantee optimal data output and quality for samples which are quantified outside of the BioMicro Center.

    Additional quality metrics are done during all sequencing runs as part of the standard Illumina process. All samples are spiked with ~0.5% of the bacteriophage ΦX174. The ΦX library is primed off the standard Illumina sequencing primers and is used to both ensure the quality of the reagents used in the run and to measure the background sequencing error rates. ΦX reads will not be detected on non-standard libraries using custom priming.

    Finally, several additional quality metrics are included in the automated analysis pipeline currently under active development in the Center. These include standard metrics of base composition, GC%, library complexity and overrepresented reads that are in the TagCount and Fastqc files. In addition, we now evaluate libraries for contamination from common laboratory species (human, mouse, yeast and E. coli). More information can be found on the Flowcell data guide page.

    Pooling considerations

    When determining how many samples should be combined together in a single lane, the following equations are useful:

    • # of lanes = (genomesize x coverage x #samples) / (#readsperlane x readlength x ends)
    • #samplesperlane = (#readsperlane x readlength x ends) / (genomesize x coverage)

    where,

    • # of lanes is the total number of lanes that are required to achieve the specified coverage given the other variables
    • #samplesperlane is the total number of samples that can be combined into a single lane to achieve the specified coverage given the other variables
    • genomesize is the size, in nt, of the library to be sequenced
    • coverage is the desired multiplicity of coverage for the library
    • #samples is the number of samples needing to be sequenced
    • #readsperlane is the number of reads produced by a lane on the sequencer. (See "Platform Comparison" table above for the typical outputs from each platform.)
    • readlength is the length, in nt, of each separate read of the run
    • ends is the number of insert reads for the run. For single-end, it is 1, and for paired-end, it is 2.
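The two pooling equations above translate directly into code. This is a minimal sketch with hypothetical function names; the example numbers (a 5 Mb genome, 100x coverage, 24 samples, and a MiSeq lane yielding 15 million reads at 150+150) are illustrative assumptions, and real yields depend on library quality.

```python
import math


def lanes_needed(genome_size, coverage, n_samples,
                 reads_per_lane, read_length, ends):
    """# of lanes = (genomesize x coverage x #samples)
                    / (#readsperlane x readlength x ends)"""
    return (genome_size * coverage * n_samples) / (
        reads_per_lane * read_length * ends)


def samples_per_lane(genome_size, coverage,
                     reads_per_lane, read_length, ends):
    """#samplesperlane = (#readsperlane x readlength x ends)
                         / (genomesize x coverage)"""
    return (reads_per_lane * read_length * ends) / (
        genome_size * coverage)


# Hypothetical example: 5 Mb genome at 100x coverage on a MiSeq lane
# assumed to give 15 million reads, run as 150+150 paired end.
per_lane = samples_per_lane(5_000_000, 100, 15_000_000, 150, 2)
lanes = lanes_needed(5_000_000, 100, 24, 15_000_000, 150, 2)

# Round samples down and lanes up so the target coverage is still met.
print("samples per lane:", math.floor(per_lane))
print("lanes for 24 samples:", math.ceil(lanes))
```

Rounding in the conservative direction is a design choice of this sketch: it avoids planning a pool that falls short of the requested coverage when the result is not a whole number.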

    Custom Primers

    Many non-standard Illumina protocols require custom sequencing primers. The design of these oligos is critical for the success of the experiment and we have observed several experimental failures due to improper oligo design. There are a few critical parameters in oligo design.

    • First, the oligo must only occur once in each sequence. Multiple binding will result in low quality reads.
    • On reverse or index reads, we cannot separate the oligos by lane and so the construct must be compatible with having a cocktail of standard Illumina oligonucleotides in the mix.
    • The Tm of the oligo must match the Tm of the sequencing primer they are designed to replace. Being even a couple degrees below the Tm can result in experimental failure. Any online Tm calculator can be used. The standard Illumina sequences are:
      Forward read: 5'ACACTCTTTCCCTACACGACGCTCTTCCGATCT
      Reverse read: 5'CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
      Multiplexing Index Read: 5'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
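As a first-pass check before using a proper Tm calculator, a custom oligo can be compared against the standard primer it replaces. The sketch below uses a simple GC-content approximation (Tm = 64.9 + 41 x (G+C - 16.4) / N), which is an assumption chosen for illustration and is not the method used by the Center; it ignores salt and oligo concentration and is much cruder than nearest-neighbor calculators. The names `rough_tm` and `tm_gap` are hypothetical.

```python
# Standard Illumina sequencing primers as listed on this page.
STANDARD_PRIMERS = {
    "forward": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "reverse": "CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT",
    "index": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
}


def gc_count(seq):
    """Number of G and C bases in the oligo."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def rough_tm(seq):
    """Rough Tm estimate in degrees C from GC content and length.

    Assumption: simple GC-based approximation for oligos longer
    than about 13 nt; use a dedicated calculator for real designs.
    """
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


def tm_gap(custom_oligo, replaced):
    """Estimated Tm of the custom oligo minus that of the standard
    primer it replaces ('forward', 'reverse' or 'index')."""
    return rough_tm(custom_oligo) - rough_tm(STANDARD_PRIMERS[replaced])


for name, seq in STANDARD_PRIMERS.items():
    print(name, len(seq), "nt, rough Tm", round(rough_tm(seq), 1))
```

A clearly negative `tm_gap` for a candidate oligo is a signal to redesign it and recheck with a more accurate calculator before submission.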


    Custom oligonucleotides should be submitted at 100 µM with at least 20 µL provided. At time of submission, please take the time to diagram your primer design for the sequencing technician, who can verify its compatibility and help to avoid any unforeseen issues.

    Data Analysis

    Illumina sequencing at the BioMicro Center includes basic informatic analysis of the data. These steps include:

    • Image analysis to locate clusters
    • Basecalling
    • Demultiplexing of lanes
    • Alignment of sequences to a reference genome
    • Quality control
    • Delivery of the data to a user accessible folder

    All of these steps are run by our automated analysis pipeline, currently in active development. For users requiring further analysis, we have a staff of bioinformaticists that can assist you in analyzing your data.

    MIT Core Collaboration

    Because of the layout of Illumina flowcells, samples must be run in batches of 7 lanes (a pool of multiplexed samples counts as one lane). In order to ensure quick throughput, we have established a collaboration that allows us to move partial flowcells between the various centers at MIT. For users with fewer than 4 samples, their samples may be moved between the BioMicro Center and the Whitehead Institute Center for Genome Technologies. Samples will be moved only to fill out runs or to expedite processing. The Centers are committed to working together to maintain consistent quality between the different cores, so you should see no difference whether your samples are run in BioMicro or at one of our sister centers. Transfers are only available for members of the MIT community.


    OLDE LINKS