BioMicroCenter:PacBio: Difference between revisions

From OpenWetWare
No edit summary
 
(53 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{BioMicroCenter}}
==PAC BIO SEQUENCING ==
 
The BioMicro Center is able to offer PacBio sequencing through a collaboration with the [http://www.umassmed.edu/Content.aspx?id=42260 U.Mass Medical School Deep Sequencing Core]. Samples are collected by the BioMicro Center and sequenced at U.Mass. Data is returned to the BioMicro Center for delivery to the user.
== PACBIO SEQUEL ==
{|
|- style="vertical-align: top;"
|style="width: 400px;"|
{| class="wikitable" border=1
  !
  !SEQUEL SEQUENCING
  |-
  |CHEMISTRY <BR> FLOWCELL
  | [https://www.pacb.com/products-and-services/sequel-system/latest-system-release/ PacBio Sequel v3] <BR> 1M wells, loaded following a Poisson distribution
  |-
  |INPUT || Completed SMRTbell Libraries
* 12uL
* >1nM
''1 nM ≈ 0.65 ng/µL per kb of insert''
  |-
  |INCLUDED SERVICES ||
* Sample QC on [[BioMicroCenter:QC#AATI_FEMTO_PULSE|FemtoPulse]]
* Sequencing
* Primary analysis and CCS analysis
* Data storage for 1y
  |-
  |ADDITIONAL SERVICES ||
*[[BioMicroCenter:PacBio_Library_Preparation|PacBio Library Prep]]
*[[BioMicroCenter:PippinPrep|Pippin Prep]]
  |-
  |PRICING || [[BioMicroCenter:Pricing#PACBIO_SEQUENCING|LINK]]
  |-
  |SUBMISSION || MIT - [https://mit.ilabsolutions.com/service_item/new/3381?spt_id=9912 ilabs] <BR> External - [[BioMicroCenter:Forms|form]]
  |-
  |DONATED BY ||
MIT VPR <BR>
Simons Foundation (Penny Chisholm and Martin Polz) <BR>
John Essigmann <BR>
Michael Birnbaum<BR>
Chris Burge <BR>
Mary Gehring <BR>
Department of Biological Engineering <BR>
Department of Chemistry <BR>
|}
|
[[Image:sequel.jpeg|right]]
|}
 
== [http://www.pacb.com/products/smrt-technology/ HOW SINGLE MOLECULE SEQUENCING WORKS ]==
The RSII works by detecting DNA replication in real time. Fluorophores are attached to the gamma-phosphate of the nucleotides. A DNA template with a single polymerase is loaded into each well. When a nucleotide enters the active site, it can be detected by the sequencer. Base-paired nucleotides dwell in the active site longer than unpaired ones, which allows them to be detected. <BR><BR>
The Sequel works by detecting DNA replication in real time. Fluorophores are attached to the gamma-phosphate of the nucleotides. A PacBio library with a single polymerase is loaded into each well. When a nucleotide enters the active site, it can be detected by the sequencer. Base-paired nucleotides dwell in the active site longer than unpaired ones, which allows them to be detected. <BR><BR>
The RSII is able to monitor ~80,000 reactions occurring simultaneously on a SMRTcell. Each run creates a “movie” of the incorporation of bases, collecting data on the intensity of each fluorophore in each well over time. Onboard computation converts this movie into basecall files.
The Sequel is able to monitor >500,000 incorporations occurring simultaneously on a SMRTcell. Each run creates a “movie” of the incorporation of bases, collecting data on the intensity of each fluorophore in each well over time. Onboard computation converts this movie into basecall files. <BR><br>
In Circular Consensus Sequencing (CCS) mode, the polymerase reads the circular library molecule end to end multiple times, and the resulting subreads are combined into a high fidelity (HiFi) consensus read.  In CCS, a 2 kbp amplicon can easily be sequenced over 10 times end to end over the course of a 20 hr movie.  In Continuous Long Read (CLR) mode, each molecule is simply sequenced for as long as possible over the length of the run, either 10 or 20 hours.  In CLR mode, reads as long as 175 kbp have occasionally been seen, but in the Center it is more typical to see maximum read lengths of 50-70 kbp; even these longest reads are outliers, since the average subread length is typically <50 kbp.
 
== EXPECTED RESULTS ==
TYPICAL RESULTS FROM SMRTCELLS:
* 20kb fragments – 50,000 active sites – 400Mbp sequence
* 2kb fragments – 80,000 active sites – 500Mbp sequence
Typical Length Distribution
 
* Insert Figure Here
Typical Read Lengths
{|border=1 align="left"
|-
!Genomic!!Amplicon (<2 kb)
|-
|[[IMAGE:PacBioSequelTypicalv2.jpg|thumb|left|400px|Genomic DNA prepared with Template Kit 1.0 and BluePippin on Sequel 2.1 Chemistry]]||[[IMAGE:PacBioSequelTypicalv2-amp.jpg|thumb|left|400px|PCR Amplicon prepared with Express Template Kit 2.0 on Sequel 3.0 Chemistry]]
|-
|Longest Inserts of around 60 kbp in CLR||Shorter Inserts mean higher Quality Scores in CCS
|}
<br clear="all">
 
== APPLICATIONS ==
There are several applications for which PacBio sequencing is well suited (and many for which it is not!). The major ones are below:
=== [http://www.pacb.com/applications/denovo/  DE NOVO ASSEMBLY / LARGE SCALE MAPPING] ===
Assembly is the biggest strength of PacBio. While the individual base error rate is quite high (~14%), the errors are random, so reading the same position repeatedly rapidly lowers the consensus error rate. Using only PacBio reads, 60x coverage can give a very good assembly, typically completing genomes in one shot. A 4 Mbp genome can be done on a single SMRT cell. <BR><BR>
A second strategy is to use PacBio reads to supplement Illumina reads to join scaffolds.  Here, standard MiSeq reads make up the bulk of the assembly, but the separate scaffolds are spanned by PacBio reads. 7x coverage has been the convention for this type of sequencing. 40 Mbp of genomic DNA can be done on a single SMRT cell. Indexing cannot be done easily with PacBio samples, particularly with long reads, and so the 'quantum' for SMRTcells should be considered as 1 genome per SMRTcell.


=== [http://www.pacb.com/applications/target/index.html AMPLICON / RESEQUENCING] ===
Line 22: Line 77:
Indexing is possible with resequencing amplicons but indexes should be added during PCR and not during library preparation.


=== WHOLE TRANSCRIPT ===
=== [https://www.pacb.com/applications/rna-sequencing/ WHOLE TRANSCRIPT] ===
Unlike Illumina RNAseq, which requires fragmenting of the RNA, full length cDNAs can be sequenced on the PacBio RS. This allows direct detection of different splice isoforms. The low number of reads makes using this as a counting method challenging.  
Unlike Illumina RNAseq, which requires fragmenting of the RNA, full length cDNAs can be sequenced on the PacBio Sequel. This allows direct detection of different splice isoforms. The low number of reads makes using this as a counting method challenging.  
 
=== [http://www.pacb.com/applications/base_modification/index.html BACTERIAL BASE MODIFICATION] ===
One aspect that makes PacBio sequencing unique is its ability to recognize chemical modifications of bases. Because the instrument measures how long a nucleotide is in the active site, its data has an axis of “time”. Base pairing with modified bases has different kinetics, which allows the sequencer to provide a likelihood score that any given base has been modified. Methylcytosine, hydroxymethylcytosine, 6-methyladenosine and many others have been detected with the RSII.
== SAMPLE PROCESSING AT BIOMICRO ==
PacBio Sequencing is done through a collaboration with the [http://www.umassmed.edu/Content.aspx?id=42260 U. Mass Deep Sequencing Core Facility (MDSC)]. They have operated an RSII for many years and are the HHMI regional center for the Northeast. Together with the MDSC, we have built a pipeline to get samples onto their system and data into your hands as fast as possible.  The goal is to be as seamless and painless as possible for you.
* '''Samples are dropped off in BioMicro''': We’ll serve as a central drop-off point. We will have our own forms, which will incorporate all the information both they and we will need, including MIT cost object.
* '''Samples will have initial prep and QC in BioMicro'''.  We have implemented all the QC metrics being used in the MDSC to allow us to identify good and bad samples and not send the bad samples out to Worcester for sequencing. These will utilize equipment already in the core, including the AATI Fragment Analyzer and the Sage BluePippin.
* '''Samples will be couriered to U.Mass for sequencing'''. Once samples are ready, they will be shipped same day to Worcester. Queue times are typically very short (each run is only a few hours).
* '''Data will be sent back to BioMicro and placed in your public folders'''. We have provided U.Mass with direct access to our servers to allow direct delivery of the data. We will then rename the files appropriately and provide them to the end users.
* '''Data analysis will be done by BioMicro''': While the PacBio RS has time available, informatics and cluster time are more precious resources. These will be handled on site here.
* '''Billing will be done through BioMicro/iLabs''': For simplicity, U.Mass will directly charge us for all sequencing fees. We will then pass the charges on to whatever cost object is appropriate.

Latest revision as of 07:44, 11 June 2020


PACBIO SEQUEL

SEQUEL SEQUENCING

CHEMISTRY / FLOWCELL: PacBio Sequel v3; 1M wells, loaded following a Poisson distribution
INPUT: Completed SMRTbell Libraries
  • 12 µL
  • >1 nM (1 nM ≈ 0.65 ng/µL per kb of insert)
INCLUDED SERVICES:
  • Sample QC on FemtoPulse
  • Sequencing
  • Primary analysis and CCS analysis
  • Data storage for 1y
ADDITIONAL SERVICES:
  • PacBio Library Prep
  • Pippin Prep
PRICING: see BioMicroCenter:Pricing, PACBIO SEQUENCING section
SUBMISSION: MIT - ilabs; External - form
DONATED BY:
  • MIT VPR
  • Simons Foundation (Penny Chisholm and Martin Polz)
  • John Essigmann
  • Michael Birnbaum
  • Chris Burge
  • Mary Gehring
  • Department of Biological Engineering
  • Department of Chemistry
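
The input requirement is stated in molarity, while most quantification gives a mass concentration. A minimal sketch of the conversion behind the "1 nM ≈ 0.65 ng/µL per kb" rule of thumb is shown here. It assumes an average mass of about 650 g/mol per double-stranded base pair and ignores the SMRTbell adapters; the function names are hypothetical and only for illustration.

```python
# Assumption: ~650 g/mol per double-stranded base pair (adapters ignored).
BP_MASS_G_PER_MOL = 650.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, insert_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    # 1 ng/uL = 1e-3 g/L; dividing by g/mol gives mol/L; 1 mol/L = 1e9 nM.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * insert_bp)


def min_ng_per_ul(insert_bp: float, target_nM: float = 1.0) -> float:
    """Mass concentration (ng/uL) needed to reach target_nM for a given insert size."""
    return target_nM * BP_MASS_G_PER_MOL * insert_bp / 1e6


if __name__ == "__main__":
    for size in (2_000, 10_000, 20_000):
        print(f"{size} bp insert: at least {min_ng_per_ul(size):.2f} ng/uL for 1 nM")
```

Under these assumptions a 10 kb library needs roughly 6.5 ng/µL to reach the 1 nM minimum.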

HOW SINGLE MOLECULE SEQUENCING WORKS

The Sequel works by detecting DNA replication in real time. Fluorophores are attached to the gamma-phosphate of the nucleotides. A PacBio library with a single polymerase is loaded into each well. When a nucleotide enters the active site, it can be detected by the sequencer. Base-paired nucleotides dwell in the active site longer than unpaired ones, which allows them to be detected.

The Sequel is able to monitor >500,000 incorporations occurring simultaneously on a SMRTcell. Each run creates a “movie” of the incorporation of bases, collecting data on the intensity of each fluorophore in each well over time. Onboard computation converts this movie into basecall files.

In Circular Consensus Sequencing (CCS) mode, the polymerase reads the circular library molecule end to end multiple times, and the resulting subreads are combined into a high fidelity (HiFi) consensus read. In CCS, a 2 kbp amplicon can easily be sequenced over 10 times end to end over the course of a 20 hr movie. In Continuous Long Read (CLR) mode, each molecule is simply sequenced for as long as possible over the length of the run, either 10 or 20 hours. In CLR mode, reads as long as 175 kbp have occasionally been seen, but in the Center it is more typical to see maximum read lengths of 50-70 kbp; even these longest reads are outliers, since the average subread length is typically <50 kbp.
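
As a rough illustration of why short inserts suit CCS, the number of end-to-end passes can be approximated as the total length read by the polymerase divided by the insert length. The sketch below is only that approximation; the 25,000-base polymerase read used in the example is a hypothetical value chosen for illustration, not a Center specification, and the function name is likewise made up.

```python
def expected_passes(polymerase_read_bp: int, insert_bp: int, adapter_bp: int = 0) -> int:
    """Approximate number of complete passes over a circular library molecule.

    polymerase_read_bp: total bases read from one well (hypothetical input).
    insert_bp: length of the insert.
    adapter_bp: adapter bases read once per pass (0 if ignored).
    """
    return polymerase_read_bp // (insert_bp + adapter_bp)


if __name__ == "__main__":
    # Hypothetical 25 kb polymerase read over a 2 kb amplicon.
    print(expected_passes(25_000, 2_000))
    # The same read length over a 20 kb insert completes only one pass.
    print(expected_passes(25_000, 20_000))
```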

EXPECTED RESULTS

TYPICAL RESULTS FROM SMRTCELLS:

  • 20kb fragments – 50,000 active sites – 400Mbp sequence
  • 2kb fragments – 80,000 active sites – 500Mbp sequence
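
The two typical results above imply an average number of bases obtained per active site, which is simply the yield divided by the number of active sites. A small sketch of that arithmetic follows; the dictionary and function names are hypothetical, and the figures are the ones quoted in the list.

```python
# Typical results quoted above: (active sites, total yield in bp).
TYPICAL_RUNS = {
    "20 kb fragments": (50_000, 400e6),
    "2 kb fragments": (80_000, 500e6),
}


def bases_per_active_site(active_sites: int, yield_bp: float) -> float:
    """Average bases of sequence obtained from each active site."""
    return yield_bp / active_sites


if __name__ == "__main__":
    for label, (sites, total_bp) in TYPICAL_RUNS.items():
        print(f"{label}: {bases_per_active_site(sites, total_bp):,.0f} bases per active site")
```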

Typical Read Lengths

Genomic: Genomic DNA prepared with Template Kit 1.0 and BluePippin on Sequel 2.1 Chemistry. Longest inserts of around 60 kbp in CLR.
Amplicon (<2 kb): PCR amplicon prepared with Express Template Kit 2.0 on Sequel 3.0 Chemistry. Shorter inserts mean higher quality scores in CCS.


APPLICATIONS

There are several applications for which PacBio sequencing is well suited (and many for which it is not!). The major ones are below:

DE NOVO ASSEMBLY / LARGE SCALE MAPPING

Assembly is the biggest strength of PacBio. While the individual base error rate is quite high (~14%), the errors are random, so reading the same position repeatedly rapidly lowers the consensus error rate. Using only PacBio reads, 60x coverage can give a very good assembly, typically completing genomes in one shot. A 4 Mbp genome can be done on a single SMRT cell.
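
To give a feel for how quickly random errors average out, here is a deliberately simplified model: each read is assumed to be wrong at a given position independently with probability 0.14, and the consensus is assumed to be wrong only when more than half of the reads are wrong. Real consensus callers are more sophisticated and PacBio errors are not simple substitutions, so this is a sketch of the principle rather than of any actual pipeline; the function name is hypothetical.

```python
from math import comb


def majority_vote_error(n_reads: int, per_read_error: float = 0.14) -> float:
    """Probability that more than half of n_reads are wrong at one position.

    Assumes independent errors and a plain majority vote (ties count as correct).
    """
    p, q = per_read_error, 1.0 - per_read_error
    return sum(
        comb(n_reads, k) * p**k * q ** (n_reads - k)
        for k in range(n_reads // 2 + 1, n_reads + 1)
    )


if __name__ == "__main__":
    for depth in (1, 5, 11, 21, 61):
        print(f"{depth:>2} reads: consensus error ~ {majority_vote_error(depth):.2e}")
```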

A second strategy is to use PacBio reads to supplement Illumina reads to join scaffolds. Here, standard MiSeq reads make up the bulk of the assembly, but the separate scaffolds are spanned by PacBio reads. 7x coverage has been the convention for this type of sequencing. 40 Mbp of genomic DNA can be done on a single SMRT cell. Indexing cannot be done easily with PacBio samples, particularly with long reads, and so the 'quantum' for SMRTcells should be considered as 1 genome per SMRTcell.
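
The per-cell genome sizes quoted for the two strategies follow from dividing the required bases (genome size times coverage) by the yield of one SMRT cell. The sketch below assumes the 400 Mbp yield listed under typical results for 20 kb fragments; actual yield varies by library, and the function name is hypothetical.

```python
import math


def smrt_cells_needed(genome_bp: float, coverage: float, yield_per_cell_bp: float = 400e6) -> int:
    """Number of SMRT cells needed, rounding up to whole cells.

    yield_per_cell_bp defaults to the 400 Mbp typical figure (an assumption).
    """
    return math.ceil(genome_bp * coverage / yield_per_cell_bp)


if __name__ == "__main__":
    # PacBio-only assembly: 4 Mbp genome at 60x needs 240 Mbp.
    print(smrt_cells_needed(4e6, 60))
    # Scaffolding alongside Illumina reads: 40 Mbp genome at 7x needs 280 Mbp.
    print(smrt_cells_needed(40e6, 7))
```

Both cases come in under one cell's typical yield, which is consistent with planning around one genome per SMRT cell.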

AMPLICON / RESEQUENCING

PacBio reads can also be used to look for variants in specific regions. The long reads allow for better detection of large rearrangements and a better understanding of repetitive regions of the genome.

The use of long reads in assembly can also establish phasing of mutations. Short Illumina reads will typically not be able to span multiple mutations on a single read. PacBio reads are long enough to enable detection of multiple mutations as coming from the same strand.

Indexing is possible with resequencing amplicons but indexes should be added during PCR and not during library preparation.

WHOLE TRANSCRIPT

Unlike Illumina RNAseq, which requires fragmenting of the RNA, full length cDNAs can be sequenced on the PacBio Sequel. This allows direct detection of different splice isoforms. The low number of reads makes using this as a counting method challenging.

BACTERIAL BASE MODIFICATION

One aspect that makes PacBio sequencing unique is its ability to recognize chemical modifications of bases. Because the instrument measures how long a nucleotide is in the active site, its data has an axis of “time”. Base pairing with modified bases has different kinetics, which allows the sequencer to provide a likelihood score that any given base has been modified. Methylcytosine, hydroxymethylcytosine, 6-methyladenosine and many others have been detected with the RSII.