BioMicroCenter:CoverageCalculations: Difference between revisions

From OpenWetWare

Revision as of 12:59, 27 June 2011

Determining ideal read length and depth of coverage

The BioMicro Center offers a wide variety of read lengths in both single-end and paired-end formats. The Illumina GAIIx is capable of single-end and paired-end sequencing from 36nt to a maximum of 150nt in increments of 36nt, while the HiSeq2000 is capable of single-end and paired-end sequencing from 40nt to 100nt in increments of 40nt.

As a flowcell is being run, reads undergo internal quality control, filtering out unreliable reads. The GAIIx averages ~25 million reliable reads (clusters) per lane, while the HiSeq2000 averages ~100 million. Determining the ideal parameters for a sequencing run requires knowledge of the genome being sequenced.

For example, suppose a 150Mbp genome needs to be sequenced at 5X coverage, which requires 750Mbp of data output (150Mbp * 5). A standard 36bp single-end lane on the Illumina GAIIx, which produces ~900Mbp of data on average (36bp/read * 25M reads), reliably meets this target. For a larger genome, for example one that is 300Mbp, the target data output is 1.5Gbp (300Mbp * 5), so a standard 36bp single-end lane would not be sufficient on average. However, a 72nt single-end lane (72bp/read * 25M reads = 1.8Gbp) or a 36nt paired-end lane (36bp/read * 2 * 25M reads = 1.8Gbp) would be fine.
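
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration, not a BioMicro Center tool: the function and constant names are hypothetical, and the ~25 million reads per lane is the approximate GAIIx average quoted in the text, so actual yields will vary from run to run.

```python
# Approximate average of reliable reads (clusters) per GAIIx lane, from the text.
GAIIX_READS_PER_LANE = 25_000_000


def required_output_bp(genome_size_bp, coverage):
    """Data output (in bp) needed to cover a genome at the given depth."""
    return genome_size_bp * coverage


def lane_output_bp(read_length_bp, reads_per_lane, paired_end=False):
    """Average data output (in bp) of one lane.

    A paired-end run reads each cluster twice, so it doubles the output.
    """
    reads_per_cluster = 2 if paired_end else 1
    return read_length_bp * reads_per_cluster * reads_per_lane


def lane_is_sufficient(genome_size_bp, coverage, read_length_bp,
                       reads_per_lane, paired_end=False):
    """True if one lane yields at least the required output on average."""
    needed = required_output_bp(genome_size_bp, coverage)
    produced = lane_output_bp(read_length_bp, reads_per_lane, paired_end)
    return produced >= needed


# The 300Mbp genome at 5X coverage from the example above.
for read_length, paired in [(36, False), (72, False), (36, True)]:
    ok = lane_is_sufficient(300_000_000, 5, read_length,
                            GAIIX_READS_PER_LANE, paired_end=paired)
    label = "paired-end" if paired else "single-end"
    print(f"{read_length}nt {label}: sufficient = {ok}")
```

Because the per-lane read count is an average, it may be prudent to leave some margin above the required output rather than aim for an exact match.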

Multiplexing is useful for applications requiring a lower data output per sample. Sequencing Saccharomyces, which has a ~12.5Mbp genome, at 5X coverage requires 62.5Mbp of data per sample. Multiplexing 10 samples on one lane of a 36nt single-end flowcell would therefore require 625Mbp of output to achieve the desired coverage. As stated above, the average output for one such lane is ~900Mbp of data, so multiplexing the 10 samples into one lane provides sufficient coverage while reducing cost. Note that although multiplexing adds 6-bp barcodes to the libraries, the barcodes are read separately from the main read and therefore do not affect read length.
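
The multiplexing calculation above can be sketched the same way. The names below are hypothetical, and the sketch assumes that reads are split evenly across the barcoded samples, which the text does not state and which real libraries only approximate.

```python
def per_sample_output_bp(genome_size_bp, coverage):
    """Data output (in bp) one sample needs at the given coverage."""
    return genome_size_bp * coverage


def max_samples_per_lane(genome_size_bp, coverage, lane_output_bp):
    """Largest number of samples one lane can hold at the target coverage.

    Assumes reads are divided evenly among samples (an idealization).
    """
    return lane_output_bp // per_sample_output_bp(genome_size_bp, coverage)


# Figures from the text: ~12.5Mbp genome, 5X coverage,
# ~900Mbp average output for a 36nt single-end GAIIx lane.
GENOME_BP = 12_500_000
COVERAGE = 5
LANE_OUTPUT_BP = 36 * 25_000_000

needed_for_10 = 10 * per_sample_output_bp(GENOME_BP, COVERAGE)
print("Output needed for 10 samples (bp):", needed_for_10)
print("Fits in one lane:", needed_for_10 <= LANE_OUTPUT_BP)
print("Maximum samples per lane:",
      max_samples_per_lane(GENOME_BP, COVERAGE, LANE_OUTPUT_BP))
```

Since barcode representation is rarely perfectly even, pooling fewer samples than the computed maximum leaves headroom for the less well represented libraries.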

Paired-end runs sequence DNA in both the forward and reverse directions from the two ends of the same DNA fragments, which makes long-range sequence information available during alignment to the genome. Paired-end, long-read (>80nt) runs are preferred for some applications, such as de novo sequencing.