BioMicroCenter:CoverageCalculations

Determining ideal read length and depth of coverage
The BioMicro Center offers a wide variety of read lengths in both single-end and paired-end formats. The Illumina GAIIx can perform single-end and paired-end sequencing from 36nt up to a maximum of 150nt in increments of 36nt, while the HiSeq2000 can perform single-end and paired-end sequencing from 40nt to 100nt in increments of 40nt.

During a flowcell run, reads undergo internal quality control that filters out unreliable reads. The GAIIx averages ~25 million reliable reads (clusters) per lane, while the HiSeq2000 averages ~100 million. Choosing the ideal parameters for a sequencing run therefore requires knowing the size of the genome being sequenced and the depth of coverage needed.
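The relationship between these numbers can be sketched as a short calculation: average depth of coverage is the total number of bases sequenced divided by the genome size. This is a minimal sketch assuming the average per-lane read counts quoted above (~25M for the GAIIx, ~100M for the HiSeq2000); the names `READS_PER_LANE`, `lane_output_bp` and `expected_coverage` are hypothetical, and real yields vary from run to run.

```python
# Average reliable reads (clusters) per lane, as quoted above; actual yields vary.
READS_PER_LANE = {
    "GAIIx": 25_000_000,
    "HiSeq2000": 100_000_000,
}


def lane_output_bp(instrument: str, read_length: int, paired_end: bool = False) -> int:
    """Approximate bases produced by one lane (hypothetical helper)."""
    ends = 2 if paired_end else 1  # paired-end reads each fragment from both ends
    return READS_PER_LANE[instrument] * read_length * ends


def expected_coverage(genome_size_bp: int, instrument: str, read_length: int,
                      paired_end: bool = False) -> float:
    """Average depth of coverage = total bases sequenced / genome size."""
    return lane_output_bp(instrument, read_length, paired_end) / genome_size_bp


# A 36nt single-end GAIIx lane on a 150Mbp genome:
print(expected_coverage(150_000_000, "GAIIx", 36))  # 6.0
```

Because the read counts are averages, a configuration that only just reaches the target coverage on paper may fall short on a given run.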

For example, suppose a 150Mbp genome needs to be sequenced at 5X coverage, which requires 750Mbp of data output (150Mbp * 5). This can be reliably sequenced using a standard 36bp single-end lane on the Illumina GAIIx, which produces ~900Mbp of data on average (36bp/read * 25M reads). For a larger genome, for example one of 300Mbp, the target data output is 1.5Gbp (300Mbp * 5), so a standard 36bp single-end lane (~900Mbp) would not be sufficient. However, a 72nt single-end lane (72bp/read * 25M reads = 1.8Gbp) or a 36nt paired-end run (36bp/read * 2 * 25M reads = 1.8Gbp) would be.
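The comparison in this example can be sketched as a check of each candidate run type against the target output (genome size multiplied by coverage). This is a minimal sketch assuming the GAIIx average of ~25M reads per lane and only the three run types named in the example; the function and constant names are hypothetical.

```python
# Average reliable reads per GAIIx lane, as quoted above (assumption: yields vary).
GAIIX_READS_PER_LANE = 25_000_000

# Candidate run types from the example above: (read length in nt, paired-end?)
CANDIDATES = [(36, False), (72, False), (36, True)]


def target_output_bp(genome_size_bp: int, coverage: float) -> float:
    """Data needed to reach the desired coverage: genome size x coverage."""
    return genome_size_bp * coverage


def lane_output_bp(read_length: int, paired_end: bool) -> int:
    """Approximate bases from one GAIIx lane."""
    ends = 2 if paired_end else 1
    return GAIIX_READS_PER_LANE * read_length * ends


def sufficient_configs(genome_size_bp: int, coverage: float) -> list[tuple[int, bool]]:
    """Return the candidate run types whose average output meets the target."""
    target = target_output_bp(genome_size_bp, coverage)
    return [(length, pe) for length, pe in CANDIDATES
            if lane_output_bp(length, pe) >= target]


print(sufficient_configs(150_000_000, 5))  # [(36, False), (72, False), (36, True)]
print(sufficient_configs(300_000_000, 5))  # [(72, False), (36, True)]
```

The 300Mbp case reproduces the conclusion above: the 36nt single-end lane (~900Mbp) misses the 1.5Gbp target, while both 1.8Gbp options meet it.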

Multiplexing is useful for applications that require less data output per sample. Sequencing Saccharomyces, which has a ~12.5Mbp genome, at 5X coverage requires 62.5Mbp of data per sample. Multiplexing 10 samples on one lane of a 36nt single-read flowcell would therefore require 625Mbp of output to achieve the desired coverage. As stated above, the average output of one lane is ~900Mbp, so pooling the 10 samples into one lane provides sufficient coverage while reducing cost. Note that although multiplexing adds 6bp barcodes to the libraries, the barcodes are read separately from the main read and therefore do not reduce the usable read length.
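The multiplexing arithmetic can be sketched the same way: divide the lane's average output by the data each sample needs. This is a minimal sketch assuming the ~900Mbp average for a 36nt single-end GAIIx lane and an even split of reads across barcodes, which is an idealization; `max_samples_per_lane` is a hypothetical name. Because the barcodes are read separately, they are not subtracted from the read length here.

```python
import math

# ~36nt x 25M reads: the average GAIIx single-end lane output quoted above.
LANE_OUTPUT_BP = 900_000_000


def max_samples_per_lane(genome_size_bp: int, coverage: float,
                         lane_output_bp: int = LANE_OUTPUT_BP) -> int:
    """Largest number of evenly pooled samples that still reach the coverage."""
    per_sample_bp = genome_size_bp * coverage  # e.g. 12.5Mbp x 5 = 62.5Mbp
    return math.floor(lane_output_bp / per_sample_bp)


# Saccharomyces (~12.5Mbp) at 5X needs 62.5Mbp per sample.
print(max_samples_per_lane(12_500_000, 5))  # 14
```

By this estimate the 10-sample pool (625Mbp) leaves some headroom under the ~900Mbp average, which helps absorb run-to-run variation and uneven barcode representation.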

Paired-end runs sequence each DNA fragment from both ends, in the forward and reverse directions, which provides long-range sequence information for use when aligning reads to the genome. Paired-end, long-read (>80nt) runs are preferred for some applications, such as de novo sequencing.