BioMicroCenter:CoverageCalculations

From OpenWetWare

Current revision (21:21, 10 January 2013)

Determining ideal read length and depth of coverage

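The BioMicro Center offers a wide variety of read lengths, in both single-end and paired-end formats. When choosing a read length and format, it is often useful to calculate the expected average coverage, that is, the average number of times each base of the target is sequenced.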

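Coverage for genomic samples can be calculated as:

  no.reads(1/2) * readlength * no.cluster
  ---------------------------------------
              genome size

where no.reads is the number of reads per cluster (1 for single-end runs, 2 for paired-end runs), readlength is the length of each read in bases, no.cluster is the number of clusters sequenced for the sample, and genome size is given in bases.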
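As a quick check, the genomic formula can be evaluated in a few lines of Python. This is a minimal sketch: the function name `genomic_coverage` and its parameter names are hypothetical, and the 25 million clusters used in the example are an assumed lane yield chosen for illustration, not a quoted instrument specification.

```python
def genomic_coverage(reads_per_cluster: int, read_length: int,
                     clusters: int, genome_size: int) -> float:
    """Expected average genome coverage (hypothetical helper).

    reads_per_cluster: 1 for single-end, 2 for paired-end
    read_length:       bases per read
    clusters:          clusters sequenced for this sample (assumed input)
    genome_size:       genome size in bases
    """
    if reads_per_cluster not in (1, 2):
        raise ValueError("reads_per_cluster must be 1 (single-end) or 2 (paired-end)")
    # Total sequenced bases divided by the genome size
    return reads_per_cluster * read_length * clusters / genome_size


# Example: S. cerevisiae (~12.5 Mbp) with an assumed 25 million clusters
yeast_bp = 12_500_000
print(genomic_coverage(1, 36, 25_000_000, yeast_bp))  # 36 nt single-end -> 72.0
print(genomic_coverage(2, 36, 25_000_000, yeast_bp))  # 36 nt paired-end -> 144.0
```

Paired-end sequencing reads each cluster twice, so with the same cluster count and read length it doubles the expected coverage.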

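For ChIP samples, where reads are concentrated in the bound regions rather than spread evenly across the genome, the following modified formula can be used:

      no.reads * readlength * no.cluster
  -----------------------------------------
  no.sites * site.length / % reads in sites

where no.sites is the number of binding sites, site.length is the length of each site in bases, and % reads in sites is the fraction of reads expected to fall within those sites, written as a decimal (e.g. 0.1 for 10%). Dividing by this fraction is equivalent to counting only the sequenced bases that land in sites, so the result is the average coverage over the bound regions rather than over the whole genome.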
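The ChIP formula can be sketched the same way. The helper name `chip_coverage` is hypothetical, and the example values (20,000 sites of 500 bp, 10% of reads in sites, 25 million clusters) are assumptions for illustration only.

```python
def chip_coverage(reads_per_cluster: int, read_length: int, clusters: int,
                  n_sites: int, site_length: int, fraction_in_sites: float) -> float:
    """Average coverage over ChIP binding sites (hypothetical helper).

    fraction_in_sites is a decimal fraction, e.g. 0.1 for 10% of reads in sites.
    """
    if not 0 < fraction_in_sites <= 1:
        raise ValueError("fraction_in_sites must be a fraction in (0, 1]")
    total_bases = reads_per_cluster * read_length * clusters
    # Denominator as in the formula: total site length divided by the fraction of reads in sites
    denominator = n_sites * site_length / fraction_in_sites
    return total_bases / denominator


# Assumed example: 36 nt single-end, 25 million clusters,
# 20,000 sites of 500 bp, 10% of reads falling in sites
print(chip_coverage(1, 36, 25_000_000, 20_000, 500, 0.1))  # ~9.0
```

For comparison, the same assumed run spread over a 3 Gbp genome would give only about 0.3X average genome coverage, while the enriched sites reach roughly 9X.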

Approximate genome sizes for some common organisms:

Species                   Genome size
E. coli                   4 Mbp
S. cerevisiae             12.5 Mbp
C. elegans / Drosophila   100-150 Mbp
Human / Mouse / Rat       3 Gbp
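Rearranging the genomic formula gives the number of clusters needed to reach a target coverage: no.cluster = coverage * genome size / (no.reads * readlength). The sketch below applies this to the genome sizes in the table. The 30X target and 100 nt paired-end format are assumed values chosen for illustration, `clusters_needed` is a hypothetical helper, and the upper end of the C. elegans / Drosophila range is used.

```python
import math

# Genome sizes from the table, in bases (upper end of the range where one is given)
GENOME_SIZES_BP = {
    "E. coli": 4_000_000,
    "S. cerevisiae": 12_500_000,
    "C. elegans / Drosophila": 150_000_000,
    "Human / Mouse / Rat": 3_000_000_000,
}


def clusters_needed(target_coverage: float, genome_size: int,
                    read_length: int, reads_per_cluster: int = 1) -> int:
    """Clusters required to reach a target coverage (hypothetical helper).

    Rearranged from: coverage = reads_per_cluster * read_length * clusters / genome_size
    """
    # Round up so the target coverage is reached rather than narrowly missed
    return math.ceil(target_coverage * genome_size / (reads_per_cluster * read_length))


# Assumed run: 30X target with 100 nt paired-end reads
for species, size in GENOME_SIZES_BP.items():
    print(f"{species}: {clusters_needed(30, size, 100, reads_per_cluster=2):,} clusters")
# E. coli: 600,000 clusters
# S. cerevisiae: 1,875,000 clusters
# C. elegans / Drosophila: 22,500,000 clusters
# Human / Mouse / Rat: 450,000,000 clusters
```

Comparing these figures with the number of clusters a lane is expected to yield gives a rough idea of how much of a lane each sample needs.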