BioMicroCenter:CoverageCalculations: Difference between revisions

From OpenWetWare
(New page: {{BioMicroCenter}} ==Determining ideal read length and depth of coverage== The BioMicro Center offers a wide variety of read lengths, both in single-end and paired-end formats. The Illum...)
 
 
(One intermediate revision by one other user not shown)
Line 2: Line 2:


==Determining ideal read length and depth of coverage==
The BioMicro Center offers a wide variety of read lengths, both in single-end and paired-end formats. The Illumina GAIIx is capable of single-end and paired-end sequencing from 36nt to a maximum of 150nt in units of 36, while the HiSeq2000 is capable of single-end and paired-end sequencing from 40nt to 100nt in units of 40.  
The BioMicro Center offers a wide variety of read lengths, both in single-end and paired-end formats. Often, it is useful to calculate the expected average coverage. <BR><BR>


As a flowcell is being run, reads undergo internal quality control, filtering out unreliable reads.  The GAIIx averages ~25 million reliable reads (clusters) per lane, while the HiSeq2000 averages ~100 million.  Determining the ideal parameters for a sequencing run requires knowledge of the genome being sequenced.
Coverage for genomic samples can be calculated as:


For example, a 150Mbp genome needs to be sequenced at 5X coverage, requiring 750Mbp of data output (150*5). This can be confidently sequenced using a standard 36nt single-end lane on the Illumina GAIIx, which produces ~900Mbp of data on average (36*25). If a larger genome is being sequenced, for example one that is 300Mbp, the target data output is 1.5Gbp (300*5), so a standard 36nt single-end lane may not be sufficient, but a 72nt single-end run (25*72=1.8Gbp) or a 36nt paired-end run (50*36=1.8Gbp) would be fine.
  no.reads(1/2) * readlength * no.cluster
  ---------------------------------------
              genome size


Multiplexing is useful for applications requiring a lower data output per sample.  Sequencing Saccharomyces, which has a ~12.5Mbp genome, at 5X coverage requires 62.5Mbp of data.  Multiplexing 10 samples on one lane in a 36nt single read flowcell would require 625Mbp of output to achieve the desired coverage.  As stated above, the average output for one lane is ~900Mbp of data, so multiplexing the 10 samples into one lane provides sufficient coverage while saving money. It is important to note that while the multiplexing process adds 6bp barcodes to the Illumina libraries, they are read separately and therefore do not affect read length.
For ChIP samples, the following modified formula can be used:


Paired-end runs sequence DNA in both the forward and reverse directions from the two ends of the same DNA fragments, allowing for the use of long-range sequence information during alignment of the genome. Paired-end, long-read (>80nt) runs are preferred for some applications such as de novo sequencing.
      no.reads * readlength * no.cluster
  -----------------------------------------
  no.sites * site.length / % reads in sites
 
Some standard genome sizes:
{| border=1
!Species
!Genome size
|-
|E. coli
|4 Mbp
|-
|S. cerevisiae
|12.5 Mbp
|-
|C. elegans / Drosophila
|100-150 Mbp
|-
|Human / Mouse / Rat
|3 Gbp
|}

Latest revision as of 19:21, 10 January 2013


Determining ideal read length and depth of coverage

The BioMicro Center offers a wide variety of read lengths, both in single-end and paired-end formats. Often, it is useful to calculate the expected average coverage.

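Coverage for genomic samples can be calculated as follows, where no.reads is 1 for a single-end run or 2 for a paired-end run, readlength is in bases, and no.cluster is the number of clusters that pass quality filtering: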

 no.reads(1/2) * readlength * no.cluster
 ---------------------------------------
             genome size 
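
The genomic formula above can be sketched in a few lines of Python. This is a minimal illustration, not a BioMicro Center tool: the function name `genomic_coverage` and its parameter names are hypothetical, and the input values (36nt reads, 25 million clusters, 150Mbp and 300Mbp genomes) are illustrative numbers only.

```python
def genomic_coverage(reads_per_cluster, read_length, n_clusters, genome_size):
    """Expected average coverage of a genomic sample.

    reads_per_cluster: 1 for single-end, 2 for paired-end (the "no.reads" term)
    read_length:       bases per read
    n_clusters:        clusters passing filter (the "no.cluster" term)
    genome_size:       genome length in bases
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return reads_per_cluster * read_length * n_clusters / genome_size


# Illustrative inputs (assumed values, not measured yields).
single_end = genomic_coverage(1, 36, 25_000_000, 150_000_000)
paired_end = genomic_coverage(2, 36, 25_000_000, 300_000_000)
print(f"36nt single-end, 150Mbp genome: {single_end:.1f}X")
print(f"36nt paired-end, 300Mbp genome: {paired_end:.1f}X")
```

Keeping all sizes in bases (rather than mixing Mbp and Gbp) avoids unit slips when comparing runs.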

For ChIP samples, the following modified formula can be used:

      no.reads * readlength * no.cluster
  -----------------------------------------
  no.sites * site.length / % reads in sites
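
The ChIP formula above can be sketched the same way. This is a hedged illustration under stated assumptions: `chip_coverage` and its parameter names are hypothetical, "% reads in sites" is assumed to be passed as a fraction between 0 and 1 (for example 0.25 for 25%), and the site count and site length used below are made-up example values.

```python
def chip_coverage(reads_per_cluster, read_length, n_clusters,
                  n_sites, site_length, fraction_in_sites):
    """Expected average coverage over ChIP sites.

    Mirrors the formula:
        no.reads * readlength * no.cluster
        -----------------------------------------
        no.sites * site.length / % reads in sites
    fraction_in_sites is the share of reads falling in sites, as a
    fraction in (0, 1] (an assumption about how "%" is expressed).
    """
    if not 0 < fraction_in_sites <= 1:
        raise ValueError("fraction_in_sites must be in (0, 1]")
    if n_sites <= 0 or site_length <= 0:
        raise ValueError("n_sites and site_length must be positive")
    sequenced_bases = reads_per_cluster * read_length * n_clusters
    effective_target = n_sites * site_length / fraction_in_sites
    return sequenced_bases / effective_target


# Hypothetical experiment: 10,000 sites of 500 bases, 25% of reads in sites.
cov = chip_coverage(1, 36, 25_000_000, 10_000, 500, 0.25)
print(f"Expected coverage over sites: {cov:.1f}X")
```

Dividing the target size by the fraction of reads in sites inflates the effective target, so a lower enrichment gives proportionally lower coverage over the sites.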

Some standard genome sizes:

{| border=1
!Species
!Genome size
|-
|E. coli
|4 Mbp
|-
|S. cerevisiae
|12.5 Mbp
|-
|C. elegans / Drosophila
|100-150 Mbp
|-
|Human / Mouse / Rat
|3 Gbp
|}