Pooling libraries for sequencing
The throughput of modern sequencers keeps increasing, and a single microfluidic chamber can now yield hundreds of millions of reads. For many purposes this is far more than a single library requires. To keep sequencing cost-effective, researchers often pool multiple libraries before sequencing, each carrying a unique molecular barcode (or a unique combination of barcodes). The sequencer reads each library molecule's biological sequence as well as its barcode sequence; the barcodes are then matched against the sequences expected for each library, so each molecule can be attributed to its library of origin even though the libraries were mixed.
The pool of libraries needs to be prepared at a specific molarity for compatibility with the sequencer. Thus when you combine libraries into a pool to sequence together, you must take care to match not only their relative molarities but also the absolute total molarity of the pool.
When preparing the libraries, be sure to use a different multiplexing index (or in Illumina's dual-index scheme, a different pair of indices) for each library you wish to pool. You cannot pool two libraries with the same index or there will be no way to distinguish them after sequencing. If you have a small number of libraries, you may also need to verify that your specific combination of index sequences will work together; see e.g. the TruSeq Library Prep Pooling Guide for more information.
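The uniqueness requirement above can be checked mechanically before anything is pipetted. The following is a minimal sketch, assuming the sample sheet is available as a mapping from library name to index sequences; the function name `find_index_collisions` and the example sequences are hypothetical placeholders, not indices from any real kit. It only detects exact duplicates and does not assess whether a small set of indices is compatible in other respects.

```python
from collections import defaultdict


def find_index_collisions(libraries):
    """Return {index_tuple: [library names]} for indices shared by 2+ libraries.

    `libraries` maps a library name to its index sequence(s): a 1-tuple for
    single indexing, or a 2-tuple for a dual-index pair.
    """
    by_index = defaultdict(list)
    for name, indices in libraries.items():
        # Normalize case so "acgt" and "ACGT" count as the same index.
        key = tuple(seq.upper() for seq in indices)
        by_index[key].append(name)
    return {key: names for key, names in by_index.items() if len(names) > 1}


# Hypothetical sample sheet (placeholder sequences, for illustration only).
sample_sheet = {
    "lib_A": ("ATCACGTT", "AGGCTATA"),
    "lib_B": ("CGATGTTT", "AGGCTATA"),  # shares one index with lib_A; the pair is unique
    "lib_C": ("ATCACGTT", "AGGCTATA"),  # same pair as lib_A: cannot be pooled together
}
collisions = find_index_collisions(sample_sheet)
```

Here `collisions` reports that `lib_A` and `lib_C` share an index pair, while `lib_B` is fine because only the full pair needs to be unique.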
The target concentration is expressed as a molarity (usually on the nM scale), so it is essential to measure the molarity of each library rather than just the mass concentration typically reported for nucleic acid samples (e.g. ng/μL). Several methods are available, each with its own strengths and weaknesses:
- Nanodrop (UV spectrophotometry): very easy and fast for small numbers of samples, with low input requirement, but also very unreliable since many potential contaminants absorb at wavelengths similar to DNA; gives mass concentration and purity information (purity is not very useful for sequencing libraries since they are greatly diluted before sequencing) but not length distribution
- Qubit/PicoGreen (fluorometry): somewhat time-consuming, but mostly unaffected by contamination and sensitive to very low inputs (depending on assay volume); gives mass concentration information but not size distribution
- Bioanalyzer/TapeStation (microfluidic analog of electrophoresis): somewhat time-consuming and unreliable; gives detailed information about length distribution and molarity, though quantitative error may be high
- SYBR qPCR: rather time-consuming; very accurate molarity measurements, if libraries can be accurately normalized by average length; insensitive to everything except sequenceable molecules with correct adapters
- TaqMan qPCR: rather time-consuming; very accurate molarity measurements unaffected by length; insensitive to everything except sequenceable molecules with correct adapters
Generally researchers use a combination of Bioanalyzer/TapeStation for length distribution and one of the other methods for molar quantification.
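Combining a mass concentration with an average fragment length gives a molarity estimate. A minimal sketch follows, assuming double-stranded DNA and the commonly used approximation of 660 g/mol per base pair; the function name `ng_per_ul_to_nM` is chosen here for illustration. Note that this counts all DNA molecules, not just sequenceable ones.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if conc_ng_per_ul < 0 or mean_length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


# Example: a fluorometer reading of 10 ng/uL with a 400 bp mean fragment length.
example_nM = ng_per_ul_to_nM(10.0, 400.0)
print(f"{example_nM:.1f} nM")  # about 37.9 nM
```

The mean length should include the adapters, since the mass measurement does too.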
When planning the pool, there are several conflicting factors to balance:
- The final molarity of the pool must be at least as high as required in the protocol for your sequencing platform (e.g. the Illumina "Denature and Dilute Libraries Guides"), although you can deviate from the protocol if your libraries are too dilute. The total molarity is the sum of each library's molarity after dilution into the final volume, e.g. if you add 5 μL of a 10 nM library to 5 μL of a 20 nM library, the libraries contribute 5 nM and 10 nM respectively, so you have 10 μL of a 15 nM pool.
- On the other hand, library preparation usually yields far higher molarities than required, so it may be prudent to first mix a pool at a higher molarity (e.g. 20 nM for a protocol that only requires 4 nM) and then dilute the whole pool to the required molarity in a single step.
- If some of your libraries are much more concentrated than others, you can predilute those individually before mixing them all together.
- Try to minimize the volume of libraries that you use for pooling, since you might want to sequence them again later but you might not want the exact same pool with the exact same loading ratios.
- At the same time, try not to design a pooling scheme that requires pipetting volumes smaller than you can measure accurately. For example, some pipets are rated for 0.1–2.5 μL, but you may not feel comfortable measuring less than 0.5 μL.
- If the total volume of all the libraries you need to combine is greater than your desired final volume, the plan cannot work as designed: try lowering the final combined molarity, which reduces the volume needed from each library.
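The arithmetic behind these trade-offs can be sketched in a few lines. The example below assumes an equimolar pool (every library contributes the same share of the final molarity) and a default minimum comfortable pipetting volume of 0.5 μL; the function names and library names are hypothetical, and unequal loading ratios would need a per-library weight that is not shown here.

```python
def pool_molarity(components):
    """Molarity (nM) of a mix from (volume_ul, molarity_nM) pairs.

    Diluent can be included as (volume_ul, 0).
    """
    total_volume = sum(volume for volume, _ in components)
    return sum(volume * conc for volume, conc in components) / total_volume


def plan_equimolar_pool(molarities_nM, pool_nM, final_volume_ul, min_pipet_ul=0.5):
    """Return (volumes_ul, diluent_ul, too_small) for an equimolar pool.

    molarities_nM maps library name -> measured molarity (nM).
    too_small lists libraries whose volume is below min_pipet_ul; these are
    candidates for individual predilution before pooling.
    """
    per_library_nM = pool_nM / len(molarities_nM)
    # C1 * V1 = C2 * V2, solved for the volume of each stock library.
    volumes = {
        name: per_library_nM * final_volume_ul / conc
        for name, conc in molarities_nM.items()
    }
    diluent = final_volume_ul - sum(volumes.values())
    if diluent < -1e-9:
        raise ValueError(
            "libraries alone exceed the final volume; lower the pool molarity"
        )
    too_small = sorted(n for n, v in volumes.items() if v < min_pipet_ul)
    return volumes, max(diluent, 0.0), too_small


# The two-library example from the list above: 5 uL at 10 nM plus 5 uL at 20 nM.
example_mix = pool_molarity([(5, 10), (5, 20)])  # 15.0 nM

# Hypothetical libraries pooled to 6 nM total in 20 uL.
volumes, diluent, too_small = plan_equimolar_pool(
    {"lib_A": 10.0, "lib_B": 20.0, "lib_C": 40.0},
    pool_nM=6.0,
    final_volume_ul=20.0,
)
```

For these inputs the plan calls for 4 μL, 2 μL, and 1 μL of the three libraries plus 13 μL of diluent, and no volume falls below the 0.5 μL threshold. Raising `pool_nM` until the function raises an error shows the highest molarity the libraries can support at that final volume.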
Simple worksheet to help with planning library pools
Why your results still won't be perfect
There are several reasons why you might not see a perfect balance of read counts even if you calculate your pooling numbers very carefully:
- If you used a method other than qPCR to quantify your libraries, you pooled according to total molecules rather than sequenceable molecules, and these numbers may be different.
- Shorter library molecules cluster more efficiently on an Illumina flow cell, so libraries with longer molecules may be underrepresented in the final counts.
- Sequence reads of very low quality are removed by the Illumina sequencer's software, so these clusters are not reflected in the post-filter read counts; depending on the configuration, adapter dimers may not be reported either.
At any rate, what really matters in the end is the number of usable reads (alignable, confidently alignable, high-quality base calls, passing other downstream filters, etc.) and this is difficult to predict with any pre-sequencing quality control.