The BioMicro Center supports a broad variety of standard library preparation methods for RNAseq. The choice of method depends strongly on the type of input, the amount of input RNA available, and the quality of that RNA. The key challenge in every RNAseq method is avoiding ribosomal RNA (rRNA), which would otherwise dominate the library. Below is a summary of the methods we use routinely in the core. For high-throughput RNA library preparation, please see our new page describing methods designed specifically for large sample batches.
Please note that some methods are currently in transition as we work to improve data quality and reduce library preparation costs.
| Amount of input RNA | Minimum RNA quality (RIN) | Recommended method |
|---|---|---|
| >25 ng | 9.0 | Illumina NeoPrep, or Illumina TruSeq (>100 ng) |
| 10 pg to 25 ng | 9.0 | Clontech SMARTer Low Input |
| 1 ng to 1 µg | 5.0 | NuGEN Ovation RNA-Seq System V2 or Clontech totalRNA pico |
| Small RNA | N/A | NEB or BIOO small RNA kit |
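The table above can be encoded as a simple lookup. The sketch below is a hypothetical helper (`recommend_methods` is not a core tool); it assumes amounts are given in nanograms and that each RIN value in the table is a minimum. Because the input ranges overlap, it returns every matching row rather than a single answer.

```python
from typing import List, Optional


def recommend_methods(input_ng: float, rin: Optional[float], small_rna: bool = False) -> List[str]:
    """Return every row of the method table whose criteria the sample meets.

    Hypothetical helper: amounts are in ng, and each RIN value in the
    table is assumed to be a minimum quality threshold.
    """
    if small_rna:
        # RIN is not applicable to small RNA libraries
        return ["NEB or BIOO small RNA kit"]
    if rin is None:
        raise ValueError("a RIN value is required for mRNA/total RNA libraries")
    options = []
    if input_ng > 100 and rin >= 9.0:
        options.append("Illumina TruSeq")
    if input_ng > 25 and rin >= 9.0:
        options.append("Illumina NeoPrep")
    if 0.01 <= input_ng <= 25 and rin >= 9.0:  # 10 pg = 0.01 ng
        options.append("Clontech SMARTer Low Input")
    if 1 <= input_ng <= 1000 and rin >= 5.0:  # 1 ng to 1 ug
        options.append("NuGEN Ovation RNA-Seq System V2 or Clontech totalRNA pico")
    return options


print(recommend_methods(5, 9.5))  # ['Clontech SMARTer Low Input', 'NuGEN Ovation RNA-Seq System V2 or Clontech totalRNA pico']
print(recommend_methods(5, 6.0))  # ['NuGEN Ovation RNA-Seq System V2 or Clontech totalRNA pico']
```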
Illumina's TruSeq chemistry is the primary RNAseq method used in the BioMicro Center. This chemistry uses polyT (oligo-dT) beads to isolate mRNA from rRNA and tRNA. Because the beads capture the polyA tail at the 3' end of each transcript, the RNA must be of very high quality; if it is degraded, only the 3' ends of transcripts will be isolated. The purified mRNA is then fragmented with divalent metal cations, and random priming is used to convert it to cDNA. Once double-stranded cDNA is generated, the sample is transferred to the SPRIworks for the remainder of sample preparation.
The Illumina NeoPrep allows us to automate this chemistry on a hands-free system using electrowetting technology. The NeoPrep reduces the amount of input material required and significantly reduces cost and turnaround time. However, the instrument is limited to batches of 16 samples and has a total of 24 available indexes, which significantly limits multiplexing possibilities. Because the entire process, from polyA selection through amplification, is automated within the unit, natural fluctuations in temperature and timing should be minimized. We therefore anticipate, though have not yet demonstrated, that batch effects will be reduced with the NeoPrep.
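The batch and index limits above translate into simple planning arithmetic. This is a back-of-the-envelope sketch with a hypothetical helper name; it assumes that every sample in one sequencing pool needs a distinct index and ignores per-lane read-depth limits, which in practice may force smaller pools.

```python
import math

# Limits stated in the text above
NEOPREP_BATCH_SIZE = 16   # samples per NeoPrep run
NEOPREP_INDEX_COUNT = 24  # distinct indexes available


def plan_neoprep(n_samples: int) -> dict:
    """Minimum NeoPrep runs and sequencing pools for n_samples (hypothetical helper).

    Assumes each sample in a pool carries a distinct index; read depth per
    lane is not considered.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return {
        # each run processes at most 16 samples
        "prep_batches": math.ceil(n_samples / NEOPREP_BATCH_SIZE),
        # with only 24 indexes, no pool can hold more than 24 samples
        "min_sequencing_pools": math.ceil(n_samples / NEOPREP_INDEX_COUNT),
    }


print(plan_neoprep(40))  # {'prep_batches': 3, 'min_sequencing_pools': 2}
```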
For samples with degraded RNA, or samples where you are interested in non-polyA RNAs, the BioMicro Center uses the Epicentre ribominus kit. This kit uses magnetic beads coupled to probes against rRNA and tRNA sequences to remove these RNAs from solution. The remaining RNA can then be converted into cDNA. Once double-stranded cDNA is generated, the sample is transferred to the SPRIworks for the remainder of sample preparation.
For samples with less than 100 ng of input, the BioMicro Center uses the Clontech SMARTer system. This system differs from the TruSeq chemistry in that it begins with cDNA generation using polyT priming and proprietary chemistry. Because priming starts from the polyA tail, the RNA must be of high quality. Full-length double-stranded cDNAs are generated and amplified by PCR. These cDNAs are then fragmented and transferred to the SPRIworks for the remainder of sample preparation. Data from this system are of similar quality to data from libraries made with Illumina TruSeq chemistry.
For samples with less than 100 ng of input where the amount of RNA is severely restricted, our kit of choice is the NuGEN Ovation. This kit uses non-random nonamer primers, designed not to amplify ribosomal RNA, to create double-stranded cDNA fragments. The fragmented cDNA is then transferred to the SPRIworks for the remainder of sample preparation. The downside of this kit is that the non-random nonamers cannot prime from every region of the transcriptome, so certain portions of genes are often lost. Still, for many samples, this kit is the only way to generate RNAseq libraries.
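The idea of primers designed not to amplify ribosomal RNA, and the resulting loss of some gene regions, can be illustrated with a toy k-mer filter. This is not NuGEN's actual primer design; it assumes exact matching only (real primer design must also consider mismatches and annealing behavior), and the function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_avoiding(targets: Iterable[str], k: int = 9) -> Set[str]:
    """All k-mer primers that cannot anneal, by exact match, to any target.

    Targets (e.g. rRNA) are written 5'->3' in the DNA alphabet; a primer
    anneals where it is the reverse complement of a target k-mer.
    """
    present = {s[i:i + k] for s in targets for i in range(len(s) - k + 1)}
    forbidden = {revcomp(m) for m in present}
    return {"".join(p) for p in product("ACGT", repeat=k)} - forbidden


def unprimed_positions(transcript: str, primers: Set[str], k: int = 9) -> List[int]:
    """Start positions in a transcript where no primer in the set can anneal."""
    return [i for i in range(len(transcript) - k + 1)
            if revcomp(transcript[i:i + k]) not in primers]


# Toy example with 2-mers: avoid a made-up "rRNA" fragment AACG
allowed = primers_avoiding(["AACG"], k=2)
print(len(allowed))                              # 13 of 16 possible 2-mers remain
print(unprimed_positions("TTCG", allowed, k=2))  # [2]: the CG window is lost
```

The second call shows the trade-off in miniature: any transcript window that shares sequence with the excluded rRNA k-mers also loses its primer.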
Additional Chemistries Available in the BioMicro Center
Strand Specific Sequencing
For samples with high amounts of input, we can modify the chemistry of cDNA creation to preserve the strand of origin of each RNA, using the dUTP second-strand marking protocol that performed best in J. Levin et al. 2010. In this method, actinomycin D is added during first-strand synthesis to prevent the reverse transcriptase from generating spurious second-strand DNA. dUTP is then substituted for dTTP during second-strand synthesis, which marks the second strand and allows a clear distinction between the forward and reverse strands. After library construction, but before amplification, the dUTP-containing strand is degraded, so only the first strand (the reverse complement of the RNA, equivalent to the template strand) remains. This method is highly efficient but requires a substantial amount of RNA, because actinomycin D significantly reduces first-strand yield.
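Because the dUTP-marked second strand is destroyed, the sequenced molecule derives from the first strand, which is the reverse complement of the RNA. For single-end reads, or read 1 of a pair, the alignment strand is therefore opposite to the strand of the original transcript (the library type that HTSeq's `htseq-count` calls `--stranded=reverse`). The sketch below assumes that convention; the helper names are hypothetical.

```python
from typing import Iterable, Tuple

_FLIP = {"+": "-", "-": "+"}


def transcript_strand(aligned_strand: str, read_number: int = 1) -> str:
    """Infer the strand of the original RNA from a read in a dUTP library.

    Read 1 (or a single-end read) derives from the first-strand cDNA and so
    aligns antisense to the RNA; read 2 of a pair aligns sense.
    """
    if aligned_strand not in _FLIP:
        raise ValueError("aligned_strand must be '+' or '-'")
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    return _FLIP[aligned_strand] if read_number == 1 else aligned_strand


def sense_fraction(reads: Iterable[Tuple[str, int]], gene_strand: str) -> float:
    """Fraction of reads over a gene whose inferred RNA strand matches the gene."""
    calls = [transcript_strand(strand, n) for strand, n in reads]
    return sum(c == gene_strand for c in calls) / len(calls) if calls else 0.0


# Four reads over a '+' strand gene: three are consistent with sense transcription
print(sense_fraction([("-", 1), ("-", 1), ("+", 1), ("+", 2)], "+"))  # 0.75
```

A sense fraction close to 1 across well-expressed genes is a quick check that a library really is strand specific; an unstranded library gives roughly 0.5.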
Size Selection
For some applications of RNAseq, such as determining splice choice, precise knowledge of the insert size is critical. While the SPRIworks does provide some size selection (typically restricting fragments to between 150 and 350 bp), this range can be too wide for some methods. In these cases, after libraries are amplified, they can be run on the Sage BluePippin, either singly or pooled. On the BluePippin the size distribution can be much tighter, with most DNA fragments falling within a 50 bp range.
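One way to compare size-selection results is to ask what fraction of fragments falls inside the best-placed window of a given width, such as 50 bp. The sketch assumes you already have a list of insert or fragment sizes (for example, absolute TLEN values from properly paired SAM records); the function name and example values are hypothetical, not BioMicro Center data.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def best_window(sizes: Sequence[int], width: int = 50) -> Tuple[float, Optional[int]]:
    """Largest fraction of sizes in any half-open window [start, start + width).

    Returns (fraction, start). Only starts at observed sizes need checking:
    sliding any window right until its left edge reaches the smallest size
    it contains never drops a fragment.
    """
    s = sorted(sizes)
    if not s:
        return 0.0, None
    best_count, best_start = 0, s[0]
    for i, start in enumerate(s):
        # number of sizes >= start and < start + width
        count = bisect_left(s, start + width) - i
        if count > best_count:
            best_count, best_start = count, start
    return best_count / len(s), best_start


print(best_window([150, 200, 250, 300, 350]))       # (0.2, 150): a broad spread
print(best_window([200, 210, 220, 230, 240, 400]))  # (0.833..., 200): a tight peak
```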
Comparison of the RNAseq methods
The BioMicro Center has tested the TruSeq, NuGEN v2 and Clontech kits in head-to-head comparisons. These data were presented in a poster at AGBT 2012. The authors were: Avanti Shrikumar, Zachary Banks, Manlin Luo, Ryan Sinapius, Paola Favaretto, Jessica Hurt, Chris Burge, and Stuart S. Levine. Selections from the poster are shown below:
Summary
Sequencing of the transcriptome (RNAseq) has become an increasingly important tool in the molecular biology toolkit and is rapidly replacing microarrays as the primary method for determining genome-wide expression levels. Several vendors have created pre-packaged kits for creating RNAseq libraries for Illumina sequencing. These kits differ significantly in methodology and in the amount of input required. Here we provide a head-to-head test of five different RNAseq kits in a core setting. To determine the sensitivity of each method, the kits were evaluated on two experimental samples with similar expression patterns: murine embryonic stem cells, and the same cells with a single factor knocked down by RNAi. Each kit was additionally evaluated across three different concentrations of RNA input. We found that the different methodologies show different RPKM levels for each transcript and also vary in their technical reproducibility. The different methods resulted in small but largely distinct lists of differentially expressed genes, which we compared to genes with known expression changes.
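RPKM (reads per kilobase of exon per million mapped reads) normalizes a gene's read count by both its exonic length and the sequencing depth, so values can be compared across genes and libraries. A minimal sketch of the standard formula follows; the function name is illustrative, and the poster's exact counting rules (for example, handling of multi-mapping reads) are not specified here.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads.

    RPKM = gene_reads * 1e9 / (exon_length_bp * total_mapped_reads),
    i.e. reads / (length in kb) / (mapped reads in millions).
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and mapped-read total must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a 2 kb transcript in a library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))  # 25.0
```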
Experimental Design: ES cells were transfected with siRNA targeting a splicing factor or with a control siRNA. The tested splicing factor normally blocks inclusion of specific exons that had previously been identified by RT-PCR, so reducing the splicing factor's levels should increase the amounts of transcripts containing these exons. RNA was collected from the cells and analyzed by RNAseq. Samples were sequenced to a depth of at least 7.5 million 40 nt reads on either a GAIIx or a HiSeq 2000.
Testing Matrix: A single control sample and a single splicing factor knockdown sample were serially diluted. Aliquots of the diluted samples were processed with each of the five methods to determine each kit's ability to identify differentially expressed genes in a biologically challenging situation, to measure its sensitivity to different input amounts, and to quantify the amount of technical variation. * indicates that the amount tested was below the minimum recommended input.
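Because the knockdown is expected to raise the abundance of specific transcripts, a first-pass screen compares expression between the control and knockdown libraries. The sketch below computes a log2 fold change with a pseudocount from RPKM values. It is a naive illustration with hypothetical names and thresholds, not necessarily the method used for the poster, and it works at the gene level even though the expected effect is on exon inclusion.

```python
import math
from typing import Dict, List


def log2_fold_change(control: float, knockdown: float, pseudocount: float = 1.0) -> float:
    """log2((knockdown + pc) / (control + pc)); positive means higher after knockdown.

    The pseudocount avoids division by zero for genes with no reads.
    """
    return math.log2((knockdown + pseudocount) / (control + pseudocount))


def increased_genes(control: Dict[str, float], knockdown: Dict[str, float],
                    min_lfc: float = 1.0) -> List[str]:
    """Genes at least 2**min_lfc-fold higher in the knockdown (hypothetical helper)."""
    shared = control.keys() & knockdown.keys()
    return sorted(g for g in shared
                  if log2_fold_change(control[g], knockdown[g]) >= min_lfc)


control = {"geneA": 3.0, "geneB": 10.0, "geneC": 0.0}
knockdown = {"geneA": 7.0, "geneB": 10.0, "geneC": 5.0}
print(increased_genes(control, knockdown))  # ['geneA', 'geneC']
```

A real analysis would add replicate-aware statistics; with a single sample per condition, as in the testing matrix, fold change alone cannot separate biological signal from technical noise.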
Fraction of reads aligned: Reads aligning once or multiple times to the mouse genome (mm9) are shown. Reads were aligned with Bowtie. NuGEN samples show an increase in non-aligned reads at low input amounts.
3’ Bias: Read densities were calculated along the exonic portions of each transcript. Transcripts were grouped by length and plotted as metagenes with both the 5’ and 3’ ends anchored. Clear 3’ bias can be observed in the samples processed using the Clontech kit.
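The metagene idea can be sketched in a simplified form. Rather than grouping transcripts by length and anchoring both ends as on the poster, this version rescales every transcript to a common relative coordinate, bins read positions, and averages the per-transcript profiles. The input format and function names are assumptions for illustration; read positions are assumed to be already converted to spliced-transcript coordinates running 5' to 3'.

```python
from typing import List, Sequence, Tuple


def metagene_profile(transcripts: Sequence[Tuple[int, Sequence[int]]],
                     n_bins: int = 10) -> List[float]:
    """Average fraction of reads per relative-position bin (5' -> 3').

    Each transcript is (exonic length, read positions), positions being
    0-based integer offsets from the 5' end of the spliced transcript.
    Each transcript's profile is normalized to sum to 1 so that highly
    expressed genes do not dominate the average.
    """
    totals = [0.0] * n_bins
    used = 0
    for length, positions in transcripts:
        if length <= 0 or not positions:
            continue  # skip empty or uncovered transcripts
        counts = [0] * n_bins
        for pos in positions:
            # integer arithmetic avoids floating-point edge cases at bin borders
            counts[min(pos * n_bins // length, n_bins - 1)] += 1
        for i, c in enumerate(counts):
            totals[i] += c / len(positions)
        used += 1
    return [t / used for t in totals] if used else totals


def three_prime_ratio(profile: List[float]) -> float:
    """Last (3') bin over first (5') bin; values above 1 indicate 3' bias."""
    return profile[-1] / profile[0] if profile[0] > 0 else float("inf")


profile = metagene_profile([(1000, [50, 950, 960, 990]), (500, [10, 480])])
print(profile[0], profile[-1])  # 0.375 0.625
print(three_prime_ratio(profile))
```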
Variation in RPKM score across the transcript: (right) The evenness of coverage within each transcript was measured by comparing the read density in the two largest exons. Only exons longer than 200 nt and genes with RPKMs over 10 were included in this analysis (n = 1,876 genes including 3,676 exons). (left) The average number of exons in the above data set with very low coverage (<5 reads) is shown.
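The exon-level evenness analysis described above can be sketched as follows, assuming per-exon read counts and lengths are already available for each gene. Thresholds default to the values in the caption (exons longer than 200 nt, gene RPKM over 10, fewer than 5 reads for very low coverage); the function names and the higher-over-lower density ratio are illustrative choices, not necessarily the statistic used on the poster.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (length in nt, read count)


def largest_exon_density_ratio(exons: List[Exon], gene_rpkm: float,
                               min_len: int = 200, min_rpkm: float = 10.0) -> Optional[float]:
    """Ratio of higher to lower read density in a gene's two largest exons.

    Returns None when the gene fails the RPKM filter or has fewer than two
    exons longer than min_len. A value of 1.0 means perfectly even coverage.
    """
    if gene_rpkm <= min_rpkm:
        return None
    long_exons = sorted((e for e in exons if e[0] > min_len),
                        key=lambda e: e[0], reverse=True)
    if len(long_exons) < 2:
        return None
    # read density = reads per nucleotide of exon
    densities = [reads / length for length, reads in long_exons[:2]]
    low, high = min(densities), max(densities)
    return high / low if low > 0 else float("inf")


def count_low_coverage(exons: List[Exon], min_reads: int = 5) -> int:
    """Number of exons with fewer than min_reads reads."""
    return sum(1 for _, reads in exons if reads < min_reads)


gene = [(300, 60), (250, 25), (150, 1), (400, 40)]
print(largest_exon_density_ratio(gene, gene_rpkm=20.0))  # 2.0 (densities 0.1 vs 0.2)
print(count_low_coverage(gene))  # 1
```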
Choosing a read length and read depth
A few notable references:
- PMID 18978772
- PMID 19015660
- PMID 18550803
- PMID 20711195