RNA-Seq uses recently developed deep-sequencing technologies. In general, a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends. Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing). The reads are typically 30–400 bp, depending on the DNA-sequencing technology used. In principle, any high-throughput sequencing technology [25] can be used for RNA-Seq, and the Illumina IG [18–21,23,24], Applied Biosystems SOLiD [22] and Roche 454 Life Science [26–28] systems have already been applied for this purpose. The Helicos Biosciences tSMS system has not yet been used for published RNA-Seq studies, but is also appropriate and has the added advantage of avoiding amplification of target cDNA. Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence, to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene.
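As a rough illustration of how aligned reads become a per-gene expression level, the sketch below counts reads that fall inside annotated genes and scales the counts by gene length and sequencing depth. The gene coordinates, read positions and function names are hypothetical, and the per-kilobase-per-million scaling is one common normalization chosen for this example rather than a method prescribed by the text.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 1099),   # 1,000 bp long
    "geneB": ("chr1", 2000, 2499),  # 500 bp long
}

# Toy alignments: (chromosome, leftmost mapped position) for each read.
ALIGNMENTS = [("chr1", 150), ("chr1", 900), ("chr1", 2100), ("chr1", 5000)]


def count_reads(alignments, genes):
    """Count aligned reads whose mapped position falls inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in alignments:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def per_kb_per_million(counts, genes, total_mapped):
    """Scale raw counts by gene length (kb) and depth (millions of mapped reads)."""
    levels = {}
    for name, n in counts.items():
        _, start, end = genes[name]
        length_kb = (end - start + 1) / 1000
        levels[name] = n * 1_000_000 / (length_kb * total_mapped)
    return levels


counts = count_reads(ALIGNMENTS, GENES)
levels = per_kb_per_million(counts, GENES, total_mapped=len(ALIGNMENTS))
```

In this toy data geneA collects twice as many reads as geneB but is also twice as long, so the two genes end up with the same normalized level. That is the reason for the length correction.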
Although RNA-Seq is still a technology under active development, it offers several key advantages over existing technologies (Table 1). First, unlike hybridization-based approaches, RNA-Seq is not limited to detecting transcripts that correspond to existing genomic sequence. For example, 454-based RNA-Seq has been used to sequence the transcriptome of the Glanville fritillary butterfly [27]. This makes RNA-Seq particularly attractive for non-model organisms with genomic sequences that are yet to be determined. RNA-Seq can also reveal the precise location of transcription boundaries, to single-base resolution. Furthermore, 30-bp short reads from RNA-Seq give information about how two exons are connected, whereas longer reads or paired-end short reads should reveal connectivity between multiple exons. These factors make RNA-Seq useful for studying complex transcriptomes. In addition, RNA-Seq can reveal sequence variations (for example, SNPs) in the transcribed regions [22,24].
| Table 1: Advantages of RNA-Seq compared with other transcriptomics methods |
A second advantage of RNA-Seq relative to DNA microarrays is that RNA-Seq has very low, if any, background signal because DNA sequences can be unambiguously mapped to unique regions of the genome. RNA-Seq also has no upper limit for quantification, which correlates with the number of sequences obtained. Consequently, it has a large dynamic range of expression levels over which transcripts can be detected: a greater than 9,000-fold range was estimated in a study that analysed 16 million mapped reads in Saccharomyces cerevisiae [18], and a range spanning five orders of magnitude was estimated for 40 million mouse sequence reads [20]. By contrast, DNA microarrays lack sensitivity for genes expressed at either low or very high levels and therefore have a much smaller dynamic range (one-hundredfold to a few-hundredfold). RNA-Seq has also been shown to be highly accurate for quantifying expression levels, as determined using quantitative PCR (qPCR) [18] and spike-in RNA controls of known concentration [20]. The results of RNA-Seq also show high levels of reproducibility, for both technical and biological replicates [18,22]. Finally, because there are no cloning steps, and with the Helicos technology there is no amplification step, RNA-Seq requires less RNA sample.
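To put the dynamic-range figures quoted above on a common scale, a fold range can be converted to orders of magnitude by taking its base-10 logarithm. The short sketch below does this for the values mentioned in the text; the function names and the toy expression levels are illustrative assumptions, not data from the cited studies.

```python
import math


def fold_range(levels):
    """Ratio of the highest to the lowest non-zero expression level."""
    detected = [x for x in levels if x > 0]
    return max(detected) / min(detected)


def orders_of_magnitude(fold):
    """Express a fold range as a number of orders of magnitude (log10)."""
    return math.log10(fold)


# Fold ranges quoted in the text.
rna_seq_yeast = orders_of_magnitude(9_000)    # >9,000-fold range
rna_seq_mouse = orders_of_magnitude(100_000)  # five orders of magnitude = 10^5
microarray = orders_of_magnitude(100)         # lower bound, one-hundredfold

# Toy expression levels (hypothetical); undetected genes (0) are ignored.
toy_fold = fold_range([0, 0.5, 12.0, 4500.0])
```

On this scale the yeast estimate sits just under four orders of magnitude, compared with two for a one-hundredfold microarray range.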
Taking all of these advantages into account, RNA-Seq is the first sequencing-based method that allows the entire transcriptome to be surveyed in a very high-throughput and quantitative manner. This method offers both single-base resolution for annotation and ‘digital’ gene expression levels at the genome scale, often at a much lower cost than either tiling arrays or large-scale Sanger EST sequencing.