The latest high-throughput DNA sequencing technology can now be applied on a large scale to capture the complete set of mRNA transcripts in a cell, using a technique called RNA-seq. Although RNA-seq is only 2 years old, it has rapidly swept through the field of genomics, and it is now being used to analyze the transcriptomes of organisms ranging from bacteria to primates. The depth of sequencing allows researchers to quantify the level of expression of genes, to discover alternative isoforms in eukaryotic species, and even to characterize the operon structure of bacterial genomes.
Sequencing the mRNA in a cell has been used as a high-throughput method for finding genes since the early days of the human genome project. Beginning in the early 1990s, the expressed sequence tag (EST) method was used to capture fragments of thousands of human genes prior to the sequencing of the genome. EST sequencing relies on the fact that eukaryotic mRNAs are polyadenylated after transcription, and the long poly-A tract can be used to capture the transcripts via reverse transcription PCR (RT-PCR). The EST method was subsequently applied to many other species, and EST databases (notably dbEST) became a vital resource for genome annotation. Recently, a ‘next-gen’ version of EST sequencing has emerged, allowing researchers to capture and sequence mRNA at dramatically lower cost, and higher volume, than was ever possible with the EST method. The new RNA-seq methods [2-5] are being applied to a rapidly growing variety of species, cell types, and scientific questions, revealing far more about the transcriptomes of these species than was known just a few years ago. The field is advancing so rapidly that a brief review cannot cover all the work of the past 2 years; this review is just a sampling of a few highlights.
Sultan et al. analyzed approximately 8 million short reads and found that RNA-seq could detect 25% more genes than microarrays. About one-third of transcripts in their experiments mapped to genomic regions not annotated as genes. Of the 94,241 splice junctions they identified, 4096 were novel, and many of these indicated exon skipping events. This result has been amplified by subsequent studies that generated even more sequences and showed even larger numbers of novel splicing events. Trapnell et al. generated approximately 430 million paired-end reads to recover 13,692 known isoforms from mouse myoblast cells, but also detected 12,712 novel isoforms, of which 7395 contained novel splice junctions while the rest represented novel combinations of known exons. This latter study also demonstrated the power of a new algorithm capable of detecting and quantifying alternative isoforms when aligning RNA-seq reads to a genome. In an RNA-seq study using liver RNA samples from humans, chimpanzees, and rhesus macaques, Blekhman et al. found that alternative splicing events vary between closely related primates and also between the sexes within species. Wang et al. generated approximately 600 million short reads from 15 cell types and found that 92-94% of human genes are alternatively spliced, and that many alternative splicing events are tissue-specific. RNA-seq is also being used to study how genetic variation among individuals affects gene expression (expression quantitative trait loci, or eQTLs). Pickrell et al. and Montgomery et al. combined RNA-seq data and HapMap data from 69 Nigerian individuals and 63 Caucasian individuals, respectively, and both groups identified variants responsible for alternative splicing as well as variation in expression levels among individuals.
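The counts quoted above can be put on a common footing with a little arithmetic. The following is a minimal sketch in Python and is not part of any of the cited analyses: the function and variable names are hypothetical, chosen only for illustration, and the only inputs are the figures quoted above from Sultan et al. and Trapnell et al.

```python
# Counts quoted in the text above; no new data are introduced here.
SULTAN_TOTAL_JUNCTIONS = 94_241
SULTAN_NOVEL_JUNCTIONS = 4_096
TRAPNELL_KNOWN_ISOFORMS = 13_692
TRAPNELL_NOVEL_ISOFORMS = 12_712
TRAPNELL_NOVEL_WITH_NEW_JUNCTIONS = 7_395


def fraction(part: int, whole: int) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def summarize() -> dict:
    """Derive simple proportions from the quoted counts (illustrative only)."""
    detected_isoforms = TRAPNELL_KNOWN_ISOFORMS + TRAPNELL_NOVEL_ISOFORMS
    return {
        # Share of the splice junctions in Sultan et al. that were novel.
        "novel_junction_fraction": fraction(
            SULTAN_NOVEL_JUNCTIONS, SULTAN_TOTAL_JUNCTIONS
        ),
        # Share of the isoforms detected by Trapnell et al. that were novel.
        "novel_isoform_fraction": fraction(
            TRAPNELL_NOVEL_ISOFORMS, detected_isoforms
        ),
        # Novel isoforms built only from new combinations of known exons.
        "novel_exon_combinations": (
            TRAPNELL_NOVEL_ISOFORMS - TRAPNELL_NOVEL_WITH_NEW_JUNCTIONS
        ),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

With these inputs, novel junctions come to about 4.3% of those reported by Sultan et al., whereas novel isoforms make up about 48% of those detected by Trapnell et al., with 5317 of them being new combinations of known exons.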
In single-celled organisms, RNA-seq can reveal new insights into polycistronic transcripts. In the first transcriptome analysis of Trypanosoma brucei, thousands of splicing and polyadenylation sites were identified and many genes were found to be differentially expressed between the parasite's two life-cycle stages. In prokaryotes, RNA-seq can provide an extremely detailed transcription map, at the single-base level, as has been shown recently in an archaeal species, Sulfolobus solfataricus, and in a pathogenic bacterium, Helicobacter pylori. In S. solfataricus, over 1000 transcriptional start sites were detected and 80 novel protein-coding genes were discovered. In H. pylori, hundreds of transcriptional start sites within operons were found, as well as approximately 60 novel small RNA genes.
The power of RNA-seq stems from its ability to generate deep coverage of the entire transcriptome of a cell with just a single run of a high-throughput sequencer, such as the Illumina HiSeq, which can produce up to 200 billion bases in a single run. The potential to characterize all genes, to capture alternative isoforms, and to measure differential expression has already been demonstrated in dozens of studies, but hundreds of species, and countless experimental conditions, are yet to be explored. Several groups have developed methods other than poly-A selection to capture all RNAs in a cell, for example, random hexamer priming [13,15], which allows them to analyze prokaryotic transcriptomes or to look at noncoding RNA in eukaryotes. It now appears that RNA-seq will replace microarray technology in the coming years, as it seems to be not only more comprehensive but also much more accurate than microarrays, particularly for transcripts with low expression levels. As this new method becomes even more widely adopted, it should greatly expand our understanding of the complex interplay of genes in all phases of cell development.
This work was supported in part by National Institutes of Health grants R01-LM006845 and R01-GM083873.
The electronic version of this article is the complete one and can be found at: http://f1000.com/reports/b/2/64
The author declares that he has no competing interests.