In recent years, a technological revolution has shifted DNA sequencing from traditional Sanger methods to "next-generation" sequencing (see review [1]). Applying these new sequencing methods to cDNA libraries, termed RNA-Seq, generates a wealth of information beyond that obtained from sequencing genomic DNA (see review [2]). RNA-Seq provides insights at multiple levels into the transcription of the genome, as it yields sequence, splicing, and expression-level information, leading to the identification of novel transcripts [3] and sequence alterations. For research into somatic mutations in cancer (for example, The Cancer Genome Atlas [5]), this method has the advantage of enriching for changes in coding sequences, which are more likely to affect function, compared with sequencing genomic DNA. Chromosomal rearrangements, including translocations, are an important class of mutations in cancer [8]. Although chromosomal rearrangements can be detected by next-generation sequencing of genomic DNA [9], RNA-Seq is a powerful tool to identify those rearrangements that lead to chimeric transcripts and are more likely to have functional consequences in cancer [3].
Despite these advantages of RNA-Seq, the complexity of the transcriptome and the wide dynamic range of expression levels render whole-transcriptome sequencing an expensive proposition, particularly at the depth required to call mutations and identify structural rearrangements or aberrant splice forms in low-abundance mRNAs. Mortazavi and colleagues [12] reported that 40 million reads were required to provide onefold coverage of a transcriptome, whereas calling genotypes with high confidence may require coverage levels of at least fivefold to 20-fold [13]. This magnitude of coverage invariably results in vast oversampling of abundant transcripts, which adversely affects the efficiency and overall power of the approach.
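To make the scale of this requirement concrete, the figures quoted above can be combined in a back-of-the-envelope calculation. The sketch below assumes that the number of reads scales linearly with fold coverage, which is a simplification introduced here for illustration and not a claim made in the cited studies; the function name `reads_needed` is likewise hypothetical.

```python
# Rough estimate only. Figures taken from the text: about 40 million reads
# for onefold transcriptome coverage, and fivefold to 20-fold coverage for
# high-confidence genotype calls.
READS_PER_FOLD = 40_000_000


def reads_needed(target_fold: float, reads_per_fold: int = READS_PER_FOLD) -> int:
    """Estimate reads for a target fold coverage.

    Assumption (not from the source): reads scale linearly with coverage.
    """
    if target_fold < 0:
        raise ValueError("target_fold must be non-negative")
    return round(target_fold * reads_per_fold)


for fold in (1, 5, 20):
    millions = reads_needed(fold) / 1e6
    print(f"{fold:>2}-fold coverage: ~{millions:.0f} million reads")
```

Under this linear assumption, the fivefold to 20-fold range corresponds to roughly 200 to 800 million reads per transcriptome, which illustrates why untargeted sequencing at genotyping depth is costly.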
Cost and efficiency considerations have prompted the emergence of methods that allow "targeted" next-generation sequencing. Two suitably high-throughput approaches to enrich specific sequences from genomic DNA have been developed: multiplexed molecular inversion probes (MIPs) [14] and capture by hybridization to oligonucleotide probes on microarrays [17] or in solution [20]. MIPs are similar to PCR primers in that they enrich loci defined by two flanking specific sequences. Thus, they are not appropriate for the discovery of novel chromosomal rearrangements such as translocations. By contrast, capture by hybridization can enrich DNA fragments that extend beyond the probe sequence, including sequences that are not contiguous in the reference sequence. Solution hybrid selection is a capture method that uses a complex mixture of RNA baits derived from PCR-amplified oligodeoxynucleotides to select hybridizing sequences in a library of DNA fragments [20]. To date, however, hybridization-based capture approaches have been applied primarily to genomic DNA, typically for the purpose of enriching exonic DNA of interest. Although targeted sequencing of genomic DNA facilitates mutation discovery and profiling, it is unable to interrogate the myriad additional genomic alterations affecting DNA and mRNA that are critical to tumor biology and therapeutic development.
In this study, we explore the feasibility and power of "targeted RNA-Seq," the application of hybridization capture methods to transcriptome analysis. When applied to 467 cancer-related genes, this novel approach increased the coverage of low-abundance transcripts to levels that enabled reliable identification of sequence changes. In addition, this method provided information about relative expression levels, facilitated the discovery of novel splice variants, and enabled detection of novel fusion transcripts and isoforms thereof that would otherwise have escaped detection. As such, this method fills an important niche in cancer research, as well as other areas of genomics, by generating all the multifaceted genomic and gene-expression information in a single, straightforward experiment.