Analysis of the mammalian transcriptome and transcriptional network in ex vivo cells requires technologies that, from small amounts of RNA, provide a comprehensive and unbiased view of both the tissue-specific promotome (the complete set of promoters) and the intron-exon structure of the transcripts associated with different transcription start sites (TSSs). In most eukaryotic RNA polymerase II-derived RNAs, the TSS is marked by a cap site.
Among sequencing-based techniques to measure gene expression, tag-based methods are common. They involve reading a short sequence of a transcript that is still long enough to be mapped onto the genome. We have used Cap Analysis Gene Expression (CAGE) [1–3], a cap-trapping-based method that allows systematic 5′ end profiling of capped RNAs, to produce the first comprehensive single-base resolution maps of TSSs and promoters from human and mouse tissues [4] and to decipher transcriptional networks in the human leukemia cell line THP-1 [5]. Such large-scale characterization of TSSs revealed an unprecedented complexity of the transcriptome. In contrast to classic gene models, the emerging view suggests that most genes have multiple TSSs, differing by multiple bases [4] and driven by various core promoters, and that newly capped 5′ ends can also be created post-transcriptionally [6]. Transcription can be initiated by promoters that are broad in shape, often associated with CpG islands, or by sharp promoters, which are narrow in shape and are often associated with TATA-boxes [4]. These promoter structures have functional implications: they are associated with tissue specificity (as sharp promoters are, for example), different exon usage, translation initiation sites or classes of non-coding RNAs (ncRNAs). Within the locus of a coding gene, transcription can also start within or downstream of the open reading frame, as for the ncRNAs that originate in genomic regions corresponding to the 3′ ends of protein-coding genes [4]. Additionally, the capped transcriptome includes non-coding RNAs that are associated with initiation and termination of transcription [6,7].
However, there are outstanding problems that existing technologies could not yet address. CAGE requires a large quantity of starting material (~50 μg of total RNA), precluding TSS analysis of small samples, such as homogeneous cell preparations obtained by microdissection or samples derived from cellular sub-fractionation.
Furthermore, newly identified promoters must be assigned to gene models. Although CAGE identifies new promoters, determining their connection to either downstream known gene structures or to independent novel RNAs is limited to low-throughput gene-by-gene validations. RNA shotgun sequencing approaches (RNA-seq) have been unable to distinguish multiple 5′ ends of a given gene, identifying only their most extreme boundaries at best. This constrains the functional annotation of promoters, on which accurate inference of transcriptional regulatory networks depends [5], and limits the study of ncRNAs overlapping known genes. Paired-end sequencing of full-length cDNA, like the GIS (Gene Identification Signature) ditag approach [8], allows the determination of TSSs and termination sites in polyadenylated mRNAs, but does not yield information on internal exons. In addition, it requires large quantities of purified mRNA.
Here we present the nanoCAGE and CAGEscan technologies, which provide genome-wide profiling of TSSs from small quantities of RNA and link them to the anatomy of transcribed RNAs. nanoCAGE was carried out with as little as 10 ng of total RNA, the equivalent of the RNA content of a thousand cells. CAGEscan provided important insights into the complexity of the promotome-transcript structure, identifying, among others, RNAs that originate from a given TSS but terminate in unrelated downstream genes. Our data also provide an estimate of the RNA types that populate the various cell compartments, suggesting a nuclear role for RNAs derived from introns and intergenic regions, as well as for retrotransposon elements and antisense RNAs.