Previously, the RNA 3′ ends of the model organism A. thaliana were poorly characterized, yet defining the sites of 3′ end formation is essential for genome annotation and for understanding the regulation of gene expression. We resolved the heterogeneity in 3′ end formation by using quantitative DRS data to analyze cleavage sites separately, based on preference. This led to an understanding of A. thaliana
3′UTRs and 3′ end formation that is consistent with the detailed experimental dissection of the cauliflower mosaic virus (CaMV) poly(A) signal carried out by Hohn and co-workers: these analyses identified a U-rich upstream sequence element that enhanced 3′ end formation32
and showed that while each possible point mutation to the AAUAAA hexamer could be tolerated32, deletion of this hexamer abolished 3′ end formation33. At the time, this experimental work could not be generalized to other plant 3′UTRs. This is largely explained by our analysis, which reveals that 3′ end formation within the same 3′UTR is extremely heterogeneous; that quantitative differences in cleavage site preference are associated with multifunctional, overlapping poly(A) signals of relatively loosely defined sequence; and that accurate and efficient 3′ end formation is combinatorial. As a result, the density and complexity of overlapping functional poly(A) signals in each A. thaliana 3′UTR make the identification of sequences corresponding to those of the CaMV poly(A) signal difficult, if not impossible.
Alternative cleavage and polyadenylation within human 3′UTRs is intimately connected to miRNA-mediated regulation as human miRNA target sites are mostly found in 3′UTRs34-36. In contrast, A. thaliana miRNA target sites are generally found in open reading frames35. This distinction likely relates to differences in the extent of miRNA-target base-pairing and the resulting sensitivity of such duplexes to translocating ribosomes36. We speculate that the heterogeneity of RNA 3′ end formation we detect here may preclude robust miRNA-mediated regulation targeted to A. thaliana 3′UTRs. As a result, an interplay between differences in mRNA 3′ end formation and miRNA targeting may have contributed to the evolution of current target site distinctions.
We discovered that most poly(A)+ antisense RNAs derive from convergent gene pairs with overlap restricted to their 3′UTRs. One might expect such a gene arrangement to be rare because of the potential for either transcription interference or RNAi to compromise gene expression4,5,37. However, nearly one-fifth of all our DRS reads derived from such gene pairs, and we found no general trend of either anti-correlated or relatively reduced expression at these loci in our dataset. This might be because the seedling RNA we analyzed is derived from multiple cell types, in which transcription at convergent overlapping gene pairs could be spatially separated. Alternatively, depending on either allele-specific expression or pulses of transcription, endogenous overlapping gene pairs may not necessarily be subject to transcription interference or RNAi. Expression of similar 3′ convergent gene pairs in the same cell type has been detected in D. melanogaster
without resultant RNAi28. Previous analyses of A. thaliana
convergent gene pair expression, albeit with less definitive datasets, also found no evidence of their regulation by RNAi38,39. These findings stand in contrast to the paradigm of siRNA-dependent anti-correlated expression defined for the convergent gene pair SRO5. However, our analysis casts doubt on the robustness of the data presented in that study, suggesting that the conclusions should be revisited. Regardless, what is clear is that the convergent overlapping gene pairs identified here share 3′UTRs. Rather than being avoided, this genomic architecture may be favored because it drives genome compaction through the elimination of intergenic sequence. Our analysis indicates that the multi-functionality of U- and A-rich poly(A) signals enables this arrangement by facilitating 3′ end formation in sense and antisense RNAs. Since this is consistent with the recent analysis of C. elegans, this influence of 3′ end formation on genome organization may be quite general. It will be interesting to apply DRS to related species with larger or polyploid sequenced genomes to address whether shared 3′UTRs are restricted to compact genomes and select against transposon insertion. Additionally, we found mean expression levels at these overlapping gene pairs to be higher than at other genes. Perhaps physical interactions between promoter and 3′ end regions (gene loops) juxtapose the promoters of convergent gene pairs with the same terminator, creating a nexus that facilitates local recycling of factors essential for transcription2,40.
We show that DRS avoids the internal priming problems that confound oligo(dT)-primed analyses of polyadenylated cleavage sites. Presumably, the environment of the sequencing flow cell favors annealing of the 3′ poly(A) tail over internal A-rich sequences. DRS obviates not only problems associated with reverse transcription12,14,15, but also the complex sample preparation and amplification that can affect quantification of RNA-seq data41. DRS has limitations too, since read lengths are relatively short and indels may affect read alignments. Nevertheless, DRS should be a useful addition not only to the study of regulated 3′ end formation between samples, but also to other aspects of transcriptome analysis. Overall, our findings suggest that viewing gene expression by sequencing RNA directly, rather than through the prism of reverse transcriptase-dependent copies, is not only feasible but can also have important consequences for the interpretation of transcriptome-wide data. In this case, it enabled new and revised insight into what the genome actually encodes, how it is organized and how that affects gene expression.