The biogenesis of eukaryotic RNA transcripts depends on a myriad of protein complexes that act in concert to orchestrate key tasks such as transcription, splicing, capping, 3′ end cleavage and polyadenylation. Although gene regulation by transcription is widely studied, the effects of other terminal processes such as polyadenylation, which regulate the stability, transport and expression of RNA transcripts, are poorly understood. Therefore, to aid the investigation of polyadenylation locations, 3′ UTRs, and their different isoforms and usage, we built comprehensive maps of polyadenylation sites and quantified the usage of each polyadenylated gene isoform. The complete polyadenylation landscape is made publicly available through our web portal, xPAD, which integrates the widely used UCSC genome browser (26) to enable detailed investigation of any genomic region by intuitive queries involving gene name, keywords or gene position.
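As a rough illustration of what quantifying isoform usage can mean in practice, the sketch below computes, for each polyadenylation site of a gene, the fraction of that gene's 3′-end reads that map to the site. It is a minimal sketch and not necessarily the procedure used for xPAD; the function name, site labels and read counts are hypothetical.

```python
from typing import Dict


def relative_usage(site_reads: Dict[str, int]) -> Dict[str, float]:
    """Return each site's share of the gene's total poly(A)-site reads."""
    total = sum(site_reads.values())
    if total == 0:
        # No reads for this gene in this sample: report zero usage everywhere.
        return {site: 0.0 for site in site_reads}
    return {site: count / total for site, count in site_reads.items()}


# Hypothetical read counts for one gene with a proximal and a distal site.
example_reads = {"proximal": 30, "distal": 10}
usage = relative_usage(example_reads)
print(usage)
```

Comparing such fractions between two samples (for example, tumor and matched normal tissue) is one simple way to flag a shift toward shorter or longer 3′ UTR isoforms.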
Figure 5. Illustration of xPAD. xPAD integrates the UCSC genome browser to provide a web interface to visualize both the precise polyadenylation locations of different isoforms and their expression levels across tissues of interest.
Important functional consequences of polyadenylation have emerged recently (14). In particular, the disparate usage of various polyadenylation sites within the same 3′ UTR is known to result in reprogramming of gene expression in proliferating cells (4), embryonic cells (9) and cancer cells (38). The availability of precise polyadenylation sites and their usage not only provides independent confirmation of earlier reports but also generates additional insights into the participation of functional modules that may be important targets of tandem APA-mediated reprogramming of gene expression (Supplementary Figure S5). Two such modules are noteworthy: the Rho GTPase pathway and the RNA processing machinery. Rho family small GTPases are a class of important cellular signaling molecules with strong implications in cancer (61). Rather than the Rho GTPases themselves, their regulators are emerging as the major targets in cancer (61), a theme that recurs in our network analysis, where none of the four Rho GTPases in the network is targeted, whereas their regulators and effectors within the network are targeted (Supplementary Figure S5, Rho GTPases module). Rho GTPases are activated by all of these regulators in the network, which leads to actin reorganization, a process important for cancer cell migration and metastasis (49). Within the RNA processing machinery, one important gene is POLR2K, the only subunit common to all three polymerases that dose-dependently improves the assembly of the Pol III pre-initiation complex (51). As Pol III products (tRNAs, 5S rRNAs and 7SL RNAs) are required for protein synthesis and tumors have unusually high levels of Pol III activity (63), it is possible that the upregulation of POLR2K could facilitate Pol III assembly and thus contribute to cell proliferation and cancer development.
The different polyadenylated transcript isoforms could be regulated at the transcriptional (64) as well as posttranscriptional (60) levels, and the roles of these two processes in polyadenylation may not be readily separable (66). As most regulatory mechanisms that act in 3′ UTRs are thought to be repressive rather than activating (5), it is likely that if the causative factor is a canonical UTR-regulation mechanism such as that of miRNAs, then it must be downregulated in cancer. An alternative explanation is that non-canonical, isoform-dependent UTR regulatory factors that activate rather than repress short isoform levels (e.g. polyadenylation/cleavage proteins) are upregulated in cancer. The identification of potential isoform-dependent regulatory roles for Pabpc1, H3K36Me3 and the ATATAT motif, which is linked to the yeast heterogeneous nuclear ribonucleoprotein (hnRNP)-like protein Hrp1, highlights the possibility of such isoform-dependent factors. Hrp1 contains two tandem RNA recognition motif domains, an arrangement shared by various human hnRNPs. Although the protein sequence of Hrp1 is similar to those of all human hnRNPs, the Hrp1 mRNA sequence is most similar to that of hnRNPA3. Several hnRNP proteins appear to be upregulated in tumors, including one of the hnRNPA3 isoforms, which is upregulated in four of the five tumor samples. Similarly, across all five samples, Pabpc1 reads are higher in tumor samples than in normal tissues, with greater than 2-fold upregulation in lung and kidney samples. Given the emerging theme that polyadenylation is interlinked with other key processes such as transcription elongation and posttranscriptional regulation (64), some of these regulatory marks, such as H3K36Me3, may play a dual role in controlling both transcription and polyadenylation of the isoforms.
In summary, we built a comprehensive polyadenylation and APA map, which is made publicly available as a webserver (johnlab.org/xpad) and as a UCSC track hub (http://www.johnlab.org/xpad/Hub/UCSC.txt). We have highlighted potential uses of the resource in studying polyadenylation-mediated gene regulation and generated new insights into the regulation of polyadenylation, with an emphasis on gene isoform-dependent signatures in cancer. Although these observations will need to be followed up with more detailed studies, xPAD serves as a unique tool to investigate various questions in biology relating to polyadenylation, APA-mediated gene regulation, gene expression analysis, and the discovery of new genes and gene isoforms. We aim to further expand xPAD with additional samples (tissue types, cancer subtypes and cell lines), with the goal of defining gene-isoform signatures that may be useful for the diagnosis or prognosis of specific human diseases, particularly cancer (39). These studies, which provide cell-type-specific annotation and usage of 3′ UTRs, are also expected to help other genome-wide studies of UTRs such as miRNA target analysis (68), which could benefit from incorporating the expression levels of individual 3′ UTR isoforms.
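One way isoform expression could enter a miRNA target analysis is sketched below. Tandem 3′ UTR isoforms share a start but end at different polyadenylation sites, so a target site is present only in isoforms whose 3′ UTR extends past it; summing the usage of those isoforms gives the fraction of a gene's transcripts that carry the site. This is an illustrative sketch under simplifying assumptions (positions counted from the start of the 3′ UTR on the transcript strand, usage fractions summing to one). The function name and all numbers are hypothetical and are not part of xPAD.

```python
from typing import List, Tuple


def fraction_with_site(isoforms: List[Tuple[int, float]], site_end: int) -> float:
    """Fraction of transcripts whose 3' UTR contains a given target site.

    isoforms: (utr_length, usage_fraction) for each tandem 3' UTR isoform.
    site_end: last position of the target site, in 3' UTR coordinates.
    """
    # An isoform carries the site only if its 3' UTR reaches the site's end.
    return sum(usage for utr_length, usage in isoforms if utr_length >= site_end)


# Hypothetical gene: a short isoform (400 nt UTR) used 75% of the time
# and a long isoform (1500 nt UTR) used 25% of the time.
isoforms = [(400, 0.75), (1500, 0.25)]

proximal_site = fraction_with_site(isoforms, 200)  # present in both isoforms
distal_site = fraction_with_site(isoforms, 900)    # present only in the long isoform
print(proximal_site, distal_site)
```

In this toy case a site in the distal part of the UTR would be exposed in only a quarter of the transcripts, which is the kind of information a target prediction that assumes the full-length UTR would miss.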