The numerous, very dissimilar types of bioinformatic data conspire to make integration a central problem for efficient and effective application of biological findings. Integration of data of three particular types is the goal of this paper. Gene splicing is the focus, held up as an example of how sequencing, splicing, and RNA folding data types might be used to guide research that could illuminate major mechanisms of cell biology such as control of levels of ribonucleoprotein species.
Function and dysfunction of gene splicing impact embryogenesis, cell motility and viability, cell cycle arrest, and many other mechanisms of metazoan cell biology [1]. This paper stems from three remarkable observations involving splicing. The spliceosome is a large complex of protein subunits and five ribonucleoprotein subunits, the latter incorporating snRNAs. One of the snRNAs is the 164-nt RNU1. Predicted 2D molecular shapes of RNU1 include four "hairpins," conformations in which paired nucleotides form a double-stranded stem while unpaired nucleotides form a single-stranded loop. The first two of the RNU1 hairpins are already known to be bioactive through functional assays of regulation of the gene cyclin H (CCNH) [2]. The fourth hairpin, denoted herein as H, has a loop of four nts and a stem of 12 pairs of nts including eight C-G bonds (and hence is very stable).
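The hairpin description above (stem pairs, loop size, number of C-G pairs) can be read off a predicted secondary structure mechanically. The following is a minimal sketch, assuming the structure is supplied in dot-bracket notation as produced by common RNA folding tools; the function name `describe_hairpin` and the toy sequence are hypothetical illustrations, not the actual RNU1 hairpin H and not the method used in this paper.

```python
def describe_hairpin(seq: str, structure: str) -> dict:
    """Summarise a single hairpin given in dot-bracket notation.

    '(' and ')' mark paired positions, '.' marks unpaired positions.
    Assumes the structure contains exactly one hairpin (one loop).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    open_positions = []  # indices of '(' not yet matched
    pairs = []           # (i, j) index pairs forming the stem
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((open_positions.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: extra '('")
    if not pairs:
        raise ValueError("structure contains no base pairs")

    # The innermost pair closes the loop; the loop is everything between it.
    i, j = min(pairs, key=lambda p: p[1] - p[0])
    loop_size = j - i - 1

    # Count pairs made of one G and one C (the most stable Watson-Crick pair).
    gc_pairs = sum(1 for a, b in pairs if {seq[a], seq[b]} == {"G", "C"})

    return {"stem_pairs": len(pairs), "loop_size": loop_size, "gc_pairs": gc_pairs}


# Toy example (hypothetical sequence): 4-pair stem closing a 4-nt loop.
summary = describe_hairpin("GGCAUUCGUGCC", "((((....))))")
```

For the hairpin H described in the text, such a summary would be expected to report 12 stem pairs, a loop of 4, and 8 C-G pairs, given the appropriate sequence and predicted structure.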
Our deep sequencing to detect small RNAs in three samples of post-mortem human prefrontal cortex produced abundant reads corresponding to a 16-nt sequence from the 3' side of the stem of H. We denote herein the 16-nt sequence as S.
Regarding small RNA context [3], Kawaji et al. engaged in unbiased exploration of 19- to 40-base sequences from small RNAs. Their pioneering report provided evidence of abundant small RNAs originating from familiar noncoding RNAs (ncRNAs) including tRNAs, snoRNAs, snRNAs, and rRNAs. Regarding tRNAs, 3' end fragments are transported from the nucleus to accumulate in the cytoplasm, as reported by Liao et al. [4]. Bidirectional promoters suggested that small RNAs can be derived from double-stranded RNAs (dsRNAs) with subsequent cleavage. Shi et al. [5] found abundant transcriptional representation of sequences immediately adjacent to (that is, offset from) predicted pre-miRNAs in the simple tunicate Ciona intestinalis (sea squirt). Langenberger et al. [6] also found transcripts offset from miRNAs in human samples, albeit at low levels unrelated to levels of the adjacent miRNAs. Taft et al. [7] first reported ~18 nt RNAs in FANTOM4 data that map within -60 to +120 nt of transcription start sites of genes of humans and other metazoans. Taft et al. [8] then found miRNA-like small RNAs derived from the ends of snoRNAs in humans and other eukaryotes. Moreover, Taft et al. [9] reported 17- or 18-nt RNAs with 3' ends that map precisely to the splice donor site of internal exons of mice and other metazoans. Regarding snoRNAs, Ender et al. [10] assayed human cancer cell RNAs and reported a number of human snoRNAs with miRNA-like processing signatures, evidently targeting an mRNA. Likewise, Saraiya et al. [11] used sequencing to find a 26-nt RNA from the flagellated protozoan Giardia lamblia, again with miRNA-like processing and apparent RNAi activity. Other non-miRNAs of about 16 nts that are subsequences of known miRNAs have been shown by Li et al. to participate in gene regulation, targeting the 3'UTRs of target genes as efficiently as the miRNAs whose sequences enclose them [12]. Importantly, Li et al. documented a long list of small RNAs, some with known sources and some not. In a generalising study, Langenberger et al. [13] discovered from sequencing data that certain small RNA subsequences of a variety of human ncRNAs are highly overrepresented in the transcriptome, extending all the above reports. They analysed low molecular weight RNAs isolated from frozen prefrontal cortex, as did we in preparation of the present report. A rapidly developing line of research on small RNAs derived from tRNAs is represented by work of Haussecker et al. [14].
Additional sources of small ncRNA are the vault RNAs, ~100-nt Pol III transcripts in the enigmatic vault organelles of eukaryotic cells. There are three described human vault RNAs from a cluster on chromosome 5 [15]. Stadler et al. [16] reported differential vault RNA expression in five human cancer cell lines and consensus patterns of small RNAs from vault RNAs across species. Vault particles are associated with multidrug resistance and intracellular transport. Persson et al. [17] discovered that human vault RNAs produce several small RNAs via mechanisms different from the canonical miRNA pathway, but at least one such small RNA associates with Argonaute proteins and guides sequence-specific cleavage of mRNAs to regulate gene expression. In particular Persson et al. discovered regulation of CYP3A4 (one of 57 human cytochrome P450 proteins) in MCF7 cells by a small byproduct of vault RNA transcription. The CYP3A4 enzyme is important in the initial metabolism of many marketed drugs [18]. Importantly, the experiments of Persson et al. might explain the association of abundance of vault particles with drug resistance.
It seems quite likely that nature must put such abundant, selected subsequences of the above types to some purpose, implying unrevealed pathways that are presently without definitive annotations or even realisation [3]. For example, nuclear-localized small RNAs might be epigenetic regulators of gene expression [9]. Thus block patterns of small RNA transcription sources might greatly improve and simplify ncRNA annotation [13].
Regarding neurological bioactivity, Smalheiser et al. [19] discovered in adult mouse hippocampus that certain species of 25- to 30-nt small RNAs derived from specific sites within well known noncoding RNAs were dramatically increased as a consequence of odorant discrimination training. This work reveals the potential importance of byproducts of ncRNA synthesis in neuroscience, possibly a universe of gene regulation parallel to that of the miRNAs.
Consistent with the above prior work, we found that reads representing the 16-nt sequence S appear in every sample more than ten times as frequently as reads from the other three RNU1 hairpins and at frequencies comparable to those of abundant brain miRNAs. Further compounding interest in the 16-nt sequence S from hairpin H are, in the manner of miRNA target predictions, two putative target regions (lengths 9 and 11 nts) in the 3'UTR of the splicing regulator gene SFRS1. Thus the 16-nt byproduct of RNU1 synthesis (from promotion of splicing) might also inhibit expression of SFRS1 (inhibition of splicing or at least inhibition of formation of spliceosome components). This might be a form of auto-regulation essential to homeostasis of splicing. Our neuroscience interests provide focus on the SFRS1 protein product because it modulates several forms of synaptic plasticity considered to be involved in the very essence of memory [20].
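The idea of putative target regions, stretches of a 3'UTR that align with the complement of a small RNA, can be illustrated with a short search routine. The sketch below is only an assumption-laden illustration: it reports maximal stretches of perfect Watson-Crick complementarity (no G-U wobble, no mismatches, no seed weighting), which is simpler than dedicated miRNA target predictors and is not claimed to be the procedure used here. The names `reverse_complement` and `complementary_sites`, the `min_len` threshold, and both toy sequences are hypothetical; they are not the sequence S or the SFRS1 3'UTR.

```python
# Watson-Crick complement table for the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_sites(small_rna: str, utr: str, min_len: int = 7) -> list:
    """Find maximal UTR stretches that pair perfectly with part of small_rna.

    Returns a list of (utr_start_index, utr_subsequence) tuples, where each
    subsequence has length >= min_len and cannot be extended on either side.
    """
    # A UTR site pairs with the small RNA if it equals part of the
    # small RNA's reverse complement.
    target = reverse_complement(small_rna)
    sites = []
    for u in range(len(utr)):
        for t in range(len(target)):
            # Skip starts that could be extended leftwards (not maximal).
            if u > 0 and t > 0 and utr[u - 1] == target[t - 1]:
                continue
            # Extend the match rightwards as far as it goes.
            k = 0
            while (u + k < len(utr) and t + k < len(target)
                   and utr[u + k] == target[t + k]):
                k += 1
            if k >= min_len:
                sites.append((u, utr[u:u + k]))
    return sites


# Toy example (hypothetical sequences): a 16-nt small RNA and a short UTR
# fragment containing one 9-nt stretch complementary to it.
toy_small_rna = "ACGGAUUCGCAGUCCA"
toy_utr = "AAAAUGGACUGCGCCCC"
toy_sites = complementary_sites(toy_small_rna, toy_utr)
```

Applied to a real small RNA and 3'UTR, a routine of this kind would list candidate regions and their lengths; any such candidates would still require functional validation.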
Thus there is a triple intersection of bioinformatics: annotated function of an ncRNA, abundance in brain of a small RNA evidently processed from the same ncRNA source, and sequence alignment of the complement of the same small RNA with the 3'UTR of a major gene having the same function. These in silico coincidences demand investigation of potential miRNA-like mechanisms involving the RNU1 hairpin H, especially with regard to SFRS1. Needed are functional validations of nuclear RNU1 targets. Considering the huge impact of splicing function in nature and dysfunction in disease, elucidation of splicing homeostasis would carry a significant potential for progress toward novel diagnostic tools and drug platforms.
Regarding RNU1 context, hairpins studied by O'Gormann et al. [2] (which do not include S) were found to be bioactive, as mentioned above. Additionally, it has long been known that pre-mRNA splicing can be regulated both positively and negatively by reversible phosphorylation of spliceosomal SR proteins [21, 22]. Thus it would be no surprise that additional layers of complexity might exist to regulate bioactivity of SFRS1 protein. Moreover, Kohtz et al. [23] showed at an early date that SFRS1 protein cooperates with the U1 small nuclear ribonucleoprotein particle (snRNP) in binding pre-mRNA, so there is already a direct, mechanistic link of RNU1 in U1 with SFRS1 protein. However, demonstrating that a small RNA byproduct of RNU1 transcription goes on to bind to SFRS1 mRNA and inhibit expression of that gene would be, to our knowledge, a novel splicing feedback loop discovered by virtue of modern, unbiased sequencing.
In summary, alignments of abundant reads, hairpin structures, and logical targets are known to be important in some cases, and as yet unrecognised alignments are likely to be important in others, provided such colligations can be efficiently discovered.