Eukaryotic genes contain intervening sequences, or introns, that must be removed from precursor messenger RNA (pre-mRNA) in a complex process termed splicing. During pre-mRNA splicing, relatively short exonic sequences are recognized by the spliceosome, a large RNA-protein complex; introns are removed and exons are joined together to form mature mRNA. In addition to splice site (SS) signals at the exonic 5' and 3' ends, accurate discrimination of exons and introns requires auxiliary elements [1-3]. These conserved but degenerate motifs have been termed exonic (ESEs) and intronic (ISEs) splicing enhancers, which activate splicing, and exonic (ESSs) and intronic (ISSs) splicing silencers, which repress it. These elements are thought to bind splicing regulatory factors, including the serine/arginine-rich (SR) proteins and the heterogeneous nuclear ribonucleoproteins [1]. Consistent with this concept, splicing regulatory motifs were shown to be associated with a single-stranded conformation that is more accessible to protein-RNA interactions [2]. Combinatorial interactions of the splicing factors that bind these motifs are important for both constitutive and alternative splicing of pre-mRNAs because they contribute to the regulation of gene expression and to proteomic diversity across higher eukaryotes [3-6].
Several systematic computational approaches and in vivo or in vitro selection methods have been employed to identify these motifs in genomic sequences. For example, RESCUE-ESE (Relative Enhancer and Silencer Classification by Unanimous Enrichment), a computational approach used in conjunction with experimental validation, predicted specific hexanucleotide sequences as candidate ESEs based on a significantly higher frequency of occurrence in exons than in introns and a significantly higher frequency in exons with weak SSs than in exons with strong SSs [7]. A number of putative exonic enhancer and silencer octamers were computationally identified by their enrichment in internal non-coding exons versus unspliced pseudoexons and 5' untranslated regions of transcripts in intronless genes [8]. A cell-based fluorescence-activated screen (FAS), an in vivo splicing reporter system, was used to identify ESSs that showed consistent silencing in a splicing reporter construct [9]. Evolutionarily conserved intronic splicing regulatory elements were found by examining the intronic boundaries surrounding orthologous exons in Homo sapiens, Canis familiaris, Rattus norvegicus and Mus musculus obtained from UCSC genome-wide multiple alignments [10]. Putative splicing regulatory sequences were reported based on evolutionarily conserved wobble positions between human and mouse orthologous exons, along with the overabundance of sequence motifs compared to their random expectation [11]. Exonic and intronic elements have also been predicted based on strand asymmetry [12]. The Neighborhood Inference (NI) approach predicted ESEs and ESSs with activity in regulating biochemical processes based on the local density of known sites in sequence space [13]. Finally, a recent study based on deep re-sequencing of the human transcriptome [14] uncovered a new repertoire of plausible intronic hexamers supporting tissue-specific splicing events.
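The frequency-based criterion that underlies several of the approaches above (for example, hexamers that occur more often in exons than in introns) can be illustrated with a short sketch. This is not the RESCUE-ESE implementation or any published pipeline: the function names `kmer_counts` and `enrichment_z`, the use of a two-proportion z-score, and the toy sequences are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seqs, k=6):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment_z(fg_seqs, bg_seqs, k=6):
    """Two-proportion z-score of each k-mer's frequency in a foreground set
    (e.g. exons) versus a background set (e.g. introns).

    Illustrative assumption: published methods use their own statistics
    and thresholds, which are not reproduced here.
    """
    fg = kmer_counts(fg_seqs, k)
    bg = kmer_counts(bg_seqs, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    scores = {}
    if n_fg == 0 or n_bg == 0:
        return scores
    for kmer in set(fg) | set(bg):
        p_fg = fg[kmer] / n_fg
        p_bg = bg[kmer] / n_bg
        pooled = (fg[kmer] + bg[kmer]) / (n_fg + n_bg)
        se = sqrt(pooled * (1 - pooled) * (1 / n_fg + 1 / n_bg))
        # A zero standard error means the k-mer is uninformative here.
        scores[kmer] = (p_fg - p_bg) / se if se > 0 else 0.0
    return scores


# Toy input (hypothetical sequences, not real exons or introns).
exons = ["GAAGAAGAAGAA"]
introns = ["TTTTTTTTTTTT"]
scores = enrichment_z(exons, introns)
candidates = sorted(k for k, z in scores.items() if z > 0)
```

In this toy case the hexamers drawn from the "exon" sequence receive positive scores and the poly-T hexamer a negative one. A comparison of exons with weak versus strong SSs could reuse the same function with different foreground and background sets.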
A large fraction of spliceosomal components are highly conserved across eukaryotes, including Tetrapoda (four-footed) organisms [1, 6, 15-17], where the genes encoding well-known RNA-binding proteins involved in splicing regulation are enriched with ultraconserved elements [18]. Three quarters of RESCUE-ESEs are shared between humans and mice [17]. Most of the human RESCUE-ESEs [7] have a pronounced bias towards exonic boundaries in more distantly related vertebrate organisms [17]. A number of experimental reports showed that genes from distantly related Tetrapoda organisms were correctly expressed and post-transcriptionally modified in transgenic animals [19, 20]. These observations suggest that splicing regulatory motifs shared by tetrapods may further enrich the set of known elements for functionally important sequences. However, no systematic study of such shared motifs has been carried out.
In this work, we predict an extensive set of cis-acting elements from a large set of Tetrapoda exons and characterize their overlap with previously identified silencers and enhancers. Unlike previous methods, we did not fix the size of ESE/ISE/ESS/ISS oligomers, except to exclude those longer than 8 nt. Our prediction is based on the assumption that auxiliary splicing elements show a pronounced, statistically significant increase or decrease in density towards the exonic boundaries compared to deep intronic or exonic sequences. This assumption allows the identified elements to be used to improve the performance of splicing prediction methods. Predicted ISEs/ISSs close to annotated exons were examined for increased evolutionary conservation compared to oligomers with no predicted functionality. Finally, we investigated whether these elements, in their sequence context, are associated with single-stranded conformations of the local pre-mRNA structure.
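The density assumption stated above can be made concrete with a minimal sketch that compares how often an oligomer starts near an exon boundary with how often it starts deeper in the flanking sequence. The function names, the 50-nt `near` window and the toy input are hypothetical choices for illustration only. They are not the parameters or the statistical test used in this work, and a real analysis would add a significance test on the two densities.

```python
def motif_starts(seq, motif):
    """Return all (possibly overlapping) start positions of motif in seq."""
    starts = []
    pos = seq.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(motif, pos + 1)
    return starts


def boundary_vs_deep_density(flanks, motif, near=50):
    """Per-position motif density near the exon boundary versus deeper in.

    flanks: sequences oriented so that index 0 is adjacent to the exon
            boundary (an assumption about how the input was prepared).
    near:   hypothetical window size, in start positions, counted as
            "close to the boundary".
    """
    motif = motif.upper()
    m = len(motif)
    near_hits = deep_hits = 0
    near_sites = deep_sites = 0
    for seq in flanks:
        seq = seq.upper()
        n_starts = len(seq) - m + 1  # number of possible motif start positions
        if n_starts <= 0:
            continue
        near_len = min(near, n_starts)
        near_sites += near_len
        deep_sites += n_starts - near_len
        for p in motif_starts(seq, motif):
            if p < near:
                near_hits += 1
            else:
                deep_hits += 1
    near_density = near_hits / near_sites if near_sites else 0.0
    deep_density = deep_hits / deep_sites if deep_sites else 0.0
    return near_density, deep_density


# Toy flank (hypothetical): one GGG at the boundary, none deeper in.
near_d, deep_d = boundary_vs_deep_density(["GGGAAAAAAAAAAAAAAAAA"], "GGG", near=5)
```

A motif whose near-boundary density clearly exceeds its deep density would be a candidate enhancer-like element under this assumption, and the reverse pattern would suggest a silencer-like element.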