The results presented here corroborate and extend previous observations on the connection between intron length and the strength of splicing signals. As expected, longer introns, which might present greater problems for intron definition and, consequently, for efficient splicing than shorter introns, were generally associated with stronger splicing signals. However, there was a notable compensatory relationship between different types of signals, namely, the splice sites and the ESEs. Among relatively short introns (approximately <1.5 kb in length), the enhancement of the splicing signals with increasing intron length seemed to occur within the exons and was manifest in an increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable; what was seen instead was an increase in the strength of the donor and acceptor splice sites. Since the ESEs are located in protein-coding exons, it appears likely that accumulation of A-rich hexamers beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, hence the compensation in the form of evolution of the splice sites themselves toward greater strength.
The threshold separating "short" and "long" introns used here was different from the boundary at ~200 nucleotides that separates introns spliced via intron definition from those spliced via the exon definition [33]. We also repeated the analyses described in the text with an additional partition of short introns at the 200 nucleotide cut-off. The "intron-defined" short introns (<200 nucleotides) were found to have correlations of the same sign as those that are "exon defined" (200 to 1500 nucleotides) but of lower significance (data not shown). Thus, more complex phenomena seem to be at play than, simply, the distinction between the intron and exon definitions. Apparently, even among introns that are spliced via the exon definition, the relative contributions of the splice sites and additional, exonic splicing signals depend on the length of the intron.
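The length classes and the ESE density discussed above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: ESE density is computed as the fraction of overlapping hexamer windows that match a given set, an intron of exactly 1500 nucleotides is placed in the "exon-defined" class, and the hexamer `GAAGAA` is only a placeholder, not a member of any published ESE set. The function names are hypothetical.

```python
def length_class(intron_length: int) -> str:
    """Assign an intron to one of the three length classes discussed in the text.

    Cut-offs (200 and 1500 nucleotides) follow the text; the handling of the
    exact boundary values is an assumption of this sketch.
    """
    if intron_length < 200:
        return "intron-defined short"
    if intron_length <= 1500:
        return "exon-defined short"
    return "long"


def ese_density(exon_seq: str, ese_hexamers: set[str]) -> float:
    """Fraction of overlapping 6-nt windows in exon_seq that match an ESE hexamer.

    This definition of density is an assumption; other normalizations
    (e.g. per nucleotide) are equally plausible.
    """
    seq = exon_seq.upper()
    n_windows = len(seq) - 5
    if n_windows <= 0:
        return 0.0
    hits = sum(seq[i:i + 6] in ese_hexamers for i in range(n_windows))
    return hits / n_windows


# Placeholder hexamer set, for illustration only.
example_hexamers = {"GAAGAA"}
print(length_class(150), length_class(800), length_class(5000))
print(ese_density("GAAGAAGAAG", example_hexamers))
```

Grouping exons by the class of their flanking intron and correlating density with intron length within each group would reproduce the kind of partitioned comparison described above.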
All of the above relationships consistently held for both constitutive and alternative exons. Curiously, however, the connection between splice site strength and intron length was somewhat stronger for the alternative exons whereas the opposite was true of the dependence between ESE density and intron length, which was most pronounced for constitutive exons. This suggests yet another level of compensation between different types of splicing signals.
The correlations between intron length and ESE density that were readily observable for ESE hexamers were not seen when we analyzed a different class of ESEs that have been defined as octamer motifs [23] (Figures S8 and S9; see Additional File 1). However, this observation is somewhat hard to interpret because the octamer ESEs have been identified in non-coding exons [23]. Thus, it remains to be determined whether the octamer ESEs are less important in coding than in non-coding exons or whether they function in a manner different from that of the hexamer ESEs and independent of intron length.
In addition to the above dependences, we observed a correlation between intron length and sequence conservation in the exon ends, especially in synonymous positions, throughout the entire length range of introns (Figure ). This suggests that splicing signals other than the ESE hexamers, which were originally defined for short exons [21], or the ESE octamers [23] analyzed here might exist in exon sequences, particularly those that flank long introns; such signals, which remain to be specifically characterized, might comprise a new class of ESEs.
The greater strength of different types of splicing signals around longer introns could evolve either via a neutral evolution route or as an adaptation. The neutral scenario would apply to a case when an intron becomes shorter as a result of deletion that is followed by splice site amelioration. In contrast, when the length of an intron increases due to an insertion, the splicing signals would evolve under positive selection, adapting to the new situation by restoring splicing efficiency.
All the trends in the relationships between the strength of splicing signals, evolutionary conservation of synonymous positions in exon ends and ESE sites, and intron length that are reported here and elsewhere [6] are manifest in weak correlations that are made statistically (highly) significant by the large, genome-wide size of the analyzed samples of exons and introns. On the one hand, this illustrates the power of genome-scale analysis in detecting subtle but potentially functionally relevant signals in sequences. On the other hand, the dependence of these observations on vast amounts of data makes the analysis susceptible to systematic biases in these data, e.g., those in nucleotide composition. We attempted to eliminate the potential effects of such biases whenever we could discern them and found that the connections withstood the controls, even though some correlations were weakened. The weak correlations between intron size and splice signal strength and exonic sequence features (including ESE density and conservation) suggest that intron definition is governed by multiple signals, some of which remain to be recognized. In particular, it seems likely that some of the signals that are important for efficient splicing of exons flanking long introns reside within those introns. This possibility seems to be compatible with the recent findings that long introns show a greater evolutionary conservation than short introns in Drosophila [35] and that mammalian long introns are enriched in multispecies conserved sequence elements compared to short introns [36].
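The point that a weak correlation becomes highly significant at genome scale can be illustrated numerically. The sketch below assumes a Pearson-type coefficient and uses the standard t-statistic for a correlation with a normal approximation to its null distribution, which is reasonable only for large samples. The coefficient of 0.05 and the sample sizes are arbitrary illustrative values, not results from this study, and the function name is hypothetical.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a correlation coefficient r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    an approximation that is adequate only for large n. Requires n > 2, |r| < 1.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


# The same weak correlation evaluated at increasing sample sizes.
for n in (100, 10_000, 100_000):
    print(n, correlation_p_value(0.05, n))
```

With a fixed weak coefficient, the p-value falls steeply as the sample grows, which is why significance alone does not rule out a small systematic bias as the source of the signal.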
On balance, despite the weakness of the observed correlations, the coherence of different types of signals uncovered in this study, some of which are limited to single-genome analysis (splice site strength and ESE density) whereas others involve sequence conservation in different genomes, strongly suggests that these observations are functionally relevant for the splicing mechanism rather than spurious. In particular, these findings provide an incentive for an experimental search for new types of ESEs.