Here we have shown a phylogenetically conserved mechanism of 5' ss selection by shifted base-pairing to U1 snRNA, with important implications for genomics, evolution, and human disease. Shifted base-pairing provides a basis for the efficient recognition of a subset of 5' ss that are predicted to be very weak. This unprecedented mechanism also reveals that the interaction between the 5' ss and U1 is not as rigid as previously believed, allowing for alternative base-pairing arrangements that result in efficient splicing. The plasticity of the interaction between the 5' ss and U1 is probably tolerated because the U1 snRNP defines the 5' ss early on, and is displaced from the spliceosome prior to catalysis [1,31]. Furthermore, the 5' ss and U6 snRNA do not appear to show such base-pairing flexibility. Shifted base-pairing between atypical 5' ss and U6 would imply that an extra nucleotide has to be inserted between the 5' ss/U6 helix and the scissile bond. Since the 5' ss/U6 helix is at the spliceosomal catalytic core [35], subtle perturbations of the positioning of this helix could impair catalysis. Thus, whereas U1 has enough flexibility to recognize the atypical 5' ss in a shifted register, U6 probably needs to base-pair in the conventional register to allow the first trans-esterification step to occur at the correct position.
Early in splicing, 5' ss and neighboring sequences are also bound by proteins that influence base-pairing to U1 and hence 5' ss selection [22]. For instance, the U1-snRNP-specific polypeptide U1C binds to the 5' ss prior to base-pairing with U1 [36,37]. Shifted base-pairing between the 5' ss and U1 could also rely on proteins, by mechanisms that might differ from those for canonical base-pairing. In addition, proteins involved in 5' ss selection perhaps account for the differences in splicing patterns seen for different mutations at atypical 5' ss, as well as for the differences in rescue by suppressor U1s (Supplementary Figs. 3 and 4 online).
We ruled out the possibility that atypical 5' ss are recognized by the U1 snRNA variant U1A7 [26] instead of U1. We have shown that suppressor U1A7 snRNAs did not rescue mutations at atypical 5' ss (Supplementary Fig. 5 online), and that the U1A7-specific decoy D7 did not compromise recognition of any 5' ss. Since these atypical 5' ss were the most likely 5' ss to be recognized by U1A7, considering their perfect complementarity (11 bp), our data also suggest that U1A7 is unlikely to function in splicing. Nevertheless, it remains possible that U1A7 is involved in processes other than splicing, as is U1 [18,38,39], or that other U1 variants [26] play a role in 5' ss selection.
A mechanism distinct from shifted base-pairing was proposed for one unusual intron in the HOP2 gene in S. cerevisiae [40]. Mutational analysis of this non-canonical 5' ss suggested that it is recognized via an alternative base-pairing arrangement with U1, involving a bulged nucleotide at position +2 or +3 of the 5' ss. In the case of the human atypical 5' ss we analyzed here, our mutational analyses and suppressor U1 data for position −1 (lanes 4 and 5) rule out the possibility of a bulged nucleotide in the interaction between these atypical 5' ss and U1. The rescue of the −1G mutation in SMN2 by the U1 suppressor C10 indicates that the exonic positions of the atypical 5' ss base-pair to U1 in the shifted register. A register with a bulged nucleotide at the 5' ss is incompatible with this result, because in that arrangement position −1 would not base-pair to position 10 of U1.
Our study leaves open the possibility that other subclasses of atypical 5' ss base-pair to U1 in other 'shifted' registers. We searched SpliceRack [6] for other base-pairing arrangements between 5' ss and U1, by shifting the 5' end of U1 by two or three positions downstream, as well as by shifting it by one to three positions upstream (data not shown). We found very few (15 or fewer) 5' ss for each of these categories. Furthermore, most of these 5' ss can establish a similar number of base pairs to U1 in the canonical register, in contrast to the atypical 5' ss analyzed in this study (Supplementary Table 2 online). We conclude that if other shifted base-pairing arrangements between naturally occurring 5' ss and U1 actually occur, the number of 5' ss recognized by these putative mechanisms should be far lower than the counts for atypical 5' ss presented here. Finally, we did not find any obvious candidate 5' ss that could be recognized by shifted base-pairing to U11 snRNA or to the other two U1 variants [26] (data not shown).
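The register comparison described above can be illustrated with a minimal sketch. It is not the search procedure used in this study: the function names, the choice to count only Watson-Crick pairs, and the convention that a positive shift moves U1 downstream are all assumptions made for illustration. The sequence of the U1 5' end is not quoted in this section and is supplied here as an assumption, with the two pseudouridines written as U.

```python
# Hypothetical sketch: names and scoring rule are illustrative assumptions,
# not the procedure used in the study.

# 5' end of U1 snRNA, written 5'->3', positions 1..11 (assumed sequence;
# pseudouridines at positions 5 and 6 are written as U).
U1_5P_END = "AUACUUACCUG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(exon_end, intron_start, shift=0):
    """Count Watson-Crick pairs between a 5' ss and the U1 5' end.

    shift=0 is the canonical register (+1G opposite U1 position 8).
    shift=+1 moves U1 one nucleotide downstream (+1G opposite position 9).
    Negative shifts move U1 upstream.
    """
    site = (exon_end + intron_start).upper().replace("T", "U")
    plus1_index = len(exon_end)  # index of 5' ss position +1 in `site`
    pairs = 0
    for i, nt in enumerate(site):
        # The two strands are antiparallel, so the U1 position decreases
        # as we move 3'-ward along the 5' ss.
        u1_pos = 8 + shift - (i - plus1_index)  # 1-based U1 position
        if 1 <= u1_pos <= len(U1_5P_END):
            if (nt, U1_5P_END[u1_pos - 1]) in WATSON_CRICK:
                pairs += 1
    return pairs


def scan_registers(exon_end, intron_start, shifts=range(-3, 4)):
    """Return {shift: Watson-Crick pair count} for each register tried."""
    return {s: count_base_pairs(exon_end, intron_start, s) for s in shifts}


# Consensus 5' ss CAG/GUAAGUAU pairs fully in the canonical register.
consensus_canonical = count_base_pairs("CAG", "GUAAGUAU", shift=0)

# Atypical 5' ss AGA/GUUAAGUAU scanned across registers -3..+3.
atypical_counts = scan_registers("AGA", "GUUAAGUAU")
best_shift = max(atypical_counts, key=atypical_counts.get)
```

Under these assumptions, the atypical site AGA/GUUAAGUAU forms far more Watson-Crick pairs with the U1 5' end when U1 is shifted one position downstream than in the canonical register. A real genome-wide search would also need to handle G-Ψ wobble pairs and pair stability, which this sketch ignores.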
Interestingly, a +5 A to G mutation at the atypical 5' ss (AGA/GUUAAGUAU) in intron 2 of the human RARS2 gene results in exon 2 skipping and is associated with pontocerebellar hypoplasia [41]. The pathogenic effects of this mutation, which paradoxically changes a non-consensus to a consensus nucleotide, can now be explained by weakening of shifted base-pairing between this 5' ss and U1: an A-Ψ base pair at position +5 is substituted by a weaker wobble G-Ψ base pair in the shifted register. Indeed, we found that this transition at a similar atypical 5' ss tested in the SMN1/2 context compromised exon 7 inclusion, and exon 7 inclusion could be partially rescued by the U1 suppressor C5, which restored shifted base-pairing (lanes 8 and 9). Thus, shifted base-pairing can explain the effects at the molecular level of the +5 A to G mutation in intron 2 of the human RARS2 gene. These observations further strengthen the shifted base-pairing hypothesis, and highlight its implications for molecular diagnosis of 5' ss mutations [10,41,42].
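The position-by-position reasoning behind the +5 A to G effect and the C5 suppressor can be written out as a short sketch. The helper names are hypothetical, the U1 5' end sequence is an assumption not quoted in this section, and pseudouridine is treated as U for pairing purposes. The sketch only classifies single pairs; it says nothing about splicing efficiency, which was measured experimentally in the SMN1/2 context.

```python
# Hypothetical sketch: helper names are illustrative, and pseudouridine
# is treated as U when classifying pairs (an assumption).

U1_5P_END = "AUACUUACCUG"  # assumed U1 5' end, 5'->3', positions 1..11


def u1_partner(ss_position, shift=0):
    """Return the 1-based U1 position opposite a 5' ss position.

    5' ss positions are numbered ..., -2, -1, +1, +2, ... (there is no 0).
    shift=0 is the canonical register; shift=+1 is the shifted register.
    """
    if ss_position == 0:
        raise ValueError("5' ss numbering has no position 0")
    offset = ss_position - 1 if ss_position > 0 else ss_position
    return 8 + shift - offset


def pair_type(ss_nt, u1_nt):
    """Classify a single 5' ss / U1 nucleotide pair."""
    pair = (ss_nt, u1_nt)
    if pair in {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}:
        return "Watson-Crick"
    if pair in {("G", "U"), ("U", "G")}:
        return "wobble"
    return "mismatch"


def with_substitution(u1, position, nt):
    """Return a copy of the U1 5' end with `nt` at 1-based `position`."""
    return u1[:position - 1] + nt + u1[position:]


# In the shifted register, 5' ss position +5 faces U1 position 5.
partner = u1_partner(+5, shift=1)
wild_type = pair_type("A", U1_5P_END[partner - 1])   # A opposite U1 position 5
mutant = pair_type("G", U1_5P_END[partner - 1])      # G opposite U1 position 5

# A suppressor U1 carrying C at position 5 pairs with the mutant G.
suppressor_c5 = with_substitution(U1_5P_END, 5, "C")
rescued = pair_type("G", suppressor_c5[partner - 1])
```

With these assumptions, the wild-type A at +5 forms a Watson-Crick-type pair in the shifted register, the mutant G forms only a wobble pair, and a C at U1 position 5 restores a Watson-Crick pair. The same mapping places 5' ss position −1 opposite U1 position 10 in the shifted register, and opposite position 9 in the canonical register.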
Atypical 5' ss that are recognized by shifted base-pairing to U1 snRNA are found in a wide range of eukaryotic genomes. Even though the estimated number of these atypical 5' ss in the genome is rather low at present, further experimental analysis of the tolerance of mutations at these 5' ss will very likely expand the set of predicted atypical 5' ss. Furthermore, experimental analysis of the numerous 5' ss sequences that can potentially base-pair to U1 with similar stability in both registers should allow a reassessment of their mechanism of recognition. In addition, characterization of this alternative mechanism of 5' ss selection should prompt a recalculation of the 5' ss motifs recognized in each base-pairing register, as these two categories of 5' ss should have different consensus motifs. This in turn could lead to improved splice-site prediction tools, considering that all current 5' ss scoring methods estimate these atypical 5' ss to be very weak. Finally, this study should facilitate the development of improved algorithms to find genes and exons in sequenced genomes, as well as to predict the effects of disease-causing mutations and SNPs that map to these atypical 5' ss.