Alternative splicing can generate a large number of isoforms from a single pre-mRNA. A well-known example of molecular diversity produced by alternative splicing is the Drosophila Dscam gene, which can potentially generate as many as 38,000 isoforms (46). In humans, most genes are alternatively spliced (47). Most (>70%) isoforms change protein-coding regions and therefore potentially produce novel protein products (25). On this basis, it is frequently argued that alternative splicing provides a mechanism for complex organisms such as humans to generate a large number of novel molecular components from a relatively small number of genes (6).
The basic assumption in this view is that the products of alternative splicing are functional. However, little is known about the protein sequences and resulting protein structures of alternative isoforms. In an effort to assess the functionality of isoforms by other means, numerous bioinformatics studies have analyzed various properties of alternative splicing events. They have found that only a small fraction of alternative splicing events show clear signs of functionality, particularly those exhibiting tissue specificity (48), high expression levels (49) and cross-species conservation (50). Conserved isoforms in particular tend to preserve protein-coding frames and are less likely to be subject to NMD (24). Compared with the large body of literature on the effects of alternative splicing at the transcript sequence level, relatively little is known about its effect on protein structure and function.
A few studies have examined the impact of alternative splicing on the tertiary structure of proteins. A detailed analysis of the structural characteristics of alternative isoforms from the ENCODE project by Tress et al. is largely in agreement with our conclusions: the vast majority of isoforms show little indication of being functional and stable. Homma et al. analyzed the locations of alternative splice sites relative to SCOP domain boundaries. They also found that variants encoding unstable protein products tend to be species-specific and are expressed at significantly lower levels than stable variants (53). Wang et al. examined 3D models of alternative isoforms using threading and molecular dynamics, and concluded that, at least in some cases, isoforms are capable of producing proteins with stable conformations (28).
It is clear that the impact of alternative splicing on protein structure, stability and function remains poorly understood. Although there appears to be general agreement that alternative splicing events conserved across species probably result in stable and functional protein products, there is no consensus regarding species-specific isoforms. In other work, we argue that a large fraction of isoforms are products of occasional mistakes in splice-site selection (29). That hypothesis is supported by the observations that the number of isoforms increases with expression level and with the number of introns in a gene, that most isoforms are expressed at low abundance, and that few show clear tissue specificity (54). The main principle of the noise hypothesis is that high error rates can be tolerated as long as adequate levels of functional products are produced and toxic effects on the system are avoided. If these requirements are satisfied, there is no further selection pressure to reduce the frequency of alternative splicing, and a great diversity of isoforms can be generated. A prediction of this model is that most non-conserved isoforms will be non-functional and will therefore tend to disrupt the implied protein structure in a random manner.
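To make the noise argument concrete, the following toy calculation may help. It is a sketch under simplifying assumptions, not the model used in this study: it assumes that every intron of every transcript is mis-spliced independently with a hypothetical probability `p_error`, and that each error uses one of `n_alt_sites` cryptic sites chosen uniformly at random. Under these assumptions the fraction of fully correct transcripts is (1 - p)^n for n introns, while the expected number of distinct aberrant junctions observed grows with the number of transcripts sampled (a stand-in for expression level) and with intron number. All function and parameter names are illustrative.

```python
"""Toy model of splicing noise (illustrative assumptions, not the study's method).

Hypothetical assumptions: each intron of each transcript is mis-spliced
independently with probability p_error, and a mis-splicing event uses one of
n_alt_sites cryptic splice sites chosen uniformly at random.
"""


def correct_fraction(p_error: float, n_introns: int) -> float:
    """Fraction of transcripts spliced correctly at every intron."""
    return (1.0 - p_error) ** n_introns


def expected_aberrant_junctions(p_error: float, n_introns: int,
                                n_transcripts: int, n_alt_sites: int = 1) -> float:
    """Expected number of distinct aberrant junctions seen among n_transcripts.

    A specific (intron, cryptic site) junction occurs in a given transcript with
    probability p_error / n_alt_sites, so it is observed at least once among
    n_transcripts independent transcripts with probability
    1 - (1 - p_error / n_alt_sites) ** n_transcripts.
    """
    q = p_error / n_alt_sites
    per_junction = 1.0 - (1.0 - q) ** n_transcripts
    return n_introns * n_alt_sites * per_junction


if __name__ == "__main__":
    p = 0.01  # hypothetical per-intron error rate
    for n_introns in (5, 20):
        for n_transcripts in (10, 1000):
            print(f"introns={n_introns:2d} transcripts={n_transcripts:4d} "
                  f"correct={correct_fraction(p, n_introns):.3f} "
                  f"aberrant={expected_aberrant_junctions(p, n_introns, n_transcripts, 3):.2f}")
```

With a 1% per-intron error rate, about 95% of transcripts from a five-intron gene are still spliced correctly (0.99^5 is roughly 0.951), yet the count of distinct aberrant junctions keeps rising as more transcripts are sampled and as intron number increases, which mirrors the qualitative trends described above.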
At the sequence level, we find that many alternative isoforms are predicted to produce proteins that are significantly shorter than those of the corresponding major isoforms. Removing isoforms that are predicted to be subject to NMD does not change this outcome. These findings are in qualitative agreement with other studies of the impact of alternative splicing on protein sequences (28). Wang et al. analyzed alternative isoforms annotated in the SWISS-PROT database and found that deletions account for 57% of all annotated splicing events, while insertions account for only 5%.
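The NMD filter is not specified in this section. One widely used heuristic is the 50-nucleotide rule: a stop codon lying more than about 50 nt upstream of the last exon-exon junction is predicted to trigger NMD. The sketch below assumes that rule (measured from the stop codon's first nucleotide, which is one convention among several) and a hypothetical `Isoform` record, and shows how the fraction of shorter-than-major isoforms can be compared with and without NMD candidates.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Isoform:
    """Hypothetical isoform record; field names are illustrative only."""
    name: str
    exon_lengths: List[int]  # exon lengths in nt, ordered 5' to 3'
    stop_pos: int            # 0-based transcript position of the stop codon's first nt
    protein_len: int         # predicted protein length in amino acids


def predicted_nmd(iso: Isoform, threshold: int = 50) -> bool:
    """50-nt rule (assumed): flag a stop codon lying more than `threshold` nt
    upstream of the last exon-exon junction."""
    if len(iso.exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    last_junction = sum(iso.exon_lengths[:-1])
    return last_junction - iso.stop_pos > threshold


def fraction_shorter(major_len: int, alternatives: Sequence[Isoform],
                     exclude_nmd: bool = False) -> float:
    """Fraction of alternative isoforms whose protein is shorter than the major one."""
    kept = [a for a in alternatives if not (exclude_nmd and predicted_nmd(a))]
    if not kept:
        return float("nan")
    return sum(a.protein_len < major_len for a in kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical toy isoforms sharing one exon structure (not real data).
    isoforms = [
        Isoform("alt1", [100, 200, 300], stop_pos=400, protein_len=150),
        Isoform("alt2", [100, 200, 300], stop_pos=120, protein_len=40),
        Isoform("alt3", [100, 200, 300], stop_pos=280, protein_len=95),
    ]
    print(fraction_shorter(130, isoforms))                    # all isoforms
    print(fraction_shorter(130, isoforms, exclude_nmd=True))  # NMD candidates removed
```

In this toy input only `alt2` is flagged, and removing it moves the fraction from 2/3 to 1/2; on real data the question is whether the skew toward shorter proteins survives the filter.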
Using conservation as a proxy for functionality, we find that small changes in sequence length are more likely to be conserved. Replacements that do not change protein length show the strongest conservation signal. These observations make sense: the smaller the change, the less likely it is to disrupt protein structure, and the more likely it is that function is maintained or a new function is generated. At the level of 3D structure, we compared the impact of in-frame deletions introduced by alternative splicing with that of randomly selected in-frame exon deletions. Random deletions are unlikely to result in a stable protein fold and therefore provide a reference set for testing the viability of deletions observed in real isoforms. We find that isoforms observed only in human show the same distribution as the random deletions for all structural parameters. Deletions that are conserved across multiple species tend to be more structurally conservative: the distances between the ends of the deletion tend to be smaller, and the deletions expose less hydrophobic surface and lose fewer contacts. From this observation, we conclude that most species-specific isoforms are unlikely to result in stable conformations.
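The random reference set and two of the structural parameters can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in this study: exons are given as hypothetical coding lengths in nucleotides, a candidate deletion removes one to three consecutive internal exons whose total length is a multiple of 3, and a contact is a residue pair whose C-alpha atoms lie within 8 angstroms and are at least three positions apart in sequence. The exposed hydrophobic surface would additionally require a solvent-accessibility calculation and is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def in_frame_exon_deletions(exon_lengths: Sequence[int],
                            max_exons: int = 3) -> List[Tuple[int, int]]:
    """List (first, last) runs of consecutive internal exons whose combined
    length is a multiple of 3, so that skipping them keeps the reading frame."""
    n = len(exon_lengths)
    candidates = []
    for first in range(1, n - 1):  # never remove the first or last exon
        total = 0
        for last in range(first, min(first + max_exons, n - 1)):
            total += exon_lengths[last]
            if total % 3 == 0:
                candidates.append((first, last))
    return candidates


def deletion_residue_range(exon_lengths: Sequence[int], first: int,
                           last: int) -> Tuple[int, int]:
    """Map an exon run to a half-open residue range [start, end).

    Assumes exon_lengths are coding lengths starting at the start codon. If the
    run does not begin on a codon boundary, the junction codon changes and the
    range is approximate.
    """
    start_nt = sum(exon_lengths[:first])
    end_nt = start_nt + sum(exon_lengths[first:last + 1])
    return start_nt // 3, end_nt // 3


def sample_random_deletions(exon_lengths: Sequence[int], k: int,
                            seed: int = 0) -> List[Tuple[int, int]]:
    """Draw up to k distinct in-frame exon deletions uniformly at random."""
    candidates = in_frame_exon_deletions(exon_lengths)
    return random.Random(seed).sample(candidates, min(k, len(candidates)))


def end_distance(ca: Sequence[Coord], start: int, end: int) -> float:
    """C-alpha distance between the residues flanking the deleted segment.

    Requires 1 <= start and end < len(ca), i.e. an internal deletion.
    """
    return math.dist(ca[start - 1], ca[end])


def contacts_lost(ca: Sequence[Coord], start: int, end: int,
                  cutoff: float = 8.0, min_seq_sep: int = 3) -> int:
    """Count contacts between deleted and retained residues.

    A contact is a residue pair with C-alpha distance <= cutoff (angstroms)
    that is at least min_seq_sep positions apart in sequence.
    """
    kept = [j for j in range(len(ca)) if j < start or j >= end]
    return sum(
        1
        for i in range(start, end)
        for j in kept
        if abs(i - j) >= min_seq_sep and math.dist(ca[i], ca[j]) <= cutoff
    )
```

Restricting the candidates to internal exon runs mirrors the idea of exon skipping; deletions observed in real isoforms can also come from alternative 5' or 3' splice sites, which this sketch does not model.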
Our analysis of disease genes did not reveal any surprising results. If alternative splicing had a negative impact on the normal functions of these genes, we would expect to observe strong selection pressure to reduce the frequency and severity of such events, since in this set protein function is tightly coupled to fitness. No such pressure was observed. The distributions of the number of alternative isoforms, the fractional abundance of alternative transcripts, the number of NMD isoforms and protein length changes in disease genes were nearly identical to those for all genes. If any pressure exists to reduce the frequency of alternative splicing, it is not particularly pronounced in this set of genes.
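The section does not state how the distributions for disease genes and all genes were compared. One plausible check, shown here only as an assumption, is a two-sample Kolmogorov-Smirnov statistic with a permutation p-value. The function names and the per-gene isoform counts in the usage example are hypothetical.

```python
import random
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(sa) | set(sb)):  # ECDFs only jump at observed values
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def permutation_pvalue(a: Sequence[float], b: Sequence[float],
                       n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation p-value for the KS statistic under exchangeability of a and b."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties with the observed value.
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


if __name__ == "__main__":
    disease = [1, 2, 2, 3, 4, 5, 2, 1]            # hypothetical isoform counts per gene
    all_genes = [1, 2, 3, 2, 4, 1, 5, 3, 2, 2]    # hypothetical isoform counts per gene
    print(ks_statistic(disease, all_genes), permutation_pvalue(disease, all_genes))
```

Because isoform counts are discrete, the permutation p-value is used here rather than the asymptotic KS distribution, which assumes continuous data.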
The evidence presented in this study strongly suggests that the majority of alternative isoforms do not code for functional protein products and have little impact on phenotype, yet they are common, with nearly every gene producing several alternatives. These trends are consistent with noise in the splicing process arising from stochastic fluctuations of various splicing factors. Noise is an inherent part of any complex biological process, and selection will have optimized the fidelity of the splicing system to produce sufficient levels of the functional components and to reduce harmful effects from non-functional products. Evidently, these requirements can be satisfied at a substantial level of noise. An incidental benefit of this high level of noise is that it provides an additional pool of variability from which novel functional forms can emerge.