Alternative pre-mRNA splicing endows genes with the potential to produce a menagerie of protein products. After pre-mRNA is transcribed, a complex system of regulation determines which one of several possible versions of mature mRNA will be produced (reviewed in [1]). Alternative splicing is particularly important in human gene expression, as it affects half or more of human genes [2,3]. The diversity-generating capacity of alternative splicing can be staggering: one notable example, the dscam gene of Drosophila melanogaster, is hypothetically capable of producing 38,016 unique alternative isoforms [4]. However, functional roles for most alternative isoforms remain undiscovered.
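The dscam figure can be reproduced as a simple product over clusters of mutually exclusive exons. The sketch below assumes the commonly reported cluster sizes for the fly gene (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17); these counts are not given in the text above, and the names DSCAM_CLUSTERS and isoform_count are illustrative only.

```python
from math import prod

# Assumed numbers of mutually exclusive alternatives per exon cluster.
# These counts come from the wider literature, not from this text.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(cluster_sizes):
    """Number of isoforms when exactly one exon is chosen from each cluster."""
    return prod(cluster_sizes)


total = isoform_count(DSCAM_CLUSTERS.values())
print(total)  # 38016
```

Because one alternative is chosen independently from each cluster, the count grows multiplicatively, which is why a handful of clusters suffices to reach tens of thousands of hypothetical isoforms.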
It has been known for more than a decade that nonsense and frameshift mutations that induce premature termination codons (PTCs) can destabilize mRNA transcripts in vivo [5,6]. First investigated in yeast and humans, this process, known as nonsense-mediated mRNA decay (NMD), was subsequently observed in a wide range of eukaryotes and is now thought to occur in all eukaryotes [7]. How cells manage to distinguish a premature termination codon from a normal termination codon has been the subject of intense investigation. Important details have emerged that establish the following mechanistic model for NMD in mammals (Figure ).
During pre-mRNA processing, the spliceosome removes intron sequences. As this occurs, a set of proteins called the exon-junction complex is deposited 20-24 nucleotides upstream of the sites of intron removal [8-11]. The components of this complex serve the dual roles of facilitating export of the mature mRNA to the cytoplasm and remembering the gene structure [12]. According to the current model, as a ribosome traverses the mRNA in its first, pioneering round of translation, it displaces all exon-junction complexes in its path [13-16]. For normal mRNAs, whose termination codons lie in or near the final exon, the ribosome will have displaced all exon-junction complexes by the time it terminates. By contrast, if any exon-junction complexes remain when the ribosome reaches the stop codon, a series of interactions ensues that leads to the decapping and degradation of the mRNA. This model explains the basis of the '50 nucleotide rule' for mammalian NMD: if a termination codon is more than about 50 nucleotides upstream of the final exon-exon junction, it is recognized as a PTC and the mRNA that harbors it will be degraded [17]. The mechanisms for NMD differ among yeast [18], flies [19], and mammals, and may be different still in other eukaryotes.
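The '50 nucleotide rule' lends itself to a short computational check. The following is a minimal sketch, not a description of any published pipeline: it assumes a spliced transcript is described by its exon lengths in 5' to 3' order, that the stop codon position is given as the mRNA coordinate just past its last base, and that distance is measured from there to the final exon-exon junction. The function names and the exact distance convention are assumptions made for illustration.

```python
NMD_THRESHOLD_NT = 50  # approximate distance used by the '50 nucleotide rule'


def exon_junction_positions(exon_lengths):
    """Return the mRNA coordinate of each exon-exon junction.

    A junction's coordinate is the summed length of all exons upstream of it.
    """
    positions = []
    total = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        total += length
        positions.append(total)
    return positions


def is_ptc(exon_lengths, stop_codon_end, threshold=NMD_THRESHOLD_NT):
    """Apply the 50 nucleotide rule to one spliced transcript.

    stop_codon_end is the mRNA coordinate just past the stop codon
    (an assumed convention). Returns True if the stop codon lies more
    than `threshold` nucleotides upstream of the final junction.
    """
    junctions = exon_junction_positions(exon_lengths)
    if not junctions:
        # Single-exon transcript: no junctions, so the rule cannot apply.
        return False
    return junctions[-1] - stop_codon_end > threshold


# Hypothetical three-exon transcript with junctions at 200 and 350.
exons = [200, 150, 300]
print(is_ptc(exons, stop_codon_end=250))  # True: 100 nt upstream of the last junction
print(is_ptc(exons, stop_codon_end=320))  # False: only 30 nt upstream
print(is_ptc(exons, stop_codon_end=500))  # False: stop codon is in the final exon
```

Only the last junction needs to be examined: a stop codon that is more than the threshold distance upstream of any junction is necessarily at least that far upstream of the final one.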
Degradation of PTC+ mRNAs is generally thought to occur as a quality-surveillance system, preempting translation of potentially dominant-negative, carboxy-terminally truncated proteins [20]. PTC+ transcripts are aberrantly produced in several ways. The somatic recombination that underlies immune-system diversity frequently generates recombined genes whose transcripts contain a PTC [21]. Inefficient or faulty splicing will often generate a frameshift in the resulting mRNA, bringing a PTC into frame. Also, the high processivity of RNA polymerase yields a relatively high error rate of 1 in 10,000 bases [22,23], commonly introducing premature stops. DNA mutations are a source of potentially heritable PTCs. It is estimated that 30% of inherited disorders in humans are caused by a PTC [24]. The numerous diseases whose pathogenesis has been linked to NMD-inducing PTC mutations include aniridia due to the PAX6 gene [25], Duchenne muscular dystrophy due to the dystrophin gene [26], and Marfan syndrome due to the FBN1 gene [27].
In addition to its quality-control role in degrading aberrantly produced PTC+ mRNAs, NMD has also been shown experimentally to act on a handful of wild-type PTC+ mRNAs [28-35]. In Caenorhabditis elegans, for example, expression of the ribosomal proteins L3, L7a, L10a and L12 and the SR proteins SRp20 and SRp30b is regulated posttranscriptionally via the coupling of alternative splicing and NMD [31,32]. In each case, productive isoforms were shown to be produced in vivo, as well as unproductive isoforms with a PTC. Regulated splicing to generate the unproductive isoforms is used as a means of downregulating protein expression, as these mRNA isoforms are degraded by NMD rather than translated to make protein. This system, which we have termed regulated unproductive splicing and translation (RUST), is also used in humans [28-30]. For example, the SR protein SC35 has been shown to autoregulate its expression using RUST [29]. When levels of SC35 protein are elevated, SC35 binds its own pre-mRNA, inducing the production of PTC+ SC35 mRNA. The PTC+ SC35 mRNA is destabilized by NMD, resulting in lower levels of SC35 protein. A similar autoregulatory RUST system was also recently discovered to control production of polypyrimidine tract binding protein (PTB) [35].
In a previous study, we found that 35% of human mRNA alternative isoforms reliably inferred from expressed sequence tags (ESTs) are PTC+, rendering them apparent targets of NMD (see [36] and a conference report at [37]). Therefore, many wild-type alternative mRNA isoforms may not be translated into functional protein, but instead are targeted for degradation by NMD. The vast majority of PTC+ isoforms identified in that study represent previously unrecognized potential targets of NMD. However, EST databases contain expressed sequence for many isoforms that are otherwise uncharacterized. Therefore, it was not obvious how many of the isoforms identified in that study as PTC+ were functionally relevant or even previously known. It was also not obvious to what extent those PTC+ isoforms represented instances of RUST regulation or simply errors or deregulation in pre-mRNA processing. Regardless, it is clear that NMD has a vital role in regulating mammalian gene expression, as inhibition of NMD is embryonic-lethal in mice [38].
To understand the biological significance of PTC+ isoforms and the prevalence of NMD action on wild-type transcripts, it is necessary to expand beyond existing isolated RUST examples, while retaining a focus on functionally characterized genes. For this reason, we analyzed the human alternative isoforms described in the SWISS-PROT database. Common routes by which gene isoform sequences are determined and entered into databases include the cloning of intronless mini-genes and the sequencing of unexpected PCR bands. By either method, gene structure cannot be directly observed, and therefore PTCs may be overlooked. Further computational and experimental analyses will also often be oblivious to these features. Because the cloning and characterization of many isoforms predate our current understanding of NMD action, we hypothesized that unrecognized potential targets of NMD may be present even in curated databases like SWISS-PROT. We found that many of these alternative protein isoforms derive from PTC+ mRNAs. This is particularly surprising as SWISS-PROT is a heavily curated database of expressed protein sequences. According to the current NMD model, these PTC+ mRNAs should be degraded, and therefore the protein isoforms should not be expressed at high abundance. To resolve this apparent conflict, we examined existing experimental evidence and found that, in several cases, results described in the scientific literature are readily explained by NMD action.