Yeast ribosomal protein genes have resisted recent intron loss
Introns are over-represented in the RPGs of both Candida albicans and S. cerevisiae [1], [2]. This shared over-representation may reflect selection to maintain RPG introns before these two species diverged from a common ancestor, but it may also reflect selection acting more recently, after their divergence. The distinction matters because selection to maintain RPG introns in recent history is more likely to be relevant to the biology of S. cerevisiae. To determine whether RPGs have resisted intron loss relative to other genes since the divergence of C. albicans and S. cerevisiae (~200–800 million years ago [10]), we assessed the fates of S. cerevisiae introns in paralogs (gene pairs) that arose ~100 million years ago by whole-genome duplication (WGD) [11]. To trace the fates of introns after genome duplication, we took advantage of the well-annotated genome of S. cerevisiae, which has been exhaustively searched for introns [12], [13]. Using these annotations, we identified 121 intron-containing genes among 554 WGD-derived gene pairs obtained from the Yeast Gene Order Browser [14]. Assuming that intron loss has largely dominated intron evolution in hemiascomycetous yeasts [15], we inferred an intron loss whenever one WGD-derived gene copy had fewer introns than the other. Using this criterion, we compared the number of apparent intron losses in RPG pairs with that in all other gene pairs. Strikingly, this simple accounting revealed that in 16 of the 23 intron-containing non-RPG pairs one gene had fewer introns than its paralog, whereas this was true of none of the 46 RPG pairs. However, this analysis ignores intron losses that occurred independently in both gene copies, and it assumes that no introns were gained.
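The simple paralog accounting above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the `ParalogPair` record, its field names, and the toy pairs are hypothetical, and, like the text, the sketch assumes that any difference in intron number reflects loss rather than gain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParalogPair:
    """One WGD-derived gene pair (hypothetical record layout)."""
    name: str
    is_rpg: bool    # True if the pair encodes a ribosomal protein
    introns_a: int  # annotated intron count in copy A
    introns_b: int  # annotated intron count in copy B


def has_apparent_loss(pair: ParalogPair) -> bool:
    """Infer a loss when one copy has fewer introns than the other (no gains assumed)."""
    return pair.introns_a != pair.introns_b


def tally(pairs: list[ParalogPair]) -> dict[str, tuple[int, int]]:
    """Return {class: (pairs with an apparent loss, intron-containing pairs)}."""
    counts = {"RPG": [0, 0], "non-RPG": [0, 0]}
    for pair in pairs:
        if pair.introns_a == 0 and pair.introns_b == 0:
            continue  # intronless in both copies: uninformative
        key = "RPG" if pair.is_rpg else "non-RPG"
        counts[key][1] += 1
        counts[key][0] += int(has_apparent_loss(pair))
    return {key: (lost, total) for key, (lost, total) in counts.items()}


# Toy data with placeholder names (not real S. cerevisiae genes).
pairs = [
    ParalogPair("rpg-1", True, 1, 1),
    ParalogPair("rpg-2", True, 1, 1),
    ParalogPair("gene-1", False, 1, 0),
    ParalogPair("gene-2", False, 1, 1),
    ParalogPair("gene-3", False, 0, 0),
]
print(tally(pairs))  # {'RPG': (0, 2), 'non-RPG': (1, 2)}
```

Because a pair that lost the intron from both copies is skipped as uninformative, this tally cannot detect double losses.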
To better assess whether WGD-derived RPG pairs have been biased toward either intron gain or loss (including losses from both gene copies), we reconstructed the hypothetical intron distribution of the ancestor that existed before the WGD event. For each of the 554 S. cerevisiae duplicated gene pairs, we assigned the presence or absence of an intron in the hypothetical pre-WGD ancestral ortholog based on intron annotations and predictions from the genomes of pre-WGD (so-called protoploid) species (C. albicans, Lachancea waltii, L. thermotolerans, L. kluyveri, Eremothecium gossypii, Kluyveromyces lactis, and Zygosaccharomyces rouxii) and post-WGD species (Vanderwaltozyma polyspora, Naumovia castellii, C. glabrata, and S. bayanus). A complete list of intron predictions and annotations can be found in Table S1. Our analysis revealed 73 intron-containing genes that were likely present in the pre-WGD ancestor from which the duplicated S. cerevisiae gene pairs descended (). Based on this hypothetical intron distribution of the pre-WGD ancestor, we inferred, for each S. cerevisiae WGD-derived gene pair, whether an intron had been gained or lost after duplication (). From this improved analysis, we identified 5 S. cerevisiae non-RPG pairs that appear to have independently lost introns from both gene copies after gene duplication, in addition to 14 non-RPG pairs in which one of the two introns was lost (, right and middle columns, respectively). Once again, we inferred no intron losses in S. cerevisiae RPG pairs (, left column). Thus, RPG introns appear to have resisted loss in the lineage leading to S. cerevisiae during the last ~100 million years.
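The ancestral-state comparison can be sketched as follows. The text does not spell out the exact rule used to call an intron present in the pre-WGD ancestor, so the rule here is an assumption: an intron is called ancestral only if it is seen in at least one pre-WGD and at least one post-WGD ortholog, which under a loss-dominated model implies that it predates the WGD. The function names and the intron calls in the example are hypothetical.

```python
def ancestor_has_intron(pre_wgd: dict[str, bool], post_wgd: dict[str, bool]) -> bool:
    """Hypothetical parsimony rule: call the intron ancestral only if it is seen
    in at least one pre-WGD and at least one post-WGD ortholog (S. cerevisiae included)."""
    return any(pre_wgd.values()) and any(post_wgd.values())


def classify_pair(ancestral: bool, copy_a: bool, copy_b: bool) -> str:
    """Compare the two S. cerevisiae paralogs with the inferred pre-WGD ancestor."""
    if not ancestral:
        return "candidate gain" if (copy_a or copy_b) else "no intron"
    lost = int(not copy_a) + int(not copy_b)
    return ("retained in both", "lost from one copy", "lost from both copies")[lost]


# Hypothetical intron calls at one intron position.
pre = {"K. lactis": True, "L. waltii": True, "C. albicans": False}
post = {"S. cerevisiae": False, "C. glabrata": True}
ancestral = ancestor_has_intron(pre, post)
print(classify_pair(ancestral, copy_a=False, copy_b=False))  # lost from both copies
```

Unlike the pairwise tally, this comparison can register a loss from both copies, because the reference point is the inferred ancestor rather than the other paralog.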
Next, we asked whether intron gains contributed to the excess of introns in S. cerevisiae RPGs. For a given S. cerevisiae gene, we inferred that an intron was gained if introns were absent from both the pre-WGD ancestor and the majority of post-WGD orthologous gene pairs. Using this criterion, we inferred no intron gains in any S. cerevisiae RPG. In contrast, two introns in non-RPGs (USV1 and BMH2) may have been gained in the S. cerevisiae lineage (Table S1); however, both of these introns are located in the 5′ UTR and are poorly annotated in other species, so this conclusion is tentative. Taken together, the excess of introns in S. cerevisiae RPG pairs appears to be driven not by intron gains in RPGs but by intron losses in non-RPGs.
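The gain criterion stated above can be written as a single predicate. In this sketch, "majority" is read as a strict majority of the other post-WGD species, and each species is reduced to one present/absent call for its orthologous pair. Both choices are assumptions, as are the function name and the example calls.

```python
def inferred_gain(sc_has_intron: bool, ancestral: bool,
                  post_wgd_orthologs: dict[str, bool]) -> bool:
    """Return True if an S. cerevisiae intron is absent from the inferred pre-WGD
    ancestor and from a strict majority of the other post-WGD orthologous pairs."""
    if not sc_has_intron or ancestral or not post_wgd_orthologs:
        return False
    absent = sum(1 for present in post_wgd_orthologs.values() if not present)
    return absent > len(post_wgd_orthologs) / 2


# Hypothetical calls: does each species' orthologous pair carry the intron?
others = {"V. polyspora": False, "N. castellii": False,
          "C. glabrata": False, "S. bayanus": True}
print(inferred_gain(True, False, others))  # True
```

Under this reading, a 2–2 split among four species does not count as a gain. A looser threshold would call more gains.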
Introns repress ribosomal protein gene expression
Having found a bias against RPG intron loss, we sought to determine whether RPG introns function in gene expression. To mimic the effect of RPG intron loss, we created S. cerevisiae intron deletion mutants (henceforth denoted Δi). Each Δi mutant carries a precise deletion of a single RPG intron, such that only an intronless copy of the gene remains at the endogenous locus (see Methods).
Because RPGs are among the most highly expressed genes in the genome, we tested the model that introns are required in cis for high levels of gene expression by comparing the expression profiles of 16 Δi mutants with that of a wild-type strain. We also considered the possibility that Δi mutations may affect other genes in trans, in particular the WGD-derived paralogs of the targeted RPGs. To measure changes in expression of the gene from which an intron was deleted (in addition to 124 RPG and 911 non-RPG features), we used custom splicing-sensitive microarrays designed to detect pre-, mature, and total mRNA species (using intron, junction, and exon probes, respectively [16]). To assess the effect of each Δi mutation, we plotted the expression change of the intronless gene (, red lines) against the distribution of changes for all other genes on the microarray (, boxplots). The most significant expression changes therefore lie outside the whiskers of the boxplot and are, by definition, statistical outliers. As assessed by microarray, intron deletions typically had only modest effects on gene expression (, compare red lines to boxplots). Nonetheless, these effects were biased toward increased expression of the intronless gene (14 of 16) rather than decreased expression ( “up” and “down,” respectively). Moreover, the four most substantial expression changes were increases in expression of the intronless gene ( “outlier”). These data suggest that yeast introns are generally not required for the high expression levels of RPGs. Further, only a few genes showed substantial increases in expression, which suggests that these genes may be spliced less efficiently than most other RPGs.
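Calling changes beyond the whiskers "outliers" depends on the whisker rule. A common convention restricts whiskers to points within Tukey's fences, 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that convention, although the text does not state which rule was used, and it uses hypothetical log2 ratios. Quartiles are computed with linear interpolation (`method="inclusive"`), which is one of several quartile definitions.

```python
import statistics


def tukey_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Lower/upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify_change(change: float, background: list[float], k: float = 1.5) -> str:
    """Label an intronless gene's log2 change against all other genes on the array."""
    low, high = tukey_fences(background, k)
    if change > high or change < low:
        return "outlier"
    if change > 0:
        return "up"
    return "down" if change < 0 else "unchanged"


# Hypothetical log2 changes for the other genes on one array.
background = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
print(classify_change(1.2, background))   # outlier
print(classify_change(0.25, background))  # up
```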
We also sought to determine whether any of the deleted introns were required for splicing regulation. As controls, we deleted the introns of RPS14A and RPS14B, because S14 has long been known to bind the RPS14B intron (but not the RPS14A intron) to inhibit its splicing and promote its rapid degradation [7], [17]. As expected, deleting the RPS14B intron led to a substantial increase in its expression relative to the other genes on the microarray ( “outlier”), whereas deleting the RPS14A intron had little effect on expression ( “down”). Thus, our microarrays are sensitive enough to detect the derepression of RPS14B expression. An unexpected finding was the substantial effect of Δi mutations on the expression of the two genes encoding ribosomal protein S9 (hereafter S9). Our microarray experiments revealed that the RPS9A and RPS9B Δi mutations increased the expression of the intronless genes ( “outlier”) and also decreased the expression of the wild-type paralogs (). We hypothesized that the decreased expression of the wild-type RPS9A and RPS9B genes was caused by reduced splicing efficiency due to negative feedback. We therefore tested whether Δi mutations increased the ratio of pre-mRNA to total mRNA of the wild-type paralogs by calculating their Intron Accumulation Index, a measure of inefficient splicing [18]. Of all the mutants tested by microarray, only the RPS9A and RPS9B Δi mutants showed a substantial increase in the Intron Accumulation Index of the wild-type paralog compared with the other intron-containing genes on the array (, compare blue lines to boxplots). Taken together, these data suggest that the RPS9A and RPS9B genes require their introns to repress their own expression. Further, derepression of RPS9A was accompanied by increased repression of RPS9B through splicing inhibition (and vice versa), suggesting that these genes cross-regulate.
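The sketch below makes the Intron Accumulation Index concrete. It follows the description given here, the change in pre-mRNA relative to total mRNA, and computes it as a difference of log2 mutant/wild-type ratios from intron and exon probes. The exact formulation and normalization in [18] may differ. The function name and the signal values are hypothetical.

```python
import math


def intron_accumulation_index(intron_mut: float, intron_wt: float,
                              exon_mut: float, exon_wt: float) -> float:
    """log2 fold change of intron (pre-mRNA) signal minus log2 fold change of
    exon (total mRNA) signal; positive values indicate relative pre-mRNA build-up."""
    signals = (intron_mut, intron_wt, exon_mut, exon_wt)
    if min(signals) <= 0:
        raise ValueError("microarray signals must be positive")
    return math.log2(intron_mut / intron_wt) - math.log2(exon_mut / exon_wt)


# Hypothetical signals for a wild-type paralog in a Δi mutant versus wild type:
# pre-mRNA doubles while total mRNA halves.
print(intron_accumulation_index(200.0, 100.0, 50.0, 100.0))  # 2.0
```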
Our custom microarray platform is precise, but it lacks the control probe sets needed for highly accurate quantification. As a result, our microarrays “compress” fold-changes relative to equivalent measurements by qPCR. To validate our most surprising observations, we assessed RPS9A and RPS9B expression by RT-qPCR. Importantly, for each gene we placed at least one qPCR primer in the 3′ UTR to maximize specificity and to minimize artifacts caused by primer cross-hybridization to the other gene copy. As expected, qPCR measurements validated our microarray results for both RPS9A and RPS9B in the rps9aΔi and rps9bΔi mutants (, second and third columns). In the rps9aΔi mutant, RPS9A expression increased substantially (>4-fold relative to wild type) and RPS9B expression decreased modestly (<2-fold) (, second column). Conversely, in the rps9bΔi mutant, RPS9B expression increased modestly (<2-fold relative to wild type) and RPS9A expression decreased substantially (>8-fold) (, third column).
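Relative changes such as these are commonly derived from RT-qPCR threshold cycles with the comparative Ct (2^-ΔΔCt) method. The text does not say which quantification model or reference gene was used, so the sketch below is only an illustration under stated assumptions: the reference gene, the Ct values, and the perfect-doubling amplification efficiency are all hypothetical.

```python
def relative_expression(ct_target_mut: float, ct_ref_mut: float,
                        ct_target_wt: float, ct_ref_wt: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target transcript in a mutant relative to wild type.

    With efficiency=2.0 (perfect doubling per cycle) this is the 2^-ddCt formula.
    """
    delta_mut = ct_target_mut - ct_ref_mut  # normalize to a (hypothetical) reference gene
    delta_wt = ct_target_wt - ct_ref_wt
    return efficiency ** -(delta_mut - delta_wt)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the mutant.
print(relative_expression(18.0, 15.0, 20.0, 15.0))  # 4.0
```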
Having validated the surprising effects of deleting the RPS9A and RPS9B introns, we hypothesized that the two genes reciprocally cross-regulate through a shared negative feedback circuit. We made two strong predictions from this hypothesis: (1) deleting both the RPS9A and RPS9B introns should eliminate cross-regulation and therefore derepress both gene copies; and (2) the wild-type gene copy should compensate for a derepressed copy by an equal and opposite change in transcript number. First, to determine whether repression of RPS9A expression in the rps9bΔi mutant required the RPS9A intron (and vice versa), we created a double rps9a/bΔi mutant and tested the effect on expression by RT-qPCR. As predicted, both RPS9A and RPS9B were derepressed in the rps9a/bΔi mutant (, fourth column). Second, we sought to determine whether changes in the number of RPS9A transcripts were compensated by a nearly equal and opposite change in the number of RPS9B transcripts. We first estimated the percentage of S9-encoding transcripts contributed by RPS9A and RPS9B (6% and 94%, respectively) from a published RNA-seq data set from a wild-type strain [19]. To estimate the number of transcripts in each Δi mutant, we then multiplied each gene's percentage of S9-encoding transcripts (as determined by RNA-seq) by its relative change in expression (as determined by qPCR). As predicted, in the rps9aΔi mutant a substantial relative increase in RPS9A expression was nearly equally compensated by a modest relative decrease in RPS9B expression, such that the total number of S9-encoding transcripts was nearly unchanged (, second column). In the rps9bΔi mutant, however, a modest relative increase in RPS9B expression was only partially compensated, at the expense of nearly all RPS9A transcripts (, third column). In this case, RPS9A appears to have defied our prediction, presumably because its small contribution to the total number of S9 transcripts limited how much it could compensate. Lastly, deleting both introns increased the total number of S9-encoding transcripts to 170% of wild-type levels (, fourth column). Taken together, these data suggest that RPS9A and RPS9B reciprocally cross-regulate by a common intron-dependent mechanism. Further, the large relative effects detected for RPS9A compared with RPS9B may simply reflect the large difference in expression level between the two gene copies.
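The transcript-number estimate is a weighted sum. Each gene's wild-type share of S9-encoding transcripts is scaled by its qPCR fold change, and the scaled shares are added together. The sketch below uses the 6%/94% shares from the text. The fold-change values are hypothetical, chosen only to mimic the qualitative patterns described, and are not the measured data.

```python
# Wild-type share of S9-encoding transcripts, from the RNA-seq estimate in the text [19].
WT_SHARE = {"RPS9A": 0.06, "RPS9B": 0.94}


def s9_transcripts(fold_change: dict[str, float]) -> tuple[dict[str, float], float]:
    """Scale each gene's wild-type share by its qPCR fold change (mutant/wild type).

    Returns per-gene transcript levels and their sum, both as fractions of the
    wild-type total number of S9-encoding transcripts (1.0 = wild-type level).
    """
    per_gene = {gene: share * fold_change[gene] for gene, share in WT_SHARE.items()}
    return per_gene, sum(per_gene.values())


# Hypothetical fold changes, chosen only to illustrate the arithmetic.
_, total_a = s9_transcripts({"RPS9A": 5.0, "RPS9B": 0.75})  # rps9aΔi-like pattern
_, total_b = s9_transcripts({"RPS9A": 0.1, "RPS9B": 1.5})   # rps9bΔi-like pattern
print(round(total_a, 3), round(total_b, 3))
```

With these made-up inputs, the rps9aΔi-like pattern gives about 1.0 of the wild-type total and the rps9bΔi-like pattern gives about 1.42. A 50% rise in a gene that contributes 94% of transcripts cannot be offset by losing even all of a gene that contributes 6%.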
Drosophila RpS9 autoregulates through alternative splicing and NMD
Reminiscent of the cross-regulation between S. cerevisiae RPS9A and RPS9B, several metazoan RPGs have been shown to autoregulate through alternative splicing coupled to nonsense-mediated mRNA decay (NMD). In this process, termed “Regulated Unproductive Splicing and Translation” (RUST), the synthesis of productively spliced mRNA is repressed in favor of unproductive mRNA isoforms containing premature termination codons (PTC+) [20]–[23] (reviewed in [24]). Although this process is conserved between distantly related eukaryotes, no gene is known to be regulated by RUST in both yeast and metazoans, which limits mechanistic comparisons. Intriguingly, an alternatively spliced RpS9 PTC+ mRNA isoform was recently identified in Drosophila melanogaster [25]. We therefore considered the possibility that other RPS9 orthologs autoregulate in a manner analogous to RPS9A and RPS9B cross-regulation.
We hypothesized that D. melanogaster RpS9 expression is regulated in response to excess protein production by alternative splicing coupled to NMD. We therefore predicted that increased RpS9 expression would increase the abundance of the PTC+ mRNA isoform. To test this hypothesis, we measured the effect of exogenous RpS9 overexpression and NMD inhibition on alternative splicing of RpS9 messages, using RT-qPCR primer sets specific to endogenous RpS9 mRNA isoforms (). We first verified that the previously identified RpS9 PTC+ isoform is degraded by NMD in S2 cells by RT-PCR amplification of RpS9 transcripts from S2 cells incubated with either of two dsRNAs targeting Upf1 (). To then test the effect of increased RpS9 expression on the abundance of the PTC+ mRNA isoform, we exogenously overexpressed a cDNA copy of RpS9 (). In S2 cells overexpressing RpS9, we detected an increase in the abundance of the PTC+ mRNA isoform (, top panels, compare red and blue points) and a decrease in total endogenous RpS9 expression compared with the empty-vector control (, bottom left panel, compare red and blue points). As expected, this decrease in total endogenous RpS9 abundance in response to increased RpS9 expression was UPF1-dependent (, compare bottom left and right panels, blue points). Taken together, these results suggest that Drosophila RpS9 autoregulates by RUST: excess expression shifts the balance of alternative splicing away from productively spliced messages and toward unproductive RpS9 PTC+ messages that are selectively degraded by NMD.
Diverse forms of RPS9 alternative splicing are associated with structured and conserved RNA sequences
We hypothesized that RpS9 autoregulation serves an important function and would thus be conserved in other animals. Further, because E. coli S4 (the bacterial ortholog of S9) requires an RNA structure to autoregulate by translational repression, we hypothesized that conserved RNA structures are involved in the cross-regulation of RPS9A and RPS9B in S. cerevisiae and in the autoregulation of RpS9 in D. melanogaster. We therefore predicted that RPS9 orthologs would be associated with alternatively spliced mRNA isoforms, conserved RNA structures, and PTCs. To identify such messages, we summarized expressed sequence tag (EST) data from diverse animals. Indeed, EST coverage extends beyond exons into introns, supporting the existence of rare unspliced or alternatively spliced transcripts (<5% of maximum coverage) (, gray bars). To identify ESTs that specifically support alternative splice site usage or cassette exon inclusion, we mapped putative EST exon-exon junctions that spanned both 5′ GT and 3′ AG splice sites (, blue and red bars, respectively). With the exception of Petromyzon marinus, ESTs from various vertebrates (e.g., H. sapiens, Rattus norvegicus, Xenopus tropicalis, Danio rerio, and Oryzias latipes) reveal cassette exons within the last canonical intron that introduce PTCs ( and Figure S1). P. marinus and D. melanogaster ESTs, on the other hand, reveal alternative 5′ splice sites in a homologous intron that also introduce PTCs ( and Figure S1). Most intriguingly, Ciona intestinalis ESTs also support alternative 5′ splice site usage, but in an intron that is not homologous to those of the other animals (). Thus, our surveys of animal ESTs suggest that animal RPS9 orthologs are often alternatively spliced to engage RUST. Further, the conservation of alternatively spliced cassette exons within the last intron among distantly related vertebrates (e.g., ~400 million years of divergence between humans and fish [10]) suggests that these isoforms are functional.
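The junction filter described above can be illustrated with a toy example. The sketch below assumes plus-strand, 0-based, half-open intron coordinates inferred from an EST alignment. It checks each junction for GT...AG intron boundaries and then compares it with annotated introns. The function names, classification labels, and toy sequence are hypothetical. A real pipeline would also need to handle minus-strand genes and alignment ambiguity near splice sites.

```python
def has_gt_ag(genome: str, start: int, end: int) -> bool:
    """True if genome[start:end] (the putative intron) begins with GT and ends with AG."""
    intron = genome[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def classify_junction(genome: str, junction: tuple[int, int],
                      annotated: set[tuple[int, int]]) -> str:
    """Classify one EST-supported intron relative to annotated introns."""
    start, end = junction
    if not has_gt_ag(genome, start, end):
        return "non-canonical"
    if junction in annotated:
        return "annotated"
    starts = {s for s, _ in annotated}
    ends = {e for _, e in annotated}
    if end in ends and start not in starts:
        return "alternative 5' splice site"
    if start in starts and end not in ends:
        return "alternative 3' splice site"
    return "other (e.g., cassette exon boundary)"


# Toy plus-strand gene: exon 1 (0-4), intron (5-18), exon 2 (19-23).
genome = "ATGCC" + "GTAAGTCGTTTCAG" + "GGTAA"
annotated = {(5, 19)}
print(classify_junction(genome, (9, 19), annotated))  # alternative 5' splice site
```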
Also consistent with function, PTC positions in RPS9 orthologs were associated with high nucleotide conservation (). To determine whether RPS9 orthologs were also associated with thermodynamically stable and structurally conserved RNA structures, we screened the gene bodies of RPS9 orthologs for statistically significant RNA structures using RNAz [26] on alignments obtained from the UCSC Genome Browser [27]. To examine both intronic and exonic sequences, we obtained sets of nucleotide alignments from groups of closely related organisms: mammals, drosophilids, teleosts, and hemiascomycetous yeasts. Scanning RPS9 ortholog alignments in 400 bp windows, we identified predicted RNA structures (P>0.9) specifically within the last intron of mammalian, drosophilid, and teleost RPS9 orthologs, each overlapping a PTC position (, green lines and red octagons). Similarly, alignments of RPS9 orthologs from hemiascomycetous yeasts revealed predicted RNA structures specifically within the single yeast intron, which, if unspliced, would introduce a PTC (). Because no sequences similar to the region of the C. intestinalis RPS9 gene containing the PTC in its third intron were available, we did not test this region for conserved elements or predicted RNA structures. In any case, these data indicate that distantly related RPS9 orthologs may autoregulate through different forms of alternative splicing, perhaps mediated by structured RNA elements.
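The windowed scan can be sketched in two small steps. The first tiles an alignment into 400-column windows. The second keeps windows whose structure probability exceeds 0.9 and that overlap a PTC position. RNAz itself is not called here. The per-window probabilities are hypothetical placeholders for RNAz output. The 200-column step size and the function names are assumptions, since the text specifies only the window size and the cutoff.

```python
def sliding_windows(length: int, size: int = 400, step: int = 200) -> list[tuple[int, int]]:
    """Half-open (start, end) alignment-column windows; a final window is added
    if needed so that the alignment tail is also scanned."""
    if length <= size:
        return [(0, length)]
    windows = [(s, s + size) for s in range(0, length - size + 1, step)]
    if windows[-1][1] < length:
        windows.append((length - size, length))
    return windows


def structured_ptc_windows(window_probs: dict[tuple[int, int], float],
                           ptc_positions: list[int],
                           cutoff: float = 0.9) -> list[tuple[int, int]]:
    """Windows with structure probability > cutoff that contain at least one PTC."""
    hits = []
    for (start, end), prob in sorted(window_probs.items()):
        if prob > cutoff and any(start <= pos < end for pos in ptc_positions):
            hits.append((start, end))
    return hits


# Hypothetical probabilities for the first three windows of a 1,000-column alignment.
print(sliding_windows(1000))
probs = {(0, 400): 0.95, (200, 600): 0.5, (400, 800): 0.97}
print(structured_ptc_windows(probs, ptc_positions=[450]))  # [(400, 800)]
```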