We have critically defined the transcriptional anatomy of the
Math5 gene, and characterized alternatively spliced mRNAs. In contrast to the adult cerebellum,
Math5 mRNA is not significantly spliced in the developing retina. This conclusion is supported by six independent lines of evidence: (1) Northern analysis; (2) RT-PCR analysis of natural RNAs in the presence of graded betaine concentrations; (3) PCR of IVT-derived RNAs; (4) triplex competitive RT-PCR; (5) EST informatics; and (6) ribonuclease protection assays. Our findings differ sharply from the recent report of Kanadia and Cepko
[18]. Three major factors contribute to the technical artifacts observed by these authors: (1) intense secondary structure in the >85% GC-rich segment of
Math5 RNA and cDNA, which blocks the progression of polymerase enzymes, creating a powerful negative selection; (2) RT template switching
in vitro; and (3) the existence of a vanishingly small population of aberrantly spliced
Math5 mRNAs. In view of these results, further investigation of
Ngn3 splicing may be warranted (
Figure S4).
The GC-rich coding segment of
Math5 evidently forms a “Gordian knot” of secondary structure, so dense that it favors the amplification of minor cDNA products, representing less than 1% of
Math5 molecules. G+C sequence bias is a well-known problem in cDNA profiling studies
[42],
[43]. The folded hairpin structure of
Math5 mRNA is relaxed in the presence of betaine.
In vivo, local melting is presumably catalyzed by DNA- and RNA-binding proteins, allowing
Math5 replication, transcription and translation. However, the tight RNA secondary structure may have consequences for Math5 protein expression. For example, translation may require specific mRNA unwinding activity, creating another potential mode of post-transcriptional regulation
[44]. Indeed, mRNA hairpins are known to impede ribosome elongation
[45] and G+C content is inversely correlated with translation efficiency
[46]. If translation of the GC-rich
Math5 mRNA were hypersensitive to ribosome functional status, this may contribute to the disruption of RGC development in
Bst/+ mice, which have a mutation in the
Rpl24 riboprotein gene and severe optic nerve hypoplasia
[47].
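As an illustration of how GC-rich segments of the kind discussed above might be located before primers or probes are designed, the following Python sketch scans a sequence with a sliding window and reports windows at or above a GC threshold. The function names, the 50 nt window, the 10 nt step and the toy sequence are hypothetical choices made for this example and are not taken from this study; only the 85% threshold echoes the text.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq: str, window: int = 50, step: int = 10,
                    threshold: float = 0.85):
    """Return (start, end, gc) for each window whose GC fraction >= threshold.

    Window size and step are illustrative assumptions, not values from the paper.
    """
    hits = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


if __name__ == "__main__":
    # Toy sequence (NOT Math5): AT-rich flanks around a GC-rich core.
    toy = "AT" * 40 + "GC" * 50 + "AT" * 40
    for start, end, gc in gc_rich_windows(toy):
        print(f"{start}-{end}: GC = {gc:.2f}")
```

A scan of this kind only flags base composition. It does not predict secondary structure, which would require a dedicated RNA folding program.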
On the basis of these results, we believe that the most likely explanation for the plethora of deleted
Math5 cDNAs is RNA template switching during the reverse transcriptase reaction, at points of sequence micro-homology (
Table S3)
[33]. Indeed, RT polymerases are required to switch templates during normal retroviral replication, as part of the first and second transfer steps
[48]. Aberrant switching
in vivo can generate intramolecular deletions, and the frequency is positively correlated with the amount of RT pausing
[49] and RNaseH activity
[50]. In practice, template switching and related phenomena are well-known hazards in PCR-based expression studies, and have been collectively termed “RT-facts”
[30],
[51],
[52],
[53],
[54].
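One way to screen a deleted cDNA clone for the signature of template switching is to ask whether its deletion junction lies within a short direct repeat. The Python sketch below does this by comparing the longest shared prefix and suffix between a reference sequence and the clone; any overlap between the two is sequence that could have come from either side of the deletion, that is, micro-homology. It assumes the clone differs from the reference by a single internal deletion and nothing else. The function name, the returned fields and the toy sequences are hypothetical, and this is not the procedure used to compile Table S3.

```python
def junction_microhomology(reference: str, clone: str):
    """Describe a single internal deletion and any micro-homology at its junction.

    Returns None if the clone is not shorter than the reference, or if it cannot
    be explained by one clean deletion (an assumption of this sketch).
    """
    reference, clone = reference.upper(), clone.upper()
    deleted = len(reference) - len(clone)
    if deleted <= 0:
        return None

    # Longest prefix shared by reference and clone.
    p = 0
    while p < len(clone) and reference[p] == clone[p]:
        p += 1

    # Longest suffix shared by reference and clone.
    s = 0
    while s < len(clone) and reference[-1 - s] == clone[-1 - s]:
        s += 1

    if p + s < len(clone):
        return None  # mismatches remain, so this is not a single clean deletion

    # Bases covered by both the prefix and the suffix match are ambiguous:
    # they could derive from either side of the deletion.
    left = len(clone) - s
    return {
        "deleted": deleted,             # number of reference bases missing
        "left_breakpoint": left,        # leftmost possible deletion start
        "homology": clone[left:p],      # repeated sequence at the junction
    }


if __name__ == "__main__":
    # Toy sequences (NOT Math5): a 4 nt direct repeat flanks the deleted block.
    a, h, x, b = "ACTACTA", "GCGG", "TTATTATT", "ACACTA"
    ref = a + h + x + h + b
    cdna = a + h + b
    print(junction_microhomology(ref, cdna))
```

In homopolymer or low-complexity sequence the reported homology can be long by chance, so junctions found this way would still need to be compared against the repeat length expected for the local base composition.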
The process of eukaryotic splicing produces a variety of functional and nonproductive mRNAs during normal gene expression. While alternative splicing greatly extends the genetic repertoire
[55], particularly in the nervous system
[56], a significant fraction of Pol-II transcripts are mis-spliced, such that no protein or stable RNA species is synthesized, similar to the ECO isoform. Frequent errors include exon skipping, intron retention, and activation of cryptic splice sites. The resulting aberrant RNAs may outnumber correctly spliced mRNAs among initial spliceosomal products
[57],
[58]. For protein-coding genes with multiple exons, the majority of aberrant RNAs contain a premature termination codon (PTC) and are degraded through the nonsense-mediated decay (NMD) pathway
[59]. This is not generally possible for single-exon genes, which require distinct quality control mechanisms to eliminate defective mRNAs
[60]. The intronless class represents 5–15% of mammalian genes
[61],
[62] and includes histones, GPCRs and many Zn finger, HMG, and bHLH domain transcription factors.
The process of splice site recognition is also far more complicated than the local pairing of 5′ and 3′ consensus sequences. It requires the
holo definition of exon or intron elements in context, with integration of multiple splice enhancer and silencer effects
[63],
[64],
[65],
[66]. Accordingly, intronless genes may have selectively acquired sequence features that resist mRNA splicing
[67],
[68],
[69]. Detailed sequence comparisons of intronless vs. intron-containing human genes have revealed differences in oligonucleotide frequencies and context-dependent codon biases
[67]. The most striking characteristic of intronless genes in this analysis was the overrepresentation of GC-rich 4- to 6-mers, after correcting for base composition. The
Math5 cDNA matches this pattern extremely well (not shown), exhibiting sequence features that are characteristic of intronless genes. Moreover, the GGG triplet, which binds U1 snRNP as an intronic splice enhancer
[70],
[71], is depleted within the
Math5 coding region, despite the high G+C content. These global compositional features are not considered by the Spliceport algorithm that was used by Kanadia and Cepko to predict
Math5 splice sites. This web-based tool performs statistical analysis of
k-mers in a 160 nt window surrounding putative donor and acceptor sites, based on human genome search data
[72]. The analysis predicted the alternative Cb splice acceptor, which is utilized at low frequency in the adult cerebellum (FGA score = 1.33); however, the Cb donor site was not identified and statistical support for donor sites in the
Math5 transcript was relatively low (max FGA score = 0.26). Indeed, the mouse genome contains many more weak, potential splice sites than are actually utilized
in vivo.
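The idea of judging a k-mer such as GGG against base composition can be made concrete with an observed/expected ratio. The Python sketch below counts overlapping occurrences of a k-mer and compares them with the count expected if bases were drawn independently at the frequencies seen in the same sequence. This zero-order background model is an assumption made here for simplicity; the published comparison of intronless and intron-containing genes [67] may have used a different correction. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from math import prod


def kmer_obs_exp(seq: str, kmer: str):
    """Return (observed, expected, ratio) for one k-mer in one sequence.

    Expected counts assume independent bases at the sequence's own
    frequencies (a zero-order model, chosen for illustration only).
    """
    seq, kmer = seq.upper(), kmer.upper()
    k = len(kmer)
    n_positions = len(seq) - k + 1
    if k == 0 or n_positions <= 0:
        raise ValueError("sequence must be at least as long as a non-empty k-mer")

    # Overlapping occurrences of the k-mer.
    observed = sum(seq[i:i + k] == kmer for i in range(n_positions))

    # Probability of the k-mer under the independent-base model.
    counts = Counter(seq)
    p_kmer = prod(counts[base] / len(seq) for base in kmer)
    expected = n_positions * p_kmer

    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


if __name__ == "__main__":
    # Toy sequence (NOT Math5): GC-rich but arranged so that GGG never occurs.
    toy = "GCGCGGCCGCGC" * 5
    obs, exp, ratio = kmer_obs_exp(toy, "GGG")
    print(f"GGG observed={obs} expected={exp:.2f} ratio={ratio:.2f}")
```

A ratio well below 1 in a G+C-rich sequence would correspond to the kind of depletion described for the GGG triplet; a ratio above 1 would indicate overrepresentation after accounting for base composition.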
Among the numerous
Math5 species reported by Kanadia and Cepko, only one PCR product, termed ECO, is compatible with mRNA splicing. On the basis of our results, we believe this solitary cDNA is derived from an aberrantly spliced transcript, which has escaped normal quality control. First, the RNA encodes no protein and has no demonstrated function. In other contexts, long ncRNAs such as
Xist and
Air have been shown to have regulatory roles
[19], and a small number of bifunctional mRNAs have alternate coding and noncoding isoforms
[21]. Second, the ECO isoform is very rare, representing less than 1% of
Math5 mRNA, and is thus unlikely to have a significant role in regulating
Math5 function or modulating retinal cell fate determination.
An intriguing result from our study is the discovery that 11% of mature
Math5 transcripts in the adult cerebellum are
bona fide spliced mRNAs. These are predicted to encode a shorter Math5 protein, which lacks 20 amino acids from the C-terminus and may exhibit unique molecular properties (
Figure S3). However, its function is not known, and
Math5 mutants have no overt cerebellar phenotype
[17].
Despite the intriguing hypothesis advanced by Kanadia and Cepko, our results show that splicing of Math5 mRNA into noncoding isoforms does not occur in the developing retina at levels greater than 1% of transcripts. Further studies are needed to determine the exact mechanism of Math5 action, how progenitors are transformed into neurons, and how noncoding RNAs, including microRNAs, may regulate Math5 expression, RGC development, and the diversification of ganglion cell subtypes.