A ubiquitous characteristic of RNAs is their propensity to undergo posttranscriptional modification. Posttranscriptional processing of RNA produces an exceptional number and structural diversity of modified nucleosides. To date, 96 naturally occurring posttranscriptionally modified nucleosides have been identified.20
The accompanying table presents the phylogenetic distribution of these modified nucleosides. As the table shows, tRNAs are the most abundantly modified RNAs, and it is now known that such modifications play significant structural and functional roles.
Phylogenetic distribution of the posttranscriptional modifications identified in RNAs
Without question, MS has been the primary analytical tool used to identify and characterize posttranscriptionally modified nucleosides.21
Initially, intact RNAs were enzymatically digested to nucleosides, and the resulting mixture was analyzed using liquid chromatography-mass spectrometry (LC-MS) approaches. While this approach cannot reveal the sequence placement of any modified nucleosides, the combination of chromatographic retention times and molecular mass measurements is sufficient to identify previously characterized modified nucleosides.
More recently, a number of MS-based approaches have been developed that not only reveal the presence of modified nucleosides but also identify their sequence locations within the original RNA. The analytical strategy relies on a common attribute of posttranscriptionally modified nucleosides: with only one exception (pseudouridine), every posttranscriptionally modified nucleoside has a higher molecular mass than its unmodified counterpart (e.g., methylation results in a 14 Da mass increase). Because mass spectrometry is sensitive to changes in molecular mass, the presence of such modifications in oligonucleotides, such as endonuclease digestion products or intact RNAs, is revealed by this increase in molecular mass. Recognizing modified nucleosides begins with calculating predicted molecular masses from the gene or cDNA sequence of the RNA of interest. Subsequent MS analysis then determines whether the predicted masses are present, which signifies that the RNA sample contains no posttranscriptionally modified nucleosides, or whether anomalous masses are detected, which can be attributed to one or more posttranscriptionally modified nucleosides, assuming the original gene sequence is correct.
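The mass-comparison logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a published workflow. The residue masses are monoisotopic values computed from elemental composition, the oligonucleotide is assumed to carry a 5′-hydroxyl and a 3′-phosphate, only methylation is considered, and the function names and the 0.01 Da tolerance are hypothetical choices made for the example.

```python
# Monoisotopic masses (Da) of nucleotide residues within an RNA chain,
# i.e. nucleoside 5'-monophosphate minus water (from elemental composition).
RESIDUE_MASS = {
    "A": 329.05252,
    "C": 305.04129,
    "G": 345.04743,
    "U": 306.02530,
}
WATER = 18.01056
METHYL_SHIFT = 14.01565  # net gain of CH2 when H is replaced by CH3


def predicted_mass(sequence):
    """Neutral monoisotopic mass predicted from the gene sequence.

    Assumes a linear oligoribonucleotide with a 5'-hydroxyl and a
    3'-phosphate (an assumed end-group convention for this sketch).
    """
    return sum(RESIDUE_MASS[base] for base in sequence.upper()) + WATER


def explain_shift(sequence, observed_mass, tol_da=0.01, max_methyl=4):
    """Return the number of methyl groups that accounts for the observed
    mass (0 means unmodified), or None if no count up to max_methyl fits."""
    delta = observed_mass - predicted_mass(sequence)
    for n_methyl in range(max_methyl + 1):
        if abs(delta - n_methyl * METHYL_SHIFT) <= tol_da:
            return n_methyl
    return None
```

An observed mass equal to the predicted value returns 0, while a mass about 14.016 Da higher returns 1. Pseudouridine would go undetected here because it has the same mass as uridine.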
The standard analytical methodology employed is referred to as RNase mapping.22
As typically implemented, isolated RNAs are enzymatically digested with RNA endonucleases (RNases) having high specificity (e.g., RNase T1, which cleaves at the 3′-side of all unmodified guanosine residues) prior to their analysis by MS or tandem MS (MS/MS) approaches. The first approach developed utilized the on-line chromatographic separation of endonuclease digests by reversed phase HPLC coupled directly to ESI-MS.23-26
The presence of posttranscriptional modifications was revealed by mass shifts from those expected based upon the RNA sequence. Ions of anomalous m/z values can then be isolated for tandem MS sequencing to locate the posttranscriptionally modified nucleoside within the sequence.
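The expected RNase T1 products, against which measured masses are compared, can be generated in silico. The sketch below assumes complete digestion and cleavage on the 3′-side of every guanosine. It does not model modified guanosines that resist cleavage, and the function name is hypothetical.

```python
from typing import List


def rnase_t1_digest(sequence: str) -> List[str]:
    """Split an RNA sequence after every G, mimicking complete RNase T1
    digestion. Each product except possibly the last ends in G."""
    products = []
    current = ""
    for base in sequence.upper():
        current += base
        if base == "G":
            products.append(current)
            current = ""
    if current:
        # 3'-terminal fragment of the RNA that does not end in G
        products.append(current)
    return products
```

For example, `rnase_t1_digest("AUGCCGUAGGA")` yields the fragments `AUG`, `CCG`, `UAG`, `G`, and `A`. Each fragment's predicted mass can then be compared with the measured masses.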
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has also been used as an analytical approach for obtaining information about posttranscriptionally modified nucleosides.22, 27-30
MALDI-based approaches differ from ESI-based approaches in the separation step. In MALDI-MS, the mass spectrometer itself separates the endonuclease digestion products, which are often analyzed without any prior chromatographic fractionation. Because MALDI-MS generates singly charged ions and most RNase digestion products are of low molecular mass (1-5 kDa), RNase maps are often obtained directly, even from samples as large as ribosomal RNAs.27
During our development of various RNase mapping protocols, we noted that in some instances ions of a particular m/z value were detected that did not correspond to any predicted RNase digestion product and could not be attributed to posttranscriptional modifications. To clarify the origin of these ions, an analytical approach was developed that incorporates a stable isotope label during the RNase digestion process (Figure 1).27
Product peaks of complete RNase digestion in 50% 18O-labeled water can be readily distinguished from incomplete digestion products and contaminants by the presence of a doublet corresponding to 16O and 18O incorporation at the 3′-terminus of the RNase product. In this manner, all endonuclease-related products are readily identified by the presence of this doublet.
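One way to picture the doublet criterion is as a search for peak pairs separated by the 18O/16O mass difference (about 2.004 Da for singly charged ions) with comparable intensities. The sketch below rests on several assumptions: the peak list is already centroided, the tolerance and intensity-ratio window are hypothetical, and natural isotope contributions to the heavier peak are ignored. It is not the procedure used in the cited work.

```python
O18_O16_DELTA = 2.004245  # mass difference between 18O and 16O, in Da


def find_doublets(peaks, charge=1, tol=0.01, ratio_range=(0.5, 2.0)):
    """Find candidate 16O/18O doublets in a centroided peak list.

    peaks: iterable of (m/z, intensity) tuples.
    Returns a list of (light_mz, heavy_mz) pairs whose spacing matches the
    18O/16O difference for the given charge and whose heavy/light intensity
    ratio falls inside ratio_range (assumed window for 50% 18O water).
    """
    spacing = O18_O16_DELTA / charge
    ordered = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue  # skip empty peaks to avoid division by zero
        for mz_heavy, int_heavy in ordered[i + 1:]:
            gap = mz_heavy - mz_light
            if gap > spacing + tol:
                break  # remaining peaks are too far away
            if abs(gap - spacing) <= tol:
                ratio = int_heavy / int_light
                if ratio_range[0] <= ratio <= ratio_range[1]:
                    doublets.append((mz_light, mz_heavy))
    return doublets
```

Peaks that lack a partner at this spacing, such as contaminants or incompletely digested products, are not reported.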
Figure 1. Endonuclease cleavage mechanism. The 3′-cyclic phosphate (I) is an intermediate in this reaction. Because an oxygen from water present in the buffer is incorporated at the 3′-phosphate (II), a facile means of stable isotope labeling is ...
Several additional advantages of this labeling strategy were then realized. Because the label resides on the 3′-terminus of any RNase digestion product, it can be used to identify all fragment ions that contain the 3′-terminus during MS/MS analysis.27, 31
This strategy not only simplifies sequence assignment for RNase mapping, but it also facilitates the de novo interpretation of oligonucleotide fragment ions during tandem MS. Additional analytical advantages accrue when this labeling strategy is combined with the high resolution and high mass measurement accuracy (MMA) afforded by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICRMS).31
Given the constraints introduced by the digestion specificity of endonucleases (the number of one specific nucleotide is known), 3′-phosphate labeling with 18O, and MMA, nucleobase composition and posttranscriptional modifications can be identified solely by measuring the molecular mass of each RNase product. At an MMA of 10 ppm, the base composition can be uniquely defined for any unmodified oligoribonucleotide up to the 5-mer level with one exception (A5p vs. G5). Defining the presence of a 3′-terminal phosphate group by use of the 18O label extends these identifications to the 10-mer level. Moreover, when the number of a particular residue is known, as is the case when RNase T1 is used (G = 1), MMA and knowledge of the 3′-phosphate uniquely define the base composition for oligoribonucleotides below mass 11,000, and there are only 5 isobaric oligoribonucleotide compositions within 10 ppm at the 40-mer level or below.
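The interplay between mass accuracy and knowledge of the 3′-phosphate can be illustrated by enumerating every base composition whose mass falls within a given ppm window. The sketch below is illustrative only. It uses monoisotopic residue masses computed from elemental composition, and it reads "A5p vs. G5" as a pentaadenylate bearing a 3′-phosphate versus a pentaguanylate bearing a 3′-hydroxyl. That reading is an assumption, although the two masses do differ by only about 0.008 Da (roughly 5 ppm) under it. Function and parameter names are hypothetical.

```python
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04743, "U": 306.02530}
WATER = 18.01056
HPO3 = 79.96633  # removed when the 3'-end is a hydroxyl instead of a phosphate


def composition_mass(counts, phosphate=True):
    """Monoisotopic mass of a linear oligoribonucleotide with a 5'-hydroxyl
    and either a 3'-phosphate (phosphate=True) or a 3'-hydroxyl."""
    mass = sum(RESIDUE_MASS[b] * n for b, n in counts.items()) + WATER
    return mass if phosphate else mass - HPO3


def candidates(target_mass, max_length, ppm=10.0, phosphate=None, fixed=None):
    """List (composition, has_3p_phosphate) pairs within `ppm` of target_mass.

    phosphate: True/False if the 3'-end is known (e.g. from the 18O doublet),
               None to consider both possibilities.
    fixed:     optional dict of known residue counts, e.g. {"G": 1} for
               RNase T1 products.
    """
    fixed = fixed or {}
    states = (True, False) if phosphate is None else (phosphate,)
    hits = []
    for a in range(max_length + 1):
        for c in range(max_length + 1 - a):
            for g in range(max_length + 1 - a - c):
                for u in range(max_length + 1 - a - c - g):
                    if a + c + g + u == 0:
                        continue
                    counts = {"A": a, "C": c, "G": g, "U": u}
                    if any(counts[b] != n for b, n in fixed.items()):
                        continue
                    for has_p in states:
                        mass = composition_mass(counts, has_p)
                        if abs(mass - target_mass) / target_mass * 1e6 <= ppm:
                            hits.append((counts, has_p))
    return hits
```

Searching at 10 ppm for the mass of A5 with a 3′-phosphate, without end-group knowledge, returns both A5 with a phosphate and G5 with a hydroxyl. Passing `phosphate=True`, as the 18O doublet would justify, leaves only A5.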
Because any modification that increases the mass of an oligoribonucleotide, such as methylation, increases the number of possible base compositions in the absence of any compositional constraints, the effect of MMA and 18O-labeling on the compositional uniqueness of modified oligoribonucleotides was examined.31
With no residue constraints, at an MMA of 5 ppm, a unique base composition can be determined up to the 7-mer level for oligoribonucleotides containing as many as 4 methylations (see the table below). When the number of any one particular residue is constrained, the use of 18O-labeling and MS analysis at an MMA of 10 ppm can uniquely identify methylated oligoribonucleotides up to the 20-mer level, so long as there are no more than 4 methylated nucleosides per RNase digestion product. It is anticipated that these strategies will assume a larger role in RNA characterization and provide new opportunities for the development of higher-throughput approaches for identifying posttranscriptional modifications in RNA.32
Selected examples of isobaric RNase T1 fragments whose base composition and methylation status can be uniquely defined at the 5 ppm MMA level
It is worth noting that MS approaches have also been developed to detect pseudouridine within RNA samples. Among all posttranscriptionally modified nucleosides, pseudouridine is the most common and abundant. However, because it is a structural isomer of uridine (structures shown below), its presence cannot be inferred by the strategies described above, all of which are predicated on detecting differences between predicted and measured molecular mass. To overcome this limitation, two general strategies have been developed. The first requires the selective derivatization of pseudouridine, which yields a “mass tag” that is readily identified by MS.28, 30
The second approach takes advantage of the unique C-C (rather than C-N) glycosidic bond of pseudouridine. Because the presence of the C-C glycosidic bond leads to unique fragmentation patterns during MS/MS, an LC-ESI/MS/MS approach was developed using multiple reaction monitoring to identify pseudouridine within endonuclease digestion products.33
Structures of uridine and pseudouridine.