Ribonucleic acids (RNAs) are continuing to attract increased attention as they are found to play pivotal roles in biological systems. Just as genomics and proteomics have been enabled by the development of effective analytical techniques and instrumentation, the large-scale analysis of non-protein coding (nc)RNAs will benefit as new analytical methodologies appropriate to RNA analysis are developed. Mass spectrometry offers a number of advantages for RNA analysis arising from its ability to provide mass and sequence information starting with limited amounts of sample. This Briefings will highlight recent developments in the field that enable the characterization of RNA modification status, RNA tertiary structures, and ncRNA expression levels. These developments will also be placed in perspective of how mass spectrometry of RNAs can help elucidate the link between the genome and proteome.
Ribonucleic acids (RNAs) are involved in key cellular functions in all living cells. Among all known bio-organic molecules within living cells, RNA molecules are the only ones that store genetic information and act as catalysts. A nonexclusive list of significant cellular RNAs includes mRNAs, which are translated into proteins, and the non-protein-coding RNAs (ncRNAs) such as miRNAs, siRNAs, tRNAs, rRNAs, snRNAs and snoRNAs. Remarkably, the world of ncRNAs continues to grow in size and scientific interest. The latest example includes double-stranded siRNA and single-stranded miRNA, a new class of highly conserved ncRNAs whose functions are generally unknown, but are believed to be important for regulating gene expression by targeting homologous mRNA for cleavage or by interfering with their translation, respectively.1 Another example is the recent evidence, which continues to accumulate, demonstrating that snRNA and snoRNA are significant components of RNA processing complexes, assuming roles previously assigned to proteins.2
Recently, new strategies, termed ‘experimental RNomics’, have been developed which demonstrate that the number of ncRNAs in genomes of model organisms is much greater than previously believed.1 While these strategies can identify ncRNA genes, compatible experimental strategies for characterizing ncRNAs at the posttranscriptional level are lacking. Moreover, recent experimental results, while providing fascinating insights into RNA biology, still do not reveal the functional role(s) of many newly discovered ncRNAs. Clearly, analytical approaches for RNA characterization, which are compatible with existing genomic-scale methods, will enable further investigations into the functional significance of ncRNAs.
Mass spectrometry (MS) offers a number of advantages in biomolecule analysis because of its high sensitivity and its ability to provide both mass and structural information. In recent years, MS has become an indispensable tool in proteomics. In a similar fashion, MS has the potential of becoming just as indispensable in experimental RNomics. Because the Human Genome Project created a significant research interest in the field of sequencing of nucleic acids, including RNAs, by mass spectrometry-based approaches, that subject matter has been covered in numerous reviews.3-9 Sequence characterization of oligonucleotides by tandem mass spectrometry requires precursor ion isolation followed by activation through collisions with a target gas,10-16 and has been enabled by the development of software algorithms which allow for the computer-aided interpretation of oligonucleotide fragment ion data.16-19
While RNA sequencing by MS can be considered a mature technique, there are several other areas of RNA analysis where MS offers unique advantages. This Briefings will focus on recent developments in the use of MS to characterize modified nucleosides in RNA, to determine higher order structures of RNA, and to quantify ncRNA expression levels. These developments will also be placed in perspective of how MS of RNAs can help elucidate the link between the genome and proteome.
A ubiquitous characteristic of RNAs is their propensity to undergo posttranscriptional modification. Posttranscriptional processing of RNA produces an exceptional number and structural diversity of modified nucleosides. There are currently 96 naturally occurring posttranscriptionally modified nucleosides that have been identified.20 Table 1 presents the phylogenetic distribution of these modified nucleosides. As noted from this table, tRNAs are the most abundantly modified RNA, and it is now known that such modifications play significant structural and functional roles.
Without question, MS has been the primary analytical tool used to identify and characterize posttranscriptionally modified nucleosides.21 Initially, intact RNAs were enzymatically digested to nucleosides, and the resulting mixture was analyzed using liquid chromatography-mass spectrometry (LC-MS) approaches. While this approach cannot reveal the sequence placement of any modified nucleosides, the combination of chromatographic retention times and molecular mass measurements is sufficient to identify previously characterized modified nucleosides.
More recently, a number of MS-based approaches have been developed that reveal both the presence of modified nucleosides and their sequence locations within the original RNA. The analytical strategy is based upon a unique attribute of posttranscriptionally modified nucleosides - with only one exception (i.e., pseudouridine), all posttranscriptionally modified nucleosides have a higher molecular mass than their unmodified counterpart (e.g., methylation results in a 14 Da mass increase). Because mass spectrometry is sensitive to changes in molecular mass, the presence of such modifications in oligonucleotides, such as endonuclease digestion products or intact RNAs, is noted by this increase in molecular mass. The analytical strategy for recognizing modified nucleosides is predicated on calculating predicted molecular masses from the gene or cDNA sequence of the RNA of interest. Subsequent analysis by MS is then used to determine whether the predicted masses are present, which signifies that no posttranscriptionally modified nucleosides are in that RNA sample, or whether anomalous masses are detected, which can then be attributed to the presence of one or more posttranscriptionally modified nucleosides, assuming the original gene sequence was correct.
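The comparison of predicted and measured masses described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it considers methylation alone (a monoisotopic shift of 14.01565 Da for each added CH2), and the function names, the 10 ppm tolerance and the limit of four methyl groups are hypothetical choices, not part of any published protocol.

```python
# Monoisotopic mass added by one methylation (net addition of CH2), in Da.
METHYL_SHIFT = 14.01565


def ppm_error(observed, predicted):
    """Signed mass error of observed relative to predicted, in parts per million."""
    return (observed - predicted) / predicted * 1e6


def explain_mass(observed, predicted, tol_ppm=10.0, max_methyl=4):
    """Return the number of methyl groups that reconciles the observed mass
    with the mass predicted from the gene sequence, or None if no count
    from 0 to max_methyl fits within tol_ppm.

    Hypothetical helper: only methylation is considered here; a real
    workflow would test a table of known modification masses.
    """
    for n in range(max_methyl + 1):
        if abs(ppm_error(observed, predicted + n * METHYL_SHIFT)) <= tol_ppm:
            return n
    return None
```

A return value of 0 corresponds to the predicted mass being present (no modification), a positive value to an anomalous mass consistent with that many methylations, and None to a mass that this simple model cannot explain.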
The standard analytical methodology employed is referred to as RNase mapping.22 As typically implemented, isolated RNAs are enzymatically digested with RNA endonucleases (RNases) having high specificity (e.g., RNase T1, which cleaves at the 3′-side of all unmodified guanosine residues) prior to their analysis by MS or tandem MS (MS/MS) approaches. The first approach developed utilized the on-line chromatographic separation of endonuclease digests by reversed phase HPLC coupled directly to ESI-MS.23-26 The presence of posttranscriptional modifications is revealed by mass shifts from those expected based upon the RNA sequence. Ions of anomalous m/z values can then be isolated for tandem MS sequencing to locate the sequence placement of the posttranscriptionally modified nucleoside.
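To make the digestion step concrete, the following is a minimal in silico sketch of RNase T1 cleavage in Python. It assumes complete digestion and that every guanosine is unmodified and therefore cleavable; the function name is hypothetical. Each product ending in G would carry a 3′-phosphate in practice, which this string-level sketch does not represent.

```python
def rnase_t1_digest(sequence):
    """Split an RNA sequence after every G, mimicking cleavage on the
    3' side of guanosine. Assumes complete digestion and no modified G."""
    products = []
    current = ""
    for base in sequence.upper():
        current += base
        if base == "G":
            products.append(current)
            current = ""
    if current:
        # 3'-terminal fragment of the RNA that does not end in G
        products.append(current)
    return products


print(rnase_t1_digest("AUGCCGUAG"))
```

The predicted mass of each product can then be compared against the measured digest, which is the comparison on which RNase mapping rests.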
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has also been used as an analytical approach for obtaining information about posttranscriptionally modified nucleosides.22, 27-30 MALDI-based approaches can be differentiated from ESI-based approaches by the separation step. In MALDI-MS, the mass spectrometer is used to separate the endonuclease digestion products which are often analyzed without any prior fractionation occurring by chromatographic means. Because MALDI-MS generates singly charged ions and most RNase digestion products are of lower (1 - 5 kDa) molecular mass, RNase maps are often obtained directly, even from samples as large as ribosomal RNAs.27
During our developments of various RNase mapping protocols, it was noted that in some instances ions of a particular m/z value would be detected which did not correspond to any predicted RNase digestion products nor could such ions be attributed to posttranscriptional modifications. To clarify the origin of these ions, an analytical approach was developed which incorporated a stable isotope label during the RNase digestion process (Figure 1).27 Product peaks of complete RNase digestion in 50% 18O-labeled water can be readily distinguished from incomplete digestion products and contaminants by the presence of a doublet corresponding to 16O and 18O incorporation at the 3′-terminus of the RNase product. In this manner, all endonuclease-related products are readily identified by the presence of this doublet.
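The doublet criterion lends itself to a simple peak-list filter. The Python sketch below is an assumption-laden illustration, not the published data-processing method: it expects singly charged ions (as in MALDI-MS), uses the 18O/16O mass difference of 2.00425 Da, and applies hypothetical tolerance and intensity-ratio windows to approximate the roughly equal peak heights expected from digestion in 50% 18O-labeled water.

```python
O18_SHIFT = 2.00425  # mass difference between 18O and 16O, in Da


def find_doublets(peaks, tol=0.01, min_ratio=0.5, max_ratio=2.0):
    """Return (light_mz, heavy_mz) pairs separated by one 16O/18O exchange.

    peaks: list of (m/z, intensity) tuples for singly charged ions.
    tol, min_ratio and max_ratio are hypothetical thresholds chosen for
    illustration; the ratio window helps reject natural M+2 isotope peaks,
    which are much weaker than the monoisotopic peak for small oligomers.
    """
    peaks = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(peaks):
        if int_light <= 0:
            continue
        for mz_heavy, int_heavy in peaks[i + 1:]:
            delta = mz_heavy - mz_light
            if delta > O18_SHIFT + tol:
                break  # peaks are sorted, so no later peak can match
            ratio = int_heavy / int_light
            if abs(delta - O18_SHIFT) <= tol and min_ratio <= ratio <= max_ratio:
                doublets.append((mz_light, mz_heavy))
    return doublets
```

Peaks that are not members of any returned pair are candidates for incomplete digestion products or contaminants.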
Several additional advantages of this labeling strategy were then realized. Because the label resides on the 3′-terminus of any RNase digestion product, this label can be used to define all 3′-termini fragment ions during MS/MS analysis.27, 31 This strategy not only simplifies sequence assignment for RNase mapping, but it also facilitates the de novo interpretation of oligonucleotide fragment ions during tandem MS. Additional analytical advantages accrue when this labeling strategy is combined with the high resolution and high mass measurement accuracy (MMA) afforded by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICRMS).31
With the constraints introduced by the digestion specificity of endonucleases (the number of one specific nucleotide is known), 3′-phosphate labeling with 18O, and MMA, nucleobase composition and posttranscriptional modifications can be identified solely by measuring the molecular mass of each RNase product. At an MMA of 10 ppm, the base composition can be uniquely defined for any unmodified oligoribonucleotide up to the 5-mer level with one exception (A5p vs. G5). Defining the presence of a 3′-terminal phosphate group by use of the 18O-label extends these identifications to the 10-mer level. Moreover, when the number of a particular residue is known, such as is the case when RNase T1 is used (G=1), MMA and knowledge of the 3′-phosphate uniquely define the base composition for oligoribonucleotides below mass 11,000, and there are only 5 isobaric oligoribonucleotide compositions within 10 ppm at the 40-mer level or below.
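The way these constraints narrow the list of candidate base compositions can be illustrated with a brute-force search. The Python sketch below is a simplification under several assumptions: the oligomer length is taken as known, the product is a linear, unmodified oligoribonucleotide with a 5′-hydroxyl and a 3′-phosphate, and the monoisotopic residue masses are standard values for nucleoside monophosphates minus water. The function names and the optional `fixed_g` constraint (G=1 for an RNase T1 product) are hypothetical.

```python
# Monoisotopic residue masses (nucleoside monophosphate minus H2O), in Da.
RESIDUE = {"A": 329.05252, "C": 305.04129, "G": 345.04744, "U": 306.02530}
WATER = 18.01056  # monoisotopic mass of H2O, in Da


def oligo_mass(comp):
    """Neutral monoisotopic mass of a linear oligomer bearing a 5'-OH and
    a 3'-phosphate, given a composition such as {"A": 2, "C": 1, ...}."""
    return sum(RESIDUE[base] * n for base, n in comp.items()) + WATER


def compositions(length, fixed_g=None):
    """Yield every A/C/G/U composition of the given length, optionally
    restricted to a known number of G residues."""
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                if fixed_g is not None and g != fixed_g:
                    continue
                yield {"A": a, "C": c, "G": g, "U": length - a - c - g}


def candidates(observed, length, tol_ppm=10.0, fixed_g=None):
    """List compositions whose calculated mass lies within tol_ppm of the
    observed mass."""
    return [
        comp
        for comp in compositions(length, fixed_g)
        if abs(oligo_mass(comp) - observed) / observed * 1e6 <= tol_ppm
    ]
```

Calling `candidates` with a tighter `tol_ppm` or with `fixed_g=1` shrinks the returned list, which is the effect of MMA and digestion specificity that the paragraph above describes.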
Because any modification that increases the mass of an oligoribonucleotide, such as methylation, results in an increase in the number of possible base compositions in the absence of any compositional constraints, the effect of MMA and 18O-labeling on compositional uniqueness of modified oligoribonucleotides was examined.31 With no residue constraints, at a MMA of 5 ppm, a unique base composition can be determined up to the 7-mer level for oligoribonucleotides containing as many as 4 methylations (Table 2). When the number of any one particular residue is constrained, the use of 18O-labeling and MS analysis at a MMA of 10 ppm can uniquely identify methylated oligoribonucleotides up to the 20-mer level so long as there are no more than 4 methylated nucleosides per RNase digestion product. It is anticipated that these strategies will assume a larger role in RNA characterization and provide new opportunities for the development of higher-throughput approaches for identifying posttranscriptional modifications in RNA.32
It is worth noting that MS approaches have also been developed to allow for the detection of pseudouridine within RNA samples. Among all posttranscriptionally modified nucleosides, pseudouridine is the most common and abundant modification. However, because it is a structural isomer of uridine (Figure 2), its presence cannot be inferred by the strategies described above which are all predicated on detecting differences in predicted and measured molecular mass. To overcome this limitation, two general strategies have been developed. The first requires the selective derivatization of pseudouridine, which yields a “mass tag” that is readily identified by MS.28, 30 The second approach takes advantage of the unique C-C (rather than C-N) glycosidic bond of pseudouridine. Because the presence of the C-C glycosidic bond leads to unique fragmentation patterns during MS/MS, an LC-ESI/MS/MS approach was developed using multiple reaction monitoring to identify pseudouridine within endonuclease digestion products.33
In addition to the above-described developments for obtaining the primary structure (i.e., sequence) of RNAs, mass spectrometry is playing an ever-increasing role in the determination of their secondary and tertiary structures. For ncRNAs, structure is intimately related to function, and RNAs play pivotal roles in protein synthesis as well as other cellular processes which involve conformational changes. To date, high resolution RNA structures are generally determined by nuclear magnetic resonance (NMR) or x-ray crystallography. Although these techniques can reveal RNA structural information at the atomic level, obtaining similar information from dynamic assemblies is both difficult and time consuming. Moreover, these two techniques suffer from intrinsic limitations with respect to sample size and crystallization requirements.
A common biochemical approach for examining RNA structures and the conformational changes they experience during interactions with other biomolecules is to probe an RNA with a variety of chemical and/or enzymatic reagents.34, 35 As typically implemented, these approaches require end-labeling of the RNA with a fluorescent or radioactive tag, followed by strand cleavage which is analyzed using polyacrylamide gel electrophoresis (PAGE). This approach can suffer from limitations arising from the labeling step, which renders only those cleavage products possessing a labeled end detectable, and from the PAGE step, which has inherent limitations associated with resolving and/or identifying the cleavage products.
Recently, Fabris and co-workers migrated these probing approaches to a mass spectrometry-based readout platform, leading to their development of an MS3D strategy applicable to RNA structures.36-39 They were able to probe RNA structures and RNA-protein interactions by using the solvent-accessibility reagents dimethylsulfate (DMS), kethoxal (KT) and 1-cyclohexyl-3-(-2-morpholinoethyl)-carbodiimide metho-p-toluene sulfonate (CMCT) in combination with ESI-FTICRMS. These structure probes form covalent adducts with specific nucleobases that are not involved in base pairing or other interactions. The probing results using these reagents correctly reflected the accessible bases in RNA with stable secondary structure (Figure 3).37
As with the identification of posttranscriptional modifications, Fabris also demonstrated that RNase mapping can be used to pinpoint any solvent-accessible base positions by comparing mass values obtained before and after solvent probing. The high resolution and MMA of FTICRMS were exploited to develop a higher-throughput probing strategy based on the simultaneous application of multiple chemical probes to the RNA of interest.36, 38 These researchers have used their RNA-specific MS3D approach to characterize RNA hairpins of the HIV-1 packaging signal and their complexes with the nucleocapsid protein p7,37 as well as the structure of the putative feline immunodeficiency virus ribosomal frameshifting pseudoknot.39 This latter publication is significant as it combines rigorous experimental data obtained from RNA structures with molecular modeling to highlight structural features that cannot be appreciated by examining solvent protection data alone. It is anticipated that this, and other MS3D techniques,40 will assume greater importance in RNA characterization in the future.
The final area in which recent developments in MS offer new experimental approaches of interest to those involved in RNA characterization is the move towards quantitative measurements by MS. Most current methods for RNA quantification target total RNA or mRNA and generally involve reverse transcription to DNA, which is then analyzed.41 A limitation of such methods is that after reverse transcription, qualitative and quantitative information regarding the original RNA editing events (e.g., posttranscriptional modification) is lost. Alternative approaches for quantifying at the RNA level often require time-consuming 2-D electrophoretic or chromatographic separation of individual RNA species, followed by radioactive labeling for identification.42 Correlating with the difficulty of such techniques is the fact that very few organisms (Escherichia coli, Saccharomyces cerevisiae and Mycoplasma capricolum) have had their entire set of tRNAs (one of the most abundant ncRNAs in the cell) characterized and quantified.43-45
An MS-based approach for RNA quantification generally involves total hydrolysis of RNA to nucleosides and subsequent quantification using LC/MS.46, 47 As stable isotope labeling in combination with MS has become a routine quantification approach in proteomics, MS-based quantification of RNA at the oligonucleotide level using 18O labeling has recently been demonstrated in a manner similar to that used in proteomics.48 One RNA sample is digested with RNase T1 in 18O-labeled (“heavy”) water and the second RNA sample is digested with RNase T1 in normal (“light”) water. The two samples are then combined and analyzed by MALDI-MS. Relative ion abundances of the light- and heavy-water digestion products, which are separated by 2 Da due to the isotopic mass of 18O, reveal relative quantification information from the two RNA samples (Figure 4). The accuracy and reproducibility of this approach have been evaluated by examining 18 known RNA samples and 4 unknown RNA samples. The coefficients of variation for quantification were found to be generally below 15% when using MALDI-MS. The approach yields accurate quantitative information for heavy-to-light ratios greater than 1:2.48 Although MALDI-MS is used in the approach, it should be compatible with ESI-MS as well. This approach allows relative quantitation of the original intact RNA and should be readily applicable to the quantitative determination of posttranscriptional modifications.
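The arithmetic behind this relative quantification can be sketched as follows. This Python example is illustrative only and does not reproduce the published data treatment: the function names are hypothetical, and the optional `light_m2_fraction` parameter is an assumed correction for the natural M+2 isotope peak of the light product, which falls at the same nominal mass as the heavy product.

```python
from statistics import mean, stdev


def heavy_to_light_ratio(light_intensity, heavy_intensity, light_m2_fraction=0.0):
    """Heavy-to-light abundance ratio for one digestion product.

    light_m2_fraction is a hypothetical parameter: the intensity of the
    light product's natural M+2 isotope peak relative to its monoisotopic
    peak, subtracted from the heavy peak before the ratio is formed.
    """
    corrected_heavy = heavy_intensity - light_m2_fraction * light_intensity
    return max(corrected_heavy, 0.0) / light_intensity


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100.0


# Example with made-up ratios from three replicate measurements.
replicates = [0.9, 1.0, 1.1]
print(coefficient_of_variation(replicates))
```

Ratios computed for several digestion products of the same RNA can be averaged, and their coefficient of variation gives a measure of reproducibility comparable to the figure quoted above.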
The three areas summarized above provide a foundation by which MS can reveal significant information relating to the structures and expression levels of ncRNAs. Without question, understanding the mechanisms by which genomes yield proteomes will require analytical tools sensitive to changes at the RNA level. For example, posttranscriptional modification has been shown to ensure the proper formation of a coactivator complex required for nuclear receptor signaling.49 As the Systems Biology paradigm begins guiding further investigations of biological processes, the approaches discussed above will become more important and must be capable of even higher-throughput levels of analysis. As readily accessible databases have facilitated the utilization of MS in proteomics, so can it be assumed that a necessary next step will be to coordinate DNA, RNA (including posttranscriptional status) and 3D databases so that all investigators can apply these technique developments to organisms and systems of interest in their own labs. With these tools and techniques, the link between the genome and proteome can be brought to light.
Financial support of PAL in these areas has been provided by the National Institutes of Health (GM58843) and the University of Cincinnati.