Thermodynamic parameters for GU pairs are important for predicting the secondary structures of RNA and for finding genomic sequences that code for structured RNA. Optical melting curves were measured for 29 RNA duplexes with GU pairs to improve nearest neighbor parameters for predicting stabilities of helixes. The updated model eliminates a prior penalty assumed for terminal GU pairs. Six duplexes with the 5′GG/3′UU motif were added to the single duplex containing this motif in the previous database. This revises the ΔG°37 for the 5′GG/3′UU motif from an unfavorable 0.5 kcal/mol to a favorable −0.2 kcal/mol. Similarly, the ΔG°37 for the 5′UG/3′GU motif changes from 0.3 to −0.6 kcal/mol. The correlation coefficients between predicted and experimental ΔG°37, ΔH°, and ΔS° for the expanded database are 0.95, 0.89, and 0.87, respectively. The results should improve predictions of RNA secondary structure.
The explosion of biological data in the genomics era has filled databanks with large amounts of genetic information. Understanding these data and finding correlations among them are vital for advancing biology and medicine, which requires accurate methods in bioinformatics and computational chemistry. One important problem that bioinformatics and computational chemistry address is finding, predicting, and determining RNA structure from sequence.1
RNA participates in a variety of cellular functions involving gene expression and regulation. RNA typically folds in a hierarchical way.2,3 Base pairs form to generate motifs such as helixes and loops. Higher order interactions between these features result in three-dimensional structures. Knowledge of secondary structure is therefore critical for the prediction of tertiary structure. Secondary structure prediction algorithms utilizing experimental thermodynamic data4−9 have relied on nearest neighbor models.10−13 Finding regions of genome sequences that code for structured RNA often also relies on nearest neighbor models.1,14−16 Because an RNA molecule and its reverse complement can fold similarly, the thermodynamics of GU pairs provide information about the reading direction: the complement of a GU pair is a CA pair, which is less stable.17
Prediction of GU pairs is also important because they are the most common non-Watson–Crick pair and have functions in a wide variety of RNAs. For example, GU pairs are found within two helical regions and at the junction of a helix and multibranch loop in eukaryotic 5S rRNA.18,19 A GU pair at the third position of the acceptor stem of tRNAAla (ref 20) distorts helix geometry21 and is important in Escherichia coli for recognition by alanine aminoacyl tRNA synthetase.22−24 Local helix geometry due to a conserved GU pair may also be important for binding of a yeast intron with hPrp8 or L32 protein.25,26 The U-rich tail of guide RNAs binds to a purine-rich region in unedited pre-mRNA to generate recurring 5′AGA/3′UUU motifs that may help RNA editing proteins bind to the major groove.27,28 The 5′ leader of HIV-1 can switch between helixes containing GU pairs to promote translation or packaging of its genome.29
GU pairs expose the exocyclic amino group of guanine in the minor groove, presenting a unique site for hydrogen bonding to facilitate function and molecular recognition. For example, the guanine amino group of a GU pair at the splice site in the Tetrahymena thermophila group I intron helps bind and align the splice site30−32 and stabilize the transition state of the splicing reaction.33 Šponer et al. reported a common tertiary interaction involving a GU pair, where the exocyclic NH2 of the G and the 2′OH of the U form hydrogen bonds, respectively, with the 2′OH and carbonyl oxygen of a cytidine in a GC pair of another helix.34
GU pairs can be metal ion binding sites.35−39 Colmenarejo and Tinoco observed that Co(NH3)63+ binds preferentially to the 5′GU/3′UG and 5′GG/3′UU motifs over 5′UG/3′GU, whereas Mg(H2O)62+ binds most tightly to 5′UG/3′GU.40 This preference may explain why 5′UG/3′GU is the most prevalent tandem GU motif in rRNA.41 The propensity of GU pairs to bind metal ions allows the design of sequences that bind heavy metals to facilitate solving X-ray structures.37,38
Prediction of GU pairs often relies on a nearest neighbor model for folding stability. The database of RNA sequences from which GU nearest neighbor parameters were derived12 is relatively small, however, compared to that for Watson–Crick nearest neighbors.11 To expand the database, optical melting experiments were carried out on 29 oligoribonucleotide duplexes. Linear regression analysis on the expanded database provides a revised set of individual nearest neighbor (INN) parameters,42 which are reported herein. The parameters provide stability increments for internal and single terminal GU pairs. Stability increments for additional terminal GU pairs have been reported by Nguyen and Schroeder.43
Oligonucleotides were designed to expand the previous database12 to include all possible combinations of base pair triplets containing GU pairs flanked by Watson–Crick pairs in different orientations (Table 1) and to give substantial representation of each nearest neighbor containing a GU pair. Six additional sequences containing the 5′GG/3′UU doublet provided nine new representations of that motif, which previously had only one. Care was taken to select self-complementary sequences that do not favorably form alternative secondary structures, such as hairpins or loops.
Sequences for the following oligoribonucleotide duplexes were purchased from Integrated DNA Technologies (IDT): r(AGGCUU)2, r(AUGCGU)2, r(AGUCGAUU)2, r(CUGGCUAG)2, r(5′CAGAGGAGAC/3′GUCUUUUCUG), r(CAGUCGAUUG)2, r(CCGAAUUUGG)2, r(CGGAAUUUCG)2, r(CGGAUAUUCG)2, r(CGGGCGUUCG)2, r(CUGGAUUCAG)2, r(GAGAGCUUUC)2, r(GAGGAUCUUC)2, r(5′GAGUGGAGAG/3′CUCAUUUCUC), r(GGUUCGGGCC)2, and r(GUGAAUUUAC)2 (the / denotes a nonself-complementary duplex). Purity was checked by NMR except for those forming duplexes with adjacent GU pairs, which were checked by thin layer chromatography. All other sequences were synthesized and purified as previously described.44 All sequences were desalted with Sep-Pak C18 cartridges (Supporting Information).
RNA duplexes at concentrations from 10^−6 to 10^−3 M were melted in 0.5 mM Na2EDTA, 1 M NaCl, and 20 mM sodium cacodylate, pH 7; the pKa of cacodylate is nearly constant over a wide temperature range.45 Absorbance at 280 nm was measured, typically from 15 to 80 °C, on a Beckman Coulter DU 640 spectrophotometer.
Spectra were acquired on a Varian Inova 500 or 600 MHz spectrometer. The buffer for NMR was 80 mM NaCl, 18.8 mM NaH2PO4, 1.16 mM Na2HPO4, 0.02 mM Na2EDTA, pH 6.0, to which 15 μL of D2O was added to provide a lock signal. One-dimensional 1H spectra were acquired with the water 1H signal suppressed with a binomial 1:1 shaped pulse.46 Two-dimensional 1H–1H NOESY and 1H–1H TOCSY spectra were acquired with the water signal suppressed by a WATERGATE-type pulse sequence with flipback.47,48 Two-dimensional 1H–1H NOESY spectra for r(AUGCGU)2 were also measured in D2O.
Melting curves for each duplex were fit to a two-state model with MeltWin 3.5 (ref 52) to derive values for ΔH° and ΔS°. The reciprocal melting temperature, 1/TM, was also plotted against ln(CT/a) to provide another measure of ΔH° and ΔS°:
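Assuming the standard two-state van't Hoff treatment (R, CT, and a are defined in the next paragraph), the plot is expected to be linear. ΔG°37 then follows from ΔH° and ΔS° at 310.15 K:

$$\frac{1}{T_M} = \frac{R}{\Delta H^\circ}\,\ln\!\left(\frac{C_T}{a}\right) + \frac{\Delta S^\circ}{\Delta H^\circ}, \qquad \Delta G^\circ_{37} = \Delta H^\circ - (310.15\ \mathrm{K})\,\Delta S^\circ$$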
Here R is the gas constant (1.987 cal K^−1 mol^−1), CT is the total concentration of strands, and a is 1 for self-complementary duplexes and 4 for non-self-complementary duplexes. Sequences were added to the database if ΔH° values derived from averaging fits of melting curves agreed within 15% with those derived from eq (2), consistent with the two-state model.
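The following is a minimal sketch of the 1/TM vs ln(CT/a) analysis. It assumes TM is in kelvin and CT in molar. An ordinary least-squares line gives ΔH° = R/slope and ΔS° = intercept × ΔH°. The 15% criterion is implemented by a hypothetical helper, `two_state_consistent`, which measures the disagreement relative to the van't Hoff-plot value; the text does not say which value is the reference.

```python
import math
from typing import List, Tuple

R_CAL = 1.987      # gas constant, cal K^-1 mol^-1 (value given in the text)
T37_K = 310.15     # 37 degrees C in kelvin


def vant_hoff_fit(tm_kelvin: List[float], ct_molar: List[float],
                  self_complementary: bool) -> Tuple[float, float, float]:
    """Fit 1/TM vs ln(CT/a); return (dH kcal/mol, dS eu, dG37 kcal/mol)."""
    a = 1.0 if self_complementary else 4.0
    x = [math.log(ct / a) for ct in ct_molar]
    y = [1.0 / tm for tm in tm_kelvin]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    dH = R_CAL / slope          # slope = R/dH, so dH is in cal/mol
    dS = intercept * dH         # intercept = dS/dH, so dS is in cal K^-1 mol^-1
    dG37 = dH - T37_K * dS
    return dH / 1000.0, dS, dG37 / 1000.0


def two_state_consistent(dH_curve_fits: float, dH_vant_hoff: float,
                         tolerance: float = 0.15) -> bool:
    """Hypothetical 15% agreement check, taken relative to the van't Hoff-plot dH."""
    return abs(dH_curve_fits - dH_vant_hoff) <= tolerance * abs(dH_vant_hoff)


# Synthetic melts from assumed dH = -60 kcal/mol and dS = -170 eu (not data from this study)
cts = [1e-5, 1e-4, 1e-3]
tms = [-60000.0 / (R_CAL * math.log(ct) - 170.0) for ct in cts]
print(vant_hoff_fit(tms, cts, self_complementary=True))
```

The synthetic melting temperatures are generated from the assumed ΔH° and ΔS° only to show that the fit recovers them.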
Nearest neighbor thermodynamic parameters were obtained with a regression function reported by Xia et al.11 Matrix calculations were performed with R (ref 53) and independently verified with Mathematica 8.0 (ref 54) and Octave.55 All three software packages yielded nearly identical results.
Terms representing free energy contributions other than GU nearest neighbors, that is, helix initiation (ΔG°init), symmetry (ΔG°sym), terminal AU pairs (ΔG°term AU), and Watson–Crick nearest neighbors (ΔG°j (WC NN)),11 were subtracted from the free energy found from the 1/TM vs ln(CT/a) plots (ΔG°i(duplex)) to provide an experimental free energy attributable to the GU components of each duplex:
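Assuming the contributions are additive, the subtraction can be written as shown below. Here δi (1 for self-complementary duplexes, 0 otherwise) and nij (the number of times Watson–Crick nearest neighbor j occurs in duplex i) are symbols introduced for this sketch, and the count of terminal AU pairs is written as mi:

$$\Delta G^\circ_{37,i}(\mathrm{GU\ component}) = \Delta G^\circ_{i}(\mathrm{duplex}) - \Delta G^\circ_{\mathrm{init}} - \delta_i\,\Delta G^\circ_{\mathrm{sym}} - m_i\,\Delta G^\circ_{\mathrm{term\,AU}} - \sum_j n_{ij}\,\Delta G^\circ_j(\mathrm{WC\ NN})$$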
where i and j are labels for each different duplex and INN parameter, respectively, NN stands for nearest neighbor parameter, and mij is the number of terminal AU pairs. For example,
Here, ΔG°37(GU component) contains four 5′GG/3′CU nearest neighbors and two 5′GG/3′UU nearest neighbors. Values for Watson–Crick nearest neighbors from Xia et al.11 were used because experimental measurements on 22 duplexes not included in the fitting by Xia et al.11 are predicted within experimental error (Supporting Information). Making the new GU parameters consistent with the Xia et al.11 parameters provides compatibility with loop parameters derived with Xia et al.11 nearest neighbor parameters and allows easy adoption by programs using those parameters.
Each experimental duplex ΔG°37 was given an error limit of ±4% to account for systematic errors, unless the percent difference between values found from 1/TM vs ln(CT/a) plots and from averaged curve fits was greater. In those seven cases, this percent difference was doubled to provide the error limit. Error limits for ΔH° were assumed to be 12%.11 The symmetry contribution, 0.43 kcal/mol in ΔG°37, has no error56 and was therefore subtracted from ΔG°37 of self-complementary duplexes before calculating the error limit.
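The subtraction and the error-limit rule can be sketched as follows. The functions `gu_component` and `error_limit` are hypothetical names. The doublet label `"CG/GC"` and the values in the usage lines are placeholders, not parameters from Xia et al. Only the symmetry term (0.43 kcal/mol), the terminal AU value (0.45 kcal/mol), and the 4%/doubling rule come from the text. Computing the percent disagreement before removing the symmetry term is an assumption.

```python
from typing import Dict

DG_SYM = 0.43   # kcal/mol, symmetry term from the text; it carries no error


def gu_component(dG_duplex: float, wc_counts: Dict[str, int],
                 wc_params: Dict[str, float], n_terminal_au: int,
                 dG_init: float, dG_term_au: float,
                 self_complementary: bool) -> float:
    """Remove every non-GU term from an experimental duplex dG37 (kcal/mol)."""
    dG = dG_duplex - dG_init - n_terminal_au * dG_term_au
    if self_complementary:
        dG -= DG_SYM
    dG -= sum(count * wc_params[nn] for nn, count in wc_counts.items())
    return dG


def error_limit(dG_vant_hoff: float, dG_curve_fits: float,
                self_complementary: bool) -> float:
    """4% of |dG37|, or twice the percent disagreement between methods if that exceeds 4%."""
    pct_diff = abs(dG_vant_hoff - dG_curve_fits) / abs(dG_vant_hoff)
    fraction = 0.04 if pct_diff <= 0.04 else 2.0 * pct_diff
    # the error-free symmetry term is removed before the fraction is applied
    base = dG_vant_hoff - DG_SYM if self_complementary else dG_vant_hoff
    return fraction * abs(base)


# Placeholder numbers only: "CG/GC", -2.0, and dG_init = 4.0 are illustrative
print(gu_component(-8.0, {"CG/GC": 2}, {"CG/GC": -2.0}, n_terminal_au=0,
                   dG_init=4.0, dG_term_au=0.45, self_complementary=True))
print(error_limit(-8.0, -8.1, self_complementary=True))
```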
The GU component free energies were placed into an M × 1 matrix, G, where M is the number of duplexes.
S is an M × N matrix containing the counts of each nearest neighbor doublet in a duplex, where N is the number of GU nearest neighbor parameters being fit. GNN is an N × 1 matrix that contains the nearest neighbor parameters to be derived from G and S.
The general law of error propagation was used to calculate the variances for each duplex.57,58 Multiplication of both sides of eq (5) by an M × M matrix, σ^−1, containing the variances in the diagonals yielded error-weighted matrices from which thermodynamic parameters were derived.
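This sketch assumes the usual weighted least-squares reading of the procedure: σ is diagonal, and row i is divided by the error estimate of duplex i. Under that assumption, the unweighted system, the weighted system, and its least-squares solution are

$$\mathbf{G} = \mathbf{S}\,\mathbf{G}_{NN}, \qquad \mathbf{G}_\sigma \equiv \boldsymbol{\sigma}^{-1}\mathbf{G} = \boldsymbol{\sigma}^{-1}\mathbf{S}\,\mathbf{G}_{NN} \equiv \mathbf{S}_\sigma\,\mathbf{G}_{NN}$$

$$\mathbf{G}_{NN} = \left(\mathbf{S}_\sigma^{\mathsf{T}}\mathbf{S}_\sigma\right)^{-1}\mathbf{S}_\sigma^{\mathsf{T}}\,\mathbf{G}_\sigma = \mathbf{S}_\sigma^{+}\,\mathbf{G}_\sigma$$

Here Sσ^+ is the Moore–Penrose pseudoinverse, which SVD computes directly. The closed form holds when Sσ has full column rank.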
Thus GNN = Sσ^−1·Gσ, where Sσ^−1 denotes the pseudoinverse of the nonsquare matrix Sσ. The variances of each INN parameter are obtained with singular value decomposition (SVD) (ref (11), Supporting Information). Nearest neighbor parameters for ΔH° were found through the same process, and ΔS° parameters were calculated from ΔS° = (ΔH° − ΔG°37)/(310.15 K).
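Below is a dependency-free sketch of the error-weighted fit. It substitutes the normal equations for SVD. The solution is the same whenever Sσ has full column rank. The diagonal of (SσᵀSσ)^−1 gives the parameter variances, assuming the per-duplex error limits are treated as standard deviations. `weighted_fit` and `invert` are hypothetical helpers. A singular normal matrix signals parameters that the data cannot determine uniquely, as happens when too many parameters are fit.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal matrix: parameters not uniquely determined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_fit(S: Matrix, G: Sequence[float],
                 sigma: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Solve G = S * G_NN by least squares after dividing each duplex row by its error."""
    S_w = [[s / e for s in row] for row, e in zip(S, sigma)]   # S_sigma
    G_w = [g / e for g, e in zip(G, sigma)]                      # G_sigma
    n = len(S[0])
    StS = [[sum(row[a] * row[b] for row in S_w) for b in range(n)] for a in range(n)]
    StG = [sum(row[a] * g for row, g in zip(S_w, G_w)) for a in range(n)]
    cov = invert(StS)                      # (S_sigma^T S_sigma)^-1
    params = [sum(cov[a][b] * StG[b] for b in range(n)) for a in range(n)]
    errors = [cov[a][a] ** 0.5 for a in range(n)]
    return params, errors


# Toy example: three "duplexes", two hypothetical parameters, unit error limits
S_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
params, errors = weighted_fit(S_toy, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(params, errors)
```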
Nearest neighbor parameters for Watson–Crick pairs were obtained from fitting published data for 112 duplexes, which included the 90 duplexes that Xia et al. previously fit, and 22 additional duplexes (Supporting Information). The symmetry contribution, if present, was subtracted from each thermodynamic parameter derived from the 1/TM vs ln(CT/a) plot. Matrix calculations were carried out as described above to generate ΔG°37 and ΔH° for each nearest neighbor parameter, with all three software packages yielding similar results.
The F-test was used to test the hypothesis that a least-squares model can fit the dependence of Gσ on Sσ and GNN.59,60 If the F-value is larger than the critical F-value for v and N − v degrees of freedom at the 5% significance level, where N is the number of duplexes and v the number of nearest neighbor parameters, or equivalently if the p-value is less than 0.05, then the null hypothesis that Gσ does not depend on GNN is rejected.60
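The sketch below computes the F statistic for a fit with no separate intercept term, which is an assumption about how the model is set up. It uses uncentered sums of squares on the weighted values Gσ and their fitted counterparts SσGNN. Converting F to a p-value requires the F distribution (for example, R's `pf`), which is not reproduced here. The function name is hypothetical.

```python
from typing import Sequence


def regression_f(observed: Sequence[float], fitted: Sequence[float], v: int) -> float:
    """F statistic with v and N - v degrees of freedom for a fit without an intercept."""
    n = len(observed)
    ss_model = sum(f * f for f in fitted)                           # uncentered model sum of squares
    ss_resid = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual sum of squares
    return (ss_model / v) / (ss_resid / (n - v))


# Toy weighted observations and fitted values (illustrative only)
print(regression_f([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], v=1))
```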
The paired t-test was used to evaluate the significance of the differences between predictions of thermodynamic properties made with the updated parameters and with those reported by Mathews et al.,12 and of the differences between experimental values and the predictions of each set of nearest neighbor parameters. For a set of b paired values of a variable, X, measured before (X1) and after (X2) a treatment, the mean difference is μ(XD) = μ(X1) − μ(X2), where X2 represents the response of X1 to treatment.61 The null hypothesis states that μ(XD) = 0. To test it against the alternative hypothesis that μ(XD) ≠ 0, the mean and standard deviation of the b paired differences are computed.
A t-ratio is defined as
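In the standard paired form, with μ(XD) the mean of the b differences and sD their sample standard deviation (a symbol introduced here), the ratio is

$$t = \frac{\mu(X_D)}{s_D/\sqrt{b}}$$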
If the t-ratio is greater than the critical t-value for b − 1 degrees of freedom, or less than its negative, the null hypothesis is rejected at the 0.05 significance level.
For example, in using the paired t-test to evaluate how well experimental ΔG°’s are predicted by nearest neighbor parameters, b is the number of duplexes whose ΔG°’s are being tested and XD is the difference between the predicted and experimental ΔG° for each sequence.
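Here is a sketch of the paired t-test as described, using Python's `statistics` module. The critical value is passed in rather than computed. The value 1.995 for 69 degrees of freedom is quoted later in the text. The helper names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x1: Sequence[float], x2: Sequence[float]) -> Tuple[float, int]:
    """Return the t-ratio and degrees of freedom (b - 1) for b paired values."""
    d = [a - b for a, b in zip(x1, x2)]               # X_D for each pair
    n_pairs = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n_pairs))    # stdev uses the n - 1 denominator
    return t, n_pairs - 1


def differs_significantly(t: float, t_critical: float) -> bool:
    """Two-sided test: reject mu(X_D) = 0 when |t| exceeds the critical value."""
    return abs(t) > t_critical


# Illustration with t-values quoted in the text and the 69-df critical value 1.995
print(differs_significantly(-3.386, 1.995))   # → True
print(differs_significantly(-0.544, 1.995))   # → False
```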
The probability density function (PDF), f(t), of the Student’s t-distribution was used as a measure of how significantly a given INN parameter contributes to the model,11,59 with smaller values of f(t) indicating greater contribution,
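This is the standard Student's t density, written with the symbols defined in the next sentence:

$$f(t) = \frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\sqrt{r\pi}\,\Gamma\!\left(\frac{r}{2}\right)}\left(1 + \frac{t^{2}}{r}\right)^{-\frac{r+1}{2}}$$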
where Γ is the gamma function, r = N − v is the number of degrees of freedom, and t = ΔG°j(NN)/σj(NN), that is, the free energy of the INN parameter divided by the estimate of its error. Calculations were carried out with R (ref 53) using the anova and t.test functions, and the critical t-value was determined with the qt function in R.
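The following sketch computes f(t), using log-gamma to avoid overflow for large r; the function name is hypothetical. As a check against the text, consider the terminal GU penalty fit (70 duplexes, 12 parameters, so r = 58). It gives t = −0.02/0.06 for ΔG°37 and t = 2.34/1.17 for ΔH°. The function returns roughly 0.375 and 0.0555, in line with the reported PDFs of 0.38 and 5.6 × 10^−2.

```python
import math


def t_pdf(t: float, r: int) -> float:
    """Probability density of Student's t-distribution with r degrees of freedom."""
    log_norm = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
                - 0.5 * math.log(r * math.pi))          # log of the Gamma-ratio prefactor
    return math.exp(log_norm - (r + 1) / 2 * math.log1p(t * t / r))


# Terminal GU penalty terms from the 12-parameter fit: r = 70 - 12 = 58
print(round(t_pdf(-0.02 / 0.06, 58), 3))   # dG37 term
print(round(t_pdf(2.34 / 1.17, 58), 4))    # dH term
```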
Table 2 lists results for duplexes in the database used to determine nearest neighbor parameters for GU pairs. Most of the duplexes are six to eight base pairs long and have melting temperatures in the 30–70 °C range. For the 29 new duplexes reported here, the average differences between values of ΔG°37, ΔH°, and ΔS° derived from 1/TM vs ln(CT/a) plots and from averaged curve fits are 2%, 7%, and 8%, respectively. Three duplexes with TM values below 25 °C that were included in the database of Mathews et al.,12 r(AGGCUU)2, r(AUGCGU)2, and r(GUCGUAC/), were omitted from the new database, because thermodynamics are difficult to determine from optical melting curves when the TM is below 25 °C.
The results in Table 2 were fit to a nearest neighbor model for GU pairs after subtracting contributions from Watson–Crick nearest neighbors (eq 3). This method avoids conflating thermodynamic parameters for Watson–Crick pairs with the idiosyncrasies of GU pairs. Published thermodynamics for duplexes with all Watson–Crick pairs (Supporting Information) were used to test published parameters for Watson–Crick pairs.11 Fitting the expanded database of 112 duplexes gave INN parameters within error of the values reported by Xia et al.11 (Table 3). Most free energy parameters did not change by more than 0.05 kcal/mol at 37 °C. Consequently, the GU component values were calculated from eq (3) by subtracting the previously published Watson–Crick thermodynamic values11 so that the GU parameters are consistent with the widely used Watson–Crick parameters.
For fitting GU parameters, duplexes containing the motif 5′GGUC/3′CUGG were excluded from the regression because this motif fits the nearest neighbor model poorly.12 For the other 70 duplexes, 12 GU INN parameters were initially derived by linear regression. These included a penalty term for terminal GU pairs to account for the fact that two duplexes with the same nearest neighbors can have different numbers of GU pairs and therefore different numbers of hydrogen bonds; a similar term was required for terminal AU pairs.11 Fitting additional parameters would not give a unique fit.42 This 12-parameter fit gave values of −0.02 ± 0.06 kcal/mol and 2.34 ± 1.17 kcal/mol for the terminal GU penalty ΔG°37 and ΔH°, respectively (Supporting Information). The PDFs for the terminal GU penalty ΔG°37 and ΔH° were 0.38 and 5.6 × 10^−2, respectively, indicating that the term is not statistically significant. Therefore, the data were fit without a terminal GU term. The resulting nearest neighbor parameters are listed in Table 3.
The free energy parameters at 37 °C for 5′UG/3′AU, 5′UU/3′AG, and 5′AU/3′UG are less favorable than previously reported12 by at least 0.43 kcal/mol. This corresponds to a change of at least a factor of 2 in an equilibrium constant at 37 °C. The ΔG°37 values for each of the tandem GU motifs are more favorable than previously reported. The 5′UG/3′GU nearest neighbor contributes favorably to helix stability by −0.57 kcal/mol, whereas previous data provided an unfavorable increment of 0.30 kcal/mol.12 Similarly, the ΔG°37 of 5′GG/3′UU, which was previously represented by a single duplex, became more favorable, from 0.47 to −0.25 kcal/mol (Table 3).
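The factor-of-2 statement follows from K2/K1 = exp(ΔΔG°/RT). Below is a one-line check at 310.15 K with R = 1.987 × 10^−3 kcal K^−1 mol^−1; the helper name is hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal K^-1 mol^-1
T37_K = 310.15      # 37 degrees C in kelvin


def equilibrium_factor(ddG_kcal: float, temperature_k: float = T37_K) -> float:
    """Factor by which K changes when the free energy changes by ddG_kcal (kcal/mol)."""
    return math.exp(ddG_kcal / (R_KCAL * temperature_k))


print(round(equilibrium_factor(0.43), 2))   # → 2.01
```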
Estimated errors of the free energy parameters for most nearest neighbor motifs are less than 0.10 kcal/mol (Table 3). The p-value for the F-test is less than 2.2 × 10^−16, indicating that there is a linear dependence of the free energy of a duplex on the occurrence of each nearest neighbor parameter in it at the 5% significance level.60 The PDF values from the Student t-distribution (Table 4) are small for ΔG°37 except for the 5′GG/3′UU motif. The relatively large PDF for the 5′GG/3′UU motif may be attributed to the small magnitude of its free energy and large error compared to those of most of the other INN parameters.
Table 5 lists results for apparently two-state duplexes that were omitted from the fitted database because their TM values are below 25 °C. The predicted thermodynamic parameters for r(AGGCUU)2 and r(AUGCGU)2 do not agree well with those measured. The NMR spectra of r(AGGCUU)2 and r(AUGCGU)2 have strong H2′-H6/8 cross peaks and a sequential H2′, H1′-H6/8 proton walk in 2D 1H–1H NMR (Supporting Information), indicating that the duplexes adopt a largely A-form conformation.62 For r(AUGCGU)2, however, broad on-diagonal peaks and exchange cross peaks in the aromatic region of the 2D spectra, together with more imino resonances in a 1D spectrum than there are imino protons in the sequence, indicate the presence of alternative conformations. Broad on-diagonal peaks, particularly for A1 H8 and H2, in the aromatic region of the 2D spectra of r(AGGCUU)2 also suggest multiple conformations at 0 °C.
Table 5 also lists duplexes that do not melt in a two-state manner. There are many possible reasons for this.56,63−65 The average difference between experimental and predicted TM for these sequences is 10.0 °C, while the predicted free energy is, on average, within 1.33 kcal/mol of the experimental free energy (Table 5). Evidently, the INN model may provide useful predictions for non-two-state sequences even though ΔH° from the van’t Hoff equation is erroneous.64
Using the previous parameters,12 the correlation coefficients between experimental values for ΔG°37, ΔH°, and ΔS° and those predicted for the 70 duplexes in Table 2 are 0.89, 0.86, and 0.85, respectively. Comparing the values of ΔG°37, ΔH°, and ΔS° of the 70 duplexes predicted with the previous parameters12 and with those in Table 3 yielded mean differences of −0.36 kcal/mol, −1.75 kcal/mol, and −4.5 eu, respectively. The paired t-test gives t-values of −3.386, −2.528, and −2.257, respectively. Their absolute magnitudes exceed the critical value of 1.995 for 69 degrees of freedom, indicating that the two sets of predictions differ at the 0.05 significance level.61 Furthermore, the respective p-values of 1.2 × 10^−3, 1.4 × 10^−2, and 2.7 × 10^−2 are less than 0.05. This again indicates that the new parameters predict the thermodynamics of RNA duplexes significantly differently from those published previously.12
Using the set of GU parameters in Table 3, the correlation coefficients between experimental values for ΔG°37, ΔH°, and ΔS° and those predicted for the 70 duplexes in Table 2 are 0.95, 0.89, and 0.87, respectively. Comparison of experimental values of ΔG°37, ΔH°, and ΔS° of the 70 duplexes and those predicted with the set of GU parameters in Table 3 yielded means of differences of −0.05 kcal/mol, 0.58 kcal/mol, and 2.1 eu and t-values of −0.544, 0.590, and 0.686, respectively, which have absolute magnitudes less than 1.995. The corresponding p-values of 0.59, 0.56, and 0.50 are greater than 0.05. These results show that the thermodynamic properties predicted with the new INN parameters are not significantly different from experiment.61 The same analysis yielded means of differences of −0.41 kcal/mol, −1.17 kcal/mol, and −2.4 eu and t-values of −2.752, −1.096, and −0.736, with corresponding p-values of 7.6 × 10^−3, 0.28, and 0.46 for ΔG°37, ΔH°, and ΔS°, respectively, when experimental properties were compared with predicted properties using previous parameters.12 Evidently, the expanded database provides improved modeling of the thermodynamics of GU pairs.
While the nearest neighbor model predicts the ΔG°37 of most duplexes in Table 2 well, other terms likely also affect stability. For example, four duplexes, r(GGCGUC)2, r(AGUCGAUU)2, r(UCACGUGG)2, and r(CCGAAUUUGG)2, have predicted ΔG°37 values not within 1.0 kcal/mol and 20% of the experimental values from 1/TM vs ln(CT/a) plots. No pattern is evident for these duplexes. A series of 1D spectra was acquired for r(CCGAAUUUGG)2 at different temperatures (Figure 1) because its predicted free energy at 37 °C is 1.5 kcal/mol more favorable than measured. These spectra show that the imino proton resonances of all residues except U8, which is in the GU pair, and G10, which is in the terminal base pair, disappear together, consistent with two-state melting of the duplex. The results suggest that the nearest neighbor model does not include all factors that determine the stabilities of duplexes with GU pairs.
The expanded database allows preliminary testing of models beyond the nearest neighbor model. For example, terminal GU pairs could be considered separately43 and a base pair triplet model used for internal GU pairs. Comparison of measured ΔG°37 values for terminal GU pairs with those predicted from the parameters in Table 3 gives a standard deviation within 0.30 kcal/mol at 37 °C (Supporting Information). For the 16 triplets 5′WGY/3′XUZ, where WX and YZ are Watson–Crick pairs, 12 measured ΔG°37(GU component) values are within 0.5 kcal/mol of the predicted values and the other 4 are within 1.0 kcal/mol (Supporting Information). The nearest neighbor model is therefore a reasonable approximation, and considerably more data would be required to develop a triplet model.
One clear exception to the nearest neighbor model is multiple terminal GU pairs.43 Thus, the parameters in Table 3 cannot be used beyond the first terminal GU pair at a helix end. Parameters for additional terminal GU pairs have been published by Nguyen and Schroeder.43
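The following sketch shows how a duplex ΔG°37 could be assembled from nearest neighbor increments. It assumes an additive model with initiation, symmetry, and one terminal AU penalty per AU-closed helix end, and no penalty for terminal GU pairs, as concluded above. It rejects duplexes with two GU pairs at an end, because the Table 3 parameters do not cover them. The canonical doublet label, the function names, and all numbers in the usage lines are hypothetical placeholders; real values would come from Table 3 and Xia et al. The defaults of 0.45 and 0.43 kcal/mol are the terminal AU and symmetry terms quoted in the text.

```python
from typing import Dict

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
ALLOWED_PAIRS = WATSON_CRICK | {("G", "U"), ("U", "G")}


def doublet_key(top2: str, bottom2: str) -> str:
    """Canonical 5'XY/3'X'Y' label; a stack read from either strand is the same doublet."""
    forward = f"{top2}/{bottom2}"
    reverse = f"{bottom2[::-1]}/{top2[::-1]}"
    return min(forward, reverse)


def predict_dG37(top: str, bottom: str, nn_params: Dict[str, float],
                 dG_init: float, dG_term_au: float = 0.45,
                 dG_sym: float = 0.43) -> float:
    """Nearest neighbor dG37 (kcal/mol) for a fully paired duplex.

    top is written 5'->3'; bottom is its partner written 3'->5', aligned with top.
    """
    if len(top) != len(bottom) or len(top) < 2:
        raise ValueError("strands must be aligned and at least 2 nt long")
    pairs = list(zip(top, bottom))
    if any(p not in ALLOWED_PAIRS for p in pairs):
        raise ValueError("every position must be a Watson-Crick or GU pair")
    # these parameters cover at most one terminal GU pair per helix end
    for end in (pairs[:2], pairs[-2:]):
        if all(p not in WATSON_CRICK for p in end):
            raise ValueError("more than one terminal GU pair at a helix end")
    dG = dG_init
    for i in range(len(top) - 1):
        dG += nn_params[doublet_key(top[i:i + 2], bottom[i:i + 2])]
    # one terminal AU penalty per helix end closed by an AU pair; none for terminal GU
    dG += dG_term_au * sum(1 for p in (pairs[0], pairs[-1]) if set(p) == {"A", "U"})
    if bottom[::-1] == top:      # self-complementary duplex
        dG += dG_sym
    return dG


# Hypothetical increments and initiation value, for illustration only
demo = {doublet_key("CG", "GU"): -1.0, doublet_key("GU", "UG"): 0.5}
print(predict_dG37("CGUG", "GUGC", demo, dG_init=4.0))
```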
To check for the expected base pairing, NMR imino proton spectra were measured for 12 duplexes. All had chemical shifts from 10 to 15 ppm (Figure 2). Chemical shifts for GH1 and UH3 of GU pairs were relatively upfield (10–12 ppm), consistent with expectations.66 UH3 resonances of AU pairs fell between 13 and 15 ppm and GH1 resonances of GC pairs between 12 and 13.5 ppm, as expected.67,68 The absence of an imino peak for a terminal base pair in r(CUGGCUAG)2 indicates exchange with water. The G3 H1 and U7 H3 resonances of r(CUGGAUUCAG)2 appear to overlap, as evident from the presence of a single large peak. These chemical shift signatures show that the RNA sequences form the expected duplexes.
GU pairs are the most common non-Watson–Crick base pairs in RNA structures. Thus, the thermodynamics of GU pairs are important for finding regions of RNA that are structured,1,14−16,69 predicting the secondary structure12 or determining structure on the basis of chemical modification8,70 and/or NMR data.71
GU pairs can serve as binding sites for proteins or metal ions and participate in tertiary interactions.72,73 Thus, a better characterization of the thermodynamic properties of GU pairs can improve prediction of secondary and tertiary structure and help predict binding sites for metal ions and target sites for therapeutics. For example, GU pairs in group I introns can bind cations, including Mg2+, Co3+, and Os3+.38,40,74,75 Divalent metal ion binding by GU pairs, which have greater negative potential in the major groove than other base pairs, was postulated as important for activating RNA catalysis.76 Divalent ions that interact with a GU pair help catalyze splicing by group I and group II introns77−82 and cleavage by HDV ribozyme.35 Metal ion binding with RNA neutralizes negative potential, which may promote higher order RNA folding.75 In particular, the 5′GG/3′UU and 5′GU/3′UG motifs have greater negative potential in the major groove than their Watson–Crick counterparts.83
Excluding sequences containing the 5′GGUC/3′CUGG motif, the database in Table 2 expands from 35 to 70 the number of duplexes used to fit nearest neighbor parameters for GU pairs. This expansion includes published data not included in the original database43,84−86 along with 29 new measurements (Table 2). Two of the original 35 duplexes were removed from the database because their melting temperatures were below 25 °C, which makes the melting curves difficult to analyze. A third duplex, r(AUCUAGGU)2, was omitted because two-state melting could not be confirmed. The expanded database contains GU pairs flanked by Watson–Crick pairs in all possible orientations (Table 1). The new set of GU INN parameters was obtained with consideration of propagated errors from experiment and from Watson–Crick nearest neighbor parameters. Errors for the free energies of individual nearest neighbors were less than 0.2 kcal/mol for tandem GU pairs and 0.1 kcal/mol for other GU motifs. The 5′GG/3′UU motif, which was previously represented by a single sequence, is now represented by additional duplexes in the fit. The favorable free energy of −0.25 ± 0.16 kcal/mol for 5′GG/3′UU is in better agreement with the value of −0.5 kcal/mol used by Mathews et al.12 to optimize secondary structure prediction than with the previous single experimental measurement of 0.47 kcal/mol.
The free energies of formation for many of the duplexes with GU pairs (Table 2) can be compared with the free energies when the U or G of the GU pairs is replaced with a C or A, respectively, to form GC or AU pairs (Table 6). Because many of the latter duplexes terminated with a 3′ phosphate, the comparisons assume that the 3′ phosphate has negligible effect on ΔG°37 at 1 M NaCl.87,88 Duplexes containing GC pairs in place of GU pairs are more stable at 37 °C by 1.8 ± 0.8 kcal/mol per GU pair (Table 6). This is presumably due to the presence of an additional hydrogen bond in GC pairs and unfavorable backbone distortion due to GU pairs. Terminal substitutions all have a less than average effect while internal substitutions have a larger than average effect, as expected if backbone distortion is less important for a terminal GU.
The effect of replacing GU with GC pairs can be compared to replacing AU pairs with GC pairs (Supporting Information). On average, replacing an AU pair with a GC pair stabilized a duplex by 1.5 ± 0.4 kcal/mol per AU pair. In this case, there was no apparent difference between terminal and internal substitutions.
Duplexes containing AU pairs in place of GU pairs are more stable at 37 °C by 0.6 ± 0.7 kcal/mol per GU pair (Table 6). While the difference is zero within the standard deviation, in only 2 of 11 cases is the GU duplex more stable than the AU duplex and in both cases the difference is within the experimental error of 4%.
Unlike terminal AU pairs, terminal GU pairs require no penalty to account for base pair composition. Because the terminal AU penalty of 0.45 kcal/mol at 37 °C was attributed to the number of base-pairing hydrogen bonds,11 the penalty for terminal GU pairs was previously assumed to equal that for AU pairs,12 consistent with wobble GU pairs at the end of a helix having two hydrogen bonds.89 When a terminal GU parameter was included in the reparameterization of GU nearest neighbor thermodynamic parameters, the free energy of each nearest neighbor parameter differed by no more than 0.01 kcal/mol from that calculated without it (Table 2 and Supporting Information). The lack of a terminal GU penalty may arise from the flexibility of a terminal GU pair, which allows optimization of hydrogen bonding and stacking interactions without incurring the energetic penalty associated with an interior GU pair distorting the backbone.43 For example, even for an internal GU pair, stacking energies can make a conformation with only one hydrogen bond the most stable.90,91 Thus, the flexibility of terminal GU pairs may compensate for the difference between the free energies of forming two and three hydrogen bonds in GU and GC pairs, respectively.
With the exception of 5′GGUC/3′CUGG, the 5′UG/3′GU motif is more stable than 5′GU/3′UG (Table 3). Available structures show that 5′UG/3′GU contains interstrand stacking between the guanines,90−93 whereas 5′GU/3′UG does not.91,94,95 The greater favorability of the 5′UG/3′GU motif relative to the 5′GU/3′UG motif is consistent with molecular dynamics (MD) simulations.91 These simulations predict that a GU pair with one hydrogen bond90 predominates in duplexes containing the 5′GU/3′UG motif, whereas a GU pair with two hydrogen bonds predominates in duplexes containing the 5′UG/3′GU motif. There is also less overlap of negative potentials in 5′UG/3′GU than in 5′GU/3′UG.95 In two different sequences containing the 5′UG/3′GU motif, there is also intrastrand stacking between each GU pair and its Watson–Crick neighbors.92,93 By comparison, the 5′GU/3′UG motif has less overlap between the GU pairs and their Watson–Crick purine neighbors but has intrastrand stacking between the tandem GU pairs.94 Furthermore, the 5′UG/3′GU motif preserves the A-form geometry of RNA more than 5′GU/3′UG does.96
The 5′GGUC/3′CUGG motif is an exception to the above generalizations. NMR spectra and modeling indicate that the GU pairs of r(GAGGUCUC)2 contain two hydrogen bonds,52 whereas the GU pairs in r(GGCGUGCC)2 contain only one.90 This difference would contribute to the more favorable free energy of 5′GGUC/3′CUGG compared with that of 5′GU/3′UG in other contexts, such as 5′CGUG/3′GUGC. Pan et al. observed similar hydrogen-bonding patterns in MD simulations.91 Additional stability of the 5′GGUC/3′CUGG motif may also arise from less overlap of negative electrostatic potentials between a GC and a GU pair than in the related motif 5′CGUG/3′GUGC.52 These patterns may explain the poor fit of nearest neighbor parameters for the 5′GU/3′UG motif when duplexes containing the 5′GGUC/3′CUGG motif are included in the fit. Alternatively, the extra stability of the 5′GGUC/3′CUGG motif relative to 5′CGUG/3′GUGC may arise from poor cross-strand overlap between the U in a GU pair and the C in its neighboring GC pair in 5′CGUG/3′GUGC.97 Stacking interactions alone do not account for the stabilities of nearest neighbor motifs composed of the same base pairs, however, as evident from the comparable stabilities of 5′UG/3′GU and 5′GG/3′UU. This contrasts with the expectation that the free energy of 5′GG/3′UU lies between those of the other tandem GU motifs because its base stacking is intermediate among them.98 Understanding the interactions responsible for the observed sequence dependence of thermodynamics presents a challenge to computational chemists.
Comparison of ΔG°37 values for GT nearest neighbors in DNA74 with those measured for GU nearest neighbors shows that GT nearest neighbors are, on average, 0.84 ± 0.36 kcal/mol less stable than their GU counterparts. The extra stability of GU relative to GT is also evident from comparisons of ΔG°37(GU or GT component) for duplexes containing comparable triplet motifs (Table 7). This may reflect a possible hydrogen bond between the amino group of guanine and the O2′ of uracil,99 which is not possible in DNA. MD simulations utilizing residual dipolar coupling (RDC) restraints suggest that the 5′TG/3′GT motif contains a bifurcated hydrogen bond100 similar to that in the 5′GU/3′UG motif.90,91 Another difference between GT and GU is that the 5′GGTC/3′CTGG motif is fit by the nearest neighbor parameters for the 5′GT/3′TG motif better than the corresponding uracil-containing RNA motif is fit by the 5′GU/3′UG parameters.74 Consistent with the relative stabilities of GT and GU nearest neighbors, component free energies of GT pairs in duplexes are consistently less favorable than those of GU pairs flanked by the same Watson–Crick pairs (Table 7).
The authors thank Zhenjiang Xu for suggesting the paired t-test and Dr. Susan Schroeder for comments on the manuscript.
This work was supported by NIH Grant GM22939 (D.H.T.)
Department of Chemistry, Northwestern University, Evanston, Illinois, 60208, USA.
Roswell Park Cancer Institute, Buffalo, New York 14263.
(I) Thermodynamic parameters for duplex formation of Watson–Crick sequences. (II) Experimental thermodynamic parameters and error limits for newly measured sequences. (III) Component free energies and enthalpies of GU pairs. (IV) Free energies of doublets and triplets containing GU pairs calculated as component ΔG°37 of their sequences. (V) Free energy differences between sequences where GC pair(s) were replaced by AU pair(s). (VI) INN parameters for GU pairs calculated with a separate term for terminal GU pairs. (VII) Probability density function of the Student’s t-distribution for each INN motif with a separate parameter for terminal GU pairs. (VIII) 2D NOESY spectra for r(AGGCUU)2 showing H2′, H1′, and H6/H8 regions. (IX) 2D NOESY spectra for r(AUGCGU)2 showing H2′, H1′, and H6/H8 regions. (X) Desalting procedure for oligoribonucleotides. This material is available free of charge via the Internet at http://pubs.acs.org.
The authors declare no competing financial interest.