We developed MetalionRNA, a computational method for predicting metal ion-binding sites in RNA 3D structures, based on a statistical potential and a grid-based calculation approach. The potential is derived from an analysis of known metal ion-binding sites in 113 RNA structures. MetalionRNA takes an RNA 3D structure in PDB format as input and returns PDB files containing the calculated RNA potential surface and the coordinates of the cations predicted for the target RNA structure.
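The grid-based part of the approach can be illustrated with a minimal sketch: a cubic lattice with edge width C is laid over the target structure, and in this sketch every lattice point that does not overlap an RNA atom becomes a candidate cation position, to be scored afterwards with the statistical potential (scoring is not shown). The bounding-box `margin` and the steric `min_clash` threshold are hypothetical parameters introduced for illustration; only the 0.5 Å default grid width is taken from the text.

```python
import itertools
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def build_grid(atoms: List[Point], width: float = 0.5, margin: float = 5.0,
               min_clash: float = 1.5) -> List[Point]:
    """Cubic lattice with edge `width` (grid C) spanning the atoms' bounding box
    extended by `margin` angstroms. Points closer than `min_clash` (hypothetical
    threshold) to any atom are dropped, since a cation cannot sit inside the RNA."""
    lo = [min(a[i] for a in atoms) - margin for i in range(3)]
    hi = [max(a[i] for a in atoms) + margin for i in range(3)]
    # Number of lattice points per axis, including both faces of the box;
    # the small epsilon guards against floating-point round-off.
    counts = [int(math.floor((h - low) / width + 1e-9)) + 1
              for low, h in zip(lo, hi)]
    grid = []
    for i, j, k in itertools.product(*(range(n) for n in counts)):
        p = (lo[0] + i * width, lo[1] + j * width, lo[2] + k * width)
        if all(math.dist(p, a) >= min_clash for a in atoms):
            grid.append(p)
    return grid
```

Halving the grid width multiplies the number of candidate points by roughly eight, which is the usual trade-off between spatial resolution and run time for this kind of search.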
4.1 Web server
To make our method easily available to the research community, we developed a web server, available at http://metalionrna.genesilico.pl (a mirror is available at http://metalionrna.amu.edu.pl). The submission form accepts RNA structures in PDB format only; files in any other format are rejected and the server displays an appropriate error message. Users can specify the cation type, the number of cation positions expected to bind to the query structure, the minimal distance between predicted cations, the width of the cubic grid and the ionic radius of the cation, or use the default values. The default cation is Mg2+. The default number of predicted ions is calculated from the number of residues in the target structure; the default minimal distance between predicted cations is based on the distances observed in known structures (Supplementary Table S1), and the default width of the cubic grid C is 0.5 Å. The results are returned as a separate web page, which includes a file with the predicted cation positions in text and PDB formats, a script to display the predicted cations in the PyMOL viewer and a PDB file containing the target structure with the calculated potential surface. The page with the output files is kept on the server for 1 week.
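The options above (number of predicted cations, minimal distance between them) suggest a simple post-processing step: rank the scored grid points and keep the best ones that are far enough apart. The greedy procedure below is an assumption about how such a selection could work, not a description of MetalionRNA's code; the energy convention (lower value = more favourable site) and the name `pick_cations` are likewise assumed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def pick_cations(scored: List[Tuple[float, Point]], n_ions: int,
                 min_dist: float) -> List[Tuple[float, Point]]:
    """Greedy selection of predicted cation positions.

    `scored` holds (energy, point) pairs; a lower energy is assumed to mean a
    more favourable site. Points are accepted best-first, skipping any that lie
    closer than `min_dist` to an already accepted cation, until `n_ions` are kept."""
    chosen: List[Tuple[float, Point]] = []
    for energy, p in sorted(scored, key=lambda t: t[0]):
        if all(math.dist(p, q) >= min_dist for _, q in chosen):
            chosen.append((energy, p))
            if len(chosen) == n_ions:
                break
    return chosen
```

With `n_ions` set to the default number of ions and `min_dist` to the minimal inter-cation distance, two neighbouring grid points that both score well cannot both be reported, so each predicted cation corresponds to a distinct site.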
The time required for MetalionRNA to return predictions depends mainly on the size of the molecule. Currently, we use a simple queuing system that runs one prediction at a time. For a 76-nt tRNA molecule (PDB ID: 1EHZ) with the default number of seven predicted Mg2+ ions, obtaining the results takes ~5 min. The server was implemented in Python using the Django web framework.
One of the weaknesses of the statistical approach is the relative paucity of high-resolution crystal structures of RNA molecules with accurately determined cation-binding sites. Therefore, once per week (every Saturday at 12 p.m. Central European Time), the MetalionRNA web server downloads newly released PDB structures with resolution better than 2 Å. RNA structures containing Mg2+ cations that fulfill the conditions described in Section 2.1 are added to the original training set, and the statistical potential is recalculated. Over time, structures with resolution between 2 and 3 Å will be outnumbered by those with resolution better than 2 Å, which we hope will lead to a continuous improvement of the potential. The MetalionRNA web site lets the user choose whether to perform predictions with the original potential described in this article or with the updated one.
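The weekly update can be sketched as a filter over newly released entries. `PdbEntry` and `weekly_update` are hypothetical names, and in practice the metadata would come from a PDB search service; only the 2 Å resolution threshold and the restriction to Mg2+ (PDB chemical component ID `MG`) come from the text. The further conditions of Section 2.1 are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class PdbEntry:
    """Hypothetical summary of a newly released PDB entry."""
    pdb_id: str
    resolution: Optional[float]  # in angstroms; None if not reported
    is_rna: bool
    ligands: List[str] = field(default_factory=list)  # chemical component IDs


def weekly_update(entries: Iterable[PdbEntry], known: Set[str],
                  max_resolution: float = 2.0, ion: str = "MG") -> List[str]:
    """IDs of new RNA entries with resolution better than `max_resolution`
    that contain `ion`. The Section 2.1 checks on individual binding sites
    would still have to be applied to the returned entries."""
    return [e.pdb_id for e in entries
            if e.pdb_id not in known
            and e.is_rna
            and e.resolution is not None
            and e.resolution < max_resolution
            and ion in e.ligands]
```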
4.2 RNA-metal ion statistical preferences
To calculate the anisotropic statistical potential for RNA–ion contact prediction, we derived statistics for the most common cations from 50 RNA structures containing Mg2+ (182 binding sites), 25 RNA structures containing Na+ (88 binding sites) and 38 RNA structures containing K+ (123 binding sites). The graph of the statistical potential in panel B depicts the preferred interaction geometries for direct contacts and for solvated Mg2+ ions. The distribution function for the RNA atom pair [P, OP2] and Mg2+ has three peak tuples (the darkest areas), which correspond to the three possible states of magnesium binding to RNA (Draper, 2004). The first peak tuple lies at a distance of ~2 Å, with an acute angle of 15–60°. It corresponds to magnesium ions chelated and partially dehydrated by RNA phosphate groups; in this state, the Mg2+ ion interacts with RNA atoms directly. The second peak tuple lies at a distance of 4–5 Å and has a bimodal angle distribution. Acute angles (15–90°) correspond to the water-mediated state, in which the cation retains one layer of hydrating water molecules that in turn interact with the RNA atoms. Obtuse angles (120° and higher) correspond to cations chelated with the OP2 atom, and the same Mg2+ ions appear in the first peak (at a distance of about 2–2.5 Å) for the [P, OP1] pair (data not shown). Finally, the third peak tuple lies at a distance of 6–7 Å and represents the situation where the Mg2+ ion remains hydrated and interacts with the RNA via a layer of water molecules. For these distances, angles of 30–60°
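A distance–angle potential of this kind is commonly obtained by an inverse-Boltzmann conversion of observed counts against a reference state. The sketch below assumes that form, a 0.5 Å by 15° binning, a pseudocount for empty bins and an angle defined between the P→OP2 and OP2→ion vectors; none of these choices is stated in the text, so this is not the exact MetalionRNA formula. Under the assumed angle definition, an ion bound directly to OP2 with a P–O–Mg angle of about 135° gives an angle of about 45°, which would fall in the first peak described above.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Vec = Tuple[float, float, float]
Bin = Tuple[int, int]


def distance_angle(p: Vec, op2: Vec, ion: Vec) -> Tuple[float, float]:
    """OP2-ion distance (angstroms) and the angle (degrees) between the
    P->OP2 and OP2->ion vectors. This angle definition is an assumption."""
    u = tuple(b - a for a, b in zip(p, op2))
    v = tuple(b - a for a, b in zip(op2, ion))
    d = math.hypot(*v)
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * d)
    return d, math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def to_bin(d: float, theta: float, d_step: float = 0.5,
           a_step: float = 15.0) -> Bin:
    """Map a (distance, angle) sample to a 2D histogram bin (hypothetical sizes)."""
    return int(d // d_step), int(theta // a_step)


def statistical_potential(observed: Iterable[Tuple[float, float]],
                          reference: Iterable[Tuple[float, float]],
                          pseudo: float = 1.0) -> Dict[Bin, float]:
    """E(bin) = -ln(f_obs / f_ref) in kT units, with a pseudocount for sparse bins.

    `observed`: (distance, angle) samples from real ion sites;
    `reference`: samples from a reference state (e.g. random positions)."""
    obs = Counter(to_bin(d, t) for d, t in observed)
    ref = Counter(to_bin(d, t) for d, t in reference)
    n_obs, n_ref = sum(obs.values()), sum(ref.values())
    return {b: -math.log(((obs[b] + pseudo) / n_obs) /
                         ((ref[b] + pseudo) / n_ref))
            for b in set(obs) | set(ref)}
```

Bins where ions are seen more often than in the reference state get negative (favourable) values, so the darkest areas of the plotted distribution become wells of the potential.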
4.3 MetalionRNA predicts metal ions with high accuracy
To assess the accuracy of MetalionRNA, we performed a 5-fold cross-validation test on RNA–metal ion complexes (Supplementary Table S1). We used a cubic grid C with an edge width of 0.25 or 0.5 Å, a Mg2+ ionic radius of 0.75 Å and a minimal distance between predicted cations of 1.5 Å. The results for the 0.5 Å grid are shown as ROC plots in Fig. 3 (RNA–Mg2+ in Fig. 3A, RNA–Na+ in Fig. 3B and RNA–K+ in Fig. 3C).
Fig. 3. ROC curves assessing the classification performance of MetalionRNA with a cubic grid C of width 0.5 Å for (A) the RNA–Mg2+ dataset, (B) the RNA–Na+ dataset, (C) the RNA–K+ dataset and (D) the DNA–Mg2+ dataset, for various cut-off distances.
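A cross-validation of this kind can be sketched as a split of the structures into five folds, each used once as a test set while the potential is recomputed from the other four. Splitting by whole structures, so that binding sites from one structure never appear in both the training and the test set, is our assumption; the text does not say how the folds were formed. `k_fold_by_structure` is a hypothetical helper.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_by_structure(pdb_ids: Sequence[str], k: int = 5,
                        seed: int = 0) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (training IDs, test IDs) so that every structure is tested exactly
    once and all binding sites of one structure stay in the same fold."""
    ids = sorted(set(pdb_ids))
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    folds = [ids[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

In each round the potential would be rebuilt from the training structures only and then used to score the test structures, so no test ion contributes to the statistics used to predict it.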
The AUC values describing the success of predictions for Mg2+ ions were calculated for the following cut-off distances (the maximum distance between a predicted and a real metal ion for which the prediction was regarded as correct): 0.72, 0.75, 1.0, 1.5, 2.0 and 3.0 Å. The first value is the ionic radius of Mg2+ (0.72 Å); the others are multiples of the grid width of C. For these cut-offs, the AUC values for Mg2+ ions were 50, 56, 81, 93, 95 and 96% for the 0.25 Å grid and 62, 62, 81, 95, 96 and 97% for the 0.5 Å grid. The solid line in Fig. 3A shows predictions that lie within the ionic radius of Mg2+ (0.72 Å) and hence within the space occupied by the cation in the crystallographic model.
For Na+ ions, the AUC values were 43, 82, 87, 91 and 93% (0.25 Å grid) and 47, 78, 85, 88 and 91% (0.5 Å grid) for cut-off distances of 1.0, 1.5, 2.0, 3.0 and 4.0 Å, respectively; the ionic radius of Na+ is 1.0 Å. Predictions for Na+ are slightly less accurate than those for Mg2+, most likely because the training dataset contains fewer Na+ binding sites. The solid line in Fig. 3B shows predictions within the ionic radius of Na+ (1.0 Å). For K+ ions, the AUC values were 54, 61, 84, 96 and 97% (0.25 Å grid) and 54, 61, 81, 97 and 98% (0.5 Å grid) for cut-off distances of 1.38, 1.5, 2.0, 3.0 and 4.0 Å, respectively; the ionic radius of K+ is 1.38 Å. Fig. 3C shows the predictions for K+.
We also performed predictions and ROC analysis for a set of DNA–Mg2+ complexes (Fig. 3D) using the statistical potential derived from RNA–Mg2+ complexes in the PDB. Interestingly, our method also works for DNA structures, which were not considered when training the potential: the AUC values for cut-off distances of 0.72, 0.75, 1.0, 1.5, 2.0 and 3.0 Å were 44, 49, 74, 90, 91 and 93% (0.25 Å grid) and 56, 56, 72, 88, 91 and 93% (0.5 Å grid), respectively. These results are only slightly worse than those for RNA and indicate that our approach captures a general aspect of metal ion binding by nucleic acids.
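The text does not detail how the ROC curves were constructed. One plausible reading, sketched below under that assumption, labels each scored grid point as positive if it lies within the cut-off distance of an observed ion and computes the AUC from the ranking of the point energies (lower = more favourable, also assumed). The function names are hypothetical; the default cut-offs are the Mg2+ values quoted above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def label_points(points: Sequence[Point], ions: Sequence[Point],
                 cutoff: float) -> List[int]:
    """1 if a grid point lies within `cutoff` angstroms of an observed ion, else 0."""
    return [int(any(math.dist(p, q) <= cutoff for q in ions)) for p in points]


def auc(energies: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    positive point has a lower energy than a randomly chosen negative one;
    ties count as one half."""
    pos = [e for e, y in zip(energies, labels) if y]
    neg = [e for e, y in zip(energies, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative grid points")
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_cutoff(points: Sequence[Point], energies: Sequence[float],
                  ions: Sequence[Point],
                  cutoffs=(0.72, 0.75, 1.0, 1.5, 2.0, 3.0)) -> Dict[float, float]:
    """AUC for each cut-off distance (one ROC curve per cut-off)."""
    return {c: auc(energies, label_points(points, ions, c)) for c in cutoffs}
```

A larger cut-off turns more near-miss grid points into positives, which is consistent with the AUC rising with the cut-off distance in the values reported above.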
FEATURE is another method for predicting metal ions in RNA structures (Banatao et al., 2003). It applies supervised learning to a training set of positive and negative examples of Mg2+ ion-binding sites to create a statistical model that describes the micro-environments surrounding site-bound and diffusely bound cations. The model is built from 126 physicochemical and structural properties that influence or take part in RNA–Mg2+ ion interactions, and the method was tested on a 58-nt fragment of Bacillus stearothermophilus 23S rRNA (PDB code 1HC8). To compare the performance of MetalionRNA with that of WebFEATURE, we made predictions for this structure using our default settings, as well as after retraining our potential on the FEATURE training set.
Fig. 4 and the table of Mg2+ ions in 1HC8 summarize the predictions for the seven Mg2+ ions present in the 1HC8 structure. For a molecule of this size, MetalionRNA calculates a default of six Mg2+ ions expected to be observed in a crystal structure solved under 'average' conditions; hence, the six top-scoring predictions are considered strong bets, and further positions in the ranking correspond to alternative, low-confidence sites that might be occupied, e.g. at higher Mg2+ concentrations. The six top-scoring predictions made by MetalionRNA with the default potential included four of the seven Mg2+ ions, identified with an accuracy of 0.6–1.9 Å; the remaining ions were predicted at ranks 10, 13 and 29. Using a potential calculated from the FEATURE training set, MetalionRNA predicted only two of the seven ions within the first six positions of the ranking, with accuracies of 0.8 and 0.6 Å, respectively; the remaining five ions were ranked at positions 8, 9, 21, 29 and 33. FEATURE correctly identified only two site-bound Mg2+ ion positions within its seven top-scoring predictions, with accuracies of 1.5 and 3.6 Å, respectively. FEATURE scored all the diffuse ions relatively poorly, placing them outside the top positions of its ranking.
A list of Mg2+ ions in the 23S rRNA structure (PDB ID: 1HC8) for which predictions were made using MetalionRNA and FEATURE
Fig. 4. Structure of the 23S rRNA fragment (PDB ID: 1HC8) with the experimentally determined positions of Mg2+ cations indicated by white labeled balls. Top-scoring Mg2+ cations predicted by MetalionRNA are shown as black balls.
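The per-ion comparison in this section (rank and accuracy of the closest prediction, ions recovered among the top-scoring predictions, number of false positives) can be tabulated with a small helper. The matching rule (best-ranked prediction within a cut-off) and the 4 Å default cut-off are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Match = Optional[Tuple[int, float]]


def match_ions(observed: Dict[str, Point], ranked: Sequence[Point],
               cutoff: float = 4.0) -> Dict[str, Match]:
    """For each observed ion, the 1-based rank and distance (angstroms) of the
    best-ranked prediction within `cutoff`, or None if there is none.
    `ranked` must be ordered from best to worst prediction."""
    matches: Dict[str, Match] = {}
    for name, q in observed.items():
        matches[name] = next(((r, round(math.dist(p, q), 2))
                              for r, p in enumerate(ranked, start=1)
                              if math.dist(p, q) <= cutoff), None)
    return matches


def recovered(matches: Dict[str, Match], top_n: int) -> List[str]:
    """Observed ions matched by one of the `top_n` best predictions."""
    return sorted(n for n, m in matches.items() if m is not None and m[0] <= top_n)


def false_positives(observed: Dict[str, Point], ranked: Sequence[Point],
                    top_n: int, cutoff: float = 4.0) -> int:
    """Top-`top_n` predictions farther than `cutoff` from every observed ion."""
    return sum(all(math.dist(p, q) > cutoff for q in observed.values())
               for p in ranked[:top_n])
```

For 1HC8, `observed` would hold the seven crystallographic Mg2+ positions and `top_n=6` would correspond to the default number of predictions for a molecule of that size.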
With both variants of the potential, MetalionRNA identified four of the five diffuse Mg2+ ions much better than FEATURE did. The only exception was Mg2+ ion 1160, for which FEATURE found a more accurate match, but only at the 17th position of its ranking, whereas MetalionRNA reported a reasonable prediction at the sixth position (i.e. within the default number of predictions). Predictions for two of the diffuse ions (1161 and 1172) received relatively low scores from both methods. MetalionRNA also predicted one of the two site-bound Mg2+ ions (*1163) with very high accuracy and at a high rank (with our training set: 0.6 Å, rank 3; with the FEATURE training set: 0.6 Å, rank 4). For this cation, FEATURE performed only slightly worse (accuracy 1.5 Å, rank 5, in a separate prediction for site-bound ions alone). The second site-bound ion (*1167) was predicted by MetalionRNA with an accuracy of 3.8 Å (rank 10) and 1.1 Å (rank 8) for the two training sets, whereas FEATURE reported it with an accuracy of 3.6 Å at rank 2 (again, in a separate prediction for site-bound ions). Hence, both methods performed similarly for site-bound ions. In summary, MetalionRNA identified four of the seven true Mg2+ sites in the 1HC8 structure with just two false positives, whereas FEATURE identified these ions with a much higher number of false positives.
Interestingly, the top-scoring Mg2+-binding site predicted by FEATURE corresponds to a K+-binding site in the 1HC8 structure (*1162, accuracy 1.4 Å). MetalionRNA predicted this site at the fourth position of its K+-specific ranking (accuracy 2.2 Å), while the three higher-ranked K+ predictions coincided with Mg2+-binding sites observed in the experimentally determined structure. In our Mg2+ ranking, this K+-binding site appears at the 18th position (accuracy 1.7 Å). This partial overlap of predicted Mg2+- and K+-binding sites suggests that the cations compete with each other for binding to the RNA molecule. MetalionRNA does not yet support simultaneous prediction of different ions and does not take ion concentration into account. Such features will be implemented when the number of high-resolution RNA structures determined at a range of ion concentrations (and with confidently assigned ions) becomes large enough to train and test the knowledge-based potential with statistical significance.