The structures used to accumulate the statistical distributions employed in the proposed tracing algorithm were extracted from the Nucleic Acid Database (NDB; Berman et al.) by selecting crystallographic data with a resolution of 1.5 Å or better and structures containing either RNA only or DNA only. By July 2010, the NDB contained 53 structures of DNA in the A conformation, 79 structures of DNA in the B conformation, 31 structures of DNA in the Z conformation, and 61 RNA structures that met these criteria. RNA was not split into separate conformations because too few database structures are in the B conformation or the Z conformation. The desired distances and angles were calculated using the program nanalysis, written in C++ for this purpose. The program and its source code can be obtained via TG’s web site (http://shelx.uni-ac.gwdg.de/~tg).
The geometrical data were further processed with the program R (R Development Core Team, 2010). The mean angle and its resultant length were calculated for the angular data. In the case of Z-DNA, for which all distributions show two peaks, the data sets were split into two sets in order to maximize the weighted sum of the resultant lengths. For the angle (C1′, P, C1′), the optimal split was at 81.37°, leaving the first data set with 151 values and the second one with 87 values. For the angle (P, C1′, P), the optimal split was at 80.50°, leaving the first data set with 155 values and the second one with 83 values.
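
The angular statistics described above can be illustrated with a minimal sketch. It is not the authors' R code: it uses the textbook definitions of the circular mean and mean resultant length, and it assumes that the "weighted sum" weights each subset's resultant length by its number of values. The function names `circular_mean_and_resultant` and `best_split` are hypothetical, and the threshold scan assumes the angles do not wrap around 0°/360°.

```python
import math

def circular_mean_and_resultant(angles_deg):
    """Return (mean angle in degrees, mean resultant length in [0, 1]).

    Textbook circular statistics: average the unit vectors of all angles.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    return mean, math.hypot(c, s)

def best_split(angles_deg):
    """Find the threshold that splits the data into two subsets.

    Assumption (not confirmed by the source): the score to maximize is the
    size-weighted sum of the two mean resultant lengths.
    Returns (score, threshold, size of lower set, size of upper set).
    """
    values = sorted(angles_deg)
    n = len(values)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no threshold fits between identical values
        low, high = values[:i], values[i:]
        r_low = circular_mean_and_resultant(low)[1]
        r_high = circular_mean_and_resultant(high)[1]
        score = (len(low) * r_low + len(high) * r_high) / n
        threshold = 0.5 * (values[i - 1] + values[i])
        if best is None or score > best[0]:
            best = (score, threshold, len(low), len(high))
    return best
```

A call such as `best_split(angles)` on a two-peaked angle list returns the threshold together with the sizes of the two resulting sets, which corresponds to the kind of numbers quoted above for Z-DNA.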
To determine the means and standard deviations for the distances, their histogram data were fitted to normal distributions using the Marquardt–Levenberg algorithm as implemented in gnuplot (Williams et al.). In the case of Z-DNA the distributions show two peaks and the sum of two normal distributions was fitted. The resulting mean values and standard deviations are summarized in Table 4. The distance distributions are shown in Fig. 2 and the angular distributions are shown in Fig. 1 of the supplementary material.
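
As an illustration of this kind of fit, the following is a minimal, standard-library-only sketch of a Levenberg-Marquardt least-squares fit of a single normal curve to histogram data (bin centres `xs`, bin counts `ys`). It is not the gnuplot procedure used here. The names `gaussian`, `solve_linear` and `fit_gaussian`, the damping schedule and the stopping rule are all assumptions made for the example.

```python
import math

def gaussian(x, a, mu, sigma):
    """Unnormalized normal curve with amplitude a, mean mu, std. dev. sigma."""
    return a * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def solve_linear(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_gaussian(xs, ys, a, mu, sigma, iterations=200):
    """Fit (a, mu, sigma) by damped least squares, starting from a guess."""
    def sse(p):
        return sum((y - gaussian(x, *p)) ** 2 for x, y in zip(xs, ys))

    params = [a, mu, sigma]
    cost = sse(params)
    lam = 1e-3  # damping factor (hypothetical starting value)
    for _ in range(iterations):
        a, mu, sigma = params
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            e = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            # Partial derivatives of the model w.r.t. a, mu and sigma.
            grad = [e,
                    a * e * (x - mu) / sigma ** 2,
                    a * e * (x - mu) ** 2 / sigma ** 3]
            resid = y - a * e
            for i in range(3):
                jtr[i] += grad[i] * resid
                for j in range(3):
                    jtj[i][j] += grad[i] * grad[j]
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        damped = [[jtj[i][j] * (1.0 + lam) if i == j else jtj[i][j]
                   for j in range(3)] for i in range(3)]
        step = solve_linear(damped, jtr)
        if max(abs(d) for d in step) < 1e-12:
            break  # step is negligible; treat as converged
        trial = [p + d for p, d in zip(params, step)]
        trial_cost = sse(trial) if trial[2] > 0 else float("inf")
        if trial_cost < cost:
            params, cost, lam = trial, trial_cost, lam * 0.1
        else:
            lam *= 10.0  # reject the step and damp more strongly
    return params
```

For a two-peaked distribution such as those of Z-DNA, the same scheme would apply to a model that is the sum of two such curves, with six parameters instead of three. That extension is not shown here.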