Given three consecutive points A, B and C on a discrete curve, the curvature at B can be approximated by the inverse of the radius of the circle that passes through A, B and C. A kink should induce high curvature in a short portion of the minicircle double-helix axis. We analyzed the distribution of such curvature values in the reconstructed minicircles. Each minicircle provided 200 entries, one for the curvature measured at each of its 200 indexed points. The curvature distributions of the points belonging to the TATA circles and to the CAP circles were computed separately and are compared in Figure 3. For both sequences, the curvature distribution is peaked; the maximum corresponds to the curvature of a 158 bp perfect circle (0.12 nm−1). The distribution of curvature is very similar for TATA and CAP (Figure 3). Note that the shape data contain no reference that indicates the location of the TATA box or CAP (CRP) site sequences.
Figure 3 Probability density function of curvature in reconstructed TATA and CAP minicircles (the corresponding profiles are red and blue, respectively). The function is approximated by a normalized histogram counting curvature values within intervals of 0.03 …
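The three-point curvature estimate described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `discrete_curvature`, the use of NumPy, and the wrap-around treatment of the closed curve are assumptions. It uses the identity R = abc / (4 × area) for the circumradius of a triangle with side lengths a, b and c.

```python
import numpy as np

def discrete_curvature(points):
    """Curvature at each vertex of a closed 3D polyline of shape (N, 3).

    For each vertex B with neighbours A and C, returns 1/R, where R is the
    radius of the circle through A, B and C (0 for collinear triples).
    """
    b = np.asarray(points, dtype=float)
    a = np.roll(b, 1, axis=0)    # previous point (the curve is closed, so wrap)
    c = np.roll(b, -1, axis=0)   # next point
    ab = np.linalg.norm(b - a, axis=1)
    bc = np.linalg.norm(c - b, axis=1)
    ca = np.linalg.norm(a - c, axis=1)
    # Triangle area from the cross product; circumradius R = ab*bc*ca / (4*area)
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    return 4.0 * area / (ab * bc * ca)

# Check on a perfect circle of diameter 17.1 nm sampled at 200 points
t = 2.0 * np.pi * np.arange(200) / 200
circle = np.column_stack([8.55 * np.cos(t), 8.55 * np.sin(t), np.zeros(200)])
kappa = discrete_curvature(circle)   # every entry is 1/8.55, about 0.117 nm^-1
```

Pooling the 200 values from every minicircle of one type and binning them gives a histogram of the kind shown in Figure 3.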
Superposition of DNA minicircles shapes along their principal axes of inertia
Figure 4 shows axial paths of reconstructed minicircles that have been translated and rotated so that their centers of mass (assuming uniform mass density) and their principal axes of inertia coincide. Such a presentation allows us to visually compare many minicircle shapes at the same time. The resulting picture does not show a clear difference between the shapes of TATA (red) and CAP (blue) minicircles.
Figure 4 All the reconstructed shapes of DNA are aligned by superposition of their principal axes of inertia. The upper and lower views differ by a rotation of 90° around the horizontal axis: (a) all the 31 CAP minicircles (blue), (b) all the 95 minicircles …
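The alignment used for this superposition can be sketched as below, assuming each minicircle is an (N, 3) array of points carrying equal mass. The function name `align_to_principal_axes` and the ordering of the axes by increasing moment of inertia are illustrative choices, not details taken from the study.

```python
import numpy as np

def align_to_principal_axes(points):
    """Translate the centre of mass to the origin and rotate the curve so that
    its principal axes of inertia coincide with the coordinate axes."""
    p = np.asarray(points, dtype=float)
    centered = p - p.mean(axis=0)           # centre of mass for uniform mass
    # Inertia tensor for unit point masses: sum(|r|^2 * Id - r r^T)
    inertia = np.sum(centered**2) * np.eye(3) - centered.T @ centered
    _, axes = np.linalg.eigh(inertia)       # columns ordered by increasing moment
    if np.linalg.det(axes) < 0:             # keep a proper rotation, not a mirror
        axes[:, -1] *= -1
    return centered @ axes                  # coordinates in the principal frame
```

For a nearly planar loop the last coordinate is the out-of-plane direction, because the axis normal to the plane has the largest moment of inertia. Each principal axis is defined only up to sign, so two aligned shapes can still differ by a 180° rotation about one of the axes.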
The shape-distance for curves: minimum RMSD over all rigid-body motions, index shifts and curve orientations
Although curvature analysis and visualization did not reveal the presence of a kink in TATA minicircles in comparison to CAP minicircles, there may be a more subtle sequence-dependent shape pattern. Therefore, rather than looking for a particular shape, we designed a method to identify groups of similar shapes, and examined whether the sequence correlates with these groups. We first chose a distance for the determination of shape similarity, then clustered the shapes according to their mutual similarities measured in terms of this distance.
Because we do not know the correspondence between the sequence and the curve in each image, in order to estimate the similarity between two minicircle shapes we need to adapt the standard root mean square deviation (RMSD) minimization procedure that is often used to compare the geometries of two solid objects. The standard method is as follows: for two ordered sets of N points x = {x_i} and y = {y_i}, the RMSD is the square root of the mean over i of the squares of the Euclidean distances between two corresponding points x_i and y_i. Then, to eliminate rigid-body motions, one computes a 3 × 3 rotation matrix R and a translation vector r which, when applied to x, minimize the RMSD function defined in Equation 1, producing the best superposition of the two structures:
$$\mathrm{RMSD}(R, r) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| R\,x_i + r - y_i \right\|^2} \tag{1}$$
A Fortran 95 code given in (32) was used to compute this minimum RMSD.
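For readers who want to experiment with the rigid-body minimization, here is a minimal sketch. It uses the Kabsch algorithm (singular value decomposition of the covariance matrix) in Python, which is an assumption for illustration and not the Fortran 95 routine of (32); the function name `min_rmsd` is likewise hypothetical.

```python
import numpy as np

def min_rmsd(x, y):
    """Minimum RMSD between ordered point sets x and y, both of shape (N, 3),
    over all rotations R and translations r applied to x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)          # the optimal translation matches centroids
    yc = y - y.mean(axis=0)
    u, _, vt = np.linalg.svd(xc.T @ yc)       # 3x3 covariance matrix
    # Force det(R) = +1 so that R is a rotation and never a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    residual = xc @ rot.T - yc                # R x_i + r - y_i for every i
    return np.sqrt(np.mean(np.sum(residual**2, axis=1)))
```

Two copies of the same point set that differ only by a rotation and a translation give a value of zero up to round-off, whereas a mirror image of a chiral point set does not.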
Our shape-distance function is then defined in Equation 2 via minimization over all possible rigid-body rotations and translations in 3D, plus further minimizations over all shifts of the index (the variable δ) and over the two curve orientations, clockwise or counter-clockwise (the variable α):
$$d(x, y) = \min_{R,\, r,\, \delta,\, \alpha} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| R\,x_i + r - y_{\alpha i + \delta} \right\|^2} \tag{2}$$
where α = ±1 and the index of y is taken modulo N.
The additional minimization over δ is necessary in our case because we do not know which point of the discretized curve y should correspond to the first point of the curve x. However, if there is a common pattern between shapes of minicircles, a particular mapping of x onto y should give a minimal RMSD. The minimization over δ in Equation 2 allows all possible phasing differences in index to compete in the fit. Minimization over α recognizes that a given curve can be discretized with two distinct orientations. Except for particular symmetrical shapes, identical curves that happen to be discretized with opposite orientations cannot be perfectly superposed by standard RMSD.
As a matter of implementation, the additional minimizations in Equation 2 were achieved by calling the RMSD function given in (32) inside a Matlab loop over all possible shifts δ (δ = 1, …, 200 in our data, with the index of y to be understood modulo 200) and the two choices of α. The smallest RMSD value found in the loop defines the distance between the two shapes.
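The loop over index shifts and orientations can be sketched as follows. This is an illustrative Python version, not the authors' Matlab code: `shape_distance` and `min_rmsd` are hypothetical names, and the inner rigid-body minimization uses the singular-value form of the Kabsch algorithm as a stand-in for the routine of (32).

```python
import numpy as np

def min_rmsd(x, y):
    """Minimum RMSD over rotations and translations (Kabsch, singular-value form)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(xc.T @ yc)
    if np.linalg.det(u @ vt) < 0:        # exclude reflections
        s[-1] = -s[-1]
    msd = (np.sum(xc**2) + np.sum(yc**2) - 2.0 * s.sum()) / len(x)
    return np.sqrt(max(msd, 0.0))        # clip tiny negative round-off

def shape_distance(x, y):
    """Smallest min_rmsd over all cyclic index shifts and both orientations of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = np.inf
    for alpha in (1, -1):                # the two orientations of the curve
        y_oriented = y[::alpha]
        for delta in range(len(y)):      # all shifts of the index, modulo N
            candidate = min_rmsd(x, np.roll(y_oriented, -delta, axis=0))
            best = min(best, candidate)
    return best
```

With 200 points per curve, each pairwise distance requires 400 rigid-body fits. The shift loop here runs from 0 to N − 1, which covers the same set of shifts modulo N as δ = 1, …, N.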
Error of reconstruction measurements
To measure similarity or dissimilarity between different reconstructed minicircles, it is important to determine the error of reconstruction and to see how much this error could affect the comparison between different reconstructed minicircles. In order to estimate the reconstruction error, we applied our distance function to two reconstructed shapes coming from the same image pair, but obtained by two different users of the reconstruction program. We computed the user error for six image pairs (Figure 5). We find that the average error is 0.9 nm, with SD 0.3 nm.
Figure 5 Estimation of the error of reconstruction. (a) The same DNA minicircle is shown from two different angles. In the right image, the sample is rotated by 30° around the vertical axis with respect to the left image. (b) Two reconstructions from the …
Analysis of shape-distances with respect to TATA and CAP sequences
We analyzed a set of 95 distinct minicircles (64 TATA, 31 CAP), all reconstructed by the same user. We therefore have a set of 4465 (= 95 × 94/2) pairwise shape-distances. Figure 6 gives the normalized histograms, i.e. probability distributions, of pairwise distances in three groups: TATA to TATA, CAP to CAP and TATA to CAP. The average shape-distances are 2.03 nm for TATA–TATA (SD 0.57 nm), 1.96 nm for CAP–CAP (SD 0.52 nm) and 1.98 nm for TATA–CAP (SD 0.55 nm). TATA–TATA and CAP–CAP shape-distances are not significantly smaller than TATA–CAP distances. Therefore, we do not observe increased shape similarity between minicircles with the same sequence.
Figure 6 Normalized histograms (i.e. probability density) of the shape-distance values between the reconstructed shapes of any two TATA minicircles (red), any two CAP minicircles (blue) and between one TATA shape and one CAP shape (green).
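The split of the 4465 pairs into the three groups follows directly from the group sizes given in the text. The short sketch below enumerates the pairs and counts them; the label list and the function name `group_pairs` are illustrative assumptions.

```python
from itertools import combinations

# 95 minicircles: 64 TATA and 31 CAP, as stated in the text
labels = ["TATA"] * 64 + ["CAP"] * 31

def group_pairs(labels):
    """Sort every unordered pair of indices into one of the three groups."""
    groups = {"TATA-TATA": [], "CAP-CAP": [], "TATA-CAP": []}
    for i, j in combinations(range(len(labels)), 2):
        # Reverse alphabetical order puts "TATA" first in a mixed pair
        a, b = sorted((labels[i], labels[j]), reverse=True)
        groups[f"{a}-{b}"].append((i, j))
    return groups

counts = {name: len(pairs) for name, pairs in group_pairs(labels).items()}
# counts == {"TATA-TATA": 2016, "CAP-CAP": 465, "TATA-CAP": 1984}; the total is 4465
```

Each index pair (i, j) can then be used to look up the corresponding shape-distance and build one histogram per group.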
We cannot use classical methods for clustering our shapes, as we do not have a sensible way to represent them as vectors in a multidimensional space. We also do not have reference shapes to build clusters. Accordingly, we adopt the reference-free SPIN algorithm (33), which is capable of ordering the elements of a set using only their pairwise distances. For an ordered list of shapes and a shape-distance function, there exists a unique shape-distance matrix defined as follows: each element (i, j) of the matrix is the shape-distance between minicircles i and j. By definition, the matrix is symmetric and the elements on the diagonal vanish; the i-th line (or column) is a list of the distances between minicircle i and all others. SPIN finds a permutation of an initial ordered list of shapes that minimizes the elements near the diagonal. If the resulting matrix has a block of low (dark blue) values near the diagonal, with comparatively higher values above and below (and therefore, by symmetry, to the right and left), the shapes in the block can be considered a cluster. A SPIN-sorted shape-distance matrix and the corresponding clusters are represented in Figure 7. Three columns were added on the left of the matrix to show some properties of the shapes. Each line and each column of the matrix corresponds to a minicircle. For each line i of the matrix, the corresponding element i of the column ‘Minicircle type’ shows whether minicircle i is of type TATA (gray) or CAP (white). It is clear that the TATA and CAP minicircles are spread throughout each cluster. Similarly, the i-th element of the column ‘Circle’ (respectively ‘Ellipse’) shows the distance between minicircle i and a circle (respectively an ellipse). The circle diameter is 17.1 nm (corresponding to a perimeter of 158 bp). The longer ellipse axis is also 17.1 nm while the shorter axis is 13.7 nm. These two columns and the lower part of Figure 7 suggest that the method was able to identify clusters of circular and elliptical shapes, and to find another, non-planar cluster. Stereo images of the cluster 7–15 are presented in Figure 8.
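The reference circle and ellipse quoted above can be generated as in the sketch below, which also checks that a circle of diameter 17.1 nm has a perimeter of about 158 bp. The rise of 0.34 nm per base pair is the usual B-DNA value and is an assumption here, as are the function name `reference_ellipse` and the uniform sampling in the angle parameter (which does not give equally spaced points along the ellipse).

```python
import numpy as np

N_POINTS = 200            # points per reconstructed minicircle, from the text
RISE_NM_PER_BP = 0.34     # assumed B-DNA rise per base pair

def reference_ellipse(long_axis_nm, short_axis_nm, n=N_POINTS):
    """Planar ellipse in the xy-plane, sampled uniformly in the angle parameter."""
    t = 2.0 * np.pi * np.arange(n) / n
    return np.column_stack([
        0.5 * long_axis_nm * np.cos(t),
        0.5 * short_axis_nm * np.sin(t),
        np.zeros(n),
    ])

circle = reference_ellipse(17.1, 17.1)
ellipse = reference_ellipse(17.1, 13.7)

# Perimeter of the circle expressed in base pairs: pi * 17.1 / 0.34, about 158
perimeter_bp = np.pi * 17.1 / RISE_NM_PER_BP
```

The ‘Circle’ and ‘Ellipse’ columns of the figure then correspond to the shape-distance between each reconstructed minicircle and one of these reference curves.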
Figure 7 (Upper panel) The shape-distance matrix after clustering. Each line of the figure contains information about one minicircle reconstructed shape: the sequence type (first column), the shape-distance between the shape and a perfect circle (second column), …
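To make the idea of "low values near the diagonal" concrete, the sketch below reorders a distance matrix by a given permutation and averages the entries in a band around the diagonal. It is not the SPIN algorithm of (33), only an illustration of the kind of quantity such a sorting tries to reduce; the function `near_diagonal_cost`, its `width` parameter and the toy data are all hypothetical.

```python
import numpy as np

def near_diagonal_cost(dist, order, width=1):
    """Mean of the off-diagonal entries within `width` of the diagonal
    after reordering the lines and columns of `dist` by `order`."""
    d = np.asarray(dist, dtype=float)[np.ix_(order, order)]
    i, j = np.indices(d.shape)
    band = (np.abs(i - j) <= width) & (i != j)
    return d[band].mean()

# Toy data: items 0, 2, 4 form one group and items 1, 3, 5 another;
# the distance is 1 inside a group and 3 between groups.
group = np.array([0, 1, 0, 1, 0, 1])
toy = np.where(group[:, None] == group[None, :], 1.0, 3.0)
np.fill_diagonal(toy, 0.0)

interleaved = near_diagonal_cost(toy, [0, 1, 2, 3, 4, 5])   # 3.0
grouped = near_diagonal_cost(toy, [0, 2, 4, 1, 3, 5])       # 1.4
```

The grouped ordering places the two blocks of small distances along the diagonal, which is the pattern read as clusters in the sorted matrix.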
Figure 8 Stereo images of the cluster 7–15. Images are presented in ‘side by side’ stereo mode.
Interestingly, the distance matrix apparently reveals the presence of multiple clusters of shapes. It is known that DNA circles with non-uniform sequence have multiple local energy minima (34). For this reason, we believe that our clustering analysis detected sampling of at least two and possibly more energy wells in the configuration space. However, the small difference between the majority of the clusters (comparable with the error of the reconstruction method) warns against over-interpretation of the distance matrix data. Importantly, each detected cluster contains both TATA and CAP minicircles, so that the different clusters seem to be associated with the sequence-dependent features that are shared between the two sequences, e.g. the six phased A-tracts, rather than with the differences between TATA and CAP sequences. We therefore conclude that TATA and CAP sequences produce minicircles with similar 3D shapes.