This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
We use cryo-electron microscopy to compare 3D shapes of 158 bp long DNA minicircles that differ only in the sequence within an 18 bp block containing either a TATA box or a catabolite activator protein binding site. We present a sorting algorithm that correlates the reconstructed shapes and groups them into distinct categories. We conclude that the presence of the TATA box sequence, which is believed to be easily bent, does not significantly affect the observed shapes.
The base-pair sequence of DNA is believed to influence the structure and deformability of the double helix (1,2) and thereby to affect biological processes such as DNA packaging (3), DNA looping in prokaryotic regulatory complexes (4,5), nucleosome positioning (6–8) and DNA–protein interactions (9–11).
In the nucleosome, 147 bp of DNA wraps almost twice around the histones [see Refs (12,13) for a crystal structure]. The radius of curvature of the double-helix axis is between 4 and 5 nm, whereas the DNA diameter is 2 nm. In this highly bent regime, the DNA sequence strongly affects the position of the nucleosome along the DNA molecule (6–8), and this effect is believed to arise from sequence-dependent DNA deformability.
In DNA–protein binding, the contribution of the DNA sequence to the binding affinity is frequently indirect, as opposed to direct recognition, in which nucleotide bases form hydrogen bonds with amino acids. For instance, the Bovine Papilloma Virus BPV-1 E2 protein binds two sites separated by a 4 bp spacer (10). The spacer DNA does not contact the protein, but its sequence affects the stability of the DNA–protein complex (11). The role of the spacer in complex stability has also been shown for the cyclic AMP receptor protein (CRP) (14,15), also known as catabolite activator protein (CAP) (16).
The TATA box is a short sequence in the promoter region of genes that binds protein complexes and initiates transcription. Cyclization experiments (17–19) showed that free DNA containing a TATA box sequence exhibits greatly enhanced J-factors, which can be attributed to strong bends and high flexibility (9). Prebending DNA has been shown to enhance its interaction with the TATA box binding protein (TBP) (20). As there are at most a few direct hydrogen bonds between DNA and TBP (21,22), the mechanical properties of the TATA box are thought to be very important for its function (9). This hypothesis is supported by a recent all-atom computation that predicts mostly indirect recognition between TBP and the TATA box (23).
Cyclization studies of DNA fragments 147–163 bp long provided strong indications that the TATA box is highly flexible (9). However, the exact nature of this flexibility is not known. The sequence could behave as a flexible kink and permit strong local bending. Alternatively, the bending of the TATA box could be limited to approximately the curvature expected for a 158 bp DNA circle. A flexible kink should perturb the shape of the minicircle and be easily visible in cryo-electron microscopy (cryo-EM) images. If, however, the bending is limited to a curvature similar to that of the relaxed minicircle, it might not affect the shape of the observed minicircles. In this study, we observe and compare the shapes of 158 bp DNA minicircles in which an 18 bp fragment contains either a TATA box or a CAP (CRP) site.
Cryo-EM allows DNA molecules to be observed in nearly physiological conditions: thin aqueous layers containing suspended DNA molecules are cooled so rapidly that the water vitrifies and ice crystals do not form (24–26). The frozen sample can be tilted, so that each DNA molecule is imaged from two different viewing angles. This method has been used to reconstruct the 3D path of individual DNA molecules (27) and to determine the DNA persistence length (28).
To observe how DNA shape is influenced by its sequence, it is advantageous to minimize variations due to thermal fluctuations and to visualize molecules as close as possible to their minimum-energy shape. DNA minicircles seem best suited for this purpose: because their length is close to the persistence length, the closure constraint effectively limits the range of possible fluctuations.
On the other hand, the small size of the minicircles (~17 nm in diameter) implies that even nanometer-size errors in the 3D reconstruction procedure significantly affect the reconstructed shapes. It is therefore desirable to use specialized software that can reconstruct the filaments with sub-pixel resolution (29).
As we wish to study sequence-dependent effects, it would be advantageous to know which point of each reconstructed center line corresponds to which base pair, which would require a molecular marker. However, the intrinsic DNA shape can be altered by protein binding or by the binding of chemicals that could be used to map specific sequences. For this reason we did not attempt such an approach and instead visualized completely naked DNA.
Two DNA minicircle constructs are analyzed: t11T15 and c11T15 (9,19). Both minicircles are 158 bp long, and their sequences differ in 14 bp (Figure 1): the TATA box site in t11T15 is replaced by a CAP (CRP) binding site in c11T15 (16). Kahn and collaborators measured the cyclization rates of the t11T15 and c11T15 sequences and determined their J-factors (9,19). The J-factor can be interpreted as the effective concentration of one DNA end in the vicinity of the other end, with the orientation and helical twist that allow minicircle closure (30). The J-factor of the t11T15 minicircle is ~3500 nM, whereas that of c11T15 is 95 nM. Throughout the text, the t11T15 and c11T15 fragments are referred to as TATA and CAP, respectively.
The DNA minicircles were immobilized in a 50 nm layer of vitreous ice at −170°C. Images were taken at a magnification of ×53 000 and recorded on Kodak EM negative plates. The negatives were scanned at 1800 dpi with 8-bit gray scale. The first image of each pair was taken with the sample tilted by −15° and the second at +15°. The tilt axis is vertical in both images presented in Figure 2.
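For orientation, the nominal scan settings fix the pixel size in the specimen plane. The short Python sketch below computes it, assuming that the nominal 1800 dpi and ×53 000 values are exact (any calibration of the microscope magnification is ignored); the helper name is hypothetical.

```python
# Specimen-plane pixel size from the scan settings, assuming the nominal
# magnification and scanner resolution are exact (no calibration correction).
MM_PER_INCH = 25.4
NM_PER_MM = 1e6


def specimen_pixel_size_nm(dpi: float, magnification: float) -> float:
    """Size (nm) in the specimen plane of one scanned pixel (hypothetical helper)."""
    pixel_on_negative_nm = MM_PER_INCH / dpi * NM_PER_MM  # scanner step on the film
    return pixel_on_negative_nm / magnification


pixel_nm = specimen_pixel_size_nm(dpi=1800, magnification=53000)
print(f"{pixel_nm:.3f} nm per pixel")  # 0.266 nm per pixel
print(f"{17.1 / pixel_nm:.0f} pixels across a 17.1 nm circle")
```

Under these assumptions one pixel corresponds to about 0.27 nm, so a 17.1 nm minicircle spans roughly 64 pixels and a 1 nm reconstruction error amounts to nearly four pixels.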
We used the software package developed by Jacob et al. (29) to reconstruct the shapes of the DNA minicircles from the cryo-electron micrographs. For each minicircle, the user traces an initial approximation of the visible DNA path on both images; smoothing the images aids this initial tracing. The study is blind: to avoid bias in the initial tracing, the user does not know the sequence (TATA or CAP) of the minicircle. Starting from this initial estimate, the program reconstructs a 3D curve model whose shape is optimized so that its 2D projections onto the two micrograph planes match the signals in the two images. The reconstructed curves are output as lists of points in 3D Euclidean space. We re-sampled each output curve with a cubic spline (using the Matlab spline function) to obtain 200 equally spaced points per minicircle. We then analyzed the shapes of 64 TATA and 31 CAP reconstructions.
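The resampling step can be sketched in Python with NumPy. This is a simplified stand-in, not the Matlab procedure used here: it interpolates piecewise-linearly along the arc length of the closed polyline instead of fitting a cubic spline, which is adequate only when the input points are dense. The function name and the 200-point default are illustrative.

```python
import numpy as np


def resample_closed_curve(points: np.ndarray, n_out: int = 200) -> np.ndarray:
    """Resample a closed 3D polyline to n_out points equally spaced in arc length.

    Piecewise-linear interpolation along the polygon; a simpler substitute for
    the cubic-spline resampling described in the text.
    """
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])                # repeat first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length at each vertex
    targets = np.linspace(0.0, s[-1], n_out, endpoint=False)
    # Interpolate x, y and z independently as functions of arc length.
    return np.column_stack([np.interp(targets, s, closed[:, k]) for k in range(3)])
```

The first output point coincides with the first input point, and the spacing between consecutive output points is uniform along the closed curve, including the step that closes the loop.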
The main part of the code for data analysis was written in Matlab, with some Python scripts. Methods are described together with results in the next section. 3D pictures were produced with VMD (31).
Given three consecutive points A, B and C on a discrete curve, the curvature at B can be approximated by the inverse of the radius of the circle that passes through A, B and C. A kink should induce high curvature in a short portion of the double-helix axis of the minicircle. We analyzed the distribution of such curvature values in the reconstructed minicircles. Each minicircle contributes 200 curvature values, one at each of its 200 indexed points. The curvature distributions of the points belonging to the TATA circles and to the CAP circles are computed separately and compared in Figure 3. For both sequences, the curvature distribution is peaked, and its maximum corresponds to the curvature of a perfect 158 bp circle (0.12 nm−1). The curvature distributions of TATA and CAP are very similar (Figure 3). Note that the shape data contain no reference indicating the location of the TATA box or CAP (CRP) site sequences.
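The three-point curvature estimate can be written compactly with the circumradius formula R = |AB||BC||CA| / (4 × area of ABC). The NumPy sketch below (function name hypothetical) applies it to every point of a closed curve, wrapping indices around the loop. As a sanity check it uses a perfect 158 bp circle sampled at 200 points, assuming a rise of 0.34 nm per base pair, which reproduces the 17.1 nm diameter used for the reference circle.

```python
import numpy as np


def three_point_curvature(curve: np.ndarray) -> np.ndarray:
    """Curvature at each point B of a closed curve: 1 / radius of the circle through A, B, C.

    A and C are the neighbours of B; indices wrap around because the curve is closed.
    """
    b = np.asarray(curve, dtype=float)
    a = np.roll(b, 1, axis=0)   # previous point
    c = np.roll(b, -1, axis=0)  # next point
    ab = np.linalg.norm(b - a, axis=1)
    bc = np.linalg.norm(c - b, axis=1)
    ca = np.linalg.norm(a - c, axis=1)
    # |AB x AC| equals twice the triangle area, and R = |AB||BC||CA| / (4 * area),
    # hence 1/R = 2 |AB x AC| / (|AB||BC||CA|); collinear points give 0.
    twice_area = np.linalg.norm(np.cross(b - a, c - a), axis=1)
    return 2.0 * twice_area / (ab * bc * ca)


rise_nm = 0.34                         # assumed B-DNA rise per base pair
radius = 158 * rise_nm / (2 * np.pi)   # ~8.55 nm, i.e. a diameter of ~17.1 nm
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([radius * np.cos(theta), radius * np.sin(theta), np.zeros(200)])
kappa = three_point_curvature(circle)
print(round(float(kappa.mean()), 3))   # 0.117 (nm^-1)
```

The value of about 0.117 nm−1 matches the 0.12 nm−1 peak. A per-sequence distribution could then be obtained by pooling these values over all minicircles of one type and histogramming them, for example with np.histogram.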
Figure 4 shows the axial paths of reconstructed minicircles after translation and rotation so that their centers of mass (assuming uniform mass density) and their principal axes of inertia coincide. This presentation allows many minicircle shapes to be compared visually at the same time. The resulting picture does not show a clear difference between the shapes of TATA (red) and CAP (blue) minicircles.
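One way to perform this alignment is sketched below in NumPy (function name hypothetical). It assumes equal mass at each of the equally spaced sample points and uses the eigenvectors of the gyration tensor S, which are also the principal axes of inertia because the inertia tensor equals tr(S)·1 − S (up to the total mass).

```python
import numpy as np


def align_to_principal_axes(curve: np.ndarray) -> np.ndarray:
    """Centre a curve on its centroid and express it in its principal-axis frame.

    Assumes equal mass at every sample point (points equally spaced along the curve).
    Column 0 of the result lies along the axis of largest spread, column 2 along the smallest.
    """
    pts = np.asarray(curve, dtype=float)
    centered = pts - pts.mean(axis=0)
    gyration = centered.T @ centered / len(centered)  # 3 x 3 gyration tensor S
    _, eigvecs = np.linalg.eigh(gyration)             # eigenvalues in ascending order
    axes = eigvecs[:, ::-1].copy()                    # largest spread first
    if np.linalg.det(axes) < 0:                       # keep a proper rotation, never a mirror image
        axes[:, 2] *= -1
    return centered @ axes
```

The sign of each eigenvector is arbitrary, so two aligned curves may still differ by a 180° flip about one axis; for a visual overlay this ambiguity may need to be resolved separately.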
Although curvature analysis and visualization did not reveal a kink in TATA compared with CAP minicircles, there may be a more subtle sequence-dependent shape pattern. Therefore, rather than looking for a particular shape, we designed a method to identify groups of similar shapes and then checked whether the sequence correlates with these groups. We first chose a distance to quantify shape similarity, and then clustered the shapes according to their mutual distances.
Because we do not know the correspondence between the sequence and the curve in each image, estimating the similarity between two minicircle shapes requires adapting the standard root mean square deviation (RMSD) minimization procedure that is often used to compare the geometries of two solid objects. The standard method is as follows: for two ordered sets of N points x and y, the RMSD is the square root of the mean, over i, of the squared Euclidean distances between corresponding points x_i and y_i. To eliminate rigid-body motions, one then computes a 3 × 3 rotation matrix R and a translation vector r that, when applied to x, minimize the RMSD, producing the best superposition of the two structures (Equation 1):

$$\mathrm{RMSD}(x, y) = \min_{R,\, r} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert R\, x_i + r - y_i \rVert^2} \qquad (1)$$

A Fortran 95 code given in (32) was used to compute this minimum RMSD.
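The Fortran code of (32) is not reproduced here. As a hedged illustration, the Python sketch below computes the same minimum of Equation 1 with the Kabsch construction: the optimal translation superposes the centroids, and the optimal rotation follows from the singular value decomposition of the 3 × 3 covariance matrix. Any correct superposition method should return the same minimum; the function name is ours.

```python
import numpy as np


def min_rmsd(x: np.ndarray, y: np.ndarray) -> float:
    """Minimum RMSD between ordered N x 3 point sets x and y over rotations R and translations r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)  # the optimal translation superposes the centroids
    yc = y - y.mean(axis=0)
    # Kabsch: SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(xc.T @ yc)
    d = np.sign(np.linalg.det(u @ vt))  # d = -1 would be a reflection; flip the weakest axis instead
    rot = (u @ np.diag([1.0, 1.0, d]) @ vt).T
    diff = xc @ rot.T - yc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

For example, a point set compared with a rotated and translated copy of itself gives an RMSD of essentially zero, whereas its mirror image does not, because reflections are excluded by the determinant check.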
Our shape-distance function is then defined in Equation 2 by minimizing over all rigid-body rotations and translations in 3D, and additionally over all cyclic shifts of the index of y (the variable δ) and over the two curve orientations, clockwise or counter-clockwise (the variable α):

$$d(x, y) = \min_{\alpha \in \{-1,+1\}} \; \min_{\delta \in \{1, \dots, N\}} \mathrm{RMSD}\big(x, y^{(\alpha, \delta)}\big), \qquad y^{(\alpha, \delta)}_i = y_{\alpha i + \delta} \qquad (2)$$

where the indices of y are understood modulo N. The additional minimization over δ is necessary because we do not know which point of the discretized curve y should correspond to the first point of curve x. However, if minicircle shapes share a common pattern, a particular mapping of x onto y should give a minimal RMSD; the minimization over δ lets all possible index offsets compete in the fit. The minimization over α accounts for the fact that a given curve can be discretized with two opposite orientations: except for particular symmetric shapes, identical curves discretized with opposite orientations cannot be perfectly superposed by the standard RMSD.
In practice, the additional minimizations in Equation 2 were implemented by calling the RMSD function given in (32) inside a Matlab loop over all shifts δ (δ = 1, …, 200 for our data, with the index of y taken modulo 200) and both choices of α. The smallest RMSD value found in the loop defines the distance between the two shapes.
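The loop over δ and α can be sketched in Python as follows. This is an illustration under our own assumptions, not the Matlab code used for the analysis: it redefines a compact Kabsch-based minimum RMSD so that the block is self-contained, and obtains each candidate correspondence by reversing (α) and cyclically rolling (δ) the point list of y.

```python
import numpy as np


def min_rmsd(x, y):
    """Minimum RMSD over rigid-body motions (Kabsch superposition)."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    u, _, vt = np.linalg.svd(xc.T @ yc)
    d = np.sign(np.linalg.det(u @ vt))
    rot = (u @ np.diag([1.0, 1.0, d]) @ vt).T
    return float(np.sqrt(((xc @ rot.T - yc) ** 2).sum(axis=1).mean()))


def shape_distance(x, y):
    """Equation 2: minimize RMSD over cyclic index shifts (delta) and both orientations (alpha)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = np.inf
    for y_oriented in (y, y[::-1]):      # alpha: same or reversed orientation
        for delta in range(len(y)):      # delta: index of y taken modulo N
            best = min(best, min_rmsd(x, np.roll(y_oriented, -delta, axis=0)))
    return best
```

With N = 200 this evaluates 400 superpositions per pair of curves, which remains inexpensive for a few thousand pairs.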
To measure similarity between reconstructed minicircles, it is important to determine the reconstruction error and how much it could affect the comparison. To estimate this error, we applied our distance function to two reconstructions of the same image pair obtained by two different users of the reconstruction program. We computed this user error for six image pairs (Figure 5). The average error is 0.9 nm (SD 0.3 nm).
We analyzed a set of 95 distinct minicircles (64 TATA, 31 CAP), all reconstructed by the same user, giving 4465 (= 95 × 94/2) pairwise shape-distances. Figure 6 shows the normalized histograms, i.e. the probability distributions of pairwise distances, in three groups: TATA to TATA, CAP to CAP and TATA to CAP. The average shape-distances are 2.03 nm for TATA–TATA (SD 0.57 nm), 1.96 nm for CAP–CAP (SD 0.52 nm) and 1.98 nm for TATA–CAP (SD 0.55 nm). TATA–TATA and CAP–CAP shape-distances are not significantly smaller than TATA–CAP distances; we therefore do not observe increased shape similarity between minicircles with the same sequence.
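The grouping of pairwise distances can be sketched in plain Python. The distance callback is a hypothetical placeholder for any function returning the shape-distance between minicircles i and j. With 64 TATA and 31 CAP labels, the group sizes alone confirm the pair count: 64 × 63/2 = 2016 TATA–TATA, 31 × 30/2 = 465 CAP–CAP and 64 × 31 = 1984 TATA–CAP pairs, which add up to 4465.

```python
from itertools import combinations
from statistics import mean, stdev


def group_pairwise(labels, dist):
    """Split every unordered pair (i < j) into TATA-TATA, CAP-CAP or TATA-CAP.

    labels: sequence of "TATA"/"CAP" strings, one per minicircle.
    dist:   callable (i, j) -> shape-distance in nm (hypothetical placeholder here).
    """
    groups = {"TATA-TATA": [], "CAP-CAP": [], "TATA-CAP": []}
    for i, j in combinations(range(len(labels)), 2):
        key = f"{labels[i]}-{labels[j]}" if labels[i] == labels[j] else "TATA-CAP"
        groups[key].append(dist(i, j))
    return groups


def summarize(groups):
    """Mean and sample standard deviation of each group of distances."""
    return {k: (mean(v), stdev(v)) for k, v in groups.items() if len(v) > 1}


labels = ["TATA"] * 64 + ["CAP"] * 31
groups = group_pairwise(labels, lambda i, j: 0.0)  # placeholder distances only
print({k: len(v) for k, v in groups.items()})
# {'TATA-TATA': 2016, 'CAP-CAP': 465, 'TATA-CAP': 1984}
```

With real shape-distances, summarize returns the per-group mean and SD of the kind reported above.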
We cannot use classical clustering methods for our shapes, as we have no sensible way to represent them as vectors in a multidimensional space, and we have no reference shapes around which to build clusters. We therefore adopted the reference-free SPIN algorithm (33), which orders the elements of a set using only their pairwise distances. Given an ordered list of shapes and a shape-distance function, the shape-distance matrix is defined uniquely: its element (i, j) is the shape-distance between minicircles i and j. By definition, the matrix is symmetric with zeros on the diagonal, and its i-th row (or column) lists the distances between minicircle i and all the others. SPIN finds a permutation of the initial list of shapes that minimizes the elements near the diagonal. If the resulting matrix has a block of low (dark blue) values along the diagonal, with comparatively higher values above and below it (and therefore, by symmetry, to its left and right), the shapes in the block can be considered a cluster. A SPIN-sorted shape-distance matrix and the corresponding clusters are shown in Figure 7. Three columns added on the left of the matrix show properties of the shapes; each row and each column of the matrix corresponds to one minicircle. For row i, element i of the column ‘Minicircle type’ indicates whether minicircle i is of type TATA (gray) or CAP (white). TATA and CAP minicircles are clearly spread throughout each cluster. Similarly, element i of the column ‘Circle’ (respectively ‘Ellipse’) gives the distance between minicircle i and a circle (respectively an ellipse). The circle diameter is 17.1 nm (corresponding to a perimeter of 158 bp). The major axis of the ellipse is also 17.1 nm, while its minor axis is 13.7 nm.
The ‘Circle’ and ‘Ellipse’ columns and the lower part of Figure 7 suggest that the method identified clusters of circular and elliptical shapes, as well as an additional non-planar cluster. Stereo images of cluster 7–15 are presented in Figure 8.
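SPIN itself (33) is not reproduced here. To illustrate the underlying idea of reordering a distance matrix so that small values gather near the diagonal, the NumPy sketch below uses a deliberately simple greedy pair-swap search on a Gaussian-weighted energy. The energy, the weight width sigma and the stopping rule are our own assumptions and are not part of SPIN; the function names are hypothetical.

```python
import numpy as np


def seriation_energy(dmat, order, weight):
    """Sum of permuted distances weighted toward the diagonal (lower = better sorted)."""
    p = np.asarray(order)
    return float((dmat[np.ix_(p, p)] * weight).sum())


def greedy_sort(dmat, sigma=3.0, max_sweeps=50):
    """Hypothetical SPIN-like sorting: swap pairs of items while the energy decreases."""
    n = len(dmat)
    idx = np.arange(n)
    # Gaussian weights emphasise entries close to the diagonal.
    weight = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))
    order = list(range(n))
    best = seriation_energy(dmat, order, weight)
    for _ in range(max_sweeps):
        improved = False
        for a in range(n - 1):
            for b in range(a + 1, n):
                order[a], order[b] = order[b], order[a]
                e = seriation_energy(dmat, order, weight)
                if e < best - 1e-12:
                    best, improved = e, True
                else:
                    order[a], order[b] = order[b], order[a]  # undo the swap
        if not improved:
            break
    return order
```

A greedy search of this kind only reaches a local minimum and scales poorly with the number of shapes, so it serves to illustrate the objective rather than to replace a dedicated method such as SPIN.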
Interestingly, the distance matrix appears to reveal the presence of multiple clusters of shapes. DNA circles with non-uniform sequence are known to have multiple local energy minima (34). We therefore believe that our clustering analysis detected the sampling of at least two, and possibly more, energy wells in configuration space. However, the differences between most of the clusters are small (comparable with the error of the reconstruction method), which warns against over-interpreting the distance matrix. Importantly, each detected cluster contains both TATA and CAP minicircles, so the clusters seem to be associated with sequence-dependent features shared by the two sequences, e.g. the six phased A-tracts, rather than with the differences between the TATA and CAP sequences. We therefore conclude that TATA and CAP sequences produce minicircles with similar 3D shapes.
Using cryo-EM, we investigated how the 3D shape of 158 bp DNA minicircles is affected by exchanging a TATA box for a CAP site in otherwise identical sequences. Although the TATA minicircles cyclize much more efficiently than the CAP minicircles in ligation experiments (J-factors of ~3500 nM versus 95 nM, a roughly 37-fold difference), we did not detect significant differences in the observed 3D shapes. Analysis of the reconstruction errors showed that the average user error (0.9 nm) is about half the average shape-distance between two minicircles (2 nm), so the observed shape variability is not explained by reconstruction error alone. We conclude, therefore, that thermal fluctuations ‘blur’ any differences in the 3D shapes of DNA minicircles that the CAP or TATA sequences might induce.
We thank D. Tsafrir and E. Domany for the SPIN code and their help in using it. We also thank E. Trifonov and D. Demurtas for discussions and helpful advice. This work was partially supported by grants from the Swiss National Science Foundation 3100A0-103962 and 205320-103833/1, and by the Centre Interdisciplinaire Bernoulli. Funding to pay the Open Access publication charges for this article was provided by the Swiss National Science Foundation, grant number 205320-103833/1.
Conflict of interest statement. None declared.