Extension of Double-ended Zipper Model
So far, duplex opening was only possible from the ends of the duplex. States in which the duplex zips at the loop position are essential for the correct reproduction of our experimental results.
The partition function of a duplex Zon (P, L) as a function of loop position P and loop length L can be decomposed as a sum of five elements:
Zzipper (P, L): The hybridized strands zip from both ends. Partition function as presented in the section above.
Zextended,right (P, L): Probe and target strands right of the loop position can undergo every possible binding configuration among each other (not limited to a zipper). Thus, loops of different size in probe and target strand can appear. The duplex part left of the loop position zips to and from the loop position. The free energy of this part is considered in ΔGleft. Figure illustrates Zextended,right. The red and green dashed lines represent hybridized duplex parts. The black dashed lines are denatured (at the end) or they form a loop between probe and target (middle). Here in this particular case, the probe and target strand form a loop of 16 and 11 bases respectively. The two strands reunite after base 28 of the probe strand and base 23 of the target strand, the following 7 bases are hybridized. This results in the free energy ΔG7,28,23.
Zextended,left (P, L): analogous to Zextended,right (P, L), but on the opposite side.
Zdouble zipper (P, L): Both parts, left and right of the loop position, behave like independent zippers. These states are contained in both Zextended,right (P, L) and Zextended,left (P, L); to avoid double counting them when the two are added, this partition function needs to be subtracted.
Znon-canonical (P, L): This partition function sums over all non-canonical binding states which occur simultaneously on both sides of the loop position. As we show below, this term can in principle be neglected because all of these binding states bind only weakly.
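The decomposition above can be summarized in a minimal sketch. It assumes that the five contributions have already been evaluated as plain numbers for a given loop position P and loop length L; the function and argument names are hypothetical and chosen only for illustration.

```python
def z_on(z_zipper, z_ext_right, z_ext_left, z_double_zipper, z_non_canonical=0.0):
    """Combine the five contributions to Zon(P, L) for one (P, L).

    The double-zipper term is subtracted because its states are contained
    in both extended terms. The non-canonical term defaults to zero,
    reflecting the statement that it can in principle be neglected.
    """
    return (z_zipper + z_ext_right + z_ext_left
            - z_double_zipper + z_non_canonical)


# Illustrative numbers only, not taken from the experiment.
example = z_on(z_zipper=1.0, z_ext_right=2.0, z_ext_left=3.0, z_double_zipper=0.5)
```

Keeping the non-canonical term as an optional argument makes it easy to check how little it changes the result.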
In the full expression for Zextended,right, the summation over all possible binding configurations between both strands right of the loop position depends on the zipping state Sk,l of the duplex part left of the loop position. This makes the calculation computationally intensive. To reduce the computing time of the model, we approximate Zextended,right by (Figure ):
i and j mark the positions of the zipper forks in probe and target, respectively; n bases of probe and target strand (n - 1 NN pairs) in the region to the right of i and j are hybridized. Thus, ΔGn,i,j is the NN energy of the (n - 1) NN pairs that are hybridized starting from zipper fork position i in the probe and zipper fork position j in the target.

is the NN energy of a single hybridized base pair in this region.
The free energy of the duplex part left of the loop position ΔGleft is approximated using the zipper model (6). Since ΔGleft is a function of the loop position P only, and it is independent from the current zipping state Sk,l, computing time is greatly reduced.
We calculate the binding constants with and without our approximation (13) for Zextended,right(P, L) (and (20) for Zextended,left(P, L) below) for one duplex sequence. To fit the theoretical signals to the experimental data we use a scaling factor C, which links the calculated binding constants to the fluorescent intensity values (C is a free parameter):
Figure shows that our approximations for Zextended,right (13) and for Zextended,left (20) are excellent, provided C is adjusted properly. If the factor C is the same for the calculation with and without approximation, the red and black curves differ in absolute values but their shapes remain very similar (left side in Figure ). After adjusting C, the two curves overlap (right side). The reason is that the approximations (13) and (20) neglect some binding states, resulting in (slightly) smaller overall binding constants.
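How a single scaling factor can absorb a uniform loss of binding states may be illustrated with a short sketch. It assumes a linear relation (signal = C · Ki) and an ordinary least-squares fit of C. The text does not state how C was actually adjusted, so the fitting rule, the function name and the numbers below are assumptions.

```python
def fit_scale(binding_constants, intensities):
    """Least-squares estimate of C in: intensity ~ C * binding_constant."""
    numerator = sum(k * i for k, i in zip(binding_constants, intensities))
    denominator = sum(k * k for k in binding_constants)
    if denominator == 0:
        raise ValueError("binding constants must not all be zero")
    return numerator / denominator


# Hypothetical values: the approximate model yields binding constants that
# are uniformly 20 % smaller than those of the full calculation.
measured = [2.0, 4.0, 6.0]
k_full = [1.0, 2.0, 3.0]
k_approx = [0.8 * k for k in k_full]

c_full = fit_scale(k_full, measured)
c_approx = fit_scale(k_approx, measured)

# After rescaling, both calculations give the same predicted signals.
signals_full = [c_full * k for k in k_full]
signals_approx = [c_approx * k for k in k_approx]
```

In this toy case the larger C obtained for the approximate model compensates exactly for the missing states, which is the behaviour described for the red and black curves.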
In the extended model, loops can start at some origin and end at some other position. For two SAWs with M1 and M2 steps we now obtain:
ρ(M1, M2) is the probability that two SAWs with number of steps M1 and M2 respectively start at the origin and meet again. Here, we have:

is the number of SAWs with Mi steps which start at the origin and end at a given position. In 3D [23]:
Constants γ and μ are defined as before. ν = 0.588 ± 1.5 · 10⁻³ is the (universal) metric exponent. #total(Mi) is the total number of SAWs of Mi steps (as defined in equation (9)).
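The meaning of a meeting probability for two SAWs can be made concrete by brute-force enumeration on a lattice. The sketch below is not the scaling expression used in the model. It assumes a simple cubic lattice, treats the two walks as independent of each other (mutual avoidance is ignored), and sums over all common end positions; all names are hypothetical.

```python
from collections import Counter

# Unit steps on the simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def saw_endpoints(n_steps):
    """Count self-avoiding walks of n_steps from the origin, by end position."""
    counts = Counter()

    def extend(pos, visited, remaining):
        if remaining == 0:
            counts[pos] += 1
            return
        for dx, dy, dz in STEPS:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    origin = (0, 0, 0)
    extend(origin, {origin}, n_steps)
    return counts


def meeting_probability(m1, m2):
    """Probability that two independent SAWs with m1 and m2 steps,
    both starting at the origin, end at the same lattice site."""
    c1 = saw_endpoints(m1)
    c2 = saw_endpoints(m2)
    shared = sum(n * c2[r] for r, n in c1.items())
    return shared / (sum(c1.values()) * sum(c2.values()))


rho_2_2 = meeting_probability(2, 2)
```

Exhaustive enumeration is only feasible for a few steps, which is why scaling forms with the constants γ, μ and ν are used for realistic loop lengths. On this lattice, walks whose step numbers differ in parity can never meet; this is a lattice artifact of the sketch.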
In an analogous manner, Zextended,left is calculated:
Here we have
and finally
For Zdouble zipper(P, L), we have:
And ΔGentropy,double:
In the case where probe and target length match, duplex zipping can only occur if the two strands are perfectly aligned. We consider the initiation energy, the entropic barrier to meet this constraint, as constant. We simply write Ki = ZD and include the initiation energy in a prefactor.
In the case of duplexes with loops, the probe-target length difference ΔL increases the number of possible conformations of the probe strand that do not promote duplex initiation. The initiation energy changes accordingly. Neglecting unfolding of the coils for duplex formation, the number of pairing collisions that do not lead to zipping grows linearly with ΔL, resulting in an initiation entropy change
L0 is the characteristic length of the problem, which is the persistence length (in our experimental conditions this corresponds to a single base). In the case of a short, loop-forming sequence located in the center of the strand, however, there are two positions where parallel but shifted probe and target strands can initiate duplex formation. These positions correspond to the matching sequences to the left and to the right of the loop, implying a correction of ΔL/L0 by a factor 1/2. However, if the loop forms towards the ends, we are close to the situation of a single strand above. In the following we neglect this dependence on loop position and use a factor 1/2 throughout. Neither choice of factor (1 or 1/2) drastically modifies our result, if the factor C is adjusted accordingly.
Our approximation for ΔSinit tends to overestimate the corresponding initiation energy penalty as ΔL increases. This is because the situation differs for large ΔL: in this case the separated matching sequences are almost independent, and the initiation energy tends to its asymptotic value of two independent hybridization events. Consequently, a weaker dependence of the initiation energy on ΔL can be expected for large ΔL.
From (25), we get the modified binding constant Ki:
The calculation of the hybridization signal is then straightforward.
We note that the choice of the denominator of equation (26), which follows from (25), has an impact on the calculated hybridization signals. Our theory could possibly be improved by choosing a different denominator; this, however, may be a subtle problem in itself and is beyond the scope of this paper.
Figure shows the comparison between our experimental results and theory. a) Hybridization signals as a function of loop length for one specific loop position. b) Hybridization signals as a function of loop position for one specific loop length. In the figures to the right, we give the 95% confidence intervals for our data points (black) and compare them to our theory (red). This shows that the experimentally observed trends and the reproduction with our model are statistically relevant. The different symbols indicate the signals of the different feature blocks as a function of loop position or length, the solid black line is the experimental average and the red solid line represents the theoretical predictions.
To make the signal dependence on loop length clearer, we present the hybridization signals averaged over all loop positions as a function of loop length and compare them to the predicted signals (upper part of Figure ). The lower part of the same figure shows the signal dependence as a function of loop position after averaging over all loop lengths. The symbols represent the signals of the feature blocks, the solid black line the average signal of all feature blocks and the solid red line represents the predicted signals.
Figure and show that the model reproduces our experimental findings well. Parameters used here were: simulation temperature Tsym = 317 K, synthesis error rate p = 0.084, energy penalty for synthesis-related defects Δgdef,syn = -1 kcal/mol (up to three errors per probe during synthesis are considered; Δgdef,syn was determined in [16]). We use the temperature-adjusted NN and MM defect parameters from [12,13] and the references therein. Since MM defect parameters are only available for isolated MMs, we include another parameter MMdef = -2 kcal/mol for the case of two adjacent MMs (we approximate two adjacent MMs as two independent synthesis defects next to each other, therefore MMdef = 2 · Δgdef,syn). Furthermore, we use the (universal) parameters for a SAW [23]. The only free parameters are the factor C = 1.5 · 10⁻³, which links the theoretical binding constants Ki to our experimental fluorescence signals, and the probability for synthesis-related defects p = 0.084 (the latter is not completely free, since it is used to check whether our theory is consistent with the coupling and deprotection efficiency of the oligonucleotides used).
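To give a feeling for what a synthesis error rate of p = 0.084 with at most three errors per probe implies, the following sketch computes the fraction of probes carrying k defects. It assumes independent errors at each base (a binomial distribution) and a hypothetical probe length of 25 bases; neither assumption is stated in the text.

```python
from math import comb


def defect_weights(probe_length, p, max_defects=3):
    """Binomial probability of exactly k synthesis defects, for k = 0..max_defects.

    Assumes that each base fails independently with probability p.
    """
    return [
        comb(probe_length, k) * p**k * (1 - p) ** (probe_length - k)
        for k in range(max_defects + 1)
    ]


# Hypothetical probe length; p is the synthesis error rate quoted above.
weights = defect_weights(probe_length=25, p=0.084)
covered = sum(weights)  # fraction of probes with at most three defects
```

The value of `covered` indicates how much of the probe population is captured by truncating at three errors under these assumptions.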
We note that the partition function Zdouble zipper(P, L) alone already reproduces the approximate shape of the symmetric loop defect profile as shown in Figure (dependence of the hybridization signal on loop position). However, the resulting binding constants are smaller than the ones calculated with Zextended,left and Zextended,right, respectively. Zextended,left and Zextended,right help in reproducing not only the shape but also the absolute values of the experimental signals (see additional file 4: Hybridization signals resulting from Zdouble zipper and comparison to Zextended,right + Zextended,left).
Small differences between theoretical and experimental results regarding the signal dependence on loop position can be explained by the particularities of the duplex sequence under study. Here we look at two differences:
region ranging from loop position 14 to 18: this duplex region has many A/T bases and the distance between two C bases is the largest for the whole sequence. The duplex destabilization of an A/T rich region may be underestimated.
loop position 21: the region has many C bases and the loop bases are inserted after two existing C bases. It has been shown [22,29,30] that degenerate base pairs may reinforce binding considerably. Stabilization by degenerate base pairs is not included in our theory.
Although there are differences between experiment and theory, the deviations are small (see Figure and ). An even better agreement could be obtained by choosing a different dependence of the duplex initiation energy on ΔL. Our approximation for it (see above) only holds for short ΔL, and we suppose that the systematic deviation between theory and experiment visible in Figure originates from this approximation. As expected, at longer ΔL we tend to underestimate the binding constant. To our knowledge, no simple scheme to assess the initiation energy is known, although the problem is encountered often. Working out the dependence of the initiation energy between the two regimes discussed above (short and very long ΔL) is beyond the scope of this paper. Molecular simulations could help to provide a better understanding of the nucleation process [7].
In the literature, internal DNA loops or bubbles of total length l = l1 + l2, e.g. occurring in DNA denaturation experiments, are often treated as SAWs of the same length returning to their origin (l1: unbound bases in probe; l2: unbound bases in target) [24]. Our experimental data could not be reproduced when the calculation is done in this way, because the calculated loop energy penalties were much too large. Treating a DNA loop as a SAW of length l = l1 + l2 returning to the origin is different from calculating the probability that two SAWs of given lengths l1 and l2 start at the same point and meet again at some distance. In the first case, the number of possible conformations is much higher because the constraint is weakened to any pair of lengths that sum to l, not just the given l1, l2. The first case could give the same results if the calculation is done under the constraint that the loop of length l = l1 + l2 reaches the position where the two loops reunite after l1 steps, similar to the way described in [31].
This may not always matter so much: the length of the probe sequences used throughout this study is much shorter than the length of DNA strands used in DNA denaturation experiments. Since the free energy of a short DNA strand is small, the size of the loop energy penalties is more crucial.