BMC Biophys. 2011; 4: 20.

Published online 2011 December 13. doi: 10.1186/2046-1682-4-20

PMCID: PMC3262748

Christian Trapp: christian.trapp@physik.uni-saarland.de; Marc Schenkelberger: m.schenkelberger@physik.uni-saarland.de; Albrecht Ott: albrecht.ott@physik.uni-saarland.de

Received 2011 August 2; Accepted 2011 December 13.

Copyright ©2011 Trapp et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

DNA is a carrier of biological information. The hybridization process, the formation of the DNA double-helix from single-strands with complementary sequences, is important for all living cells. DNA microarrays, among other biotechnologies such as PCR, rely on DNA hybridization. However, to date the thermodynamics of hybridization is only partly understood. Here we address, experimentally and theoretically, the hybridization of oligonucleotide strands of unequal lengths, which form a bulged loop upon hybridization. For our study we use in-house synthesized DNA microarrays.

We synthesize a microarray with additional thymine bases in the probe sequence motifs so that bulged loops occur upon target hybridization. We observe a monotonic decrease of the fluorescence signal of the hybridized strands with increasing length of the bulged loop. This corresponds to a decrease in duplex binding affinity within the considered loop lengths of one to thirteen bases. By varying the position of the bulged loop along the DNA duplex, we observe a symmetric signal variation with respect to the center of the strand. We reproduce the experimental results well using a molecular zipper model at thermal equilibrium. However, binding states between both strands, which emerge through duplex opening at the position of the bulged loop, need to be taken into account.

We show that stable DNA duplexes with a bulged loop can form from short strands of unequal length and they contribute substantially to the fluorescence intensity from the hybridized strands on a microarray. In order to reproduce the result with the help of equilibrium thermodynamics, it is essential (and to a good approximation sufficient) to consider duplex opening not only at the ends but also at the position of the bulged loop. Although the thermodynamic parameters used in this study are taken from hybridization experiments in solution, these parameters fit our DNA microarray data well.

The hybridization process - the formation of the well-known double-helix structure from two complementary nucleic acid strands (such that A · T and C · G base pairs are formed) - is pivotal to the living organism. Among other important biotechnological methods, PCR and DNA microarray technology rely on it.

DNA microarrays consist of regularly spaced domains of surface-attached probe sequences, which act as binding sites for their complementary, fluorescently labeled target sequences in solution. The probe sequence and position of each domain on the surface are known, and the amount of bound target DNA can be determined quantitatively. Microarrays are important in many biotechnological methods such as gene expression profiling, where complex target oligonucleotide mixtures need to be analyzed in a highly parallel manner [1-3].

Due to the very sensitive molecular recognition process of DNA, one is in principle able to detect even small sequence deviations with the help of DNA microarrays. However, DNA targets that are not perfectly complementary can also form duplexes with the surface-bound probes, although these duplexes are less stable than the corresponding perfect match (PM).

Although DNA microarrays are widely used in biological and biotechnological applications, the underlying physical mechanisms of the hybridization process are poorly understood. Data analysis is mostly based on empirical, statistical methods [4-6]. To fully exploit the potential of DNA microarray technology, it is desirable to pursue a more fundamental approach to the stability of hybridized or partly hybridized strands. Molecular simulations have greatly increased our understanding of DNA dynamics, thermal fluctuations and hybridization. DNA hybridization and mechanical properties of DNA, e.g. the persistence length in the presence of surfaces, were investigated on the molecular level [7-9]. While molecular simulations give a very detailed view of the molecular dynamics, here we are interested in a simple scheme to assess the stability of bulged loops on a DNA microarray. Systematic experiments on short bulged loops have hardly been performed.

The standard model for hybridization in solution is the so-called two-state nearest-neighbor model (NN-model), which treats the formation of the DNA duplex as a two-state process where the duplex is either fully hybridized or fully denatured [10,11]. The model calculates the binding free energy of a perfectly complementary duplex by summing the nearest-neighbor interaction parameters (10 experimentally determined free energy parameters [12-14]). These parameters take into account that DNA stability arises from hydrogen bonding and base stacking interactions. Furthermore, it is possible to extend the model and include single base mismatch (MM) defect parameters [13]. This model proved very successful for the prediction of the duplex melting temperature *T_{m}* in solution.

In several experiments, incorporated MMs have had a position dependent influence on the fluorescent signal. Zhang *et al*. suggested the position-dependent-nearest-neighbor-model (PDNN) [15] where the binding free energy of the duplex is calculated as a weighted sum of the nearest-neighbor parameters. The weight parameters are determined empirically.

In the past we experimentally and theoretically investigated the effect of single MMs on duplex stability on a DNA microarray in the case where the lengths of probe and target match [16-18]. We have shown that a two-state NN-model could not predict the MM binding affinities precisely. Therefore we developed a different theoretical approach, based on a double-ended molecular zipper [19-21]. The double-ended molecular zipper assumes that the duplex can only open from the ends. This simplification is justified because base pairs located away from the duplex ends are more stable. This holds even if a single MM is incorporated into the duplex. Taking into account the heterogeneity of the binding affinities due to synthesis defects, the DNA microarray data could be reproduced with the model. We have shown that the double-ended zipper model maps to the PDNN model, while the former is derived from first principles [16].

The purpose of this study is to investigate the case where probe and target have unequal lengths and bulged loops form upon hybridization. Bulged loops are referred to as loops in the following. With our DNA microarray setup, loops of different lengths and at different positions can be obtained in a controlled manner by inserting additional bases into the perfectly matching probe sequence. The formation of loops increases the complexity of the hybridized state: new binding states between probe and target strands may emerge. We show that a good reproduction of the experimental data remains possible with the molecular zipper, but only if duplex opening can also occur at the loop position.

We use in-house synthesized DNA microarrays. All employed protocols, including the preparation of dendrimer-functionalized microarray substrates, the light-directed synthesis (a "maskless" photolithographic technique based on NPPOC-phosphoramidites) and the data analysis methods, are provided in Naiser *et al*. [18]. The only differences from the previously published experimental setup are a more homogeneous illumination of the microarray surface and an increased resolution due to improved optics.

To avoid target-target interaction and competitive hybridization effects, only one target species (see Table 1) is employed in the hybridization experiments. Probes are coupled to the microarray surface with their 3'-end. Hybridization temperature is 317 K.

Images of the hybridized DNA microarray are taken for data analysis after thermal equilibrium is reached.

In order to test if the microarray surface has a significant influence on the hybridization, we repeat our experiments with the reversed probe sequence. The 5'-end of the sequence employed throughout this work corresponds to the 3'-end of the test sequence. No influence of the microarray surface on the hybridization could be detected (see additional file 1: Influence of the microarray surface on the hybridization signal).

To generate single-stranded DNA loops in the probe-target duplexes, we introduce additional poly-T-sequences into the PM sequence. This is illustrated in Figure 1. The poly-T-sequence (black) is located between the red and the green parts of the probe strand. The green and red parts of the probe strand are complementary to the corresponding parts in the target strand. Upon hybridization, the black part forms a loop. By varying the length of the poly-T-sequence and the position at which it is introduced into the probe motif, loops of different lengths and at different positions can be generated.

In this way, poly-T-loops up to a length of 13 bases, at 20 different positions along the strand, amounting to 260 different probe sequences, are generated during the *in situ* synthesis. To control the synthesis quality, 20 PM features are added. A single "feature block" consists of these 280 features organized as a square (see Figure 2). This feature block is synthesized 4 times on the microarray. Table 2 lists the synthesized probe sequences.
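The probe design described above can be sketched in a few lines of Python. This is only an illustration: the function name `make_loop_probes`, the 25-base `PM_PROBE` string and the list of insertion sites are hypothetical stand-ins, since the actual probe sequences are those listed in Table 2.

```python
def make_loop_probes(pm_probe, positions, max_loop_len=13, loop_base="T"):
    """Return {(position, loop_length): probe} for every loop variant.

    A loop of `loop_length` copies of `loop_base` is inserted into the
    perfectly matching probe in front of index `position`.
    """
    probes = {}
    for pos in positions:
        for loop_len in range(1, max_loop_len + 1):
            insert = loop_base * loop_len
            probes[(pos, loop_len)] = pm_probe[:pos] + insert + pm_probe[pos:]
    return probes


# Hypothetical PM probe and insertion sites, NOT the sequences of the study.
PM_PROBE = "ACGTTGCAAGCTAGGCTTACGATCG"
POSITIONS = list(range(3, 23))  # 20 illustrative loop positions

probes = make_loop_probes(PM_PROBE, POSITIONS)
features_per_block = len(probes) + 20  # loop features plus 20 PM controls
```

With 20 positions and loop lengths of 1 to 13, the dictionary holds 260 (position, length) combinations, and adding the 20 PM control features gives the 280 features of one feature block. With an arbitrary stand-in sequence, insertions next to an existing T can yield identical strings, so the count refers to combinations rather than to distinct sequences.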

We also synthesized probes with loop sequences other than the poly-T discussed above. We investigated the influence of poly-C-sequences and random sequences on duplex stability as a function of loop length. The results are provided in additional file 2: Duplex stability of DNA duplexes with bulged loops of different sequences as a function of loop length. Compared to the poly-T-sequences, we did not observe a significant change in the dependence of the fluorescent signal on loop length.

In order to determine the fluorescence intensities ("hybridization signals") of the microarray features from hybridized, fluorescently labeled target molecules, we take images of the DNA microarray surface with a fluorescence microscope. Figure 2 shows such an image. A feature block (see Probe Design for definition) is surrounded by PM features. These PM features help control the illumination quality during synthesis and microscopic observation. For each feature inside the feature block, there are four corresponding PM features (green arrows). The signal of each feature is normalized by the average signal of these four PM features. In this way, synthesis-related illumination gradients are canceled out, at least to linear order. To reduce experimental error, we reproduce the same feature block at four different locations on the microarray. To obtain the final data set, we take the average of the normalized signals of these four feature blocks. Errors can arise from inhomogeneities of the microarray surface, fluorescent stains in the feature blocks or illumination gradients during the synthesis. The hybridization signals of the final data set as a function of loop length and loop position are shown in Figure 3. Hybridization temperature is 317 K. The strongest and weakest hybridization signals are normalized to 1 and to 0, respectively. For further details see [18,22].
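The normalization steps of this paragraph can be written down as a minimal Python sketch. The function names and the example numbers are made up for illustration; the actual analysis pipeline is the one described in [18].

```python
from statistics import mean


def normalize_feature(raw_signal, pm_neighbours):
    """Divide a feature's raw signal by the mean of its reference PM features."""
    return raw_signal / mean(pm_neighbours)


def average_blocks(blocks):
    """Average normalized signals feature by feature over replicate blocks."""
    return [mean(values) for values in zip(*blocks)]


def rescale_0_1(signals):
    """Map the weakest signal to 0 and the strongest signal to 1."""
    lowest, highest = min(signals), max(signals)
    return [(s - lowest) / (highest - lowest) for s in signals]


# Made-up example: one feature with its four PM reference features.
normalized = normalize_feature(50.0, [100.0, 100.0, 100.0, 100.0])

# Made-up example: two replicate blocks with three features each.
final = rescale_0_1(average_blocks([[0.8, 0.6, 0.7], [0.9, 0.5, 0.7]]))
```

Dividing by the local PM average removes a linear illumination gradient because the four reference features surround the feature of interest, so their mean approximates the illumination at its position.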

Figure 4a shows the hybridization signal as a function of loop length, averaged over all loop positions. The intensity of the PM is set to 1. We note a monotonic decrease of the signal with increasing loop length. The insertion of a single base already reduces the hybridization intensity to about 85% of the PM signal; 13 additional bases (the largest number of additional bases under study) reduce the signal to about 60% of the PM signal. With a zipper, MMs in the middle of the duplex affect duplex stability most, because they are included in many of the possible states that are considered in the partition function. The employed probe strands are short compared to the length of DNA sequences used in other applications, which explains why the decrease in signal intensity after inserting a single additional base seems unusually strong.

Figure 4b shows the measured hybridization signals as a function of loop position in 3' to 5' direction after averaging over all loop lengths (PM signal set to 1). The resulting "loop position defect profile" is symmetric with respect to the center of the duplex. The signal is strongest for loops at the ends and in the middle of the duplex; it is weakest for loops at a distance of about 3-4 bases from the center. The difference between maximum and minimum is only about 10%. This is a weak variation compared to the variation of the hybridization signal with loop length.

The dependence of the fluorescent signal on loop position can be understood, at least qualitatively, from the following arguments (see Figure 5):

Loops positioned close to either end of a duplex have fewer potential binding sites towards that end, and they can open to form a dangling end. However, in this case a large part of the duplex on the opposite side of the loop remains strongly bound (Figure 5a).

Loops located at a center position have many possible binding partners to the left and to the right, resulting in a closed loop and higher duplex stability (Figure 5b).

Between these two extremes, the hybridization signal drops to a minimum. On the one hand, these loops have fewer binding partners than loops in the middle of the duplex. On the other hand, the large hybridized part is shorter than for loops occupying end positions (Figure 5c).

At equilibrium, single-stranded probes *P* and target molecules *T* form a duplex *D* with a rate constant *k*_{+} and denature with a rate constant *k*_{-}:

$$P+T\underset{{k}_{-}}{\overset{{k}_{+}}{\rightleftharpoons}}D$$

(1)

This process can be described with a Langmuir-type adsorption isotherm. Since targets were in excess in our experiments, the target concentration [*T*] = [*T*_{0}] is considered constant. The fraction of hybridized probes *θ* is:

$$\theta =\frac{[D]}{[P_{0}]}=\frac{K\cdot [T_{0}]}{1+K\cdot [T_{0}]}$$

(2)

where *K *is the equilibrium binding constant of the probe-target duplex. Since the fluorescent signal of the array is proportional to the fraction of hybridized probes *θ*, we think of *θ *as the "hybridization signal" in the following.
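As a quick numerical illustration of Eq. (2), the following Python sketch evaluates the isotherm. The function name `langmuir_theta` and the example values are assumptions made for illustration; only the formula itself is taken from the text.

```python
def langmuir_theta(binding_constant, target_conc):
    """Fraction of hybridized probes, Eq. (2): theta = K*[T0] / (1 + K*[T0])."""
    x = binding_constant * target_conc
    return x / (1.0 + x)


# Illustrative values of the dimensionless product K*[T0]:
# the signal rises from about 9% to about 91% over two decades.
low = langmuir_theta(0.1, 1.0)
half = langmuir_theta(1.0, 1.0)
high = langmuir_theta(10.0, 1.0)
```

Half of the probes are hybridized when *K* · [*T*_{0}] = 1, and the signal saturates for strongly binding probes, so differences in *K* are only visible in the signal within a limited window.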

The Langmuir-type adsorption isotherm (2) has a very narrow transition region from low to high binding affinity. Our data from previous experiments exhibit a broadened transition region. As we have shown [16,17], this is caused by the heterogeneity of binding affinities that results from unavoidable sequence defects during the *in situ* synthesis. It is therefore necessary to describe the situation with a distribution of binding constants *K_{i}*. Thus, the hybridization signal of an individual probe with random defects reads:

$$\theta_{i}=\frac{K_{i}\cdot [T_{0}]}{1+K_{i}\cdot [T_{0}]}$$

(3)

Assuming that the synthesis defects follow a binomial distribution with a probability *p* that a defect occurs, the hybridization signal *θ* of a single feature is:

$$\theta =\sum_{k'} x_{k'}\cdot \frac{\sum_{i=1}^{\binom{N'}{k'}}\theta_{i}}{\binom{N'}{k'}}=\sum_{k'} x_{k'}\cdot \frac{\sum_{i=1}^{\binom{N'}{k'}}\frac{K_{i}\cdot [T_{0}]}{1+K_{i}\cdot [T_{0}]}}{\binom{N'}{k'}}$$

(4)

*N*' is the number of bases in the probe, *k*' the number of synthesis defects and *x_{k'}* the corresponding binomial weight:

$$x_{k'}=\binom{N'}{k'}\cdot p^{k'}\cdot (1-p)^{N'-k'}$$

(5)

To minimize computation time, synthesis defects are only considered up to a certain maximum number per strand ${k}_{max}^{\prime}$. The bases of the loop are treated as free of synthesis defects. Since the bases in the loop are, most of the time, only weakly bound or not bound at all (there are almost no complementary bases in the target strand), synthesis defects in the loop need not be considered.

In our case, *N*' = 33 and we take up to ${k}_{max}^{\prime}=3$ synthesis defects into account. This generates 6018 different probe sequences. Strands with more than 3 synthesis defects can be neglected (see additional file 3: Influence of the number of MMs on the fluorescent signal).
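The number of defect patterns and the weighted average of Eq. (4) can be checked with a short Python sketch. The function names are illustrative, and the error probability passed to `defect_class_weights` below is a placeholder rather than the value used in the study.

```python
from math import comb


def count_defect_patterns(n_bases, k_max):
    """Number of ways to place 0..k_max defects on n_bases positions."""
    return sum(comb(n_bases, k) for k in range(k_max + 1))


def defect_class_weights(n_bases, p, k_max):
    """Binomial weights x_k' of Eq. (5) for k' = 0..k_max defects."""
    return [comb(n_bases, k) * p**k * (1 - p) ** (n_bases - k)
            for k in range(k_max + 1)]


def feature_signal(weights, thetas_by_class):
    """Eq. (4): sum over k' of x_k' times the mean theta_i of that class."""
    return sum(x * sum(thetas) / len(thetas)
               for x, thetas in zip(weights, thetas_by_class))


n_patterns = count_defect_patterns(33, 3)        # 1 + 33 + 528 + 5456
weights = defect_class_weights(33, 0.05, 3)      # p = 0.05 is a placeholder
```

For *N*' = 33 and up to 3 defects the count is 1 + 33 + 528 + 5456 = 6018, in agreement with the number quoted in the text. Because the sum is truncated at ${k}_{max}^{\prime}$, the weights add up to slightly less than 1.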

In the following, we calculate the binding constants *K _{i }*as a function of loop position

The partition function of the double-ended zipper model is [19-21]:

$$Z_{D}=\sum_{k=0}^{N-1}\sum_{l=k+1}^{N}\omega_{k,l}=\sum_{k=0}^{N-1}\sum_{l=k+1}^{N}e^{\Delta G_{k,l}^{\circ}/RT}$$

(6)

Here, *N *is the number of NN pairs and *ω _{k,l }*is the statistical weight of the partially denatured state

$$\mathrm{\Delta}{G}_{k,l}^{\circ}=\left(\sum _{i=k}^{l}\mathrm{\Delta}{g}_{i}^{\circ}\right)+\mathrm{\Delta}{g}_{init}$$

(7)

Δ*g _{init }*= -4.5 kcal/mol is the duplex initialization free energy [17]. For the binding constant

Figure 6 illustrates the double-ended zipper model and the corresponding notation. The duplex is hybridized between the zipper forks at positions *k* and *l*. This corresponds to the free energy $\mathrm{\Delta}{G}_{k,l}^{\circ}$. Duplex opening and closing occurs only at the ends, indicated by the black arrows to the left and right of the duplex. In Figure 6b, a single MM is incorporated into the duplex.
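The double sum of Eq. (6) is straightforward to evaluate numerically. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the NN parameters in the example are placeholders, state (*k*, *l*) is assumed to contain the NN pairs *k*+1 to *l* (the index bounds of Eq. (7) may differ by one), and the exponent is used with the sign printed in Eq. (6). Whether that sign implies parameters that are positive for stable stacks is not stated in this section, so the sign convention should be treated as an assumption.

```python
from math import exp

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def zipper_partition(dg_nn, dg_init, temperature):
    """Double-ended zipper partition function in the form of Eq. (6).

    dg_nn: list of N nearest-neighbor free energies (kcal/mol).
    A state is the stretch between zipper forks k and l (0 <= k < l <= N).
    """
    n = len(dg_nn)
    rt = R_KCAL * temperature
    z = 0.0
    for k in range(n):                      # left fork k = 0 .. N-1
        for l in range(k + 1, n + 1):       # right fork l = k+1 .. N
            dg_kl = sum(dg_nn[k:l]) + dg_init   # assumed content of Eq. (7)
            z += exp(dg_kl / rt)            # sign as printed in Eq. (6)
    return z


# Placeholder NN parameters for a duplex with five NN pairs (not fitted values).
z_example = zipper_partition([1.0, 1.4, 0.9, 1.2, 1.1], -4.5, 317.0)
```

A duplex with *N* NN pairs has *N*(*N*+1)/2 such states, which is why the cost of the calculation grows only quadratically with strand length.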

We have shown that it is sufficient to include MM defect parameters into a zipper model to account for single base defects [17]. In the following, we test this simple model for the case of loops. For single stranded DNA loops we calculate purely entropic energy penalties by treating the DNA loop as a self-avoiding random walk (SAW) on a lattice. Since duplex opening can only occur from the ends and therefore the DNA loops are always closed, only SAWs which return to the origin need to be considered. For the number of SAWs of length *l *returning to the origin in the limit *l *→ ∞ [23,24]:

$${\#}_{origin}\left(l\right)\propto \sigma \cdot \frac{{\mu}^{l}}{{l}^{c}}$$

(8)

*σ* = 1.75 · 10^{-4} is the so-called cooperativity parameter, *μ* is the connectivity constant and *c* = 2.15 is the loop closure exponent. *σ* and *c* are universal constants whereas *μ* (*μ* = 4.684 used here) depends on the considered geometry.

For the total number of SAWs of length *l *of all possible SAW configurations [23,25]:

$${\#}_{total}\propto {\mu}^{l}\cdot {l}^{\gamma -1}$$

(9)

*γ* = 1.157 ± 3 · 10^{-3} is the (universal) entropic exponent. This gives the probability *ρ* that a SAW of length *l* returns to the origin:

$$\rho \left(l\right)=\frac{{\#}_{origin}}{{\#}_{total}}\propto \frac{\sigma \cdot \frac{{\mu}^{l}}{{l}^{c}}}{{\mu}^{l}\cdot {l}^{\gamma -1}}$$

(10)

Given *ρ*(*l*), we can calculate the entropy *S*(*l*) and the corresponding loop energy penalties Δ*G _{entropy}*(

$$\begin{array}{ll}S(l)& =N_{A}\cdot k_{B}\cdot \ln\left[\rho(l)\right]\\ \Rightarrow \mathrm{\Delta}G_{entropy}(l)& =-T\cdot S(l)\end{array}$$

(11)

The length of a DNA loop is determined by the number of bases *L* in the loop and the distance *a*_{0} between two adjacent bases. The length of a random walk is the number of steps from start to end on a lattice with the lattice parameter *p*_{0}. When treating a DNA loop as a SAW, one has to consider the persistence length of single-stranded DNA, which determines the number of steps in the SAW and defines the lattice parameter *p*_{0}. Since *p*_{0} and *a*_{0} are of the same order of magnitude, depending on the salt concentration [26-28], we treat a DNA loop of length *L* · *a*_{0} as a SAW with *L* steps on a lattice with the lattice parameter *p*_{0} ≈ *a*_{0} (salt concentration is 0.90 M NaCl and 50 mM NaH_{2}PO_{4}). Moreover, we tested the influence of *p*_{0} on the absolute values of the loop energy penalties Δ*G_{entropy}* and found the differences to be negligible. This means:

$$\mathrm{\Delta}{G}_{entropy}\left(L\right)=-T\cdot S\left(L\right)$$

(12)

where *L *ranges from 1 to 13.
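Equations (10) to (12) can be turned into numbers with the following Python sketch. Two assumptions are made here: the proportionality constant in Eq. (10) is set to 1, and the result is expressed per mole using *R* = *N_{A}* · *k_{B}* in kcal/(mol·K). The function names are illustrative.

```python
from math import log

R_KCAL = 1.987e-3                     # N_A * k_B in kcal/(mol*K)
SIGMA, C_EXP, GAMMA = 1.75e-4, 2.15, 1.157


def loop_return_probability(length):
    """rho(l) of Eq. (10); mu^l cancels, prefactor set to 1 (assumption)."""
    return SIGMA * length ** (-(C_EXP + GAMMA - 1.0))


def loop_penalty(length, temperature=317.0):
    """Delta G_entropy(L) = -T * S(L) = -R * T * ln(rho(L)), in kcal/mol."""
    return -R_KCAL * temperature * log(loop_return_probability(length))


# Entropic penalties for the loop lengths studied here (L = 1 .. 13).
penalties = {L: loop_penalty(L) for L in range(1, 14)}
```

Since *ρ* decreases as a power law in the loop length, the penalty grows only logarithmically with *L*, on top of a constant offset set by *σ*.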

Figure 7 shows the comparison between our experimental signals and predictions with the simple zipper model. a) Hybridization signals as a function of loop length averaged over all loop positions. b) Hybridization signals as a function of loop position averaged over all loop lengths. The symbols indicate the signals of each feature block, the solid black line is the average signal of all feature blocks and the red solid line represents the theoretically predicted signals. The experimental results cannot be reproduced based on the simple zipper.

So far duplex opening was only possible from the ends of the duplex. States, in which the duplex zips at the loop position, are essential for the correct reproduction of our experimental results.

The partition function of a duplex *Z _{on }*(

*Z _{zipper }*(

*Z _{extended,right }*(

*Z _{extended,left }*(

*Z _{extended,right }*(

*Z _{double zipper }*(

*Z _{non-canonical }*(

In the full expression for *Z _{extended,right }*the summation of all possible binding configurations between both strands right of the loop position depends on the zipping state

$$\begin{array}{c}Z_{extended,right}(P,L)=\sum_{n=2}^{N-P+1}\sum_{i=P}^{N+L-n+1}\sum_{j=P}^{N-n+1}\omega_{n,i,j}\\ \text{with}\quad \omega_{n,i,j}=e^{\frac{\mathrm{\Delta}G_{n,i,j}+\mathrm{\Delta}G_{entropy,right}+\mathrm{\Delta}G_{left}}{RT}}\\ \text{and}\quad \mathrm{\Delta}G_{n,i,j}=\sum_{r=1}^{n-1}\mathrm{\Delta}g_{r}^{ij}\end{array}$$

(13)

*i *and *j *mark the positions of the zipper forks in probe and target respectively, *n *bases of probe and target strand (*n *- 1 NN pairs) of the region to the right of *i *and *j *are hybridized. Thus, Δ*G _{n,i,j }*is the NN energy of (

The free energy of the duplex part left of the loop position Δ*G _{left }*is approximated using the zipper model (6). Since Δ

$$\begin{array}{c}\mathrm{\Delta}{G}_{left}=\mathrm{\Delta}{G}_{left}\left(P\right)=RT\cdot ln\left[\sum _{k=0}^{P-2}\sum _{l=k+1}^{P-1}{\omega}_{k,l}\right]\\ \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}{\omega}_{k,l}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{defined}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{as}}\phantom{\rule{2.77695pt}{0ex}}\mathsf{\text{before}}\mathsf{\text{.}}\\ \end{array}$$

(14)

We calculate the binding constants

$$\begin{array}{ll}\hfill {K}_{i}\left(P,L\right)& ={Z}_{zipper}+{Z}_{extended,right}\phantom{\rule{2em}{0ex}}\\ \hfill & +{Z}_{extended,left}-{Z}_{double\phantom{\rule{2.77695pt}{0ex}}zipper}\phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$

(15)

with and without our approximation (13) for *Z _{extended,right}*(

$${\theta}_{i}=\frac{C\cdot {K}_{i}\cdot \left[{T}_{0}\right]}{1+C\cdot {K}_{i}\cdot \left[{T}_{0}\right]}$$

(16)

Figure 9 shows that our approximation for *Z _{extended,right }*(13) and for

In the extended model, it is possible that loops start at some origin $\overrightarrow{0}$ and end at position $\overrightarrow{r}$. Now we obtain for two SAWs with *M*_{1} and *M*_{2} steps:

$$\begin{array}{ll}\mathrm{\Delta}G_{entropy,right}& =\mathrm{\Delta}G_{entropy,right}(M_{1},M_{2})\\ & =-k_{B}T\cdot \ln\left[\rho(M_{1},M_{2})\right]\\ \text{with}\quad M_{1}& =i-P\\ \text{and}\quad M_{2}& =j-P\end{array}$$

(17)

*ρ*(*M*_{1}, *M*_{2}) is the probability that two SAWs with number of steps *M*_{1 }and *M*_{2 }respectively start at the origin and meet again. Here, we have:

$$\begin{array}{c}\rho \left({M}_{1},{M}_{2}\right)=\sum _{\overrightarrow{r},{\overrightarrow{r}}^{\prime}}\delta \left(\overrightarrow{r}-{\overrightarrow{r}}^{\prime}\right)\cdot \frac{\#\left({M}_{1},\overrightarrow{r}\right)\cdot \#\left({M}_{2},{\overrightarrow{r}}^{\prime}\right)}{{\#}_{total}\left({M}_{1}\right)\cdot {\#}_{total}\left({M}_{2}\right)}\\ =\sum _{\overrightarrow{r}}\frac{\#\left({M}_{1},\overrightarrow{r}\right)\cdot \#\left({M}_{2},\overrightarrow{r}\right)}{{\#}_{total}\left({M}_{1}\right)\cdot {\#}_{total}\left({M}_{2}\right)}\\ \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}\left|\overrightarrow{r}\right|\le min\left({M}_{1},{M}_{2}\right)\\ \end{array}$$

(18)

$\#\left({M}_{i},\overrightarrow{r}\right)$ is the number of SAWs with *M _{i }*steps which start at the origin and end at position $\overrightarrow{r}$. In 3D [23]:

$$\begin{array}{c}\#(M_{i},\overrightarrow{r})\propto \mu^{M_{i}}\cdot M_{i}^{\gamma -1-3\nu}\cdot g\left(\frac{r}{M_{i}^{\nu}}\right)\\ \text{with}\quad g(x)\propto x^{\phi}\cdot e^{-\lambda x^{\delta}},\ \lambda >0,\ \delta =\frac{1}{1-\nu}\\ \text{and}\quad \phi =\frac{\gamma -1}{\nu}\end{array}$$

(19)

Constants *γ* and *μ* are defined as before. *ν* = 0.588 ± 1.5 · 10^{-3} is the (universal) metric exponent. #* _{total}*(

In an analogous manner, *Z _{extended,left }*is calculated:

$$\begin{array}{c}{Z}_{extended,left}\left(P,L\right)=\sum _{n=2}^{P}\sum _{i=0}^{L+P-n}\sum _{j=0}^{P-n}{\omega}_{n,i,j}\\ \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}{\omega}_{n,i,j}={e}^{\frac{\mathrm{\Delta}{G}_{n,i,j}+\mathrm{\Delta}{G}_{entropy,left}+\mathrm{\Delta}{G}_{right}}{RT}}\\ \mathsf{\text{and}}\phantom{\rule{1em}{0ex}}\mathrm{\Delta}{G}_{n,i,j}=\sum _{r=1}^{n-1}\mathrm{\Delta}{g}_{r}^{ij}\\ \end{array}$$

(20)

Here we have

$$\begin{array}{cc}\hfill \mathrm{\Delta}{G}_{right}\hfill & \hfill =\mathrm{\Delta}{G}_{right}\left(P\right)\hfill \\ \hfill \hfill & \hfill =RT\cdot ln\left[\sum _{k=P}^{N-1}\sum _{l=k+1}^{N}{\omega}_{k,l}\right]\hfill \\ \hfill \hfill \end{array}$$

(21)

and finally

$$\begin{array}{ll}\hfill \mathrm{\Delta}{G}_{entropy,left}\left({M}_{1},{M}_{2}\right)& =-{k}_{B}T\cdot ln\left[\rho \left({M}_{1},{M}_{2}\right)\right]\phantom{\rule{2em}{0ex}}\\ \hfill \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}{M}_{1}& =L+P-n-i\phantom{\rule{2em}{0ex}}\\ \hfill \mathsf{\text{and}}\phantom{\rule{1em}{0ex}}{M}_{2}& =P-n-j\phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$

(22)

For *Z _{double zipper}*(

$$\begin{array}{ll}\hfill {Z}_{double\phantom{\rule{2.77695pt}{0ex}}zipper}\left(P,L\right)& =\sum _{k,l}\sum _{o,p}{\omega}_{k,l,o,p}\phantom{\rule{2em}{0ex}}\\ \hfill \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}{\omega}_{k,l,o,p}& ={e}^{\frac{\mathrm{\Delta}{G}_{k,l}+\mathrm{\Delta}{G}_{entropy,double}+\mathrm{\Delta}{G}_{o,p}}{RT}}\phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$

(23)

And Δ*G _{entropy,double}*:

$$\begin{array}{ll}\hfill \mathrm{\Delta}{G}_{entropy,double}\left({M}_{1},{M}_{2}\right)& =-{k}_{B}T\cdot ln\left[\rho \left({M}_{1},{M}_{2}\right)\right]\phantom{\rule{2em}{0ex}}\\ \hfill \mathsf{\text{with}}\phantom{\rule{1em}{0ex}}{M}_{1}& =o-l+L-1\phantom{\rule{2em}{0ex}}\\ \hfill \mathsf{\text{and}}\phantom{\rule{1em}{0ex}}{M}_{2}& =o-l-1\phantom{\rule{2em}{0ex}}\\ \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$

(24)

In the case where probe and target length match, duplex zipping can only occur if the two strands are perfectly aligned. We consider the initiation energy, the entropic barrier to meet this constraint, as constant. We simply write *K _{i }*=

In the case of duplexes with loops, the probe-target length difference Δ*L* increases the number of possible conformations of the probe strand that do not promote duplex initiation. The initiation energy changes accordingly. Neglecting unfolding of the coils for duplex formation, the number of pairing collisions that do not lead to zipping grows linearly with Δ*L*, resulting in an initiation entropy change

$$\mathrm{\Delta}{S}_{init}\propto ln\left(1+\frac{\mathrm{\Delta}L}{{L}_{0}}\right)$$

(25)

*L*_{0} is the characteristic length of the problem, which is the persistence length (under our experimental conditions this corresponds to a single base). In the case of a short, loop-forming sequence located in the center of the strand, however, there are two positions where parallel but shifted probe and target strands can initiate duplex formation. These positions correspond to the matching sequences to the left and to the right of the loop, implying a correction of Δ*L*/*L*_{0} by a factor 1/2. However, if the loop forms towards the ends, we are close to the situation of a single strand above. In the following we neglect this dependence on loop position and use a factor 1/2 throughout. Either factor (1 or 1/2) does not drastically modify our result if the factor *C* is adjusted accordingly.

Our approximation for Δ*S _{init }*tends to overestimate the corresponding initiation energy penalty as Δ

From (25), we get the modified binding constant *K _{i}*:

$${K}_{i}\left(P,L\right)=\frac{{Z}_{D}\left(P,L\right)}{1+\frac{\mathrm{\Delta}L}{2{L}_{0}}}$$

(26)

The calculation of the hybridization signal is then straightforward.
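As a minimal sketch of this last step, Eq. (26) and Eq. (16) can be combined as follows. The function names and all numerical values are illustrative assumptions; in particular, the product *C* · [*T*_{0}] is passed as a single placeholder number.

```python
def modified_binding_constant(z_d, delta_l, l0=1.0):
    """Eq. (26): K_i = Z_D / (1 + delta_L / (2 * L0))."""
    return z_d / (1.0 + delta_l / (2.0 * l0))


def hybridization_signal(k_i, c_times_t0):
    """Eq. (16): theta_i = C*K_i*[T0] / (1 + C*K_i*[T0])."""
    x = c_times_t0 * k_i
    return x / (1.0 + x)


# Placeholder partition function value and prefactor, loop lengths 0 .. 13.
Z_D, C_T0 = 3.0, 1.0
signals = [hybridization_signal(modified_binding_constant(Z_D, dl), C_T0)
           for dl in range(0, 14)]
```

For a fixed *Z_{D}* the denominator of Eq. (26) alone already produces a signal that falls monotonically with Δ*L*; in the full model *Z_{D}* itself also depends on loop position and loop length.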

We note that the choice of the denominator of equation (26), which follows from (25), has an impact on the calculated hybridization signals. Our theory could possibly be improved by choosing a different denominator; this, however, may be a subtle problem in itself and is beyond the scope of this paper.

Figure 10 shows the comparison between our experimental results and theory. a) Hybridization signals as a function of loop length for one specific loop position. b) Hybridization signals as a function of loop position for one specific loop length. In the figures to the right, we give the 95% confidence intervals for our data points (black) and compare them to our theory (red). This shows that the experimentally observed trends and the reproduction with our model are statistically relevant. The different symbols indicate the signals of the different feature blocks as a function of loop position or length, the solid black line is the experimental average and the red solid line represents the theoretical predictions.

To make the signal dependence on loop length clearer, we present the hybridization signals averaged over all loop positions as a function of loop length and compare them to the predicted signals (upper part of Figure 11). The lower part of the same figure shows the signal dependence as a function of loop position after averaging over all loop lengths. The symbols represent the signals of the feature blocks, the solid black line the average signal of all feature blocks and the solid red line the predicted signals.

Figures 10 and 11 show that the model reproduces our experimental findings well. Parameters used here were: simulation temperature *T _{sym }*= 317 K, synthesis error rate

We note that the partition function *Z _{double zipper}*(

Small differences between theoretical and experimental results regarding the signal dependence on loop position can be explained by the particularities of the duplex sequence under study. Here we look at two differences:

- region ranging from loop position 14 to 18: this duplex region has many A/T bases and the distance between two C bases is the largest in the whole sequence. The duplex destabilization of an A/T-rich region may be underestimated.

- loop position 21: the region has many C bases and the loop bases are inserted after two existing C bases. It has been shown [22,29,30] that degenerate base pairs may reinforce binding considerably. Stabilization by degenerate base pairs is not included in our theory.

Although there are differences between experiment and theory, the deviations are small (see Figures 10 and 11). An even better agreement could be obtained by choosing a different dependence of the duplex initiation energy on Δ*L*. Our approximation for it (see above) only holds for short Δ*L*, and we suppose that the systematic deviation between theory and experiment visible in Figure 11a originates from this approximation. As expected, at longer Δ*L* we tend to underestimate the binding constant. To our knowledge, although this is an often encountered problem, no simple scheme to assess the initiation energy is known. Working out the dependence of the initiation energy between the two regimes discussed above (short and very long Δ*L*) is beyond the scope of this paper. Molecular simulations could help to provide a better understanding of the nucleation process [7].

In the literature, internal DNA loops or bubbles of total length *l *= *l*_{1 }+ *l*_{2}, e.g. those occurring in DNA denaturation experiments, are often treated as SAWs of the same length returning to their origin (*l*_{1 }: unbound bases in probe; *l*_{2 }: unbound bases in target) [24]. Our experimental data could not be reproduced when the calculation is done in this way, because the calculated loop energy penalties were much too large. Treating a DNA loop as a SAW of length *l *= *l*_{1 }+ *l*_{2 }returning to the origin is different from calculating the probability that two SAWs of given lengths *l*_{1 }and *l*_{2 }start at the same point and meet again at some distance. In the first case, the number of possible conformations is much higher because the constraint is weakened to any pair ${l}_{1}^{\prime},{l}_{2}^{\prime}$ with ${l}_{1}^{\prime}+{l}_{2}^{\prime}=l$, not just the given *l*_{1}, *l*_{2}. The first case could give the same results if the calculation is done under the constraint that the loop of length *l *= *l*_{1 }+ *l*_{2 }reaches the position $\overrightarrow{r}$, where the two loops reunite, after *l*_{1 }steps, similar to the way described in [31].

This may not always matter so much: the length of the probe sequences used throughout this study is much shorter than the length of DNA strands used in DNA denaturation experiments. Since the free energy of a short DNA strand is small, the size of the loop energy penalties is more crucial.

In this paper we investigated the stability of DNA with a bulged loop. We inserted additional thymine bases into the surface-bound PM motif at a given position. By hybridizing DNA oligonucleotide targets onto the DNA microarray, bulged loops of different length and at different positions along the DNA duplex are formed.

We find that duplex stability decreases monotonically with the length of the bulged loop. Moreover, if the position of the bulged loop on the probe strand is varied, duplex stability exhibits a symmetric variation with respect to the center of the duplex. Duplex stability is highest for end- and middle-positions of the inserted bulged loop. For theoretical prediction we have shown that it is necessary and sufficient to consider strand opening at the position of the bulged loop. We have elaborated a successful approximation for the partition function of these new binding states. The signal dependence on loop length and on loop position could be reproduced with a limited amount of computing time (see Figure 11).

The employed NN free energy parameters from [12] are based on solution hybridization experiments. However, as we show in this study and in a previous paper [17], these parameters can be used to describe microarray hybridization well. The corresponding loop energy penalties can be obtained by considering the bulged loops as a self-avoiding walk on a lattice.
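To illustrate the idea of a loop penalty derived from self-avoiding walks, the following sketch enumerates all SAWs of a given number of steps on a simple cubic lattice and counts those whose end point is a nearest neighbour of the start, i.e. walks that could close into a loop. This is not the authors' implementation: the lattice type, the closure criterion and all function names are assumptions made for the example, and exhaustive enumeration is only feasible for short walks because the number of walks grows exponentially.

```python
import math

# Unit steps on a simple cubic lattice (the lattice choice is an assumption).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def saw_counts(n_steps):
    """Return (total, closed) for n-step self-avoiding walks from the origin.

    total  -- number of all self-avoiding walks with n_steps steps
    closed -- number of those ending on a nearest neighbour of the origin
    """
    total = 0
    closed = 0

    def extend(pos, visited, remaining):
        nonlocal total, closed
        if remaining == 0:
            total += 1
            # Manhattan distance 1 means the end point is adjacent to the start.
            if sum(abs(c) for c in pos) == 1:
                closed += 1
            return
        for dx, dy, dz in STEPS:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    origin = (0, 0, 0)
    extend(origin, {origin}, n_steps)
    return total, closed


def loop_entropy_change(n_steps):
    """Entropy change on loop closure in units of k_B: ln(closed / total)."""
    total, closed = saw_counts(n_steps)
    if closed == 0:
        raise ValueError("no closable walk of this length on this lattice")
    return math.log(closed / total)


for n in (1, 3, 5, 7):
    print(n, saw_counts(n), loop_entropy_change(n))
```

On this lattice only walks with an odd number of steps can end next to their origin, which is a parity artifact of the lattice and not a property of DNA; a different lattice or closure rule would change the numbers.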

In our simulation, we use just two free parameters:

- *C *= 1.5 · 10^{-3}: scaling factor, which fits the calculated binding constants to the fluorescent light intensities. This parameter cannot be avoided.

- *p *= 0.084: probability of a synthesis-related defect. In a previous work, the value of *p *was determined to be *p *= 0.1. In our improved experimental setup, we have less stray light and a better resolution, which result in a better synthesis quality (see Methods). Therefore, we chose *p *to be a free parameter. The fit yields *p *= 0.084, in good agreement with the coupling and deprotection efficiency of the employed oligonucleotides and the achievable contrast of the optical setup [32-34]. Given this knowledge, *p *is not completely free, and the resulting value is used to check the consistency of our theory.
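To give a feeling for what a defect probability of this size means, the sketch below computes how many defects a probe is expected to carry. It assumes that *p* applies to each base independently, so that the number of defects per probe is binomially distributed; this independence, the probe length of 25 bases and the function names are assumptions for illustration and are not taken from the model description.

```python
from math import comb


def defect_count_probability(n_bases, k, p=0.084):
    """Binomial probability that a probe of n_bases carries exactly k defects."""
    return comb(n_bases, k) * p**k * (1.0 - p) ** (n_bases - k)


def fraction_with_more_than(n_bases, k_max, p=0.084):
    """Fraction of probes carrying more than k_max defects."""
    kept = sum(defect_count_probability(n_bases, k, p) for k in range(k_max + 1))
    return 1.0 - kept


n_bases = 25  # hypothetical probe length
for k in range(5):
    print(k, defect_count_probability(n_bases, k))
print("more than 3 defects:", fraction_with_more_than(n_bases, 3))
```

Note that this gives the share of probes with a given number of defects, not their contribution to the measured signal, since strongly defective probes bind the target more weakly.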

The formation of bulged loops is an important aspect that needs to be considered when analyzing DNA microarray data or DNA hybridization of complex mixtures in general. Partly non-complementary sequences can form stable duplexes through formation of a bulged loop, resulting in false positive signals. The investigation of these bulged loop structures is therefore necessary to gain a deeper understanding of DNA hybridization and to make DNA microarrays and other nucleic-acid-based high-throughput technologies that rely on DNA hybridization more reliable and accurate.

CT carried out the experiments, the statistical data analysis and the computational modeling, and drafted the manuscript. MS helped to draft the manuscript. AO conceived of the study, participated in its design and coordination, and aided in drafting the manuscript. All authors read and approved the final manuscript.

**Influence of the microarray surface on the hybridization signal**. We test the influence of the microarray surface on the hybridization signal by synthesizing probes with reversed sequence (3'-CATTACAACAACCATTAATACTCATCATAACTT-5'). The 5'-end of the sequence employed throughout this work corresponds to the 3'-end of the reversed sequence. No significant influence of the surface can be detected.


**Duplex stability of DNA duplexes with bulged loops of different sequences as a function of loop length**. Instead of the discussed poly-T loop sequences, we synthesize probes containing poly-C loop sequences and random loop sequences, respectively, at three different positions (the number of additional bases varies from one to thirteen; the random loop sequences are listed in table b) of this file). Upon hybridization with the target sequence listed in Table 1, we note a monotonic decrease of the fluorescent signal as a function of loop length. After averaging over all loop positions, we compare the experimental signals as a function of loop length with the model predictions. We show that the experimental data are reproduced by our theory.


**Influence of the number of MMs on the fluorescent signal**. In order to reproduce our experimental data, it is sufficient to consider up to 3 synthesis-related defects in the zipper model. We confirm this by measuring the fluorescent signals of probes with 1 to 4 MMs. MMs are incorporated into the PM probe motif at 8 given positions, resulting in 162 different probe sequences. To generate the MMs, we replace the bases at these specific positions with a thymine base (or with an adenine base, if a thymine base is already present at the specific position). After categorizing the probes into groups according to their number of MMs, we calculate the average signal of each group and plot it against the number of MMs (the PM signal is set to 1, the background signal to 0). Based on this data, we can estimate the error caused by neglecting probes with more than 3 synthesis defects: the error, 4%, is smaller than the experimental error.
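The count of 162 probe sequences follows from choosing 1, 2, 3 or 4 of the 8 positions (8 + 28 + 56 + 70 = 162). The sketch below generates such a probe set with the replacement rule described above and shows the signal normalization. The motif used here is the reversed control sequence quoted earlier in this supplement and serves only as an example; the eight positions and all function names are hypothetical.

```python
from itertools import combinations


def mismatch_probes(pm, positions, max_mm=4):
    """Map each set of mismatch positions (1..max_mm of them) to its probe.

    A base at a chosen position is replaced by T, or by A if it already is T.
    """
    probes = {}
    for n_mm in range(1, max_mm + 1):
        for subset in combinations(positions, n_mm):
            bases = list(pm)
            for i in subset:
                bases[i] = "A" if bases[i] == "T" else "T"
            probes[subset] = "".join(bases)
    return probes


def normalize_signal(raw, pm_signal, background):
    """Rescale a raw intensity so that PM maps to 1 and background to 0."""
    return (raw - background) / (pm_signal - background)


PM_EXAMPLE = "CATTACAACAACCATTAATACTCATCATAACTT"  # example motif only
POSITIONS_EXAMPLE = (2, 6, 10, 14, 18, 22, 26, 30)  # hypothetical, 0-based

probes = mismatch_probes(PM_EXAMPLE, POSITIONS_EXAMPLE)
print(len(probes))
```

Because every replacement changes the base at its position, each subset of positions yields a distinct sequence, so the number of probes equals the number of subsets.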


**Hybridization signals resulting from Z _{double zipper }and comparison to Z_{extended,right }+ Z_{extended,left}**. We compare the calculated hybridization signals resulting from


The authors thank Jona Kayser for many helpful discussions on this work. Our research was supported by the University of Saarland.

- Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM. Expression profiling using cDNA microarrays. Nat Genet Suppl. 1999;21 [PubMed]
- Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet. 1999;21(1 Suppl):33–37. [PubMed]
- Stoughton RB. Applications of DNA Microarrays in Biology. Annu Rev Biochem. 2005;74:53–82. doi: 10.1146/annurev.biochem.74.082803.133212. [PubMed] [Cross Ref]
- Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M. Minimum information about a microarray experiment (MIAME) - towards standards for microarray data. Nat Genet. 2001;29:365–371. doi: 10.1038/ng1201-365. [PubMed] [Cross Ref]
- Dharmadi Y, Gonzalez R. DNA Microarrays: Experimental Issues, Data Analysis, and Application to Bacterial Systems. Biotechnol Prog. 2004;20:1309–1324. doi: 10.1021/bp0400240. [PubMed] [Cross Ref]
- King HC, Sinha AA. Gene Expression Profile Analysis by DNA Microarrays: Promise and Pitfalls. JAMA. 2001;286(18) (Reprinted) [PubMed]
- Sambriski EJ, Ortiz V, de Pablo JJ. Sequence effects in the melting and renaturation of short DNA oligonucleotides: structure and mechanistic pathways. J Phys: Condens Matter. 2009;21 034105 [PubMed]
- Ambia-Garrido J, Vainrub A, Pettitt M. A model for structure and thermodynamics of ssDNA and dsDNA near a surface: A coarse grained approach. Computer Physics Communications. 2010;181:2001–2007. doi: 10.1016/j.cpc.2010.08.029. [PMC free article] [PubMed] [Cross Ref]
- Knotts TA, Rathore N, Schwartz DC, de Pablo JJ. A coarse grain model for DNA. The Journal of Chemical Physics. 2007;126, 084901 [PubMed]
- Crothers DM, Zimm BH. Theory of the Melting Transition of Synthetic Polynucleotides: Evaluation of the Stacking Free Energy. J Mol Bio. 1964;9:1–9. doi: 10.1016/S0022-2836(64)80086-3. [PubMed] [Cross Ref]
- Breslauer KJ, Franks R, Blockers H, Marky LA. Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci USA. 1986;83:3746–3750. doi: 10.1073/pnas.83.11.3746. [PubMed] [Cross Ref]
- SantaLucia J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA. 1998;95:1460–1465. doi: 10.1073/pnas.95.4.1460. [PubMed] [Cross Ref]
- SantaLucia J, Hicks D. The Thermodynamics of DNA Structural Motifs. Annu Rev Biophys Biomol Struct. 2004;33:415–40. doi: 10.1146/annurev.biophys.32.110601.141800. [PubMed] [Cross Ref]
- Borer PN, Dengler B, Tinoco I, Uhlenbeck OC. Stability of Ribonucleic acid Double-stranded Helices. J Mol Bio. 1974;86:843–853. doi: 10.1016/0022-2836(74)90357-X. [PubMed] [Cross Ref]
- Zhang L, Miles MF, Aldape KD. A model of molecular interactions on short oligo-nucleotide microarrays. Nature Biotechnology. 2003;21(7) [PubMed]
- Naiser T, Kayser J, Mai T, Michel W, Ott A. Position dependent mismatch discrimination on DNA microarrays - experiments and model. BMC Bioinformatics. 2008;9:509. doi: 10.1186/1471-2105-9-509. [PMC free article] [PubMed] [Cross Ref]
- Naiser T, Kayser J, Mai T, Michel W, Ott A. Stability of a Surface-Bound Oligonucleotide Duplex Inferred from Molecular Dynamics: A Study of Single Nucleotide Defects Using DNA Microarrays. Phys Rev Lett. 2009;102, 218301 [PubMed]
- Naiser T, Ehler O, Kayser J, Mai T, Michel W, Ott A. Impact of point-mutations on the hybridization affinity of surface-bound DNA/DNA and RNA/DNA oligonucleotide-duplexes: Comparison of single base mismatches and base bulges. BMC Bioinformatics. 2008. [PMC free article] [PubMed]
- Gibbs J, DiMarzio E. Statistical Mechanics of Helix-Coil Transitions in Biological Macromolecules. The Journal of Chemical Physics. 1959;30(1)
- Kittel C. Phase Transition of a Molecular Zipper. American Journal of Physics. 1969;37(9)
- Deutsch JM, Liang S, Narayan O. Modeling of microarray data with zippering. arXiv:q-bio/0406039v1. 2004.
- Naiser T. PhD thesis. Universität Bayreuth; 2007. Characterization of Oligonucleotide Microarray Hybridization.
- Vanderzande C. Lattice models of polymers. Cambridge University Press; 1998.
- Blossey R, Carlon E. Reparametrizing the loop entropy weights: Effect on DNA melting curves. Phys Rev E. 2003;68, 061911 [PubMed]
- Randall D, Sinclair A. Self-testing algorithms for self-avoiding walks. Journal of Mathematical Physics. 2000;41(3)
- Tinland B, Pluen A, Sturm J, Weill G. Persistence Length of Single-Stranded DNA. Macromolecules. 1997;30:5763–5765. doi: 10.1021/ma970381+. [Cross Ref]
- Murphy MC, Rasnik I, Cheng W, Lohman TM, Ha T. Probing Single-Stranded DNA Conformational Flexibility Using Fluorescence Spectroscopy. Biophysical Journal. 2004;86:2530–2537. doi: 10.1016/S0006-3495(04)74308-8. [PubMed] [Cross Ref]
- Rechendorff K, Witz G, Adamcik J, Dietler G. Persistence length and scaling properties of single-stranded DNA adsorbed on modified graphite. The Journal of Chemical Physics. 2009;131, 095103 [PubMed]
- Ke SH, Wartell RM. Influence of Neighboring Base Pairs on the Stability of Single Base Bulges and Base Pairs in a DNA Fragment. Biochemistry. 1995;34:4593–4600. doi: 10.1021/bi00014a012. [PubMed] [Cross Ref]
- Znosko BM, Silvestri SB, Volkman H, Boswell B, Serra MJ. Thermodynamic Parameters for an Expanded Nearest-Neighbor Model for the Formation of RNA Duplexes with Single Nucleotide Bulges. Biochemistry. 2002;41:10406–10417. doi: 10.1021/bi025781q. [PubMed] [Cross Ref]
- Hanke A, Ochoa MG, Metzler R. Denaturation Transition of Stretched DNA. Phys Rev Lett. 2008;100, 018106 [PubMed]
- Nuwaysir EF, Huang W, Albert TJ, Singh J, Nuwaysir K, Pitas A, Richmond T, Gorski T, Berg JP, Ballin J, McCormick M, Norton J, Pollock T, Sumwalt T, Butcher L, Porter D, Molla M, Hall C, Blattner F, Sussman MR, Wallace RL, Cerrina F, Green RD. Gene Expression Analysis Using Oligonucleotide Arrays Produced by Maskless Photolithography. Genome Res. 2002;12:1749–1755. doi: 10.1101/gr.362402. [PubMed] [Cross Ref]
- Naiser T, Mai T, Michel W, Ott A. A versatile maskless microscope projection photolithography system and its application in light-directed fabrication of DNA microarrays. Rev Sci Instrum. 2006;77, 063711
- Garland PB, Serafinowski PJ. Effects of stray light on the Fidelity of photodirected oligonucleotide array synthesis. Nucleic Acid Research. 2002;30(19) e99. [PMC free article] [PubMed]

Articles from BMC Biophysics are provided here courtesy of **BioMed Central**
