DNA often occurs in an underwound, negatively superhelical topological state in vivo. In bacteria, gyrase enzymes act to generate negative supercoils, while other topoisomerases relax them. The dynamic balance between these two processes determines a basal level of superhelicity that can change according to the environmental or nutritional state of the organism [1]. In addition, RNA polymerase translocation leaves a wake of negative supercoils and generates a bow wave of positive supercoils [2]–[4]. Together these effects induce substantial amounts of superhelicity in the topological domains into which bacterial genomes are subdivided. A variety of regulatory processes in prokaryotes, including the initiation of transcription from specific genes, are known to vary with the level of superhelicity experienced by the DNA involved [5].
It has long been thought that unconstrained superhelicity was not a factor in eukaryotic genomic regulation. Eukaryotes do not commonly have gyrases that introduce negative supercoils, although they do have relaxing topoisomerases. In addition, nucleosomal winding stabilizes supercoils and could inhibit the transmission of unconstrained superhelicity. However, it is now known that substantial amounts of transcriptionally induced negative superhelicity occur upstream (i.e. 5′) of RNA polymerases in the human genome [6], [7]. A superhelix density of

is achieved there by a single transcriptional initiation event, while divergently oriented transcription can produce superhelix densities of

in the region between the polymerase complexes. This superhelicity extends over distances of at least a kilobase, and hence must be transmitted either through or around nucleosomes. Kinetically, this transcription-driven superhelicity is generated faster than topoisomerases act to relieve it, so it persists long enough to affect subsequent regulatory processes.
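For reference, superhelix density is conventionally defined as the linking difference normalized by the linking number of the relaxed molecule. The worked example below is only a sketch: the domain length N, the B-form helical repeat A, and the value of σ are assumptions chosen for illustration and are not taken from the experiments cited above.

```latex
% Conventional definition of superhelix density
\sigma = \frac{Lk - Lk_0}{Lk_0}, \qquad Lk_0 = \frac{N}{A}, \qquad A \approx 10.4\ \text{bp/turn}

% Illustrative numbers (assumed): N = 5000 bp, \sigma = -0.05
Lk_0 \approx \frac{5000}{10.4} \approx 481, \qquad
\Delta Lk = Lk - Lk_0 = \sigma\, Lk_0 \approx -0.05 \times 481 \approx -24\ \text{turns}
```

A negative σ therefore corresponds to a deficit of helical turns that the domain must accommodate by writhing, by twisting, or by local transitions to alternative structures.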
The levels of negative superhelicity achieved in both prokaryotes and eukaryotes are sufficient to drive in vivo structural transitions to alternative DNA conformations [7], [8]. The most studied DNA transition is superhelically induced duplex destabilization (SIDD), which facilitates or creates local sites of strand separation. SIDD has been implicated in a wide variety of regulatory processes, including the initiation of transcription from specific promoters in both prokaryotes and eukaryotes [9]–[17].
Here we focus on the transition from B-form to Z-form, a left-handed double helix. When the discovery of Z-DNA was announced, this transition was predicted to occur at physiologically attained levels of negative superhelicity [18]–[20]. Z-DNA has been experimentally detected at inserted Z-susceptible sites in bacterial genomic DNA both in vitro and in vivo [21]–[26].
The study of alternate DNA structures in eukaryotes is more challenging, in part because DNA superhelicity in these organisms seems not to be stable, but rather is a transient state driven by transcriptional activity. However, there is substantial indirect evidence that Z-DNA also can occur in vivo in eukaryotes. Z-DNA has been implicated in a variety of regulatory events relating to replication, transcription, recombination, and other biological processes [27]. For example, it has been shown that the negative torsional stress induced by polymerase translocation during transcription can stabilize Z-DNA near transcription start sites [28]. The amount of Z-DNA found in these experiments was directly related to transcriptional activity, and thus to the level of transcription-driven superhelicity. Another set of experiments studied the formation of Z-DNA in the 5′ flank of the human c-myc gene [29], [30]. Three Z-susceptible regions were identified near the promoters of this gene. The results suggest that these regions transform to Z-form during c-myc transcription, but revert to B-form when transcription is inhibited. Together, these experiments indicate that transcriptionally driven superhelical stresses can drive B-Z transitions in mammalian cells.
Many attempts have been made to identify proteins that bind selectively to Z-DNA. A powerful method developed by Herbert [31] led to the isolation of double-stranded RNA adenosine deaminase (ADAR1) [32], a Z-DNA binding enzyme, as well as other Z-binding proteins. It has been shown that E3L, a Z-DNA binding protein found in poxviruses, inhibits the host cell's ability to perform transcription or mount an anti-viral response when it is bound to Z-DNA near transcription start sites [33]. On this basis it was suggested that an inhibitor of E3L binding might protect against poxviral infection. Although there are some indications that Z-binding proteins may be involved in gene regulation, this remains an active area of research [27].
The Z-form helix has a dinucleotide repeat unit, in which one nucleotide must be in the syn and the other in the anti conformation, with a helicity of −12 base pairs per turn [34]. (The minus sign indicates the left-handedness of the helix.) The free energy required for the B-Z transition under low salt conditions has been determined for each of the ten dinucleotides [21], [35]–[39]. The Z-form is energetically most accessible for certain alternating purine-pyrimidine sequences, the most favored being $(\mathrm{GC})_n$, with guanine in the syn and cytosine in the anti conformations. Z-formation has also been observed in $(\mathrm{CA})_n \cdot (\mathrm{TG})_n$ sequences, although transitions there are almost twice as costly as at GC runs. The remaining alternating purine/pyrimidine sequence, $(\mathrm{AT})_n$, has a very high transition energy and is not normally found in Z-form. Perturbations that break the purine/pyrimidine alternation, although energetically costly, have also been observed in Z-DNA, as will be discussed below. The substantial nucleation energy for initiating a run of Z-DNA, which may be regarded as the cost of generating two junctions between B-form and Z-form, also has been determined [21], [40].
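The energy bookkeeping described above (a per-dinucleotide transition cost plus a nucleation cost for the two B-Z junctions) can be sketched as follows. This is a minimal illustration, not the parameter set measured in the cited experiments: every number in `Z_COST`, `DEFAULT_COST`, and `JUNCTION_COST` is a placeholder assumption, the function name `z_run_energy` is hypothetical, and the sketch ignores the dependence of the cost on which base is syn and which is anti.

```python
# Toy B-Z energetics for one contiguous run of Z-DNA.
# All numeric values are illustrative assumptions (kcal/mol), not the
# measured parameters cited in the text.

# Assumed cost of converting one dinucleotide from B-form to Z-form.
Z_COST = {
    "GC": 0.7, "CG": 0.7,                        # most favorable alternating step
    "CA": 1.3, "TG": 1.3, "AC": 1.3, "GT": 1.3,  # roughly twice the GC cost
    "AT": 2.4, "TA": 2.4,                        # alternating but costly
}
DEFAULT_COST = 4.0    # assumed penalty when purine/pyrimidine alternation breaks
JUNCTION_COST = 5.0   # assumed cost of one B-Z junction


def z_run_energy(seq):
    """Free energy to convert the whole of `seq` into a single Z-form run.

    The sequence is read as consecutive, non-overlapping dinucleotides.
    The nucleation cost is modeled as two B-Z junctions, one at each end.
    """
    if len(seq) % 2:
        raise ValueError("sequence length must be even (dinucleotide units)")
    dinucleotides = [seq[i:i + 2] for i in range(0, len(seq), 2)]
    transition = sum(Z_COST.get(d, DEFAULT_COST) for d in dinucleotides)
    return 2 * JUNCTION_COST + transition
```

With these assumed numbers, a short GC run is cheaper to convert than CA or AT runs of the same length, and for short runs the junction term dominates the total, which is why nucleation is described as substantial.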
Soon after the discovery of Z-DNA, several simple theoretical analyses of superhelical B-Z transitions were developed. These all assumed the simplest conditions: a single, uniformly Z-susceptible site embedded in an entirely Z-resistant background. The first such analysis simply predicted that physiological levels of negative superhelicity could drive B-Z transitions [18]. This approach was subsequently used to investigate the basic properties of these transitions, and to assess how the B-Z transition might compete with others in simple paradigm cases [19], [36], [41]–[43]. Finally, these simple theoretical approaches were applied to determine the energy parameters of the transition from experiments in which a single uniform insert (commonly

) placed within a superhelical plasmid was observed to undergo transition [21], [36], [40].
In this paper we present the first method to analyze the superhelical B-Z transition in its full complexity. This method, which we call SIBZ, can calculate the B-Z transition behavior of multi-kilobase length genomic DNA sequences under superhelical stress. It specifically includes the competition for transition among all sites within the sequence. SIBZ analyzes the states available to the entire sequence, where each base pair is either in the B-conformation or part of a Z-form dinucleotide. It then uses statistical mechanics to determine the equilibrium distribution among these states. Specifically, it calculates the probability of B-Z transition for each base pair in the sequence under the given conditions. In this way it identifies the Z-susceptible regions within the sequence, and assesses how they compete at any given level of superhelicity.
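The statistical mechanical idea can be illustrated with a brute-force toy model. It is not the SIBZ algorithm, which is built to handle multi-kilobase sequences. The sketch enumerates every B/Z state of a short segment, assigns each state a free energy, and Boltzmann-weights the states to get a transition probability per dinucleotide. All of the following are assumptions made for the sketch: the energy table, the junction cost, the quadratic form and stiffness constant of the residual supercoiling energy, the B-form repeat of 10.4 bp per turn, the fixed dinucleotide reading frame, and the neglect of any twist absorbed at junctions. The name `z_probabilities` is hypothetical.

```python
import itertools
import math

RT = 0.616        # kcal/mol at roughly 37 C
A_B = 10.4        # bp per right-handed turn of B-DNA (assumed)
A_Z = 12.0        # bp per left-handed turn of Z-DNA
JUNCTION = 5.0    # assumed cost (kcal/mol) of one B-Z junction
DEFAULT_COST = 4.0  # assumed cost for non-alternating dinucleotides
Z_COST = {"GC": 0.7, "CG": 0.7, "CA": 1.3, "TG": 1.3,
          "AC": 1.3, "GT": 1.3, "AT": 2.4, "TA": 2.4}  # assumed values


def z_probabilities(segment, domain_bp, sigma):
    """Equilibrium P(Z) for each in-frame dinucleotide of `segment`.

    The segment sits in a domain of `domain_bp` base pairs held at
    superhelix density `sigma`; the rest of the domain is assumed to
    stay in B-form.
    """
    if len(segment) % 2:
        raise ValueError("segment length must be even")
    dinucs = [segment[i:i + 2] for i in range(0, len(segment), 2)]
    stiffness = 2220.0 * RT / domain_bp   # assumed quadratic energy constant
    alpha = sigma * domain_bp / A_B       # imposed linking difference (turns)

    states, energies = [], []
    for state in itertools.product((0, 1), repeat=len(dinucs)):
        padded = (0,) + state + (0,)      # flanking DNA is B-form
        junctions = sum(a != b for a, b in zip(padded, padded[1:]))
        chem = sum(Z_COST.get(d, DEFAULT_COST)
                   for d, s in zip(dinucs, state) if s)
        # Each transformed bp changes twist from +1/A_B to -1/A_Z turns,
        # which relieves part of the negative linking difference.
        alpha_res = alpha + 2 * sum(state) * (1.0 / A_B + 1.0 / A_Z)
        states.append(state)
        energies.append(chem + JUNCTION * junctions
                        + 0.5 * stiffness * alpha_res ** 2)

    g_min = min(energies)                 # shift energies for numerical stability
    weights = [math.exp(-(g - g_min) / RT) for g in energies]
    total = sum(weights)
    return [sum(w for st, w in zip(states, weights) if st[i]) / total
            for i in range(len(dinucs))]
```

For example, `z_probabilities("TTAA" + "GC" * 6 + "TTAA", 3000, -0.07)` returns one probability per dinucleotide. Because every state shares the same residual linking difference, converting one site lowers the stress available to all others, which is the coupling that makes sites compete. Exhaustive enumeration scales as 2^m in the number of dinucleotides m, so it is only usable for very short segments.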
SIBZ was developed by modifying the SIDD algorithm to treat the B-Z transition, as described in the following section. Several other theoretical strategies have been developed or proposed for analyzing superhelical DNA transitions, and these also might have been modified for this purpose. Although a formally exact method based on recursion relations has been suggested, it was found to be too computationally inefficient to warrant development [43], [44]. An approximate algorithm that could make base pair-specific calculations was therefore presented in the same paper. This method has not been made available for public use or evaluation. An alternative exact algorithmic strategy also has been developed and presented [45]. Although this approach could compute transition profiles (i.e. transition probabilities for each base pair), it too was found to be too computationally cumbersome to be practical, so a more efficient approximate method based on its approach was also presented. To create SIBZ we chose to modify the SIDD approach because it has been extensively developed, optimized and implemented in this group, and it features an attractive combination of high accuracy and computational efficiency.
Three previous theoretical methods have been implemented that analyze DNA sequences to identify potential Z-DNA forming regions [35], [46]–[48]. The first method, developed by the Jovin group, seeks to identify Z-susceptible sites based solely on their sequence characteristics [46]. The energetics of transition were not considered in this approach. Another method, called Z-Catcher, performs a mechanical calculation, but does not consider the thermodynamic equilibrium of the system [47]. Z-Hunt [35], [48] uses statistical mechanics, but only calculates the propensity of each fixed region within the sequence to form a Z-helix in isolation. Since the superhelical stresses that drive B-Z transitions couple together the transition behaviors of all base pairs that experience them, these approaches do not give information about how these competitive transitions behave in situ.