1.  Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation 
Nucleic Acids Research  2014;42(20):12367-12379.
While individual non-B DNA structures have been shown to impact gene expression, their broad regulatory role remains elusive. We utilized genomic variants and expression quantitative trait loci (eQTL) data to analyze genome-wide variation propensities of potential non-B DNA regions and their relation to gene expression. Independent of genomic location, these regions were enriched in nucleotide variants. Our results are consistent with previously observed mutagenic properties of these regions and counter a previous study concluding that G-quadruplex regions have a reduced frequency of variants. While such mutagenicity might undermine functionality of these elements, we identified in potential non-B DNA regions a signature of negative selection. Yet, we found a depletion of eQTL-associated variants in potential non-B DNA regions, opposite to what might be expected from their proposed regulatory role. However, we also observed that genes downstream of potential non-B DNA regions showed higher expression variation between individuals. This coupling between mutagenicity and tolerance for expression variability of downstream genes may be a result of evolutionary adaptation, which allows reconciling mutagenicity of non-B DNA structures with their location in functionally important regions and their potential regulatory role.
PMCID: PMC4227770  PMID: 25336616
2.  Competitive superhelical transitions involving cruciform extrusion 
Nucleic Acids Research  2013;41(21):9610-9621.
A DNA molecule under negative superhelical stress becomes susceptible to transitions to alternate structures. The accessible alternate conformations depend on base sequence and compete for occupancy. We have developed a method to calculate equilibrium distributions among the states available to such systems, as well as their average thermodynamic properties. Here we extend this approach to include superhelical cruciform extrusion at both perfect and imperfect inverted repeat (IR) sequences. We find that short IRs do not extrude cruciforms, even in the absence of competition. But as the length of an IR increases, its extrusion can come to dominate both strand separation and B-Z transitions. Although many IRs are present in human genomic DNA, we find that extrusion-susceptible ones occur infrequently. Moreover, their avoidance of transcription start sites in eukaryotes suggests that cruciform formation is rarely involved in mechanisms of gene regulation. We examine a set of clinically important chromosomal translocation breakpoints that occur at long IRs, whose rearrangement has been proposed to be driven by cruciform extrusion. Our results show that the susceptibilities of these IRs to cruciform formation correspond closely with their observed translocation frequencies.
PMCID: PMC3834812  PMID: 23969416
3.  Theoretical Analysis of Competing Conformational Transitions in Superhelical DNA 
PLoS Computational Biology  2012;8(4):e1002484.
We develop a statistical mechanical model to analyze the competitive behavior of transitions to multiple alternate conformations in a negatively supercoiled DNA molecule of kilobase length and specified base sequence. Since DNA superhelicity topologically couples together the transition behaviors of all base pairs, a unified model is required to analyze all the transitions to which the DNA sequence is susceptible. Here we present a first model of this type. Our numerical approach generalizes the strategy of previously developed algorithms, which studied superhelical transitions to a single alternate conformation. We apply our multi-state model to study the competition between strand separation and B-Z transitions in superhelical DNA. We show this competition to be highly sensitive to temperature and to the imposed level of supercoiling. Comparison of our results with experimental data shows that, when the energetics appropriate to the experimental conditions are used, the competition between these two transitions is accurately captured by our algorithm. We analyze the superhelical competition between B-Z transitions and denaturation around the c-myc oncogene, where both transitions are known to occur when this gene is transcribing. We apply our model to explore the correlation between stress-induced transitions and transcriptional activity in various organisms. In higher eukaryotes we find a strong enhancement of Z-forming regions immediately 5′ to their transcription start sites (TSS), and a depletion of strand separating sites in a broad region around the TSS. The opposite patterns occur around transcript end locations. We also show that susceptibility to each type of transition is different in eukaryotes and prokaryotes. By analyzing a set of untranscribed pseudogenes we show that the Z-susceptibility just downstream of the TSS is not preserved, suggesting it may be under selection pressure.
Author Summary
The stresses imposed on DNA within organisms can drive the molecule from its standard B-form double-helical structure into other conformations at susceptible sites within the sequence. We present a theoretical method to calculate this transition behavior due to stresses induced by supercoiling. We also develop a numerical algorithm that calculates the transformation probability of each base pair in a user-specified DNA sequence under stress. We apply this method to analyze the competition between transitions to strand separated and left-handed Z-form structures. We find that these two conformations are both competitive under physiological environmental conditions, and that this competition is especially sensitive to temperature. By comparing its results to experimental data we also show that the algorithm properly describes the competition between melting and Z-DNA formation. Analysis of large gene sets from various organisms shows a correlation between sites of stress-induced transitions and locations that are involved in regulating gene expression.
PMCID: PMC3343103  PMID: 22570598
4.  Theoretical Analysis of the Stress Induced B-Z Transition in Superhelical DNA 
PLoS Computational Biology  2011;7(1):e1001051.
We present a method to calculate the propensities of regions within a DNA molecule to transition from B-form to Z-form under negative superhelical stresses. We use statistical mechanics to analyze the competition that occurs among all susceptible Z-forming regions at thermodynamic equilibrium in a superhelically stressed DNA of specified sequence. This method, which we call SIBZ, is similar to the SIDD algorithm that was previously developed to analyze superhelical duplex destabilization. A state of the system is determined by assigning to each base pair either the B- or the Z-conformation, accounting for the dinucleotide repeat unit of Z-DNA. The free energy of a state comprises the nucleation energy, the sequence-dependent B-Z transition energy, and the energy associated with the residual superhelicity remaining after the change of twist due to transition. Using this information, SIBZ calculates the equilibrium B-Z transition probability of each base pair in the sequence. This can be done at any physiologically reasonable level of negative superhelicity. We use SIBZ to analyze a variety of representative genomic DNA sequences. We show that the dominant Z-DNA forming regions in a sequence can compete in highly complex ways as the superhelicity level changes. Despite having no tunable parameters, the predictions of SIBZ agree precisely with experimental results, both for the onset of transition in plasmids containing introduced Z-forming sequences and for the locations of Z-forming regions in genomic sequences. We calculate the transition profiles of 5 kb regions taken from each of 12,841 mouse genes and centered on the transcription start site (TSS). We find a substantial increase in the frequency of Z-forming regions immediately upstream from the TSS. The approach developed here has the potential to illuminate the occurrence of Z-form regions in vivo, and the possible roles this transition may play in biological processes.
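The state model described in this abstract can be illustrated with a toy calculation. The sketch below is not the SIBZ implementation: it enumerates every B/Z assignment by brute force, which is only feasible for a handful of dinucleotide units, and it ignores the twist absorbed at B-Z junctions. The function name `z_probabilities`, the nucleation cost, the per-unit transition energies, and the quadratic superhelical coefficient are all illustrative assumptions rather than values taken from the paper.

```python
import itertools
import math

RT = 0.593  # kcal/mol near 298 K (assumed temperature)

def count_runs(state):
    """Count contiguous runs of Z-form units (1s) in a state tuple."""
    runs, prev = 0, 0
    for s in state:
        if s == 1 and prev == 0:
            runs += 1
        prev = s
    return runs

def z_probabilities(unit_energies, alpha, n_bp_total,
                    nucleation=10.0, k_coeff=2220.0):
    """Equilibrium Z-form probability of each dinucleotide unit.

    unit_energies: assumed B-to-Z cost of each unit (kcal/mol).
    alpha: imposed linking difference in turns (negative = underwound).
    n_bp_total: length of the whole molecule in base pairs.
    nucleation, k_coeff: illustrative parameters, not fitted values.
    """
    n_units = len(unit_energies)
    # B-DNA is right-handed (~10.5 bp/turn) and Z-DNA is left-handed
    # (~12 bp/turn), so each 2-bp unit that flips relieves this many turns.
    relief_per_unit = 2 * (1 / 10.5 + 1 / 12.0)
    k_sc = k_coeff * RT / n_bp_total  # quadratic superhelical stiffness

    weights = {}
    for state in itertools.product((0, 1), repeat=n_units):
        e_trans = sum(e for e, s in zip(unit_energies, state) if s)
        e_nuc = nucleation * count_runs(state)
        alpha_res = alpha + relief_per_unit * sum(state)  # residual stress
        e_sc = 0.5 * k_sc * alpha_res ** 2
        weights[state] = math.exp(-(e_trans + e_nuc + e_sc) / RT)

    partition = sum(weights.values())
    return [
        sum(w for st, w in weights.items() if st[i]) / partition
        for i in range(n_units)
    ]

if __name__ == "__main__":
    energies = [1.0] * 8  # eight identical hypothetical Z-prone units
    for alpha in (0.0, -10.0, -18.0):
        probs = z_probabilities(energies, alpha, n_bp_total=3000)
        print(alpha, [round(p, 3) for p in probs])
```

Because every unit shares the same residual superhelicity term, flipping one region changes the stress felt by all others, which is the coupling the abstract describes. A practical method would need something more efficient than exhaustive enumeration for kilobase-scale sequences.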
Author Summary
We present the SIBZ algorithm that calculates the equilibrium properties of the transition from right-handed B-form to left-handed Z-form in a DNA sequence that is subjected to imposed stresses. SIBZ calculates the probability of transition of each base pair in a user-defined sequence. By examining illustrative examples, we show that the transition behaviors of all Z-susceptible regions in a sequence are coupled together by the imposed stresses. We show that the results produced by SIBZ agree closely with experimental observations of both the onset of transitions and the locations of Z-form sites in molecules of specified sequence. By analyzing 12,841 mouse genes, we show that sites susceptible to the B-Z transition cluster upstream from gene start sites. As this is where stresses generated by transcription accumulate, these sites may actually experience this transition when the genes involved are being expressed. This suggests that these transitions may serve regulatory functions.
PMCID: PMC3024258  PMID: 21283778

