Search tips
Search criteria

Results 1-14 (14)

Clipboard (0)

Select a Filter Below

Year of Publication
Document Types
1.  Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation 
Nucleic Acids Research  2014;42(20):12367-12379.
While individual non-B DNA structures have been shown to impact gene expression, their broad regulatory role remains elusive. We utilized genomic variants and expression quantitative trait loci (eQTL) data to analyze genome-wide variation propensities of potential non-B DNA regions and their relation to gene expression. Independent of genomic location, these regions were enriched in nucleotide variants. Our results are consistent with previously observed mutagenic properties of these regions and counter a previous study concluding that G-quadruplex regions have a reduced frequency of variants. While such mutagenicity might undermine functionality of these elements, we identified in potential non-B DNA regions a signature of negative selection. Yet, we found a depletion of eQTL-associated variants in potential non-B DNA regions, opposite to what might be expected from their proposed regulatory role. However, we also observed that genes downstream of potential non-B DNA regions showed higher expression variation between individuals. This coupling between mutagenicity and tolerance for expression variability of downstream genes may be a result of evolutionary adaptation, which allows reconciling mutagenicity of non-B DNA structures with their location in functionally important regions and their potential regulatory role.
PMCID: PMC4227770  PMID: 25336616
2.  Competitive superhelical transitions involving cruciform extrusion 
Nucleic Acids Research  2013;41(21):9610-9621.
A DNA molecule under negative superhelical stress becomes susceptible to transitions to alternate structures. The accessible alternate conformations depend on base sequence and compete for occupancy. We have developed a method to calculate equilibrium distributions among the states available to such systems, as well as their average thermodynamic properties. Here we extend this approach to include superhelical cruciform extrusion at both perfect and imperfect inverted repeat (IR) sequences. We find that short IRs do not extrude cruciforms, even in the absence of competition. But as the length of an IR increases, its extrusion can come to dominate both strand separation and B-Z transitions. Although many IRs are present in human genomic DNA, we find that extrusion-susceptible ones occur infrequently. Moreover, their avoidance of transcription start sites in eukaryotes suggests that cruciform formation is rarely involved in mechanisms of gene regulation. We examine a set of clinically important chromosomal translocation breakpoints that occur at long IRs, whose rearrangement has been proposed to be driven by cruciform extrusion. Our results show that the susceptibilities of these IRs to cruciform formation correspond closely with their observed translocation frequencies.
PMCID: PMC3834812  PMID: 23969416
3.  The genome-wide distribution of non-B DNA motifs is shaped by operon structure and suggests the transcriptional importance of non-B DNA structures in Escherichia coli 
Nucleic Acids Research  2013;41(12):5965-5977.
Although the right-handed double helical B-form DNA is most common under physiological conditions, DNA is dynamic and can adopt a number of alternative structures, such as the four-stranded G-quadruplex, left-handed Z-DNA, cruciform and others. Active transcription necessitates strand separation and can induce such non-canonical forms at susceptible genomic sequences. Therefore, it has been speculated that these non-B DNA motifs can play regulatory roles in gene transcription. Such conjecture has been supported in higher eukaryotes by direct studies of several individual genes, as well as a number of large-scale analyses. However, the role of non-B DNA structures in many lower organisms, in particular proteobacteria, remains poorly understood and incompletely documented. In this study, we performed the first comprehensive study of the occurrence of B DNA–non-B DNA transition-susceptible sites (non-B DNA motifs) within the context of the operon structure of the Escherichia coli genome. We compared the distributions of non-B DNA motifs in the regulatory regions of operons with those from internal regions. We found an enrichment of some non-B DNA motifs in regulatory regions, and we show that this enrichment cannot be simply explained by base composition bias in these regions. We also showed that the distribution of several non-B DNA motifs within intergenic regions separating divergently oriented operons differs from the distribution found between convergent ones. In particular, we found a strong enrichment of cruciforms in the termination region of operons; this enrichment was observed for operons with Rho-dependent, as well as Rho-independent terminators. Finally, a preference for some non-B DNA motifs was observed near transcription factor-binding sites. Overall, the conspicuous enrichment of transition-susceptible sites in these specific regulatory regions suggests that non-B DNA structures may have roles in the transcriptional regulation of specific operons within the E. coli genome.
PMCID: PMC3695496  PMID: 23620297
4.  Theoretical Analysis of Competing Conformational Transitions in Superhelical DNA 
PLoS Computational Biology  2012;8(4):e1002484.
We develop a statistical mechanical model to analyze the competitive behavior of transitions to multiple alternate conformations in a negatively supercoiled DNA molecule of kilobase length and specified base sequence. Since DNA superhelicity topologically couples together the transition behaviors of all base pairs, a unified model is required to analyze all the transitions to which the DNA sequence is susceptible. Here we present a first model of this type. Our numerical approach generalizes the strategy of previously developed algorithms, which studied superhelical transitions to a single alternate conformation. We apply our multi-state model to study the competition between strand separation and B-Z transitions in superhelical DNA. We show this competition to be highly sensitive to temperature and to the imposed level of supercoiling. Comparison of our results with experimental data shows that, when the energetics appropriate to the experimental conditions are used, the competition between these two transitions is accurately captured by our algorithm. We analyze the superhelical competition between B-Z transitions and denaturation around the c-myc oncogene, where both transitions are known to occur when this gene is transcribing. We apply our model to explore the correlation between stress-induced transitions and transcriptional activity in various organisms. In higher eukaryotes we find a strong enhancement of Z-forming regions immediately 5′ to their transcription start sites (TSS), and a depletion of strand separating sites in a broad region around the TSS. The opposite patterns occur around transcript end locations. We also show that susceptibility to each type of transition is different in eukaryotes and prokaryotes. By analyzing a set of untranscribed pseudogenes we show that the Z-susceptibility just downstream of the TSS is not preserved, suggesting it may be under selection pressure.
Author Summary
The stresses imposed on DNA within organisms can drive the molecule from its standard B-form double-helical structure into other conformations at susceptible sites within the sequence. We present a theoretical method to calculate this transition behavior due to stresses induced by supercoiling. We also develop a numerical algorithm that calculates the transformation probability of each base pair in a user-specified DNA sequence under stress. We apply this method to analyze the competition between transitions to strand separated and left-handed Z-form structures. We find that these two conformations are both competitive under physiological environmental conditions, and that this competition is especially sensitive to temperature. By comparing its results to experimental data we also show that the algorithm properly describes the competition between melting and Z-DNA formation. Analysis of large gene sets from various organisms shows a correlation between sites of stress-induced transitions and locations that are involved in regulating gene expression.
PMCID: PMC3343103  PMID: 22570598
5.  Superhelical Duplex Destabilization and the Recombination Position Effect 
PLoS ONE  2011;6(6):e20798.
The susceptibility to recombination of a plasmid inserted into a chromosome varies with its genomic position. This recombination position effect is known to correlate with the average G+C content of the flanking sequences. Here we propose that this effect could be mediated by changes in the susceptibility to superhelical duplex destabilization that would occur. We use standard nonparametric statistical tests, regression analysis and principal component analysis to identify statistically significant differences in the destabilization profiles calculated for the plasmid in different contexts, and correlate the results with their measured recombination rates. We show that the flanking sequences significantly affect the free energy of denaturation at specific sites interior to the plasmid. These changes correlate well with experimentally measured variations of the recombination rates within the plasmid. This correlation of recombination rate with superhelical destabilization properties of the inserted plasmid DNA is stronger than that with average G+C content of the flanking sequences. This model suggests a possible mechanism by which flanking sequence base composition, which is not itself a context-dependent attribute, can affect recombination rates at positions within the plasmid.
PMCID: PMC3111454  PMID: 21695263
6.  Theoretical Analysis of the Stress Induced B-Z Transition in Superhelical DNA 
PLoS Computational Biology  2011;7(1):e1001051.
We present a method to calculate the propensities of regions within a DNA molecule to transition from B-form to Z-form under negative superhelical stresses. We use statistical mechanics to analyze the competition that occurs among all susceptible Z-forming regions at thermodynamic equilibrium in a superhelically stressed DNA of specified sequence. This method, which we call SIBZ, is similar to the SIDD algorithm that was previously developed to analyze superhelical duplex destabilization. A state of the system is determined by assigning to each base pair either the B- or the Z-conformation, accounting for the dinucleotide repeat unit of Z-DNA. The free energy of a state is comprised of the nucleation energy, the sequence-dependent B-Z transition energy, and the energy associated with the residual superhelicity remaining after the change of twist due to transition. Using this information, SIBZ calculates the equilibrium B-Z transition probability of each base pair in the sequence. This can be done at any physiologically reasonable level of negative superhelicity. We use SIBZ to analyze a variety of representative genomic DNA sequences. We show that the dominant Z-DNA forming regions in a sequence can compete in highly complex ways as the superhelicity level changes. Despite having no tunable parameters, the predictions of SIBZ agree precisely with experimental results, both for the onset of transition in plasmids containing introduced Z-forming sequences and for the locations of Z-forming regions in genomic sequences. We calculate the transition profiles of 5 kb regions taken from each of 12,841 mouse genes and centered on the transcription start site (TSS). We find a substantial increase in the frequency of Z-forming regions immediately upstream from the TSS. The approach developed here has the potential to illuminate the occurrence of Z-form regions in vivo, and the possible roles this transition may play in biological processes.
Author Summary
We present the SIBZ algorithm that calculates the equilibrium properties of the transition from right-handed B-form to left-handed Z-form in a DNA sequence that is subjected to imposed stresses. SIBZ calculates the probability of transition of each base pair in a user-defined sequence. By examining illustrative examples, we show that the transition behaviors of all Z-susceptible regions in a sequence are coupled together by the imposed stresses. We show that the results produced by SIBZ agree closely with experimental observations of both the onset of transitions and the locations of Z-form sites in molecules of specified sequence. By analyzing 12,841 mouse genes, we show that sites susceptible to the B-Z transition cluster upstream from gene start sites. As this is where stresses generated by transcription accumulate, these sites may actually experience this transition when the genes involved are being expressed. This suggests that these transitions may serve regulatory functions.
PMCID: PMC3024258  PMID: 21283778
7.  The distribution of inverted repeat sequences in the Saccharomyces cerevisiae genome 
Current Genetics  2010;56(4):321-340.
Although a variety of possible functions have been proposed for inverted repeat sequences (IRs), it is not known which of them might occur in vivo. We investigate this question by assessing the distributions and properties of IRs in the Saccharomyces cerevisiae (SC) genome. Using the IRFinder algorithm we detect 100,514 IRs having copy length greater than 6 bp and spacer length less than 77 bp. To assess statistical significance we also determine the IR distributions in two types of randomization of the S. cerevisiae genome. We find that the S. cerevisiae genome is significantly enriched in IRs relative to random. The S. cerevisiae IRs are significantly longer and contain fewer imperfections than those from the randomized genomes, suggesting that processes to lengthen and/or correct errors in IRs may be operative in vivo. The S. cerevisiae IRs are highly clustered in intergenic regions, while their occurrence in coding sequences is consistent with random. Clustering is stronger in the 3′ flanks of genes than in their 5′ flanks. However, the S. cerevisiae genome is not enriched in those IRs that would extrude cruciforms, suggesting that this is not a common event. Various explanations for these results are considered.
Electronic supplementary material
The online version of this article (doi:10.1007/s00294-010-0302-6) contains supplementary material, which is available to authorized users.
PMCID: PMC2908449  PMID: 20446088
Palindromes; Imperfect inverted repeats; Hairpins; Yeast
8.  GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes 
Standards in Genomic Sciences  2009;1(2):204-215.
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from Supplemental material including interactive atlases is available online at
PMCID: PMC3035224  PMID: 21304658
9.  Superhelical Destabilization in Regulatory Regions of Stress Response Genes  
PLoS Computational Biology  2008;4(1):e17.
Stress-induced DNA duplex destabilization (SIDD) analysis exploits the known structural and energetic properties of DNA to predict sites that are susceptible to strand separation under negative superhelical stress. When this approach was used to calculate the SIDD profile of the entire Escherichia coli K12 genome, it was found that strongly destabilized sites occur preferentially in intergenic regions that are either known or inferred to contain promoters, but rarely occur in coding regions. Here, we investigate whether the genes grouped in different functional categories have characteristic SIDD properties in their upstream flanks. We report that strong SIDD sites in the E. coli K12 genome are statistically significantly overrepresented in the upstream regions of genes encoding transcriptional regulators. In particular, the upstream regions of genes that directly respond to physiological and environmental stimuli are more destabilized than are those regions of genes that are not involved in these responses. Moreover, if a pathway is controlled by a transcriptional regulator whose gene has a destabilized 5′ flank, then the genes (operons) in that pathway also usually contain strongly destabilized SIDD sites in their 5′ flanks. We observe this statistically significant association of SIDD sites with upstream regions of genes functioning in transcription in 38 of 43 genomes of free-living bacteria, but in only four of 18 genomes of endosymbionts or obligate parasitic bacteria. These results suggest that strong SIDD sites 5′ to participating genes may be involved in transcriptional responses to environmental changes, which are known to transiently alter superhelicity. We propose that these SIDD sites are active and necessary participants in superhelically mediated regulatory mechanisms governing changes in the global pattern of gene expression in prokaryotes in response to physiological or environmental changes.
Author Summary
DNA in vivo experiences regulated amounts of untwisting stress. If sufficiently large, these stresses can destabilize the double helix at specific locations. These sites then become favored locations for strand separations. Gene expression and DNA replication, the two major jobs of DNA, both require the strands of the duplex to be separated. Thus, events that affect the ease of strand separation can regulate the initiation of these processes. Stress-induced DNA duplex destabilization (SIDD) has been implicated in mechanisms regulating several biological processes, including the initiation of gene expression and replication. We have developed computational methods that accurately predict the locations and extents of destabilization within genomic DNA sequences that occur in response to specified stress levels. Here, we report that the easily destabilized sites we find in the Escherichia coli K12 genome are statistically significantly overrepresented in the upstream regions of genes encoding proteins that regulate transcription. In particular, the regions upstream of genes that directly respond to physiological and environmental stimuli are more destabilized than are those regions of genes that are not involved in these responses. These results suggest that strong SIDD sites upstream of participating genes may be involved in transcriptional responses to environmental changes.
PMCID: PMC2211533  PMID: 18208321
10.  Integration Site Choice of a Feline Immunodeficiency Virus Vector 
Journal of Virology  2006;80(17):8820-8823.
We mapped 226 unique integration sites in human hepatoma cells following gene transfer with a feline immunodeficiency virus (FIV)-based lentivirus vector. FIV integrated across the entire length of the transcriptional units. Microarray data indicated that FIV integration favored actively transcribed genes. Approximately 21% of FIV integrations within transcriptional units occurred in genes regulated by the LEDGF/p75 transcriptional coactivator. DNA in regions of FIV insertion sites exhibited a “bendable” structure and a pattern of duplex destabilization favoring strand separation. FIV integration preferences are more similar to those of primate lentiviruses and distinct from those of Moloney murine leukemia virus, avian sarcoma leukosis virus, and foamy virus.
PMCID: PMC1563849  PMID: 16912328
11.  OriDB: a DNA replication origin database 
Nucleic Acids Research  2006;35(Database issue):D40-D46.
Replication of eukaryotic chromosomes initiates at multiple sites called replication origins. Replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where several complementary studies have mapped their locations genome-wide. We have collated these datasets, taking account of the resolution of each study, to generate a single list of distinct origin sites. OriDB provides a web-based catalogue of these confirmed and predicted S.cerevisiae DNA replication origin sites. Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising seven pages. These pages provide, in text and graphical formats, the following information: genomic location and chromosome context of the origin site; time of origin replication; DNA sequence of proposed or experimentally confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA duplex destabilization or SIDD); and phylogenetic conservation of sequence elements. In addition, OriDB encourages community submission of additional information for each origin site through a User Notes facility. Origin sites are linked to several external resources, including the Saccharomyces Genome Database (SGD) and relevant publications at PubMed. Finally, a Chromosome Viewer utility allows users to interactively generate graphical representations of DNA replication data genome-wide. OriDB is available at .
PMCID: PMC1781122  PMID: 17065467
12.  Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress 
BMC Bioinformatics  2006;7:248.
In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are statistically highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters.
We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes.
In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in combination with other DNA structural and sequence properties. Although these methods cannot predict all the promoter-containing regions in a genome, they do find large sets of potential regions that have high probabilities of being true positives. This approach could be especially valuable for annotating those genomes about which there is limited experimental data.
PMCID: PMC1468432  PMID: 16677393
13.  SIDDBASE: a database containing the stress-induced DNA duplex destabilization (SIDD) profiles of complete microbial genomes 
Nucleic Acids Research  2005;34(Database issue):D373-D378.
Prokaryotic genomic DNA is generally negatively supercoiled in vivo. Many regulatory processes, including the initiation of transcription, are known to depend on the superhelical state of the DNA substrate. The stresses induced within DNA by negative superhelicity can destabilize the DNA duplex at specific sites. Various experiments have either shown or suggested that stress-induced DNA duplex destabilization (SIDD) is involved in specific regulatory mechanisms governing a variety of biological processes. We have developed methods to evaluate the SIDD properties of DNA sequences, including complete chromosomes. This analysis predicts the locations where the duplex becomes destabilized under superhelical stress. Previous studies have shown that the SIDD-susceptible sites predicted in this way occur at rates much higher than expected at random in transcriptional regulatory regions, and much lower than expected in coding regions. Analysis of the SIDD profiles of 42 bacterial genomes chosen for their diversity confirms this pattern. Predictions of SIDD sites have been used to identify potential genomic regulatory regions, and suggest both possible regulatory mechanisms involving stress-induced destabilization and experimental tests of these mechanisms. Here we describe the SIDDBASE database which enables users to retrieve and visualize the results of SIDD analyses of completely sequenced prokaryotic and archaeal genomes, together with their annotations. SIDDBASE is available at .
PMCID: PMC1347370  PMID: 16381890
14.  Susceptibility to Superhelically Driven DNA Duplex Destabilization: A Highly Conserved Property of Yeast Replication Origins 
Strand separation is obligatory for several DNA functions, including replication. However, local DNA properties such as A+T content or thermodynamic stability alone do not determine the susceptibility to this transition in vivo. Rather, superhelical stresses provide long-range coupling among the transition behaviors of all base pairs within a topologically constrained domain. We have developed methods to analyze superhelically induced duplex destabilization (SIDD) in genomic DNA that take into account both this long-range stress-induced coupling and sequence-dependent local thermodynamic stability. Here we apply this approach to examine the SIDD properties of 39 experimentally well-characterized autonomously replicating DNA sequences (ARS elements), which function as replication origins in the yeast Saccharomyces cerevisiae. We find that these ARS elements have a strikingly increased susceptibility to SIDD relative to their surrounding sequences. On average, these ARS elements require 4.78 kcal/mol less free energy to separate than do their immediately surrounding sequences, making them more than 2,000 times easier to open. Statistical analysis shows that the probability of this strong an association between SIDD sites and ARS elements arising by chance is approximately 4 × 10−10. This local enhancement of the propensity to separate to single strands under superhelical stress has obvious implications for origin function. SIDD properties also could be used, in conjunction with other known origin attributes, to identify putative replication origins in yeast, and possibly in other metazoan genomes.
Several DNA functions require the two strands of the DNA duplex to transiently separate. Examples include the initiation of gene expression and of DNA replication. Here the authors examine the strand separation properties of the DNA duplex at autonomously replicating sequences (ARS elements), which are the potential replication origins in yeast.
In vivo, susceptibility to strand separation does not depend only on local DNA properties such as adenine plus thymine content or thermodynamic stability. Rather, stresses imposed on the DNA in vivo couple together the strand-opening behaviors of all base pairs that experience them. The authors use computational methods for analyzing stress-driven strand separation to examine the susceptibility to opening of 39 experimentally well-characterized ARS elements. They show that these ARS elements have strikingly increased susceptibilities to stress-induced separation relative to the surrounding sequences. On average, these ARS elements require 4.78 kcal/mol less free energy to separate than do surrounding sequences, making them more than 2,000 times easier to open. This enhanced susceptibility to stress-driven strand separation has obvious implications for the mechanisms that begin the process of replication. This property is also shared by bacterial and viral replication start points, suggesting that it may be a general attribute of replication origins.
PMCID: PMC1183513  PMID: 16103908

Results 1-14 (14)