HHS Public Access author manuscript; accepted for publication in a peer-reviewed journal.
 
Cell Mol Life Sci. Author manuscript; available in PMC 2011 January 7.
Published in final edited form as:
PMCID: PMC3017512
NIHMSID: NIHMS257586

Non-B DNA structure-induced genetic instability and evolution

Abstract

Repetitive DNA motifs are abundant in the genomes of various species and have the capacity to adopt non-canonical (i.e. non-B) DNA structures. Several non-B DNA structures, including cruciforms, slipped structures, triplexes, G-quadruplexes, and Z-DNA, have been shown to cause mutations, such as deletions, expansions, and translocations in both prokaryotes and eukaryotes. Their distributions in genomes are not random and often co-localize with sites of chromosomal breakage associated with genetic diseases. Current genome-wide sequence analyses suggest that the genomic instabilities induced by non-B DNA structure-forming sequences not only result in predisposition to disease, but also contribute to rapid evolutionary changes, particularly in genes associated with development and regulatory functions. In this review, we describe the occurrence of non-B DNA-forming sequences in various species, the classes of genes enriched in non-B DNA-forming sequences, and recent mechanistic studies on DNA structure-induced genomic instability to highlight their importance in genomes.

Keywords: non-B DNA structures, repetitive sequences, genomic instability, evolutionary change, gene regulation

Introduction

The canonical right-handed double helical structure of B-form DNA [1] has had a profound influence over studies designed to determine the function of DNA. However, many alternative DNA structures (reviewed in [2]) have been known to exist since the late 1950s and their roles in biological functions have begun to be elucidated, with substantial progress over the past decade. In 1957, sedimentation coefficient and optical absorption measurements revealed the association of ribonucleic poly-A and poly-U polymers into three-stranded complexes [3]. The DNA of a d(CpGpCpGpCpG) fragment was crystallized in 1979, which revealed a left-handed conformation (Z-DNA) with altered helical parameters relative to the right-handed B-form [4]. Soon after, cruciform structures formed by inverted repeats were identified by S1 nuclease probing [5,6] and by two-dimensional gel electrophoresis [7]. During this same period, parallel four-stranded complexes (tetraplex or G-quadruplex DNA) were discovered to form by guanine-rich DNA sequences [8]. To date, more than 10 different DNA conformations are known to exist from biophysical and biochemical studies, and more are likely to be identified.

Non-B DNA-forming sequences in genomes affect DNA replication and transcription, and contribute to genome instability [9–12]. In 1984, Glickman and Ripley reported the induction of deletions in the lacI gene of Escherichia coli by putative cruciform structures [13]. More recently, studies in model systems (bacteria, yeast, and mammalian cell culture) on trinucleotide repeat sequences, whose expansions in disease-related genes are involved in approximately 30 human hereditary neurological disorders (reviewed in [14]), support the mutagenic role of non-B DNA structures [15–17]. Similarly, DNA sequences from the human c-MYC and BCL-2 loci that are capable of forming non-canonical structures and that colocalize with translocation breakpoints undergo frequent double-strand breaks (DSBs) in mammalian cells [12,18–22]. In support of these results, the same non-B DNA structure-forming sequence from the human c-MYC gene also stimulates genomic instability on chromosomes in transgenic mice [23]. Large (A+T)-rich inverted repeats on chromosome 22q11 and other chromosomes, such as 11q23 and 17q11, were found to cause recurrent translocations both in sperm cells in the general population [24] and in cell culture [25], providing evidence that cruciform structures may cause genomic rearrangements (reviewed in [26–28]). Thus, alternative DNA conformations are believed to contribute to mutations and to the dysregulation of cancer-related genes in translocation-related malignant diseases such as myeloma, leukemia, and lymphoma (reviewed in [11,29]).

Recent advances in the field of genomics have revealed the widespread occurrence of non-B DNA-forming motifs in various genomes, their selective enrichment within specific classes of genes and/or chromosomes, and their asymmetric frequency distributions within transcriptional units. These data are paradoxical given the mutagenic role of repeating sequences and their involvement in human disease. Herein, we describe the structural features and biological functions (and potential mechanisms involved) of the most well-characterized DNA structures, i.e., Z-DNA, cruciforms, triplexes or H-DNA, G-quadruplexes, and looped-out slipped structures. We also provide evidence for novel roles of non-B DNA structure-forming sequences as co-regulators of transcriptional activity and as genomic elements through which positive selective pressures have acted during evolutionary time so as to shape and preserve specific genomic functions.

Non-B DNA structures

The distribution of nucleotides in genomes is not random. Many DNA sequence patterns exist throughout genomes from bacteria to human, such as direct repeats of homo-, di-, or tri-nucleotides, inverted repeats, mirror repeats, etc. Unlike the majority of DNA sequences which form the canonical right-handed B-form [1], repeated sequences have the capacity to also adopt alternative conformations (i.e. non-B DNA structures). To date, nearly a dozen types of non-B DNA structures have been described, including hairpins/cruciforms, Z-DNA, triplexes (H-DNA), tetraplexes, slipped DNA, and sticky DNA.

Hairpins/cruciforms

Hairpin/cruciform structures can form at inverted repeats [30]. One side of an inverted repeat, equidistant to the symmetric center, is complementary to the sequence on the other side, e.g. 5'-GACTGC….GCAGTC-3' (Fig. 1A). The two inverted repeats base pair with one another and form an intrastrand hairpin stem, leaving the sequence at the symmetric center looped out as a single strand. The cruciform structure consists of two hairpin-loop arms and a 4-way junction, which is structurally similar to a Holliday junction recombination intermediate [31]. Formation of hairpin/cruciform structures from double-stranded DNA requires energy that may come from negative supercoiling [32,33]. Under these conditions, two inverted repeats as short as 7 bp are sufficient for the formation of hairpin structures [34].
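The sequence rule described above (one arm is the reverse complement of the other, with an optional unpaired loop at the symmetric center) can be sketched in Python. This is only an illustrative screen for perfect inverted repeats; the function names, the default arm length of 7 bp, and the maximum loop length are assumptions made for the example and are not taken from the cited studies.

```python
# Minimal sketch: locate perfect inverted repeats (potential hairpin/cruciform sites).
# Assumes the input contains only A, C, G and T; names and defaults are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=7, max_loop=10):
    """Return (start, left_arm, loop_length) for each perfect inverted repeat.

    A hit means that the `arm` bases starting at `start` are followed, after
    `loop_length` unpaired bases, by their exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + arm + loop          # start of the candidate right arm
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, left, loop))
    return hits
```

Calling `find_inverted_repeats(sequence, arm=6)` on a sequence containing 5'-GACTGC followed by a short spacer and GCAGTC-3' reports the left arm and the spacer length. Real cruciform prediction would also need to consider supercoiling and imperfect repeats, which this sketch ignores.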

Figure 1
Non-B DNA structures. (A) Cruciform DNA, (B) Z-DNA, (C) H-DNA (triplex DNA), (D) G-quadruplex (tetraplex) DNA, and (E) Slipped DNA.

Z-DNA

Sequences with alternating pyrimidines and purines, such as (CG:CG)n and (CA:TG)n, may wind the double helix into a left-handed zigzag pattern (Z-DNA), as depicted in Fig. 1B. Whereas CG:CG repeats are most likely to form the Z-DNA structure, GT:AC repeats are more abundant in the human genome [35]. Compared to the right-handed B-form, the left-handed Z-DNA contains inverted purines in the syn-conformation while pyrimidines remain in the anti-conformation, with the sugar pucker altered from the C2'- to the C3'-endo position so as to maintain Watson-Crick base-pairing [36]. These alterations cause a change in the sugar-phosphate backbone that changes the organization of the double helix. Therefore, unlike B-form DNA, which possesses one major groove and one minor groove, Z-DNA has only one deep and narrow groove, with 12 bp per helical turn [4,37,38]. The crystal structure of a B- to Z-DNA junction was solved in 2005 and revealed an extruded base pair on each side of the DNA duplex, which is susceptible to DNA modification [39].
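As a rough illustration of the sequence requirement for Z-DNA, the Python sketch below reports stretches in which purines and pyrimidines strictly alternate. It is a crude composition screen, not the thermodynamic approach used by dedicated tools; the function name and the 12-nt minimum length are assumptions chosen for the example, and the input is assumed to contain only A, C, G and T.

```python
# Minimal sketch: find runs of strictly alternating purines (A, G) and
# pyrimidines (C, T). Names and the length threshold are hypothetical.

PURINES = set("AG")


def alternating_runs(seq, min_len=12):
    """Return (start, run) for each alternating purine/pyrimidine run >= min_len."""
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence, or when two neighbouring
        # bases belong to the same class (both purines or both pyrimidines).
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= min_len:
                runs.append((start, seq[start:i]))
            start = i
    return runs
```

Because alternation alone does not distinguish CG:CG from less Z-prone alternating sequences, hits from such a screen would need further scoring before being called Z-DNA-forming.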

Triplex DNA (H-DNA)

Intramolecular triplex DNA structures can form at homopurine:homopyrimidine sequences with mirror symmetry, where a single-stranded region can bind in the major groove of the underlying DNA duplex to form a three-stranded helix [40–42] (Fig. 1C). Triplex DNA can be classified according to the orientation and composition of the third strand, which can form either Hoogsteen or reverse-Hoogsteen hydrogen bonds with the purine-rich strand of the duplex DNA. Hence, the third strand can be either pyrimidine-rich and parallel to the complementary strand (Y*R:Y), or purine-rich and anti-parallel to the complementary strand (R*R:Y). Whereas (R*R:Y) triplexes form under conditions of physiological pH, triplex structures of the (Y*R:Y) composition form most readily under conditions of acidic pH. At physiological pH, triplex structures may be stabilized by negative supercoiling, modification with phosphorothioate groups, or polyvalent cations such as spermine and spermidine (reviewed in [42]).
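The two sequence features named above, a homopurine (or homopyrimidine) tract and mirror symmetry, can be screened for with a few lines of Python. The sketch below is illustrative only: the function name and the 10-nt threshold are assumptions, and it tests whether a whole tract is a perfect mirror repeat, which is stricter than what structure formation may require.

```python
import re

# Minimal sketch: find homopurine or homopyrimidine tracts on the given strand
# and flag those that read the same in both directions (mirror symmetry).
# Names and thresholds are hypothetical.


def ry_tracts(seq, min_len=10):
    """Return (start, tract, is_mirror) for each all-purine or all-pyrimidine tract."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    hits = []
    for m in pattern.finditer(seq.upper()):
        tract = m.group()
        # A mirror repeat is identical to its own reversal on the same strand.
        hits.append((m.start(), tract, tract == tract[::-1]))
    return hits
```

A tract such as AGGAAGGAAGGA is flagged as mirror-symmetric, whereas CTCTCTTTCC is reported as a pyrimidine tract without perfect mirror symmetry.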

Tetraplex DNA (G-quadruplex DNA)

This four-stranded structure consists of a square co-planar array of four guanines formed by a stretch of guanine-rich DNA [43] (Fig. 1D). Each guanine acts as a donor and acceptor of Hoogsteen hydrogen bonds in a cyclic arrangement involving N-1, N-2, O-6, and N-7. In vitro, these structures are stabilized by K+ or Na+ ions at physiological pH and temperature. Quadruplex structures may be formed by one, two, or four interacting strands and exist in a variety of conformations depending on the polarity of the strands (parallel or anti-parallel), glycosidic torsion angles, groove size, base sequence of the connecting loops, and the participation of cations.

Slipped strand DNA

When direct repeats are base paired with the complementary strand in a misaligned fashion, a slipped structure forms, particularly following unwinding, yielding hairpins or looped-out bases [44] (Fig. 1E). When direct repeats involve several units, like the telomeric or triplet repeat sequences (CGG, CTG, and CAG), the looped-out bases may form duplexes stabilized by interstrand stacking interactions [45].

Non-B conformations in vivo

The formation of DNA secondary structures in vitro has been demonstrated by several methods, including polyacrylamide gel electrophoresis, nuclease cleavage, chemical probing, circular dichroism, NMR, ultraviolet absorption, electron microscopy, atomic force microscopy, and crystallography (reviewed in [46]).

In vivo, non-B DNA conformations are believed to form, at least transiently, during DNA metabolic processes such as replication, transcription, repair, or recombination [9,11]. The expansion of slipped DNA-forming trinucleotide repeats observed in neurological diseases [14] correlates with the stability of secondary structures in vitro. For example, interruptions in the trinucleotide repeats of the SCA1 (CAG:CTG) and FRAXA (CCG:CGG) genes exert a protective role against instability [47–49]. These interruptions reduce the propensity for DNA secondary structure formation in vitro, and the correlation between rates of expansion in individuals and slipped DNA formation has been taken to support a role for slipped DNA in genetic instability. Nevertheless, the transient existence of these and other non-B DNA structures has made their detection difficult in genomic DNA [50], particularly in cases such as inverted repeats, in which multiple conformations are possible depending upon the environmental conditions [51].

To date, fluorescence immunostaining with antibodies directed against specific DNA structures, rather than against the sequences per se, is considered the most direct method for detecting non-B DNA structures in vivo. Rabbit antibodies specific for the Z-DNA structure formed by brominated poly[d(GC)]:poly[d(GC)] were generated in 1981 [52], and used to bind the interband regions of Drosophila polytene chromosomes [53] and to detect Z-DNA formed by GT repeats in negatively supercoiled plasmids in vitro [54]. Currently, antibodies against Z-DNA are commercially available (Abcam, GeneTex, etc.). One caveat of this methodology is that the estimated level of non-B DNA structure formation in vivo may not reflect physiological equilibrium conditions, since binding of Z-DNA antibodies may shift the B- to Z-DNA equilibrium towards the non-B conformation [55].

Several mouse monoclonal antibodies were developed to detect triplex DNA in chromosomes [56,57], and were demonstrated to bind triplex DNA specifically [58]. H-DNA structures in human interphase nuclei were also detected by fluorescently labeled single-stranded DNA oligonucleotides (complementary to the single-stranded region of the H-DNA structure) in vivo [59]. A quadruplex monoclonal antibody was first developed in 1998 [60,61] in mice against the quadruplexes formed by synthetic d(CGCG4GCG) and the telomere-derived d(TG4) and d(T2G4)4 sequences in vitro. Later, a high affinity (Kd ~ 4 nM) antibody against tetraplex DNA structures was developed and used in in vivo studies of telomeric tetraplex structures in the macronucleus of Stylonychia lemnae [62]. Finally, a monoclonal antibody was developed to recognize cruciform and T-shaped DNA structures [63].

Additional evidence for the existence of non-B DNA structures in vivo has been generated using methods such as chemical probing and DNA cross-linking of genomic DNA sequences [64,65]. However, most of these methods require DNA extraction (before and/or after treatment) for analyses, as it has proven difficult to directly detect these structures in living cells. In addition to technical challenges associated with the detection of non-B DNA, these structures are certainly transient in nature in cells, making their detection even more challenging.

Genome-wide analyses and evolutionary relationships

Abundance and distribution of non-B DNA-forming sequences

Since the abundance and distribution of non-B DNA-forming sequences may provide insights into their functions in DNA metabolism, analyses were carried out to compare the abundance of these structures in the genomes of various organisms. Overall, non-B DNA-forming sequences are more abundant in eukaryotic genomes than in prokaryotes [66].

Hairpin/cruciform (Inverted repeats)

Analysis of human sequences containing 157 genes for a total of 1 Mb of genomic sequence (including exons, introns, and 5’- or 3’-UTRs) revealed many dA:dT sequences, which may form cruciforms [51]. In this sample set, the overall dA:dT abundance was ~49.7%, and the frequency of cruciform-forming sequences (≥8 bp (A+T)-rich inverted repeats) in the human genome was ~1/41,700 bp. Additional analyses of genomic sequences in E. coli and yeast revealed that cruciform-forming sequences were more abundant in yeast (1/19,700 bp) and human than in E. coli [51]. The distributions of hairpin/cruciform structure-forming sequences often overlap with chromosomal regions prone to gross rearrangements in both somatic and germ cells [67–69].

Z-DNA

Although the human genome is less (G+C)-rich than prokaryotic genomes, Z-DNA-forming sequences are in fact very abundant. The GT:AC repeats are estimated to account for more than 0.25% of the entire human genome [35]. A computer-based thermodynamic search strategy (Z-Hunt-II) used by the Ho group to analyze the complete human genome showed that Z-DNA-forming sequences occur approximately once every 3,000 bp [70]. Furthermore, Z-DNA-forming regions were found to be distinctly located near the 5’ ends of genes in the genome, and the proximity between these regions and the transcription start sites became more pronounced during the divergence from prokaryotes to eukaryotes [70]. Therefore, the location bias of these GT:AC repeats is consistent with Z-DNA formation and stabilization by the transient surges in negative supercoiling associated with transcription. As early as 1983, Nordheim and Rich suggested that three 8-bp Z-DNA-forming sequences in the simian virus 40 enhancer region may function in transcriptional activation [71]. Studies in yeast showed that Z-DNA structures can be induced or stabilized by Z-DNA-binding proteins and function in gene regulation and chromatin remodeling [72,73]. The occurrence of Z-DNA-forming sequences at chromosomal breakpoints in human tumors suggests that Z-DNA plays a role in causing genomic instability, perhaps by inducing double-strand breaks and large deletions [18].

H-DNA-forming sequences (R:Y tracts with mirror symmetry)

H-DNA-forming sequences occur at higher levels than expected in mammalian genomes. Using the same 1 Mb sequence sample set from the human genome as in the study of hairpin/cruciform structure-forming sequences, Schroth and Ho found that the occurrence of H-DNA sequences (≥10 bp 100% homopurine:homopyrimidines but <80% (A+T)-rich) in the human genome was ~1/49,400 bp [51]. The distribution of long (≥100 bp) homopurine:homopyrimidine sequences in human genes was confined to introns of genes coding for products localized to the cell membrane or involved in phosphorylation, signal transduction, and development and morphogenesis [74]. H-DNA structure-forming sequences are also found flanking proto-oncogenes, such as c-MYC, and may cause genomic instability, such as deletions and other rearrangements [12,23].

Tetraplex (G-quadruplex DNA)

Two independent genome-wide surveys for potential intramolecular G-quadruplex-forming sequences identified ~37,000 sites in the human genome, approximately 1 tetraplex every 10 kb [75,76], with ~60% of them located outside of coding regions [75]. Tetraplex-forming guanine-rich sequences are found in immunoglobulin switch regions [8], telomeric DNA [77,78], poly (dG) runs [79], and promoter regions [80]. An analysis of promoter regions of 19,268 validated human genes in ENSEMBL (NCBI 34) showed that ~42.7% of human gene promoters contain at least one quadruplex-forming sequence [80]. Du et al (2007, 2008) analyzed 13,276 human Reference Sequence (RefSeq) genes and 2,892 chicken RefSeq genes for potential G-quadruplex-forming sequences and identified one or more G4 DNA motifs in >60% of the genes studied [81,82]. The distribution of the more stable form of G-tetraplex, which contains single-nucleotide loops, is more abundant near transcription start sites, suggesting that this stable secondary structure may have been under positive selection to influence the transcription of particular groups of genes [80]. In addition, a high proportion of genes also contain G4 motifs in 3’-UTRs, implying a role in facilitating transcriptional termination, perhaps by weakening the association of an RNA polymerase complex with template DNA [83]. Therefore, the distribution of G-rich sequences in genomes supports their involvement in the regulation of transcription, in addition to other roles, such as homologous recombination [8,84] and telomere maintenance [78].

Slipped DNA (S-DNA)

Repetitive DNA sequences account for nearly 30% of the human genome, and are interspersed throughout chromosomes [85,86]. These repeats are referred to as microsatellites (1–7 nt, [48]) or minisatellites (10–100 nt, [87]). Various human diseases have been demonstrated to be associated with either expansion or contraction of microsatellites and minisatellites [48,87]. Although microsatellites are abundant in the human genome, their representation varies greatly depending on sequence composition. For example, whereas >16,000 tracts comprised of A or T mononucleotide runs were present in the hg16 assembly at length ≥30 nt, only seven analogous tracts of Gs and Cs were found [88]. Closer examination of the physical properties of tri- and tetra-nucleotide repeats revealed an inverse relationship between their number in vertebrate genomes and the propensity to fold into the hairpin or quadruplex structures [89]. These data suggest that sequences with the propensity to form stable secondary structures have not been maintained as efficiently as their less stable counterparts during evolutionary time. Nevertheless, a comparison of the distribution of these tri- and tetra-nucleotide sequences in protein coding vs. non-coding regions revealed that the number of certain “strong secondary structure-forming” sequences, such as AGC, CCG, CCCG, AGCG, CCGG and ACCG was higher than expected in coding regions [89], supporting the idea that selective pressures acted so as to preserve the amino acid coding ability of these inherently unstable sequences.
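For readers who want to reproduce this kind of repeat census on their own sequences, the sketch below shows one simple way to locate perfect tandem repeats of a fixed unit length with a regular-expression backreference. It is an assumption-laden illustration rather than the method of the cited surveys: the function name and defaults (4-nt units, at least 8 copies) are hypothetical, only perfect repeats on the given strand are found, and the input is assumed to contain only A, C, G and T.

```python
import re

# Minimal sketch: find perfect tandem repeats of a fixed unit length.
# Names and thresholds are hypothetical.


def find_tandem_repeats(seq, unit_len=4, min_units=8):
    """Return (start, unit, copies) for each perfect tandem repeat."""
    # ([ACGT]{k}) captures one unit; \1{n-1,} requires at least n-1 more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        # Skip units that are themselves repeats of a shorter unit
        # (e.g. AAAA or AGAG), which belong to a shorter repeat class.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((m.start(), unit, len(m.group()) // unit_len))
    return hits
```

Changing `unit_len` to 3 gives a screen for trinucleotide repeats such as CAG or CGG; interrupted repeats would require an approximate-matching approach that is beyond this sketch.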

It is important to point out that not all the repeated sequences analyzed to date have the same capacity to form non-B DNA structures. The search criteria used in different reports were set to answer different questions. For example, the Ho group cautioned that although (G+C)-rich sequences are abundant in E. coli, not all of them meet the requirements for forming stable secondary structures. Rather, these (G+C)-rich repeats in bacteria are mostly recognized as transcription termination sequences when transcribed into RNA [70]. Also, the most abundant tetraplex-forming G-rich sequences in the human genome analyzed by Huppert and Balasubramanian (2005) are located on the coding strand and therefore may fold into alternative structures in the RNA transcripts rather than in genomic DNA [76]. Therefore, all repeat-based analyses should be interpreted with the realization that some of these ‘unusual’ sequences may not form ‘unusual’ DNA structures.

Gene categories

The completion of the Human Genome Project (HGP) [35,90] has made it possible to address the question of the distribution of non-B DNA-forming sequences in relation to transcribed DNA. More than 99% of euchromatic DNA, which contains genes and putative genes, is currently assembled. The remaining 0.5–1% of gapped DNA (~24 Mb) mostly contains segmental duplications, i.e. nearly identical sequences present at different chromosomal locations [91], for which clones are available to enable coverage. Hence, the data summarized below are expected to capture most of the global genomic organization of genes in relation to non-B DNA-forming sequences. One notable exception is represented by the 18S- and 28S-ribosomal RNA gene arrays in acrocentric chromosomes, which, like centromeric, pericentromeric and subtelomeric heterochromatin, are not targeted for sequencing. Indeed, few clones are available for such recalcitrant regions. Heterochromatin, which amounts to ~5–7% (~200 Mb) [91], is almost entirely populated by tandem repeats and shows limited transcriptional activity.

The first genome-wide search for inverted repeats (IRs) in the human genome revealed the prevalence of large IRs (96 with arm size ≥8 kb and ≥95% sequence identity) on the X (~25%) and Y (~15%) chromosomes [69]. Of the 49 IRs whose arms shared >97–99% sequence identity, eleven from chromosome X, six from chromosome Y, and one from chromosome 11 contained genes/gene clusters predominantly expressed in the testis (Table 1). Indeed, all annotated genes present on the IRs from chromosome Y display testis-restricted expression and have a function in sperm production and maturation [92].

Table 1
Gene categories and DNA repeats

A subsequent search for the distribution of long, i.e. ≥100 and ≥250 nt, R:Y tracts within human genes indicated the presence of such sequences in the introns of 1,951 and 228 non-redundant transcriptional units, respectively [74]. Strong enrichment (P-values as low as 10⁻¹⁵) was observed for sequences in genes encoding proteins with ion channel activity, cell adhesion, and cell-cell communication functions, particularly in subcellular structures, such as the post-synaptic density, critical to the transmission of nerve impulses (Table 1).

Herein, we report the analysis of the distribution of tetranucleotide repeat (TR) sequences ≥8 units [89] in human genes. Of the 29,708 TR tracts found genome-wide [89], 8,943 (~1/3) were located in 4,182 non-redundant RefSeq genes (~1/5 of all annotated genes), or within 1 kb of their transcriptional boundaries, with an average of 2 TR tracts per gene. Also, 114 genes were found to contain the repeats in the promoter region (within 1 kb of the predominant transcription start site), two in the 5’-UTR, 4,485 in introns, 23 in the 3’-UTR, and 100 within 1 kb downstream of the transcriptional unit. Thus, ~95% of gene-associated TRs are located within introns. The group of TR-containing genes was found to be most enriched for genes involved in cell adhesion, localization to the plasma membrane, ion channel function, and receptors involved in signal transduction pathways, cell communication, and transmission of the nerve impulse (Table 1 and Supplementary Information). In addition, genes associated with glutamate receptor activity were progressively enriched as a function of TR length (Supplementary Fig. 1 and Supplementary Information).
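The ~95% intronic share follows directly from the location counts quoted above. The short Python check below makes the arithmetic explicit; it assumes, for the purpose of illustration only, that the five location counts are mutually exclusive categories that can simply be summed, which the text does not state explicitly.

```python
# Minimal sketch: recompute the intronic fraction from the counts given in the text.
# Assumption: the five location categories do not overlap.

location_counts = {
    "promoter (within 1 kb of TSS)": 114,
    "5'-UTR": 2,
    "intron": 4485,
    "3'-UTR": 23,
    "within 1 kb downstream": 100,
}

total = sum(location_counts.values())
intron_fraction = location_counts["intron"] / total

print(f"total = {total}, intronic fraction = {intron_fraction:.1%}")
```

Under that assumption the counts sum to 4,724 and the intronic fraction is 4,485/4,724, or about 94.9%, in line with the ~95% stated in the text.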

The enrichment analyses for the two gene datasets containing either TR (≥8 units, 4,182 genes) sequences or long R:Y tracts (≥100 nt, 1,951 genes) were extended to additional genomic functions [93]. Both datasets were highly enriched in genes known to undergo alternative splicing and in genes prone to DNA breakage leading to chromosomal translocations (Table 1). These data enable the following conclusions: 1) the categories of genes enriched in long R:Y tracts are also enriched in TR sequences; 2) the gene functions involved are associated, as a whole, with communication between cells; 3) long R:Y tracts (which also include most TRs ≥18 units; Supplementary Information and Supplementary Fig. 1) are a distinctive property of genes involved in synaptic glutamatergic activity; 4) intragenic R:Y and TR tracts are characteristic of genes that have acquired a complex organization through alternative splicing and thus may encode proteins with multiple functions; and 5) the genes involved are generally prone to breakage. An important aspect of these studies is the association between R:Y tract-containing genes and genes that confer susceptibility to complex mental disorders [74]. This association has recently been strengthened by genome-wide case-control analyses [94] in subjects afflicted with schizophrenia [74,95]. Hence, triplex-forming sequences are attributes of genes involved in integrative networking functions in the brain.

Analysis of the distribution of micro- and minisatellites ranging from dinucleotides to 11-mer repeats in human cDNAs [89] identified 2,626 unique RefSeq genes. The set displayed strong enrichment for genes associated with transcription factors, the regulation of transcription and specific signaling pathways, including genes from the MAPK and WNT pathways (Table 1). Similar searches at the proteomic level also showed preferential enrichment for transcription factors, chromatin binding proteins, DNA and RNA binding proteins, and proteins involved in translation [96,97]. The current rationale for these observations is a model whereby homo-amino acid runs constitute disordered protein regions that become ordered upon nucleic acid and/or cognate protein binding. The transition from a disordered to an ordered state would then greatly enhance the stability of the ensuing complexes and therefore elicit specific biological functions (reviewed in [29]).

As mentioned above, G-quadruplex-forming repeats predominate in gene regions flanking the transcription start sites but are also abundant in 3’-UTRs. The classes of genes most enriched in such repeats belong to the family of small GTPases, such as Rho, which play critical roles in signal transduction [98] and in the regulation of stress fibers, including the actin cytoskeleton [99] (Table 1).

In summary, the association of repetitive DNA with gene function follows specific patterns, i.e. genes involved in male reproduction for large IRs, cell-cell communication for long R:Y tracts, transcription and its regulation for coding microsatellites and small GTPase signaling/regulation for G-quadruplexes. Therefore, it is likely that selective pressures have acted so as to maintain specific DNA sequences in coding regions and to enable the acquisition and maintenance of novel gene functions during the course of evolution.

Patterns of global gene expression

The first analyses of the genome-wide distributions of quadruplex-forming motifs (G3+N1–7G3+N1–7G3+N1–7G3+, i.e. four runs of three or more guanines separated by loops of 1–7 nucleotides) revealed their high prevalence in warm-blooded species [100] and an overrepresentation in the promoter region of genes [75,80,82,101]. Indeed, a recent investigation on a dataset of 13,276 non-redundant human RefSeq genes established the presence of one or more G4 motifs in the 500-nt region flanking the transcription start site (TSS) of 8,214 (~62%) such genes [81]. When the expression value of the RefSeq genes was analyzed in 79 human tissues/cell types, a significant association was found between G4 motifs downstream, but not upstream, of the TSS and an increase in gene expression. Moreover, a direct relationship was evident between the number of G4 motifs (0–4) and the levels of gene expression. Further analyses indicated that the average levels of gene expression for both the G4-negative and G4-positive genes varied according to tissue/cell type. Nevertheless, in each case the G4-positive gene set displayed higher transcriptional values than the G4-negative set (Fig. 2A). Hence, a direct association exists between G4 motifs and gene transcription, supporting a genome-wide role for quadruplex structures in promoting transcriptional activity and/or stabilizing the ensuing pre-mRNA transcripts.
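The quadruplex-forming motif used in these surveys, four runs of three or more guanines separated by loops of 1–7 nucleotides, maps naturally onto a regular expression. The Python sketch below is one possible rendering of that pattern and is offered as an illustration only: the function name is hypothetical, only the strand supplied is scanned (a C-rich strand would need its reverse complement scanned as well), overlapping motifs are not enumerated, and the input is assumed to contain only A, C, G and T.

```python
import re

# Minimal sketch: regular expression for the motif G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# Names are hypothetical; only the given strand is searched.

G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq):
    """Return (start, motif) for each non-overlapping G4 motif on the given strand."""
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(seq.upper())]
```

Applied to a telomere-like sequence containing four GGG runs separated by TTA loops, the function returns a single motif spanning all four runs; removing one guanine from any run leaves no match.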

Figure 2
(A) Expression profiles of quadruplex-containing genes. Comparison of the gene expression levels between genes containing quadruplex-forming sequences (PG4MD500-positive, black squares) and genes without PG4MD500 (open squares) in each human tissue/cell ...

Quadruplex nucleic acid structures are likely to regulate transcriptional activity by several, and perhaps opposing, mechanisms. A recent search for G4 motifs in 32,985 annotated 5’-UTRs and 32,818 3’-UTRs from a compilation of 21,658 human genes yielded the following trend in relative frequencies per kb of DNA: 5’-UTR>3’-UTR>transcriptome>whole-genome, with values ranging from 0.382 to 0.057 [83]. Significantly, not only were G4 motifs overrepresented in the 3’-UTRs in addition to the 5’-UTRs, but for a high proportion of genes (97/561 or ~17%) with G4 motifs in 3’-UTRs, the genomic distance from the end of transcription to the next gene was also shorter (within 1 kb) than the genome average, suggesting a role for G-quadruplex structures in transcription termination. Finally, a large body of evidence (reviewed in [102]) supports the conclusion that quadruplex DNA may form in the promoter region of oncogenes and elicit functional roles, such as the transcriptional inhibitory activity observed in c-MYC [103,104].

Herein, we contrast the global gene expression profile of genes that contain quadruplex-forming sequences with those that harbor triplex-forming sequences, i.e. the set of 228 genes (set 1) containing the longest (≥250 nt) R:Y tracts (Table 1) and the set of 190 genes (set 2) containing ≥18 TR units (Table 1, Supplementary Information and Supplementary Fig. 1). Analysis of the gene expression data in 70 tissues/cell lines (cancer tissues and cancer cell lines were not included) showed that for the 16,146 probe-set comprising the control genes (i.e. sets 1 and 2 excluded) the transcriptional values followed a bimodal distribution composed of two overlapping Gaussian curves (Supplementary Fig. 2), the first accounting for 75% of the data and showing high levels of gene expression (HGE) and the second comprising the remaining 25% of the data and displaying low levels of gene expression (LGE) (Fig. 2B). For comparative purposes, the HGE mean value was normalized to 1. Accordingly, the LGE mean value was 0.13 when the respective natural logarithms were transformed into raw gene expression data, a 9-fold reduction. Set 1 also displayed a bimodal distribution. However, whereas the LGE mean value did not differ from that of the control probe-set, the HGE distribution was shifted to significantly lower values (normalized mean = 0.73, P<0.001, an ~25% reduction relative to the control data-set mean). Similarly, set 2 displayed a significant reduction in gene expression for both the HGE and the LGE distributions (normalized mean values 0.75 and 0.11, respectively; P<0.001) (Fig. 2B). Hence, genes containing long R:Y tracts with the potential to form triplex DNA structures are generally transcribed at lower levels than genes that do not contain such elements.
A previous analysis [74] of the tissue-specific patterns of gene expression for set 1 after z-scoring (which normalizes the average expression of any given gene across all tissues) indicated that the highest transcriptional activity occurred in the brain. Hence, taken together, these data suggest brain-specific roles for long R:Y tracts in transcriptional regulation. Finally, these analyses reveal the contrasting transcriptional profiles of genes harboring quadruplex-forming repeats (increased transcription) and those containing triplex-forming sequences (decreased transcription).
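The description of the control expression values as two overlapping Gaussian curves corresponds to a two-component mixture model. The sketch below shows how such a decomposition could be obtained by expectation-maximization on log-transformed expression values. It is an illustration under stated assumptions, not the procedure used for Fig. 2B: the synthetic data, the initialization, and the names fit_two_gaussians and synthetic_log_expression are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns the means, s.d. values, and the weight of the higher
    ("HGE") component. Initialization from the data extremes is an
    arbitrary choice made for this sketch.
    """
    n = len(values)
    lo_mu, hi_mu = min(values), max(values)
    lo_sd = hi_sd = (hi_mu - lo_mu) / 4.0 or 1.0
    hi_w = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each value
        resp = []
        for x in values:
            p_hi = hi_w * normal_pdf(x, hi_mu, hi_sd)
            p_lo = (1.0 - hi_w) * normal_pdf(x, lo_mu, lo_sd)
            total = p_hi + p_lo
            resp.append(p_hi / total if total > 0 else 0.5)
        # M-step: update weights, means, and standard deviations
        n_hi = max(sum(resp), 1e-12)
        n_lo = max(n - n_hi, 1e-12)
        hi_mu = sum(r * x for r, x in zip(resp, values)) / n_hi
        lo_mu = sum((1.0 - r) * x for r, x in zip(resp, values)) / n_lo
        hi_sd = max(math.sqrt(sum(r * (x - hi_mu) ** 2
                                  for r, x in zip(resp, values)) / n_hi), 1e-6)
        lo_sd = max(math.sqrt(sum((1.0 - r) * (x - lo_mu) ** 2
                                  for r, x in zip(resp, values)) / n_lo), 1e-6)
        hi_w = n_hi / n
    return {"hi_mean": hi_mu, "lo_mean": lo_mu,
            "hi_sd": hi_sd, "lo_sd": lo_sd, "hi_weight": hi_w}


def synthetic_log_expression(seed=0):
    """Hypothetical data: 75% high-expression, 25% low-expression values."""
    rng = random.Random(seed)
    high = [rng.gauss(6.0, 0.7) for _ in range(1500)]
    low = [rng.gauss(2.0, 0.7) for _ in range(500)]
    return high + low


fit = fit_two_gaussians(synthetic_log_expression())
```

With real data, the fitted component means for a gene set can then be compared with those of the control probe-set after normalizing the control HGE mean to 1.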

Cruciforms and the genomic architecture of the human Y-chromosome

Sex-specific genes are clustered in the arms of IRs on the X and Y chromosomes [92]. The Y-chromosome comprises two external pseudo-autosomal (PAR1 and PAR2) regions (≤1.5 Mb) homologous to the X-chromosome and essential for chromosome segregation at meiosis, and a central male-specific segment (MSY) functionally divided into euchromatin (shorter p-arm) and heterochromatin (distal q-arm).

The euchromatin region is itself a complex mosaic of modular DNA sequences characterized by eight large (up to 1.46 Mb in length) inverted repeats, commonly referred to as palindromes 1–8, shorter inverted and direct repeats, all of which contain gene families with expression patterns specific to the testis and performing essential functions in the production and maturation of sperm [92]. Two other regions, X-transposed and X-degenerate, harbor paralogous genes with copies on the X-chromosome. Modular tandem arrays also compose the entire heterochromatic region, whose length variation caused by polymorphic tandem array repeat number confers large-scale differences to the size of Y-chromosomes in the general population (Fig. 3A). Hence, inverted and direct repeats compose most of the human Y-chromosome, thus conferring higher-order structural architectures to the primary genomic sequence.

Figure 3
(A) Y chromosome genealogical tree (left) and identified structural polymorphisms (right). Chromosomes were assigned to one of 47 branches by typing for the stable, biallelic polymorphisms indicated. Red arrows indicate major branches confined to Africa. ...

The MSY region does not have a counterpart in other chromosomes and thus it is excluded from sexual recombination. This unique behavior has prompted speculation [105] that Y-chromosome extinction is inevitable given that gene decay, consequent to naturally occurring mutations, would be irreversible. Indeed, the Y-chromosome has degenerated substantially both in size and gene content in comparison with the X chromosome. However, the ampliconic gene families nested within the palindromic arms and key to spermatogenesis have sustained much lower-than-expected mutation rates during evolutionary time [106]. For example, not only do the intrapalindromic (arm-to-arm) sequences share on average >99% sequence identity, but gene pairs located at symmetrical positions within palindromic arms are also generally identical or nearly so [107]. In contrast, substantial sequence divergence exists between gene pairs belonging to the same gene families but located at different arm positions [107]. Thus, high rates of gene conversion are believed to have occurred among testis-specific genes in the human Y-chromosome, which have effectively counteracted the threat of gene decay imposed by the absence of meiotic recombination [106,108]. In fact, comparative analyses between the human and chimpanzee Y-chromosomes strongly support the conclusion that the ampliconic gene families in palindromes have been under strong positive selective pressure, most likely because of their key role in spermatogenesis (Fig. 3A) [92].
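Arm-to-arm identity of a palindrome is measured by comparing one arm with the reverse complement of the other. The short sketch below illustrates the idea for the simplest case of two equal-length arms compared without gaps. Real palindrome arms require a proper alignment that handles insertions and deletions, so this is a simplification, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_identity(left_arm, right_arm):
    """Percent identity between the left arm and the reverse
    complement of the right arm (ungapped, equal-length arms only).
    """
    left = left_arm.upper()
    right_rc = reverse_complement(right_arm).upper()
    if not left or len(left) != len(right_rc):
        raise ValueError("arms must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(left, right_rc))
    return 100.0 * matches / len(left)


# Toy example: a perfect inverted repeat, then one substitution in the right arm
left = "ACGGTACCTA"
perfect_right = reverse_complement(left)
mutated_right = "C" + perfect_right[1:] if perfect_right[0] != "C" else "A" + perfect_right[1:]
perfect = arm_identity(left, perfect_right)
mutated = arm_identity(left, mutated_right)
```

In this toy case the perfect arms score 100% and a single substitution in a 10-nt arm lowers the score to 90%; an identity above 99% over arms hundreds of kilobases long therefore implies very few arm-to-arm differences.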

These observations raise a number of questions. Did the inverted repeat architecture of palindromes play a critical role in shaping and preserving Y-chromosome function? How did gene conversion take place between the arms of palindromes? Several studies have been performed to address these issues. First, analyses from representative ethnic groups revealed that the IR3/IR3 region of the Y-chromosome was inverted in 16/47 cases [92]. This corresponds to a frequency of ~9.2×10⁻⁴ inversion events per father-to-son transmission, a frequency that is at least 10,000 times higher than that of single-nucleotide changes. Second, recent detailed sequence analyses of microinversions that distinguish the human and chimpanzee genomes showed that in all cases inverted repeats were present at breakpoints [109]. Therefore, whereas inverted repeats may suppress random nucleotide changes arising from within their repeating arms [107], they nevertheless represent a structural unit capable of changing genomic orientation over time. We and others [110] have proposed that large inverted repeats may promote strand exchange and form stem-loop structures, which may account for these features [109]. Accordingly (Fig. 3B), the two arms of an inverted repeat may interact and engage in a strand-exchange reaction leading to the formation of intra-strand Watson-Crick hydrogen bonded base pairs (Fig. 3B, Structure I). This gives rise to a stem-loop structure characterized by two Holliday-like junctions, one at the apex between the stem and the looped-out intervening sequence, the other at the base between the stem and the sequences flanking the inverted repeats (Fig. 3B, Structure II). Resolution of the Holliday-like junctions would yield two types of events. First, in 50% of cases the intervening sequence will invert, assuming equal rates of cleavage at the intersecting vs. non-intersecting strands (Fig. 3B, Structure III).
Second, upon inversion the DNA complementary strands of the inverted repeats will contain the nucleotides that were previously located on the same DNA strand, effectively providing a means for the correction of mispairs, through mismatch or other repair pathways (Fig. 3B, Structure IV). These models (Fig. 3B and [110]) offer a rationale for the observations that: a) inverted repeats mediate genomic inversions [109]; b) high rates of “gene conversion” events take place between the arms of palindromes [106]; and c) genes of the same family show a pair-wise pattern of sequence identity based upon their location at similar palindromic arm position [107]. In addition, these models provide a rationale for the formation of large stem-loop structures, including cruciforms [24–28,111], for which the physiologic levels of negative supercoiling appear insufficient [112]. Finally, because strand exchange may initiate and terminate anywhere along the inverted repeat sequences, their total lengths do not impose a size constraint on stem-loop structures, which may vary in length. This contrasts with the “classic” cruciform structure (Fig. 1A), which nucleates from the apical loop.

In summary, these composite data provide empirical evidence in support of the notion that cruciforms have played a pivotal role during evolutionary time by providing a genomic structure upon which selection acted so as to preserve, and perhaps shape, the sex-specific functions of the human Y-chromosome.

Mechanisms of DNA structure-induced genomic instability

Studies using model systems suggest that instability caused by trinucleotide repeats and other non-B DNA-forming sequences may occur via aberrant DNA replication events (reviewed in [16,113,114]), as well as replication-independent mechanisms in non-proliferating tissues (reviewed in [115]). We discuss results to support both replication-dependent and replication-independent mechanisms of DNA structure-induced genetic instability below.

Replication

Human fragile sites often consist of non-B DNA-forming tandem repeats [22]. Studies of model sequences have provided links between DNA replication and fragile site instability (reviewed in [114,116]). For example, the mutation rate of hairpin-forming CAG repeats increased when the DNA polymerase zeta subunit rev1 was mutated in Saccharomyces cerevisiae [117], suggesting that the transient formation of single-strand DNA during replication and the ensuing slipped DNA structures are mutagenic. Indeed, replication slippage at repetitive sequences (e.g. CTG:CAG, GAA:TTC, CGG:CCG, and GAC:GTC) has been implicated in mutations, deletions, or expansions of repeating units, causing genetic instability related to hereditary neurological diseases (reviewed in [15]).
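Because slippage-prone tracts are defined by uninterrupted copies of a short unit, they can be located in a sequence by a simple pattern search. The sketch below is a minimal illustration: the threshold of five units is an arbitrary assumption, and the function name find_repeat_tracts is hypothetical.

```python
import re


def find_repeat_tracts(seq, unit, min_units=5):
    """Return (start, n_units) for each uninterrupted run of `unit`
    with at least `min_units` copies on the given strand.

    Only the literal unit is matched: a (CTG)n tract on one strand
    reads as (CAG)n on the complementary strand and must be searched
    for separately.
    """
    unit = unit.upper()
    pattern = re.compile("(?:" + re.escape(unit) + "){" + str(min_units) + ",}")
    return [
        (m.start(), len(m.group(0)) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


# Toy example: a (CTG)12 tract followed by a short (CAG)3 tract
example = "AAT" + "CTG" * 12 + "TTA" + "CAG" * 3
ctg_tracts = find_repeat_tracts(example, "CTG")
cag_tracts = find_repeat_tracts(example, "CAG", min_units=3)
```

Running the same search for each unit of interest (e.g. CTG, CAG, GAA, TTC, CGG, CCG) gives the tract positions and lengths on which length-dependent instability can then be assessed.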

Replication stalling

Direct evidence for a link between replication and non-B DNA structures was provided by the ability of non-B DNA structure-forming sequences to slow replication forks. Using 2D gel electrophoretic analyses and electron microscopy, stalling of replication intermediates by trinucleotide repeats, inverted repeats of Alu elements [118], and an (A+T)-rich fragile site (FLEX1) from the human FRA16D gene [119] was detected when these elements were cloned into bacterial, yeast, and human cells. Replication attenuation was dependent on the length and/or sequence of these repeats and correlated with their capacity to form DNA secondary structures. A stalled replication fork will give rise to longer exposure of ssDNA, and may cause replication fork collapse and DSBs, which may be processed in a mutagenic fashion. DNA triplex structures can also block replication forks and cause DSBs [12,42].

Orientation of repeat sequences

Due to the differences between leading and lagging strand DNA synthesis during replication, the orientation of repeat sequences greatly influences their stability in model systems such as bacteria, yeast, and cultured mammalian cells [120–123]. Most non-B DNA structure-forming trinucleotide repeats are more unstable when they serve as lagging strand templates. The instability of GAA repeats in the FRDA gene responsible for Friedreich ataxia is dependent on the orientation of DNA replication. In yeast, for example, GAA repeats display nearly 100-fold higher instability on the lagging strand than on the leading strand [124]. Similarly, CTG repeats show higher levels of DNA instability when used as a template for lagging strand synthesis (relative to the ColE1 replication origin) in a recA strain of E. coli, upon induction of DSBs [125]. A long (CTG)130 repeat from a myotonic dystrophy patient was unstable on the lagging-strand template but was stable on the leading strand template in yeast [123]. Also, the (CGG)160 repeat from the 5'-UTR region of the FMR1 gene contracts when placed as the lagging strand template in the yeast chromosome, but yields few contractions when the repeat is located in the leading strand template [120]. The strand preference of trinucleotide repeat instability indicates that the ability to form secondary structures differs for the two complementary sequences. For example, the CTG repeats adopt a more stable hairpin structure than CAG repeats [126,127]. Hence, when CAG repeats serve as the lagging strand template, the newly synthesized complementary CTG repeats would be prone to form non-B structures that may cause repeated synthesis, resulting in expansion of the repeat [15,17]. At the same time, if the leading strand template with CTG repeats forms secondary structures, it may be bypassed and give rise to contractions within the repeat.
Whereas contractions of trinucleotide repeats are seen in many yeast and bacterial models, expansions are prevalent in human diseases [14,15,128]. The reasons for this discrepancy remain to be clarified; however, trans-acting factors may be involved. For example, the human MSH2–MSH3 complex can bind CAG or CTG repeats [129], and knockdown of the proteins in this complex has been shown to reduce trinucleotide repeat instability [130,131]. Thus, it is possible that the MSH2–MSH3 complex might stabilize the repeats rather than processing the “mismatched” nucleotides (discussed below). Due to its strand discrimination ability, MSH2–MSH3 might then stabilize the structure formed on trinucleotide repeat tracts on the newly synthesized strand preferentially, leading to expansion events.

Replication proteins

The ability of non-B DNA structure-forming sequences to stall replication forks can be counteracted by proteins that stabilize replication forks. Studies on CGG repeats and inverted repeats in yeast indicate that the replication fork-stabilizing proteins Mrc1 and Tof1 could reduce the replication stalling effect of non-B DNA structures [118,132]. Proteins functioning in the maturation of Okazaki fragments also influence the expansion and contraction of repeat sequences. For example, mutations in yeast Rad27 (homologous to the human FEN-1 flap endonuclease 1) lead to the expansion of repeated CAG:CTG sequences and to the recombination/instability of inverted Alu elements [133,134]. The interactions among Rad27, DNA ligase I, and proliferating cell nuclear antigen (PCNA) are critical for the maintenance of CAG:CTG repeats in yeast [135]. Similar to Rad27, which prevents the expansion of trinucleotide repeats, the yeast helicase Srs2 unwinds the secondary structures formed by trinucleotide repeats, and together with post-replication repair proteins prevents the expansion of CAG:CTG repeats [136–138]. However, these results demonstrating a role for Rad27 in repeat stability in yeast are not consistent with those observed in mammalian cells. For example, the CAG:CTG repeat from the Huntington locus was stable over 27 successive cell passages when FEN-1 was continuously knocked down by siRNA [139]. Similarly, in mice, haploinsufficiency of Fen1 increases the expansion of CAG:CTG repeats at the Huntington locus but does not affect their stability at the myotonic dystrophy type 1 (DM1) locus in knock-in models [140].

Whereas DNA replication-related mechanisms may largely be responsible for non-B DNA structure-induced genomic instability in proliferating tissues, they do not account for genetic instabilities found in non-proliferative tissues (reviewed in [115]). For example, analyses of patients with Huntington disease and spinocerebellar ataxias showed instability of CAG:CTG repeats in their non-proliferative tissues, such as brain and sperm [141,142]. Similarly, H- and Z-DNA structures were found to induce large-scale deletions and rearrangements in replication-deficient HeLa cells ([18] and our unpublished results). In transgenic mice CAG repeats might expand by gap repair in germ cells without replication or recombination taking place [128,143]. In addition, the translocation of the palindromic AT-rich repeat has been shown to be independent of replication [25,111]. Several DNA repair-related mechanisms have been proposed to explain replication-independent mutagenesis events at non-B DNA conformations (reviewed in [115]).

Recognition of non-B DNA structures

Being different from the canonical B-form DNA conformation, non-B DNA structures represent distortions of the DNA double helix, including the non-B structure itself, and the non-B to B-form junctions. These distortions may be recognized as “damage” by DNA repair proteins. One consequence of such “damage” recognition is the introduction of mutations/deletions, causing genomic instability (Fig. 4). Many non-B DNA structures can lead to the generation of DSBs during DNA repair, which are critical lesions that can lead to cell death or chromosomal rearrangements [10].

Figure 4
DNA damage and non-B DNA structures. Unwrapping of a non-B DNA-forming sequence (red box) from the histone core during DNA metabolism (Step 1), facilitates the non-B DNA conformation (Step 2). The non-B DNA conformation is more susceptible to DNA damage ...

Hairpins/cruciforms

Trinucleotide repeats can form hairpins with mismatched nucleotides in the stems. This structural property may be recognized as “damage” by repair proteins. The Mre11/Rad50 complex was shown to cleave hairpins/cruciforms in a structure-specific manner [144]. Inverted repeats also generate DSBs and stimulate unequal sister-chromatid exchange in yeast [129]. Although it was not evaluated whether replication is important for DNA breakage and translocation in this case, the rate of this spontaneous exchange was reduced to ~50% in yeast strains with mutations in the mismatch repair (MMR) genes Msh2 or Msh3, suggesting a role for DNA repair in non-B DNA structure-induced mutagenesis [145]. Kirkpatrick and Petes (1997) reported that repair of 26-base loops in yeast involved both Msh2 and Rad1, suggesting that these repair proteins recognize helical distortions as “DNA damage” and remove DNA loops formed by trinucleotide repeats [146]. The absence of a functional nucleotide excision repair (NER) protein UvrA has been shown to increase the instability of long CTG repeats in E. coli [146,147]. However, conflicting results on the roles of MMR and NER repair proteins on repeat instability have been reported in human cell lines or mouse model systems [130,148–150]. Thus, further studies in this area are warranted.

Z-DNA

While it is clear that Z-DNA-forming sequences can cause genetic instability in a number of organisms, the underlying mechanisms remain largely speculative (reviewed in [151]). Studies from our laboratory have demonstrated that the instability of the Z-DNA-forming sequence (CG)14 results from the DSBs induced by this sequence in mammalian cells [18]. However, the mutation spectrum induced by the same (CG)14 sequence in bacteria is quite different [18]. In bacteria the predominant mutation/deletion appears to be within the CG repeat with a gain or loss of dinucleotides, likely caused by slippage events during replication. In contrast, replication was not required for the (CG)14-induced mutations in mammalian cells, where the predominant mutation events were large (>50 bp) deletions [18]. It is possible that these deletions were the result of error-generating DNA repair processing events at these unusual DNA structures. Chromatin immunoprecipitation experiments showed that Z-DNA-forming (CG)14 repeats were enriched relative to B-DNA sequence controls in the precipitations with antibodies against the NER protein, XPA, and the MMR protein, MSH2 (Wang & Vasquez, unpublished data). Moreover, the mutation frequency of this Z-DNA-forming sequence was lower in XPA- or MSH2-deficient human cells than in their isogenic wild-type counterparts, suggesting that these proteins contribute to Z-DNA-induced mutagenesis in human cells.

H-DNA

We have demonstrated that the naturally occurring H-DNA structure-forming sequence from the human c-MYC gene, which co-localizes with translocation breakpoints, can induce DSBs within these sequences in mammalian cells and cause genomic instability in mice [12,23]. Similarly, the instability of H-DNA structure-forming sequences from the polycystic kidney disease 1 (PKD1) gene was lower in MMR-deficient bacterial cells compared to wild type cells [10]. Our data suggest that, as with Z-DNA, the mutagenicity of H-DNA-forming sequences involves XPA and MSH2 (Wang & Vasquez, unpublished data). Recently, we discovered that the MMR protein complex, MSH2–MSH3 (MutSβ), cooperates with two key NER protein complexes (XPA-RPA and XPC-RAD23B) in the recognition of triplex structures in the presence of a psoralen interstrand crosslink. This interaction was enhanced up to 10-fold when the crosslink was located within a triplex structure compared to a duplex DNA substrate, suggesting that the non-B DNA structure is a strong recognition signal for both NER and MMR proteins [152].

However, binding of DNA repair proteins to non-B DNA structure-forming sequences does not always result in increased instability. In some cases, binding of MSH2 or MSH3 to the hairpin structures formed by trinucleotide repeats may prevent the structure from being processed. In yeast, the Msh2–Msh3 complex binds preferentially to the imperfect stem formed by interrupted trinucleotide repeats and blocks their expansion [153]. The human MMR protein complex MSH2–MSH3 was confirmed to preferentially bind looped-out secondary structures formed by CTG repeats, and the ATPase activity required for its repair function was decreased after binding to the non-B DNA structure-forming sequences [129].

DNA repair and non-B DNA structure-forming sequences

DNA repair processes may promote the transition from B- to non-B DNA structures. When DNA damage occurs at or near repeated sequences, the subsequent repair processes may unwrap the DNA from the chromatin, which generates negative superhelical stress and promotes the transition to non-B DNA. Alternatively, single-stranded DNA regions may form, which then allow the folding of secondary structures to take place (Fig. 4). Genetic experiments in a mouse system demonstrated that knockdown of the recombination protein Rad52 decreased the expansion of CTG repeats [154]. Introducing DSBs within the GAA repeats or within CTG repeats in E. coli resulted in deletion, but this stimulatory effect only occurred when DSBs were located within the repeats [155,156]. Similarly, more instability was seen in the processing of DSBs with a CTG repeat sequence in mammalian cells when the CTG repeat was capable of forming slipped DNA structures compared to a linear DNA control [157]. These results suggest that hairpin/cruciform structure-forming sequences may be more susceptible to deletion or rearrangement events during DNA repair in the surrounding regions.

On the other hand, the formation of DNA secondary structures near DNA damage might influence the repair processing, depending on the type of damage, the environment, and the nature of the secondary structures. For example, the Malkova group has shown that in yeast, the inverted Ty elements promote the repair of DSBs at distances of up to 30 kb from the elements by forming dicentric inverted dimers [158,159]. The existence of inverted repeats flanking a DSB is thought to channel repair from a homologous recombination pathway into a single-strand annealing-gross chromosomal rearrangements (SSA-GCR) pathway in yeast [158]. This pathway is not dependent on homologous recombination because in a rad51Δ strain, the existence of intact large inverted repeats near the DSB reduces the broken chromosomal loss from roughly 40% to ~13% [158]. Unlike inverted repeats which promote the repair of DSBs, the secondary structures formed by CTG units in a plasmid reporter system in mammalian cells showed decreased repair efficiency of the DSB within the repeat, compared to a control of linearized plasmid containing the same CTG sequence and DSB [157]. These results suggest that non-B DNA structures are able to form during DNA repair and the formation of such structures can potentially alter repair. If the non-B DNA structure-forming sequences near the damage site are processed during the repair of the lesion, they may contribute to the error-generating repair and lead to genomic instability. This notion is supported by data from patients showing that gene conversion contributes to the instability of CGG:CCG repeats in the FRAXA and CTG:CAG tracts in DM1 cases (reviewed in [160]).

Non-B DNA structures may also affect DNA repair by increasing DNA damage susceptibility and/or damage accumulation [115]. The distortion of the DNA helix and the altered arrangement of the bases and sugar moiety in non-B DNA conformations can influence the interactions of DNA damaging factors with the nucleotides, and thus modify their accessibility to DNA damage. For example, many types of non-B DNA conformations, e.g., H-DNA, B-Z junctions, hairpin and loop structures, contain single-stranded regions that are not protected by hydrogen bonding, and are often precluded from chromatin that can otherwise protect the bases. Thus, non-B DNA structures may be more accessible to DNA damaging factors than B-DNA [115]. For example, the guanines in a Z-DNA structure are more sensitive to ionizing radiation [161], and are more sensitive to oxidative damage in the single-stranded regions compared to B-form duplex DNA [162]. On the other hand, it is also possible that DNA in non-B conformations is more resistant to certain types of damaging agents, e.g., interstrand crosslinks are less likely to be formed in the single-stranded regions of non-B DNA structures than in duplex DNA.

The abnormal positioning of the bases and sugar moiety in non-B DNA conformations can also impact the function of some DNA repair proteins on damaged DNA. For example, alkylating damage such as N7-methylguanine or O6-methylguanine is not repaired as efficiently in Z-DNA as it is in B-DNA [163,164]. This topic is covered in depth in a recent review by Wang and Vasquez (2009), which describes a model of “DNA repair-stimulated non-B DNA structure formation” [115].

Concluding remarks

Since the discovery of non-B DNA structures several decades ago, these structures have been shown to influence critical genetic transactions, such as DNA replication, transcription, recombination, and repair. Our knowledge of the role of non-B DNA structures in genomic instability has advanced recently, along with progress in understanding DNA structural characteristics, the correlations between DNA structure and genetic diseases, and the proteins that influence the stability of DNA structures. Genome-wide analyses have greatly influenced our view of DNA structure-induced genomic plasticity and its consequences for human disease and for the evolutionary changes that took place since the divergence from prokaryotes to eukaryotes. The capability of non-B DNA structures to induce mutations/deletions and to promote chromosome rearrangements gives them potential evolutionary functions; e.g., mutating to adapt to rapid changes and, at the same time, keeping DNA information through recombination (in the case of the human Y chromosome mentioned above).

However, there are still many questions to be answered regarding the relationships between DNA sequence, structure, and function. For example, what environmental conditions promote non-B structure formation? What proteins function in the recognition and subsequent processing of non-B DNA structures? What proteins/pathways are involved in their error-generating repair causing genomic instability? The same trinucleotide repeat sequences in various systems do not always result in genetic instability, suggesting that DNA sequence context and/or location in the genome may be critical factors in repeat instability. In our studies, H-DNA sequences are mutagenic in mammalian cells, but are not mutagenic when introduced in bacteria, suggesting a requirement for trans-acting factors/proteins in a host-specific fashion for structure formation and/or processing. The observation that specific types of non-B DNA-forming sequences are enriched in gene families with particular functions, and the correlation between gene expression levels and the presence of non-B DNA-forming sequences in these gene regions, emphasize the need to further investigate the regulatory function of repetitive elements. It is not clear whether these elements are enriched due to their regulatory function, or due to the higher mobility of unstable non-B DNA structure-forming sequences.

The current mechanisms proposed for non-B DNA-induced genetic instability include abnormal DNA replication that can explain the contraction and expansion of trinucleotide repeats in replicating systems, and processing by DNA repair proteins that contribute to replication-independent mutagenesis induced by non-B DNA structures. Many DNA repair proteins have been found to interact with non-B DNA structures in vitro; while some protein-non-B DNA interactions lead to repair processing and DNA breakage, other proteins might stabilize the non-B DNA conformations. Furthermore, a particular protein may have different effects on non-B DNA conformation in different species. The much-needed screening for proteins that interact with non-B DNA structures is in progress and will provide more information about their recognition and structure-induced genomic instability at the molecular level. These results will help us to comprehensively understand how these DNA structures influence genome stability, DNA metabolic functions (e.g. gene function and regulation), and the balance between selection stress and adaptation to changing environmental conditions.

Supplementary Material

References

1. Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–738. [PubMed]
2. Mirkin SM. Discovery of alternative DNA structures: a heroic decade (1979–1989) Front Biosci. 2008;13:1064–1071. [PubMed]
3. Felsenfeld G, Davies DR, Rich A. Formation of a three-stranded polynucleotide molecule. J. Am. Chem. Soc. 1957;79:2023–2024.
4. Wang AH, Quigley GJ, Kolpak FJ, Crawford JL, van Boom JH, van der Marel G, Rich A. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature. 1979;282:680–686. [PubMed]
5. Lilley DM. The inverted repeat as a recognizable structural feature in supercoiled DNA molecules. Proc Natl Acad Sci U S A. 1980;77:6468–6472. [PubMed]
6. Panayotatos N, Wells RD. Cruciform structures in supercoiled DNA. Nature. 1981;289:466–470. [PubMed]
7. Lyamichev VI, Panyutin IG, Frank-Kamenetskii MD. Evidence of cruciform structures in superhelical DNA provided by two-dimensional gel electrophoresis. FEBS Lett. 1983;153:298–302. [PubMed]
8. Sen D, Gilbert W. Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature. 1988;334:364–366. [PubMed]
9. Bacolla A, Wells RD. Non-B DNA conformations, genomic rearrangements, and human disease. J Biol Chem. 2004;279:47411–47414. [PubMed]
10. Bacolla A, Jaworski A, Larson JE, Jakupciak JP, Chuzhanova N, Abeysinghe SS, O'Connell CD, Cooper DN, Wells RD. Breakpoints of gross deletions coincide with non-B DNA conformations. Proc Natl Acad Sci U S A. 2004;101:14162–14167. [PubMed]
11. Wang G, Vasquez KM. Non-B DNA structure-induced genetic instability. Mutat Res. 2006;598:103–119. [PubMed]
12. Wang G, Vasquez KM. Naturally occurring H-DNA-forming sequences are mutagenic in mammalian cells. Proc Natl Acad Sci U S A. 2004;101:13448–13453. [PubMed]
13. Glickman BW, Ripley LS. Structural intermediates of deletion mutagenesis: a role for palindromic DNA. Proc Natl Acad Sci U S A. 1984;81:512–516. [PubMed]
14. Orr HT, Zoghbi HY. Trinucleotide repeat disorders. Annu Rev Neurosci. 2007;30:575–621. [PubMed]
15. Mirkin SM. Expandable DNA repeats and human disease. Nature. 2007;447:932–940. [PubMed]
16. Lahue RS, Slater DL. DNA repair and trinucleotide repeat instability. Front Biosci. 2003;8:s653–s665. [PubMed]
17. Wells RD, Dere R, Hebert ML, Napierala M, Son LS. Advances in mechanisms of genetic instability related to hereditary neurological diseases. Nucleic Acids Res. 2005;33:3785–3798. [PMC free article] [PubMed]
18. Wang G, Christensen LA, Vasquez KM. Z-DNA-forming sequences generate large-scale deletions in mammalian cells. Proc Natl Acad Sci U S A. 2006;103:2677–2682. [PubMed]
19. Adachi M, Tsujimoto Y. Potential Z-DNA elements surround the breakpoints of chromosome translocation within the 5' flanking region of bcl-2 gene. Oncogene. 1990;5:1653–1657. [PubMed]
20. Raghavan SC, Lieber MR. Chromosomal translocations and non-B DNA structures in the human genome. Cell Cycle. 2004;3:762–768. [PubMed]
21. Raghavan SC, Chastain P, Lee JS, Hegde BG, Houston S, Langen R, Hsieh CL, Haworth IS, Lieber MR. Evidence for a triplex DNA conformation at the bcl-2 major breakpoint region of the t(14;18) translocation. J Biol Chem. 2005;280:22749–22760. [PubMed]
22. Raghavan SC, Lieber MR. DNA structures at chromosomal translocation sites. Bioessays. 2006;28:480–494. [PubMed]
23. Wang G, Carbajal S, Vijg J, DiGiovanni J, Vasquez KM. DNA structure-induced genomic instability in vivo. J Natl Cancer Inst. 2008;100:1815–1817. [PMC free article] [PubMed]
24. Kato T, Inagaki H, Yamada K, Kogo H, Ohye T, Kowa H, Nagaoka K, Taniguchi M, Emanuel BS, Kurahashi H. Genetic variation affects de novo translocation frequency. Science. 2006;311:971. [PMC free article] [PubMed]
25. Inagaki H, Ohye T, Kogo H, Kato T, Bolor H, Taniguchi M, Shaikh TH, Emanuel BS, Kurahashi H. Chromosomal instability mediated by non-B DNA: cruciform conformation and not DNA sequence is responsible for recurrent translocation in humans. Genome Res. 2009;19:191–198. [PubMed]
26. Emanuel BS. Molecular mechanisms and diagnosis of chromosome 22q11.2 rearrangements. Dev Disabil Res Rev. 2008;14:11–18. [PMC free article] [PubMed]
27. Kurahashi H, Inagaki H, Ohye T, Kogo H, Kato T, Emanuel BS. Palindrome-mediated chromosomal translocations in humans. DNA Repair (Amst). 2006;5:1136–1145. [PMC free article] [PubMed]
28. Gotter AL, Shaikh TH, Budarf ML, Rhodes CH, Emanuel BS. A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2. Hum Mol Genet. 2004;13:103–115. [PMC free article] [PubMed]
29. Bacolla A, Wells RD. Non-B DNA conformations as determinants of mutagenesis and human disease. Mol Carcinog. 2009;48:273–285. [PubMed]
30. Smith GR. Meeting DNA palindromes head-to-head. Genes Dev. 2008;22:2612–2620. [PubMed]
31. Watson J, Hays FA, Ho PS. Definitions and analysis of DNA Holliday junction geometry. Nucleic Acids Res. 2004;32:3017–3027. [PMC free article] [PubMed]
32. Sinden RR, Pettijohn DE. Cruciform transitions in DNA. J Biol Chem. 1984;259:6593–6600. [PubMed]
33. Oussatcheva EA, Pavlicek J, Sankey OF, Sinden RR, Lyubchenko YL, Potaman VN. Influence of global DNA topology on cruciform formation in supercoiled DNA. J Mol Biol. 2004;338:735–743. [PubMed]
34. Nag DK, Petes TD. Seven-base-pair inverted repeats in DNA form stable hairpins in vivo in Saccharomyces cerevisiae. Genetics. 1991;129:669–673. [PubMed]
35. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. [PubMed]
36. Harvey SC. DNA structural dynamics: longitudinal breathing as a possible mechanism for the B ⇌ Z transition. Nucleic Acids Res. 1983;11:4867–4878. [PMC free article] [PubMed]
37. Peck LJ, Nordheim A, Rich A, Wang JC. Flipping of cloned d(pCpG)n.d(pCpG)n DNA sequences from right- to left-handed helical structure by salt, Co(III), or negative supercoiling. Proc Natl Acad Sci U S A. 1982;79:4560–4564. [PubMed]
38. Singleton CK, Klysik J, Stirdivant SM, Wells RD. Left-handed Z-DNA is induced by supercoiling in physiological ionic conditions. Nature. 1982;299:312–316. [PubMed]
39. Ha SC, Lowenhaupt K, Rich A, Kim YG, Kim KK. Crystal structure of a junction between B-DNA and Z-DNA reveals two extruded bases. Nature. 2005;437:1183–1186. [PubMed]
40. Htun H, Dahlberg JE. Single strands, triple strands, and kinks in H-DNA. Science. 1988;241:1791–1796. [PubMed]
41. Wells RD. Unusual DNA structures. J Biol Chem. 1988;263:1095–1098. [PubMed]
42. Jain A, Wang G, Vasquez KM. DNA triple helices: Biological consequences and therapeutic potential. Biochimie. 2008;90:1117–1130. [PMC free article] [PubMed]
43. Majumdar A, Patel DJ. Identifying hydrogen bond alignments in multistranded DNA architectures by NMR. Acc Chem Res. 2002;35:1–11. [PubMed]
44. Sinden RR, Pytlos-Sinden MJ, Potaman VN. Slipped strand DNA structures. Front Biosci. 2007;12:4788–4799. [PubMed]
45. Chou SH, Chin KH, Wang AH. Unusual DNA duplex and hairpin motifs. Nucleic Acids Res. 2003;31:2461–2474. [PMC free article] [PubMed]
46. Wang G, Zhao J, Vasquez KM. Methods to determine DNA structural alterations and genetic instability. Methods. 2009;48:54–62. [PMC free article] [PubMed]
47. Pearson CE, Eichler EE, Lorenzetti D, Kramer SF, Zoghbi HY, Nelson DL, Sinden RR. Interruptions in the triplet repeats of SCA1 and FRAXA reduce the propensity and complexity of slipped strand DNA (S-DNA) formation. Biochemistry. 1998;37:2701–2708. [PubMed]
48. Caskey CT, Pizzuti A, Fu YH, Fenwick RG, Jr, Nelson DL. Triplet repeat mutations in human disease. Science. 1992;256:784–789. [PubMed]
49. Benton CS, de Silva R, Rutledge SL, Bohlega S, Ashizawa T, Zoghbi HY. Molecular and clinical studies in SCA-7 define a broad clinical spectrum and the infantile phenotype. Neurology. 1998;51:1081–1086. [PubMed]
50. Palecek E. Local supercoil-stabilized DNA structures. Crit Rev Biochem Mol Biol. 1991;26:151–226. [PubMed]
51. Schroth GP, Ho PS. Occurrence of potential cruciform and H-DNA forming sequences in genomic DNA. Nucleic Acids Res. 1995;23:1977–1983. [PMC free article] [PubMed]
52. Lafer EM, Moller A, Nordheim A, Stollar BD, Rich A. Antibodies specific for left-handed Z-DNA. Proc Natl Acad Sci U S A. 1981;78:3546–3550. [PubMed]
53. Nordheim A, Pardue ML, Lafer EM, Moller A, Stollar BD, Rich A. Antibodies to left-handed Z-DNA bind to interband regions of Drosophila polytene chromosomes. Nature. 1981;294:417–422. [PubMed]
54. Nordheim A, Lafer EM, Peck LJ, Wang JC, Stollar BD, Rich A. Negatively supercoiled plasmids contain left-handed Z-DNA segments as detected by specific antibody binding. Cell. 1982;31:309–318. [PubMed]
55. Lafer EM, Sousa R, Ali R, Rich A, Stollar BD. The effect of anti-Z-DNA antibodies on the B-DNA-Z-DNA equilibrium. J Biol Chem. 1986;261:6438–6443. [PubMed]
56. Agazie YM, Lee JS, Burkholder GD. Characterization of a new monoclonal antibody to triplex DNA and immunofluorescent staining of mammalian chromosomes. J Biol Chem. 1994;269:7019–7023. [PubMed]
57. Lee JS, Burkholder GD, Latimer LJ, Haug BL, Braun RP. A monoclonal antibody to triplex DNA binds to eucaryotic chromosomes. Nucleic Acids Res. 1987;15:1047–1061. [PMC free article] [PubMed]
58. Agazie YM, Burkholder GD, Lee JS. Triplex DNA in the nucleus: direct binding of triplex-specific antibodies and their effect on transcription, replication and cell growth. Biochem J. 1996;316(Pt 2):461–466. [PubMed]
59. Ohno M, Fukagawa T, Lee JS, Ikemura T. Triplex-forming DNAs in the human interphase nucleus visualized in situ by polypurine/polypyrimidine DNA probes and antitriplex antibodies. Chromosoma. 2002;111:201–213. [PubMed]
60. Brown JC, Brown BA, 2nd, Li Y, Hardin CC. Construction and characterization of a quadruplex DNA selective single-chain autoantibody from a viable motheaten mouse hybridoma with homology to telomeric DNA binding proteins. Biochemistry. 1998;37:16338–16348. [PubMed]
61. Brown BA, 2nd, Li Y, Brown JC, Hardin CC, Roberts JF, Pelsue SC, Shultz LD. Isolation and characterization of a monoclonal anti-quadruplex DNA antibody from autoimmune "viable motheaten" mice. Biochemistry. 1998;37:16325–16337. [PubMed]
62. Schaffitzel C, Berger I, Postberg J, Hanes J, Lipps HJ, Pluckthun A. In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc Natl Acad Sci U S A. 2001;98:8572–8577. [PubMed]
63. Frappier L, Price GB, Martin RG, Zannis-Hadjopoulos M. Characterization of the binding specificity of two anticruciform DNA monoclonal antibodies. J Biol Chem. 1989;264:334–341. [PubMed]
64. Sinden RR. Cruciform structures in DNA and triplex DNA. In: DNA structure and function. San Diego: Academic Press; 1994. pp. 160–164, 241–242.
65. Raghavan SC, Tsai A, Hsieh CL, Lieber MR. Analysis of non-B DNA structure at chromosomal sites in the mammalian genome. Methods Enzymol. 2006;409:301–316. [PubMed]
66. Cox R, Mirkin SM. Characteristic enrichment of DNA repeats in different genomes. Proc Natl Acad Sci U S A. 1997;94:5237–5242. [PubMed]
67. Repping S, Skaletsky H, Lange J, Silber S, Van Der Veen F, Oates RD, Page DC, Rozen S. Recombination between palindromes P5 and P1 on the human Y chromosome causes massive deletions and spermatogenic failure. Am J Hum Genet. 2002;71:906–922. [PubMed]
68. Lobachev KS, Rattray A, Narayanan V. Hairpin- and cruciform-mediated chromosome breakage: causes and consequences in eukaryotic cells. Front Biosci. 2007;12:4208–4220. [PubMed]
69. Warburton PE, Giordano J, Cheung F, Gelfand Y, Benson G. Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Res. 2004;14:1861–1869. [PubMed]
70. Khuu P, Sandor M, DeYoung J, Ho PS. Phylogenomic analysis of the emergence of GC-rich transcription elements. Proc Natl Acad Sci U S A. 2007;104:16528–16533. [PubMed]
71. Nordheim A, Rich A. Negatively supercoiled simian virus 40 DNA contains Z-DNA segments within transcriptional enhancer sequences. Nature. 1983;303:674–679. [PubMed]
72. Oh DB, Kim YG, Rich A. Z-DNA-binding proteins can act as potent effectors of gene expression in vivo. Proc Natl Acad Sci U S A. 2002;99:16666–16671. [PubMed]
73. Wong B, Chen S, Kwon JA, Rich A. Characterization of Z-DNA as a nucleosome-boundary element in yeast Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2007;104:2229–2234. [PubMed]
74. Bacolla A, Collins JR, Gold B, Chuzhanova N, Yi M, Stephens RM, Stefanov S, Olsh A, Jakupciak JP, Dean M, Lempicki RA, Cooper DN, Wells RD. Long homopurine•homopyrimidine sequences are characteristic of genes expressed in brain and the pseudoautosomal region. Nucleic Acids Res. 2006;34:2663–2675. [PMC free article] [PubMed]
75. Todd AK, Johnston M, Neidle S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005;33:2901–2907. [PMC free article] [PubMed]
76. Huppert JL, Balasubramanian S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005;33:2908–2916. [PMC free article] [PubMed]
77. Sundquist WI, Klug A. Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops. Nature. 1989;342:825–829. [PubMed]
78. Williamson JR, Raghuraman MK, Cech TR. Monovalent cation-induced structure of telomeric DNA: the G-quartet model. Cell. 1989;59:871–880. [PubMed]
79. Panyutin IG, Kovalsky OI, Budowsky EI. Magnesium-dependent supercoiling-induced transition in (dG)n.(dC)n stretches and formation of a new G-structure by (dG)n strand. Nucleic Acids Res. 1989;17:8257–8271. [PMC free article] [PubMed]
80. Huppert JL, Balasubramanian S. G-quadruplexes in promoters throughout the human genome. Nucleic Acids Res. 2007;35:406–413. [PMC free article] [PubMed]
81. Du Z, Zhao Y, Li N. Genome-wide analysis reveals regulatory role of G4 DNA in gene transcription. Genome Res. 2008;18:233–241. [PubMed]
82. Du Z, Kong P, Gao Y, Li N. Enrichment of G4 DNA motif in transcriptional regulatory region of chicken genome. Biochem Biophys Res Commun. 2007;354:1067–1070. [PubMed]
83. Huppert JL, Bugaut A, Kumari S, Balasubramanian S. G-quadruplexes: the beginning and end of UTRs. Nucleic Acids Res. 2008;36:6260–6268. [PMC free article] [PubMed]
84. Sen D, Gilbert W. A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature. 1990;344:410–414. [PubMed]
85. Moyzis RK, Torney DC, Meyne J, Buckingham JM, Wu JR, Burks C, Sirotkin KM, Goad WB. The distribution of interspersed repetitive DNA sequences in the human genome. Genomics. 1989;4:273–289. [PubMed]
86. Stallings RL, Torney DC, Hildebrand CE, Longmire JL, Deaven LL, Jett JH, Doggett NA, Moyzis RK. Physical mapping of human chromosomes by repetitive sequence fingerprinting. Proc Natl Acad Sci U S A. 1990;87:6218–6222. [PubMed]
87. Krontiris TG. Minisatellites and human disease. Science. 1995;269:1682–1683. [PubMed]
88. Bacolla A, Wojciechowska M, Kosmider B, Larson JE, Wells RD. The involvement of non-B DNA structures in gross chromosomal rearrangements. DNA Repair (Amst). 2006;5:1161–1170. [PubMed]
89. Bacolla A, Larson JE, Collins JR, Li J, Milosavljevic A, Stenson PD, Cooper DN, Wells RD. Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties. Genome Res. 2008;18:1545–1553. [PubMed]
90. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945. [PubMed]
91. Eichler EE, Clark RA, She X. An assessment of the sequence gaps: unfinished business in a finished human genome. Nat Rev Genet. 2004;5:345–354. [PubMed]
92. Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD, Pyntikova T, van der Veen F, Skaletsky H, Page DC, Rozen S. High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nat Genet. 2006;38:463–467. [PubMed]
93. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. [PMC free article] [PubMed]
94. Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM, Nord AS, Kusenda M, Malhotra D, Bhandari A, Stray SM, Rippey CF, Roccanova P, Makarov V, Lakshmi B, Findling RL, Sikich L, Stromberg T, Merriman B, Gogtay N, Butler P, Eckstrand K, Noory L, Gochman P, Long R, Chen Z, Davis S, Baker C, Eichler EE, Meltzer PS, Nelson SF, Singleton AB, Lee MK, Rapoport JL, King MC, Sebat J. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science. 2008;320:539–543. [PubMed]
95. Venkatasubramanian G. Triplex DNA, human evolution and schizophrenia. Acta Neuropsychiatrica. 2009;21:100–101.
96. Alba MM, Guigo R. Comparative analysis of amino acid repeats in rodents and humans. Genome Res. 2004;14:549–554. [PubMed]
97. Faux NG, Bottomley SP, Lesk AM, Irving JA, Morrison JR, de la Banda MG, Whisstock JC. Functional insights from the distribution and role of homopeptide repeat-containing proteins. Genome Res. 2005;15:537–551. [PubMed]
98. Schaafsma D, Roscioni SS, Meurs H, Schmidt M. Monomeric G-proteins as signal transducers in airway physiology and pathophysiology. Cell Signal. 2008;20:1705–1714. [PubMed]
99. Burridge K, Wennerberg K. Rho and Rac take center stage. Cell. 2004;116:167–179. [PubMed]
100. Zhao Y, Du Z, Li N. Extensive selection for the enrichment of G4 DNA motifs in transcriptional regulatory regions of warm blooded animals. FEBS Lett. 2007;581:1951–1956. [PubMed]
101. Rawal P, Kummarasetti VB, Ravindran J, Kumar N, Halder K, Sharma R, Mukerji M, Das SK, Chowdhury S. Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation. Genome Res. 2006;16:644–655. [PubMed]
102. Qin Y, Hurley LH. Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie. 2008;90:1149–1171. [PMC free article] [PubMed]
103. Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc Natl Acad Sci U S A. 2002;99:11593–11598. [PubMed]
104. Hurley LH, Von Hoff DD, Siddiqui-Jain A, Yang D. Drug targeting of the c-MYC promoter to repress gene expression via a G-quadruplex silencer element. Semin Oncol. 2006;33:498–512. [PubMed]
105. Aitken RJ, Marshall Graves JA. The future of sex. Nature. 2002;415:963. [PubMed]
106. Rozen S, Skaletsky H, Marszalek JD, Minx PJ, Cordum HS, Waterston RH, Wilson RK, Page DC. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature. 2003;423:873–876. [PubMed]
107. Bhowmick BK, Satta Y, Takahata N. The origin and evolution of human ampliconic gene families and ampliconic structure. Genome Res. 2007;17:441–450. [PubMed]
108. Hughes JF, Skaletsky H, Pyntikova T, Minx PJ, Graves T, Rozen S, Wilson RK, Page DC. Conservation of Y-linked genes during human evolution revealed by comparative sequencing in chimpanzee. Nature. 2005;437:100–103. [PubMed]
109. Kolb J, Chuzhanova NA, Hogel J, Vasquez KM, Cooper DN, Bacolla A, Kehrer-Sawatzki H. Cruciform-forming inverted repeats appear to have mediated many of the microinversions that distinguish the human and chimpanzee genomes. Chromosome Res. 2009. doi:10.1007/s10577-009-9039-9. [PubMed]
110. Losch FO, Bredenbeck A, Hollstein VM, Walden P, Wrede P. Evidence for a large double-cruciform DNA structure on the X chromosome of human and chimpanzee. Hum Genet. 2007;122:337–343. [PubMed]
111. Kurahashi H, Inagaki H, Kato T, Hosoba E, Kogo H, Ohye T, Tsutsumi M, Bolor H, Tong M, Emanuel BS. Impaired DNA replication prompts deletions within palindromic sequences, but does not induce translocations in human cells. Hum Mol Genet. 2009. doi:10.1093/hmg/ddp279. [PMC free article] [PubMed]
112. Sinden RR, Bat O, Kramer PR. Psoralen cross-linking as probe of torsional tension and topological domain size in vivo. Methods. 1999;17:112–124. [PubMed]
113. Pearson CE, Nichol Edamura K, Cleary JD. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet. 2005;6:729–742. [PubMed]
114. Mirkin EV, Mirkin SM. Replication fork stalling at natural impediments. Microbiol Mol Biol Rev. 2007;71:13–35. [PMC free article] [PubMed]
115. Wang G, Vasquez KM. Models for chromosomal replication-independent non-B DNA structure-induced genetic instability. Mol Carcinog. 2009;48:286–298. [PMC free article] [PubMed]
116. Freudenreich CH. Chromosome fragility: molecular mechanisms and cellular consequences. Front Biosci. 2007;12:4911–4924. [PubMed]
117. Collins NS, Bhattacharyya S, Lahue RS. Rev1 enhances CAG.CTG repeat stability in Saccharomyces cerevisiae. DNA Repair (Amst). 2007;6:38–44. [PubMed]
118. Voineagu I, Narayanan V, Lobachev KS, Mirkin SM. Replication stalling at unstable inverted repeats: interplay between DNA hairpins and fork stabilizing proteins. Proc Natl Acad Sci U S A. 2008;105:9936–9941. [PubMed]
119. Zhang H, Freudenreich CH. An AT-rich sequence in human common fragile site FRA16D causes fork stalling and chromosome breakage in S. cerevisiae. Mol Cell. 2007;27:367–379. [PMC free article] [PubMed]
120. Balakumaran BS, Freudenreich CH, Zakian VA. CGG/CCG repeats exhibit orientation-dependent instability and orientation-independent fragility in Saccharomyces cerevisiae. Hum Mol Genet. 2000;9:93–100. [PubMed]
121. Panigrahi GB, Cleary JD, Pearson CE. In vitro (CTG)•(CAG) expansions and deletions by human cell extracts. J Biol Chem. 2002;277:13926–13934. [PubMed]
122. Kang S, Jaworski A, Ohshima K, Wells RD. Expansion and deletion of CTG repeats from human disease genes are determined by the direction of replication in E. coli. Nat Genet. 1995;10:213–218. [PubMed]
123. Freudenreich CH, Stavenhagen JB, Zakian VA. Stability of a CTG/CAG trinucleotide repeat in yeast is dependent on its orientation in the genome. Mol Cell Biol. 1997;17:2090–2098. [PMC free article] [PubMed]
124. Kim HM, Narayanan V, Mieczkowski PA, Petes TD, Krasilnikova MM, Mirkin SM, Lobachev KS. Chromosome fragility at GAA tracts in yeast depends on repeat orientation and requires mismatch repair. EMBO J. 2008;27:2896–2906. [PubMed]
125. Hebert ML, Spitz LA, Wells RD. DNA double-strand breaks induce deletion of CTG.CAG repeats in an orientation-dependent manner in Escherichia coli. J Mol Biol. 2004;336:655–672. [PubMed]
126. Mitas M. Trinucleotide repeats associated with human disease. Nucleic Acids Res. 1997;25:2245–2254. [PMC free article] [PubMed]
127. Pearson CE, Tam M, Wang YH, Montgomery SE, Dar AC, Cleary JD, Nichol K. Slipped-strand DNAs formed by long (CAG)•(CTG) repeats: slipped-out repeats and slip-out junctions. Nucleic Acids Res. 2002;30:4534–4547. [PMC free article] [PubMed]
128. Kovtun IV, McMurray CT. Features of trinucleotide repeat instability in vivo. Cell Res. 2008;18:198–213. [PubMed]
129. Owen BA, Yang Z, Lai M, Gajec M, Badger JD, 2nd, Hayes JJ, Edelmann W, Kucherlapati R, Wilson TM, McMurray CT. (CAG)(n)-hairpin DNA binds to Msh2-Msh3 and changes properties of mismatch recognition. Nat Struct Mol Biol. 2005;12:663–670. [PubMed]
130. Manley K, Shirley TL, Flaherty L, Messer A. Msh2 deficiency prevents in vivo somatic instability of the CAG repeat in Huntington disease transgenic mice. Nat Genet. 1999;23:471–473. [PubMed]
131. Lin Y, Dion V, Wilson JH. Transcription promotes contraction of CAG repeat tracts in human cells. Nat Struct Mol Biol. 2006;13:179–180. [PubMed]
132. Voineagu I, Surka CF, Shishkin AA, Krasilnikova MM, Mirkin SM. Replisome stalling and stabilization at CGG repeats, which are responsible for chromosomal fragility. Nat Struct Mol Biol. 2009;16:226–228. [PMC free article] [PubMed]
133. Lobachev KS, Stenger JE, Kozyreva OG, Jurka J, Gordenin DA, Resnick MA. Inverted Alu repeats unstable in yeast are excluded from the human genome. EMBO J. 2000;19:3822–3830. [PubMed]
134. Spiro C, Pelletier R, Rolfsmeier ML, Dixon MJ, Lahue RS, Gupta G, Park MS, Chen X, Mariappan SV, McMurray CT. Inhibition of FEN-1 processing by DNA secondary structure at trinucleotide repeats. Mol Cell. 1999;4:1079–1085. [PubMed]
135. Refsland EW, Livingston DM. Interactions among DNA ligase I, the flap endonuclease and proliferating cell nuclear antigen in the expansion and contraction of CAG repeat tracts in yeast. Genetics. 2005;171:923–934. [PubMed]
136. Bhattacharyya S, Lahue RS. Srs2 helicase of Saccharomyces cerevisiae selectively unwinds triplet repeat DNA. J Biol Chem. 2005;280:33311–33317. [PubMed]
137. Kerrest A, Anand RP, Sundararajan R, Bermejo R, Liberi G, Dujon B, Freudenreich CH, Richard GF. SRS2 and SGS1 prevent chromosomal breaks and stabilize triplet repeats by restraining recombination. Nat Struct Mol Biol. 2009;16:159–167. [PubMed]
138. Daee DL, Mertz T, Lahue RS. Postreplication repair inhibits CAG.CTG repeat expansions in Saccharomyces cerevisiae. Mol Cell Biol. 2007;27:102–110. [PMC free article] [PubMed]
139. Moe SE, Sorbo JG, Holen T. Huntingtin triplet-repeat locus is stable under long-term Fen1 knockdown in human cells. J Neurosci Methods. 2008;171:233–238. [PubMed]
140. van den Broek WJ, Nelen MR, van der Heijden GW, Wansink DG, Wieringa B. Fen1 does not control somatic hypermutability of the (CTG)(n)•(CAG)(n) repeat in a knock-in mouse model for DM1. FEBS Lett. 2006;580:5208–5214. [PubMed]
141. Chong SS, McCall AE, Cota J, Subramony SH, Orr HT, Hughes MR, Zoghbi HY. Gametic and somatic tissue-specific heterogeneity of the expanded SCA1 CAG repeat in spinocerebellar ataxia type 1. Nat Genet. 1995;10:344–350. [PubMed]
142. Telenius H, Kremer B, Goldberg YP, Theilmann J, Andrew SE, Zeisler J, Adam S, Greenberg C, Ives EJ, Clarke L, et al. Somatic and gonadal mosaicism of the Huntington disease gene CAG repeat in brain and sperm. Nat Genet. 1994;6:409–414. [PubMed]
143. Kovtun IV, McMurray CT. Trinucleotide expansion in haploid germ cells by gap repair. Nat Genet. 2001;27:407–411. [PubMed]
144. Trujillo KM, Sung P. DNA structure-specific nuclease activities in the Saccharomyces cerevisiae Rad50•Mre11 complex. J Biol Chem. 2001;276:35458–35464. [PubMed]
145. Nag DK, Fasullo M, Dong Z, Tronnes A. Inverted repeat-stimulated sister-chromatid exchange events are RAD1-independent but reduced in a msh2 mutant. Nucleic Acids Res. 2005;33:5243–5249. [PMC free article] [PubMed]
146. Kirkpatrick DT, Petes TD. Repair of DNA loops involves DNA-mismatch and nucleotide-excision repair proteins. Nature. 1997;387:929–931. [PubMed]
147. Parniewski P, Bacolla A, Jaworski A, Wells RD. Nucleotide excision repair affects the stability of long transcribed (CTG•CAG) tracts in an orientation-dependent manner in Escherichia coli. Nucleic Acids Res. 1999;27:616–623. [PMC free article] [PubMed]
148. Pelletier R, Farrell BT, Miret JJ, Lahue RS. Mechanistic features of CAG•CTG repeat contractions in cultured cells revealed by a novel genetic assay. Nucleic Acids Res. 2005;33:5667–5676. [PMC free article] [PubMed]
149. Panigrahi GB, Lau R, Montgomery SE, Leonard MR, Pearson CE. Slipped (CTG)•(CAG) repeats can be correctly repaired, escape repair or undergo error-prone repair. Nat Struct Mol Biol. 2005;12:654–662. [PubMed]
150. Savouret C, Garcia-Cordier C, Megret J, te Riele H, Junien C, Gourdon G. MSH2-dependent germinal CTG repeat expansions are produced continuously in spermatogonia from DM1 transgenic mice. Mol Cell Biol. 2004;24:629–637. [PMC free article] [PubMed]
151. Wang G, Vasquez KM. Z-DNA, an active element in the genome. Front Biosci. 2007;12:4424–4438. [PubMed]
152. Zhao J, Jain A, Iyer RR, Modrich PL, Vasquez KM. Mismatch repair and nucleotide excision repair proteins cooperate in the recognition of DNA interstrand crosslinks. Nucleic Acids Res. 2009. doi:10.1093/nar/gkp399. [PMC free article] [PubMed]
153. Rolfsmeier ML, Dixon MJ, Lahue RS. Mismatch repair blocks expansions of interrupted trinucleotide repeats in yeast. Mol Cell. 2000;6:1501–1507. [PubMed]
154. Savouret C, Brisson E, Essers J, Kanaar R, Pastink A, te Riele H, Junien C, Gourdon G. CTG repeat instability and size variation timing in DNA repair-deficient mice. EMBO J. 2003;22:2264–2273. [PubMed]
155. Hebert ML, Wells RD. Roles of double-strand breaks, nicks, and gaps in stimulating deletions of CTG.CAG repeats by intramolecular DNA repair. J Mol Biol. 2005;353:961–979. [PubMed]
156. Pollard LM, Bourn RL, Bidichandani SI. Repair of DNA double-strand breaks within the (GAA•TTC)n sequence results in frequent deletion of the triplet-repeat sequence. Nucleic Acids Res. 2008;36:489–500. [PMC free article] [PubMed]
157. Marcadier JL, Pearson CE. Fidelity of primate cell repair of a double-strand break within a (CTG).(CAG) tract. Effect of slipped DNA structures. J Biol Chem. 2003;278:33848–33856. [PubMed]
158. Downing B, Morgan R, VanHulle K, Deem A, Malkova A. Large inverted repeats in the vicinity of a single double-strand break strongly affect repair in yeast diploids lacking Rad51. Mutat Res. 2008;645:9–18. [PMC free article] [PubMed]
159. VanHulle K, Lemoine FJ, Narayanan V, Downing B, Hull K, McCullough C, Bellinger M, Lobachev K, Petes TD, Malkova A. Inverted DNA repeats channel repair of distant double-strand breaks into chromatid fusions and chromosomal rearrangements. Mol Cell Biol. 2007;27:2601–2614. [PMC free article] [PubMed]
160. Jakupciak JP, Wells RD. Gene conversion (recombination) mediates expansions of CTG•CAG repeats. J Biol Chem. 2000;275:40003–40013. [PubMed]
161. Tartier L, Michalik V, Spotheim-Maurizot M, Rahmouni AR, Sabattier R, Charlier M. Radiolytic signature of Z-DNA. Nucleic Acids Res. 1994;22:5565–5570. [PMC free article] [PubMed]
162. Ribeiro DT, Madzak C, Sarasin A, Di Mascio P, Sies H, Menck CF. Singlet oxygen induced DNA damage and mutagenicity in a single-stranded SV40-based shuttle vector. Photochem Photobiol. 1992;55:39–45. [PubMed]
163. Lagravere C, Malfoy B, Leng M, Laval J. Ring-opened alkylated guanine is not repaired in Z-DNA. Nature. 1984;310:798–800. [PubMed]
164. Boiteux S, Costa de Oliveira R, Laval J. The Escherichia coli O6-methylguanine-DNA methyltransferase does not repair promutagenic O6-methylguanine residues when present in Z-DNA. J Biol Chem. 1985;260:8711–8715. [PubMed]