Meiotic recombination, which promotes proper homologous chromosome segregation at the first meiotic division, normally occurs between allelic sequences on homologues. However, recombination can also take place between non-allelic DNA segments that share high sequence identity. Such non-allelic homologous recombination (NAHR) can markedly alter genome architecture during gametogenesis by generating chromosomal rearrangements. Indeed, NAHR-mediated deletions, duplications, inversions and other alterations have been implicated in numerous human genetic disorders. Studies in yeast have revealed insights into the molecular mechanisms of meiotic NAHR as well as the cellular strategies that limit NAHR.
We demonstrated previously that 75% of infertile men with round, acrosomeless spermatozoa (globozoospermia) had a homozygous 200-Kb deletion removing the totality of DPY19L2. We showed that this deletion occurred by Non-Allelic Homologous Recombination (NAHR) between two homologous 28-Kb Low Copy Repeats (LCRs) located on each side of the gene. The accepted NAHR model predicts that inter-chromatid and inter-chromosome NAHR create a deleted and a duplicated recombined allele, while intra-chromatid events only generate deletions. Therefore more deletions are expected to be produced de novo. Surprisingly, array CGH data show that, in the general population, DPY19L2 duplicated alleles are approximately three times as frequent as deleted alleles. In order to shed light on this paradox, we developed a sperm-based assay to measure the de novo rates of deletions and duplications at this locus. As predicted by the NAHR model, we identified an excess of de novo deletions over duplications. We calculated that the excess of de novo deletion was compensated by evolutionary loss, whereas duplications, not subjected to selection, increased gradually. Purifying selection against sterile, homozygous deleted men may be sufficient for this compensation, but heterozygously deleted men might also suffer a small fitness penalty. The recombined alleles were sequenced to pinpoint the localisation of the breakpoints. We analysed a total of 15 homozygous deleted patients and 17 heterozygous individuals carrying either a deletion (n = 4) or a duplication (n = 13). All but two alleles fell within a 1.2-Kb region central to the 28-Kb LCR, indicating that >90% of the NAHR took place in that region. We showed that a PRDM9 13-mer recognition sequence is located right in the centre of that region. Our results therefore strengthen the link between this consensus sequence and the occurrence of NAHR.
We demonstrated previously that most men with globozoospermia, who produce only round acrosomeless spermatozoa and are 100% infertile, had a homozygous deletion removing the totality of DPY19L2. We also showed that this deletion occurred by Non-Allelic Homologous Recombination (NAHR). NAHR results in the production of deletions and duplications of regions encompassed by two homologous sequences, normally with a higher occurrence of deletions over duplications. Analysis of public databases at the DPY19L2 locus paradoxically revealed that, in the general population, duplications were approximately three times as frequent as deletions. Analysis of sperm DNA permits us to quantify de novo events that take place during male meiosis. We therefore measured the rates of de novo deletion and duplication in the sperm of three healthy donors. As predicted by the NAHR theoretical model and contrary to the allelic frequency observed in the general population, we identified an approximate 2-fold excess of deletions over duplications. We calculated that the measured rate of de novo deletion was compensated by evolutionary loss, whereas duplications, not subjected to selection, increased gradually. Purifying selection against infertile homozygous deleted men may be sufficient for this compensation, or heterozygously deleted men may also suffer a small fitness penalty.
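The balance described above, where deletions arise more often de novo yet duplications are more frequent in the population, can be illustrated with a minimal deterministic sketch. Everything numerical here is an assumption for illustration, not a value from the study: the mutation rates `mu_del` and `mu_dup` are hypothetical (only their roughly 2:1 ratio follows the text), `s_hom = 0.5` assumes that only homozygous deleted men are sterile and that the sexes are equally frequent, and `s_het` is an optional heterozygote penalty. The model assumes random mating and ignores drift and back-mutation.

```python
def simulate_allele_frequencies(generations, mu_del=2e-5, mu_dup=1e-5,
                                s_hom=0.5, s_het=0.0):
    """Return (deletion_freq, duplication_freq) after the given generations.

    All parameter values are hypothetical placeholders. Each generation
    applies selection against deletion carriers, then de novo mutation
    from wild-type alleles.
    """
    q = 0.0  # deletion allele frequency
    d = 0.0  # duplication allele frequency
    for _ in range(generations):
        # Selection: homozygous deletions lose s_hom, heterozygous
        # deletion carriers lose s_het; duplications are neutral.
        mean_fitness = 1.0 - s_hom * q * q - s_het * 2.0 * q * (1.0 - q)
        q_sel = (q * q * (1.0 - s_hom)
                 + q * (1.0 - q) * (1.0 - s_het)) / mean_fitness
        # Non-deleted alleles share the same marginal fitness, so the
        # duplication keeps its share of the non-deleted pool.
        d_sel = d * (1.0 - q_sel) / (1.0 - q)
        # Mutation: new deletions and duplications arise on wild-type alleles.
        wild = 1.0 - q_sel - d_sel
        q = q_sel + mu_del * wild
        d = d_sel + mu_dup * wild
    return q, d


if __name__ == "__main__":
    for n in (10, 1000, 5000):
        print(n, simulate_allele_frequencies(n))
```

Under these assumptions the deletion frequency plateaus near the mutation-selection balance while the neutral duplication keeps accumulating, which is the qualitative pattern the abstract describes.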
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here, in which transposons are positioned near each other in the appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome.
The recent explosion of genome sequence data has greatly increased the need to understand the forces that shape eukaryotic genomes. A common feature of higher plant genomes is the presence of large numbers of duplications, often occurring as tandem repeats of thousands of base pairs. Despite the importance of gene duplications in evolution and disease, the precise mechanism(s) that generate tandem duplications are still unclear. In this study we identified nine new spontaneous duplications that arose flanking elements of the Ac transposon system. These duplications range in size from 8 kbp to >5,000 kbp, and all cases exhibit features characteristic of Ac transposition. Using similar criteria in a bioinformatics search, we identified three smaller duplications adjacent to other hAT family transposons in the maize B73 reference genome sequence. Our results show that transposable elements can directly generate tandem duplications via alternative transposition, and that this mechanism is responsible for at least some of the duplications present in the maize B73 genome. This work extends the significance of Barbara McClintock's discovery of transposable elements by demonstrating how they can act as agents of genome expansion.
Segmental duplications (SDs) on 22q11.2 (LCR22) serve as substrates for meiotic non-allelic homologous recombination (NAHR) events resulting in several clinically significant genomic disorders.
To understand the duplication activity leading to the complicated SD structure of this region, we have applied the A-Bruijn graph algorithm to decompose the 22q11.2 SDs into 523 fundamental duplication sequences, termed subunits. Cross-species syntenic analysis of primate genomes demonstrates that many of these LCR22 subunits emerged very recently, especially those implicated in human genomic disorders. Some subunits have expanded more actively than others, and young Alu SINEs are associated much more frequently with duplicated sequences that have undergone active expansion, confirming their role in mediating recombination events. Many copy number variations (CNVs) exist on 22q11.2, some flanked by SDs. Interestingly, two chromosome breakpoints for 13 CNVs (mean length 65 kb) are located in paralogous subunits, providing direct evidence that SD subunits could contribute to CNV formation. Sequence analysis of PACs or BACs identified additional CNVs, specifically 10 insertions and 18 deletions within 22q11.2; four were more than 10 kb in size and most contained young AluYs at their breakpoints.
Our study indicates that AluYs are implicated in past and current duplication events and, moreover, suggests that DNA rearrangements in 22q11.2 genomic disorders may not occur randomly but involve both actively expanded duplication subunits and Alu elements.
Rearrangements of our genome can be responsible for inherited as well as sporadic traits. The analyses of chromosome breakpoints in the proximal short arm of Chromosome 17 (17p) reveal nonallelic homologous recombination (NAHR) as a major mechanism for recurrent rearrangements whereas nonhomologous end-joining (NHEJ) can be responsible for many of the nonrecurrent rearrangements. Genome architectural features consisting of low-copy repeats (LCRs), or segmental duplications, can stimulate and mediate NAHR, and there are hotspots for the crossovers within the LCRs. Rearrangements introduce variation into our genome for selection to act upon and as such serve an evolutionary function analogous to base pair changes. Genomic rearrangements may cause Mendelian diseases, produce complex traits such as behaviors, or represent benign polymorphic changes. The mechanisms by which rearrangements convey phenotypes are diverse and include gene dosage, gene interruption, generation of a fusion gene, position effects, unmasking of recessive coding region mutations (single nucleotide polymorphisms, SNPs, in coding DNA) or other functional SNPs, and perhaps by effects on transvection.
We report 24 unrelated individuals with deletions and 17 additional cases with duplications at 10q11.21q21.1 identified by chromosomal microarray analysis. The rearrangements range in size from 0.3 to 12 Mb. Nineteen of the deletions and eight duplications are flanked by large, directly oriented segmental duplications of >98% sequence identity, suggesting that nonallelic homologous recombination (NAHR) caused these genomic rearrangements. Nine individuals with deletions and five with duplications have additional copy number changes. Detailed clinical evaluation of 20 patients with deletions revealed variable clinical features, with developmental delay (DD) and/or intellectual disability (ID) as the only features common to a majority of individuals. We suggest that some of the other features present in more than one patient with deletion, including hypotonia, sleep apnea, chronic constipation, gastroesophageal and vesicoureteral refluxes, epilepsy, ataxia, dysphagia, nystagmus, and ptosis may result from deletion of the CHAT gene, encoding choline acetyltransferase, and the SLC18A3 gene, mapping in the first intron of CHAT and encoding vesicular acetylcholine transporter. The phenotypic diversity and presence of the deletion in apparently normal carrier parents suggest that subjects carrying 10q11.21q11.23 deletions may exhibit variable phenotypic expressivity and incomplete penetrance influenced by additional genetic and nongenetic modifiers.
CHAT; SLC18A3; genomic rearrangement; array CGH
Recombination between homologous, but non-allelic, stretches of DNA such as gene families, segmental duplications and repeat elements is an important source of mutation. In humans, recent studies have identified short DNA motifs that both determine the location of 40 per cent of meiotic cross-over hotspots and are significantly enriched at the breakpoints of recurrent non-allelic homologous recombination (NAHR) syndromes. Unexpectedly, the most highly penetrant form of the motif occurs on the background of an inactive repeat element family (THE1 elements) and the motif also has strong recombinogenic activity on currently active element families including Alu and LINE2 elements. Analysis of genetic variation among members of these repeat families indicates an important role for NAHR in their evolution. Given the potential for double-strand breaks within repeat DNA to cause pathological rearrangement, the association between repeats and hotspots is surprising. Here we consider possible explanations for why selection acting against NAHR has not eliminated hotspots from repeat DNA including mechanistic constraints, possible benefits to repeat DNA from recruiting hotspots and rapid evolution of the recombination machinery. I suggest that rapid evolution of hotspot motifs may, surprisingly, tend to favour sequences present in repeat DNA and outline the data required to differentiate between hypotheses.
recombination hotspot; mutation; repeat element
Genomic disorders are often caused by non-allelic homologous recombination between segmental duplications. Chromosome 16 is especially rich in a chromosome-specific low copy repeat, termed LCR16.
Methods and Results:
A bacterial artificial chromosome (BAC) array comparative genome hybridisation (CGH) screen of 1027 patients with mental retardation and/or multiple congenital anomalies (MR/MCA) was performed. The BAC array CGH screen identified five patients with deletions and five with apparently reciprocal duplications of 16p13 covering 1.65 Mb, including 15 RefSeq genes. In addition, three atypical rearrangements overlapping or flanking this region were found. Fine mapping by high-resolution oligonucleotide arrays suggests that these deletions and duplications result from non-allelic homologous recombination (NAHR) between distinct LCR16 subunits with >99% sequence identity. Deletions and duplications were either de novo or inherited from unaffected parents. To determine whether these imbalances are associated with the MR/MCA phenotype or whether they might be benign variants, a population of 2014 normal controls was screened. The absence of deletions in the control population showed that 16p13.11 deletions are significantly associated with MR/MCA (p = 0.0048). Despite phenotypic variability, common features were identified: three patients with deletions presented with MR, microcephaly and epilepsy (two of these had also short stature), and two other deletion carriers ascertained prenatally presented with cleft lip and midline defects. In contrast to its previous association with autism, the duplication seems to be a common variant in the population (5/1682, 0.29%).
These findings indicate that deletions inherited from clinically normal parents are likely to be causal for the patients’ phenotype whereas the role of duplications (de novo or inherited) in the phenotype remains uncertain. This difference in knowledge regarding the clinical relevance of the deletion and the duplication causes a paradigm shift in (cyto)genetic counselling.
Identical sequences with a minimal length of about 300 base pairs (bp) have been involved in the generation of various meiotic/mitotic genomic rearrangements through non-allelic homologous recombination (NAHR) events. Genomic disorders and structural variation, together with gene remodelling processes have been associated with many of these rearrangements. Based on these observations, we identified and integrated all the 100% identical repeats of at least 300 bp in the NCBI version 36.2 human genome reference assembly into non-overlapping regions, thus defining the Identical Repeated Backbone (IRB) of the reference human genome.
The IRB sequences are distributed all over the genome in 66,600 regions, which correspond to ~2% of the total NCBI human genome reference assembly. Important structural and functional elements such as common repeats, segmental duplications, and genes are contained in the IRB. About 80% of the IRB bp overlap with known copy-number variants (CNVs). By analyzing the genes embedded in the IRB, we were able to detect some identical genes not previously included in the Ensembl release 50 annotation of human genes. In addition, we found evidence of IRB gene copy-number polymorphisms in raw sequence reads of two diploid sequenced genomes.
In general, the IRB offers new insight into the complex organization of the identical repeated sequences of the human genome. It provides an accurate map of potential NAHR sites which could be used in targeting the study of novel CNVs, predicting DNA copy-number variation in newly sequenced genomes, and improving genome annotation.
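The integration step described above, in which 100% identical repeats of at least 300 bp are combined into non-overlapping regions, can be sketched as a standard interval merge. This is only an illustration and not the authors' pipeline: it assumes the identical repeat hits have already been found and are given as hypothetical `(chrom, start, end)` tuples in half-open coordinates, and it treats abutting intervals as mergeable.

```python
def merge_repeat_intervals(intervals, min_length=300):
    """Merge repeat hits into non-overlapping regions.

    intervals: iterable of (chrom, start, end) tuples, half-open coordinates
    (an assumed input format). Hits shorter than min_length are discarded
    before merging.
    """
    # Keep only hits that satisfy the length threshold, sorted by position.
    kept = sorted(iv for iv in intervals if iv[2] - iv[1] >= min_length)
    merged = []
    for chrom, start, end in kept:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or abuts the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    hits = [("chr1", 100, 500), ("chr1", 400, 900),
            ("chr1", 2000, 2100), ("chr2", 0, 300)]
    print(merge_repeat_intervals(hits))
```

Sorting first means each hit only has to be compared with the most recently emitted region, so the merge runs in a single pass after the sort.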
Retrotransposons have been suggested to provide a substrate for non-allelic homologous recombination (NAHR) and thereby promote gene family expansion. Their precise role, however, is controversial. Here we ask whether retrotransposons contributed to the recent expansions of the Androgen-binding protein (Abp) gene families that occurred independently in the mouse and rat genomes.
Using dot plot analysis, we found that the most recent duplication in the Abp region of the mouse genome is flanked by L1Md_T elements. Analysis of the sequence of these elements revealed breakpoints that are the relicts of the recombination that caused the duplication, confirming that the duplication arose as a result of NAHR using L1 elements as substrates. L1 and ERVII retrotransposons are considerably denser in the Abp regions than in one Mb flanking regions, while other repeat types are depleted in the Abp regions compared to flanking regions. L1 retrotransposons preferentially accumulated in the Abp gene regions after lineage separation and roughly followed the pattern of Abp gene expansion. By contrast, the proportion of shared vs. lineage-specific ERVII repeats in the Abp region resembles the rest of the genome.
We confirmed the role of L1 repeats in Abp gene duplication with the identification of recombinant L1Md_T elements at the edges of the most recent mouse Abp gene duplication. High densities of L1 and ERVII repeats were found in the Abp gene region with abrupt transitions at the region boundaries, suggesting that their higher densities are tightly associated with Abp gene duplication. We observed that the major accumulation of L1 elements occurred after the split of the mouse and rat lineages and that there is a striking overlap between the timing of L1 accumulation and expansion of the Abp gene family in the mouse genome. Establishing a link between the accumulation of L1 elements and the expansion of the Abp gene family and identification of an NAHR-related breakpoint in the most recent duplication are the main contributions of our study.
House mouse; Gene duplication; Androgen-binding protein; LINE1; ERVII; NAHR
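For readers unfamiliar with the dot plot analysis mentioned above, the idea can be sketched as a shared k-mer comparison: every position pair at which the two sequences share an exact k-mer becomes a dot, and a duplication appears as a run of dots parallel to the main diagonal. This is a toy sketch, not the tool used in the study; the function name and the word size `k` are assumptions.

```python
from collections import defaultdict


def dot_plot_matches(seq_a, seq_b, k=4):
    """Return sorted (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-mer start position in seq_b.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up each k-mer of seq_a in the index.
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    # Comparing a sequence with itself: off-diagonal dots mark repeats.
    seq = "ACGTTTGCAACGTTTGCA"
    off_diagonal = [(i, j) for i, j in dot_plot_matches(seq, seq, k=6) if i != j]
    print(off_diagonal)
```

Real genomic dot plots use longer words and tolerate mismatches; exact matching is used here only to keep the example short.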
We have investigated four ~1.6-Mb microduplications and 55 smaller 350–680-kb microduplications at 15q13.2–q13.3 involving the CHRNA7 gene that were detected by clinical microarray analysis. Applying high-resolution array-CGH, we mapped all 118 chromosomal breakpoints of these microduplications. We also sequenced 26 small microduplication breakpoints that were clustering at hotspots of nonallelic homologous recombination (NAHR). All four large microduplications likely arose by NAHR between BP4 and BP5 LCRs, and 54 small microduplications arose by NAHR between two CHRNA7-LCR copies. We identified two classes of ~1.6-Mb microduplications and five classes of small microduplications differing in duplication size, and show that they duplicate the entire CHRNA7. We propose that size differences among small microduplications result from preexisting heterogeneity of the common BP4–BP5 inversion. Clinical data and family histories of 11 patients with small microduplications involving CHRNA7 suggest that these microduplications might be associated with developmental delay/mental retardation, muscular hypotonia, and a variety of neuropsychiatric disorders. However, we conclude that these microduplications and their associated potential for increased dosage of the CHRNA7-encoded α7 subunit of nicotinic acetylcholine receptors are of uncertain clinical significance at present. Nevertheless, if they prove to have a pathological effect, their high frequency could make them a common risk factor for many neurobehavioral disorders.
microduplication; CHRNA7; NAHR; hypotonia; autism spectrum disorder
Copy number studies have led to an explosion in the discovery of new segmental duplication-mediated deletions and duplications. We have analyzed copy number changes in 2419 patients referred for clinical array comparative genomic hybridization studies. Twenty-three percent of the abnormal copy number changes we found are immediately flanked by segmental duplications ≥10 kb in size and ≥95% identical in direct orientation, consistent with deletions and duplications generated by non-allelic homologous recombination. Here, we describe copy number changes in five previously unreported loci with genomic organization characteristic of NAHR-mediated gains and losses; namely, 2q11.2, 7q36.1, 17q23, 2q13 and 7q11.21. Deletions and duplications of 2q11.2, deletions of 7q36.1 and deletions of 17q23 are interpreted as pathogenic based on their genomic size, gene content, de novo inheritance and absence from control populations. The clinical significance of 2q13 deletions and duplications is still emerging, as these imbalances are also found in phenotypically normal family members and control individuals. Deletion of 7q11.21 is a benign copy number change well represented in control populations and copy number variation databases. Here, we discuss the genetic factors that can modify the phenotypic expression of such gains and losses, which likely play a role in these and other recurrent genomic disorders.
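The flanking criteria stated above (segmental duplications ≥10 kb in size, ≥95% identical, in direct orientation) can be written as a simple filter. This is a hedged sketch rather than the authors' method: the `SegDupPair` record, its field names, and the `max_gap` tolerance are hypothetical, and reading "immediately flanked" as "each breakpoint falls within, or within `max_gap` bp of, one copy of the pair" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegDupPair:
    """A pair of paralogous segmental duplications (hypothetical record)."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int
    identity: float          # fractional sequence identity, 0.0 to 1.0
    direct_orientation: bool


def is_nahr_consistent(cnv_start, cnv_end, pair,
                       min_length=10_000, min_identity=0.95, max_gap=0):
    """Return True if a CNV is flanked by an SD pair meeting the criteria."""
    left_len = pair.left_end - pair.left_start
    right_len = pair.right_end - pair.right_start
    if min(left_len, right_len) < min_length:
        return False
    if pair.identity < min_identity or not pair.direct_orientation:
        return False
    # Assumed meaning of "immediately flanked": each breakpoint lies in,
    # or within max_gap bp of, one copy of the duplication pair.
    left_ok = pair.left_start - max_gap <= cnv_start <= pair.left_end + max_gap
    right_ok = pair.right_start - max_gap <= cnv_end <= pair.right_end + max_gap
    return left_ok and right_ok


if __name__ == "__main__":
    pair = SegDupPair(1_000_000, 1_020_000, 2_000_000, 2_020_000, 0.97, True)
    print(is_nahr_consistent(1_010_000, 2_010_000, pair))
```

A non-zero `max_gap` may be useful in practice because array-based breakpoints are only approximate; the appropriate value depends on probe spacing and is left to the user.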
Several recurrent, constitutional genomic disorders are present on chromosome 22q. These include the translocations and deletions associated with DiGeorge and velocardiofacial syndrome and the translocations that give rise to the recurrent t(11;22) supernumerary der(22) syndrome (Emanuel syndrome). The rearrangement breakpoints on 22q cluster around the chromosome-specific segmental duplications of proximal 22q11, which are involved in the etiology of these disorders. While the deletions are the result of nonallelic homologous recombination (NAHR) between low copy repeats or segmental duplications within 22q11, the t(11;22) is the result of rearrangement between palindromic AT-rich repeats on 11q and 22q. Here we describe the mechanisms responsible for these recurrent rearrangements, discuss the recurrent deletion endpoints that are the result of NAHR between chromosome 22q specific low copy repeats as well as present current diagnostic approaches to deletion detection.
22q11.2 rearrangement mechanisms; segmental duplications; 22q11.2 deletion diagnosis
The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of “ampliconic” repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an ~1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, notably a deletion within haplogroup (hg) C*(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G. Hum Mutat 29(10), 1171–1180, 2008.
Y chromosome; AZFc; microsatellite; deletion; duplication
Genomic rearrangements describe gross DNA changes ranging in size from a couple of hundred base pairs, the size of an average exon, to megabases (Mb). When greater than 3 to 5 Mb, such changes are usually visible microscopically by chromosome studies. Human diseases that result from genomic rearrangements have been called genomic disorders. Three major mechanisms have been proposed for genomic rearrangements in the human genome. Non-allelic homologous recombination (NAHR) is mostly mediated by low-copy repeats (LCRs) with recombination hotspots, gene conversion and apparent minimal efficient processing segments. NAHR accounts for most of the recurrent rearrangements: those that share a common size, show clustering of breakpoints, and recur in multiple individuals. Non-recurrent rearrangements are of different sizes in each patient, but may share a smallest region of overlap whose change in copy number may result in shared clinical features among different patients. LCRs do not mediate, but may stimulate, non-recurrent events. Some rare NAHRs can also be mediated by highly homologous repetitive sequences (for example, Alu, LINE); these NAHRs account for some of the non-recurrent rearrangements. Other non-recurrent rearrangements can be explained by non-homologous end-joining (NHEJ) and the Fork Stalling and Template Switching (FoSTeS) models. These mechanisms occur both in germ cells, where the rearrangements can be associated with genomic disorders, and in somatic cells, in which such genomic rearrangements can cause disorders such as cancer. NAHR, NHEJ and FoSTeS probably account for the majority of genomic rearrangements in our genome and the frequency distribution of the three at a given locus may partially reflect the genomic architecture in proximity to that locus. We provide a review of the current understanding of these three models.
Inverse paralogous low-copy repeats (IP-LCRs) can cause genome instability by nonallelic homologous recombination (NAHR)-mediated balanced inversions. When disrupting a dosage-sensitive gene(s), balanced inversions can lead to abnormal phenotypes. We delineated the genome-wide distribution of IP-LCRs >1 kb in size with >95% sequence identity and mapped the genes, potentially intersected by an inversion, that overlap at least one of the IP-LCRs. Remarkably, our results show that 12.0% of the human genome is potentially susceptible to such inversions and 942 genes, 99 of which are on the X chromosome, are predicted to be disrupted secondary to such an inversion! In addition, IP-LCRs larger than 800 bp with at least 98% sequence identity (duplication/triplication facilitating IP-LCRs, DTIP-LCRs) were recently implicated in the formation of complex genomic rearrangements with a duplication-inverted triplication–duplication (DUP-TRP/INV-DUP) structure by a replication-based mechanism involving a template switch between such inverted repeats. We identified 1,551 DTIP-LCRs that could facilitate DUP-TRP/INV-DUP formation. Remarkably, 1,445 disease-associated genes are at risk of undergoing copy-number gain as they map to genomic intervals susceptible to the formation of DUP-TRP/INV-DUP complex rearrangements. We implicate inverted LCRs as a human genome architectural feature that could potentially be responsible for genomic instability associated with many human disease traits.
segmental duplications; inverted repeats; genomic inversions; MMBIR
Charcot-Marie-Tooth type 1 disease (CMT1) and hereditary neuropathy with liability to pressure palsies (HNPP) are common inherited disorders of the peripheral nervous system. The majority of CMT1 patients have a 1.5 Mb tandem duplication (CMT1A) in chromosome 17p11.2 while most HNPP patients have a deletion of the same 1.5 Mb region. The CMT1A duplication and HNPP deletion are the reciprocal products of an unequal crossing over event between misaligned flanking CMT1A-REP elements. We analysed 162 unrelated CMT1A duplication patients and HNPP deletion patients from 11 different countries for the presence of a recombination hotspot in the CMT1A-REP sequences. A hotspot for unequal crossing over between the misaligned flanking CMT1A-REP elements was observed through the detection of novel junction fragments in 76.9% of 130 unrelated CMT1A patients and in 71.9% of 32 unrelated HNPP patients. This recombination hotspot was also detected in eight out of 10 de novo CMT1A duplication and in two de novo HNPP deletion patients. These data indicate that the hotspot of unequal crossing over occurs in several populations independently of ethnic background and is directly involved in the pathogenesis of CMT1A and HNPP. We conclude that the detection of junction fragments from the CMT1A-REP element on Southern blot analysis is a simple and reliable DNA diagnostic tool for the identification of the CMT1A duplication and HNPP deletion in most patients.
Hotspots regulate the position and frequency of Spo11 (Rec12)-initiated meiotic recombination, but paradoxically they are suicidal and are somehow resurrected elsewhere in the genome. After the DNA sequence-dependent activation of hotspots was discovered in fission yeast, nearly two decades elapsed before the key realizations that (A) DNA site-dependent regulation is broadly conserved and (B) individual eukaryotes have multiple different DNA sequence motifs that activate hotspots. From our perspective, such findings provide a conceptually straightforward solution to the hotspot paradox and can explain other, seemingly complex features of meiotic recombination. We describe how a small number of single-base-pair substitutions can generate hotspots de novo and dramatically alter their distribution in the genome. This model also shows how equilibrium rate kinetics could maintain the presence of hotspots over evolutionary timescales, without strong selective pressures invoked previously, and explains why hotspots localize preferentially to intergenic regions and introns. The model is robust enough to account for all hotspots of humans and chimpanzees repositioned since their divergence from the latest common ancestor.
The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
The human genome contains many loci with a high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than the genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.
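As a rough illustration of what a tenfold enrichment means, the sketch below computes fold enrichment as the fraction of events falling in a region divided by the fraction of the genome that the region covers. The function name and the example counts are hypothetical; the study's actual statistical procedure is not described in the abstract and may differ.

```python
def fold_enrichment(events_in_region, events_total, region_fraction):
    """Observed fraction of events in a region relative to the region's size.

    region_fraction is the share of the genome covered by the region
    (for example 0.01 for a region covering about 1% of the genome).
    """
    if events_total <= 0:
        raise ValueError("events_total must be positive")
    if not 0 < region_fraction <= 1:
        raise ValueError("region_fraction must be in (0, 1]")
    observed_fraction = events_in_region / events_total
    return observed_fraction / region_fraction


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 rearrangements fall in a region
    # covering 1% of the genome.
    print(fold_enrichment(100, 1000, 0.01))
```

With these made-up counts, a region covering 1% of the genome that holds 10% of the events comes out at roughly tenfold enrichment, while a value near 1 would mean no enrichment.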
Genome rearrangements often result from non-allelic homologous recombination (NAHR) between repetitive DNA elements dispersed throughout the genome. Here we systematically analyze NAHR between Ty retrotransposons using a genome-wide approach that exploits unique features of Saccharomyces cerevisiae purebred and Saccharomyces cerevisiae/Saccharomyces bayanus hybrid diploids. We find that DNA double-strand breaks (DSBs) induce NAHR–dependent rearrangements using Ty elements located 12 to 48 kilobases distal to the break site. This break-distal recombination (BDR) occurs frequently, even when allelic recombination can repair the break using the homolog. Robust BDR–dependent NAHR demonstrates that sequences very distal to DSBs can effectively compete with proximal sequences for repair of the break. In addition, our analysis of NAHR partner choice between Ty repeats shows that intrachromosomal Ty partners are preferred despite the abundance of potential interchromosomal Ty partners that share higher sequence identity. This competitive advantage of intrachromosomal Tys results from the relative efficiencies of different NAHR repair pathways. Finally, NAHR generates deleterious rearrangements more frequently when DSBs occur outside rather than within a Ty repeat. These findings yield insights into mechanisms of repeat-mediated genome rearrangements associated with evolution and cancer.
The human genome is structurally dynamic, frequently undergoing loss, duplication, and rearrangement of large chromosome segments. These structural changes occur both in normal and in cancerous cells and are thought to cause both benign and deleterious changes in cell function. Many of these structural alterations are generated when two dispersed repeated DNA sequences at non-allelic sites recombine during non-allelic homologous recombination (NAHR). Here we study NAHR on a genome-wide scale using the experimentally tractable budding yeast as a eukaryotic model genome with its fully sequenced family of repeated DNA elements, the Ty retrotransposons. With our novel system, we simultaneously measure the effects of known recombination parameters on the frequency of NAHR to understand which parameters most influence the occurrence of rearrangements between repetitive sequences. These findings provide a basic framework for interpreting how structural changes observed in the human genome may have arisen.
Non-allelic homologous recombination (NAHR) between segmental duplications in proximal chromosome 15q breakpoint (BP) regions can lead to microdeletions and microduplications. Several individuals with deletions flanked by BP3 and BP4 on 15q13, immediately distal to, and not including the Prader–Willi/Angelman syndrome (PW/AS) critical region and proximal to the BP4–BP5 15q13.3 microdeletion syndrome region, have been reported; however, because the deletion has also been found in normal relatives, the significance of these alterations is unclear. We have identified six individuals with deletions limited to the BP3–BP4 interval and an additional four individuals with deletions of the BP3–BP5 interval from 34 046 samples submitted for clinical testing by microarray-based comparative genomic hybridization (aCGH). Of four individuals with BP3–BP4 deletions for whom parental testing was conducted, two were apparently de novo and two were maternally inherited. A comparison of clinical features, available for five individuals in our study (four with deletions within BP3–BP4 and one with a BP3–BP5 deletion), with those in the literature shows common features of short stature and/or failure to thrive, microcephaly, hypotonia, and premature breast development in some individuals. Although the BP3–BP4 deletion does not yet demonstrate statistically significant enrichment in abnormal populations compared with control populations, the presence of common clinical features among probands and the presence of genes with roles in development and nervous system function in the deletion region suggest that this deletion may have a role in abnormal phenotypes in some individuals.
15q13; segmental duplication; microdeletion; genotype–phenotype
Genomic instability is a feature of the human Xp22.31 region wherein deletions are associated with X-linked ichthyosis, mental retardation and attention deficit hyperactivity disorder. A putative homologous recombination hotspot motif is enriched in low copy repeats that mediate recurrent deletion at this locus. To date, few efforts have focused on copy number gain at Xp22.31. However, clinical testing revealed a high incidence of duplication of Xp22.31 in subjects ascertained and referred with neurobehavioral phenotypes. We systematically studied 61 unrelated subjects with rearrangements revealing gain in copy number, using multiple molecular assays. We detected not only the anticipated recurrent and simple nonrecurrent duplications, but also unexpectedly identified recurrent triplications and other complex rearrangements. Breakpoint analyses enabled us to surmise the mechanisms for many of these rearrangements. The clinical significance of the recurrent duplications and triplications was assessed using different approaches. We cannot find any evidence to support pathogenicity of the Xp22.31 duplication. However, our data suggest that the Xp22.31 duplication may serve as a risk factor for abnormal phenotypes. Our findings highlight the need for more robust Xp22.31 triplication detection, because such further gain may be more penetrant than the duplications. Our findings reveal the distribution of different mechanisms for genomic duplication rearrangements at a given locus, and provide insights into aspects of strand exchange events between paralogous sequences in the human genome.
The RCCX region is a complex, multiallelic, tandem copy number variation (CNV). Two complete genes, complement component 4 (C4) and steroid 21-hydroxylase (CYP21A2, formerly CYP21B), reside in its variable region. RCCX is prone to nonallelic homologous recombination (NAHR) such as unequal crossover, generating duplications and deletions of RCCX modules, and gene conversion. A series of allele-specific long-range polymerase chain reactions coupled to whole-gene sequencing of CYP21A2 was developed for molecular haplotyping. Using these techniques, 35 different CYP21A2 haplotype variants were experimentally determined in 112 unrelated European subjects. Bioinformatic haplotype reconstruction increased the number of resolved CYP21A2 haplotype variants to 61. The CYP21A2 haplotype variants could be assigned to the haplotypic RCCX CNV structures (the copy number of RCCX modules) in most cases. The genealogy network constructed from the CYP21A2 haplotype variants delineated the origin of RCCX structures. The different RCCX structures were located in tight groups. A minority of groups with identical RCCX structure occurred once in the network, implying monophyletic origin, but the majority of groups occurred several times and in different locations, indicating polyphyletic origin. The monophyletic groups were often created by a single unequal crossover, whereas recurrent unequal crossover events generated some of the polyphyletic groups. As a result of recurrent NAHR events, multiple CYP21A2 haplotype variants with different allele patterns belonged to the same RCCX structure. The intraspecific evolution of RCCX CNV described here provides a reasonable expectation for how complex, multiallelic, tandem CNVs evolve in humans.
allele-specific long-range PCR; CNV; genealogy network; nonallelic homologous recombination
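How a single unequal crossover changes the copy number of tandem modules, as described for RCCX above, can be sketched with simple bookkeeping. The function below is an illustrative assumption, not the authors' method: it treats each chromatid as a row of identical modules and assumes the exchange occurs inside module `i` of one chromatid paired with module `j` of the other (both 1-based, counted from the same end).

```python
def unequal_crossover(n_a, n_b, i, j):
    """Module counts of the two recombinant products of one crossover.

    Chromatid A carries n_a tandem modules and chromatid B carries n_b.
    The exchange occurs inside module i of A paired with module j of B.
    Hypothetical bookkeeping model for illustration only.
    """
    if not (1 <= i <= n_a and 1 <= j <= n_b):
        raise ValueError("module index out of range")
    # Product 1: A up to and including the hybrid module, then the rest of B.
    product_1 = i + (n_b - j)
    # Product 2: B up to and including the hybrid module, then the rest of A.
    product_2 = j + (n_a - i)
    return product_1, product_2


if __name__ == "__main__":
    # Two bimodular chromatids, misaligned by one module.
    print(unequal_crossover(2, 2, 1, 2))
    # The same chromatids, correctly aligned.
    print(unequal_crossover(2, 2, 1, 1))
```

In this model a misaligned exchange between two bimodular chromatids yields one monomodular (deleted) and one trimodular (duplicated) product, whereas an aligned exchange leaves both counts unchanged; the total number of modules is conserved in either case.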
DNA double strand breaks (DSBs) in repetitive sequences are a potent source of genomic instability, due to the possibility of non-allelic homologous recombination (NAHR). Repetitive sequences are especially at risk during meiosis, when numerous programmed DSBs are introduced into the genome to initiate meiotic recombination [1]. Within the budding yeast repetitive ribosomal (r)DNA array, meiotic DSB formation is prevented in part through Sir2-dependent heterochromatin [2,3]. Here, we demonstrate that the edges of the rDNA array are exceptionally susceptible to meiotic DSBs, revealing an inherent heterogeneity within the rDNA array. We find that this localised DSB susceptibility necessitates a border-specific protection system consisting of the meiotic ATPase Pch2 and the origin recognition complex subunit Orc1. Upon disruption of these factors, DSB formation and recombination specifically increased in the outermost rDNA repeats, leading to NAHR and rDNA instability. Strikingly, the Sir2-dependent heterochromatin of the rDNA itself was responsible for the induction of DSBs at the rDNA borders in pch2Δ cells. Thus, while Sir2 activity globally prevents meiotic DSBs within the rDNA, it creates a highly permissive environment for DSB formation at the heterochromatin/euchromatin junctions. Heterochromatinised repetitive DNA arrays are abundantly present in most eukaryotic genomes. Our data define the borders of such chromatin domains as distinct high-risk regions for meiotic NAHR, whose protection may be a universal requirement to prevent meiotic genome rearrangements associated with genomic diseases and birth defects.
Atrioventricular septal defects (AVSDs) are a frequent but not universal component of Down syndrome (DS), while AVSDs in otherwise normal individuals have no well-defined genetic basis. The contribution of copy number variation (CNV) to specific congenital heart disease (CHD) phenotypes including AVSD is unknown. We hypothesized that de novo CNVs on chromosome 21 might cause isolated sporadic AVSDs, and separately that CNVs throughout the genome might constitute an additional genetic risk factor for AVSD in patients with DS. We utilized custom oligonucleotide arrays targeted to CNV hotspots that are flanked by large duplicated segments of high sequence identity. We assayed 29 euploid and 50 DS individuals with AVSD, and compared them with general population controls. In patients with isolated sporadic AVSD we identified two large unique deletions outside of chromosome 21 not seen in the expanded set of 8,635 controls, each overlapping with larger deletions associated with similar CHD reported in the DECIPHER database. There was a small duplication in one patient with DS and AVSD. We conclude that isolated sporadic AVSDs may occasionally be associated with large de novo genomic structural variation outside of chromosome 21. The absence of CNVs on chromosome 21 in patients with isolated sporadic AVSD suggests that sub-chromosomal duplications or deletions of greater than 150 kbp on chromosome 21 do not cause sporadic AVSDs. Large CNVs do not appear to be an additive risk factor for AVSD in the DS population.
Down syndrome; atrioventricular septal defects; copy number variation; array CGH; congenital heart disease