Genomic disorders are diseases caused by recurrent meiotic chromosomal rearrangements involving unstable genomic architectures 1. Most frequently these involve non-allelic homologous recombination (NAHR) between highly similar duplicated sequences. NAHR between duplicated sequences in direct orientation results in deletion and duplication of intervening sequences, whereas NAHR between duplicated sequences in inverted orientation results in inversions. The predominant pathogenic mechanism for the genomic disorders associated with deletions and duplications is altered copy number of dosage-sensitive genes 2. In addition to its role in genomic disorders, NAHR is one of the major mechanisms contributing to non-pathogenic structural variation in the human genome 3. The breakpoints of rearrangements caused by NAHR have been shown to cluster in defined hotspots within duplicated sequences, in a manner akin to that of allelic recombination hotspots 4.
A simple model of NAHR suggests that recombination between duplicated sequences (paralogues) can take place in one of three ways: between paralogues on the same chromatid (intra-chromatid, rate α), on sister chromatids (inter-chromatid, rate β) or on homologous chromosomes (inter-chromosomal, rate γ). Only in the latter two cases are deletion and duplication reciprocal products of NAHR; intra-chromatid NAHR yields a deletion without a reciprocal duplication. According to this model, the relative rates of deletion and duplication are determined by the contribution that intra-chromatid NAHR makes to the overall frequency of meiotic NAHR, and the rate of duplication (β+γ) should never exceed the rate of deletion (α+β+γ).
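The bookkeeping of this model can be written out as a minimal sketch. The class name `NahrRates` and the example values are hypothetical and chosen only for illustration; they are not measurements from this study. Here α, β and γ denote the intra-chromatid, inter-chromatid and inter-chromosomal NAHR rates, respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NahrRates:
    """Per-gamete NAHR rates under the simple three-pathway model."""
    alpha: float  # intra-chromatid NAHR: produces a deletion only
    beta: float   # inter-chromatid (sister chromatid) NAHR: deletion + duplication
    gamma: float  # inter-chromosomal NAHR: deletion + duplication

    @property
    def deletion(self) -> float:
        # Every pathway can generate a deletion.
        return self.alpha + self.beta + self.gamma

    @property
    def duplication(self) -> float:
        # Only the two reciprocal pathways generate a duplication.
        return self.beta + self.gamma

    @property
    def deletion_to_duplication(self) -> float:
        return self.deletion / self.duplication


# Hypothetical rates, for illustration only.
example = NahrRates(alpha=1.0, beta=0.5, gamma=0.5)
```

Because α cannot be negative, `duplication` can never exceed `deletion` in this model, and a deletion-to-duplication ratio above 2 is equivalent to α > β + γ.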
For several loci that undergo NAHR, reciprocal deletion and duplication disorders are known, including hereditary neuropathy with liability to pressure palsies (HNPP, deletion) and Charcot-Marie-Tooth disease Type 1A (CMT1A, duplication) 5; Smith-Magenis Syndrome (SMS) and Potocki-Lupski Syndrome 6; Prader-Willi Syndrome/Angelman Syndrome and 15q11q13 duplication 7; Velocardiofacial Syndrome and dup22(q11.2q11.2) 8; and Williams-Beuren Syndrome (WBS) and dup7(q11.23) 9. The rates at which these rearrangements occur during meiosis have been estimated from the prevalence of resultant dominant disease phenotypes. However, not all NAHR-induced pathogenic deletions have yet been associated with pathogenic reciprocal duplications 1. There are several possible explanations for this: the duplication might have a milder, diverse or negligible phenotypic impact 2, which hinders clinical ascertainment; it might be embryonically lethal and thus never observed in the population; or it might simply occur much less frequently than its reciprocal deletion.
To address this question by directly comparing rates of deletion and duplication, we developed breakpoint-specific real-time PCR assays to measure NAHR activity in the male germline at four known hotspots. These hotspots lie within: the WBS-LCRs (Low Copy Repeats) that sponsor the WBS deletion and dup7(q11.23) duplication 10; the AZFa-HERVs (Human Endogenous RetroVirus) that sponsor the AZFa deletion that causes male infertility and its reciprocal, apparently asymptomatic, duplication 11,12; the CMT1A-REPs that sponsor the HNPP deletion and CMT1A duplication 13; and the LCR17p repeats that sponsor uncommon recurrent SMS deletions 14 (genomic regions are shown schematically in Supplementary Figure 1).
Haplotyping pools of sperm genomes allows detailed investigation of allelic 15 and non-allelic 16 homologous recombination by permitting the analysis of large numbers of meioses. We adapted real-time PCR amplification of recombinant alleles 17 to measure rates of NAHR. Applying nested breakpoint-specific PCR to pools of genomes from either blood or sperm, we determined the rates of meiotic and mitotic deletion and duplication at four NAHR hotspots. Each of the eight rearrangements was assayed in five unrelated sperm donors. All deletion and duplication events were specific to sperm. This observation affirms the specificity of our assay and the absence of mis-priming and jumping PCR artefacts. Moreover, our rate estimate for duplication at the CMT1A-REP hotspot is in complete agreement with a previous sperm-based estimate for this rearrangement by a different laboratory using a different assay 16.
Assaying NAHR using real-time PCR
Male meiotic rates of deletions and duplications caused by NAHR
NAHR rates differed significantly among the four hotspots, in ordinal agreement with rates estimated from disease prevalence. The lowest NAHR rate occurs between the repeats that are furthest apart, consistent with the hypothesis that the highest rates of NAHR occur at the closest repeats with the greatest homology 18. However, NAHR rates at the CMT1A-REP hotspot are significantly higher than at the WBS-LCR hotspot, even though the latter hotspot is embedded in longer duplicated sequences that share greater homology and the separation between duplicated sequences is similar at the two loci. This suggests that other factors, such as local recombinatorial activity, are also important in determining rates of NAHR 19.
We showed that both duplication and deletion occur at each hotspot, with deletion occurring at the higher rate. We observed similar two-fold ratios of deletion to duplication at the three autosomal NAHR hotspots, and confirmed in control experiments that these differences cannot be attributed to differential amplification efficiency between deletion and duplication assays (Methods and Supplementary Figure 2).
Previous studies suggested variation in mutation rate between individuals, both for the WBS deletion 20 and CMT1A duplication 16. We did not observe any striking differences in NAHR rates among the five unrelated donors analysed for each hotspot, although for some rearrangements different individuals do appear to have subtle, but significant, differences in NAHR rates. We performed genome-wide array-CGH on eight of the nine sperm donors analysed in this study to examine whether structural variation might account for these subtle differences in rate. We observed no large-scale structural variants in the regions assayed in these individuals (Supplementary Figure 3).
By scaling for NAHR events outside of the assayed hotspot and for the ratio of paternal and maternal NAHR, we compared our estimated rates of de novo WBS deletions and CMT1A duplications with those estimated from disease prevalence. If we scale an averaged sperm-based estimate of the WBS deletion rate in this manner, we estimate a population prevalence of de novo rearrangements of 1/7,000 to 1/47,000 (Supplementary Note); this range encompasses the disease-based estimates of the WBS deletion prevalence of 1/7,500 to 1/25,000 10. Our averaged sperm-based estimates of the CMT1A duplication rate yield a population prevalence of 1/23,000 to 1/79,000, which similarly encompasses the disease-based estimate of 1/23,000 to 1/41,000 (Supplementary Note).
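The comparison of prevalence ranges quoted above can be checked with a short sketch. The helper names `as_interval` and `encompasses` are hypothetical; only the range endpoints come from the text, and the scaling that produced the sperm-based ranges is described in the Supplementary Note and is not reproduced here.

```python
from fractions import Fraction


def as_interval(n_common: int, n_rare: int) -> tuple[Fraction, Fraction]:
    """Turn a '1/n_common to 1/n_rare' prevalence range into (low, high)."""
    low, high = sorted((Fraction(1, n_common), Fraction(1, n_rare)))
    return low, high


def encompasses(outer: tuple[Fraction, Fraction],
                inner: tuple[Fraction, Fraction]) -> bool:
    """True if the outer range contains the inner range (endpoints inclusive)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Ranges quoted in the text.
wbs_sperm = as_interval(7_000, 47_000)
wbs_disease = as_interval(7_500, 25_000)
cmt1a_sperm = as_interval(23_000, 79_000)
cmt1a_disease = as_interval(23_000, 41_000)

wbs_ok = encompasses(wbs_sperm, wbs_disease)
cmt1a_ok = encompasses(cmt1a_sperm, cmt1a_disease)
```

Both checks hold for the quoted figures; for CMT1A the upper endpoints coincide at 1/23,000, so the containment is inclusive rather than strict.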
We sequenced 96 breakpoints for each of the eight rearrangements from a single sperm donor to confirm and fully characterise the breakpoints. These sequences confirm the recombinant nature of the amplified sequences, and exclude the possibility that gene conversion of primer-binding sites contributed to our estimates of deletion and duplication. The breakpoint sequences revealed NAHR activity profiles for each NAHR hotspot, the resolution of which was governed by the location of paralogous sequence variants (PSVs) and varied between hotspots. At the CMT1A-REP hotspot, which has the most informative distribution of PSVs, NAHR was more frequent towards the centre of each hotspot, as has been reported previously 16,21. Comparable profiles have been observed for reciprocal events at allelic homologous recombination hotspots 17,22. The NAHR profiles for reciprocal deletion and duplication events do not differ significantly at two hotspots (CMT1A-REPs: p=0.09; LCR17p: p=0.11; Fisher's Exact Test) but differ significantly at the other two (AZFa-HERV: p<0.0001; WBS-LCR: p<0.001), which is suggestive of reciprocal crossover asymmetry due to unequal efficiencies of crossover initiation in the two paralogous sequences 23. We observed short tracts of patchy gene conversion in six of the 768 breakpoint sequences (Supplementary Table 1), five of which occurred at a common location in the CMT1A-REP hotspot, at which gene conversion has previously been reported 16,19.
By comparing the rates of deletion and duplication we can estimate the rate of intra-chromatidal NAHR (α) and, by factoring in information from patient studies about the frequency with which de novo rearrangements are accompanied by the recombination of flanking markers, we can solve simultaneous equations to estimate rates of inter-chromatidal (β) and inter-chromosomal (γ) NAHR (Supplementary Note). Intra-chromatidal NAHR predominated at all NAHR hotspots, as evidenced by the more than two-fold greater rate of deletion than duplication at every locus: a deletion rate (α + β + γ) more than twice the duplication rate (β + γ) implies α > β + γ. For the CMT1A duplication we estimated α, β and γ to be 2.47×10⁻⁵ , respectively. In other words, the rate of inter-chromosomal NAHR (γ) is estimated to be 50-fold higher than the rate of inter-chromatidal NAHR (β). Preliminary estimates for WBS deletions 10 also suggest that inter-chromatidal NAHR (β) at the WBS-LCR hotspot is much less frequent than either intra-chromatidal (α) or inter-chromosomal (γ) NAHR (Supplementary Note).
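One way the simultaneous equations might be set up is sketched below. This is an assumption about the form of the calculation, not a reproduction of the Supplementary Note: it takes the deletion rate as α + β + γ, the duplication rate as β + γ, and assumes that the fraction of reciprocal events accompanied by recombination of flanking markers equals γ / (β + γ). The function name and the example inputs are hypothetical.

```python
def solve_nahr_components(deletion_rate: float,
                          duplication_rate: float,
                          interchromosomal_fraction: float) -> tuple[float, float, float]:
    """Return (alpha, beta, gamma) from two measured rates and one fraction.

    Assumed model (hypothetical, see lead-in):
        deletion_rate             = alpha + beta + gamma
        duplication_rate          = beta + gamma
        interchromosomal_fraction = gamma / (beta + gamma)
    """
    if not 0.0 <= interchromosomal_fraction <= 1.0:
        raise ValueError("interchromosomal_fraction must lie in [0, 1]")
    if duplication_rate > deletion_rate:
        raise ValueError("the model requires duplication_rate <= deletion_rate")
    alpha = deletion_rate - duplication_rate
    gamma = interchromosomal_fraction * duplication_rate
    beta = duplication_rate - gamma
    return alpha, beta, gamma


# Illustrative inputs in arbitrary units, not values from this study:
# a 3:1 deletion-to-duplication ratio and 50 of every 51 reciprocal
# events showing recombined flanking markers.
alpha, beta, gamma = solve_nahr_components(3.0, 1.0, 50 / 51)
```

With these illustrative inputs γ comes out 50 times larger than β, which mirrors the kind of relationship described in the text without using the study's actual rates.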
The frequency of intra-chromosomal and inter-chromosomal NAHR at the CMT1A-REP hotspot can be estimated directly for one of our sperm donors, by determining haplotypes of pairs of heterozygous SNPs in breakpoint sequences. Two of the four possible haplotypes would be generated by inter-chromatidal NAHR, and the other two by inter-chromosomal NAHR. We observed only the inter-chromosomal NAHR haplotypes among the nineteen informative duplication breakpoint sequences (p<0.0001; chi-square test; null hypothesis of random haplotype formation), which corroborates our estimates of β and γ above. By contrast, all four possible haplotypes of (different) informative SNPs were observed in informative breakpoint sequences of the reciprocal deletion. The absence of haplotype bias (p=0.23) is consistent with inter-chromosomal (γ) and intra-chromosomal (α) deletion events occurring at similar rates.
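The duplication haplotype result can be illustrated with a small chi-square goodness-of-fit sketch. The layout of the original test is not given in the text, so the version below makes an assumption: the four haplotypes are collapsed into two classes (inter-chromatidal and inter-chromosomal) that are equally likely under the null hypothesis. The function name is hypothetical; only the counts (0 and 19) come from the text.

```python
import math


def chi_square_two_classes(observed_a: int, observed_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against equal expected counts.

    Returns (statistic, p_value). With two classes there is one degree of
    freedom, for which the chi-square survival function is erfc(sqrt(x / 2)).
    """
    total = observed_a + observed_b
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2
    statistic = ((observed_a - expected) ** 2
                 + (observed_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# 0 inter-chromatidal and 19 inter-chromosomal haplotypes, as reported.
statistic, p_value = chi_square_two_classes(0, 19)
```

Under these assumptions the statistic is 19 on one degree of freedom and the p-value falls below 0.0001, in line with the reported significance. With expected counts this small, an exact binomial test would be a reasonable alternative.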
Haplotypes of NAHR breakpoints at CMT1A-REP
The rate of deletion relative to duplication (4.11:1) is highest at the haploid AZFa-HERV hotspot, which cannot undergo inter-chromosomal NAHR. Nevertheless, the relative rate of inter-chromatid NAHR is much higher at this hotspot than at the CMT1A-REP and WBS-LCR hotspots, which suggests that inter-chromatid NAHR might be suppressed at autosomal loci, although this hypothesis needs further testing across many loci.
In conclusion, sperm-based assays support the simple model for NAHR elaborated above, and allow the relative contribution of different NAHR events to be dissected. Our results refute the common misconception that because deletion and duplication caused by NAHR can be reciprocal events, their rates are necessarily equal. Deletions and duplications are approximately equally frequent in apparently healthy individuals in a recent survey of copy number variation 3, which, given our finding that the rate of deletions caused by NAHR outstrips the rate of duplications by at least a factor of two, suggests either that other mechanisms generating copy number variation are biased towards duplications, or that selection removes deletions from the population more efficiently than duplications. These sperm-based assays are potentially of great value in guiding clinical diagnosis (see below), but they require precise knowledge of NAHR locations, and so we encourage researchers to fine-map breakpoints of pathogenic rearrangements.
Our results have important implications for clinical diagnosis of genomic disorders. First, we detected the reciprocal duplication for the LCR17p deletion 14, which has not been previously reported. This LCR17p duplication should entirely encompass the duplication causing Potocki-Lupski syndrome, and so it should be expected that the larger duplication would also be characterised by developmental delay, cognitive impairment and autistic features 6,24. Second, the ratio at which the dominant disorders resulting from reciprocal deletion and duplication at the CMT1A-REPs are diagnosed is 1:4 (Jim Lupski, personal communication). Our results suggest that it should be 2:1, and that HNPP, which confers a relatively mild and variable phenotype 25, is being greatly under-diagnosed 26. Third, there are many hundreds of WBS deletions reported in the literature 27, but only four reciprocal duplications 9,28-30. Our results suggest that the syndrome resulting from these duplications remains undiagnosed in the majority of cases. These observations argue strongly that methods that allow these genomic imbalances to be detected directly (e.g. array comparative genomic hybridisation) should be applied more widely in the clinic, so as to facilitate appropriate genetic counselling, improve diagnostic criteria, and increase our understanding of genotype-phenotype relationships.
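The scale of the HNPP under-diagnosis implied by the second point can be put into numbers with a brief sketch. It rests on strong simplifying assumptions that are not claims of the study: that CMT1A duplications are fully ascertained, that the sperm-based 2:1 ratio carries over to the population, and that maternal events and selection can be ignored. The function name is hypothetical.

```python
from fractions import Fraction


def relative_ascertainment(observed_del: int, observed_dup: int,
                           expected_del: int, expected_dup: int) -> Fraction:
    """Observed deletion:duplication ratio as a fraction of the expected ratio."""
    observed = Fraction(observed_del, observed_dup)
    expected = Fraction(expected_del, expected_dup)
    return observed / expected


# Diagnosed HNPP:CMT1A ratio of 1:4 against an expected ratio of 2:1.
hnpp_fraction_diagnosed = relative_ascertainment(1, 4, 2, 1)
```

Under these assumptions only about one in eight of the expected HNPP deletions would be diagnosed. The true figure depends on how completely CMT1A itself is ascertained.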
Our findings underscore the rationale for pursuing similar detailed analyses for all genomic disorders, especially those for which inter-individual variation in rates is suspected, such as Sotos Syndrome. Moreover, we see great value in expanding these assays to more donors (including patients with genome instability disorders), more tissues, other species and more loci, so as to characterise fully the genetic and environmental determinants of chromosomal rearrangement rates.