Meiotic recombination between highly similar duplicated sequences (non-allelic homologous recombination, NAHR) generates deletions, duplications, inversions, and translocations, and is responsible for genetic diseases known as ‘genomic disorders’, most of which are caused by altered copy number of dosage-sensitive genes. NAHR hotspots have been identified within some duplicated sequences. We have developed sperm-based assays to measure the de novo rate of reciprocal deletions and duplications at 4 NAHR hotspots. We used these assays to dissect the relative rates of NAHR between different pairs of duplicated sequences. We show that: (i) these NAHR hotspots are specific to meiosis, (ii) deletions are generated at a higher rate than their reciprocal duplications in the male germline and (iii) some of these genomic disorders are likely to have been under-ascertained clinically, most notably the duplication of 7q11, the reciprocal of the Williams-Beuren Syndrome deletion.
Genomic disorders are diseases caused by recurrent meiotic chromosomal rearrangements involving unstable genomic architectures 1. Most frequently these involve non-allelic homologous recombination (NAHR) between highly similar duplicated sequences. NAHR between duplicated sequences in direct orientation results in deletion and duplication of intervening sequences, whereas inversions result from NAHR between duplicated sequences in inverted orientation. The predominant pathogenic mechanism for the genomic disorders associated with deletions and duplications is altered copy number of dosage-sensitive genes 2. In addition to its role in genomic disorders, NAHR is one of the major mechanisms contributing to non-pathogenic structural variation in the human genome 3. The breakpoints of rearrangements caused by NAHR have been shown to cluster in defined hotspots within duplicated sequences, in a manner akin to that of allelic recombination hotspots 4.
A simple model of NAHR suggests that recombination between duplicated sequences (paralogues) can take place in one of three ways, between paralogues on the same chromatid, on sister chromatids or on the homologous chromosome (Figure 1). In only the latter two cases are deletion and duplication reciprocal products of NAHR. According to this model of NAHR, the relative rates of deletion and duplication will be determined by the relative contribution that intra-chromatid NAHR makes to the overall frequency of meiotic NAHR, and the rate of duplication (β+γ) should never exceed the rate of deletion (α+β+γ).
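The relationship between the three pathways and the two observable products can be written out as a minimal sketch. The function name and the example values below are illustrative assumptions and are not measurements from this study; only the two sums, α+β+γ for deletion and β+γ for duplication, come from the model described above.

```python
from typing import Tuple


def nahr_product_rates(alpha: float, beta: float, gamma: float) -> Tuple[float, float]:
    """Return (deletion_rate, duplication_rate) under the simple NAHR model.

    alpha: intra-chromatid NAHR rate (yields a deletion only)
    beta:  inter-chromatid (sister chromatid) NAHR rate (deletion and duplication)
    gamma: inter-chromosomal NAHR rate (deletion and duplication)
    """
    for name, rate in (("alpha", alpha), ("beta", beta), ("gamma", gamma)):
        if rate < 0:
            raise ValueError(f"{name} must be non-negative")
    deletion_rate = alpha + beta + gamma   # every pathway can produce a deletion
    duplication_rate = beta + gamma        # only the reciprocal pathways duplicate
    return deletion_rate, duplication_rate


# Illustrative values only (hypothetical, chosen for easy arithmetic).
deletion, duplication = nahr_product_rates(alpha=2e-5, beta=1e-6, gamma=9e-6)
print(deletion >= duplication)  # duplication can never exceed deletion in this model
```

Because α only ever adds to the deletion rate, any excess of deletions over duplications is attributed to intra-chromatid NAHR in this model.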
For several loci that undergo NAHR, reciprocal deletion and duplication disorders are known, including hereditary neuropathy with liability to pressure palsies (HNPP, deletion) and Charcot-Marie-Tooth disease Type 1A (CMT1A, duplication) 5, Smith-Magenis Syndrome (SMS) and Potocki-Lupski Syndrome 6, Prader-Willi Syndrome/Angelman Syndrome and 15q11q13 duplication 7, Velocardiofacial Syndrome and dup22(q11.2q11.2) 8 and Williams-Beuren Syndrome (WBS) and dup7(q11.23) 9. The rates at which these rearrangements occur during meiosis have been estimated from the prevalence of resultant dominant disease phenotypes. However, not all NAHR-induced pathogenic deletions have yet been associated with pathogenic reciprocal duplications 1. There are several possible explanations for this: the duplication might have a milder, diverse or negligible phenotypic impact 2, which hinders clinical ascertainment; be embryonically lethal and thus never observed in the population; or simply occur much less frequently than its reciprocal deletion.
To address this question by directly comparing rates of deletion and duplication, we developed breakpoint-specific real-time PCR assays to measure NAHR activity in the male germline at 4 known hotspots. These hotspots lie within: the WBS-LCRs (Low Copy Repeats) that sponsor the WBS deletion and dup7(q11.23) duplication 10; the AZFa-HERVs (Human Endogenous RetroVirus) that sponsor the AZFa deletion that causes male infertility and its reciprocal, apparently asymptomatic, duplication 11,12; the CMT1A-REPs that sponsor the HNPP deletion and CMT1A duplication 13; and the LCR17p repeats that sponsor uncommon recurrent SMS deletions 14 (genomic regions are shown schematically in Supplementary Figure 1).
Haplotyping pools of sperm genomes allows detailed investigation of allelic 15 and non-allelic 16 homologous recombination by permitting the analysis of large numbers of meioses. We adapted real-time PCR amplification of recombinant alleles 17 to measuring rates of NAHR. Applying nested breakpoint-specific PCR to pools of genomes from either blood or sperm, we determined the rate of meiotic and mitotic deletion and duplication at four NAHR hotspots (Figure 2). Each of the 8 rearrangements was assayed in five unrelated sperm donors (Figure 3). We showed that all deletion and duplication events were specific to sperm. This observation affirms the specificity of our assay, and the absence of mis-priming and jumping PCR artefacts. Moreover, our rate estimate for duplication at the CMT1A-REP hotspot is in complete agreement with a previous sperm-based estimate for this rearrangement by a different laboratory using a different assay 16.
NAHR rates differed significantly among the four hotspots, in ordinal agreement with rates estimated from disease prevalence. The lowest NAHR rate occurs between the repeats that are furthest apart, consistent with the hypothesis that the highest rates of NAHR occur at the closest repeats with the greatest homology 18. However, NAHR rates at the CMT1A-REP hotspot are significantly higher than at the WBS-LCR hotspot despite the latter hotspot being embedded in longer duplicated sequences sharing greater homology, and similar separation between duplicated sequences at the two loci (Table 1). This suggests other factors, such as local recombinatorial activity, are also important in determining rates of NAHR 19.
We showed that both duplication and deletion are occurring at each hotspot, with deletion occurring at the higher rate (Table 1). We observed similar two-fold ratios of deletion to duplication at the three autosomal NAHR hotspots, and confirmed in control experiments that these differences cannot be attributed to differential amplification efficiency between deletion and duplication assays (Methods and Supplementary Figure 2).
Previous studies suggested variation in mutation rate between individuals, both for the WBS deletion 20 and CMT1A duplication 16. We did not observe any striking differences in NAHR rates among the 5 unrelated donors analysed for each hotspot, although for some rearrangements different individuals do appear to have subtle, but significant, differences in NAHR rates. We performed genome-wide array-CGH on 8 out of the 9 sperm donors analysed in this study to examine whether structural variation might account for these subtle differences in rate. We observed no large-scale structural variants in the regions assayed in these individuals (Supplementary Figure 3).
By scaling for NAHR events outside of the assayed hotspot and for the ratio of paternal and maternal NAHR, we compared our estimated rates of de novo WBS deletions and CMT1A duplications with those estimated from disease prevalence. If we scale an averaged sperm-based estimate of the WBS deletion rate in this manner we estimate a population prevalence of de novo rearrangements of 1/7,000 to 1/47,000 (Supplementary Note); this range encompasses the disease-based estimates of the WBS deletion prevalence of 1/7,500 to 1/25,000 10. Our averaged sperm-based estimates of the CMT1A duplication rate yield a population prevalence of 1/23,000 to 1/79,000, which similarly encompasses the disease-based estimate of 1/23,000 to 1/41,000 (Supplementary Note).
We sequenced 96 breakpoints for each of the 8 rearrangements from a single sperm donor to confirm and fully characterise the breakpoints (Figure 4). These sequences confirm the recombinant nature of the amplified sequences, and exclude the possibility of gene conversion of primer-binding sites contributing to our estimates of deletion and duplication. These breakpoint sequences revealed NAHR activity profiles for each NAHR hotspot, the resolution of which was governed by the location of paralogous sequence variants (PSVs) and varied between hotspots. At the CMT1A-REP hotspot, which has the most informative distribution of PSVs, NAHR was more frequent towards the centre of each hotspot (Figure 4), as has been reported previously 16,21. Comparable profiles have been observed for reciprocal events at allelic homologous recombination hotspots 17,22. The NAHR profiles for reciprocal deletion and duplication events are highly similar at two hotspots (CMT1A-REPs: p=0.09; LCR17p: p=0.11; Fisher's Exact Test) but differ significantly at the other two (AZFa-HERV: p<0.0001; WBS-LCR: p<0.001), which is suggestive of reciprocal crossover asymmetry due to unequal efficiencies of crossover initiation in the two paralogous sequences 23. We observed short tracts of patchy gene conversion in six out of the 768 breakpoint sequences (Figure 4 and Supplementary Table 1), five of which occurred at a common location in the CMT1A-REP hotspot, at which gene conversion has previously been reported 16,19.
By comparing the rates of deletion and duplication we can estimate the rate of intra-chromatidal NAHR (α) and, by factoring in information from patient studies about the frequency with which de novo rearrangements are accompanied by the recombination of flanking markers, we can solve simultaneous equations to estimate rates of inter-chromatidal (β) and inter-chromosomal (γ) NAHR (Supplementary Note). Intra-chromatidal NAHR predominated at all NAHR hotspots, as evidenced by the more than two-fold greater rate of deletions than duplications at every locus (α > β + γ). For the CMT1A duplication we estimated α, β and γ to be 2.47×10⁻⁵, 3.46×10⁻⁷ and 1.7×10⁻⁵, respectively. In other words, the rate of inter-chromosomal NAHR (γ) is estimated to be approximately 50-fold higher than the rate of inter-chromatidal NAHR (β). Preliminary estimates for WBS deletions 10 also suggest that inter-chromatidal NAHR (β) at the WBS-LCR hotspot is much less frequent than either intra-chromatidal (α) or inter-chromosomal (γ) NAHR (Supplementary Note).
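The decomposition described above can be illustrated with a short sketch. The exact simultaneous equations are given only in the Supplementary Note, so the form used here is an assumption: the deletion rate is taken as α+β+γ, the duplication rate as β+γ, and the patient-derived information is reduced to a single hypothetical parameter, the fraction of duplications arising by inter-chromosomal exchange. The function and parameter names are likewise illustrative.

```python
from typing import Tuple


def decompose_nahr_rates(
    deletion_rate: float,
    duplication_rate: float,
    frac_interchromosomal: float,
) -> Tuple[float, float, float]:
    """Return (alpha, beta, gamma) under the assumed simple model.

    deletion_rate         = alpha + beta + gamma
    duplication_rate      = beta + gamma
    frac_interchromosomal = gamma / (beta + gamma)  (assumed input from patient studies)
    """
    if not 0.0 <= frac_interchromosomal <= 1.0:
        raise ValueError("frac_interchromosomal must lie in [0, 1]")
    if deletion_rate < duplication_rate:
        raise ValueError("the model requires deletion_rate >= duplication_rate")
    alpha = deletion_rate - duplication_rate          # deletion-only pathway
    gamma = frac_interchromosomal * duplication_rate  # inter-chromosomal share
    beta = duplication_rate - gamma                   # remaining inter-chromatid share
    return alpha, beta, gamma


# Consistency check against the CMT1A-REP values reported in the text.
alpha_r, beta_r, gamma_r = 2.47e-5, 3.46e-7, 1.7e-5
deletion = alpha_r + beta_r + gamma_r
duplication = beta_r + gamma_r
alpha, beta, gamma = decompose_nahr_rates(deletion, duplication, gamma_r / duplication)
print(deletion / duplication)  # deletion:duplication ratio implied by the reported rates
print(gamma / beta)            # fold excess of inter-chromosomal over inter-chromatid NAHR
```

With the reported values, the implied deletion to duplication ratio is a little above two and γ/β is close to 50, in line with the statements in the text.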
The frequency of intra-chromosomal and inter-chromosomal NAHR at the CMT1A-REP hotspot can be estimated directly for one of our sperm donors, by determining haplotypes of pairs of heterozygous SNPs in breakpoint sequences (Figure 5). Two of the four possible haplotypes would be generated by inter-chromatidal NAHR, and the other two by inter-chromosomal NAHR. We observed only the inter-chromosomal NAHR haplotypes among the nineteen informative duplication breakpoint sequences (p<0.0001; chi-square test; null hypothesis of random haplotype formation), which corroborates our estimates of β and γ above. By contrast, all four possible haplotypes of (different) informative SNPs were observed in informative breakpoint sequences of the reciprocal deletion. The absence of haplotype biases (p=0.23) confirms that inter-chromosomal (γ) and intra-chromosomal (α) deletion events occur at similar rates.
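The duplication haplotype test can be approximated with a short worked example. The exact tabulation used for the chi-square test is not stated, so this sketch assumes that the four haplotypes are grouped into two equally likely classes (inter-chromatidal and inter-chromosomal) under the null hypothesis, giving a goodness-of-fit test with one degree of freedom. The function name is hypothetical; the counts of 19 and 0 come from the text.

```python
import math
from typing import Tuple


def chi_square_two_classes(count_a: int, count_b: int) -> Tuple[float, float]:
    """Goodness-of-fit test of two observed counts against a 1:1 expectation.

    Returns (chi_square_statistic, p_value) for one degree of freedom.
    """
    total = count_a + count_b
    if total <= 0 or count_a < 0 or count_b < 0:
        raise ValueError("counts must be non-negative and sum to a positive total")
    expected = total / 2.0
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# 19 inter-chromosomal haplotypes and 0 inter-chromatidal haplotypes (from the text).
statistic, p_value = chi_square_two_classes(19, 0)
print(statistic, p_value < 0.0001)
```

Under this assumed grouping the p-value falls below the 0.0001 threshold quoted above, although the authors' own calculation may have been set up differently.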
The rate of deletion relative to duplication (4.11:1) is highest at the haploid AZFa-HERV hotspot, which cannot undergo inter-chromosomal NAHR. Nevertheless, the relative rate of inter-chromatid NAHR is much higher than at CMT1A-REP and WBS-LCR NAHR hotspots, which suggests that it might be suppressed at autosomal loci, although this hypothesis needs further testing across many loci.
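The comparison between the haploid and autosomal hotspots follows from simple arithmetic, sketched here under the model above. With γ = 0 at AZFa-HERV, the deletion to duplication ratio is (α+β)/β, so α/β equals the ratio minus one. The function name is an illustrative assumption; the 4.11:1 ratio and the CMT1A-REP estimates of α and β are taken from the text.

```python
def alpha_to_beta_ratio_haploid(deletion_to_duplication: float) -> float:
    """Return alpha/beta at a haploid locus, where gamma is assumed to be zero.

    With gamma = 0, deletion/duplication = (alpha + beta) / beta,
    so alpha/beta = deletion/duplication - 1.
    """
    if deletion_to_duplication < 1.0:
        raise ValueError("the model requires a deletion:duplication ratio of at least 1")
    return deletion_to_duplication - 1.0


# AZFa-HERV: ratio of 4.11:1 reported in the text.
azfa_alpha_over_beta = alpha_to_beta_ratio_haploid(4.11)

# CMT1A-REP: alpha and beta estimates reported in the text.
cmt1a_alpha_over_beta = 2.47e-5 / 3.46e-7

# A smaller alpha/beta means inter-chromatid NAHR makes a larger relative contribution.
print(azfa_alpha_over_beta < cmt1a_alpha_over_beta)
```

On these figures α exceeds β by roughly three-fold at AZFa-HERV but by roughly seventy-fold at CMT1A-REP, which is the sense in which inter-chromatid NAHR is relatively more frequent at the haploid locus.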
In conclusion, sperm-based assays support the simple model for NAHR elaborated in Figure 1, and allow the relative contribution of different NAHR events to be dissected. Our results refute the common misconception that because deletion and duplication caused by NAHR can be reciprocal events, the rates are necessarily equal. The proportion of deletions to duplications in apparently healthy individuals is approximately equal in a recent survey of copy number variation 3, which, given our finding that the rate of deletions caused by NAHR outstrips the rate of duplications by at least a factor of two, suggests either that other mechanisms generating copy number variation are biased towards duplications, or that selection removes deletions from the population more efficiently than duplications. These sperm-based assays are potentially of great value in guiding clinical diagnosis (see below), but require precise knowledge of NAHR locations and so we encourage researchers to fine-map breakpoints of pathogenic rearrangements.
Our results have important implications for clinical diagnosis of genomic disorders. First, we detected the reciprocal duplication for the LCR17p deletion 14, which has not been previously reported. This LCR17p duplication should entirely encompass the duplication causing Potocki-Lupski syndrome, and so it should be expected that the larger duplication would also be characterised by developmental delay, cognitive impairment and autistic features 6,24. Second, the ratio at which the dominant disorders resulting from reciprocal deletion and duplication at the CMT1A-REPs are diagnosed is 1:4 (Jim Lupski, personal communication). Our results suggest that it should be 2:1, and that HNPP, which confers a relatively mild and variable phenotype 25, is being greatly under-diagnosed 26. Third, there are many hundreds of WBS deletions reported in the literature 27, but only 4 reciprocal duplications (9,28-30). Our results suggest that the syndrome resulting from these duplications remains undiagnosed in the majority of cases. These observations argue strongly that the clinical application of methods that allow these genomic imbalances to be detected directly (e.g. array comparative genome hybridisation) should be implemented more widely, so as to facilitate appropriate genetic counselling, improve diagnostic criteria, and increase our understanding of genotype-phenotype relationships.
Our findings underscore the rationale for pursuing similar detailed analyses for all genomic disorders, especially those at which inter-individual variation in rates is suspected, such as Sotos Syndrome. Moreover, we see great value in expanding these assays to more donors (including patients with genome instability disorders), more tissues, other species and more loci, so as to characterise fully the genetic and environmental determinants of chromosomal rearrangement rates.
Sperm samples were obtained from donors at the Bourn Hall IVF Clinic and their use for recombination analyses was approved by Cambridge Local Research Ethics Committee (reference 04/Q0108/46). We quantified DNA extracted from semen and blood (Supplementary Methods) using a NanoDrop spectrophotometer and used pulsed field gel electrophoresis to monitor template length (data not shown).
We obtained reference alignments for duplicated sequences from the UCSC genome browser 31. We designed PCR primers to amplify each hotspot and its flanking regions non-specifically from all paralogs using Oligo6 (Molecular Biology Insights, Inc.) in conjunction with an oligonucleotide annealing temperature calculator (http://www.cnr.berkeley.edu/~zimmer/oligoTMcalc.html) (Supplementary Table 2). We amplified loci from five sperm donors for all NAHR hotspots (Supplementary Table 3). These five sperm donors were drawn from a panel of nine sperm donors. The same sperm donors were used for deletion and duplication assays at each hotspot. We cloned paralogous sequences into MACH1 competent cells (Invitrogen) using the pGEM-T Easy Vector (Promega), reamplified DNA extracted from positive colonies, and sequenced amplicons using v3.1 Big Dye Chemistry (Applied Biosystems, Inc.).
We manually aligned sequences of cloned hotspots using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and identified paralogous sequence variants for each locus. We designed nested paralog-specific primers as described above, so that the 3′ nucleotide of each primer was specific for the paralog of interest, and we used additional internal mismatches when necessary (Supplementary Table 2). We tested primers for specificity on blood DNA using recombinant-specific combinations of primers. In addition, we designed a dual labeled Q-PCR probe for each locus, to allow the second of the nested PCRs to be run on an Applied Biosystems 7500 Real Time PCR Machine (Supplementary Table 2).
We performed 1° paralog-specific PCRs in 1x Platinum Taq buffer (Invitrogen), 200μM dNTPs (GE Healthcare), 300nM each primer, 1 unit Platinum Taq (Invitrogen), using sufficient copies of template DNA to give approximately 24 positive wells per plate (exact quantities determined empirically) and 2-3mM MgCl2 (Supplementary Table 3), in a total volume of 50μl. Following thermal cycling we incubated 1° PCR products with 1 unit E. coli Exonuclease I (NEB) for 1 hour at 37°C to digest 1° primers, performed a 20x dilution using water and used 0.5μl of diluted PCR product as a template in the 2° PCR. In the 2° PCR we used identical concentrations of buffer, dNTPs, primers and enzyme to the 1° PCR but the total volume was 25μl and we added a dual-labeled probe (final concentration 250nM; Sigma) and 1x Rox reference dye (Invitrogen) (Supplementary Table 3). We confirmed the reproducibility of the 2° PCR for a subset of plates by reamplifying the entire plate and showing 100% concordance as to the wells containing breakpoint sequences (data not shown). To refine breakpoint locations we reamplified wells that we had previously identified as positive from the 1° PCR plate using 2° PCR primers and sequenced these amplicons. The accession numbers for these sequences are EU033157 to EU033924.
To account for the number of wells containing more than one identical recombinant template in the primary PCR, we performed a Poisson correction on the observed number of positive reactions 15. The Poisson-corrected number of recombinants is given by −N ln[(N − R)/N], where N is the number of reactions performed and R is the number of positive reactions observed. The quantity of input sperm DNA was titrated to produce approximately 24 positive breakpoint-specific amplifications per 96-well plate, thus increasing the probability that a positive result came from a single recombinant template molecule, and keeping the increase in positive results following Poisson correction below 30% 15. We calculated 95% confidence limits on the Poisson-corrected counts using the epitools package (http://www.mathepi.com/epitools/index.html) in R (http://www.r-project.org/).
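The Poisson correction above can be expressed as a minimal sketch. The function name is an illustrative assumption; the formula −N ln[(N − R)/N] and the design target of about 24 positive wells per 96-well plate are taken from the text. Confidence limits are not computed here.

```python
import math


def poisson_corrected_count(n_reactions: int, n_positive: int) -> float:
    """Estimate the number of recombinant molecules from positive wells.

    Implements -N * ln((N - R) / N), where N is the number of reactions
    and R is the number of positive reactions.
    """
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    if not 0 <= n_positive < n_reactions:
        raise ValueError("n_positive must satisfy 0 <= R < N")
    return -n_reactions * math.log((n_reactions - n_positive) / n_reactions)


# Design target from the text: about 24 positive wells on a 96-well plate.
corrected = poisson_corrected_count(96, 24)
relative_increase = (corrected - 24) / 24
print(round(corrected, 2), relative_increase < 0.30)
```

At this plate loading the correction adds roughly 15% to the raw count, comfortably within the 30% ceiling described above. The estimate is undefined when every well is positive, which is why the input is restricted to R < N.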
To estimate the true amplification efficiency of our duplication assays we generated a control recombinant duplication haplotype for each of the four NAHR hotspots by fusion PCR. This recombinant haplotype was cloned and sequenced to confirm its haplotype. This cloned haplotype was then used as a positive control template in the NAHR assays, against a background of genomic DNA derived from blood. For each duplication assay, at least 192 reactions were performed each with a 0.52 probability of containing a recombinant haplotype. The number of wells containing observed recombinant haplotypes was then counted, Poisson-corrected and 95% confidence limits calculated. These values were then divided by the expected number of wells containing a recombinant haplotype (e.g. 192*0.52) to estimate a 95% confidence interval on the amplification efficiency of detecting a duplication (Supplementary Figure 2).
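The efficiency calculation described above can be sketched as follows. The function names are illustrative assumptions, and the observed count of 60 positive wells is a hypothetical placeholder and is not a result from this study; only the 192 reactions, the 0.52 probability and the division of the Poisson-corrected count by the expected number of wells come from the text. The 95% confidence limits are omitted.

```python
import math


def poisson_corrected_count(n_reactions: int, n_positive: int) -> float:
    """Poisson-corrected recombinant count: -N * ln((N - R) / N)."""
    if n_reactions <= 0 or not 0 <= n_positive < n_reactions:
        raise ValueError("require N > 0 and 0 <= R < N")
    return -n_reactions * math.log((n_reactions - n_positive) / n_reactions)


def amplification_efficiency(n_reactions: int, n_positive: int, p_template: float) -> float:
    """Ratio of the Poisson-corrected observed count to the expected count.

    p_template is the per-well probability of containing the control
    recombinant haplotype (0.52 in the text).
    """
    if not 0.0 < p_template <= 1.0:
        raise ValueError("p_template must lie in (0, 1]")
    expected = n_reactions * p_template  # e.g. 192 * 0.52
    return poisson_corrected_count(n_reactions, n_positive) / expected


# Hypothetical observation: 60 positive wells out of 192 control reactions.
efficiency = amplification_efficiency(192, 60, 0.52)
print(round(efficiency, 2))
```

An efficiency estimated this way can be used to judge whether a lower observed duplication rate might simply reflect a less efficient duplication assay.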
Genomic DNA from sperm donors was compared against a standard genomic DNA (NA10851) on a large-insert clone whole-genome tilepath array 32. 300 ng each of test and reference genomic DNA were labelled using the ENZO labelling kit (ENZO Life Sciences, Inc, USA) according to the supplier's instructions and incubated overnight. The samples were purified and hybridised as described previously 32.
The authors are grateful to Lucy Osborne for information on WBS-LCR duplications, Celia May for advice on sperm recombination assays, and Jim Lupski and Alec Jeffreys for comments on an earlier version of the manuscript. We would like to thank the sperm donors themselves and the staff at the Bourn Clinic. This work was funded by the Wellcome Trust.