Subtelomeric rearrangements have important implications for human disease and evolution. Genomic studies of subtelomeres have revealed that human chromosome ends are subject to elevated rates of meiotic recombination, sister chromatid exchange and DNA transfer (47). For over 40 years, subtelomeric rearrangements have been recognized as a significant cause of intellectual disability. Previous studies showed that, unlike some chromosome abnormalities that cause genomic disorders, subtelomeric rearrangements do not typically have recurrent breakpoints (8). This led some to propose that subtelomeric rearrangements are the product of ‘random chromosome breakage’ (53).
Here we show that subtelomeric rearrangements arise via distinct mutational mechanisms. We sequenced 21 breakpoint junctions and found that 18 junctions (86%) have little or no homology and that three junctions (14%) have hundreds of bp of sequence homology between recombining segments, typical of NAHR. The relative frequencies of these two classes are similar to those reported for CNVs in control populations, where NAHR accounts for 9% (22), 10–15% (23) and 22% (24) of sequenced breakpoint junctions.
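Whether 3 NAHR junctions out of 21 is "similar" to the 9–22% reported for control CNVs can be made concrete with a confidence interval for a small-sample proportion. The sketch below uses the Wilson score interval; this is our illustrative choice, not an analysis reported in the study, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 3 of the 21 sequenced junctions carried hundreds of bp of homology (NAHR-like)
low, high = wilson_interval(3, 21)
print(f"NAHR fraction {3 / 21:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

For 3 of 21 the interval spans roughly 5% to 35%, which contains each of the control-population estimates cited above; with only 21 junctions the data are compatible with those rates but cannot pin the NAHR fraction down precisely.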
NAHR events give rise to reciprocal deletions and duplications, as well as translocations (26). However, the majority of NAHR-mediated chromosome rearrangements described in the literature are the product of recombination between large segmental duplications, also known as low-copy repeats (56). NAHR events between smaller interspersed repeats, such as LINEs and Alus, are overlooked by most array-based studies and are only captured via sequencing (24). Although the deletions and translocation detected in SCH3, EGL049 and LM221 were all originally identified by clinical diagnostic array CGH, the NAHR substrates responsible for these rearrangements were detected only via high-resolution array CGH and sequencing. Thus, the frequency of NAHR among clinically recognized subtelomeric rearrangements is likely underestimated. Furthermore, sequence variation in interspersed repeats may affect the propensity of two elements to recombine. The recombining repeats in SCH3, EGL049 and LM221 were 88, 87 and 96% identical, respectively, as represented in the human genome assembly. However, we have not sequenced the parental repeats that recombined in these individuals, and gene conversion between repeats may have made some pairs more similar than the reference genome suggests (60).
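Percent identity figures such as the 88, 87 and 96% quoted above depend on how an alignment is scored. As a minimal sketch, assuming a pairwise alignment given as two equal-length strings with '-' marking gaps, identity can be taken as identical columns divided by all columns that are not gaps in both sequences. Other conventions (for example, excluding every gapped column) give different numbers; `percent_identity` is a hypothetical name, and we do not know which convention produced the reported values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences, '-' marking gaps (assumed convention).

    Identical columns are divided by all columns that are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a double-gap column carries no comparison
        columns += 1
        if a == b:  # a base opposite a gap never counts as a match
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Toy alignment: one gap among six informative columns
print(round(percent_identity("ACGT-A", "ACGTTA"), 1))  # 83.3
```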
As in other studies of pathogenic CNV breakpoints (62), the majority of breakpoint junctions in our study lack significant sequence homology and may arise via NHEJ, MMEJ, DNA replication errors or de novo telomere synthesis. NHEJ and MMEJ can be distinguished in experimental systems by disrupting proteins in either repair pathway and monitoring the amount of microhomology at breakpoint junctions (31), but we find that the amount of microhomology at subtelomeric breakpoint junctions does not fall into discrete categories that would distinguish NHEJ from MMEJ. We presume that translocations with little sequence homology at their breakpoint junctions are the product of an end-joining process between two double-strand breaks. However, terminal deletions may be the product of either end-joining or de novo telomere synthesis (16).
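Measuring microhomology at a junction amounts to asking how many junction bases could be assigned to either parental segment. A minimal sketch, assuming the junction has been placed against the two reference segments so that `proximal` starts at the same coordinate as the junction and `distal` ends at the same coordinate: the junction's exact-match extent from the left (against `proximal`) and from the right (against `distal`) overlap by the length of the microhomology. The function name and this input layout are hypothetical. Because matching is exact, the sketch reports microhomology only, not the mismatch-tolerant homology of NAHR substrates, and it measures length without assigning junctions to NHEJ or MMEJ.

```python
def junction_homology(junction: str, proximal: str, distal: str) -> str:
    """Sequence at a breakpoint junction shared by both parental segments.

    Assumed layout: proximal[0] aligns with junction[0] and distal[-1] aligns
    with junction[-1]. Exact matching only.
    """
    # Bases at the start of the junction that still match the proximal segment
    left = 0
    for j, p in zip(junction, proximal):
        if j != p:
            break
        left += 1
    # Bases at the end of the junction that still match the distal segment
    right = 0
    for j, d in zip(reversed(junction), reversed(distal)):
        if j != d:
            break
        right += 1
    overlap = left + right - len(junction)
    if overlap <= 0:
        # 0: blunt junction; negative: untemplated bases inserted at the junction
        return ""
    return junction[len(junction) - right:left]


# Toy junction: AAAA from the proximal segment, GGGG from the distal one, CCG shared
print(junction_homology("AAAACCGGGGG", "AAAACCGTTTT", "TTTTCCGGGGG"))  # CCG
```

Here the 3-bp `CCG` can be assigned to either parental segment, so the breakpoint position is ambiguous by 3 bp.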
Errors in DNA replication can also give rise to chromosome rearrangements, and some replication errors leave characteristic breakpoint junctions (17). Serial replication slippage generates short deletions and duplications at breakpoints (35), as seen at the breakpoint junction in LM204 (Fig. ). Analyses of non-recurrent breakpoint junctions suggest that FoSTeS may be responsible for breakpoints with insertions of short duplicated and/or inverted segments (43). Of the 21 breakpoint junctions we sequenced, only the 22q breakpoint from EGL098 had an inverted insertion consistent with FoSTeS. This 16-bp insertion did not interfere with our PCR-based junction sequencing; however, larger insertions could escape detection by this method.
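The FoSTeS signature described above, a short insertion copied in inverted orientation from nearby sequence, can be screened for by searching a flanking reference window for the inserted segment and its reverse complement. This is a minimal sketch assuming exact matches, uppercase input and a user-chosen window; `locate_insertion_template` and `reverse_complement` are hypothetical helpers, not the procedure used to characterize EGL098.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_insertion_template(insert: str, flank: str) -> list[tuple[int, str]]:
    """Exact positions in `flank` that could have templated `insert`.

    Returns sorted (position, orientation) pairs; "inverted" means the flank
    contains the reverse complement of the insert.
    """
    if not insert:
        return []
    hits = []
    for orientation, query in (("direct", insert), ("inverted", reverse_complement(insert))):
        start = flank.find(query)
        while start != -1:
            hits.append((start, orientation))
            start = flank.find(query, start + 1)  # allow overlapping occurrences
    return sorted(hits)


# Toy example: a 7-bp insert whose reverse complement sits 4 bp into the flank
print(locate_insertion_template("GATTACA", "CCCCTGTAATCGGGG"))
```

A palindromic insert is reported in both orientations, and real templated insertions often carry mismatches, so a practical screen would use local alignment instead of exact string search.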
Sequencing breakpoint junctions reveals the mechanism of DNA repair, points to the site of chromosome breakage and determines rearrangement organization, which could affect the regulation of nearby genes. Although sequencing breakpoint junctions was a high priority for this study, we were able to sequence only 21 of 78 (27%) junctions (Supplementary Material, Table S1). We attempted to amplify 58 breakpoints as described in Methods and captured breakpoint junctions in 21 rearrangements, an overall success rate of 21 of 58 (36%). In some cases, we exhausted the subject DNA stock before we could optimize PCR conditions, but in others we tried multiple PCR conditions and primer pairs without success. These recalcitrant breakpoints likely represent junctions that are more complex than predicted by high-resolution array CGH. Small local deletions and duplications are not uncommon at breakpoint junctions (18), and we suspect that many of the missing junctions in our study fall into this category.
We were able to identify one such cryptic rearrangement by amplifying the translocation junction in 18q–71c, which had appeared to be a terminal deletion by other methods. However, we were unable to sequence the junctions of any terminal or interstitial duplications. Other groups have had similar difficulty sequencing duplication junctions (22), which leads us to suspect that many are more complicated than perfect tandem or inverted duplications. Capturing complex junctions will require an unbiased approach that does not depend on inferring the correct rearrangement structure in advance.
Many factors are likely involved in initiating the double-strand breaks that give rise to subtelomeric rearrangements. Large data sets of sequenced breakpoints, coupled with functional assays of chromosome fragility, will help determine the roles of DNA sequence and chromatin in double-strand breaks and DNA replication errors. As described by Vissers et al., other pathogenic CNV breakpoints are enriched in sequence motifs that may stimulate CNV formation. Given the enrichment of tandem repeats at chromosome ends (47) and the subtelomeric hotspots on 9q and 22q that coincide with tandem repeats (Fig. ), we suspect that some subtelomeric rearrangements are initiated by breakage in particularly fragile DNA sequences. Tandem repeats have been found at other subtelomeric breakpoints (18) and are overrepresented in normal CNV breakpoint data sets (22). As at fragile sites in the human genome (65), variation in repeat tract length or sequence composition could affect the risk of chromosome breakage at subtelomeric hotspots.
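Repeat tract length, raised above as a possible modifier of breakage risk, is simple to measure for perfect tandem repeats: a tract with unit length p is a maximal stretch in which every base equals the base p positions downstream. The sketch below scans a sequence for such tracts; it assumes perfect repeats and a known unit length, and `tandem_tracts` is a hypothetical helper. Dedicated tandem-repeat finders tolerate mismatches and indels and search over many unit lengths.

```python
def tandem_tracts(seq: str, period: int, min_copies: int = 2) -> list[tuple[int, int]]:
    """(start, length) of maximal perfect tandem-repeat tracts with unit length `period`.

    A run of r consecutive positions with seq[i] == seq[i + period] spans r + period bases.
    Only tracts of at least `min_copies` full units are reported.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    n = len(seq)
    tracts = []
    run_start, run = 0, 0
    # The extra final iteration (i == n - period) closes any run still open at the end.
    for i in range(n - period + 1):
        if i < n - period and seq[i] == seq[i + period]:
            if run == 0:
                run_start = i
            run += 1
        else:
            length = run + period
            if run and length >= min_copies * period:
                tracts.append((run_start, length))
            run = 0
    return tracts


# Five copies of a CAG unit between non-repetitive flanks
print(tandem_tracts("TT" + "CAG" * 5 + "TT", 3))  # [(2, 15)]
```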
Furthermore, the concentration of particular repetitive sequences at chromosome ends could explain differences in the density and location of breakpoints among subtelomeres (Fig. ). Chromosome ends enriched in ‘fragile motifs’ would be expected to be more susceptible to new rearrangements. This has been suggested for breakpoints in 1p and 9q that are particularly GC-rich and concentrated in specific classes of repetitive sequences predicted to form secondary structures (20). Of course, the embryonic lethality of some subtelomeric rearrangements will also shape which rearrangements of chromosome ends are ascertained. Subtelomeric rearrangements at gene-poor chromosome ends (e.g. 18q) are on average larger than those at gene-rich chromosome ends (e.g. 9q) (Fig. and Supplementary Material, Table S1), likely reflecting the lethality of large genomic gains and losses that encompass many genes.
Determining the parental origin of chromosome rearrangements can also yield mechanistic insights into how rearrangements form and at what time in development they occur (67). Analysis of six trios in our study revealed equal numbers of maternally and paternally derived subtelomeric rearrangements. These data include only two sequenced junctions and are too limited to assess parental bias, so we cannot determine whether particular classes of subtelomeric rearrangements have a parental bias. In a large cohort of 18q terminal deletions, Heard et al. found a significant paternal bias, with 71 of 81 deletions paternal in origin. In contrast, analysis of the parent of origin of 40 de novo 1p terminal deletions revealed that 24 and 16 deletions occurred on the maternal and paternal chromosomes, respectively (27). In their analysis of 9q subtelomeric rearrangements, Yatsenko et al. found that 11 and six rearrangements occurred on the paternal and maternal alleles, respectively. It is possible that subtelomeric rearrangements involving particular loci or chromosome ends have a parental bias; however, as a group, subtelomeric rearrangements do not appear to have a significant parent-of-origin effect.
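Whether a parent-of-origin split departs from the 50:50 expectation can be checked with an exact two-sided binomial test. The sketch below applies one to the counts quoted above, treating the six trios in our study as three paternal and three maternal (assuming all six were informative). It illustrates the reasoning and is not necessarily the test used in the cited studies; `binomial_two_sided_p` is a hypothetical helper name.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k events out of n against p = 0.5.

    For p = 0.5 the distribution is symmetric, so doubling the smaller tail
    gives the exact two-sided p-value (capped at 1).
    """
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# (paternal, total) counts from the text
cohorts = {
    "18q terminal deletions (Heard et al.)": (71, 81),
    "1p terminal deletions": (16, 40),
    "9q rearrangements (Yatsenko et al.)": (11, 17),
    "this study, six trios": (3, 6),
}
for name, (paternal, total) in cohorts.items():
    p_value = binomial_two_sided_p(paternal, total)
    print(f"{name}: {paternal}/{total} paternal, p = {p_value:.3g}")
```

Only the 18q cohort departs clearly from 50:50 (p well below 0.001); the 1p (p of roughly 0.27), 9q (roughly 0.33) and trio counts are each compatible with no bias at the conventional 0.05 level, in line with a locus-specific rather than general parental effect.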
Most subtelomeric rearrangements involve many genes that could be responsible for an abnormal phenotype. Nevertheless, smaller rearrangements can refine the critical regions involved in developmental disabilities and birth defects. Although we lack the detailed phenotypic information for the subjects in our study that would allow specific genotype–phenotype correlations, some of the smaller rearrangements may narrow down the genes responsible for the referring diagnosis (Fig. ). EGL039 and EGL049 have overlapping interstitial deletions of 17p13.3 that include the YWHAE gene but not PAFAH1B1. Mutations in PAFAH1B1, also known as LIS1, cause lissencephaly (69) (OMIM 601545). EGL039 was referred with an indication of mild intellectual disability and short stature; EGL049 was referred for developmental delay. Parental analysis revealed that both deletions are de novo (Supplementary Material, Table S1), which is typical of pathogenic chromosome rearrangements. In addition, recent studies have reported similar deletions of 17p13.3 in children with mild intellectual disability and moderate-to-severe growth restriction (70). Thus, our data support the conclusion that deletions of this part of 17p13.3 cause a less severe phenotype than larger deletions that also include PAFAH1B1.
We also identified an interstitial duplication of an overlapping region of 17p13.3; however, no phenotypic information was available for subject EGL042, who carries the duplication. Duplications of this region have been reported in children with intellectual disability, macrosomia and dysmorphic facial features (72).
The smallest chromosome imbalance in our study is a 54-kb deletion of 4p16.3 that includes only one gene, LETM1 (Fig. ). LETM1 lies within the Wolf–Hirschhorn syndrome (WHS) critical region 2 (73) (OMIM 194190), but subject EGL094 exhibited no facial features of WHS; rather, she was referred for testing at 1 year of age, presenting with microtia, renal agenesis, Duane anomaly and a congenital heart defect. These data suggest that loss of LETM1 is not responsible for the characteristic facial features in WHS and that other candidate genes in the critical region may be involved.
Overall, we have shown that subtelomeric rearrangements are a heterogeneous class of chromosome abnormalities caused by diverse mutational mechanisms. Most breakpoint junctions do not have significant sequence homology between recombining segments, consistent with end-joining, DNA replication errors or de novo telomere synthesis. NAHR between interspersed repeats mediated interstitial deletions and a translocation in our cohort, though we found no recurrent NAHR-mediated events in this group of subtelomeric rearrangements. We suspect that the shorter stretches of sequence homology in Alu and L1 elements, compared with large segmental duplications, make Alu–Alu and L1–L1 recombination less frequent. We also identified one subtelomeric rearrangement with sequenced junctions consistent with the FoSTeS model of replicative DNA repair. Ultimately, as with other CNVs in the human genome, no single mechanism accounts for subtelomeric rearrangement formation.