Chromosome rearrangements are a significant cause of intellectual disability and birth defects. Subtelomeric rearrangements, including deletions, duplications and translocations of chromosome ends, were first discovered over 40 years ago and are now recognized as being responsible for several genetic syndromes. Unlike the deletions and duplications that cause some genomic disorders, subtelomeric rearrangements do not typically have recurrent breakpoints and involve many different chromosome ends. To capture the molecular mechanisms responsible for this heterogeneous class of chromosome abnormality, we coupled high-resolution array CGH with breakpoint junction sequencing of a diverse collection of subtelomeric rearrangements. We analyzed 102 breakpoints corresponding to 78 rearrangements involving 28 chromosome ends. Sequencing 21 breakpoint junctions revealed signatures of non-homologous end-joining, non-allelic homologous recombination between interspersed repeats and DNA replication processes. Thus, subtelomeric rearrangements arise from diverse mutational mechanisms. In addition, we find hotspots of subtelomeric breakage at the end of chromosomes 9q and 22q; these sites may correspond to genomic regions that are particularly susceptible to double-strand breaks. Finally, fine-mapping the smallest subtelomeric rearrangements has narrowed the critical regions for some chromosomal disorders.
Subtelomeric rearrangements at chromosome ends were among the first copy number variations (CNVs) recognized in the human genome (1–5). Unlike CNVs that represent normal variation, subtelomeric rearrangements contribute significantly to intellectual disability, autism and birth defects. The incidence of subtelomeric rearrangements was first recognized when subtelomeric fluorescence in situ hybridization (FISH) testing became a standard cytogenetic test (6,7), and since then subtelomeric rearrangements have been found on every chromosome end (8–10). The detection of subtelomeric rearrangements was greatly improved with the advent of array CGH testing, and clinical array studies have shown that 30–38% of pathogenic chromosome abnormalities involve chromosome ends (10–12). Recent microarray analysis of 15 749 developmentally disabled individuals in the International Standards for Cytogenomic Arrays (ISCA) data set revealed that 26.5% and 16.3% of clinically relevant CNVs lie within the terminal 10 Mb and terminal 5 Mb of chromosome ends, respectively (13). Subtelomeric rearrangement studies have also paved the way for the discovery of critical regions and genes responsible for specific phenotypes (14,15).
Despite the clinical relevance and incidence of subtelomeric rearrangements, the genomic factors involved in subtelomeric breakage and repair have yet to be investigated comprehensively. However, several groups have analyzed the rearrangement breakpoints in particular chromosome ends, most notably 1p, 9q and 22q. Sequencing breakpoint junctions has revealed that subtelomeric breakpoints do not typically reside at the same site and that breakpoint junctions do not usually bear signatures of homologous recombination (16–21).
To comprehensively evaluate the mutational mechanisms that generate subtelomeric rearrangements, we have taken a large-scale approach to fine-map and sequence breakpoint junctions from a diverse collection of chromosome abnormalities. As is true for the breakpoints of CNVs that represent normal variation (22–24), we find evidence of non-homologous end-joining (NHEJ), non-allelic homologous recombination (NAHR) and DNA replication mechanisms.
To ascertain a diverse collection of subtelomeric rearrangements, we analyzed DNA samples from multiple clinical cytogenetics labs, the Chromosome 18 Clinical Research Center, and a family from the Unique Rare Chromosome Support Group. We accepted samples from individuals with a previous diagnosis of a pathogenic subtelomeric rearrangement that was detected by either subtelomeric FISH or array CGH. Subtelomeric abnormalities of known etiology, for example, the recurrent 3q29 deletion (25) and the recurrent translocation between chromosomes 4p16 and 8p23 (26), were excluded from this study.
To refine subtelomeric breakpoints detected by diagnostic FISH or array CGH, we designed custom oligonucleotide arrays to target chromosome ends with a mean probe spacing of one oligonucleotide per 240 base pairs (bp). We focused on genomic gains and losses involving the terminal 5 megabases (Mb) of chromosome ends, but also included some larger rearrangements to capture translocation junctions where the breakpoint on one chromosome end lay proximal to the terminal 5 Mb. The size of genomic imbalances ranged from 54 kilobases (kb) to 25 Mb, with a median of 2.2 Mb (Fig. 1 and Supplementary Material, Table S1). We analyzed 51 terminal deletions, 11 unbalanced translocations, 10 interstitial deletions, four interstitial duplications and two terminal duplications, for a total of 78 subtelomeric rearrangements (Supplementary Material, Fig. S1). Ten of the unbalanced translocations involve two chromosome ends; therefore, the 78 rearrangements correspond to 88 genomic gains and losses. Because these chromosome abnormalities were ascertained retrospectively from multiple labs, they are not an unbiased collection; however, the large number of rearrangements in our study gave us confidence that the diverse mutational mechanisms involved in subtelomeric rearrangements would be captured.
When available, we studied parents via FISH to determine the inheritance of subtelomeric rearrangements. Of the 28 rearrangements where family trios were analyzed, 22 were de novo, three were inherited in an unbalanced form from a mother carrying a balanced translocation, and three were inherited from a parent carrying the same imbalance (Supplementary Material, Table S1). In one of these three inherited imbalances, the mother with the same rearrangement was also affected; in the other two trios, parents were not cognitively evaluated. Based on these family studies, we conclude that the majority of subtelomeric rearrangements in our study represent large, de novo genomic changes that are most likely pathogenic and responsible for the referring diagnosis.
Determining the parent of origin for de novo subtelomeric rearrangements can shed light on the timing and mechanisms of rearrangement formation. Because most subtelomeric rearrangements are not mosaic in blood, which is the predominant tissue type sampled in clinical cytogenetics testing, we assume that they occur in pre-meiotic cells, during meiosis, or in early embryogenesis. In addition, maternal and paternal parent-of-origin biases have been found for subtelomeric rearrangements involving chromosomes 1p and 18q, respectively (27,28). For six de novo subtelomeric rearrangements in this study, we investigated the parent of origin using Affymetrix Genome-Wide SNP 6.0 chips. We found that rearrangements from LM219, U215 and EGL097 were paternally derived, and rearrangements from EGL205, EGL209 and EGL225 were maternally derived (Supplementary Material, Table S1). Six of the 18q terminal deletions in our study had already been analyzed for parent of origin using microsatellite markers (28). Among this group, five out of six de novo deletions occurred on the paternal allele (Supplementary Material, Table S1).
Using array CGH, we fine-mapped 78 subtelomeric rearrangements that represent 88 genomic gains and losses corresponding to 102 breakpoints. The median gap between oligonucleotide probes at breakpoints is 213 bp (Supplementary Material, Table S1). Thus, our high-resolution array CGH experiments narrow the region around chromosome breakpoints to a few hundred bp. However, to truly capture the mode of DNA repair, it is essential to sequence breakpoint junctions. To this end, we have cloned and sequenced 21 breakpoint junctions via a PCR-based strategy.
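The counts above can be reconciled with a short tally. This sketch is illustrative and rests on assumptions not spelled out in the text: each terminal imbalance (including each imbalanced end of a translocation) contributes one breakpoint, each interstitial imbalance contributes two, and the one unbalanced translocation not involving two ends contributes a single imbalance.

```python
# Rearrangement classes as reported in this study.
rearrangements = {
    "terminal deletion": 51,
    "unbalanced translocation": 11,
    "interstitial deletion": 10,
    "interstitial duplication": 4,
    "terminal duplication": 2,
}
TWO_ENDED_TRANSLOCATIONS = 10  # translocations with an imbalance at both chromosome ends

n_rearrangements = sum(rearrangements.values())
# Assumption: every translocation has one imbalanced end, ten have a second one.
n_translocation_imbalances = rearrangements["unbalanced translocation"] + TWO_ENDED_TRANSLOCATIONS
n_imbalances = n_rearrangements + TWO_ENDED_TRANSLOCATIONS

# Assumption: terminal imbalances have one breakpoint, interstitial ones have two.
terminal_imbalances = (rearrangements["terminal deletion"]
                       + rearrangements["terminal duplication"]
                       + n_translocation_imbalances)
interstitial_imbalances = (rearrangements["interstitial deletion"]
                           + rearrangements["interstitial duplication"])
n_breakpoints = terminal_imbalances + 2 * interstitial_imbalances

print(n_rearrangements, n_imbalances, n_breakpoints)  # 78 88 102
```

Under these assumptions the reported totals of 78 rearrangements, 88 gains and losses and 102 breakpoints are mutually consistent.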
We inspected breakpoint junction sequences to infer the mutational mechanism responsible for each rearrangement (29). Rearrangements with significant sequence homology (hundreds of bp to hundreds of kb) between the edges of the breakpoint are indicative of NAHR (30). However, breakpoints that lack long stretches of homology may have a few nucleotides of microhomology at breakpoint junctions. Such junctions may be the product of NHEJ (31,32), microhomology-mediated end joining (MMEJ) (33) or DNA replication processes (34–37). Below, we describe breakpoint junctions in the context of the type of subtelomeric rearrangement: terminal deletions, interstitial deletions and unbalanced translocations.
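As a minimal sketch of how microhomology can be quantified, assuming the retained segment on each side of a junction is already known from an alignment, the hypothetical function below counts the junction bases that could be assigned to either parental segment, that is, how far the breakpoint can slide without changing the junction sequence. It is not the procedure used in the study, and the toy sequences are invented.

```python
def microhomology_length(left_ref: str, left_bp: int, right_ref: str, right_bp: int) -> int:
    """Count junction bases that could belong to either side of a breakpoint.

    The junction is left_ref[:left_bp] + right_ref[right_bp:]. A base counts as
    microhomology if the breakpoint can slide over it without changing the junction.
    """
    left_ref, right_ref = left_ref.upper(), right_ref.upper()
    # Slide leftward: the end of the retained left segment also matches the
    # sequence just upstream of the retained right segment.
    k = 0
    while (k < left_bp and k < right_bp
           and left_ref[left_bp - 1 - k] == right_ref[right_bp - 1 - k]):
        k += 1
    # Slide rightward: the start of the retained right segment also matches the
    # sequence just downstream of the retained left segment.
    j = 0
    while (left_bp + j < len(left_ref) and right_bp + j < len(right_ref)
           and left_ref[left_bp + j] == right_ref[right_bp + j]):
        j += 1
    return k + j


# Hypothetical junction "AAAACTG" + "CCCC": the CTG could come from either side.
print(microhomology_length("AAAACTGTTTT", 7, "GGGGCTGCCCC", 7))  # 3
```

A blunt junction returns 0; junctions mediated by long homologous repeats would instead be recognized from the repeat annotation, since diverged repeats rarely match perfectly over hundreds of bp.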
We amplified and sequenced 14 of the 51 (27%) terminal deletion junctions. Beginning with breakpoints determined by array CGH, we paired one primer complementary to the proximal (non-deleted) chromosomal region with one primer complementary to the telomere repeat sequence (38). After PCR, we cloned and sequenced the breakpoint junctions to capture the post-repair junction sequence.
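In a terminal deletion junction, the repair point lies where subtelomeric sequence gives way to (TTAGGG)n repeats. A minimal sketch, assuming the read is oriented from the retained subtelomere toward the new telomere, locates the start of a terminal repeat tract in any phase; the function names and the 18 bp minimum tract length are hypothetical choices. It also checks that the telomere primer given in Methods, (CCCTAA)5, is the reverse complement of (TTAGGG)5 and so anneals to the G-rich strand.

```python
TELOMERE_UNIT = "TTAGGG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_tract_start(read: str, unit: str = TELOMERE_UNIT, min_len: int = 18) -> int:
    """Index where a terminal (TTAGGG)n tract begins in read, or -1 if none.

    Assumes read runs from the retained subtelomere toward the new telomere.
    """
    read = read.upper()
    for i in range(len(read) - min_len + 1):
        tail = read[i:]
        # tail lies within a perfect repeat (in some phase) iff it is a
        # substring of enough concatenated copies of the repeat unit.
        if tail in unit * (len(tail) // len(unit) + 2):
            return i
    return -1


# The Methods telomere primer (CCCTAA)5 anneals to the G-rich (TTAGGG)n strand.
assert reverse_complement("CCCTAA" * 5) == "TTAGGG" * 5

read = "GATCCATTGC" + "TTAGGG" * 4  # hypothetical junction read
print(telomere_tract_start(read))    # 10
```

Subtelomeric bases just before the tract that happen to fit the repeat are absorbed into it, so the reported index is the most proximal possible breakpoint; such bases correspond to the microhomology described for terminal deletion junctions.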
There are between 2 and 6 nucleotides of microhomology at 10 out of 14 (71%) of terminal deletion junctions (Fig. 2 and Supplementary Material, Fig. S2). End-joining of double-strand breaks in a subtelomere and a telomere repeat sequence would give rise to these types of terminal deletion junctions. Another possibility is that terminal deletions are repaired via de novo synthesis of a new telomere repeat at the site of a double-strand break, a model put forth by others (16,21,38,39). ‘Chromosome healing’ occurs in ciliates when telomerase adds telomeric repeats to non-telomeric double-strand breaks (40). Human telomerase can synthesize telomeres from non-telomeric sites in vitro (41), but this process has not been demonstrated in human cells (42).
We also identified two terminal deletion junctions that are suggestive of DNA replication processes. The terminal deletion junction derived from LM204 has a 7 bp tandem duplication at the breakpoint, which is typical of sites of serial replication slippage (35) (Fig. 2). Sequencing the junction of EGL098's 22q terminal deletion revealed a 16 bp sequence that did not align to the 22q breakpoint region identified by array CGH or the telomere repeat at the other end of the junction. This sequence corresponds to a region that is 3 kb distal of the 22q breakpoint, but that lies in an inverted orientation relative to the proximal breakpoint on 22q and the telomere repeat (Fig. 3). This type of complex junction has been described in other rearrangement breakpoints and fits the fork stalling and template switching (FoSTeS) model (43–45).
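The inverted 16 bp insertion in EGL098 was placed by comparing it with the reference sequence. A simplified sketch of the same idea, assuming exact matches within a known reference window, searches for an inserted segment on both strands; a hit on the minus strand indicates an inverted orientation. The function name and sequences are hypothetical, and a real analysis (BLAT in this study) tolerates mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_segment(insert: str, reference: str) -> list:
    """Return (0-based start, strand) for every exact occurrence of insert in reference."""
    ref = reference.upper()
    hits = []
    for strand, query in (("+", insert.upper()), ("-", reverse_complement(insert))):
        start = ref.find(query)
        while start != -1:
            hits.append((start, strand))
            start = ref.find(query, start + 1)
    return hits


# Hypothetical insert that occurs only in inverted orientation in the window.
print(find_segment("AAGGTC", "TTTTGACCTTCCCC"))  # [(4, '-')]
```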
We fine-mapped the breakpoints of 10 interstitial deletions and four interstitial duplications by array CGH and sequenced the breakpoint junctions of three interstitial deletions (Supplementary Material, Table S1). We attempted to amplify three interstitial duplication junctions using a strategy designed to capture breakpoints of tandem or inverted duplications (46), but we were unsuccessful in sequencing any of the duplication junctions.
Sequencing the interstitial deletion junction of EGL094 revealed three nucleotides of microhomology, consistent with NHEJ between two double-strand breaks in the short arm of chromosome 4 (Supplementary Material, Fig. S2). Alternatively, the 54-kb deletion in EGL094 could be the product of a FoSTeS event involving a template switch from one side of the deletion to the other. Interstitial deletions in EGL049 and SCH3 are the product of homologous recombination between Alu elements, and sequencing across the breakpoint junctions of these NAHR events revealed the hybrid repeat transitioning from one element into another (Supplementary Material, Fig. S3). In both rearrangements, the recombining elements are in direct orientation and are in the same repeat class, as would be expected for a true NAHR event. As represented in the reference genome, the distal and proximal AluSp elements that flank EGL049's 17p deletion are 293 and 370 bp, respectively, and 87% identical. The proximal and distal AluYs that flank SCH3's 9q deletion are 302 and 307 bp, respectively, and 88% identical.
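The identities quoted for the flanking Alu pairs depend on how an alignment is scored, and the text does not state the convention used. As an illustrative sketch only, the hypothetical function below computes one common definition, identical columns divided by aligned columns that contain at least one base, from two gapped sequences of equal length; the example alignment is invented.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length gapped sequences ('-' marks a gap).

    Identity = identical base columns / columns where at least one sequence has a
    base. Other tools may exclude gap columns, so values are convention-dependent.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no bases")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```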
We fine-mapped the breakpoints of 11 unbalanced translocations and sequenced four breakpoint junctions. There is little or no microhomology at the translocation junctions in 18q–71c, LM218 and EGL102, consistent with NHEJ between double-strand breaks in two different chromosomes (Supplementary Material, Fig. S2). 18q–71c's rearrangement was originally identified as a terminal deletion by array CGH; however, breakpoint sequencing revealed a cryptic translocation between 18q and a segmental duplication that maps to the end of chromosome 4p. Since segmental duplications at the ends of chromosomes are extremely polymorphic (47), we cannot be certain that the translocated segment was derived from 4p and not another chromosome end.
Sequencing across the unbalanced translocation junction in LM221 revealed a hybrid LINE at the breakpoint, consistent with interchromosomal NAHR (Supplementary Material, Fig. S3). The translocation was mediated by homologous recombination between L1PA2 elements in direct orientation on chromosomes 6 and 16. The L1PA2s on 6p and 16q in the reference genome are 5767 and 6003 bp, respectively, and 96% identical.
Analysis of subtelomeric breakpoint junctions reveals mechanisms of DNA repair; however, the factors involved in the initial double-strand break or DNA replication error are also critically important for understanding the mechanisms of rearrangement formation. Previous studies using FISH or BAC-based array CGH strategies suggested that some subtelomeric breakpoints are relatively close to one another, pointing to a potential recurrent breakage site (48). However, these approaches resolved breakpoints, at best, to regions of tens or hundreds of kilobases.
Higher-resolution array CGH and sequencing of subtelomeric breakpoints allowed us to distinguish truly recurrent breakpoints from ‘nearby’ breakpoints. We focused on five chromosome ends with at least eight breakpoints each, so that each subtelomere was represented by a reasonable number of breakpoints. As shown in Figure 4, the spectrum of subtelomeric breakpoints in 4p, 9q, 17p, 18q and 22q differs among chromosome ends. Breakpoints in 18q are distributed throughout the end of the chromosome, consistent with previous studies (28). Breakpoints in 9q appear to cluster, including breakpoints from EGL057 and EGL096 that are only 4.6 kb apart (Fig. 4 and Supplementary Material, Table S1). These breakpoints are not identical, but may cluster due to a local genomic architecture that is susceptible to rearrangement. This 9q hotspot lies ~2 Mb from the end of the chromosome and does not overlap with other 9q breakpoint clusters reported in the literature (21,48).
Remarkably, four of the 12 breakpoints in 22q lie within 320 bp in chromosome band 22q13.33. Breakpoints from three terminal deletions (EGL069, EGL073 and EGL097) and one interstitial duplication (EGL108) lie between exons 8 and 9 of the SHANK3 gene, a breakpoint region also reported in previous studies of 22q terminal deletions (19,49). These data suggest that this site is a hotspot for genomic rearrangement, most often manifested as a terminal deletion that causes the 22q13 monosomy syndrome (OMIM 606232). Although the interstitial duplication in EGL108 has a distal breakpoint within the same hotspot region, we cannot predict how this rearrangement affects the organization of the SHANK3 gene without determining the orientation of the duplication.
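Deciding whether breakpoints form a hotspot can be framed as one-dimensional clustering. A minimal sketch using single-linkage grouping on sorted coordinates is shown below; the maximum gap and the coordinates are hypothetical, not the values in Supplementary Material, Table S1.

```python
def cluster_breakpoints(positions, max_gap):
    """Group breakpoint coordinates; a new cluster starts whenever the gap to the
    previous (sorted) breakpoint exceeds max_gap (single linkage on a line)."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical coordinates (bp) on one chromosome end.
breakpoints = [1_000, 1_100, 1_250, 1_320, 50_000, 900_000]
for cluster in cluster_breakpoints(breakpoints, max_gap=320):
    print(len(cluster), "breakpoint(s) spanning", cluster[-1] - cluster[0], "bp")
```

Because single linkage can chain neighbouring points, the span of a cluster (last minus first coordinate) is the quantity to compare with a statement such as "four breakpoints lie within 320 bp".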
Subtelomeric rearrangements have important implications for human disease and evolution. Genomic studies of subtelomeres have revealed that human chromosome ends are subject to elevated rates of meiotic recombination, sister chromatid exchange and DNA transfer (47,50–52). For over 40 years, subtelomeric rearrangements have been recognized as a significant cause of intellectual disability. Previous studies showed that, unlike some chromosome abnormalities that cause genomic disorders, subtelomeric rearrangements do not typically have recurrent breakpoints (8–10). This led some to propose that subtelomeric rearrangements are the product of ‘random chromosome breakage’ (53).
Here we show that subtelomeric rearrangements arise via distinct mutational mechanisms. We sequenced 21 breakpoint junctions and found that 18 junctions (86%) have little or no homology and that three junctions (14%) have hundreds of bp of sequence homology between recombining segments, typical of NAHR. The frequency of these classes of events is similar to what is seen in CNVs from control populations, where NAHR accounts for 9% (22), 10–15% (23) and 22% (24) of sequenced breakpoint junctions.
NAHR events give rise to reciprocal deletions and duplications, as well as translocations (26,54,55). However, the majority of NAHR-mediated chromosome rearrangements described in the literature are the product of recombination between large segmental duplications also known as low-copy repeats (56,57). NAHR events between smaller interspersed repeats, such as LINEs and Alus, are overlooked by most array-based studies and are only captured via sequencing (24,58,59). Though the deletions and translocation detected in SCH3, EGL049 and LM221 were all originally identified by clinical diagnostic array CGH, the NAHR substrates responsible for the rearrangements were only detected via high-resolution array CGH and sequencing. Thus, NAHR events in clinically recognized subtelomeric rearrangements are likely underestimated. Furthermore, sequence variation in interspersed repeats may affect the propensity of two elements to recombine. The recombining repeats in SCH3, EGL049 and LM221 were 88, 87 and 96% identical, respectively, as represented in the human genome assembly. However, we have not sequenced the parental repeats that recombined in these individuals, and it is possible that gene conversion between repeats may make some more homologous than represented in the reference genome (60,61).
Like other studies of pathogenic CNV breakpoints (62), the majority of breakpoint junctions in our study lack significant sequence homology and may arise via NHEJ, MMEJ, DNA replication errors, or de novo telomere synthesis. NHEJ and MMEJ can be distinguished in experimental systems by disrupting proteins in either repair pathway and monitoring the amount of microhomology at breakpoint junctions (31–33), but we find that the amount of microhomology at subtelomeric breakpoint junctions does not fall into discrete categories that would distinguish NHEJ from MMEJ. We presume that translocations with little sequence homology at the edges of the junction are the product of an end-joining process between two double-strand breaks. However, terminal deletions may be the product of end-joining or de novo telomere synthesis (16,38,39).
Errors in DNA replication can also give rise to chromosome rearrangements, and some replication events leave characteristic breakpoint junctions (17,34–37,62). Serial replication slippage generates short deletions and duplications at breakpoints (35), like the breakpoint junction in LM204 (Fig. 2). Analyses of non-recurrent breakpoint junctions suggest that FoSTeS may be responsible for breakpoints with insertions of short duplicated and/or inverted segments (43–45,63). Of the 21 breakpoint junctions we sequenced, only the 22q breakpoint from EGL098 had an inverted insertion consistent with FoSTeS. This 16 bp insertion did not affect our PCR-based junction sequencing; however, larger insertions would escape detection by this method.
Sequencing breakpoint junctions reveals the mechanism of DNA repair, points to the site of chromosome breakage and determines rearrangement organization, which could impact the regulation of nearby genes. Though sequencing breakpoint junctions was a high priority for this study, we were able to sequence junctions from only 21 of the 78 (27%) rearrangements (Supplementary Material, Table S1). We attempted to amplify 58 breakpoints as described in Methods and captured breakpoint junctions in 21 rearrangements, giving an overall success rate of 21 out of 58 (36%). In some cases, we exhausted the subject DNA stock before we could optimize PCR conditions; in others, we tried multiple PCR conditions and primer pairs without success. These recalcitrant breakpoints likely involve more complex junctions than predicted by high-resolution array CGH. Small local deletions and duplications are not uncommon at breakpoint junctions (18,20,21,62), and we suspect that many of the missing junctions in our study fall into this category.
We were able to identify one such cryptic rearrangement by amplifying the translocation junction in 18q–71c, which appeared to be a terminal deletion by other methods. However, we were unable to sequence the junctions of any terminal or interstitial duplications. Other groups have had similar problems in sequencing duplication junctions (22,23,43), which leads us to suspect that many are more complicated than perfect tandem or inverted duplications. Capturing complex junctions will require an unbiased approach that does not depend on inferring the correct rearrangement structure.
There are likely many factors involved in initiating the double-strand breaks that give rise to subtelomeric rearrangements. Large data sets of sequenced breakpoints coupled with functional assays of chromosome fragility will help us determine the roles of DNA sequence and chromatin in double-strand breaks and DNA replication errors. As described in Vissers et al. (62), other pathogenic CNV breakpoints are enriched in sequence motifs that may stimulate CNV formation. Given the enrichment of tandem repeats at chromosome ends (47,64) and subtelomeric hotspots on 9q and 22q that coincide with tandem repeats (Fig. 4), we suspect that some subtelomeric rearrangements are initiated by breakage in particularly fragile DNA sequences. Tandem repeats have been found at other subtelomeric breakpoints (18,21), and are overrepresented in normal CNV breakpoint data sets (22,63). As for fragile sites in the human genome (65), variation in repeat tract length or sequence composition could affect the risk of chromosome breakage at subtelomeric hotspots.
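The study lists Tandem Repeats Finder among its resources; that program tolerates mismatches and indels, and its algorithm is not reproduced here. Instead, the simplified sketch below scans for perfect tandem repeats of short period, which is enough to illustrate how a repeat tract overlapping a breakpoint might be flagged. The function name and the period and copy thresholds are hypothetical defaults.

```python
def find_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 3) -> list:
    """Report perfect tandem repeats as (start, period, full copies).

    Greedy left-to-right scan; at each start the period giving the longest run is
    kept. Exact matches only (min_copies should be at least 2).
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (period, end) of the longest qualifying repeat starting at i
        for p in range(1, max_period + 1):
            end = i + p
            # Extend while each base equals the base one period upstream.
            while end < len(seq) and seq[end] == seq[end - p]:
                end += 1
            copies = (end - i) // p
            if copies >= min_copies and (best is None or end > best[1]):
                best = (p, end)
        if best is None:
            i += 1
        else:
            period, end = best
            hits.append((i, period, (end - i) // period))
            i = end  # skip past the repeat tract
    return hits


print(find_tandem_repeats("CC" + "TTAGGG" * 3 + "CC"))  # [(2, 6, 3)]
```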
Furthermore, the concentration of particular repetitive sequences at chromosome ends could explain differences in the density and location of breakpoints among subtelomeres (Fig. 4). Chromosome ends enriched in ‘fragile motifs’ would be expected to be more susceptible to new rearrangements. This has been suggested for breakpoints in 1p and 9q that are particularly GC-rich and concentrated in specific classes of repetitive sequences predicted to form secondary structures (20,21,66). Of course, the embryonic lethality of some subtelomeric rearrangements will also influence which rearrangements of chromosome ends are ascertained. Subtelomeric rearrangements in gene-poor chromosome ends (e.g. 18q) are on average larger than rearrangements from gene-rich chromosome ends (e.g. 9q) (Fig. 4 and Supplementary Material, Table S1), likely reflecting the lethality of large genomic gains and losses that encompass many genes.
Determining the parental origin of chromosome rearrangements can also yield mechanistic insights into how rearrangements form and at what time in development they occur (67,68). Analysis of six trios in our study revealed an equal number of maternally and paternally derived subtelomeric rearrangements. These data are too limited to assess parental bias and include only two sequenced junctions. Thus, we cannot determine if particular classes of subtelomeric rearrangements have a parental bias. In a large cohort of 18q terminal deletions, Heard et al. (28) found a significant paternal bias as 71 out of 81 deletions were paternal in origin. In contrast, analysis of the parent of origin of 40 de novo 1p terminal deletions revealed that 24 and 16 deletions occurred on the maternal and paternal chromosomes, respectively (27). In their analysis of 9q subtelomeric rearrangements, Yatsenko et al. (21) found that 11 and six rearrangements occurred on the paternal and maternal alleles, respectively. It is possible that subtelomeric rearrangements involving particular loci or chromosome ends have a parental bias; however, as a group, subtelomeric rearrangements do not appear to have a significant parent-of-origin effect.
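One way to ask whether a parent-of-origin split departs from 1:1 is an exact two-sided binomial test against p = 0.5. The sketch below applies it to the counts cited above. The text does not state which test Heard et al. (28) or the other studies used, so these p-values are illustrative and assume each rearrangement is an independent event.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k of n events against p = 0.5."""
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    # The null distribution is symmetric at p = 0.5, so doubling one tail gives
    # the two-sided p-value; cap at 1 when the two tails overlap.
    return min(1.0, 2 * p_tail)


cohorts = {
    "18q terminal deletions, paternal (28)": (71, 81),
    "1p terminal deletions, maternal (27)": (24, 40),
    "9q rearrangements, paternal (21)": (11, 17),
    "18q deletions in this study, paternal (28)": (5, 6),
    "SNP-array trios in this study, paternal": (3, 6),
}
for label, (k, n) in cohorts.items():
    print(f"{label}: {k}/{n}, two-sided p = {two_sided_binomial_p(k, n):.3g}")
```

With only six trios, neither a 3:3 nor a 5:1 split approaches conventional significance, in line with the cautious interpretation above.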
Most subtelomeric rearrangements involve many genes that could be responsible for an abnormal phenotype. Nevertheless, smaller rearrangements can refine critical regions involved in developmental disabilities and birth defects. Although we lack the detailed phenotypic information for the subjects in our study that would allow specific genotype–phenotype correlations, it is worth mentioning some of the smaller rearrangements that may narrow the genes involved in the referring diagnosis (Fig. 4). EGL039 and EGL049 have overlapping interstitial deletions of 17p13.3 that include the YWHAE gene, but not PAFAH1B1. Mutations in PAFAH1B1, also known as LIS1, cause lissencephaly (69) (OMIM 601545). EGL039 was referred with an indication of mild intellectual disability and short stature; EGL049 was referred for developmental delay. Parental analysis revealed that both deletions are de novo (Supplementary Material, Table S1), which is typical of pathogenic chromosome rearrangements. In addition, recent studies have reported similar deletions of 17p13.3 in children with mild intellectual disability and moderate-to-severe growth restriction (70,71). Thus, our data support the conclusion that deletions of this part of 17p13.3 cause a less severe phenotype than larger deletions that also include PAFAH1B1. We also identified an interstitial duplication of an overlapping region of 17p13.3; however, no phenotypic information was available for subject EGL042, who carries the duplication. Duplications of this region have been reported in children with intellectual disability, macrosomia and dysmorphic facial features (72).
The smallest chromosome imbalance in our study is a 54-kb deletion of 4p16.3 that includes only one gene, LETM1 (Fig. 4). LETM1 lies within the Wolf–Hirschhorn syndrome (WHS) critical region 2 (73) (OMIM 194190), but subject EGL094 exhibited no facial features of WHS; rather, she was referred for testing at 1 year of age, presenting with microtia, renal agenesis, Duane anomaly and a congenital heart defect. These data suggest that loss of LETM1 is not responsible for the characteristic facial features in WHS and that other candidate genes in the critical region may be involved.
Overall, we have shown that subtelomeric rearrangements are a heterogeneous class of chromosome abnormalities caused by diverse mutational mechanisms. Most breakpoint junctions do not have significant sequence homology between recombining segments, consistent with end-joining, DNA replication errors, or de novo telomere synthesis. NAHR between interspersed repeats mediates interstitial deletions and a translocation, though we found no recurrent NAHR-mediated events in this group of subtelomeric rearrangements. We suspect that the shorter stretches of sequence homology provided by these repeats, compared with large segmental duplications, make Alu–Alu and L1–L1 recombination less frequent. We also identified one subtelomeric rearrangement with a sequenced junction consistent with the FoSTeS model of replicative DNA repair. Ultimately, as with other CNVs in the human genome, there is no single mechanism to account for subtelomeric rearrangement formation.
Samples from subjects with subtelomeric rearrangements were ascertained from Emory Genetics Laboratory (EGL), Signature Genomic Laboratories (SG), Seattle Children's Hospital (SCH), the Chromosome 18 Clinical Research Center (18q-) (28,74), the Unique Rare Chromosome Support Group (U) and the Ledbetter and Martin laboratories (LM) (75,76) (Supplementary Material, Table S1). Male (GM10851) and female (GM15510) reference cell lines were obtained from Coriell Cell Repositories. This study was approved by the Emory University Institutional Review Board. Samples were de-identified or obtained with informed consent per the study protocol at Emory and/or collaborating institutions.
Subjects were referred for diagnostic testing for a number of indications, typically intellectual disabilities, autism and birth defects. Detailed phenotypic information for subjects was not available. Subtelomeric rearrangements were originally analyzed in clinical diagnostic laboratories with different array comparative genomic hybridization (CGH) platforms or subtelomeric FISH assays. Array CGH results were confirmed by FISH in diagnostic laboratories using standard FISH methodologies, except in the case of four rearrangements: interstitial deletions from EGL050 and EGL094 and interstitial duplications from EGL072 and EGL108 were too small to analyze via FISH. FISH confirmed the imbalance detected by array CGH (eliminating false positives), determined the rearrangement structure, and guided our breakpoint sequencing strategy. We confirmed all subtelomeric rearrangements ascertained from clinical diagnostic labs in our targeted array CGH experiments. Thus, there were no false positives in our subtelomeric rearrangement dataset.
Using a 244K platform from Agilent Technologies, we designed four custom arrays to cover the terminal 5 Mb of 41 chromosome ends (excluding acrocentric p arms and the Y chromosome), providing a mean probe spacing of one oligonucleotide per 240 bp. Oligonucleotides were designed using Agilent's eArray program (https://earray.chem.agilent.com/earray/). To minimize non-unique oligonucleotides that would not be informative in array CGH, we performed an HD probe search to prefer existing ‘catalog probes’ and we used the most stringent ‘similarity score filter’ designed to select probes that hybridize to only one genomic location. The unique identifiers (AMADIDs) for the array designs are 021634, 021635, 021636 and 021637 for chromosomes 1–5, 6–10, 11–17 and 18-X, respectively. For breakpoints outside of the terminal 5 Mb, we designed other targeted arrays (AMADIDs 08181, 23978, 27686, 29464 and 97831). All array designs are available upon request.
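The design parameters can be checked with simple arithmetic. Twenty-four chromosomes have 48 ends; excluding the five acrocentric p arms and both ends of the Y chromosome leaves 41. At one probe per 240 bp over 5 Mb per end, the sketch below estimates the probe count and compares it with the nominal capacity of four 244K arrays, treating "244K" as exactly 244 000 features (an approximation).

```python
chromosomes = 24            # 22 autosomes + X + Y
acrocentric_p_arms = 5      # 13p, 14p, 15p, 21p, 22p
y_ends = 2                  # the Y chromosome was excluded entirely
ends = 2 * chromosomes - acrocentric_p_arms - y_ends

target_bp = 5_000_000       # terminal 5 Mb per chromosome end
spacing_bp = 240            # reported mean probe spacing
capacity = 4 * 244_000      # four arrays, nominal 244K features each

probes_needed = ends * target_bp / spacing_bp
print(ends, f"{probes_needed:,.0f}", f"{capacity:,}")  # 41 854,167 976,000
```

The estimated probe count fits within the nominal capacity of the four designs.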
We extracted genomic DNA from most cell lines and peripheral blood samples using the Gentra Puregene DNA Extraction Kit (Qiagen, Valencia, CA). SG DNA samples were prepared from blood using the Qiagen M48 Biorobot for automated DNA extraction with standard conditions (Qiagen, Valencia, CA). Subject DNA was co-hybridized with reference DNA from either GM10851 or GM15510. Arrays were scanned using a GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA) or the Agilent high-resolution C scanner (Agilent Technologies, Santa Clara, CA), and signal intensities were evaluated using Feature Extraction software (Agilent Technologies, Santa Clara, CA). We used DNA Analytics Version 4.0 software (Agilent Technologies, Santa Clara, CA) to analyze the array data and call breakpoints (Supplementary Material, Table S1 and Fig. S1).
Starting with breakpoints identified by array CGH, we designed PCR primers to amplify putative breakpoint junctions. For terminal deletions, we designed a primer complementary to the intact (non-deleted) side of the junction and paired this primer with one of two telomere primers, 5′-CCCTAACCCTAACCCTAACCCTAACCCTAA-3′ or 5′-TATGGATCCCTAACCCTGACCCTAACCC-3′ (38). For unbalanced translocations, we designed PCR primers to amplify the junction from the derivative chromosome to the translocated segment. We amplified interstitial deletion junctions by designing primers complementary to the edges of the deletion. We performed PCR using TaKaRa Ex Taq polymerase (Clontech Laboratories, Inc., Madison, WI) with 1× PCR buffer, 0.2 mM dNTP, 8 pmol of each primer and 50–100 ng of DNA template. PCR conditions for amplifying terminal deletions were: 94°C for 1 min; 10 cycles at 94°C for 30 s, 65°C for 1 min (decreasing 1°C per cycle), 72°C for 3 min; 20 cycles at 94°C for 30 s, 59°C for 1 min, 72°C for 3 min; and a final extension at 72°C for 5 min. Conditions for other PCRs were: 94°C for 1 min; 30 cycles at 94°C for 30 s, 56°C for 30 s and 72°C for 1 min/kb of expected product; followed by a final extension at 72°C for 5 min. We purified PCR products from agarose gels using the QIAquick gel extraction kit (Qiagen, Valencia, CA), and then cloned them into a TOPO-TA vector following the manufacturer's protocol (Invitrogen, Carlsbad, CA). We transformed the ligated construct into SURE 2 Supercompetent Cells (Agilent Technologies, Cedar Creek, TX) following the manufacturer's protocol. We propagated plasmids in recombination-deficient SURE 2 Escherichia coli to prevent rearrangement of the cloned insert.
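The terminal-deletion PCR program is a touchdown protocol. The sketch below encodes it as a list of (step, °C, seconds) tuples, assuming the first touchdown cycle anneals at 65°C and each subsequent cycle 1°C lower, so that under this reading the tenth cycle anneals at 56°C before the 20 cycles at 59°C. The function and parameter names are hypothetical, and the output is a plain description rather than input for any particular thermocycler.

```python
def touchdown_program(td_cycles=10, td_start=65.0, td_step=1.0,
                      plateau_cycles=20, plateau_anneal=59.0):
    """Terminal-deletion PCR program as a list of (step, temperature in °C, seconds)."""
    steps = [("initial denaturation", 94.0, 60)]
    # Assumption: cycle c of the touchdown phase anneals at td_start - c * td_step.
    anneals = [td_start - c * td_step for c in range(td_cycles)]
    anneals += [plateau_anneal] * plateau_cycles
    for anneal in anneals:
        steps += [("denature", 94.0, 30), ("anneal", anneal, 60), ("extend", 72.0, 180)]
    steps.append(("final extension", 72.0, 300))
    return steps


def extension_seconds(expected_bp: int) -> int:
    """Extension time used for the other PCRs: 1 min per kb of expected product."""
    return round(60 * expected_bp / 1000)


program = touchdown_program()
anneal_temps = [temp for step, temp, _ in program if step == "anneal"]
print(len(anneal_temps), anneal_temps[0], anneal_temps[9], anneal_temps[10])  # 30 65.0 56.0 59.0
```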
We purified plasmid DNA (Qiagen Miniprep kit, Valencia, CA) and submitted plasmids for sequencing (Beckman Coulter Genomics, Danvers, MA). DNA sequences were analyzed by comparing reads to the human genome reference assembly (NCBI36/hg18) using the BLAT tool on the UCSC Genome Browser (http://genome.ucsc.edu/). All junction sequences are listed in Supplementary Material, Figs S2 and S3.
Parent of origin was determined for six rearrangements by analyzing proband and parental DNA using the Affymetrix Human Genome-Wide SNP Array 6.0 at Emory University. For other family trios, parental DNA was not available. Genotyping was performed with the Birdseed algorithm, as implemented in Affymetrix Power Tools software. For genomic losses, parent of origin was determined by inferring the origin of the missing allele in the proband. For genomic gains, we analyzed informative SNPs in SNP cluster graphs to determine the origin of the additional allele.
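The text does not describe how informative SNPs were selected. As a hypothetical sketch of the logic for genomic losses, assuming genotype calls are available as allele strings and that the proband is hemizygous across the deleted interval, a SNP points to the deleted homolog when one parent is homozygous for an allele the proband lacks. Votes for both parents within the same interval would suggest genotyping errors or a region that is not actually deleted.

```python
def deleted_homolog_votes(snps):
    """Count SNPs implicating the maternal or paternal homolog in a deletion.

    snps: iterable of (mother, father, proband) genotype strings such as "AA", "AB",
    "BB". Within a hemizygous interval the proband is called homozygous for the one
    remaining allele. A parent is implicated when homozygous for an allele the
    proband lacks, since a non-deleted homolog from that parent would carry it.
    """
    votes = {"maternal": 0, "paternal": 0}
    for mother, father, proband in snps:
        if len(set(mother)) == 1 and mother[0] not in proband:
            votes["maternal"] += 1
        if len(set(father)) == 1 and father[0] not in proband:
            votes["paternal"] += 1
    return votes


# Hypothetical genotypes at SNPs inside a deleted interval.
snps = [("AA", "BB", "BB"), ("AA", "AB", "BB"), ("AB", "BB", "BB"), ("BB", "AA", "AA")]
print(deleted_homolog_votes(snps))  # {'maternal': 3, 'paternal': 0}
```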
The URLs for data presented herein are as follows:
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/
UCSC Genome Browser, http://genome.ucsc.edu/
Tandem Repeats Finder (TRF), http://tandem.bu.edu/trf/trf.html
This study was supported by a grant from the NIH (MH092902 to M.K.R.).
The authors wish to thank participating families, referring clinicians and the Unique Rare Chromosome Disorder Support Group for their contributions to this project. We thank Erin B. Kaminsky for sharing unpublished data and for insightful discussions. Ian Goldlust, Heather Mason-Suares, Brooke Weckselblatt, Tara Wood and Becca Cozad performed array CGH and sequenced breakpoint junctions. We thank Cheryl Strauss for editorial assistance.
Conflict of Interest statement. None declared.