Array comparative genomic hybridization (aCGH) is a powerful tool for the molecular elucidation and diagnosis of disorders resulting from genomic copy-number variation (CNV). However, intragenic deletions or duplications—those including genomic intervals of a size smaller than a gene—have remained beyond the detection limit of most clinical aCGH analyses. Increasing array probe number improves genomic resolution, although higher cost may limit implementation, and enhanced detection of benign CNV can confound clinical interpretation. We designed an array with exonic coverage of selected disease and candidate genes and used it clinically to identify losses or gains throughout the genome involving at least one exon and as small as several hundred base pairs in size. In some patients, the detected copy-number change occurs within a gene known to be causative of the observed clinical phenotype, demonstrating the ability of this array to detect clinically relevant CNVs with subkilobase resolution. In summary, we demonstrate the utility of a custom-designed, exon-targeted oligonucleotide array to detect intragenic copy-number changes in patients with various clinical phenotypes.
High-resolution human genome analysis by array comparative genomic hybridization (aCGH) has revolutionized our ability to identify both benign copy-number variation (CNV) [Conrad et al., 2010b; Iafrate et al., 2004; Redon et al., 2006; Sebat et al., 2004] as well as pathogenic copy-number changes associated with genomic disorders [Lupski, 1998, 2009]. The pathogenic mechanism for these disorders, which involve genomic losses or gains of various sizes, is often dosage sensitivity of one or more of the genes within the rearranged genomic interval, but gene interruption, gene fusions, and position effects are increasingly recognized mechanisms mediating downstream effects of CNVs [Lupski and Stankiewicz, 2005]. Array CGH has enabled the detection of submicroscopic CNV (i.e., microdeletions and microduplications). To date, dozens of disorders have been ascribed to this type of genomic aberration [Mefford and Eichler, 2009; Stankiewicz and Lupski, 2010]. Recurrent microdeletions and microduplications occur via nonallelic homologous recombination (NAHR), with the “fixed” size of the reciprocal rearrangements reflecting the genomic positions of flanking, directly oriented repeat sequences utilized as homologous recombination substrates [Stankiewicz and Lupski, 2002]. In contrast, nonrecurrent rearrangements vary in size from genomic alterations involving megabases of DNA, to single-gene duplication/triplication, to CNV of single exons [Zhang et al., 2009a, 2010b]. Such nonrecurrent CNV occur by nonhomologous end joining (NHEJ) or by the recently described replication-based mechanisms of fork stalling and template switching/microhomology-mediated break induced replication (FoSTeS/MMBIR) [Hastings et al., 2009a,b; Lee et al., 2007].
Deletion or addition of one or more exons in a gene can have varied molecular and phenotypic consequences. A shift in reading frame can result in a premature termination codon, typically followed by nonsense-mediated decay (NMD) to create a loss of function allele [Maquat 1995]. Escape from NMD is possible, which may cause disease by gain of function [Ben-Shachar et al., 2009; Inoue et al., 2004]. Rarely, premature stop codons may also promote exon skipping (nonsense-associated altered splicing; NAS), which has the potential to restore the reading frame [Dietz et al., 1993; J. Wang et al., 2002]. An in-frame loss or gain may result in an altered [Yatsenko et al., 2003] or fused [Lifton et al., 1992; Miyahara et al., 1992] protein product with reduced or novel function. Thus, although haploinsufficiency may result from exonic CNV [Zhang et al., 2009b, 2010a], novel hypomorphic, antimorphic, and even neomorphic mutant alleles may be generated.
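Whether a premature termination codon (PTC) is expected to trigger NMD is often estimated with the widely cited rule that a PTC lying more than about 50–55 nt upstream of the last exon–exon junction elicits NMD, while one closer to or downstream of that junction escapes it. This rule is not part of the study described here. The sketch below is an illustrative heuristic only, and the function name, coordinate convention, and 50-nt cutoff are assumptions.

```python
from typing import Sequence


def predicts_nmd(ptc_position: int, exon_junctions: Sequence[int],
                 threshold_nt: int = 50) -> bool:
    """Heuristic NMD prediction (the '50-55 nt rule'); hypothetical helper.

    ptc_position:   mRNA coordinate of the first base of the premature stop codon.
    exon_junctions: mRNA coordinates of exon-exon junctions (last base of each
                    upstream exon), in the same coordinate system.
    threshold_nt:   assumed cutoff; the literature cites roughly 50-55 nt.
    """
    if not exon_junctions:
        # A single-exon transcript has no downstream junction, so the rule
        # predicts escape from NMD.
        return False
    last_junction = max(exon_junctions)
    # NMD is predicted only when the PTC lies far enough upstream of the
    # last junction; a PTC in the last exon gives a negative distance.
    return last_junction - ptc_position > threshold_nt


print(predicts_nmd(100, [200, 400]))  # True: PTC 300 nt upstream of last junction
print(predicts_nmd(500, [200, 400]))  # False: PTC lies in the last exon
```

A PTC that escapes NMD by this heuristic is the situation in which a truncated protein may be produced and act through gain of function, as noted above.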
Exon-targeted aCGH (i.e., aCGH using an array with probes concentrated disproportionately in exons) can have either genome-wide or focused coverage. Genome-wide exonic arrays have been used to measure mRNA expression [Kapur et al., 2007]; unlike traditional 3′ expression arrays, they allow alternative splicing to be assessed [Clark et al., 2007; Gardina et al., 2006; Thorsen et al., 2008; Yeo et al., 2007]. This technique has enabled the discovery of tissue- and tumor-specific splice variants. Similar studies have been performed using locus-specific expression exon arrays [Labeit et al., 2006].
In addition to assessing gene expression, exon arrays have been used to assess genomic content. Bailey et al.  studied nine healthy HapMap individuals using an array with exonic coverage for 2,790 genes. This study uncovered substantial CNV, disproportionately localized to regions containing segmental duplications. Although a catalog of benign intragenic CNV has been found by this and other studies [Conrad et al., 2010b], array-based detection of clinically relevant intragenic CNV remains in its infancy.
Hegde et al. , del Gaudio et al. , and Bovolenta et al.  each designed a genomic microarray spanning the length of the dystrophin (DMD) gene. Although these single-locus arrays were not strictly exon targeted, the density and distribution of probes were sufficient to detect exonic (and intronic) CNV within the DMD locus in patients suspected of having mutations in this gene. Wong et al.  also detected exonic CNV, using an array with dense coverage of 130 nuclear genes implicated in mitochondrial and metabolic disorders. This demonstrated the utility of a single nonexon-targeted array to detect intragenic CNV in multiple related genes.
Dhami et al.  constructed an exon-specific array with coverage for 162 exons of five genes implicated in unrelated conditions (COL4A5, DMD, NF2, PLP1, and PMP22). Similarly, Saillour et al.  performed aCGH to assess copy-number variation among 158 exons in eight disease genes (CFTR, DMD, and six sarcoglycan genes), as did Staaf et al.  for the exons of six cancer-related genes (BRCA1, BRCA2, MSH2, MLH1, PTEN, and CDKN2A). Tayeh et al.  constructed a targeted array with exonic (and intronic, with slightly diminished resolution) coverage of 71 disease genes, predominantly implicated in lysosomal storage and metabolic disorders. Significantly, this array was used in a clinical diagnostic setting in cases where gene sequencing failed to detect a mutation or mutations sufficient to explain a patient’s disease. The aforementioned studies provided proof of concept that a targeted exon array could be used to diagnose disparate disorders caused by intragenic copy-number changes. However, because the patients assessed in these studies were a selected population of previously diagnosed (either clinically or molecularly) individuals, an array-based methodology to detect clinically relevant exonic copy-number changes genome-wide in unscreened or undiagnosed individuals has not yet been described.
As part of our continuing effort to clinically implement high-resolution human genome analysis [Cheung et al., 2005, 2007; Lu et al., 2007, 2008; Ou et al., 2008; Shao et al., 2008], we sought to identify CNV of smaller sizes (i.e., kilobase pairs in length, containing only one or a few exons) in functionally relevant regions of the human genome. To do this, we designed and developed a whole-genome microarray with coverage of approximately 24,000 exons in over 1,700 clinically relevant and candidate disease genes. This approach enables detection of intragenic copy-number changes in patients with varied clinical presentations that would otherwise be missed by traditional aCGH and would not be detected by gene-specific diagnostic DNA sequencing.
V8 OLIGO is a custom-designed array with approximately 180,000 interrogating oligonucleotides, manufactured by Agilent Technologies, Inc. (Santa Clara, CA). This array contains the “best-performing” oligonucleotides (oligos) selected from Agilent’s online library (eArray; https://earray.chem.agilent.com/earray/) and has been further empirically optimized. Genomic features of the V8 OLIGO design include interrogation of all known microdeletion and microduplication syndrome regions as well as pericentromeric and subtelomeric regions and computationally predicted NAHR-mediated genomic instability regions flanked by low-copy repeats (LCR) as previously described [El-Hattab et al., 2009]. In addition, ~1,700 selected known or candidate disease genes have exonic coverage (101,644 probes in 24,319 exons; average of 4.2 probes/exon) as well as introns greater than 10 kb. The entire nuclear genome is covered with an average resolution of 30 kb, excluding LCRs and other repetitive sequences. Six hundred seventy probes interrogating the mitochondrial genome (average resolution of 25 bp) are also included. Further details are available at https://www.bcm.edu/geneticlabs/.
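As a quick arithmetic check on the design figures above, the stated averages can be recomputed from the stated counts. This sketch assumes that the mitochondrial "average resolution" means genome length divided by probe number, and it takes 16,569 bp (the length of the revised Cambridge Reference Sequence) as the mtDNA length; neither assumption comes from the array documentation.

```python
# Counts taken from the array description above.
EXON_PROBES = 101_644
TARGETED_EXONS = 24_319
MT_PROBES = 670
# Assumption: mtDNA length taken as the revised Cambridge Reference Sequence.
MT_GENOME_BP = 16_569


def mean_probes_per_exon(probes: int, exons: int) -> float:
    """Average number of probes per targeted exon."""
    return probes / exons


def mean_spacing_bp(length_bp: int, probes: int) -> float:
    """Average spacing if probes are spread evenly along a sequence."""
    return length_bp / probes


print(f"{mean_probes_per_exon(EXON_PROBES, TARGETED_EXONS):.1f} probes/exon")  # 4.2 probes/exon
print(f"{mean_spacing_bp(MT_GENOME_BP, MT_PROBES):.0f} bp between mtDNA probes")  # 25 bp between mtDNA probes
```

Both results agree with the values quoted in the design description.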
All genomic coordinates are based on the March 2006 assembly of the reference genome (NCBI36/hg18).
Clinical aCGH was performed on 3,743 samples referred to the Medical Genetics Laboratory at Baylor College of Medicine (BCM) from June 2009 to March 2010. Cases 1 and 2, reported herein, were analyzed on our V7.4 OLIGO array prior to this period, and cases 15 and 29 were analyzed subsequent to it. Informed consent, approved by the Institutional Review Board for Human Subject Research at Baylor College of Medicine, was obtained in cases for which an image of the subject is provided.
DNA was extracted from whole blood using the Puregene DNA Blood Kit (Gentra, Minneapolis, MN) according to the manufacturer’s instructions.
The procedures for DNA digestion, labeling, and hybridization for the oligo arrays were performed according to the manufacturers’ instructions, with minor modifications [Ou et al., 2008]. Slides were scanned into image files using the Agilent G2565 Microarray Scanner. Scanned images were quantified using Agilent Feature Extraction software (v9.0), then analyzed for copy-number change using our in-house analysis package, as described previously [Cheung et al., 2005; Ou et al., 2008; Shaw et al., 2004].
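The in-house analysis package is not described here. Purely as an illustration of the general idea, and not the laboratory's algorithm, the sketch below flags runs of consecutive probes whose log2 ratios all exceed a loss or gain threshold. The function name, the ±0.3 thresholds, and the three-probe minimum are hypothetical placeholders.

```python
from typing import List, Optional, Tuple


def call_runs(log2_ratios: List[float], loss_cut: float = -0.3,
              gain_cut: float = 0.3, min_probes: int = 3) -> List[Tuple[str, int, int]]:
    """Return (call, first_index, last_index) for runs of consecutive probes
    beyond a threshold. Thresholds and min_probes are hypothetical; requires
    loss_cut < 0 < gain_cut so the 0.0 sentinel below is never called.
    """
    calls = []
    state: Optional[str] = None
    start = 0
    # A trailing 0.0 sentinel flushes any run that reaches the last probe.
    for i, r in enumerate(list(log2_ratios) + [0.0]):
        s = "loss" if r <= loss_cut else "gain" if r >= gain_cut else None
        if s != state:
            # The previous run has ended; keep it if it is long enough.
            if state is not None and i - start >= min_probes:
                calls.append((state, start, i - 1))
            state, start = s, i
    return calls


ratios = [0.0, 0.1, -0.9, -1.1, -0.8, 0.05, 0.6, 0.7, 0.0]
print(call_runs(ratios))  # [('loss', 2, 4)]; the two-probe gain is too short
```

Requiring several consecutive probes trades sensitivity for specificity, which is one reason exonic probe density (about four probes per exon on this design) matters for single-exon detection.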
FISH analyses were performed with probes derived from bacterial artificial chromosomes (BACs) or fosmids using standard procedures [Shaffer et al., 1997]. Probe IDs are listed as part of the cytogenetic diagnoses provided as Supp. Data.
MLPA analysis was performed using the SALSA MLPA kit (MRC-Holland, Amsterdam, The Netherlands), according to the manufacturer’s instructions. Probe sets are described in the Supp. Methods. Additional information about commercially available probe sets is available at http://www.mrc-holland.com.
Long-range PCR was performed using the TaKaRa LA PCR Kit (TaKaRa Bio, Inc., Shiga, Japan). Reaction volume was 25 μl, containing 100 ng DNA, 0.5 μM of each primer, 400 μM of each dNTP, and 1.5 units TaKaRa LA Taq in 1 × LA PCR Buffer II. Primer sequences are provided in the Supp. Methods. The PCR was performed in a thermal cycler using the following conditions: 94°C × 1 min; 30 cycles of either 94°C × 30 sec followed by 68°C × 7 min, or 98°C × 5 sec followed by 68°C × 15 min; 72°C × 10 min. Agarose gel electrophoresis of the amplification products enabled comparison of amplicon sizes obtained with patient DNA with those obtained with control DNA from normal individuals.
PCR products were cleaned with ExoSAP-IT (USB, Cleveland, OH), according to the manufacturer’s instructions, and nucleotide sequences were determined by Sanger dideoxynucleotide sequencing (Lone Star Labs, Houston, TX).
Of 3,743 aCGH analyses performed, the most common finding was a normal result, consistent with our previous experience with unfiltered clinical samples referred to a genetic diagnostic laboratory [Lu et al., 2007]. In addition to many large genomic deletions and duplications, we identified more than 40 intragenic copy-number changes (deletions and duplications spanning a portion of a gene), a subset of which are presented in this report (Table 1). These 31 CNVs (30 copy-number losses and one gain) range in size from less than 1 kb to more than 10⁵ bp (Fig. 1A); indeed, the smallest CNV analyzed by DNA sequencing was 502 bp (see below). The CNVs were found throughout the genome, on 14 of 22 autosomes and on the X chromosome (Fig. 1B). CNVs appear to be overrepresented on the X chromosome (7/31 CNVs). This finding is consistent with: (1) the large number of X chromosome genes with enhanced exon coverage on our array (163; more than for any other chromosome), (2) the large number of confirmed, disease-associated loci on the X chromosome listed in OMIM (135; more than for any other chromosome except chromosome 1; ftp://ftp.ncbi.nih.gov/repository/OMIM/genemap), and (3) hemizygous expression of most X chromosome genes. Three of these X chromosome CNVs were found in females (cases 1, 4, and 7), each in a gene implicated in X-linked dominant disease; the rest occurred in males. Twenty-nine CNVs spanned one or several exons, with 5′, 3′, and central exons all represented among them. The remaining two CNVs lie within a single intron of a gene. The CNVs of interest in three patients (cases 4, 12, and 13) were found by FISH to be mosaic (Table 1 and Supp. Data). All array findings summarized in Table 1 were independently confirmed by an alternative molecular technique: FISH (see Supp. Data), PCR (Supp. Figs. S1–S5), or MLPA (Supp. Figs. S6–S7).
None of these findings represents a known benign CNV listed in the Database of Genomic Variants (DGV; http://projects.tcag.ca/variation/). Although a single patient may carry multiple copy-number changes, only the deletions or duplications considered most likely to be clinically relevant are detailed in Table 1; additional CNVs are listed in Supp. Table S1. In eight instances, the rearrangement breakpoints were amplified by PCR and sequenced, providing further confirmation and, through conceptual translation, an inference of how gene structure and coding information might have been disrupted, thus aiding elucidation of the molecular mechanism of disease.
In 15 cases, a robust genotype–phenotype correlation could be established (Table 2); mutations in the gene disrupted by the CNV in these subjects are known to cause a disease that matches their clinical phenotypes. Of 12 cases in which parents were available for testing, the CNV arose de novo in eight; in two, a male had inherited an X-linked CNV from his mother; and in one, an autosomal CNV was inherited from an affected parent. In the remaining case (case 11; NRXN1 deletion), the CNV was inherited from a parent who did not share the patient’s clinical presentation, suggesting either reduced penetrance or undetected mosaicism in the tissue of clinical interest. Many of the CNVs found in these patients are novel, adding to the spectrum of mutations associated with their respective disease phenotypes.
Cases involving exonic losses in MECP2, PTEN, ZDHHC9, FAM58A, and HPRT1 are featured in Figures 2–6, respectively, and described in more detail below. These cases are representative of subjects in whom we detected an intragenic CNV causative of the patient’s disease phenotype.
This patient is a 14-year-old female with epilepsy, scoliosis, and absent verbal skills, who lacks the ability to walk. She was born at 40 weeks gestation and developed normally until 6 months of age, after which she lost the ability to sit independently and ceased babbling. An abnormal electroencephalogram was noted at 2 years of age, and frank seizures began at age 5. Scoliosis was noted, and corrective surgery was performed at age 10. DNA sequencing of methyl CpG binding protein 2 (MECP2) was performed at age 10, with no detectable mutations at that time. A karyotype was normal (46,XX), as were the results of multiple biochemical tests. A recent MRI was unremarkable. Currently, the patient is wheelchair bound, but can stand and move her legs with significant support. She is intolerant of heat, and is treated with supplementary vitamin D for osteopenia. Physical exam revealed a nondysmorphic girl with a height of 153.5 cm (~10th centile), weight of 40 kg (~10th centile), and head circumference 53 cm (~25th centile). Some residual scoliosis was noted. Fingers were tapered, and the left third finger was in a swan neck position. The first toes were long bilaterally, and a cutaneous 2–3 toe syndactyly was noted. Muscle bulk was appropriate, although tone was increased in both the upper and lower extremities.
Array CGH revealed a heterozygous genomic loss of about 1 kb, spanning exon 3 and part of exon 4 of MECP2 (Fig. 2A–C). The deletion was confirmed with MLPA (Fig. 2C–E). MLPA of parental samples demonstrated that this is a de novo loss. Both the molecular evidence and the patient’s clinical history and physical examination are consistent with Rett syndrome (MIM#312750). Heterozygous deletion of exon 3 and part of exon 4 has been previously described as an etiology for this X-linked dominant condition [Schollen et al., 2003].
This patient is an 8-year-old female who presented with new-onset joint pain. She was born at 36 weeks gestation, following maternal preeclampsia beginning at 28 weeks. She required resuscitation at birth and phototherapy for neonatal jaundice. Amblyopia was noted at 3 years of age and is now treated with corrective lenses. At 4 years of age, she began to lose deciduous teeth, which were described as having “no roots.” At age 8 years, she developed persistent knee and ankle pain, which improved somewhat with mechanical support and crutches. A previous skeletal survey found no abnormalities. She has also been diagnosed with anemia and eczema. The patient has a paternal half-sister, reportedly diagnosed with Proteus syndrome (MIM# 176920). Her paternal half-brother has macrocephaly and autism, by report. Physical exam revealed a well-developed girl with age-appropriate behavior. She had a high forehead and frontal bossing (Fig. 3A), with head circumference 59 cm (>98th centile). Height and weight were at the 95th and 90th centiles, respectively. Additionally, she had a bifid uvula, high arched palate, prominent tongue papillae, mild micrognathia, an enlarged, tender thyroid, and two cervical nevi (Fig. 3A). Examination of the extremities revealed a tender right knee and ankle, tapered fingers, and decreased range of motion at the distal interphalangeal joints. A thyroid ultrasound showed gland enlargement but no nodules. Further evaluations are underway.
Array CGH revealed a heterozygous genomic loss of 8–26 kb, spanning exons 3–5 of phosphatase and tensin homolog (PTEN) (Fig. 3B–D). Deletion of these exons was confirmed with MLPA (Fig. 3D–F). No deletion was present in the patient’s mother, who is also macrocephalic (head circumference 59 cm, >98th centile). The patient’s father declined testing. Given the early clinical presentation and family history, the clinical features of this patient are consistent with Bannayan-Riley-Ruvalcaba syndrome (BRRS; MIM# 153480); however, Cowden syndrome (MIM# 158350) remains a possible diagnosis. Both BRRS and Cowden syndrome are dominant genetic conditions resulting from mutations in PTEN [Liaw et al., 1997; Marsh et al., 1997]. Deletion of exons 3–5 has not been described previously in patients with BRRS. Complete deletion of exons 3–5 is predicted to remove 328 nucleotides from PTEN mRNA, resulting in a frameshift and a premature termination codon. Because the patient’s paternal half-sister reportedly has Proteus syndrome and her paternal half-brother reportedly has macrocephaly and autism, it is possible that, despite differing clinical signs, they share the mutant PTEN allele. Point mutations in PTEN have been described in patients with Proteus syndrome and/or Proteus-like syndrome [Smith et al., 2002; Zhou et al., 2001]. Other authors, however, failed to find intraexonic point mutations in patients with Proteus syndrome [Barker et al., 2001; Biesecker et al., 2001; Thiffault et al., 2004], although they did not test for other types of mutations, including CNV.
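The frameshift prediction for the PTEN deletion follows from simple arithmetic: a change in coding length that is not a multiple of 3 shifts the downstream reading frame. A minimal sketch, assuming all 328 deleted nucleotides are coding sequence; the function name is hypothetical.

```python
def frame_effect(net_change_nt: int) -> str:
    """Classify a change in coding-sequence length (hypothetical helper).

    net_change_nt: nucleotides gained (positive) or lost (negative) from the
    coding sequence of the mRNA.
    """
    if net_change_nt == 0:
        return "no length change"
    # Any change that is not a multiple of 3 shifts the downstream reading frame.
    return "in-frame" if net_change_nt % 3 == 0 else "frameshift"


# Loss of PTEN exons 3-5 removes 328 nt; 328 = 3 * 109 + 1.
print(frame_effect(-328), (-328) % 3)  # frameshift 2
```

Note that the frame classification alone does not locate the new stop codon; conceptual translation of the shifted sequence is still needed for that.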
This patient is a 4-year-old male who was referred for evaluation of developmental delay. He had significant behavioral problems, including aggression toward others and himself, and head banging. He also had significant speech delay. A developmental evaluation, however, revealed no autistic features. He had sleep difficulties, with trouble initiating sleep and frequent awakening. On physical exam, the boy had normal growth parameters and no dysmorphic features. He was noted to have esotropia. A brain MRI performed at 4 years of age showed a paucity of white matter, as well as patchy white matter hyperintensities on T2-weighted images.
Array CGH revealed a complete genomic loss of 6–31 kb, spanning exons 10–11 of the zinc finger, DHHC-type containing 9 (ZDHHC9) gene (Fig. 4A–C), which encodes a palmitoyltransferase [Swarthout et al., 2005]. Deletion of these exons in the patient was confirmed using MLPA (Fig. 4C–D), although MLPA could not exclude deletion of other ZDHHC9 exons. The array, molecular, and clinical findings for this patient are consistent with ZDHHC9-related X-linked syndromic mental retardation (MIM# 300799). This syndrome was first reported by Raymond et al. , associated with hemizygous point mutations (one frameshift, two missense, and one splice site) in ZDHHC9. Four families with X-linked mental retardation were described, three of which had the additional clinical feature of a Marfanoid habitus. Behavioral problems and schizophrenia were also described in one patient. Tarpey et al.  found point mutations in ZDHHC9 in two families (one frameshift and one splice site mutation) and two other individuals (two missense mutations) with X-linked mental retardation, some of whom had a Marfanoid habitus.
To date, no genomic deletion has been reported in ZDHHC9. MLPA identified the same genomic deletion in the patient’s brother, who has a milder developmental delay, and in their mother, who is a carrier for this X-linked recessive disorder (Fig. 4C–D). Although the patient we describe lacks a Marfanoid habitus, at his young age (younger than any patient described by Raymond et al. ), this feature may not have yet manifested.
This newborn female was evaluated for multiple congenital anomalies. She was born to a 25-year-old mother by spontaneous vaginal delivery. Ventriculomegaly and an abdominal mass were noted in utero. At birth she had dysmorphic features (Fig. 5A), including telecanthus, a wide nasal bridge, abnormally shaped and low-set ears, and a right ear pit. Cardiac examination revealed a 2/6 systolic murmur at the left sternal border. She had an imperforate anus and developed abdominal distension after the first feed, for which a colostomy was subsequently placed. Additionally, she had an enlarged clitoris that raised concern for ambiguous genitalia, as well as limb abnormalities: syndactyly of both hands and feet, and clinodactyly of the left fifth digit. A skeletal survey showed multiple abnormalities, including vertebral fusion from C3 to C4 and from S3 through S5, 11 pairs of ribs, an absent middle phalanx of the left fifth finger, absent ossification of the middle phalanges of the feet, and soft tissue fusion extending from the third to fifth toes bilaterally. An echocardiogram showed both atrial and ventricular septal defects. A voiding cystourethrogram demonstrated grade V vesicoureteral reflux on the right and grade II reflux on the left, for which she was placed on antibiotic prophylaxis against urinary tract infections.
Array CGH revealed a heterozygous genomic loss of 1–16 kb, spanning exon 5 of the family with sequence similarity 58, member A (FAM58A) gene (Fig. 5B–D). The copy-number loss was confirmed by PCR (Fig. 5D–E), which enabled its size to be estimated at about 8–10 kb. The final exon of ATPase, Ca(2+)-transporting, plasma membrane, 3 (ATP2B3), which is not a known or suspected disease gene, is also involved in the deletion. Array CGH of both parents did not detect the deletion, indicating that this is a de novo mutation. The clinical and molecular features of this patient are consistent with STAR syndrome (toe Syndactyly, Telecanthus, and Anogenital and Renal malformations; MIM# 300707). Deletion of exon 5 of FAM58A has been described previously as an etiology for this X-linked dominant condition [Unger et al., 2008]. All six of the patients studied by these authors, including two patients originally described by Green et al. , were female. Our patient, the seventh described with this genetic syndrome, is also female, suggesting further that similar mutations in FAM58A in males may be lethal. Unger et al.  described complete or near-complete skewing of X-chromosome inactivation (XCI) in all patients studied. This finding, coupled with in vitro experiments, suggests that inactivating mutations of FAM58A may result in a cell-autonomous proliferation defect during fetal development. X-inactivation studies performed on our patient also revealed a complete skewing of XCI (data not shown).
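Array-derived size ranges such as "1–16 kb" reflect the discrete spacing of probes: the outermost probes reporting a loss give a lower bound, and the nearest flanking probes with normal copy number give an upper bound, which junction PCR can then narrow (here to about 8–10 kb). The sketch below illustrates this bracketing under the simplifying assumption that probes are points; it is not the laboratory's procedure, and the function name and example coordinates are hypothetical.

```python
from typing import List, Optional, Tuple


def deletion_size_bounds(positions: List[int],
                         deleted: List[bool]) -> Tuple[int, Optional[int]]:
    """Bracket the size of one contiguous deletion from per-probe calls.

    positions: sorted probe coordinates (probes treated as points).
    deleted:   True where the probe reports a copy-number loss.
    Returns (minimum_bp, maximum_bp); maximum is None when the deleted run
    touches either end of the probe set and so has no flanking probe.
    """
    idx = [i for i, d in enumerate(deleted) if d]
    if not idx:
        raise ValueError("no deleted probes")
    first, last = idx[0], idx[-1]
    if last - first + 1 != len(idx):
        raise ValueError("deleted probes are not contiguous")
    # Lower bound: span between the outermost deleted probes.
    minimum = positions[last] - positions[first]
    if first == 0 or last == len(positions) - 1:
        return minimum, None
    # Upper bound: span between the nearest non-deleted flanking probes.
    maximum = positions[last + 1] - positions[first - 1]
    return minimum, maximum


pos = [1_000, 5_000, 9_000, 12_000, 20_000, 30_000]
calls = [False, True, True, True, False, False]
print(deletion_size_bounds(pos, calls))  # (7000, 19000)
```

Denser probe placement narrows the gap between the two bounds, which is why exon-targeted probes yield tighter estimates within covered genes than in backbone regions.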
This patient is a 13-month-old male with moderate developmental delay and failure to thrive. No additional clinical details could be obtained. Array CGH revealed a genomic loss of 0.3–1 kb, spanning part of exon 9 of hypoxanthine phosphoribosyltransferase 1 (HPRT1) (Fig. 6A–C). The deletion was also detected by PCR (Fig. 6C–D), and biochemical analysis confirmed an absence of detectable HPRT activity (0 nmol/min/g Hb; normal range = 400–2,200 nmol/min/g Hb). DNA sequencing of the breakpoint region demonstrated a 502-bp deletion with an 18-bp insertion (Fig. 6E–F). This 18-bp sequence is not found in the reference human genome. Interestingly, the 7-bp sequences that roughly flank each side of the deletion breakpoint are homologous to one another (Fig. 6E). These features suggest that NHEJ and/or replication slippage may be responsible for the formation of this genomic rearrangement.
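Because a deletion accompanied by an insertion changes the length of a junction-spanning PCR product by the net difference, the 502-bp deletion with an 18-bp insertion shortens such a product by 484 bp relative to control DNA. A minimal sketch; the function name and the 2-kb wild-type amplicon are hypothetical.

```python
def expected_mutant_amplicon(wild_type_bp: int, deleted_bp: int,
                             inserted_bp: int = 0) -> int:
    """Size of a junction-spanning PCR product after a deletion/insertion.

    Assumes both primers lie outside the deleted segment (hypothetical helper).
    """
    if deleted_bp >= wild_type_bp:
        raise ValueError("deletion would remove a primer site; no product expected")
    return wild_type_bp - deleted_bp + inserted_bp


wt = 2_000  # hypothetical wild-type amplicon size
mutant = expected_mutant_amplicon(wt, 502, 18)
print(mutant, mutant - wt)  # 1516 -484
```

On a gel, only the net change (-484 bp) is visible; sequencing is needed to resolve it into the deletion and insertion components.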
The genomic deletion was found by aCGH to be maternally inherited (data not shown), indicating that the patient’s mother is a carrier for this X-linked recessive disorder. The array, molecular, and biochemical findings for this patient are consistent with Lesch-Nyhan syndrome (MIM# 300322). Deletion of exon 9, the final exon of HPRT1, has been reported previously in the context of this disorder [Gibbs et al., 1990; Yang et al., 1984].
Table 3 lists cases for which a genotype–phenotype correlation is less certain. This uncertainty may reflect limited clinical information (e.g., in case 19 it is not known whether the patient has specific features of aldolase A deficiency [MIM# 103850]/glycogen storage disease XII [MIM# 611881]), a paucity of published data linking the gene to dominant disease (e.g., case 18), or age-dependent penetrance of the associated condition (e.g., case 23; this patient is likely too young for exostoses to be present). Additionally, two patients with intronic deletions are described in Table 3 (cases 30 and 31). Proposing a genotype–phenotype correlation in these cases is somewhat speculative, although intronic rearrangements (including those near splice sites [Higgins et al., 2001; Zhuang et al., 1993], those affecting splicing by constraining intron size [L. Wang et al., 2002], and deep intronic deletions and duplications [Bovolenta et al., 2008, 2010]) have been associated with disease phenotypes. Although mutations in the genes listed in Table 3 have been described, none of the specific intragenic CNVs listed therein has been reported previously. Thus, they may represent novel mutations in known genetic disorders, or they may define novel genetic syndromes. The latter possibility is particularly intriguing and suggests that our approach may help elucidate the function of some of the large number of human genes that do not yet have a confirmed role in human phenotypic variability or disease (ftp://ftp.ncbi.nih.gov/repository/OMIM/genemap).
Breakpoint regions were sequenced for seven deletion CNVs (cases 8, 11, 18, 19, 22, 30, and 31; Supp. Table S2), revealing microhomology of 2–4 bp in four cases, extended microhomology (62 bp) with breakpoints in Alu elements in one case, and an insertion of 7 and 18 bases in one case each. Each instance of copy number loss was a “simple” (i.e., not complex) deletion. Breakpoints were also mapped and sequenced in the case of a copy-number gain (case 21; Supp. Table S2), which revealed no microhomology. This is a tandem gain of at least one additional copy, and perhaps more (PCR and DNA sequencing do not distinguish between duplications, triplications, etc.). Breakpoint coordinates are listed in Supp. Table S1. Seven of 16 breakpoints in the aforementioned cases localize to repetitive sequences (Supp. Table S2).
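Breakpoint microhomology is commonly quantified as the number of bases at the junction that could be assigned to either side of the deletion, so that sliding the deletion by that many bases yields the same product. The sketch below computes this for a simple deletion; the function name, 0-based half-open coordinates, and toy sequences are assumptions, not the pipeline used in this study.

```python
def breakpoint_microhomology(ref: str, start: int, end: int) -> int:
    """Microhomology length at a simple deletion junction (hypothetical helper).

    ref:        reference sequence.
    start, end: the deletion removes ref[start:end] (0-based, half-open).
    Counts bases by which the deletion can slide right or left while leaving
    the same product, capped at the deletion length.
    """
    size = end - start
    right = 0
    # Sliding right is possible while the first deleted base equals the first
    # retained base after the deletion.
    while right < size and end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    # Sliding left is possible while the last retained base before the
    # deletion equals the last deleted base.
    while left < size and start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    return min(left + right, size)


# "CAG" occurs at both ends of the deleted segment ref[5:15].
print(breakpoint_microhomology("TTTTTCAGGTGTGTGCAGAAAAA", 5, 15))  # 3
```

A result of 2–4 bp corresponds to the short microhomologies reported above, whereas a value of 0 corresponds to blunt junctions such as the one seen in case 21.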
The development of aCGH with exon coverage has enabled the detection of intragenic deletions and duplications throughout the entire human genome. We describe multiple cases involving the deletion or duplication of one or more exons, 15 of which exhibit an obvious concordance with the patient’s phenotype. These rearrangements are consistent with autosomal dominant (cases 2, 3, 5, 9–12, 14, and 15), X-linked recessive (cases 6, 8, and 13), and X-linked dominant (cases 1, 4, 7) disorders and predisposition to disease. Further, a variety of disease processes are represented, including neurodevelopmental disorders (cases 1, 3–6, 9–13), an enzyme deficiency syndrome (case 8), and other recognizable patterns of human malformation (case 2, Bannayan-Riley-Ruvalcaba syndrome; case 7, STAR syndrome; case 14, branchiootic syndrome; and case 15, Alagille syndrome). A subset of these phenotypically concordant genomic alterations has not been reported previously, including a loss of exons 3–5 of PTEN associated with Bannayan-Riley-Ruvalcaba syndrome; losses of exons 10–11 of ZDHHC9, and of exons 7–9 of IL1RAPL1, each independently associated with mental retardation in males; loss of exons 1–4 of STXBP1 associated with childhood epilepsy and other features; and a loss of exons 6–8 of JAG1 associated with Alagille syndrome (MIM numbers associated with these conditions are listed in Table 1). Thus, our approach allows new mutations to be described for known genetic conditions.
Three of the clinically correlated CNVs (cases 4, 12, and 13) were mosaic, demonstrating the ability of our methodology to detect mosaic copy-number changes with exonic resolution. The limited availability of clinical information precludes objective assessment of clinical severity in comparison to patients with nonmosaic CNVs.
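Detecting mosaic changes is harder because the array signal is averaged over all cells in the sample. The sketch below computes idealized log2 ratios for a heterozygous loss present in a fraction of cells, assuming a diploid reference and no signal compression; real array ratios are usually compressed, so observed shifts tend to be smaller in magnitude. The function name and parameters are hypothetical.

```python
import math


def expected_log2_ratio(variant_copies: int, mosaic_fraction: float = 1.0,
                        reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) for a CNV in a fraction of cells."""
    if not 0.0 < mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be in (0, 1]")
    # Average copy number across the mixed cell population.
    mean_copies = (reference_copies * (1 - mosaic_fraction)
                   + variant_copies * mosaic_fraction)
    if mean_copies == 0:
        # Complete loss in all cells: the ratio is undefined, report -inf.
        return -math.inf
    return math.log2(mean_copies / reference_copies)


for f in (1.0, 0.5, 0.2):
    print(f, round(expected_log2_ratio(1, f), 2))
# 1.0 -1.0
# 0.5 -0.42
# 0.2 -0.15
```

At low mosaic fractions the expected shift approaches the noise of individual probes, which is one reason confirmation by FISH was informative in these cases.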
For 16 cases, no firm genotype–phenotype relationship could be established. Such ambiguity can result from either an incomplete clinical history or an absence of published literature describing a clear phenotypic consequence of mutations in the gene of interest. Overwhelmingly, the genomic aberrations in these cases are previously undescribed, and as such may define novel genetic syndromes. Most of the copy-number changes we describe are heterozygous. Thus, correlation with a known disease state is only possible when a dominant condition has been described in the literature. However, alleles previously described as causing recessive disease may act as dominant alleles in a milder or alternative disease state, for example, splenic syndrome at altitude or sudden death in sickle cell trait [Kark et al., 1987; Lane and Githens, 1985] and intermediate defects in cholesterol regulation in individuals heterozygous for mutations in low-density lipoprotein receptor (LDLR) [Brown and Goldstein, 1974]. Single-copy deletions or duplications may also cause disease in compound heterozygotes, with a CNV on one allele “unmasking” a variant, such as a single nucleotide variant (SNV), on the other allele [Borg et al., 2009; Kurotaki et al., 2005; Tayeh et al., 2009]. In such cases, sequencing of the other allele is necessary to find the full genetic cause of recessive disease. Such a mechanism is suspected in cases of inherited copy-number changes where disease is not seen in the parent transmitting the CNV-containing allele. Further, more complex inheritance schemes, for example, a two-hit model [Girirajan et al., 2010; Lupski 2007] or the co-occurrence of two or more conditions in a single patient, each attributable to an independent genomic rearrangement [Potocki et al., 1999], are also possible.
Although exon deletions and duplications can disrupt a gene and cause loss of function, they may also constitute gain-of-function [Bochukova et al., 2009] or dominant-negative [Inoue et al., 2004] mutations with unexpected phenotypic consequences. In addition, copy-number changes in other types of conserved sequences, for example, introns, promoters, and enhancers, can have pathogenic consequences [Bovolenta et al., 2008, 2010; Higgins et al., 2001; Lee et al., 2006; Smyk et al., 2007; L. Wang et al., 2002; Weterman et al., 2010; Zhang et al., 2010b; Zhuang et al., 1993]. We have sequenced the breakpoint regions of eight intragenic CNVs (Supp. Table S2). Each was a “simple” copy-number loss or gain; no evidence of complex genomic rearrangement [Zhang et al., 2009a] was found. The molecular consequences of each CNV, predicted by conceptual translation, are listed in Supp. Table S2. These CNVs are expected to have variable, and in several cases uncertain, effects on gene expression. In four of the seven sequenced intragenic deletions, microhomology of two to four base pairs is shared by the “upstream” and “downstream” deletion breakpoints (Supp. Table S2). Such microhomology is characteristic of either NHEJ or the replication-based FoSTeS/MMBIR mechanisms [Gu et al., 2008], which have been implicated in genomic, genic, and exonic copy-number changes [Zhang et al., 2009b]. The substantial representation of breakpoint microhomology in this small group of samples and in other exonic deletions [Zhang et al., 2010a] hints at the importance of replication-based mechanisms in causing clinically relevant exonic deletion syndromes throughout the genome. Microhomology at deletion breakpoints is also a common feature of benign CNV, found in 219/315 (70%) of breakpoints described by Conrad et al. [2010a].
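Breakpoint microhomology can be read directly from the reference sequence once a deletion junction is known. Microhomology is the stretch of sequence present at both breakpoints, so that shifting the deleted interval within it yields the same junction sequence. The minimal sketch below assumes a simple deletion of `ref[start:end]` (0-based, half-open) on a single reference string. The function name is hypothetical and is not the tool used to populate Supp. Table S2.

```python
def breakpoint_microhomology(ref: str, start: int, end: int) -> str:
    """Return the microhomology at a simple deletion of ref[start:end] (0-based, half-open).

    The deletion product is ref[:start] + ref[end:]. Shifting the deleted interval left or
    right by k bases gives the same product only while the corresponding bases at the two
    breakpoints are identical; that ambiguous stretch is the microhomology.
    """
    ref = ref.upper()
    # Extend leftward while the bases just before each breakpoint match.
    left = 0
    while start - left > 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    # Extend rightward while the bases at each breakpoint match.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return ref[start - left:start + right]
```

For example, deleting `ref[4:13]` from `AAAACTGTTTTTTCTGGGGG` leaves a `CTG` microhomology, because the deleted segment begins with the same three bases that follow it. A blunt junction returns an empty string.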
The finding of mosaicism in three samples, for which breakpoint sequence was not obtained, is consistent with a mitotic, postzygotic mechanism of CNV generation (e.g., the replication-based MMBIR/FoSTeS). In case 19, the deletion breakpoints localize to two Alu elements in the same genomic orientation and share 62 bp of perfect homology (Supp. Table S2). Because these elements belong to different Alu subfamilies (AluSq and AluSx) and share only 80% identity over 245 bp, NAHR (Alu–Alu recombination) [Lehrman et al., 1987] is improbable. Rather, Alu-specific microhomology-mediated deletion probably generated this CNV and other recently described CNVs with breakpoints in Alu elements [Erez et al., 2009; Stankiewicz et al., 2009; Vissers et al., 2009; Zhang et al., 2009a]. The remaining two deletions carry insertions of 18 bp (case 8) and 7 bp (case 30) at their breakpoints. The 18-bp inserted sequence is not found in the human genome, whereas the 7-bp sequence occurs throughout the genome, including within the 1,099-bp region deleted in case 30. Interestingly, in both of these cases, 7-bp sequences are repeated on either side of the deletion breakpoint. These features are consistent with NHEJ, with a possible contribution of replication slippage. In the final sequenced case, a tandem intragenic copy-number gain (case 21), the breakpoint displayed no microhomology (Supp. Table S2), suggestive of NHEJ.
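Asking whether a breakpoint insertion could have been copied from nearby sequence amounts to an exact search of both strands of a candidate region, such as the deleted interval. The sketch below illustrates this. It also gives a rough estimate of how often a sequence of length k would occur by chance. That estimate assumes a random, uniform genome of about 3.1 × 10^9 bp, which is a deliberate simplification. All function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_templates(insert: str, region: str) -> List[Tuple[int, str]]:
    """Return (0-based position, strand) for every exact match of the inserted
    sequence ('+') or its reverse complement ('-') in region, overlaps included.
    A palindromic insert is reported once per strand."""
    region = region.upper()
    hits: List[Tuple[int, str]] = []
    for strand, query in (("+", insert.upper()), ("-", reverse_complement(insert))):
        pos = region.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = region.find(query, pos + 1)
    return sorted(hits)


def expected_random_hits(k: int, genome_length: float = 3.1e9) -> float:
    """Expected chance occurrences of one specific k-mer on both strands,
    assuming a random genome with equal base frequencies (a simplification)."""
    return 2 * genome_length / 4 ** k
```

Under this crude model, a specific 7-mer is expected roughly 3.8 × 10^5 times by chance, whereas an 18-mer is expected fewer than once. A genomic match for a 7-bp insertion therefore says little about its origin by itself.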
When an intragenic deletion spans the final exon of a gene, a fusion transcript incorporating part of a downstream gene may be produced. If the stop codon of the upstream gene is deleted and the reading frame is preserved, either by splicing to an exon of the downstream gene or by direct fusion with one (exon accretion), a fusion protein may be made, although this is likely only if the two genes share the same genomic orientation [Walsh et al., 2008]. In six of the reported deletion cases (cases 6–8 and 25–27), the final exon of the gene of interest lies within the region of copy-number loss, although in each case the nearest downstream gene is in the opposite orientation (Supp. Table S1). Fusion proteins, which can impart novel functions, may therefore be less likely in these cases. However, a new gene could theoretically be created from the complementary strand of exons, as was shown for the evolution of human HREP [Inoue et al., 2001; Inoue and Lupski, 2002]. Such novel genes may encode proteins with either neomorphic or antimorphic activities. Furthermore, in case 28, PEX11A or PLIN1, which map upstream of the gene of interest (KIF7) in the same orientation and whose 3′ ends may be deleted, could splice to KIF7. Sequencing of the deletion breakpoint and further molecular experiments would be necessary to determine whether a fusion transcript is in fact produced.
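The two conditions named above, shared orientation and preserved reading frame, can be checked with simple arithmetic. The sketch below is illustrative only. It assumes that the number of upstream coding bases retained before the junction is known. It also assumes the "phase" of the downstream exon, meaning the count of the downstream gene's own coding bases preceding that exon, modulo 3. The fusion stays in frame when both quantities agree modulo 3. The sketch does not check for premature stop codons or splice-site usage, and the function name is hypothetical.

```python
def in_frame_fusion_possible(up_strand: str, down_strand: str,
                             up_coding_bases: int, down_exon_phase: int) -> bool:
    """Rough check for a read-through fusion across a deletion.

    up_strand, down_strand: '+' or '-' genomic orientation of the two genes.
    up_coding_bases: coding bases of the upstream gene retained before the junction.
    down_exon_phase: downstream gene's own coding bases preceding the fused exon, mod 3.
    Ignores premature stop codons and actual splice-site usage.
    """
    if up_strand != down_strand:
        # Opposite orientation: no simple sense-strand fusion transcript.
        return False
    # The fused exon is read in its native frame only if the upstream gene
    # contributes the same number of bases, modulo 3, as the exon normally follows.
    return up_coding_bases % 3 == down_exon_phase % 3
```

In this framing, cases 6–8 and 25–27 would fail the first test because the nearest downstream gene is in the opposite orientation, whereas case 28 would pass it and would hinge on the frame calculation.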
We hypothesized that by concentrating array probes in exons of known or putative disease genes, we would be able to detect small genomic rearrangements that would escape detection with standard genomic arrays of similar probe number. Our exon-targeted array contains more probes within each CNV of known clinical significance (cases 1–15) than do standard 105,000- (105 k), 180,000- (180 k), or 244,000-probe (244 k) arrays (Agilent Technologies) (Table 4). In eight of 15 cases, the standard 180 k array, which contains approximately the same total number of probes as our V8 OLIGO array, has either one or no probes within the CNV of interest. Thus, these eight “exonic CNVs” would be missed by standard aCGH analysis with the non-exon-targeted 180 k array.
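Comparisons like those in Table 4 come down to counting the probes of each design that fall inside each CNV interval. A minimal sketch follows. It assumes probe positions on a single chromosome, each represented by one coordinate such as the probe start. The default threshold of one probe mirrors the "one or no probes" criterion in the text. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def probes_in_interval(sorted_positions: List[int], start: int, end: int) -> int:
    """Count probes whose coordinate lies in [start, end]; positions must be sorted."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)


def flag_poorly_covered(cnvs: Dict[str, Tuple[int, int]], probe_positions: List[int],
                        max_probes: int = 1) -> List[str]:
    """Names of CNVs covered by at most max_probes probes in a given array design."""
    positions = sorted(probe_positions)
    return [name for name, (start, end) in cnvs.items()
            if probes_in_interval(positions, start, end) <= max_probes]
```

Binary search keeps each lookup logarithmic in the number of probes, so the same CNV list can be checked cheaply against several designs (for example, the 105 k, 180 k, and 244 k layouts) once their probe coordinates are loaded per chromosome.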
One strategy for improving the resolution of traditional aCGH is to increase the number of array probes. However, doing so raises cost and leads to the detection of a greater number of clinically irrelevant copy-number changes. Indeed, high-resolution whole-genome arrays have revealed an immense CNV load among normal individuals [Conrad et al., 2010b]. Uncertainty about the pathogenicity of newly discovered copy-number variants is likely to persist, and it will become even more important as new methods are increasingly used to link copy-number variation to disease. These methods include higher density arrays, whole-exome arrays, “conservome” arrays (those with increased coverage of conserved noncoding regions of the genome), and next-generation sequencing. By supplementing our whole-genome array with dense coverage of the exons of known and suspected disease genes, we have focused on what we hypothesized to be the most clinically relevant and interpretable genomic copy-number changes. Our approach improves the resolution of aCGH to the level of the exon while excluding much of the clinically uninterpretable variation detected by other strategies.
The ability to detect single-exon copy-number changes by aCGH provides new opportunities for genetic research and diagnosis. Nevertheless, interpreting such rearrangements may still be challenging, as the functional impact of these genomic alterations is not always well understood. Determining their significance, especially for previously unreported variants, requires investigative teamwork between the laboratory and the clinician. Despite these challenges, this approach provides a means to screen, with subkilobase resolution, for genomic rearrangements of clinical import and research significance. Furthermore, it may help elucidate the function of some of the predicted genes in the human genome, the majority of which have no known function.
We are indebted to the patients and families who participated in this study. We thank Jonathan Berg and Nicola Brunetti for assistance in array design. F.J.P. holds a Career Award for Medical Scientists from the Burroughs Wellcome Fund. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis offered in the Medical Genetics Laboratory.
Grant sponsor: The Baylor College of Medicine Medical Scientist Training Program; Grant number: T32GM007330-34 (to P.M.B.); Grant sponsor: The National Institute of Neurological Disorders and Stroke (NINDS, NIH); Grant number: R01NS058529 (to J.R.L.); Grant sponsor: The Polish Ministry of Science and Higher Education; Grant number: R13-0005-04/2008 (to P.S.).
Communicated by Michael Dean
Additional Supporting Information may be found in the online version of this article.