The duplication architecture of the human genome predisposes our species to recurrent copy number variation and disease. Emerging data suggest that this mechanism of mutation contributes to both common and rare diseases. Two features regarding this form of mutation have emerged. First, common structural polymorphisms create susceptible and protective chromosomal architectures. These structural polymorphisms occur at varying frequencies in populations, leading to different susceptibility and ethnic predilection. Second, a subset of rearrangements shows extreme variability in expressivity. We propose that two types of genomic disorders may be distinguished: syndromic forms where the phenotypic features are largely invariant and those where the same molecular lesion associates with a diverse set of diagnoses including epilepsy, schizophrenia, autism, intellectual disability and congenital malformations. Copy number variation analyses of patient genomes reveal that disease type and severity may be explained by the occurrence of additional rare events and their inheritance within families. We propose that the overall burden of copy number variants creates differing sensitized backgrounds during development leading to different thresholds and disease outcomes. We suggest that the accumulation of multiple high-penetrant alleles of low frequency may serve as a more general model for complex genetic diseases, posing a significant challenge for diagnostics and disease management.
Genomic disorders were originally described as large deletions and duplications that are highly penetrant, mostly de novo in origin, and typically identified in affected individuals with intellectual disability/multiple congenital malformations. Some examples include Smith–Magenis syndrome (MIM: 182290), DiGeorge/velocardiofacial syndrome (MIM: 188400, 192430) and Williams–Beuren syndrome (MIM: 194050). These classical genomic disorders have been well characterized in the past two decades with genotype–phenotype correlation studies implicating causative genes, mouse models recapitulating the human clinical features, and standardized management protocols and support groups established.
Application of higher definition molecular techniques, including single-nucleotide polymorphism microarrays or array comparative genomic hybridization (CGH), has allowed genotyping of larger disease cohorts and controls. Two major principles have emerged from these more recent studies: (i) common copy number polymorphism predisposes certain chromosomes to recurrent deletions and duplications and (ii) the same recurrent genomic lesion can associate with apparently very diverse phenotypes. The latter has begun to illuminate common neurodevelopmental pathways and helps to explain the comorbidity of diverse neurological manifestations within the same families. The distinction between variability of expressivity and reduced penetrance depending on the diagnosis has become an important consideration for these rare mutational events. We will explore the mechanisms, models and implications underlying these two different aspects.
Seminal work on Charcot–Marie–Tooth disease (1,2) and hereditary neuropathy with liability to pressure palsies (HNPP) (3) directly implicated low-copy repeats or segmental duplications as substrates for unequal crossover or non-allelic homologous recombination (NAHR) resulting in duplications or deletions (4). Discovery of genomic disorders enriched within chromosomal regions flanked by segmental duplications further strengthened the NAHR model (5,6), and predisposition to such events was attributed to the length (≥10 kb) and percentage identity (≥95%) of participating segmental duplication blocks (5). It was also recognized that sites flanking copy number variants (CNVs) were particularly polymorphic with increased copy numbers of segmental duplications associating with recurrent rearrangements (7,8). For example, increased copies of segmental duplications flanking the chromosome 7q11.23 region have been linked to an elevated risk for rearrangements causing Williams syndrome (9). An intimate association between inversion polymorphisms and deletions and duplications was recognized early on for genomic disorders such as Williams syndrome (10), 8p23.1 microdeletion (11,12) and 15q11 microdeletions (13). Recently, fosmid paired-end sequencing followed by fluorescent in situ hybridization also identified six human disease-associated inversion polymorphisms in three HapMap populations for 3q29, 15q24.1, 15q13.3, 17q21.31, 8p23 and 17q12 (14,15) (Fig. 1). However, the molecular basis of these associations was not clear until recently.
Stefansson et al. (16) reported one such structurally complex region on chromosome 17q21.31 of the human genome. Two ancient haplotypes map to this region: the directly orientated (or H1 based on the build36 version of the human genome assembly) and the inverted orientation (H2) (Fig. 2A). Interestingly, no recombination occurs between the two haplotypes and, thus, there is a large extended stretch of linkage disequilibrium in this region. Only the H2 haplotype harbors two highly identical (38 kb, ~98%) blocks of segmental duplications in direct orientation facilitating NAHR events (6,17). This structure predisposes ~20% of Europeans to a recurrent 480 kb microdeletion (6,17–19). Notably, the H2 haplotype occurs at an extremely low frequency in the Asian and African populations (16). Among the San population of Southern Africa, the H2 allele has been discovered at low frequency (20) yet without the predisposing segmental duplication that promotes unequal crossover. The available data suggest that the duplication-associated inversion and the 17q21.31 recurrent microdeletion are largely restricted to individuals of European and Mediterranean descent.
Similarly, investigation into segmental duplication architecture of the more recently described 16p12.1 microdeletion (21) revealed two common structural configurations with worldwide frequencies of 17.6% (S1) and 82.4% (S2) (22) (Fig. 2B). Of note, the two human haplotypes differ by 333 kb of additional duplicated sequence present in S2 but not in S1. Again, similar to the 17q21.31 region, the S2 configuration directly orients the specific duplication blocks (68 kb, >99%), thereby predisposing to disease rearrangement. Although the S1 structure is protective against the disease-associated rearrangement, S2 is the most common structure with frequencies of 97.5% in Africans (YRI), 83.1% in Europeans (CEU) and 71.6% in Asian populations (CHB/JPT). Similar analyses of other hotspot regions of the genome will likely identify other specific predilections to specific disease rearrangements.
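The architectural criteria discussed above (duplication blocks of ≥10 kb with ≥95% identity, placed in direct orientation) can be expressed as a simple filter. The following is a minimal sketch rather than a published method: the class and function names are hypothetical, the 99.0% identity for the 16p12.1 S2 blocks is a placeholder for the ">99%" quoted above, and the last two entries are invented counterexamples.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for blocks that predispose to NAHR.
MIN_LENGTH_KB = 10.0
MIN_IDENTITY_PCT = 95.0


@dataclass(frozen=True)
class DuplicationPair:
    """A pair of flanking segmental duplication blocks (hypothetical record)."""
    label: str
    length_kb: float
    identity_pct: float
    direct_orientation: bool


def predisposes_to_deletion_or_duplication(pair: DuplicationPair) -> bool:
    """True if the pair meets the length, identity and orientation criteria."""
    return (
        pair.direct_orientation
        and pair.length_kb >= MIN_LENGTH_KB
        and pair.identity_pct >= MIN_IDENTITY_PCT
    )


pairs = [
    DuplicationPair("17q21.31 H2", 38.0, 98.0, True),    # values from the text
    DuplicationPair("16p12.1 S2", 68.0, 99.0, True),     # 99.0 stands in for ">99%"
    DuplicationPair("hypothetical inverted pair", 38.0, 98.0, False),
    DuplicationPair("hypothetical short pair", 5.0, 99.5, True),
]

candidates = [p.label for p in pairs if predisposes_to_deletion_or_duplication(p)]
print(candidates)
```

Only the two directly oriented configurations described in the text pass the filter. Real predisposition also depends on factors that this sketch ignores, such as the copy number of the flanking duplications.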
The dichotomy between syndromic genomic disorders and those with more variable phenotypes is longstanding. For example, microdeletion on chromosome 22q11.2 presented with a variety of phenotypes, including velocardiofacial syndrome (23,24), DiGeorge syndrome (25–27), isolated cardiac outflow tract defects (28,29) and schizophrenia (30). In contrast, specific phenotypes aiding relatively straightforward clinical diagnosis were noted for Williams syndrome resulting from a 7q11.23 microdeletion (31) and Angelman syndrome (32) or Prader–Willi syndrome (33) resulting from maternal (34) or paternal deletion (35), respectively, of the imprinted loci on chromosome 15q11.2q13 (36,37). Nearly 20 new genomic disorders have been reported recently, almost twice the number compared with what was known in the past two decades. These novel disorders are recurrent and map within genomic hotspots flanked by segmental duplications. Despite initial claims, most of these disorders are not specific and phenotypes have been associated with, but not limited to, intellectual disability/congenital malformations, epilepsy, schizophrenia, autism, cardiac and renal anomalies and obesity (Table 1).
For example, microdeletions in 16p11.2 containing the TBX6 gene were initially reported in individuals with autism (38) and autism spectrum disorder (39), whereas the same events were identified in cases ascertained for developmental delay and congenital malformations (40). Although further studies validated the association of this microdeletion with both autism (41–43) and idiopathic mental retardation (44,45), Walters et al. (46), using high-density array CGH, also documented an increased incidence of obesity in cases with microdeletion of this locus.
There are now numerous examples of this widening spectrum of clinical diagnoses (Table 1). Microdeletion on 16p11.2 containing the SH2B1 gene was initially reported as an obesity locus (47); however, recent studies have also shown a strong association with developmental delay (48). These studies have also associated previously unreported phenotypes with well-characterized canonical syndromes, the earlier omissions being mainly due to ascertainment bias. For example, the microdeletion on chromosome 17p12 is classically known to be causal for HNPP (3). Interestingly, a large, multi-center study also implicated microdeletion on 17p12 in schizophrenia susceptibility (49). Initially, del17q12 was discovered in cases with cystic renal disease and maturity-onset diabetes of the young but without cognitive impairment (50). Nagamani et al. (51) expanded the phenotypic spectrum of this disorder to include cognitive impairment, seizures and brain malformations. Deletions and duplications in 16p13.11 were initially reported in cases with intellectual disability (44,52) and autism (52,53); subsequent studies then also linked these CNVs to epilepsy (54,55) and schizophrenia (56).
Thus, certain challenges surround the discovery of these newly described CNVs: (i) many of these novel CNVs have been found in the control populations as well as in unaffected family members and (ii) lack of detailed clinical information and standardized phenotyping poses a challenge for diagnosis, counseling and management of such disorders. In light of the effect size of some of these variants in large case–control studies, the discovery of the same lesion in the general population likely reflects variability of expressivity as opposed to reduced penetrance of the variant. Studying these deletions and duplications within the context of a well-phenotyped family would likely reveal that most of these larger events have phenotypic consequence although the effect in some cases may be more subtle depending on other genetic and non-genetic modifiers.
There are several explanations for variable expressivity and clinical heterogeneity in genomic disorders (Fig. 3). First, the breakpoints of the events may not be identical. Atypical deletions and duplications involving contiguous dosage-sensitive genes within the region often explained the observed clinical variability in many genomic disorders. Examples include Smith–Magenis syndrome (57), Potocki–Lupski syndrome (58), 15q24.3 microdeletion (59,60) and Williams syndrome (61). Many of these atypical CNVs tend to result from alternate segmental duplications (62) or Alu sequences (63) acting as recombination substrates leading to non-recurrent rearrangements. Mapping of these atypical CNVs has further enabled the delineation of a disease-associated critical interval, for example, in Alagille syndrome (JAG1) (64,65), Smith–Magenis syndrome (RAI1) (66) and demarcation of the phenotype-associated gene in Williams syndrome [GTF2IRD1 and GTF2I for craniofacial abnormalities and visuospatial defects (61,67) and NCF1 for hypertension (68)]. Similarly, by correlating the patient phenotypes to the non-recurrent atypical 17p13.3 deletions, the PAFAH1B1 or LIS1 and YWHAE genes were attributed to a majority of clinical features associated with lissencephaly or Miller–Dieker syndrome, respectively (69–71). Although detection of smaller or atypical deletions has pinpointed genes related to specific phenotypes, for some genomic disorders the CNV size (examples include del1q21.1, del16p13.11 and dup16p13.11) did not correlate with the phenotypes observed, indicating other mechanisms for the clinical heterogeneity.
A major challenge for the newly reported genomic disorders has been the pathological association of diverse neurological phenotypes with CNVs mapping to segmental duplications. For these CNVs, the breakpoints have not been refined to a specific basepair position due to the complexity of the duplicated sequences. However, such areas are enriched for genes of unknown function, so it is possible that subtle differences within the breakpoints themselves can account for the variability. A good example is the 15q13.3 microdeletion that maps between two large identical blocks of segmental duplications, distal to the Prader–Willi/Angelman locus. Originally, these microdeletions were described in nine cases with developmental delay/multiple congenital anomalies and seizures (72). The same event was later identified as the most prevalent risk factor for common epilepsies, accounting for ~1% of the sporadic cases as well as familial cases with idiopathic generalized epilepsy (73,74) and sporadic epilepsy syndromes (55). As more cases with a broad spectrum of developmental and neuropsychiatric disorders and control samples were analyzed, it became clear that the 15q13.3 microdeletion was associated with a wide range of outcomes as well as being reported among normal individuals. Whereas the same microdeletion was enriched in cases with autism (75) and individuals with schizophrenia (76,77), Ben-Shachar et al. (78) also reported aggressive and possible antisocial behaviors in carrier parents. Shinawi et al. (79) were further able to reduce the disease-associated critical region when they genotyped four cases with variable neurodevelopmental phenotypes and identified a 680 kb region encompassing the CHRNA7 and OTUD7A genes. Le Pichon et al. (80) identified a patient with severe neurodevelopmental deficits and a homozygous 15q13.3 microdeletion suggesting dosage sensitivity for several genes within this region including TRPM1 and CHRNA7.
Although accumulating evidence suggests that CHRNA7 is the causal gene for these neurological phenotypes, it is likely that heterozygous deletions result in neuropsychiatric and/or epilepsy phenotypes, dependent on other CNV hits and genetic modifiers, and that the absence of CHRNA7 would result in a more severe cognitive manifestation.
Haploinsufficiency or dosage sensitivity for one or more genes within the genomic region has been the commonly proposed functional impact of a CNV. Genes associated with autosomal dominant disorders are also likely to be exposed due to the hemizygosity of the particular gene-containing interval; examples include Williams syndrome [ELN for supravalvular aortic stenosis (81)], Smith–Magenis syndrome [RAI1 for mental retardation, sleep disturbance, self-injurious behaviors (66)] and del1q21.1 [GJA5 for congenital cardiac disease (82,83) and GJA8 for cataracts (84)]. Mutations in NIPA1 within the 15q11.2 (BP1–BP2) region have been reported in autosomal dominant hereditary spastic paraplegia (85). Interestingly, 15q11.2 deletion, resulting in hemizygosity of NIPA2, has been linked with an increased risk for diverse neuropsychiatric features (Table 1). The MYH11 gene, previously implicated in aortic aneurysms and dissections (86), maps within the 16p13.11 CNV region; however, probably due to ascertainment bias, aortic defects have not yet been reported in studies on 16p13.11 CNVs.
Furthermore, recessive genes reside within the CNV regions. The chance of finding a recessive mutation along with a microdeletion is low (frequency of spontaneous mutation × frequency of the deletion event), but such co-occurrence is plausible (Fig. 3). Profound sensorineural hearing loss has been reported in patients with Smith–Magenis syndrome whose deletions unmask the recessive mutations in the myosin (MYO15A) gene located within the 17p11.2 region (87). Functional polymorphisms within COMT and FXII, unmasked by hemizygous deletions, have also been reported to result in cognitive decline and psychosis in patients with del22q11.2 and reduced activity of coagulation factor 12 in Sotos syndrome, respectively (88,89). Chronic granulomatous disease resulting from unmasking of NCF1 recessive mutation has been reported in relation to the Williams syndrome deletion (90). Similarly, metachromatic leukodystrophy was reported in a case with 22q13 deletion and a hemizygous mutation in the arylsulfatase A (ARSA) gene (91). Oculocutaneous albinism was reported in an individual with Prader–Willi syndrome deletion and a recessive mutation in the OCA2 gene (92). A combination of symptoms of Wolfram syndrome and Wolf–Hirschhorn syndrome was detected in a case with a WFS1 mutation unmasked by a chromosome 4p deletion (93). Kumar et al. (94) performed sequencing analysis on eight genes within the 16p11.2 autism-associated region in five autism cases with the microdeletion and approximately 100 cases without the microdeletion and 100 ethnicity-matched controls. Although single-nucleotide variants in the coding exons were detected in SEZ6L2 in the initial set of autism cohorts, the authors were not able to replicate this finding in a replication set. Sequencing efforts to identify recessive alleles in genes within the 1q21.1 (95,96), 16p13.11 (53) and 16p12.1 (21) regions have likewise not yet met with much success.
Targeted sequencing of genes within the pathogenic CNV region using the newly available technologies such as molecular inversion probe assays (97) would be useful to find potential candidate genes.
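The expected rarity of unmasking a recessive allele follows from the product stated above (frequency of the mutation × frequency of the deletion event). The sketch below works through that arithmetic under the assumption that the two events are independent. The function name and both input frequencies are hypothetical and chosen only for illustration; they are not estimates for any locus discussed here.

```python
def unmasking_frequency(mutation_freq: float, deletion_freq: float) -> float:
    """Expected frequency of a recessive mutation co-occurring with a deletion.

    Assumes the two events are independent, so their frequencies multiply.
    """
    for name, value in (("mutation_freq", mutation_freq),
                        ("deletion_freq", deletion_freq)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return mutation_freq * deletion_freq


# Hypothetical inputs: a recessive allele at 1 in 1,000 chromosomes and a
# recurrent deletion at 1 in 10,000 individuals.
expected = unmasking_frequency(1e-3, 1e-4)
one_in_n = round(1 / expected)
print(f"expected co-occurrence: about 1 in {one_in_n:,}")
```

With inputs of this order, co-occurrence is expected in roughly one individual in ten million, which is consistent with such cases being reported individually rather than in cohorts.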
Most studies on previously discovered microdeletions and microduplications associated with neurodevelopmental phenotypes were focused on the impact of genes within a specific region. Recently, a 520 kb microdeletion on 16p12.1 was found to be non-syndromic, i.e. associated with variable phenotypes, and was inherited from a parent in 95% of the cases (21), in stark contrast to other genomic disorders where most mutations are de novo. Further follow-up on the families of probands showed that the carrier parents were, in fact, suffering from subclinical manifestations of mild neuropsychiatric illness including depression, bipolar disorder, mild learning disability or seizures. What is the cause for the phenotypic variability between the probands and the carrier parents? It was observed that ~25% of the probands also carried another large deletion or duplication, a second hit, elsewhere in the genome. This represented a 40-fold increase in the occurrence of two or more CNVs > 500 kb in length when compared with the general population. The clinical features in probands with two hits were different from those associated with just the second hit. It is evident that two large CNVs in a single individual would increase or decrease the dosage for multiple genes, thereby creating a sensitized genomic background. It is likely that one hit is sufficient to reach a threshold just enough to induce some neuropsychiatric features but a second hit pushes one toward a more severe phenotype with intellectual disability and developmental delay (Fig. 4).
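The fold increase quoted above is a ratio of second-hit rates in probands and in the general population. The sketch below shows one way such an enrichment could be computed and tested, using a one-sided Fisher's exact test built from the hypergeometric distribution. All counts are hypothetical, chosen only so that the proband rate is 25% and the ratio is 40; they are not the counts from the cited study, and the function names are illustrative.

```python
from math import comb


def fold_enrichment(case_hits, case_total, control_hits, control_total):
    """Ratio of the second-hit rate in cases to the rate in controls."""
    return (case_hits / case_total) / (control_hits / control_total)


def fisher_one_sided(case_hits, case_total, control_hits, control_total):
    """One-sided Fisher's exact P value for enrichment in cases.

    Probability, under the hypergeometric null, of seeing at least
    `case_hits` carriers among the cases given the table margins.
    """
    total = case_total + control_total
    carriers = case_hits + control_hits
    upper = min(carriers, case_total)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_hits, upper + 1)
    )
    return favourable / comb(total, case_total)


# Hypothetical counts: 25 of 100 probands and 5 of 800 controls carry a
# second large CNV.
fold = fold_enrichment(25, 100, 5, 800)
p_value = fisher_one_sided(25, 100, 5, 800)
print(f"fold enrichment: {fold:.1f}, one-sided P = {p_value:.2e}")
```

Exact integer arithmetic is used for the binomial coefficients so that the small P value is not lost to floating-point underflow before the final division.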
The two-hit model was initially applied to explain the variable expressivity of the 16p12.1 microdeletion. Further evaluation of a handful of other genomic disorders associated with syndromic or variable phenotypes showed a clear clustering of two hits in CNVs with variable phenotypes when compared with syndromic mutational events. Further, a strong correlation was observed between the proportion of inherited first hits and the frequency of second hits (Fig. 5). This relationship likely reflects fitness: genomic disorders associated with severe syndromes are essentially under strong purifying selection and can therefore arise only through de novo events. Others, which often present with milder phenotypes, are subject to strong but reduced selection, and are therefore passed on from parents to offspring for a few generations until the variant crosses paths with a second germline hit and then manifests as a severe phenotype. Although the two-hit model was initially applied to large CNVs, the second hit could also, in principle, be a smaller CNV or a single-nucleotide change affecting a phenotypically related gene or a risk allele derived from a parent. Assuming that nosologically distinct diseases such as epilepsy, intellectual disability, schizophrenia and autism share common neurodevelopmental pathways, the disease outcome may differ depending on the overall burden of haplosensitive genes in an individual and the pathways that are affected (Fig. 4). Ultimately, such mutations are removed from the population but the overall genetic load and high frequency of large, recurrent CNVs lead to a significant contribution to disease. The two-hit model is attractive in the sense that it can explain the often-documented comorbidity between intellectual disability and other neurological phenotypes (98,99) within families.
Standard management protocols have been developed for several genomic disorders including DiGeorge, Smith–Magenis and Williams syndromes. As many of the new ‘first-hit’ CNVs are inherited from a parent, management of cases with two hits is challenging, especially when the pathogenicity of the second hit is not yet well established. In addition to the assessment of recurrence risks, management of cases with two hits has to rely on a case-by-case workup or follow the standard protocols in case either hit is a known genomic disorder (Fig. 6). Detailed multi-disciplinary assessment of many cases is warranted for a definitive diagnosis and for teasing out the genomic contribution for neurodevelopmental phenotypes. Although collectively quite common, the low prevalence of any single event justifies the screening of hundreds of thousands of cases to provide evidence-based criteria for the assessment of pathogenicity and diagnosis. The development of an international consortium—the ISCA (100)—and associated databases—dbVAR, ECARUCA (101) and DECIPHER (102)—is an important first step in this regard.
Given the rapid progress in the field of genomic sequencing, it is evident that the microarrays are going to be replaced by exome and whole-genome sequencing in the near future (103). These will ultimately become the standard for testing copy number variation. Such rapid advancements will uncover a deluge of potentially disruptive mutations including small indels and point mutations, further challenging genetic diagnosis and management. In such situations wherein diseases seem to behave in a non-Mendelian fashion, compounding rare variants of functional significance will be relevant. A more tractable solution would be to perform a detailed focus study of individuals sharing a particular lesion using a genotype-first approach. Such projects have already been proposed and are some of the most tangible deliverables of the genome era. Finally, it is evident that copy number and inversion polymorphisms can predispose some populations to disease and make others immune (Europeans predisposed and Asians protected against 17q21.31 lesions). Therefore, CNV studies for rare structural variants (0.001–0.1%) could lead to false associations without disease significance and should contain properly matched cases and controls.
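The risk of false association from unmatched cohorts can be illustrated with a small stratified comparison. In the sketch below, carrier frequency is identical in cases and controls within each population, yet pooling produces an apparent enrichment because cases and controls are drawn in different proportions from populations with different carrier frequencies. The population labels and all counts are hypothetical.

```python
# Hypothetical (carriers, total) counts per population stratum.
cases = {"population_A": (40, 200), "population_B": (1, 50)}
controls = {"population_A": (10, 50), "population_B": (4, 200)}


def carrier_rate(groups):
    """Carrier frequency after pooling all strata."""
    carriers = sum(c for c, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return carriers / total


def stratum_ratio(population):
    """Case/control carrier frequency ratio within one population."""
    case_carriers, case_total = cases[population]
    ctrl_carriers, ctrl_total = controls[population]
    return (case_carriers / case_total) / (ctrl_carriers / ctrl_total)


pooled_ratio = carrier_rate(cases) / carrier_rate(controls)
per_population = {pop: stratum_ratio(pop) for pop in cases}

print(f"pooled case/control ratio: {pooled_ratio:.2f}")
for pop, ratio in per_population.items():
    print(f"{pop}: {ratio:.2f}")
```

The pooled ratio suggests that carriers are almost three times as frequent among cases, whereas the ratio within each population is 1. Comparing cases and controls within matched populations removes the artifact.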
This work was supported by NIH grant HD043569 (E.E.E.). E.E.E. is an investigator of the Howard Hughes Medical Institute.
We thank Luis Perez-Jurado for useful comments.
Conflict of Interest statement. E.E.E. is a scientific advisory board member for Pacific Biosciences.