Autism is a neurodevelopmental disorder characterized by impaired social interaction and communication and restricted interests and behaviors. Despite high estimates of heritability, genetic causes of ASD have long been elusive, due in part to a high degree of genetic and phenotypic heterogeneity (Bailey et al., 1995). Recently, important advances have been made in the genetics of ASD with the use of new technologies for the direct detection of copy number variation (CNV) in the human genome. CNV studies have revealed that de novo deletions and duplications, typically less than 1 Mb in size, are strongly associated with ASD, suggesting that spontaneous structural mutations play a more important role in the etiology of disease than was previously recognized. Rare mutations have been identified at many different locations in the genome, and multiple ‘hot spots’ have been identified where identical rearrangements recur with high frequency. These findings are consistent with the hypothesis that autism, like mental retardation, is caused by a large number of individually rare mutations. These studies serve as a model for how other emerging technologies for mutation detection (e.g. next generation sequencing platforms) could be used to further elucidate the role of rare sequence changes in ASD.
Autism is a class of developmental disorders characterized by impairments in social interaction and communication, and restricted or repetitive behaviors. Because of the substantial clinical heterogeneity of this disorder (Baron-Cohen and Belmonte, 2005), it is now commonly referred to as ‘the autisms’ or Autism Spectrum Disorders (ASDs) (Abrahams and Geschwind, 2008). ASD is now recognized as a common disorder, with an estimated population prevalence of 1/150 (Autism and Developmental Disabilities Monitoring Network Surveillance Year 2002 Principal Investigators; Centers for Disease Control and Prevention, 2007), and it is four times more prevalent in boys than in girls.
Evidence from twin studies strongly supports autism as a genetic disorder (Abrahams and Geschwind, 2008). With concordance rates for monozygotic twins of 70–90%, and 10% for dizygotic twins, ASD is the most highly heritable of all neuropsychiatric disorders (Freitag, 2006). Early progress in identifying autism susceptibility genes was made through the identification of rare ‘monogenic’ forms of autism, such as Rett syndrome (Amir et al., 1999), Fragile-X syndrome (Verkerk et al., 1991; Yu et al., 1991), tuberous sclerosis (European Chromosome 16 Tuberous Sclerosis Consortium, 1993) and neurofibromatosis (Barker et al., 1987), using classical genetic approaches. Additional autism loci have been identified by candidate gene-based approaches, such as Neuroligins 3 and 4 (Jamain et al., 2003; Laumonnier et al., 2004) and Shank3 (Durand et al., 2007). Further evidence for rare mutations in the etiology of ASD comes from cytogenetic studies, where chromosomal abnormalities are consistently reported in 5–7% of patients (Vorstman et al., 2006). The relevance of such rare mutations was initially unclear because each individual locus accounts for only a small fraction of cases; mutations like those above were generally regarded as uncommon causes of disease with little relevance to the broader patient population.
The prevailing hypothesis for the genetic basis of autism and other neuropsychiatric disorders has thus been the ‘common gene/common disease’ hypothesis, which posits that disease results from the additive or multiplicative effects of multiple genetic and environmental factors, with individual genes accounting for only a small increase in risk in an individual patient (Lohmueller, 2003). Consequently, considerable effort has been devoted to the approaches most suitable to test this hypothesis, which are primarily linkage analysis and, more recently, genome-wide association studies (GWAS). Despite rigorous effort, definitive identification of autism genes by these approaches has remained elusive. Early linkage studies of ASD found evidence for linkage at many different locations throughout the genome (Risch et al., 1999). Several subsequent linkage scans produced similar results (Alarcon et al., 2002; Auranen et al., 2002; IMGSAC, 2001; Liu et al., 2001; Szatmari et al., 2007), but no individual loci identified therein were widely replicated. These findings were consistent with the model that autism is a complex disorder involving a large number of different genes.
Subsequently, with the advent of high-resolution oligonucleotide microarrays, it became possible to test the alternative hypothesis that ASD is caused by rare mutations, consisting in part of CNVs. Microarray comparative genomic hybridization (CGH) provides a means to screen genome-wide for structural changes not detectable using cytogenetic methods. These approaches offered a novel way to perform unbiased mutation screens of the genome in disease studies (Pollack and Iyer, 2002; Shaw-Smith et al., 2004). A CNV scan of the genome, like any classical genetic approach, is done in an unbiased fashion, and risk factors are identified based on the association of genetic markers with case status. However, the CNV-based approach is unique in that many novel rare variants are detected as well as common polymorphisms. This provides an opportunity to look for disease association in new ways. In addition to testing the association of variants individually with disease (single-marker association), one can simultaneously look for multiple rare mutations ‘piling up’ in the same region. Furthermore, one can examine how the aggregate of rare variants detected in patients differs quantitatively or qualitatively from variation in controls (Sebat et al., 2007; Stone et al., 2008; Walsh et al., 2008; Xu et al., 2008). This approach serves to identify candidate genes that can be further explored through targeted genetic screening as well as molecular studies of gene function.
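As a rough illustration of the ‘piling up’ idea described above, the following Python sketch counts how many distinct cases and controls carry a rare CNV overlapping each region of interest. The function names, the tuple layout and all coordinates are hypothetical and chosen only for illustration; they are not taken from any of the cited studies, which used their own segmentation and statistical procedures.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share any base."""
    return a_start < b_end and b_start < a_end


def carriers_per_region(cnvs, regions):
    """Count distinct case and control carriers of a CNV overlapping each region.

    cnvs:    iterable of (sample_id, status, chrom, start, end), status is "case" or "control"
    regions: iterable of (name, chrom, start, end)
    Returns a dict mapping region name -> (n_case_carriers, n_control_carriers).
    """
    carriers = {name: {"case": set(), "control": set()} for name, _, _, _ in regions}
    for sample_id, status, chrom, start, end in cnvs:
        for name, r_chrom, r_start, r_end in regions:
            if chrom == r_chrom and overlaps(start, end, r_start, r_end):
                # A set ensures each individual is counted once per region.
                carriers[name][status].add(sample_id)
    return {name: (len(c["case"]), len(c["control"])) for name, c in carriers.items()}


# Hypothetical toy data (coordinates are invented, not real loci).
regions = [
    ("regionA", "chr1", 1_000_000, 1_500_000),
    ("regionB", "chr2", 5_000_000, 5_200_000),
]
cnvs = [
    ("case1", "case", "chr1", 900_000, 1_100_000),
    ("case2", "case", "chr1", 1_400_000, 1_600_000),
    ("case2", "case", "chr1", 1_050_000, 1_060_000),   # second event in the same individual
    ("ctrl1", "control", "chr2", 5_100_000, 5_150_000),
    ("case3", "case", "chr2", 4_000_000, 5_000_000),   # ends where regionB starts: no overlap
]
counts = carriers_per_region(cnvs, regions)
```

Counting carriers rather than events keeps an individual with several CNVs in one region from inflating the tally; the per-region case and control counts can then be compared with whatever association test suits the study design.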
Several studies have recently been published examining the role of structural variants in ASD (Jacquemont et al., 2006; Sebat et al., 2007; Szatmari et al., 2007; Kumar et al., 2008; Marshall et al., 2008; Weiss et al., 2008). Following the clinical cytogenetics model that chromosomal abnormalities are most likely to occur in cases with the most pronounced developmental impairments (Vorstman et al., 2006), Jacquemont et al. (2006) examined a series of 29 patients with ‘syndromic’ ASD, i.e. cases with a visible growth abnormality or malformation, using a 1 Mb-resolution microarray. Seven rare deletions and three rare duplications were detected, ranging from 1.4 to 16 Mb. Seven of these events were de novo, amounting to a de novo rate of 24% in syndromic cases. Two larger studies examined the role of CNVs in idiopathic autism, i.e. using patient samples that were not enriched in individuals with multiple congenital anomalies (Sebat et al., 2007; Szatmari et al., 2007). Szatmari et al. (2007) examined copy number variation in a large sample of concordant sib-pair (multiplex) families analyzed using a 10K SNP platform. A statistical association of CNVs with ASD was not reported. However, some individual rare CNVs were detected in multiple families, and several de novo CNVs were detected, including a deletion of a compelling gene candidate, neurexin-1, which occurred independently in two affected siblings from the same family. The role of de novo structural variants in idiopathic ASD was specifically addressed in a study by Sebat et al. (2007), which examined the rate of de novo CNVs in patients as compared to healthy controls, and in sporadic cases (simplex families) as compared to patients that have an affected first-degree relative (multiplex families). Using an 85K resolution array, de novo CNVs were detected in 10% of simplex cases, a significantly higher rate than the 1% observed in controls (P = 0.0005).
By contrast, de novo events occurred in 2% of cases from multiplex families, a modest increase compared to controls. These findings suggested that spontaneous submicroscopic lesions contribute to disease in a significant fraction of sporadic cases, and to a much lesser extent in multiplex families. The ratio of males to females observed among patients with de novo CNVs was markedly reduced (<2:1) compared to the overall patient sample (~4:1), suggesting that these mutations had increased penetrance, contributing to disease more equally in males and females. The vast majority of de novo events were deletions, consistent with deletions being more likely to have deleterious developmental effects than duplications. Recurrent de novo mutations were observed at some loci, but the majority of de novo mutations were unique to individual families. These findings were consistent with the presence of many different autism genes in the genome. Marshall et al. (2008) reported similar findings using an Affymetrix 500K platform in a sample of 427 families, where de novo CNVs were detected in 7% of simplex cases and 2% of multiplex cases. Consistent with our findings, the majority of de novo CNVs detected by Marshall et al. were deletions, and the ratio of males to females among patients was <2:1.
Further studies have begun to reveal which of these novel CNVs occur most frequently. Among the most frequent of the de novo mutations recently discovered is a ~500 kb microdeletion of 16p11.2. First reported in individual patients with congenital malformations (Ghebranious et al., 2007) and autism (Sebat et al., 2007), this mutation has now been shown to occur in ~1% of ASD and other pervasive developmental disorders (Kumar et al., 2008; Marshall et al., 2008; Weiss et al., 2008). By examining recurrent de novo CNVs in a large sample of 1,441 multiplex families with ASD, and an additional 299 subjects with autism spectrum disorder, 512 clinical subjects with developmental delay and 18,834 Icelandic controls, Weiss et al. (2008) found a highly significant association of the 16p11.2 microdeletion with ASD. This deletion was detected in approximately 1% of autism and other pervasive developmental disorders, much higher than the frequency observed in the general population which was ~0.01%. Marshall et al. (2008) and Kumar et al. (2008) detected the 16p11.2 deletion at a similar frequency, 2/427 (0.5%) and 4/712 (0.6%) respectively. In all three studies, the reciprocal duplication of 16p11.2 was also observed in ASD at a rate of approximately 1%. This region is now recognized as a ‘hotspot’ (Eichler and Zimmerman, 2008) with classic genomic features of a microdeletion syndrome, such as the presence of tandem segmental duplications at the breakpoints (Stankiewicz and Lupski, 2002; Stankiewicz et al., 2003; Lee and Lupski, 2006). Interestingly, the clinical phenotypes of patients with 16p11.2 microdeletions have not shown any obvious commonality (Kumar et al., 2008; Weiss et al., 2008), suggesting that some of the novel genomic disorders being discovered may have considerable variability in phenotype, being often indistinguishable from ‘garden variety’ autism.
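The frequencies quoted above follow directly from the reported counts, and a case-control comparison of carrier counts can be sketched with a one-sided Fisher's exact test. The Python below is a minimal illustration under stated assumptions, not the analysis performed in any of the cited studies: the control carrier count of 2 is an assumption inferred from the stated ~0.01% frequency among 18,834 controls, and cases and controls drawn from different studies and platforms are combined here only to show the arithmetic.

```python
from math import comb


def carrier_frequency(carriers, n):
    """Carrier frequency as a percentage."""
    return 100.0 * carriers / n


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test.

    Probability, under the hypergeometric null of no association, of observing
    at least `case_carriers` carriers among the cases given the table margins.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, n_cases)
    numerator = 0
    for k in range(case_carriers, min(carriers, n_cases) + 1):
        # k carriers fall among the cases, the remaining cases are non-carriers.
        numerator += comb(carriers, k) * comb(total - carriers, n_cases - k)
    return numerator / denominator


# Counts of 16p11.2 deletion carriers as quoted in the text.
freq_marshall = carrier_frequency(2, 427)   # Marshall et al. (2008)
freq_kumar = carrier_frequency(4, 712)      # Kumar et al. (2008)

# ASSUMPTION: about 2 carriers among 18,834 controls, inferred from ~0.01%.
assumed_control_carriers = 2
p_value = fisher_one_sided(4, 712, assumed_control_carriers, 18_834)
```

Exact integer binomial coefficients are used so that the very large table margins do not cause floating-point overflow; a statistics library would normally be preferred for real analyses.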
Recurrent de novo rearrangements have been identified at other genomic locations, suggesting that more CNV ‘hot spots’ are yet to be discovered. Here, we assembled a list of recurrent de novo CNVs from recent studies of structural variation in ASD including Jacquemont et al. (2006), Sebat et al. (2007), Szatmari et al. (2007), Christian et al. (2008), Marshall et al. (2008), Mefford et al. (2008), and Weiss et al. (2008) (Table 1). A majority of de novo mutations that have been observed in at least two unrelated individuals are marked by flanking segmental duplications, consistent with sites that have a high structural mutation rate. The events most frequently reported are the previously identified hotspots at 16p11.2 and 15q11–q13. Other locations where recurrent mutations have been reported in ASD include 1q21.1, 17p12 and 22q11.2. It is important to point out that these three sites have also been reported as recurrent mutations in other neurological disorders (Lee and Lupski, 2006), highlighting the phenotypic variability associated with these rearrangements. For instance, 1q21.1 deletions and duplications have been identified in association with a range of pediatric phenotypes including autism, schizophrenia and congenital anomalies (Mefford et al., 2008). The 17p12 duplication is a well documented ‘causal’ mutation for Charcot-Marie-Tooth disease type 1A (Lupski et al., 1991), while 22q11.2 deletions are associated with Velocardiofacial syndrome and schizophrenia (Karayiorgou et al., 1995) and de novo duplications of 22q11.2 have been reported in association with variable phenotypes (Ensenauer et al., 2003).
Given the much higher rate of de novo CNVs in cases compared to controls, many of the mutations identified in these studies are likely to play a causal role in ASD. However, given that relatively few have been reported in more than one family, the evidence implicating most individual loci is not unequivocal. In all studies that have so far examined spontaneous CNVs in non-disease samples, de novo events have been consistently reported in approximately 1% of apparently healthy individuals (Redon et al., 2006; Sebat et al., 2007; Xu et al., 2008). Some of these events represent clinically-neutral mutations, or in cases where only cell line-derived DNAs were used, some mutations could be artifacts due to cell culture. It is also important to consider that, in a small fraction of patients, de novo changes have been detected at multiple locations in the genome (Sebat et al., 2007; Marshall et al., 2008), raising the likely possibility that not all observed lesions in an individual are clinically relevant. In light of these observations, caution must be used in interpreting the clinical relevance of any ‘private’ mutation, whether it occurs de novo or not.
For the sake of argument, we should also recognize that the association of de novo structural mutation with disease could imply something other than a causal relationship. One alternative hypothesis to consider is whether de novo CNVs could instead be secondary lesions, i.e. mutations that accumulate as a byproduct of the disease process, as in a disease such as cancer. However, the hypothesis that autism could be a ‘fragile genome disorder’ is not consistent with certain findings, such as the association of de novo CNVs with ASD primarily in simplex families, the reduced gender bias among patients carrying de novo events, and the preponderance of rearrangements that are deletions. In order to reconcile the hypothesis with these data, we must then assume that the primary insult involved is also genetic in nature, occurs sporadically, with increased penetrance in females, and that the secondary lesions (the observed CNVs) occur through a mechanism that strongly favors deletions. By the law of parsimony, we would consider this hypothesis the less likely of the two.
To summarize, the following key lines of evidence support a causal role for rare structural variants (SVs) in the etiology of ASD: (1) de novo CNVs, approximately 100 kb and larger, collectively occur at a much higher rate in patients than in healthy controls, and (2) some of the most frequent de novo mutations show a significant association with ASD in large cohorts. Additional evidence supporting a causal role includes the reduced ratio of males to females in patients with de novo CNVs (<2:1, as compared to ~4:1 in the overall patient sample), consistent with these mutations having increased penetrance.
The findings of CNV studies have revealed that spontaneous structural mutations play a more important role in the etiology of ASD than was known from previous studies based on cytogenetics. Since only a fraction of all spontaneous CNVs in ASD are detected with current microarray platforms, the true rate of such mutations could be significantly higher. Furthermore, rare de novo point mutations are known to play a role in a subset of cases (Barker et al., 1987; Verkerk et al., 1991; Yu et al., 1991; Amir et al., 1999). Therefore, structural rearrangements and point mutations that arise spontaneously could collectively contribute in a large fraction of cases.
An important role for de novo mutation has clear implications for genetic models of ASD. The heritability of ASD is estimated primarily based on rates of disease concordance in monozygotic (MZ) and dizygotic (DZ) twins, which are >70% and <10% respectively (Freitag, 2006). The former reflects both inherited and spontaneous genetic influences, because de novo mutations that occur in the parental germline will invariably be shared between MZ twins. In contrast to earlier genetic models of ASD that accounted for the discrepancy between MZ- and DZ-twin concordance rates by an oligogenic model (Pickles et al., 1995; Constantino and Todd, 2000), new models have been put forward recently that account for spontaneous mutation (Jiang et al., 2004; Zhao et al., 2007). Zhao and Wigler postulated an alternative genetic model that takes de novo mutation into account (Zhao et al., 2007). This model posits that the majority of autism cases occur sporadically as a result of de novo mutation, and it further predicts that, due to variable expression of a high-penetrance allele, some carriers will be mildly affected or not at all. Variable expressivity would enable the transmission of a high-penetrance risk allele to multiple siblings in a subsequent generation. Zhao et al. (2007) examined data on the segregation of autism in multiplex families, specifically focusing on the rates of concordance in children that were born after one pair of concordant siblings was already diagnosed. The authors found that, once two siblings in a family were diagnosed with ASD, the rates of concordance in subsequent children born were ~50% for male siblings and ~20% in female siblings. These data were consistent with the model of alleles having dominant effects in males with reduced penetrance in females.
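The recurrence figures above can be related to sex-specific penetrance with simple arithmetic. The sketch below is a deliberately simplified reading of the dominant model and not the estimation procedure of Zhao et al. (2007): it assumes that a family with two affected siblings is certain to carry a single dominant risk allele in one parent, that the allele is transmitted to each later child with probability 0.5, and that non-carriers are never affected. The function name is hypothetical.

```python
TRANSMISSION_PROBABILITY = 0.5  # one heterozygous carrier parent, Mendelian transmission


def implied_penetrance(recurrence_risk, transmission=TRANSMISSION_PROBABILITY):
    """Penetrance implied by recurrence_risk = transmission * penetrance."""
    return recurrence_risk / transmission


# Approximate recurrence rates quoted in the text for later-born siblings.
male_penetrance = implied_penetrance(0.50)    # close to complete penetrance in males
female_penetrance = implied_penetrance(0.20)  # reduced penetrance in females
```

Under these assumptions, a ~50% recurrence in later-born males corresponds to near-complete penetrance and a ~20% recurrence in females to a penetrance of roughly 0.4, which is the pattern described above as dominant effects in males with reduced penetrance in females.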
Although the importance of rare alleles cannot be emphasized enough, genetic models are meant to be a simplification of reality. Rare variant and common variant models are not mutually exclusive, and the true genetic architecture of ASD probably consists of a mixture of modalities including dominant, recessive and low-penetrance alleles. Indeed, rare recessive alleles may explain the increased rate of ASD in consanguineous families (Morrow et al., 2008). In addition, common variation could in part explain the variable expressivity observed for some mutations with apparently large effects (Mefford et al., 2008).
The use of microarray CGH to identify submicroscopic chromosomal abnormalities in ASD has obvious implications for clinical diagnostics. However, this new technology presents new challenges. As CNV detection platforms become more powerful, the interpretation of CNV data becomes more complex. In order to distinguish between pathogenic mutations and the vast majority of naturally-occurring CNVs that are clinically neutral, it is necessary to demonstrate an association with disease. Size, rarity and de novo occurrence have long been standard ‘criteria for causation’ for chromosomal abnormalities detected by cytogenetics. For submicroscopic rearrangements, however, pathogenic mutations may be similar in size to non-pathogenic variants. And while de novo CNVs occur with much higher frequency in ASD than in healthy individuals, de novo occurrence alone is not unequivocal evidence for causation. Furthermore, recurrent CNVs with strong associations with disease are not perfect predictors of disease outcome, as is evident from the variable clinical presentation of some novel genomic disorders (Mefford et al., 2008). Thus, the new cytogenetics must establish new criteria. For instance, rare CNVs including de novo mutations should be implicated with statistical evidence. Determining a prognosis for a carrier should be based on the known penetrance of that mutation for ASD and for other clinical phenotypes associated with the same CNV.
The prevalence of autism in the population is a matter of great concern to science and to the public. Therefore, we must consider what implications these findings have for the epidemiology of ASD. Two central questions raised by CNV studies are: can structural mutation rates, at individual loci or genome-wide, vary between populations? Can rates of structural mutation fluctuate over time?
The mutation rate at some individual loci can differ between populations due to genetic variation. Rates of de novo mutation are locus specific and are dependent largely on local architectural features of the genome, particularly segmental duplications. Since these architectural features are themselves subject to polymorphism, common structural polymorphisms may contribute to differences in structural mutation rate between common haplotypes. Furthermore, differences in haplotype frequency between populations may lead to population differences in the mutation rate in that genomic region. This is perhaps best exemplified by the recently discovered 17q21.3 microdeletion syndrome (Koolen et al., 2006; Sharp et al., 2006; Shaw-Smith et al., 2006; Varela et al., 2006), where the mutation is associated with an inversion polymorphism that is common in Europeans and rare in other populations (Stefansson et al., 2005). Microdeletion syndromes are almost invariably located in complex regions of the genome, often containing other common inversions and copy number polymorphisms (Mefford et al., 2008); thus, there may exist other recurrent de novo mutations that are frequent in one population and not in others.
Another important epidemiological question is whether genome-wide rates of de novo mutation could be influenced by non-genetic factors such as parental age. Advanced parental age has become established as a risk factor for ASD in offspring (Gillberg, 1980; Reichenberg et al., 2006; Cantor et al., 2007; Croen et al., 2007), with the observed effect strongest for older fathers (Reichenberg et al., 2006). This effect could be explained by genetic or epigenetic factors. Rates of structural and numerical chromosome abnormalities in sperm have been shown to increase with paternal age (Sartorelli et al., 2001; Buwe et al., 2005), raising the interesting hypothesis that spontaneous heritable changes such as de novo CNVs could be a mechanism to explain the paternal age effect. However, strong evidence linking paternal age effects to mutation in the male germline is lacking. Even in disorders that are clearly caused by genetic changes and that show a clear paternal age effect, such as achondroplasia (Tiemann-Boege et al., 2002), the paternal age effect cannot be explained entirely by increased mutation rates in sperm. In order to establish a link between paternal age and the rates of de novo CNVs in autism, both must be examined directly in a large family sample.
Definitive identification of a CNV risk factor pinpoints a narrow region of the genome that is likely to contain autism genes. The next step is to determine which of the genes in the local region are functionally relevant. This is not a trivial matter, because the frequent recurrent mutations that have so far been identified are relatively large in size (>500 kb) and contain dozens of genes. Additional genetic or functional evidence is needed to confirm the biological relevance of an individual gene. More intensive genetic analysis of candidate genes within large CNV regions may help to pinpoint the biologically relevant factors.
Several of the de novo CNVs identified in recent studies are smaller (≤100 kb), and include a few genes or even a single gene. Although the frequencies of these variants appear to be lower than for the previously described ‘hot spots’ and the corresponding associations are not statistically significant, converging lines of evidence from multiple studies are emerging for some individual genes. For instance, by sequencing and cytogenetic analysis of candidate genes within the 22q13 microdeletion region (Manning et al., 2004), Durand et al. (2007) found additional mutations implicating SHANK3. Multiple studies now find evidence for the role of rare neurexin-1 mutations in neuropsychiatric disorders (Kim HG et al., 2008; Kirov et al., 2008; Marshall et al., 2008; Yan et al., 2008). As these examples illustrate, the most biologically informative findings do not necessarily come from the mutations observed most frequently. The rarest alleles, i.e. those that are observed in <0.1% of the sample, may have striking phenotypic effects in individual patients, and may collectively account for a substantial proportion of disease risk in the overall population. This notion is supported by evidence from studies of ASD and schizophrenia, where the majority of de novo mutations reported to date have been observed in single families (Sebat et al., 2007; Xu et al., 2008) and the increased mutational burden of rare structural variants observed in patients with schizophrenia (Stone et al., 2008; Walsh et al., 2008) is greatest for variants observed only once in the sample (Stone et al., 2008). Thus, the rarest of de novo mutations could also serve to identify candidate genes, the most attractive candidates being the genes that are uniquely disrupted by a small de novo deletion or balanced rearrangement.
We reviewed the literature and the Autism Chromosome Rearrangement Database to assemble a list of de novo CNVs and de novo balanced cytogenetic rearrangements that disrupt individual genes (Table 2). This list consists of 19 mutations involving 21 different genes. These genes include several that play key roles in neurodevelopment, such as A2BP1 (Nakahata and Kawamoto, 2005), a neuronal splicing enhancer; DPP6 (Nadal et al., 2003; Kim J et al., 2008), a necessary component of type A potassium channels that regulates neuronal excitability; and CNTNAP2, a member of the neurexin superfamily that has been previously implicated in familial epilepsy (Strauss et al., 2006) and Gilles de la Tourette syndrome (Verkerk et al., 2003). Given their established neuronal functions, these genes make excellent candidates for further study. It is likely that some autism genes will harbor additional risk alleles. Indeed, analysis of rare (Bakkaloglu et al., 2008) and common alleles (Alarcon et al., 2008; Arking et al., 2008) of the CNTNAP2 gene further supports its role in ASD. Thus, a more intensive genetic analysis of these and other gene candidates could help to confirm the role of some of these genes in ASD (Abrahams and Geschwind, 2008).
CNV-based studies of ASD serve as a model for how emerging technologies for mutation detection, including next-generation sequencing platforms (Mardis, 2008), can be applied to genetic studies of disease. As rapidly evolving technologies bring the field ever closer to the critical ‘$1000 genome’ milestone, we can realistically envisage applying genome-wide screens for de novo mutation (Sebat et al., 2007; Xu et al., 2008) or mutational burden analysis (Stone et al., 2008; Walsh et al., 2008) to a nearly-complete inventory of SNPs and SVs. Structural variants could represent a substantial fraction of all causal variants in ASD. However, a sequence variant, such as a frameshift or stop mutation, can be as deleterious to gene function as a large deletion (Durand et al., 2007). Therefore, it is clear that some proportion of causal variants will be de novo or inherited single nucleotide changes. Thus, further genetic evidence for the role of individual genes, initially identified from CNV studies, can be sought by sequencing candidate genes in additional patient samples. Further studies that examine gene candidates more intensively using next-generation sequencing technologies will be needed to more completely understand the contribution of rare variants and the role of spontaneous mutation in ASD.
We wish to thank Michael Wigler, Mary-Claire King, Deborah Levy and James Watson for helpful discussions and support.
My laboratory is supported by funding from the National Institutes of Health (MH076431-04 and HG004222-02), Simons Foundation and Stanley Foundation.