The autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in reciprocal social interaction and communication, and the presence of restricted and repetitive behaviors¹. Individuals with an ASD vary greatly in cognitive development, which can range from above average to intellectual disability (ID)². While ASDs are known to be highly heritable (~90%)³, the underlying genetic determinants are still largely unknown. Here, we analyzed the genome-wide characteristics of rare (<1% frequency) copy number variation (CNV) in ASD using dense genotyping arrays. When comparing 996 ASD individuals of European ancestry to 1,287 matched controls, cases were found to carry a higher global burden of rare, genic CNVs (1.19-fold, P = 0.012), especially so for loci previously implicated in ASD and/or intellectual disability (1.69-fold, P = 3.4×10⁻⁴). Among the CNVs, there were numerous de novo and inherited events, sometimes in combination in a given family, implicating many novel ASD genes such as SHANK2, SYNGAP1, DLGAP2 and the X-linked DDX53-PTCHD1 locus. We also discovered an enrichment of CNVs disrupting functional gene-sets involved in cellular proliferation, projection and motility, and GTPase/Ras signaling. Our results reveal many new genetic and functional targets in ASD that may lead to final connected pathways.
Twin and family studies indicate a predominantly genetic basis for ASD susceptibility and provide support for considering these disorders as a clinical spectrum. Some 5–15% of individuals with an ASD have an identifiable genetic aetiology corresponding to known rare single-gene disorders (e.g., fragile X syndrome) and chromosomal rearrangements (e.g., maternal duplication of 15q11-q13). Rare mutations have been identified in synaptic genes, including NLGN3, NLGN4X⁴ and SHANK3⁵, and microarray studies have revealed copy number variation (CNV) as a risk factor⁶. CNV examples include de novo events observed in 5–10% of ASD cases⁷⁻⁹, de novo or inherited hemizygous deletions and duplications of 16p11.2⁹⁻¹¹ and NRXN1⁷, and exceptionally rare homozygous deletions in consanguineous families¹². Genome-wide association studies using single nucleotide polymorphisms (SNPs) have highlighted two potential ASD risk loci at 5p14.1¹³ and 5p15.2¹⁴, but these data suggest common variation will account for only a small proportion of the heritability in ASD.
To further delineate the contribution of rare genomic variants to autism we genotyped 1,275 ASD cases and their parents using the Illumina Infinium 1M-single SNP-microarray (Fig. 1). A set of 1,981 controls used for comparison studies was genotyped on the same platform¹⁵ and both data sets were subjected to the same quality control (QC) procedures. Ultimately, we analyzed 996 ASD cases (876 trios) and 1,287 controls of European ancestry (EA) to minimize confounds due to population differences (Supplementary Fig. 1–2 and Supplementary Table 1)¹⁶.
Two CNV prediction algorithms (QuantiSNP¹⁷ and iPattern (unpublished)) and additional extensive QC were used to establish a stringent dataset of non-redundant CNVs called by both algorithms in an individual (Fig. 1, Supplementary Tables 1–3 and Supplementary Fig. 3). This stringent dataset of 5,478 rare CNVs in 996 cases and 1,287 controls of EA (Supplementary Table 4) had the following characteristics: (i) CNV present at <1% frequency in the total sample (cases and controls), (ii) CNV ≥30 kb in size (because >95% of these could be confirmed) and (iii) all CNVs further verified using combined evidence from the PennCNV algorithm¹⁸ and child-parent intensity fold-changes, genotype proportions (to verify deletions) and visual inspection (for chromosome-X).
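The first two inclusion criteria, together with the requirement that both algorithms call the CNV, can be sketched as a simple filter. This is a minimal illustration only: the `CnvCall` record, its field names and the toy calls are hypothetical and are not taken from the study's pipeline, and the verification step (iii) is not modelled.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is an assumed layout.
MIN_SIZE_BP = 30_000          # criterion (ii): CNV >= 30 kb
MAX_FREQUENCY = 0.01          # criterion (i): < 1% of cases plus controls
REQUIRED_CALLERS = frozenset({"QuantiSNP", "iPattern"})


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one CNV call in one individual."""
    sample_id: str
    chrom: str
    start: int                # first base pair, inclusive
    end: int                  # last base pair, inclusive
    callers: frozenset        # names of the algorithms that made the call
    carrier_fraction: float   # fraction of the total sample carrying it


def is_stringent_rare(call: CnvCall) -> bool:
    """Apply the frequency, size and two-caller requirements."""
    size_bp = call.end - call.start + 1
    return (
        call.carrier_fraction < MAX_FREQUENCY
        and size_bp >= MIN_SIZE_BP
        and REQUIRED_CALLERS <= call.callers
    )


both = frozenset({"QuantiSNP", "iPattern"})
# Toy calls with invented coordinates, one per failure mode plus one pass.
calls = [
    CnvCall("s1", "2", 50_000_000, 50_045_000, both, 0.002),      # passes
    CnvCall("s2", "7", 1_000_000, 1_010_000, both, 0.001),        # too small
    CnvCall("s3", "16", 29_500_000, 30_100_000,
            frozenset({"QuantiSNP"}), 0.003),                     # one caller
    CnvCall("s4", "15", 20_000_000, 20_200_000, both, 0.05),      # too common
]
stringent = [c for c in calls if is_stringent_rare(c)]
```

Only the first toy call survives all three checks; each of the others fails exactly one criterion.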
We assessed the impact of rare CNV in cases compared to controls using three primary measures of CNV burden: the number of CNVs per individual, the estimated CNV size, and the number of genes affected by CNVs (Table 1). No significant difference was found in the former two measures (Supplementary Tables 4a and 5), even after controlling for fine-level ancestry differences by pair-matching cases and controls (Supplementary Information)¹⁶. In contrast, we discovered a significant increase in the number of genes intersected by rare CNV in cases when focusing on gene-containing segments (1.19-fold increase, empirical P = 0.012). This ASD association with genic CNV was stronger for deletions (1.26-fold increase, empirical P = 8.0×10⁻³). These differences remained after we further controlled for potential case-control differences that could be present due to biological differences or technical biases. Restricting our analysis to autosomal CNVs (i.e., after removing CNVs located on chromosome X) resulted in a consistent enriched gene count in ASD cases compared to controls. Single-occurrence CNV deletions had increased rates in ASD over controls, suggesting some could be pathogenic.
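An empirical P value of this kind is typically obtained by permuting case and control labels. The following is a minimal sketch of a one-sided label-permutation test on the mean number of genes intersected per individual. It is an assumption about the general approach, not a reproduction of the PLINK procedure used in the study, and the function name, arguments and toy values are hypothetical.

```python
import random


def empirical_p(case_values, control_values, n_perm=10_000, seed=0):
    """One-sided permutation P value for mean(cases) - mean(controls).

    Each value is one individual's burden measure, for example the number
    of genes intersected by that individual's rare CNVs.
    """
    rng = random.Random(seed)
    n_case = len(case_values)
    pooled = list(case_values) + list(control_values)

    def mean_difference(values):
        cases, controls = values[:n_case], values[n_case:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # reassign case/control labels at random
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data only: a clear excess in "cases" gives a small empirical P.
p_separated = empirical_p([5] * 20, [1] * 20, n_perm=1000)
# Identical values in both groups give P = 1.
p_identical = empirical_p([2] * 10, [2] * 10, n_perm=1000)
```

The smallest P value this estimator can return is 1/(n_perm + 1), so the number of permutations bounds the resolution of the reported P.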
We then examined parent-child transmission and confirmed that 5.7% (50/876) of ASD cases had at least one de novo CNV, with >0.6% carrying two or more de novo events (Supplementary Tables 4a, 6 and 7). The de novo CNV rate in our simplex and multiplex families was 5.6% (22/393) and 5.5% (19/348), respectively, in contrast with previous studies showing a higher rate in simplex families⁸,⁹. A total of 226 validated de novo (7) and inherited (219) CNVs not observed in controls and affecting single genes were found (Supplementary Table 8).
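The reported de novo rates follow directly from the counts given above; a short check of the arithmetic, using only numbers stated in the text (the helper name is illustrative):

```python
def rate_percent(events: int, total: int) -> float:
    """Percentage of trios with at least one de novo CNV, to one decimal."""
    return round(100 * events / total, 1)


overall = rate_percent(50, 876)     # all trios
simplex = rate_percent(22, 393)     # simplex families
multiplex = rate_percent(19, 348)   # multiplex families
```

The simplex and multiplex counts sum to 41 events in 741 trios, so the remaining trios counted in the overall figure are not broken down by family type in this passage.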
Numerous novel candidate ASD loci, such as SHANK2, SYNGAP1 and DLGAP2, were identified based on the observation that de novo CNVs affect these genes in cases and not controls (Supplementary Table 6). The relatedness of SHANK2 to the causal ASD gene SHANK3⁵, involvement of SYNGAP1 in ID¹⁹, and interaction of DLGAP family proteins with SHANK proteins²⁰ further support their role in ASDs. Maternally-inherited X-linked deletions at DDX53/PTCHD1 (7 cases) implicated this locus in ASD. We tested an additional 3,677 EA controls (Fig. 1) and again found no CNV at these genes, and DDX53/PTCHD1 emerged as a significant ASD risk factor (P = 3.1×10⁻³ with the initial 1,287 controls; P = 3.6×10⁻⁶ with combined controls; Supplementary Fig. 4).
Association studies of individual rare CNV often have insufficient power to discriminate benign from disease-causing variants. Here, we assessed whether genes and CNVs previously associated with ASD and/or ID were enriched in cases compared with controls, in order to help identify pathogenic events. We defined three gene-lists based on evidence from previous studies of their involvement in ASDs (Supplementary Table 9): (i) ‘ASD implicated’ list consisting of 36 disease genes and 10 loci strongly implicated in ASD and identified in subjects with ASD or ASD and ID; (ii) ‘ID’ consisting of 110 disease genes and 17 loci implicated in ID but not yet in ASD; and, (iii) ‘ASD candidates’ including 103 genes from previous studies of common and rare variants.
We observed a higher proportion of cases with rare CNVs overlapping ‘ASD implicated’ disease genes compared to controls (4.3% versus 2.3%, Fisher exact test P = 5.4×10⁻³; Fig. 2a), corresponding to a significant enrichment for genes in this set (OR = 1.8; 95% CI 1.3–2.6, empirical P = 2.6×10⁻³; Fig. 2b, see also Supplementary Information). This effect was stronger for duplications, which may also disrupt genes (OR = 2.3; 95% CI 1.4–3.8, empirical P = 9.4×10⁻⁴). Enrichment was also found for rare CNVs overlapping ID genes, more notably for deletions (OR = 2.1; 95% CI 1.1–4.2, empirical P = 5.3×10⁻²). In contrast, there was no evidence of enrichment among case-CNVs compared to control-CNVs for genes in the ‘ASD candidates’ set (empirical P > 0.3). When the two disease gene-sets ‘ASD implicated’ and ‘ID’ were combined, we observed 7.6% of cases with rare CNVs preferentially affecting ASD/ID genes compared to 4.5% in controls (Fisher exact test P = 1.2×10⁻³, Fig. 2a). The observed enrichments did not change when potential case-control genome-wide differences for CNV rate and size were considered.
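The carrier-proportion comparison is a 2×2 Fisher exact test. A minimal standard-library sketch follows. The carrier counts below are back-calculated from the reported percentages (4.3% of 996 and 2.3% of 1,287, rounded to whole individuals) and are therefore an assumption, so the resulting P value need not match the published one exactly.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9   # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Assumed counts, derived by rounding the reported percentages.
case_carriers, n_cases = 43, 996
control_carriers, n_controls = 30, 1287

p_carriers = fisher_exact_two_sided(
    case_carriers, n_cases - case_carriers,
    control_carriers, n_controls - control_carriers,
)
```

Exact integer binomial coefficients are used throughout, so the only rounding occurs in the final division for each table.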
Our global analyses of these putative pathogenic loci use somewhat subjective boundaries for CNV overlap. Manual inspection of the data yields more accurate results. After eliminating CNVs that are less likely to have an aetiological role (heterozygous CNVs that disrupt autosomal recessive loci, events outside the critical region of overlap of genomic disorders, X-linked genes in females inherited from non-ASD fathers, duplications inherited from non-ASD parents, and intronic CNVs in NRXN1), 25 CNVs remained in the ASD group, compared to only four in the controls (P = 3.6×10⁻⁶; Supplementary Table 10). Moreover, the latter four CNVs were all duplications at 1q21.1, 16p11.2 or 22q11.2, loci known to exhibit incomplete penetrance and variable expressivity⁶. The population attributable risk provided by the combination of all ASD-CNVs that overlap ASD and/or ID genes is estimated to be 3.3% (Supplementary Table 11). We also identified rare de novo chromosomal abnormalities and large CNVs likely to be aetiologic (Supplementary Table 10).
We then tested for functional enrichment of gene-sets among those genes affected by CNVs to identify biological processes involved in ASD (Fig. 3). Here, the term gene-set refers to groups of genes that share a common function or operate in the same pathway. Such a functional enrichment mapping approach can combine single-gene effects into meaningful groups²¹.
We compiled comprehensive collections of gene-sets (Supplementary Table 12) and used Fisher’s exact test to assess which gene-sets were more frequently affected by rare CNV events in ASD cases compared to controls. An estimate of the false-discovery rate (FDR) at each gene-set was obtained by random permutation of case and control labels (Supplementary Information). To visualize enriched gene-sets, overlap scores were used to graphically organize these sets into a functional enrichment map (or network) using Cytoscape²². We identified the ‘seed’ genes for the network at an FDR q-value of 5% and further relaxed the thresholds to 12.5% to better capture the network topology²³.
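The way overlap scores organize gene-sets into a network can be sketched as follows. The text does not state which score was used, so the overlap coefficient below is an assumption (the Jaccard index is another common choice), and the 0.5 cut-off, the set names and the placeholder gene identifiers are hypothetical.

```python
def overlap_coefficient(set_a, set_b) -> float:
    """Shared genes divided by the size of the smaller gene-set."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def enrichment_edges(gene_sets, threshold=0.5):
    """Connect every pair of gene-sets whose overlap reaches the threshold.

    gene_sets maps a gene-set name to its member genes; the result is a
    list of (name_1, name_2, score) tuples usable as network edges.
    """
    names = sorted(gene_sets)
    edges = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = overlap_coefficient(gene_sets[first], gene_sets[second])
            if score >= threshold:
                edges.append((first, second, score))
    return edges


# Placeholder gene-sets, not real annotations.
toy_sets = {
    "set_A": {"G1", "G2", "G3", "G4"},
    "set_B": {"G3", "G4", "G5"},
    "set_C": {"G6", "G7"},
}
edges = enrichment_edges(toy_sets)
```

In this toy example only `set_A` and `set_B` are linked, because they share two of the three genes in the smaller set; `set_C` remains an isolated node.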
Using these criteria, only deletions were found significantly enriched in gene-sets in cases over controls (Supplementary Fig. 5), consistent with the global burden results (Table 1). Specifically, 76 gene-sets affected by deletions (2.18% of sets tested) were found enriched and used to construct a functional map (Fig. 3a, Supplementary Fig. 6–7). We tested for possible bias, including measures of CNV size and number for cases versus controls per gene-set, as well as genome proximity, but no differences were found that might explain the observed enrichments (Supplementary Fig. 8–9).
We identified enrichments in gene-sets known to be involved in ASDs and also discovered new candidate ASD pathways (Fig. 3a, Supplementary Table 13). For example, gene-sets involved in cell and neuronal development and function (including projection, motility, and proliferation), previously reported in ASD-associated phenotypes, were identified²⁴. Novel observations included gene-sets involved in GTPase/Ras signaling, with component Rho GTPases known to be involved in regulating dendrite and spine plasticity and associated with ID. We also found a tentative link to sets in the kinase activity/regulation functional group, where only a minority of these sets meet a stringent 5% FDR q-value threshold (Supplementary Fig. 10).
We further assessed the relationship of our functional enrichment map with known ASD/ID genes (Fig. 3b, Supplementary Fig. 11) and found genes enriched in sets linked to microtubule cytoskeleton, glycosylation and CNS development/adhesion²⁵. The two groups of genes found enriched in deletions (Fig. 3a) also displayed connectivity to the ASD/ID disease gene-sets, either directly or through intermediates (Fig. 3b, Supplementary Fig. 12). Although ASD genes appear to be enriched in different subsets of genes compared to ID-only genes, we cannot discount the possibility that this is the result of selection bias, and we expect that more ID genes may yet be linked to ASD.
Our findings provide strong support for the involvement of multiple rare genic CNVs, both genome-wide and at specific loci, in ASD. These findings, similar to those recently described in schizophrenia²⁶, suggest that at least some of these ASD-CNVs (and the genes that they affect) are under purifying selection²⁷. Genes previously implicated in ASD by rare variant findings have pointed to functional themes in ASD pathophysiology⁶,²⁸. Molecules such as NRXN1, NLGN3/4X and SHANK3, localized presynaptically or at the post-synaptic density (PSD), highlight maturation and function of glutamatergic synapses. Our data reveal SHANK2, SYNGAP1 and DLGAP2 as new ASD loci, which also encode proteins in the PSD. We also found ID genes to be important in ASD²⁹. Furthermore, our functional enrichment map identifies new groups such as GTPase/Ras, effectively expanding both the number and connectivity of modules that may be involved in ASD. The next steps will be to relate defects or patterns of alterations in these groups to ASD endophenotypes. The combined identification of higher-penetrance rare variants and new biological pathways, including those identified in this study, may broaden the targets amenable to genetic testing and therapeutic intervention.
Cases were classified using the Autism Diagnostic Interview-Revised (ADI-R) and Autism Diagnostic Observation Schedule (ADOS) instruments and those with known karyotypic abnormalities or genetic disorders were excluded. Informed consent was obtained from all families and procedures had approval from institutional review boards. DNA was obtained from blood or buccal-swabs (73% of cases; 75% of controls) or cell-lines (22% of cases; 25% of controls); in 5% of cases the DNA source was not identified. The 1,287 EA controls passing all QC-filters included 1,261 individuals recruited as controls for the study of addiction (SAGE)¹⁵ and 26 HapMap samples (from Illumina). An additional 3,677 EA controls from three separate studies genotyped on other platforms were also used. Raw data from ASD family (Accession pending) and SAGE control (Accession: phs000092.v1.p1) genotyping are at NCBI dbGAP. CNVs were analysed using PLINK v1.07³⁰, R stats and custom scripts. Primary analyses were robust to potential systematic measurement differences between cases and controls; it was not possible to control for site but we controlled for the overall extent and number of CNVs for all burden comparisons, and obtained a consistent enriched gene count in ASD cases compared to controls.
The authors gratefully acknowledge the families participating in the study and the main funders of the Autism Genome Project Consortium (AGP): Autism Speaks (USA), the Health Research Board (HRB; Ireland), The Medical Research Council (MRC; UK), Genome Canada/Ontario Genomics Institute, and the Hilibrand Foundation (USA). Additional support for individual groups was provided by the US National Institutes of Health (NIH grants: HD055751, HD055782, HD055784, HD35465, MH52708, MH55284, MH57881, MH061009, MH06359, MH066673, MH080647, MH081754, MH66766, NS026630, NS042165, NS049261), the Canadian Institutes for Health Research (CIHR), Assistance Publique - Hôpitaux de Paris (France), Autism Speaks UK, Canada Foundation for Innovation/Ontario Innovation Trust, Deutsche Forschungsgemeinschaft (grant: Po 255/17-4) (Germany), EC Sixth FP AUTISM MOLGEN, Fundação Calouste Gulbenkian (Portugal), Fondation de France, Fondation FondaMental (France), Fondation Orange (France), Fondation pour la Recherche Médicale (France), Fundação para a Ciência e Tecnologia (Portugal), the Hospital for Sick Children Foundation and University of Toronto (Canada), INSERM (France), Institut Pasteur (France), the Italian Ministry of Health (convention 181 of 19.10.2001), the John P Hussman Foundation (USA), McLaughlin Centre (Canada), Ontario Ministry of Research and Innovation (Canada), the Seaver Foundation (USA), the Swedish Science Council, The Centre for Applied Genomics (Canada), the Utah Autism Foundation (USA) and the Wellcome Trust core award 075491/Z/04 (UK). D.P. is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). S.W.S holds the GlaxoSmithKline-CIHR Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children (Canada).
Author contributions D.P., J.D.B., R.M.C., E.H.C., H.C., M.C., B.D., S.E., L.G., D.H.G., M.G., J.L.H., J.H., J.M., A.P.M., J.I.N.Jr., A.D.P., M.A.P.-V., G.D.S., P.S., A.M.V., V.J.V., E.M.W., J.S.S., C.B. and S.W.S. were leading contributors in the design, analysis and writing of this study.
Author Information Reprints and permissions information is available at www.nature.com/reprints.