Detection of CNVs in ASD cases and controls
Of the 696 unrelated ASD cases examined by array CGH, 20 were found to carry CNVs larger than 5 Mb (Supporting Information, Table S1) and were excluded from further analysis. Four of these cases had Down syndrome in addition to an ASD diagnosis, 14 carried large chromosomal abnormalities previously detected by karyotyping and SNP microarray genotyping, and two harbored cell line artifacts. Rare stringent CNVs from the remaining 676 unrelated ASD cases were used for gene burden analysis, gene-set association tests, and comparison with CNV data from SNP microarrays to identify novel CNVs.
Figure 1 CNV analysis workflow. The ASD cases and controls were typed using the Agilent 1M CGH array, and CNVs were identified using two algorithms, DNA Analytics and DNAcopy. CNVs detected by both algorithms were defined as the stringent dataset.
Summary statistics of stringent CNVs found in ASD and PDx control data sets
We also examined the CNV content of the 1000 PDx controls and observed a significantly higher average number of CNVs per sample than in the ASD cases (p < 2.2e-16). This difference is likely attributable to the use of a different reference DNA in the PDx CGH experiments (a single sex-matched individual) rather than the pool of 50 sex-matched individuals used as the reference in the ASD CGH experiments. However, when we restricted the analysis to rare variants (as defined in the Materials and Methods), we found a smaller but significant difference in the opposite direction, with ASD cases carrying more rare CNVs per sample than controls (p = 0.002287). This change in the relationship between class (ASD vs. control) and CNV number after restricting to rare variants was confirmed using quasi-Poisson and linear regression models (p < 2.2e-16 for the interaction between class and the rare-variant filter). Of the 676 unrelated ASD cases and 1000 PDx controls, 630 ASD cases (93%) and 896 PDx controls (90%) had at least one rare variant detected on the CGH 1M array.
Summary statistics of rare CNVs found in ASD and PDx control data sets
CNV comparison with other microarray platforms
Of the 676 unrelated ASD cases, 615 had been genotyped previously with SNP microarrays: 26 cases on the Illumina Human Omni 2.5M-quad (2.5 million probes), 234 on the Affymetrix Genome-Wide Human SNP Array 6.0 (1.8 million probes), 262 on the Illumina Human 1M single Infinium chip (1 million probes), 11 on the Illumina 1M duo array (1 million probes), and 82 on the lower-resolution Affymetrix Mapping 500K chip set (500,000 probes). The remaining 61 unrelated ASD cases were run only on the Agilent 1M array and were therefore excluded from this platform comparison. For the 615 samples run on both SNP and CGH array platforms, we performed a sample-level 50% one-way overlap of stringent Agilent 1M CNVs with the stringent CNVs from the SNP array platforms. We found that 64% of the Agilent 1M CNVs were novel with respect to the CNVs detected on the SNP microarrays. A more detailed comparison of CNVs detected by the 1M CGH array versus the individual SNP array platforms is given in the accompanying table. For example, comparing the two arrays of similar resolution, Agilent 1M and Illumina 1M, showed that on average 24 novel CNVs per sample were detected by the Agilent 1M CGH array, whereas only eight novel CNVs per sample were detected by the Illumina 1M SNP array. This comparison suggests that using multiple microarray platforms yields complementary data, because each platform detects its own unique set of novel CNVs.
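The sample-level 50% one-way overlap can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: coordinates are half-open, an Agilent call counts as detected when any single SNP-array call from the same sample covers at least half of the Agilent call's length, and CNV type (gain vs. loss) is ignored. The names `CNV`, `overlap_bp`, `is_novel`, and `novel_fraction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CNV:
    """A CNV call; coordinates are half-open [start, end) on one chromosome."""
    sample: str
    chrom: str
    start: int
    end: int


def overlap_bp(a: CNV, b: CNV) -> int:
    """Base pairs shared by two calls (0 if they are on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_novel(query: CNV, reference: Iterable[CNV], min_frac: float = 0.5) -> bool:
    """True if no reference call from the same sample covers >= min_frac of query.

    One-way overlap: the fraction is measured against the query (CGH) call only.
    """
    length = query.end - query.start
    return not any(
        ref.sample == query.sample and overlap_bp(query, ref) >= min_frac * length
        for ref in reference
    )


def novel_fraction(cgh_calls: List[CNV], snp_calls: List[CNV]) -> float:
    """Fraction of CGH calls with no >=50% one-way match among the SNP-array calls."""
    novel = sum(is_novel(call, snp_calls) for call in cgh_calls)
    return novel / len(cgh_calls)


# Toy example with made-up coordinates.
cgh = [CNV("S1", "chr1", 1000, 2000), CNV("S1", "chr1", 5000, 6000), CNV("S2", "chr2", 0, 100)]
snp = [CNV("S1", "chr1", 1400, 3000), CNV("S1", "chr1", 5800, 7000), CNV("S3", "chr2", 0, 100)]
print(novel_fraction(cgh, snp))  # two of the three CGH calls are novel
```

Because the overlap is measured only against the Agilent call, a small CGH call lying entirely inside a much larger SNP-array call still counts as detected, whereas a large CGH call only partly covered by a small SNP-array call does not.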
Figure 2 Venn diagram comparing Agilent 1M CNV calls with those detected by SNP microarray platforms (Illumina 1M single/duo array, Affymetrix 500K, Affymetrix 6.0, and Illumina Omni 2.5M array) for a total of 615 samples.
Details of platform comparison
Rare ASD CNVs that were detected by the Agilent 1M CGH array but not by SNP microarrays were defined as novel rare CNVs (946 of the 1884 rare CNVs detected on the CGH 1M array). Examining the size distribution of these 946 novel rare CNVs in the ASD cohort, we found that approximately 75% are <30 kb (Figure S1). These smaller CNVs were missed by SNP microarrays because of insufficient probe coverage at these loci and the lower signal-to-noise ratio often observed on SNP array platforms (SNP-based arrays are optimized for robust genotype call rates, which may dampen the quantitative probe response to copy number variation). The remaining 25% of the novel rare CNVs were >30 kb and were mostly missed by previous studies because of insufficient probe coverage and the CNV-calling parameters and analysis algorithms used to define a CNV as stringent.
Rare ASD-specific CNVs
Rare ASD-specific CNVs were defined using a total of 5139 controls: the 1000 PDx controls summarized above and 4139 in-house controls previously reported by Krawczak et al. 2006, Stewart et al. 2009, and Bierut et al. 2010. A total of 1884 rare CNVs were detected in ASD cases, of which 946 were novel relative to previous SNP microarray studies (Table S2). Of the 132 novel, rare stringent CNVs tested by qPCR, 117 (88.6%) were validated. Of the 946 novel rare ASD CNVs, 57 are reported in the accompanying table: overlapping CNVs in two or more unrelated ASD cases (32 cases at 14 loci), recurrent CNVs (i.e., with the same breakpoints) in two or more unrelated ASD cases (24 cases at 11 loci), and one de novo event (1 case). Some of the overlapping/recurrent CNVs affected previously identified ASD genes such as DPYD and DMD (Serajee et al. 2003; Wu et al. 2005; Autism Genome Project Consortium 2007; Marshall et al. 2008; Bruno et al. 2010; Pagnamenta et al. 2010; Pinto et al. 2010; Carter et al. 2011; Vaags et al. 2012), whereas others, including RERE and SYAP1, were novel.
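Grouping novel rare CNVs into loci shared by two or more unrelated cases, and separating recurrent CNVs (same breakpoints) from merely overlapping ones, could be sketched as below. Assumptions, not taken from the study: coordinates are half-open, "overlap" means sharing at least one base pair with transitive merging on each chromosome, gains and losses are pooled into the same locus, and "same breakpoints" means identical coordinates (real pipelines usually allow some tolerance at probe resolution). The record layout and the names `cluster_loci` and `classify_multi_case_loci` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (case_id, chrom, start, end), half-open coordinates.
Call = Tuple[str, str, int, int]


def cluster_loci(calls: List[Call]) -> List[List[Call]]:
    """Merge calls whose intervals overlap (transitively) into loci, per chromosome."""
    by_chrom: Dict[str, List[Call]] = defaultdict(list)
    for call in calls:
        by_chrom[call[1]].append(call)
    loci: List[List[Call]] = []
    for chrom in sorted(by_chrom):
        current: List[Call] = []
        current_end = 0
        for call in sorted(by_chrom[chrom], key=lambda c: (c[2], c[3])):
            if current and call[2] < current_end:  # shares >= 1 bp with the open locus
                current.append(call)
                current_end = max(current_end, call[3])
            else:
                if current:
                    loci.append(current)
                current, current_end = [call], call[3]
        if current:
            loci.append(current)
    return loci


def classify_multi_case_loci(calls: List[Call]) -> Tuple[List[List[Call]], List[List[Call]]]:
    """Split loci seen in >= 2 distinct cases into (recurrent, overlapping).

    'Recurrent' here means every call at the locus has identical breakpoints.
    """
    recurrent: List[List[Call]] = []
    overlapping: List[List[Call]] = []
    for locus in cluster_loci(calls):
        if len({c[0] for c in locus}) < 2:
            continue  # locus seen in only one case
        same_breakpoints = len({(c[2], c[3]) for c in locus}) == 1
        (recurrent if same_breakpoints else overlapping).append(locus)
    return recurrent, overlapping
```

Counting distinct case IDs per locus, rather than calls, keeps a case with two fragmented calls at the same locus from being counted twice.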
Novel recurrent/overlapping ASD CNVs (≥ 2 cases) and a de novo CNV
Figure 3 Pedigrees (A-Q) of ASD families with overlapping/recurrent CNVs at novel loci and a de novo CNV event. Open symbols represent unaffected individuals, and filled symbols represent individuals with an ASD diagnosis.
Figure 4 Genome browser views of a subset of novel rare CNVs that occur in two or more ASD cases or are de novo. Genome coordinates are from genome build 36 (hg18); in each panel, ASD case ID numbers are listed next to blue bars.
One of the ASD-specific CNVs was a maternally inherited duplication at 15q25.1 in three unrelated ASD cases (117395L, 94478, 132199L) that disrupts an exon of CIB2 (calcium and integrin binding family member 2; 3/696 cases vs. 0/5139 controls; two-tailed Fisher's exact test [FET] p = 0.001691). CIB2 transcript and protein are found mainly in the hippocampus and cortex of the brain (Blazejczyk et al. 2009). The encoded protein is involved in Ca2+ signaling, which controls a variety of processes in many cell types; in neurons, Ca2+ signaling maintains synaptic transmission, neuronal development, and plasticity (Blazejczyk et al. 2009).
In two other unrelated ASD cases (115813L, 117463L), we identified a recurrent 45.7-kb duplication (2/696 cases vs. 0/5139 controls; FET two-tailed p = 0.01421) that disrupts four exons of DAPP1 (dual adaptor of phosphotyrosine and 3-phosphoinositides) at 4q23; DAPP1 is thought to be involved in signal transduction.
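The carrier comparisons reported in this section (e.g., 3/696 cases vs. 0/5139 controls) are two-tailed Fisher's exact tests on a 2×2 table of carriers and non-carriers. A standard-library sketch of that test follows; the function names are hypothetical. The two-sided p value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, the convention used by common implementations such as R's `fisher.test`.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Sums the hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_min, k_max = max(0, col1 - (n - row1)), min(row1, col1)
    total = comb(n, col1)

    def prob(k: int) -> float:
        # P(k carriers among the row1 cases | col1 carriers overall)
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(k_min, k_max + 1)) if p <= p_obs * (1 + 1e-7))


def carrier_test(case_carriers: int, n_cases: int, control_carriers: int, n_controls: int) -> float:
    """Fisher's exact p value for carriers vs. non-carriers in cases vs. controls."""
    return fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                  control_carriers, n_controls - control_carriers)


print(f"{carrier_test(3, 696, 0, 5139):.4g}")  # 0.001691
```

With 696 cases and 5139 controls as the denominators, this sketch gives 0.001691 for 3 vs. 0 carriers, 0.01421 for 2 vs. 0, and 0.00616 for 3 vs. 1, matching the values reported in the text to the precision shown.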
Another notable recurrent CNV, a 24.8-kb duplication at 17p13.3, was detected in two unrelated ASD cases (68672, 50800L). It disrupts two exons of YWHAE (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, epsilon) and presumably acts via haploinsufficiency. The duplication was maternally inherited in both ASD cases, and no equivalent event, based on our definition of overlap, was present in controls (2/696 cases vs. 0/5139 controls; FET two-tailed p = 0.01421). YWHAE belongs to the 14-3-3 protein family, which mediates signal transduction and is highly conserved in both plants and mammals. Only microduplications involving YWHAE have been reported in ASD: patients with a 17p13.3 microduplication involving YWHAE show autistic features, behavioral symptoms, speech and motor delay, subtle dysmorphic facial features, and subtle hand and foot malformations (Bruno et al. 2010). Notably, larger CNVs intersecting YWHAE were found in two PDx controls. Both ASD cases were of Asian descent, and we have also found other cases and controls of Asian descent carrying YWHAE CNVs (M. Gazzellone, unpublished data), suggesting that the role of this gene in ASD needs to be explored further.
In three unrelated male ASD probands (44644, 124475, 45554), we observed a recurrent novel CNV, a 24.7-kb duplication encompassing two exons of SAE1 (SUMO1 activating enzyme subunit 1) at 19q13.32. The same CNV was also found in one control (3/696 cases vs. 1/5139 controls; FET two-tailed p = 0.00616). Interestingly, a 50.8-kb duplication disrupting six exons of SAE1 was observed in a fourth unrelated ASD case (90278) in the present study and had also been detected in a previous SNP microarray study (Lionel et al. 2011). SAE1 is involved in protein sumoylation and has been shown to interact with ARX, which has been implicated in autistic disorder (Sherr 2003; Rual et al. 2005; Ewing et al. 2007; Gareau and Lima 2010; Wilkinson et al. 2010; Szklarczyk et al. 2011).
In two unrelated cases (69180, 59144), we identified overlapping CNVs affecting PLXNA4 (plexin A4) at 7q32.3. One is a 14.2-kb loss encompassing an exon of the gene that, under our overlap criteria, is absent from all 5139 controls; the other is a 15.5-kb gain encompassing untranslated regions of the gene that is observed in 2 of 5139 controls. PLXNA4 is involved in axon guidance and nervous system development (Suto et al. 2003; Miyashita et al. 2004).
In the present study, only one de novo CNV was found (all other qPCR-validated CNVs, 116 of 117, were inherited from a parent): a 36-kb loss encompassing an intron of GPHN (gephyrin) at 14q23.3. This de novo CNV was found in a male ASD proband (103018L); it was not detected on the previous SNP array (Pinto et al. 2010) and was absent from all 5139 controls. Gephyrin is thought to act as a central organizer in assembling and stabilizing inhibitory postsynaptic membranes in the human brain (Waldvogel et al. 2003). In other unpublished data, we have also identified a deletion encompassing several exons of GPHN in an unrelated ASD case and a de novo deletion encompassing exons of the gene in a schizophrenia case, suggesting that GPHN could be a novel susceptibility gene with a more general role in neurodevelopmental disorders. We believe the scarcity of novel, rare de novo CNVs in the present study simply reflects our study design: nearly all de novo CNVs in this ASD cohort are relatively large and were therefore already detected using SNP microarrays (Marshall et al. 2008; Pinto et al. 2010; Lionel et al. 2011; A. C. Lionel, unpublished data), so they are not reported here.
We also detected other novel, rare CNVs present in only one unrelated ASD case, affecting genes previously associated with ASD such as CTNND2 and NF1 (Williams and Hersh 1998; Marui et al. 2004; Glessner et al. 2009; Pinto et al. 2010; Griswold et al. 2011; Salyakina et al. 2011). Novel, rare CNVs were also found in genes known to be associated with other neurodevelopmental disorders or potentially playing a role in neurodevelopment (e.g., UPB1).
Other singleton novel rare CNVs
A paternally inherited 7.9-kb deletion disrupting an exon of SLC24A2 [solute carrier family 24 (sodium/potassium/calcium exchanger), member 2] was observed at 9p22.1 in a male ASD case (61180-L) but in none of the controls. SLC24A2 may play a role in neuronal plasticity (Li et al. 2006).
Figure 5 Genome browser view of a subset of the novel rare CNVs found in a single ASD case that impact one or more exons. Genome coordinates are from genome build 36 (hg18); in each panel, the ASD case ID number is listed next to a red bar.
In a female ASD case (118909L), we observed a 7.2-kb loss disrupting an exon of CECR2 (cat eye syndrome chromosome region, candidate 2) at 22q11.21, which was not observed in controls. We also detected a CNV in the same gene in ASD case 124632L, which was reported in a previous study (Pinto et al. 2010). CECR2 is a chromatin remodeling factor that has been proposed to play a role in embryonic nervous system development (Banting et al. 2005).
In another unrelated male ASD proband (72816L), we identified a 15-kb paternally inherited deletion disrupting seven exons of DAGLA (diacylglycerol lipase, alpha) at 11q12.2 that is not found in controls. DAGLA encodes an enzyme that synthesizes an endocannabinoid associated with retrograde synaptic signaling and plasticity (Gao et al. 2010).
Another interesting gene is UPB1 (ureidopropionase, beta) at 22q11.23, in which a 6.7-kb deletion disrupting two exons was found in a male ASD proband (154266L). The deletion was not observed in any of the 5139 controls. UPB1 is involved in the last step of the pyrimidine degradation pathway, and UPB1 deficiency has been associated with developmental delay (van Kuilenburg et al. 2004).
Finally, our study includes two ASD cases not previously run on SNP microarrays. In both, large CNVs previously associated with ASD were found: a 546-kb maternally inherited 16p11.2 duplication (ASD case 100564) and a 656-kb maternally inherited 22q11.22-q11.23 duplication (ASD case MM0177-3) that overlaps the 22q distal deletion region (Figure S2).
Global rare CNV burden analyses
To investigate global differences in CNV burden, we compared the distributions of three per-subject CNV statistics between ancestry-matched European ASD cases and control subjects: (1) the number of CNVs, (2) the total CNV length, and (3) the total number of genes overlapped by CNVs.
We observed significant differences in CNV number and in total gene number for deletions but not for duplications (Wilcoxon test, significance threshold p < 0.01). In terms of magnitude, the total number of genes affected by deletions showed the largest difference between ASD and control subjects (ratio of means: 1.77). Because subjects were matched by platform type and other essential parameters, and because previous studies found a stronger difference in deletion burden than in duplication burden, the observed differences very likely reflect real biological differences. Moreover, the relative difference in total gene number was larger than the relative difference in CNV number, an effect that is harder to explain by technical or experimental factors. Given the relatively large sample size, it is important to consider both the significance and the magnitude of burden differences.
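The burden comparison can be illustrated with a two-group Wilcoxon test on one per-subject statistic, sketched here in its rank-sum (Mann-Whitney U) form, plus the ratio of group means used to express magnitude. This is a minimal sketch, not the study's code: it uses the large-sample normal approximation with mid-ranks and a tie-corrected variance but no continuity correction (R's `wilcox.test`, for instance, applies one by default, so p values may differ slightly). Inputs are assumed to be one value per subject, such as the number of genes overlapped by deletions; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties get mid-ranks and the variance is tie-corrected; no continuity correction.
    Returns (U statistic of x, p value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks: List[float] = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def ratio_of_means(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Magnitude of the burden difference: mean(cases) / mean(controls)."""
    return (sum(cases) / len(cases)) / (sum(controls) / len(controls))


# Made-up per-subject counts of genes hit by deletions.
case_genes = [3, 0, 5, 2, 4, 1]
control_genes = [1, 0, 2, 0, 1, 1]
print(rank_sum_test(case_genes, control_genes), ratio_of_means(case_genes, control_genes))
```

Reporting the ratio of means next to the p value reflects the point made above: with large samples, small shifts can be significant, so magnitude is needed to judge whether a difference matters.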
Global burden of rare CNVs in European ASD cases vs. PDx controls
We also assessed whether rare CNVs in genes causally implicated in ASD were enriched in cases relative to controls, using a list of 110 ASD genes compiled from the peer-reviewed literature (Betancur 2011). We observed significant enrichment of deletions affecting ASD-implicated genes in ASD cases compared with controls (p = 8.896e-05; odds ratio = 20.59, 95% confidence interval 2.95-888.96). Eleven ASD cases carried deletions in ASD candidate genes (Table S3), and all were experimentally validated by qPCR except three false-positive CNVs overlapping SYNGAP1. The one SYNGAP1 CNV that did validate is a de novo 112-kb deletion that disrupts SYNGAP1 and encompasses four other genes; it was described previously by Pinto et al. (2010). After removing the three false-positive SYNGAP1 CNVs and retesting, we still observed significant enrichment of deletions in ASD-implicated genes in ASD cases compared with controls (p = 0.001585; odds ratio = 14.74, 95% CI 1.95-656.52). The other ASD cases with rare exonic losses in ASD candidate genes involved PTCHD1 (Marshall et al. 2008; Pinto et al. 2010), (Pinto et al. 2010), (Pinto et al. 2010), (novel to this study and as described in Carter et al. 2011), (Pinto et al. 2010), (novel to this study), and NRXN1 (Illumina Omni 2.5M array; A. C. Lionel, unpublished data).
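The gene-set enrichment test compares the proportions of cases and controls carrying at least one deletion that overlaps a curated ASD gene. The sketch below builds that 2×2 carrier table and computes a simple sample odds ratio. It assumes gene-body overlap (the study may have required exonic overlap), hypothetical input dictionaries and function names (`hits_gene_set`, `carrier_table`, `sample_odds_ratio`), and a Haldane-Anscombe correction (adding 0.5 to every cell) when any cell is zero. The odds ratios and confidence intervals reported above may come from a different estimator, such as the conditional maximum-likelihood estimate returned by some Fisher's exact test implementations, so this is illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical intervals: (chrom, start, end) with half-open coordinates.
Interval = Tuple[str, int, int]


def hits_gene_set(deletions: Iterable[Interval], genes: Dict[str, Interval]) -> bool:
    """True if any deletion shares at least one base pair with any listed gene."""
    return any(
        chrom == g_chrom and start < g_end and g_start < end
        for chrom, start, end in deletions
        for g_chrom, g_start, g_end in genes.values()
    )


def carrier_table(cases: Dict[str, List[Interval]],
                  controls: Dict[str, List[Interval]],
                  genes: Dict[str, Interval]) -> List[List[int]]:
    """[[case carriers, case non-carriers], [control carriers, control non-carriers]]."""
    a = sum(hits_gene_set(dels, genes) for dels in cases.values())
    c = sum(hits_gene_set(dels, genes) for dels in controls.values())
    return [[a, len(cases) - a], [c, len(controls) - c]]


def sample_odds_ratio(table: List[List[int]], pseudocount: float = 0.5) -> float:
    """Cross-product odds ratio; adds `pseudocount` to every cell if any cell is 0."""
    (a, b), (c, d) = table
    if 0 in (a, b, c, d):
        a, b, c, d = [x + pseudocount for x in (a, b, c, d)]  # Haldane-Anscombe
    return (a * d) / (b * c)


# Made-up toy data: two gene intervals, four cases, four controls.
genes = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr2", 500, 600)}
cases = {"c1": [("chr1", 150, 160)], "c2": [("chr2", 590, 700)], "c3": [("chr3", 0, 10)], "c4": []}
controls = {"k1": [("chr1", 200, 300)], "k2": [], "k3": [("chr2", 0, 100)], "k4": []}
table = carrier_table(cases, controls, genes)
print(table, sample_odds_ratio(table))
```

Counting carriers rather than deletions keeps each subject in exactly one cell of the table, which is what a Fisher-type test on the resulting 2×2 table requires.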
CNV burden in known ASD genes in cases vs. PDx controls