Detection of CNVs in ASD cases and controls
Of the 696 unrelated ASD cases examined by array CGH, 20 were found to carry CNVs larger than 5 Mb (Supporting Information, Table S1) and were excluded from further analysis. Four of these cases were known to have Down syndrome as well as an ASD diagnosis, 14 carried large chromosomal abnormalities previously detected through karyotyping and genotyping with SNP microarrays, and two cases harbored cell line artifacts. The rare stringent CNVs from the remaining 676 unrelated ASD cases were used for gene burden analysis, gene-set association tests, and comparison with CNV data from other SNP microarrays to identify novel CNVs.
| Table 1. Summary statistics of stringent CNVs found in ASD and PDx control data sets |
We also examined the CNV content of the 1000 PDx controls and observed a significantly higher average number of CNVs per sample (p value < 2.2e-16) compared with the ASD cases (Table 1). This finding is likely attributable to the different reference DNA: the PDx CGH experiments used a single sex-matched individual, whereas the ASD CGH experiments used a pool of 50 sex-matched individuals. However, when we focused on rare variants (as defined in the materials and methods section), we found a smaller yet significant difference in the opposite direction (p value = 0.002287; Table 2). The significant change in the relation between class (ASD, control) and CNV number after restricting to rare variants was further confirmed using quasi-Poisson and linear regression models (p value < 2.2e-16 for the interaction between the class variable and the rare variant filter variable). Of the 676 unrelated ASD cases and 1000 PDx controls, 630 ASD cases (93%) and 896 PDx controls (90%) had at least one rare variant detected on the CGH 1M array (Table 2).
| Table 2. Summary statistics of rare CNVs found in ASD and PDx control data sets |
CNV comparison with other microarray platforms
Of the 676 unrelated ASD cases, 615 were genotyped previously with SNP microarrays: 26 cases on the Illumina Human Omni 2.5M-quad (2.5 million probes), 234 cases on the Affymetrix Genome-Wide Human SNP Array 6.0 (1.8 million probes), 262 cases on the Illumina Human 1M single Infinium chip (1 million probes), 11 cases on the Illumina 1M duo array (1 million probes), and 82 cases on the lower-resolution Affymetrix Mapping 500K chip set (500,000 probes). The other 61 unrelated ASD cases were run only on the Agilent 1M array and thus were not included in this CNV platform comparison analysis. For the 615 samples run on both SNP and CGH array platforms, we performed a sample-level 50% one-way overlap of stringent Agilent 1M CNVs with the stringent CNVs from the SNP array platforms. We found that 64% of the Agilent 1M CNVs are novel with respect to the CNVs detected on the SNP microarrays. A more detailed comparison of CNVs detected by the 1M CGH array vs. the various SNP array platforms is shown in Table 3. For example, a comparison between CNVs detected using the similar-resolution Agilent 1M and Illumina 1M arrays showed that, on average, 24 novel CNVs/sample were detected by the Agilent 1M CGH array, whereas only eight novel CNVs/sample were detected by the Illumina 1M SNP array. This platform comparison analysis suggests that the use of multiple microarray platforms provides complementary data, as every microarray platform detects a unique set of novel CNVs.
| Table 3. Details of platform comparison |
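The 50% one-way overlap criterion used in the platform comparison can be illustrated with a minimal Python sketch. It assumes that "one-way" means the fraction of the Agilent 1M call covered by a SNP-array call from the same sample, and that calls must share chromosome and CNV type to count as overlapping; the `CNV` record, the function names, and the coordinates are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # start coordinate in bp (inclusive)
    end: int    # end coordinate in bp (exclusive)
    kind: str   # "gain" or "loss"


def one_way_overlap(query: CNV, other: CNV) -> float:
    """Fraction of the query CNV's length that is covered by the other CNV."""
    # Assumption: calls on different chromosomes or of different type never overlap.
    if query.chrom != other.chrom or query.kind != other.kind:
        return 0.0
    shared = min(query.end, other.end) - max(query.start, other.start)
    return max(0, shared) / (query.end - query.start)


def is_novel(cgh_call: CNV, snp_calls: Iterable[CNV], threshold: float = 0.5) -> bool:
    """True if no SNP-array call covers at least `threshold` of the CGH call."""
    return all(one_way_overlap(cgh_call, s) < threshold for s in snp_calls)


# Hypothetical calls from a single sample, for illustration only.
cgh = CNV("chr1", 1_000, 2_000, "loss")
snp_calls = [CNV("chr1", 1_600, 5_000, "loss")]  # covers 40% of the CGH call
print(is_novel(cgh, snp_calls))
```

Because the overlap is measured against the length of the CGH call only, a small CGH call lying inside a large SNP-array call is not counted as novel, whereas a large CGH call that is only partly covered by a small SNP-array call is.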
Rare ASD CNVs that were detected by the Agilent 1M CGH array but not by SNP microarrays were defined as novel rare CNVs (946 of 1884 rare CNVs detected on the CGH 1M array). We examined the size distribution of these 946 novel rare CNVs detected in the ASD cohort and found that approximately 75% of them are <30 kb in size (Figure S1). These smaller CNVs were missed by SNP microarrays because of insufficient probe coverage at these loci and because of the lower signal-to-noise ratio often observed for SNP array platforms (e.g., SNP-based arrays are optimized for robust genotyping call rates, which may minimize quantitative probe response to copy number variation). Only 25% of the novel rare CNVs were >30 kb in size; these were mostly missed by previous studies because of insufficient probe coverage and the CNV-calling parameters and analysis algorithms chosen to define a CNV as stringent.
Rare ASD-specific CNVs
Rare ASD-specific CNVs were defined using a total of 5139 controls (the 1000 PDx controls described above and 4139 in-house controls previously reported in Krawczak et al. 2006, Stewart et al. 2009, and Bierut et al. 2010). A total of 1884 rare CNVs were detected in ASD cases, of which 946 were novel as compared with previous SNP microarray studies (Table S2). Using qPCR, we validated 117 of 132 (88.6%) novel and rare stringent CNVs that were tested. Of the 946 novel rare ASD CNVs, 57 are reported in Table 4; these correspond to overlapping CNVs in two or more unrelated ASD cases (32 cases at 14 loci), recurrent CNVs (i.e., same breakpoints) in two or more unrelated ASD cases (24 cases at 11 loci), or a de novo event (1 case). Some of the overlapping/recurrent CNVs impacted previously identified ASD genes such as DPYD, RGS7, NRXN1, CNTNAP5, ERBB4, GRM8, NRXN3, YWHAE, and DMD (Serajee et al. 2003; Wu et al. 2005; Autism Genome Project Consortium 2007; Marshall et al. 2008; Bruno et al. 2010; Pagnamenta et al. 2010; Pinto et al. 2010; Carter et al. 2011; Vaags et al. 2012), whereas others were novel, including RERE, NCKAP5, ROBO2, DAPP1, POT1, LEP, PLXNA4, CHRNB3, ZNF517, MIR3910-1/MIR3910-2, CIB2, MMP25/IL32, MYH4, RAB3A/MPV17L2, SAE1, and SYAP1.
| Table 4. Novel recurrent/overlapping ASD CNVs (≥ 2 cases) and a de novo CNV |
One of the ASD-specific CNVs was a maternally inherited duplication at 15q25.1 in three unrelated ASD cases (117395L, 94478, 132199L) disrupting an exon of CIB2 (calcium and integrin binding family member 2; 3/696 cases vs. 0/5139 controls; Fisher's exact test [FET] two-tailed p = 0.001691). The CIB2 transcript and protein are found mainly in the hippocampus and cortex of the brain (Blazejczyk et al. 2009). The encoded protein has been shown to be involved in Ca2+ signaling, which controls a variety of processes in many cell types. In neurons, Ca2+ signaling maintains synaptic transmission, neuronal development, and plasticity (Blazejczyk et al. 2009).
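The case/control carrier counts quoted in this section can be checked with a short, standard-library Python sketch of a two-tailed Fisher's exact test. It is not the authors' analysis code: the function name is hypothetical, and the two-tailed rule used here (summing every table that is no more probable than the observed one) is one common convention that is assumed rather than stated in the text.

```python
from math import comb


def fisher_two_tailed(case_hits: int, n_cases: int,
                      control_hits: int, n_controls: int) -> float:
    """Two-tailed Fisher's exact test p value for a 2x2 carrier/non-carrier table."""
    total_hits = case_hits + control_hits
    denom = comb(n_cases + n_controls, total_hits)

    def pmf(k: int) -> float:
        # Hypergeometric probability that exactly k of the carriers are cases.
        return comb(n_cases, k) * comb(n_controls, total_hits - k) / denom

    observed = pmf(case_hits)
    lowest = max(0, total_hits - n_controls)
    highest = min(total_hits, n_cases)
    # Sum all tables at most as probable as the observed one
    # (with a small tolerance for floating-point ties).
    return sum(p for p in (pmf(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Counts as given in the text: carriers among 696 cases vs. 5139 controls.
print(fisher_two_tailed(3, 696, 0, 5139))  # 3 cases, 0 controls
print(fisher_two_tailed(2, 696, 0, 5139))  # 2 cases, 0 controls
print(fisher_two_tailed(3, 696, 1, 5139))  # 3 cases, 1 control
```

Under these assumptions the three calls agree, to rounding, with the reported p values of 0.001691, 0.01421, and 0.00616.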
In another two unrelated ASD cases (115813L, 117463L), we identified a recurrent 45.7-kb duplication (2/696 cases vs. 0/5139 controls; FET two-tailed p = 0.01421). The duplication disrupts four exons of the DAPP1 (dual adaptor of phosphotyrosine and 3-phosphoinositides) gene at the 4q23 locus, and the gene is suggested to be involved in signal transduction processes.
Another interesting recurrent CNV of size 24.8 kb was detected in two unrelated ASD cases (68672, 50800L): a duplication at 17p13.3 disrupting two exons of the YWHAE (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, epsilon) gene, which presumably acts via haploinsufficiency. It was maternally inherited in both ASD cases, and an equivalent event based on our definition of overlap was not present in controls (2/696 cases vs. 0/5139 controls; FET two-tailed p = 0.01421). YWHAE belongs to the 14-3-3 family of proteins, which mediate signal transduction, and is highly conserved in both plants and mammals. Only microduplications in the YWHAE gene have been reported in ASD. Patients with a 17p13.3 microduplication involving the YWHAE gene have been shown to present with autistic manifestations, behavioral symptoms, speech and motor delay, subtle dysmorphic facial features, and subtle hand-foot malformations (Bruno et al. 2010). It is also noteworthy that larger CNVs intersecting YWHAE were found in two PDx controls. The two ASD cases were both of Asian descent, and we have also found other cases and controls of Asian descent bearing YWHAE CNVs (M. Gazzellone, unpublished data), suggesting that the role of this gene in ASD needs to be further explored.
In three unrelated male ASD probands (44644, 124475, 45554), we observed a recurrent novel CNV, a 24.7-kb duplication encompassing two exons of the SAE1 (SUMO1 activating enzyme subunit 1) gene at the 19q13.32 locus. The same CNV was also found in one control (3/696 cases vs. 1/5139 controls; FET two-tailed p = 0.00616). Interestingly, another duplication of size 50.8 kb disrupting six exons of SAE1 was observed in a fourth unrelated ASD case (90278) in the present study and was also detected by a previous SNP microarray study (Lionel et al. 2011). The SAE1 gene is involved in the protein sumoylation process and has been shown to interact with the ARX gene, which is involved in autistic disorder (Sherr 2003; Rual et al. 2005; Ewing et al. 2007; Gareau and Lima 2010; Wilkinson et al. 2010; Szklarczyk et al. 2011).
In two unrelated cases (69180, 59144), we identified overlapping CNVs impacting the PLXNA4 (plexin A4) gene at the 7q32.3 locus. One CNV is a 14.2-kb loss encompassing an exon of the gene and, based on our overlap criteria, is not observed in 5139 controls, whereas the second CNV is a 15.5-kb gain encompassing untranslated regions of the gene and is observed in 2 of 5139 controls. PLXNA4 is involved in axon guidance as well as nervous system development (Suto et al. 2003; Miyashita et al. 2004).
In the present study, only one de novo CNV was found (all other qPCR-validated CNVs, 116 of 117, were inherited from either parent): a 36-kb loss encompassing an intron of the GPHN (gephyrin) gene at the 14q23.3 locus. This de novo CNV was found in a male ASD proband (103018L), was not picked up on the previous SNP array (Pinto et al. 2010), and was not found in any of the 5139 controls. Gephyrin is suggested to play a central organizer role in assembling and stabilizing inhibitory postsynaptic membranes in the human brain (Waldvogel et al. 2003). In other unpublished data, we have also identified a deletion encompassing several exons of GPHN in an unrelated ASD case and a de novo deletion encompassing exons of the gene in a schizophrenia case, suggesting that GPHN could be a novel susceptibility gene playing a more general role in neurodevelopmental disorders. We believe the lack of novel, rare de novo CNVs captured in the present study is due to our study design: nearly all the de novo CNVs in this ASD cohort are relatively large and were therefore already detected using SNP microarrays (Marshall et al. 2008; Pinto et al. 2010; Lionel et al. 2011; A. C. Lionel, unpublished data), so they are not reported in the present study.
We also detected other novel, rare CNVs, each present in only one unrelated ASD case (Table 5), in genes previously associated with ASD, such as CTNND2, CDH18, PARK2, NXPH1, MTHFD1, and NF1 (Williams and Hersh 1998; Marui et al. 2004; Glessner et al. 2009; Pinto et al. 2010; Griswold et al. 2011; Salyakina et al. 2011). Novel, rare CNVs were also found in genes known to be associated with other neurodevelopmental disorders (e.g., KIRREL3) or potentially playing a role in neurodevelopment (e.g., CTNNA2, NDST1, SLC24A2, NFIB, APLP2, ATP2C2, CECR2, DAGLA, and UPB1).
| Table 5. Other singleton novel rare CNVs |
A paternally inherited 7.9-kb deletion disrupting an exon of the SLC24A2 [solute carrier family 24 (sodium/potassium/calcium exchanger), member 2] gene was observed at 9p22.1 in a male ASD case (61180-L) but in none of the controls. The SLC24A2 gene may play a role in neuronal plasticity (Li et al. 2006).
In a female ASD case (118909L), we observed a 7.2-kb loss disrupting an exon of the CECR2 (cat eye syndrome chromosome region, candidate 2) gene at the 22q11.21 locus, which was not observed in controls. We also detected a CNV in the same gene in ASD case 124632L, which was reported in another study (Pinto et al. 2010). CECR2 is a chromatin remodeling factor that has been proposed to play a role in embryonic nervous system development (Banting et al. 2005).
In another unrelated male ASD proband (72816L), we identified a 15-kb paternally inherited deletion disrupting seven exons of the DAGLA (diacylglycerol lipase, alpha) gene at the 11q12.2 locus that is not found in controls. DAGLA is known to synthesize an endocannabinoid that has been associated with retrograde synaptic signaling and plasticity (Gao et al. 2010).
Another interesting gene is UPB1 (ureidopropionase, beta), located at the 22q11.23 locus, in which a 6.7-kb deletion disrupting two exons was found in a male ASD proband (154266L). The deletion was not observed in any of the 5139 controls. This gene is involved in the last step of the pyrimidine degradation pathway, and deficiencies in UPB1 have been associated with developmental delay (van Kuilenburg et al. 2004).
Finally, our study includes analysis of two ASD cases not previously run on SNP microarrays. In both cases, large CNVs previously associated with ASD were found: a 546-kb maternally inherited 16p11.2 duplication (ASD case 100564) and a 656-kb maternally inherited 22q11.22-q11.23 duplication (ASD case MM0177-3) that overlaps with the 22q distal deletion region (Figure S2).
Global rare CNV burden analyses
To investigate global differences in CNV burden, we assessed the distribution of three CNV statistics in ancestry-matched European ASD case subjects compared with control subjects: (1) subject CNV number, (2) subject total CNV length, and (3) subject total number of CNV genes.
We observed significant differences in CNV number and total gene number for deletions but not for duplications (significance threshold: Wilcoxon test p value < 0.01; Table 6). In terms of magnitude, the total number of genes affected by deletions showed the largest difference between the ASD and control subjects (ratio of means: 1.77). Because subjects were matched by platform type and other essential parameters, and because previous authors also found a stronger difference in deletion burden than in duplication burden, the differences observed very likely reflect real biological differences. In addition, the relative difference in total gene number was larger than the relative difference in CNV number, an effect that is harder to explain by technical or experimental factors. Because of the relatively large sample size, it is important to consider both the significance and the magnitude of burden differences.
| Table 6. Global burden of rare CNVs in European ASD cases vs. PDx controls |
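The three per-subject burden statistics named at the start of this section (CNV number, total CNV length, and total number of CNV genes) could be tabulated as in the following Python sketch. The record layout, subject labels, gene labels, and coordinates are hypothetical, and counting each gene only once per subject is an assumption; the study's own counting rules are defined in its methods.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class RareCNV(NamedTuple):
    subject: str
    start: int        # bp
    end: int          # bp
    genes: frozenset  # gene symbols overlapped by this CNV


def burden_per_subject(cnvs: Iterable[RareCNV]) -> Dict[str, Tuple[int, int, int]]:
    """Return {subject: (CNV number, total CNV length in bp, number of distinct genes)}."""
    counts = defaultdict(int)
    lengths = defaultdict(int)
    genes = defaultdict(set)
    for cnv in cnvs:
        counts[cnv.subject] += 1
        lengths[cnv.subject] += cnv.end - cnv.start
        genes[cnv.subject] |= cnv.genes  # assumption: a gene is counted once per subject
    return {s: (counts[s], lengths[s], len(genes[s])) for s in counts}


# Hypothetical rare deletion calls, for illustration only.
calls = [
    RareCNV("case_A", 100_000, 114_200, frozenset({"GENE1"})),
    RareCNV("case_A", 500_000, 515_500, frozenset({"GENE1", "GENE2"})),
    RareCNV("ctrl_B", 200_000, 207_900, frozenset()),
]
print(burden_per_subject(calls))
```

Subjects with no rare CNV do not appear in the returned dictionary, so they would need to be added with zero values before computing group means or running a rank-based comparison between cases and controls.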
We also assessed whether rare CNVs in genes that are causally implicated in ASD were enriched in cases over controls. The ASD gene list used comprised 110 genes compiled from the peer-reviewed literature (Betancur 2011). We observed significant enrichment of deletions impacting genes implicated in ASD in ASD cases compared with controls (p = 8.896e-05, odds ratio = 20.59 with 95% confidence interval = 2.95-888.96; Table 7). There were 11 ASD cases with deletions in ASD candidate genes (Table S3), and all were experimentally validated by qPCR except three false-positive CNVs overlapping the SYNGAP1 gene. The one SYNGAP1 CNV that did validate is a de novo 112-kb deletion, described previously (Pinto et al. 2010), that disrupts SYNGAP1 and encompasses four other genes. After removing the three false-positive CNVs in the SYNGAP1 gene and testing for enrichment again, we still observed significant enrichment of deletions in genes implicated in ASD in ASD cases as compared with controls (p = 0.001585, odds ratio = 14.74 with 95% CI = 1.95-656.52). The other ASD candidate genes in which we observed rare exonic losses in ASD cases were PTCHD1 (Marshall et al. 2008; Pinto et al. 2010), VPS13B (Pinto et al. 2010), DMD (Pinto et al. 2010), DPYD (novel to this study and as described in Carter et al. 2011), SHANK2 (Pinto et al. 2010), NF1 (novel to this study), and NRXN1 (ILMN Omni 2.5M array; A. C. Lionel, unpublished data).
| Table 7. CNV burden in known ASD genes in cases vs. PDx controls |