We began by identifying protein partners of ASD-associated proteins (1–3) and determining whether any of them interacted with other ASD proteins. We selected genes from three groups: those whose mutation results in syndromic ASDs (“syndromic ASD proteins”,
Table S1), those whose mutation causes severe language delay, and those whose products are paralogs, known binding partners, or functionally related to syndromic ASD proteins. We refer to the second and third groups as “ASD-associated” genes in order to distinguish them from the first group, syndromic ASD proteins (
Table S1). Although language delay can occur without accompanying autistic features, and the reasons for language delay may be as heterogeneous as the ASDs themselves, we reasoned that language development might depend on the same pathways as those involved in social communication deficits observed in ASDs.
We performed a yeast two-hybrid (Y2H) screen of a human cDNA library using 192 bait fragments for 35 gene products; each fragment encoded either the full-length coding sequence or a partial segment of it. After a series of stringent tests (
18,
22), we obtained 7,933 interacting prey clones, which corresponded to 783 unique proteins; 539 of these passed a second round of testing in yeast, demonstrating that the interactions held up in an independent reconstitution system (
22). We considered only these 539 proteins as candidate binary interacting partners of the bait proteins. These 539 proteins accounted for 848 interactions with 26 syndromic ASD proteins or ASD-associated proteins (
Tables S1 and S2). Among them, only 32 interactions (4%) were previously reported (
Tables S2 and S3). Baits for nine proteins failed to identify definite binding partners in this stringent Y2H screen (
Table S1).
To validate the interaction data in a mammalian system, we performed glutathione-sepharose affinity co-purifications (GST-APs) in HEK293T cells for 52 randomly selected interactions (6% of the total) (
Fig. S1). The mammalian cells recapitulated 44 out of 52 interactions (85%) (
Fig. S1 and
Table S2). In general, bona fide binary interactions validate at rates of no more than 50% (23–25) when tested in a different assay system; our unusually high validation rate thus supports the reliability of our stringent screening methods compared to prior interactomes (
18,
22,
26). We did not remove candidate pairs from the screening data (
Table S2) when they failed in the validation assays, because the Y2H screen is excellent at detecting transient and biologically relevant interactions that are often difficult to recapitulate in co-immunoprecipitation and affinity purification systems (
18,
22–
25).
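The comparison between the observed validation rate and the roughly 50% ceiling cited for cross-assay validation can be made concrete with a short calculation. The sketch below is illustrative only and is not an analysis reported in this study: it assumes a simple binomial model in which each tested interaction validates independently at a 50% rate, and the function name `binomial_tail` is introduced here for the example.

```python
from math import comb


def binomial_tail(successes: int, trials: int, rate: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, rate)."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the text: 44 of 52 tested interactions validated.
validated, tested = 44, 52
fraction = validated / tested

# Assumption: a 50% per-interaction validation rate as the reference model.
tail = binomial_tail(validated, tested, 0.5)

print(f"validated fraction: {fraction:.3f}")
print(f"P(at least {validated} of {tested} at a 50% rate): {tail:.2e}")
```

Under these assumptions, a small tail probability indicates that 44 of 52 validations would be unlikely if the true rate were only 50%.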
Using the interaction data from the Y2H screen (
Table S2), we generated an ASD protein interaction network (). Twenty-four out of the 26 syndromic ASD or associated proteins were interconnected in one major component (CDKL5 and NF1 were located outside) (). The Fragile X-related proteins 1 and 2 (FXR1 and FXR2) showed the greatest connectivity, and were also connected to the Fragile X mental retardation protein (FMRP) (). The ASD network thus recapitulates and expands previously described
in vivo associations and functional relationships among the Fragile X-related proteins (
27,
28). Similarly, we expanded previously established relationships among the tuberous sclerosis complex proteins 1 and 2 (TSC1 and TSC2) (
29) by identifying four new partners ( and
Table S2). We also confirmed that the post-synaptic proteins SHANK3 and PSD95 interact
in vivo (
30,
31) and identified nine shared partners between them (ACTN2, CLU, DLGAP1, DLGAP3, DLGAP4, HNRNPC, LZTS2, PICK1, and SYNGAP1).
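Finding shared partners amounts to intersecting the neighbor sets of two nodes in the interaction network. The following is a minimal sketch, assuming the network is stored as an undirected adjacency map; the edge list contains only the SHANK3 and PSD95 interactions named above, and the helper names `build_adjacency` and `shared_partners` are hypothetical. A full analysis would load every pair from Table S2.

```python
from collections import defaultdict

# The nine shared partners named in the text.
PARTNERS = [
    "ACTN2", "CLU", "DLGAP1", "DLGAP3", "DLGAP4",
    "HNRNPC", "LZTS2", "PICK1", "SYNGAP1",
]

# Undirected edges: the SHANK3-PSD95 interaction plus each shared partner.
edges = [("SHANK3", "PSD95")]
edges += [("SHANK3", partner) for partner in PARTNERS]
edges += [("PSD95", partner) for partner in PARTNERS]


def build_adjacency(pairs):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for first, second in pairs:
        adjacency[first].add(second)
        adjacency[second].add(first)
    return adjacency


def shared_partners(adjacency, first, second):
    """Return the sorted proteins that interact with both query proteins."""
    return sorted(adjacency.get(first, set()) & adjacency.get(second, set()))


adjacency = build_adjacency(edges)
common = shared_partners(adjacency, "SHANK3", "PSD95")
print(len(common), common)
```

The same function applies to any pair of baits, for example SHANK3 and TSC1, once their interactions are added to the edge list.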
Notably, our ASD interactome revealed previously unsuspected connectivity between two syndromic ASD proteins, SHANK3 and TSC1, which share at least 21 partners (). SHANK1, the paralog of SHANK3, arose as a potential partner of both TSC1 and SHANK3, which suggested that these syndromic ASD proteins interact in a complex at the postsynaptic compartments—the microenvironment of dendritic spines beneath the specialized membrane structure of the synapse (
3,
32) —a suggestion confirmed
in vivo by co-immunoprecipitation using mouse brain extracts ().
In vivo studies further confirmed that ACTN1, a postsynaptic scaffold protein identified as a partner of SHANK3 and TSC1 (
Table S2 and ), interacts with TSC1 as well as with the postsynaptic scaffolding proteins SHANK3 and HOMER3 (). Further, the Y2H screen recapitulated eleven previously reported
in vivo interactions (
Tables S2 and S3), which we obtained from the Human Protein Reference Database (HPRD) (
33) and the Biological General Repository for Interaction Datasets (BioGRID,
http://thebiogrid.org/). All except TSC1-AXIN1 were reported in mouse brain tissue. We also identified an additional 21 interactions, previously demonstrated by
in vitro co-purifications or Y2H screens (
Table S3).
To verify that the syndromic ASD proteins are co-expressed with their binding partners
in vivo, we analyzed microarray data from our previous studies of brain tissue from wild-type mice (
34,
35). We observed strong correlations among expression profiles for genes in the network, but not for similarly sized sets of randomly selected probes on the microarrays (p < 0.001, ). Further, the majority of the genes that encode the proteins annotated in the network co-clustered into a dominant co-expression group in the hypothalamus (78%), cerebellum (78%), and amygdala (55%) (). In conjunction with the physical interaction data (),
SHANK3,
TSC1,
ACTN1 and
HOMER3 showed highly correlated expression in these brain regions (
Table S4). Note that averaged correlation matrices of the genes in the three brain regions did not show such strong correlation, suggesting that subsets of the ASD-associated proteins and their binding partners may have distinct relationships in different brain regions, rather than being ubiquitously co-expressed (). Importantly, 96% of the proteins identified in our primary screen were found to be expressed in brain in the mouse studies, a substantially greater proportion than the expected 59% for randomly sampled genes (p < 1 × 10⁻¹⁰).
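The comparison of network genes against similarly sized random probe sets can be illustrated with a resampling sketch. Everything below is an assumption made for illustration: the expression values are synthetic, the probe names (`net0`, `bg0`, and so on) are invented, and the statistic (mean pairwise Pearson correlation) and the add-one p-value estimator are plausible choices rather than the procedure actually used for the microarray data.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(expression, probes):
    """Average Pearson correlation over all pairs of the given probes."""
    pairs = list(combinations(probes, 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def resampling_p_value(expression, probe_set, draws, rng):
    """Compare a probe set with random probe sets of the same size."""
    observed = mean_pairwise_correlation(expression, probe_set)
    all_probes = sorted(expression)
    at_least = sum(
        mean_pairwise_correlation(expression, rng.sample(all_probes, len(probe_set)))
        >= observed
        for _ in range(draws)
    )
    # Add-one estimator so the p-value is never exactly zero (an assumption).
    return observed, (at_least + 1) / (draws + 1)


# Synthetic data: ten "network" probes share a signal, 200 background probes do not.
rng = random.Random(0)
n_samples = 12
signal = [rng.gauss(0, 1) for _ in range(n_samples)]
expression = {f"net{i}": [s + rng.gauss(0, 0.3) for s in signal] for i in range(10)}
expression.update(
    {f"bg{i}": [rng.gauss(0, 1) for _ in range(n_samples)] for i in range(200)}
)

network_probes = [f"net{i}" for i in range(10)]
observed, p_value = resampling_p_value(expression, network_probes, 500, rng)
print(f"mean pairwise r = {observed:.2f}, empirical p = {p_value:.4f}")
```

With real data, `expression` would map each microarray probe to its values across samples for one brain region, and the test would be repeated per region.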
To understand the unique topology (i.e. the pattern of interconnections) of the protein interaction network and systematically assess the connectivity of syndromic ASD proteins, we incorporated literature-curated interaction data for both bait and prey proteins from the HPRD and the BioGRID. This allowed us to produce an extended network consisting of one component with 3,507 proteins connected through 6,881 interactions (). Of the 35 bait proteins that were used in the Y2H screen, 34 were directly or indirectly connected inside this network; only one protein (SLC6A8) was not connected. Next we calculated the mean path length in the extended network for eight syndromic ASD proteins (
Tables S1 and S2) that were in the experimental network (), and we compared it with the distribution of mean path lengths for 8 randomly sampled proteins selected from the remaining 18 of 26 bait proteins that had at least one binding protein in the primary screen (
Table S2). We performed 10,000 random draws of 8 proteins. The eight syndromic ASD proteins showed a significantly shorter mean path length (2.14) than the random samples from the remaining baits (mean of 2.78, p = 0.004,
Fig. S2). The close connectivity of the eight syndromic ASD proteins led us to investigate whether different ASD proteins might share common molecular pathways that relate to the pathogenesis of ASD. Indeed, Gene Ontology (GO) analysis of the network (excluding all of the baits) revealed marked enrichment for proteins associated with synapse, postsynaptic density and cytoskeleton under the “Cellular Component” branch of the GO (
Fig. S3A), and for small GTPase-mediated signaling and metabotropic glutamate receptor signaling under the “Biological Process” GO branch (
Fig. S3B); in GO terms, a biological process is a series of molecular events or functions. Such coherence between the cellular compartments and biological processes of the ASD baits and their interactome partners underscores the biological relevance of the network. It also points to key molecular pathways that may underlie autistic phenotypes in distinct genetic syndromes.
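The path-length comparison described above (mean shortest path among a focus set of proteins versus repeated random draws of the same size from the remaining baits) can be sketched as follows. The toy graph, the node names, and the add-one p-value estimator are assumptions made for illustration; the text does not specify the exact estimator, and the real analysis used 8 proteins, a pool of 18 baits, and 10,000 draws on the extended network.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Shortest path length from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def mean_path_length(adjacency, proteins):
    """Mean shortest path over all pairs; assumes the proteins are connected."""
    distances = {p: bfs_distances(adjacency, p) for p in proteins}
    pairs = list(combinations(proteins, 2))
    return sum(distances[a][b] for a, b in pairs) / len(pairs)


def empirical_p(adjacency, focus, pool, draws, seed=0):
    """Fraction of random draws from pool that are at least as compact as focus."""
    rng = random.Random(seed)
    observed = mean_path_length(adjacency, focus)
    hits = sum(
        mean_path_length(adjacency, rng.sample(pool, len(focus))) <= observed
        for _ in range(draws)
    )
    return observed, (hits + 1) / (draws + 1)


def add_edge(adjacency, a, b):
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


# Hypothetical toy network: focus proteins sit next to a hub, while the
# remaining baits hang off the hub through two intermediate proteins each.
adjacency = {}
focus = [f"S{i}" for i in range(1, 5)]
pool = [f"B{i}" for i in range(1, 7)]
for name in focus:
    add_edge(adjacency, name, "HUB")
for name in pool:
    add_edge(adjacency, "HUB", f"{name}_x")
    add_edge(adjacency, f"{name}_x", f"{name}_y")
    add_edge(adjacency, f"{name}_y", name)

observed, p_value = empirical_p(adjacency, focus, pool, draws=999)
print(f"observed mean path length = {observed}, empirical p = {p_value}")
```

Drawing the comparison sets only from other baits, rather than from all proteins, keeps the comparison between proteins that were screened in the same way.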
It remained to be determined whether a protein interaction network built on syndromic ASD proteins would prove relevant to the pathogenesis of non-syndromic or idiopathic ASDs. To address this question, we collected information from published studies on copy number variations (CNVs) that were observed in normal populations or in non-syndromic ASD patients (
36,
37). We then searched for genes that were annotated both in our network and in the intervals of CNVs found in normal individuals or ASD patients. Individuals from the ASD group showed a 2.4-fold higher rate of CNVs spanning genes in the Y2H interactome than the control group (incidence 0.43 vs. 0.18, p < 1.13 × 10⁻²³; two-sided Fisher’s Exact Test, “Core”, ), with an odds ratio of 3.3. Conversely, a smaller proportion of individuals in the ASD group carried CNVs that failed to encompass genes in the network, i.e., fewer ASD CNVs mapped to loci encoding non-network proteins (0.25 vs. 0.39 for the control group, p = 6.16 × 10⁻⁸; “Non-network protein” in ). We also observed a higher rate of overlap with genes encoding proteins in the extended network for ASD than for Control (0.70 vs. 0.565 in frequency, “Extended”, ).
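The relationship between the incidences, the fold change, the odds ratio, and the two-sided Fisher's exact test can be shown with a small worked example. The counts below are hypothetical (100 individuals per group, chosen only to match the rounded incidences 0.43 and 0.18) because the cohort sizes are not given in this passage; the results therefore will not reproduce the reported p-value, and an odds ratio computed from rounded incidences need not match the reported 3.3 exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    low, high = max(0, col1 - row2), min(row1, col1)
    observed = table_probability(a)
    return sum(
        p
        for p in map(table_probability, range(low, high + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Hypothetical counts: rows are groups, columns are (CNV hits a network gene, does not).
asd_hit, asd_miss = 43, 57
control_hit, control_miss = 18, 82

rate_ratio = (asd_hit / (asd_hit + asd_miss)) / (control_hit / (control_hit + control_miss))
odds_ratio = (asd_hit * control_miss) / (asd_miss * control_hit)
p_value = fisher_exact_two_sided(asd_hit, asd_miss, control_hit, control_miss)

print(f"rate ratio = {rate_ratio:.2f}, odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
```

The rate ratio corresponds to the fold change in incidence, whereas the odds ratio compares the odds of carrying such a CNV between the two groups, which is why the two numbers differ.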
To consider both the connectivity of the network and the multi-CNV load in each individual, we computed an additional measure that we defined as the Network Connectivity Score. This score is the sum, over all genes present in the CNVs of an individual, of the number of connections each gene has in the network; the score therefore weights genes by their network relevance. This score was also significantly higher in ASDs vs. control (p < 2.2 × 10⁻¹⁶, Wilcoxon’s rank sum test, ). We mapped the chromosomal locations of the network genes that overlapped with the CNV regions in ASD patients ( and
S4). The genes overlapped by CNVs were widely distributed throughout the genome, indicating that our findings were not dominated by hot spots for structural variation in ASDs (). Interestingly, however, three network genes (MVP, KCTD13 and ALDOA) that map to the recurrent CNV hot spot on human chromosome 16p11.2 have been reported in ASD and schizophrenia patients in genome-wide hybridization studies (
11,
38)(
Table S2 and
Fig. S4).
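The Network Connectivity Score defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions: gene names, degrees, and individuals are all hypothetical, genes absent from the network contribute zero, and a gene covered by more than one CNV in the same individual is counted once (the text does not say how such cases were handled).

```python
def connectivity_score(cnv_genes, degree):
    """Sum the network degree of each distinct gene covered by an individual's CNVs."""
    return sum(degree.get(gene, 0) for gene in set(cnv_genes))


# Hypothetical number of network connections per gene.
degree = {"GENE_A": 3, "GENE_B": 2, "GENE_C": 5}

# Hypothetical individuals and the genes spanned by their CNVs.
individuals = {
    "case_1": ["GENE_A", "GENE_B", "GENE_C"],
    "case_2": ["GENE_C", "GENE_C", "GENE_X"],  # repeated gene, one non-network gene
    "control_1": ["GENE_X"],
}

scores = {name: connectivity_score(genes, degree) for name, genes in individuals.items()}
print(scores)
```

The per-individual scores from the two groups would then be compared with a rank-based test such as the Wilcoxon rank sum test named in the text.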
To explore the role of genes in the interaction network in idiopathic ASDs, we performed microarray-based comparative genome hybridization (CGH) for 627 genes in our network using genomic DNA from 288 relatively high-functioning individuals (average IQ 80.94) with a diagnosis of idiopathic ASD (i.e., non-syndromic autism) from the Simons Foundation Simplex Collection (
39). These probands do not show any signs of syndromic disorders (systemic malformation, abnormal facies, or severe intellectual disability) on physical examination or brain scanning. We focused on events with large segmental duplications or deletions spanning over 10 kb. This analysis revealed a segmental duplication in chromosome Xq28 involving
FLNA and three segmental deletions in chromosomes 15q13.3, 16q23.3-q24.1, and 14q13.3, which involved the
PKM2,
NECAB2 and
MIPOL1 genes, respectively (). CGH analyses of the DNA from the parents of the probands confirmed that the duplication of Xq28 (
FLNA), deletions of 15q13.3 (
PKM2) and 16q23.3 (
NECAB2) were all
de novo, whereas the deletion of 14q13.3 (
MIPOL1) was maternally inherited (
Fig. S5A–D and
Table S5). Duplications and point mutations of
FLNA cause various degrees of intellectual disability, periventricular heterotopia (a disorder caused by abnormal migration of neurons), and dysmorphic features, but to our knowledge have not been associated with an autistic phenotype in the absence of intellectual deficits (
40). By identifying a
de novo duplication of
FLNA in a patient with an IQ of 109 and autism, we broaden the clinical spectrum of phenotypes associated with such duplications. Furthermore, our discovery that FLNA binds to SHANK3 (mutations in which cause syndromic ASD) using both Y2H assays (
Table S2) and co-immunoprecipitation (
Fig. S6) in mouse brain extracts validates the physical interaction of these proteins and suggests that syndromic and nonsyndromic ASDs are functionally linked.
None of the autosomal deletion events were observed in the clinical database of the Molecular Genetics Laboratory at Baylor College of Medicine (BCM), which houses samples from over 15,000 patients with dysmorphology (abnormal form or anatomy) or intellectual and developmental disability screened with a high-density oligonucleotide clinical chromosomal microarray. The BCM clinical array has much greater genome-wide coverage and was able to delineate precise boundaries of the initial CNV findings in our experimental cohort. In the clinical database, only 3 males and 1 carrier mother had the duplication event of Xq28 that involves
FLNA but not
MECP2, the gene involved in the neurological disorder, Rett syndrome (
14,
41). All male patients had developmental disabilities and cognitive deficits, but the female patient was asymptomatic (as expected for an X-linked defect). Thus, all four segmental CNVs confirmed in this study were extremely rare structural variations rather than polymorphic events. We turned to an additional study (
42) to examine a set of cognitively normal control individuals for these events using an extremely high-density tiling array. There were no deletion events covering the three genes (
PKM2,
NECAB2 and
MIPOL1) or the
FLNA duplication event in these control individuals. Furthermore, there were no deletion events overlapping these three deleted genes and no duplication events overlapping the
FLNA loci among the CNV data from 1,500 controls (
36) that we used for .