To uncover shared pathogenic mechanisms among the highly heterogeneous autism spectrum disorders (ASDs), we developed a protein interaction network that identified hundreds of new interactions among proteins encoded by ASD-associated genes. We discovered unexpectedly high connectivity between SHANK and TSC1, previously implicated in syndromic autism, suggesting that common molecular pathways underlie autistic phenotypes in distinct syndromes. ASD patients were more likely to harbor CNVs that encompass network genes than control subjects. We also identified, in patients with idiopathic ASD, three de novo lesions (deletions in 16q23.3 and 15q22 and one duplication in Xq28) that involve three network genes (NECAB2, PKM2, and FLNA). The protein interaction network thus provides a framework for identifying causes of idiopathic autism and for understanding molecular pathways that underpin both syndromic and idiopathic ASDs.
Autism spectrum disorders (ASDs) are a heterogeneous group of neuro-developmental disorders with three core features: impaired social skills (e.g., gaze avoidance), delayed language development, and repetitive or stereotyped behaviors (1–4). “Classic” or idiopathic autism principally involves these three features; “syndromic” ASDs are those disorders in which the autistic phenotype is one aspect of a much broader clinical syndrome. In Tuberous Sclerosis Complex (TSC), for example, which is caused by a single gene mutation in either TSC1 or TSC2, the core autistic phenotype is accompanied by seizures, developmental delay, cortical tubers, facial angiofibromas, and other skin lesions (5, 6). Similarly, Phelan-McDermid syndrome (PMS) is caused by microdeletions of chromosome 22q13.3 encompassing the SHANK3 gene (encoding SH3 and Ankyrin domain containing protein), and is characterized by general hypotonia, seizures, intellectual disability and ASD (7). Autistic features are commonly observed in Fragile X syndrome, Angelman syndrome, phosphatase and tensin homolog (PTEN) hamartoma, Rett syndrome, Timothy syndrome, and in individuals with Neuroligin mutations (Table S1) (8). The majority (85–90%) of ASD cases do not show such clinically distinct phenotypes and the genetic causes remain largely unknown; in these cases the ASD is considered “non-syndromic”.
Although approximately 50 genes or genomic variants that either cause or predispose individuals to ASDs have been identified (1–4), each accounts for no more than 0.5–2% of total ASD cases (9–11), and many of these correspond to the syndromic ASDs. Together, these genes and genomic variants account for at most 30% of all ASD cases (1, 12, 13). For non-syndromic ASD, copy number variations (CNVs) of submicroscopic DNA segments may prove more relevant; growing numbers of ASD-susceptibility loci have been reported based on genome-wide association studies.
To make the challenge of understanding the pathogenesis of ASDs even more daunting, the known ASD-related proteins (including those mapping within CNVs) span diverse categories, from transcription factors (4, 14) and RNA-binding proteins (15) to cell adhesion molecules (16) and enzymes involved in protein modification (2, 8, 13) and degradation (12, 17). Given the clinical heterogeneity of ASDs, it would not be surprising if mutations in hundreds or even thousands of genes cause ASD phenotypes—but might these genes converge on a few pathways?
We hypothesized that, if this were actually the case, it would best be revealed by a protein-protein interaction analysis of the existing autism-associated genes. Several years ago, when confronting a slightly different challenge with another group of neurological disorders, the inherited ataxias, we created an “ataxia interactome”. Using this interactome we identified interactors of ataxia-associated proteins (18) and mapped their inter-relationships, uncovering unexpected functional relationships (19–21). Lacking a unifying neuropathology for the ASDs, we decided for the present study to rely upon key phenotypic features to create a protein-protein interaction network that would reveal functional relationships between the gene products. Uncovering functional relationships among diverse ASD-related proteins is a first step towards the ultimate goal of developing therapies that might benefit multiple functionally or mechanistically related ASDs.
We began by identifying protein partners of ASD-associated proteins (1–3) and determining whether any of them interacted with other ASD proteins. We selected genes from three groups: those whose mutation results in syndromic ASDs (“syndromic ASD proteins”, Table S1), those whose mutation causes severe language delay, and those whose products are paralogs, known binding partners, or functionally related to syndromic ASD proteins. We refer to the second and third groups as “ASD-associated” genes in order to distinguish them from the first group, syndromic ASD proteins (Table S1). Although language delay can occur without accompanying autistic features, and the reasons for language delay may be as heterogeneous as the ASDs themselves, we reasoned that language development might depend on the same pathways as those involved in social communication deficits observed in ASDs.
We performed a yeast two-hybrid (Y2H) screen of a human cDNA library using 192 bait fragments for 35 gene products, each of which encoded either full-length or partial segments of coding sequences. After a series of stringent tests (18, 22), we obtained 7,933 interacting prey clones, which corresponded to 783 unique proteins; 539 passed a second round of testing in yeast to demonstrate that the interactions held up in an independent reconstitution system (22). We considered only these 539 proteins as candidate binary interacting partners of the bait proteins. These 539 proteins comprised 848 interactions with 26 syndromic ASD proteins or ASD-associated proteins (Tables S1 and S2). Among them, only 32 interactions (4%) were previously reported (Tables S2 and S3). Baits for nine proteins failed to identify definite binding partners in this stringent Y2H screen (Table S1).
To validate the interaction data in a mammalian system, we performed glutathione-sepharose affinity co-purifications (GST-APs) in HEK293T cells for 52 randomly selected interactions (6% of the total) (Fig. S1). The mammalian cells recapitulated 44 out of 52 interactions (85%) (Fig. S1 and Table S2). In general, bona fide binary interactions do not validate at more than 50% (23–25) when using different assay systems; our unusually high validation rate thus supports the reliability of our stringent screening methods compared to prior interactomes (18, 22, 26). We did not remove candidate pairs from the screening data (Table S2) when they failed in the validation assays, because the Y2H screen is excellent at detecting transient and biologically relevant interactions that are often difficult to recapitulate in co-immunoprecipitation and affinity purification systems (18, 22–25).
Using the interaction data from the Y2H screen (Table S2), we generated an ASD protein interaction network (Fig. 1A). Twenty-four out of the 26 syndromic ASD or associated proteins were interconnected in one major component (CDKL5 and NF1 were located outside) (Fig. 1A). The Fragile X-related proteins 1 and 2 (FXR1 and FXR2) showed the greatest connectivity, and were also connected to the Fragile X mental retardation protein (FMRP) (Fig. 1B). The ASD network thus recapitulates and expands previously described in vivo associations and functional relationships among the Fragile X-related proteins (27, 28). Similarly, we expanded previously established relationships among the tuberous sclerosis complex proteins 1 and 2 (TSC1 and TSC2) (29) by identifying four new partners (Fig. 1A and Table S2). We also confirmed that the post-synaptic proteins SHANK3 and PSD95 interact in vivo (30, 31) and identified nine shared partners between them (ACTN2, CLU, DLGAP1, DLGAP3, DLGAP4, HNRNPC, LZTS2, PICK1, and SYNGAP1; Fig. 1C).
Notably, our ASD interactome revealed previously unsuspected connectivity between two syndromic ASD proteins, SHANK3 and TSC1, which share at least 21 partners (Fig. 2A). SHANK1, the paralog of SHANK3, arose as a potential partner of both TSC1 and SHANK3, which suggested that these syndromic ASD proteins interact in a complex at the postsynaptic compartments—the microenvironment of dendritic spines beneath the specialized membrane structure of the synapse (3, 32) —a suggestion confirmed in vivo by co-immunoprecipitation using mouse brain extracts (Fig. 2C and 2D). In vivo studies further confirmed that ACTN1, a postsynaptic scaffold protein identified as a partner of SHANK3 and TSC1 (Table S2 and Fig. 2A), interacts with TSC1 as well as with the postsynaptic scaffolding proteins SHANK3 and HOMER3 (Fig. 2E). Further, the Y2H screen recapitulated eleven previously reported in vivo interactions (Tables S2 and S3), which we obtained from the Human Protein Reference Database (HPRD) (33) and the Biological General Repository for Interaction Datasets (BioGRID, http://thebiogrid.org/). All except TSC1-AXIN1 were reported in mouse brain tissue. We also identified an additional 21 interactions, previously demonstrated by in vitro co-purifications or Y2H screens (Table S3).
To verify that the syndromic ASD proteins are co-expressed with their binding partners in vivo, we analyzed microarray data from our previous studies of brain tissue from wild-type mice (34, 35). We observed strong correlations among expression profiles for genes in the network, but not for similarly sized sets of randomly selected probes on the microarrays (p < 0.001, Fig. 3A). Further, the majority of the genes that encode the proteins annotated in the network co-clustered into a dominant co-expression group in the hypothalamus (78%), cerebellum (78%), and amygdala (55%) (Fig. 3B). In conjunction with the physical interaction data (Fig. 2B–D), SHANK3, TSC1, ACTN1 and HOMER3 showed highly correlated expression in these brain regions (Table S4). Note that averaged correlation matrices of the genes in the three brain regions did not show such strong correlation, suggesting that subsets of the ASD-associated proteins and their binding partners may enjoy unique relationships in different brain regions, rather than being ubiquitously co-expressed (Fig. 3B). Importantly, 96% of the proteins identified in our primary screen were found to be expressed in brain in the mouse studies, a substantially greater proportion than the expected 59% for randomly sampled genes (p < 1 × 10^−10).
To understand the unique topology (i.e. the pattern of interconnections) of the protein interaction network and systematically assess the connectivity of syndromic ASD proteins, we incorporated literature-curated interaction data for both bait and prey proteins from the HPRD and the BioGRID. This allowed us to produce an extended network consisting of one component with 3,507 proteins connected through 6,881 interactions (Fig. 4A). Of the 35 bait proteins that were used in the Y2H screen, 34 were directly or indirectly connected inside this network; only one protein (SLC6A8) was not connected. Next we calculated the mean path length in the extended network for eight syndromic ASD proteins (Tables S1 and S2) that were in the experimental network (Fig. 1A), and we compared it with the distribution of mean path lengths for 8 randomly sampled proteins selected from the remaining 18 of 26 bait proteins that had at least one binding protein in the primary screen (Table S2). We performed 10,000 random draws of 8 proteins. The eight syndromic ASD proteins showed a significantly shorter mean path length (2.14) than the random samples from the remaining baits (mean of 2.78, p = 0.004, Fig. S2). The close connectivity of the eight syndromic ASD proteins led us to investigate whether different ASD proteins might share common molecular pathways that relate to the pathogenesis of ASD. Indeed, Gene Ontology (GO) analysis of the network (excluding all of the baits) revealed marked enrichment for proteins associated with synapse, postsynaptic density and cytoskeleton under the “Cellular Component” branch of the GO (Fig. S3A), and for small GTPase-mediated signaling and metabotropic glutamate receptor signaling under the “Biological Process” GO branch (Fig. S3B); a biological process describes a series of molecular events or functions. 
Such coherence between the cellular compartments and biological processes of the ASD baits and their interactome partners underscores the biological relevance of the network and points to key molecular pathways responsible for autistic phenotypes in distinct genetic syndromes.
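The path-length comparison described above (a fixed group of eight proteins against 10,000 random draws of eight from the remaining baits) can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study. It assumes that "mean path length" means the mean shortest-path length over all pairs within the group, that the p-value is the fraction of random draws with a mean at or below the observed value, and that the network is a single connected component. The function names and the toy graph are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_path_length(adj, proteins):
    """Mean shortest-path length over all pairs within `proteins`.

    Assumes every pair is connected (the graph is one component).
    """
    proteins = list(proteins)
    dist = {p: shortest_path_lengths(adj, p) for p in proteins}
    lengths = [dist[a][b] for a, b in combinations(proteins, 2)]
    return sum(lengths) / len(lengths)


def permutation_p_value(adj, group, pool, n_draws=10_000, seed=0):
    """Compare `group` with random same-sized draws from `pool` (a list)."""
    rng = random.Random(seed)
    observed = mean_path_length(adj, group)
    hits = sum(
        mean_path_length(adj, rng.sample(pool, len(group))) <= observed
        for _ in range(n_draws)
    )
    return observed, hits / n_draws


# Toy undirected graph with hypothetical node names (not data from the study).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

observed, p = permutation_p_value(adj, ["A", "B", "C"], ["D", "E", "F", "G"],
                                  n_draws=1000)
```

In the toy graph the group A, B, C forms a triangle, so its mean path length is 1 hop, shorter than that of any three nodes drawn from the chain D to G.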
It remained to be determined whether a protein interaction network built on syndromic ASD proteins would prove relevant to the pathogenesis of non-syndromic or idiopathic ASDs. To address this question, we collected information from published studies on copy number variations (CNVs) that were observed in normal populations or in non-syndromic ASD patients (36, 37). We then searched for genes that were annotated both in our network and in the intervals of CNVs found in normal individuals or ASD patients. Individuals from the ASD group showed an increased rate of CNVs spanning genes in the Y2H interactome compared to the control group by a factor of 2.4 (incidence 0.43 vs. 0.18, p < 1.13 × 10^−23; two-sided Fisher’s Exact Test, “Core”, Fig. 4B) with an odds ratio of 3.3. Conversely, there was a lower rate of individuals in the ASD group whose CNVs failed to encompass genes in the network, i.e., fewer ASD CNVs mapped to loci encoding non-network proteins (0.25 vs. 0.39 for the control group, p = 6.16 × 10^−8; “Non-network protein” in Fig. 4B). We also observed a higher rate of overlap with genes encoding proteins in the extended network for the ASD group than for the control group (0.70 vs. 0.565 in frequency, “Extended”, Fig. 4B).
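As a rough arithmetic check on the "Core" comparison, the rate ratio and odds ratio can be recomputed from the two rounded incidences quoted above. This sketch uses only those rounded values (0.43 and 0.18), not the underlying per-individual counts, so it reproduces the reported factor of 2.4 but gives an odds ratio near 3.4 rather than exactly 3.3; the small difference is presumably a rounding effect, which is an assumption here. The function names are illustrative.

```python
def rate_ratio(p_case, p_control):
    """Ratio of the two incidences."""
    return p_case / p_control


def odds_ratio(p_case, p_control):
    """Ratio of the odds p / (1 - p) in cases versus controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


# Rounded incidences quoted in the text for the "Core" network.
asd_rate, control_rate = 0.43, 0.18

core_rate_ratio = rate_ratio(asd_rate, control_rate)   # about 2.39
core_odds_ratio = odds_ratio(asd_rate, control_rate)   # about 3.44
```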
To consider both the connectivity of the network as well as the multi-CNV load in each individual, we computed an additional measure that we defined as the Network Connectivity Score. This score represents the sum of the number of connections in the network of all genes present in CNVs of each individual; the score therefore takes into consideration the contribution of genes by their network relevance. This score was also significantly higher in ASDs vs. control (p < 2.2 × 10^−16, Wilcoxon’s rank sum test, Fig. 4C). We mapped the chromosomal locations of the network genes that overlapped with the CNV regions in ASD patients (Fig. 4B and S4). The genes overlapped by CNVs were widely distributed throughout the genome, indicating that our findings were not dominated by hot spots for structural variation in ASDs (Fig. 4B). Interestingly, however, three network genes (MVP, KCTD13 and ALDOA) map to the recurrent CNV hot spot on human chromosome 16p11.2, which has been reported in ASD and schizophrenia patients in genome-wide hybridization studies (11, 38) (Table S2 and Fig. S4).
To explore the role of genes in the interaction network in idiopathic ASDs, we performed microarray-based comparative genome hybridization (CGH) for 627 genes in our network using genomic DNA from 288 relatively high-functioning individuals (average IQ 80.94) with a diagnosis of idiopathic ASD (i.e., non-syndromic autism) from the Simons Foundation Simplex Collection (39). These probands do not show any signs of syndromic disorders (systemic malformation, abnormal facies, or severe intellectual disability) on physical examination or brain scanning. We focused on events with large segmental duplications or deletions spanning over 10 kb. This analysis revealed a segmental duplication in chromosome Xq28 involving FLNA and three segmental deletions in chromosomes 15q13.3, 16q23.3-q24.1, and 14q13.3, which involved the PKM2, NECAB2 and MIPOL1 genes, respectively (Fig. 5A–D). CGH analyses of the DNA from the parents of the probands confirmed that the duplication of Xq28 (FLNA), deletions of 15q13.3 (PKM2) and 16q23.3 (NECAB2) were all de novo, whereas the deletion of 14q13.3 (MIPOL1) was maternally inherited (Fig. S5A–D and Table S5). Duplications and point mutations of FLNA cause various degrees of intellectual disability, periventricular heterotopia (a disorder caused by abnormal migration of neurons), and dysmorphic features, but to our knowledge have not been associated with an autistic phenotype in the absence of intellectual deficits (40). By identifying a de novo duplication of FLNA in a patient with an IQ of 109 and autism, we broaden the clinical spectrum of phenotypes associated with such duplications. Furthermore, our discovery that FLNA binds to SHANK3 (mutations in which cause syndromic ASD) using both Y2H assays (Table S2) and co-immunoprecipitation (Fig. S6) in mouse brain extracts validated the physical interaction of these proteins and shows that both syndromic and nonsyndromic ASDs are functionally linked.
None of the autosomal deletion events were observed in the clinical database of the Molecular Genetics Laboratory at Baylor College of Medicine (BCM), which houses samples from over 15,000 patients with dysmorphology (abnormal form or anatomy) or intellectual and developmental disability screened with a high-density oligonucleotide clinical chromosomal microarray. The BCM clinical array has much greater genome-wide coverage and was able to delineate precise boundaries of the initial CNV findings in our experimental cohort. In the clinical database, only 3 males and 1 carrier mother had the duplication event of Xq28 that involves FLNA but not MECP2, the gene involved in the neurological disorder, Rett syndrome (14, 41). All male patients had developmental disabilities and cognitive deficits, but the female patient was asymptomatic (as expected for an X-linked defect). Thus, all four segmental CNVs confirmed in this study were extremely rare structural variations rather than polymorphic events. We turned to an additional study (42) to examine a set of cognitively normal control individuals for these events using an extremely high-density tiling array. There were no deletion events covering the three genes (PKM2, NECAB2 and MIPOL1) or the FLNA duplication event in these control individuals. Furthermore, there were no deletion events overlapping these 3 deleted genes nor duplication events overlapping with FLNA loci among the CNV data from 1,500 controls (36) that we used for Fig. 4B.
We have developed a protein interaction network for ASD-causing and ASD-associated proteins (altogether ASD proteins). We began by identifying protein partners of ASD proteins (1, 2, 8, 13) and determining whether any interact with other ASD proteins. Although the majority of ASD-related proteins identified cause syndromic ASDs, we reasoned that converging pathways might provide insight into non-syndromic or idiopathic ASDs. Indeed, our interactome revealed an unexpected convergence around several proteins. Although the interactome validated several interactions that had been previously identified in the literature, it did not identify others. There are several reasons for this: the screen was not done to saturation; the interactome identifies direct interactions only, unlike co-immunoprecipitation which identifies direct and indirect interactions; protein interactions vary according to tissue; and our system is relatively low-expressing. Thus we are more likely to have false negatives (i.e., missed valid interactions) than false positives. The large number of validated interactions, however, demonstrates that this interactome is a robust framework for future studies exploring relationships among ASD proteins. Further, the significant overlap between CNVs from a previously published ASD cohort and our experimental network supports its utility as a platform for ASD gene discovery. Finally, the network can also shed light on other questions in ASD. For example, we found only one overlapping gene (CBS) on chromosome 21, despite evidence that Down’s syndrome (caused by trisomy of chromosome 21) shows some comorbidity with ASDs. This finding suggests that the ASD phenotype in Down’s syndrome may differ from the phenotypes of other non-syndromic ASDs.
An important aspect of the interactome is the set of connections it reveals between syndromic ASD-causing proteins and ASD-associated proteins in one network. Higher connectivity between two ASD-causing proteins suggests that they might interact in a protein complex or that they might function in a common molecular pathway. In the present study, we uncovered in vivo interactions between the TSC1 and SHANK proteins. Further exploration of such interactions should shed light on the pathogenic mechanisms underlying the autistic features observed in TSC (8, 43) and PMS (7, 44–47). The TSC1 and TSC2 proteins regulate mTOR, a promoter of protein synthesis in response to growth factors and stress (48); upregulation of protein synthesis due to functional loss of either of the TSC proteins is a likely mechanism for the ASD phenotypes of both disorders (8). The molecular mechanism that specifies the location of abnormal protein synthesis, however, remains to be determined. Our interactome suggests that these two distinct pathways are brought together in one protein complex built upon the postsynaptic protein scaffold, SHANK3.
Links between SHANK3 and other ASD proteins lend support to the idea that common pathways lead to the broader ASD phenotypes (3, 4, 8). In vivo studies of mice carrying mutant alleles of Shank3 and displaying autism-like phenotypes highlight the importance of the SHANK3 protein for maintaining the normal levels of many synaptic proteins that are critical for glutamatergic signaling (44, 45, 47). Recently reported pathogenic mutations in the SHANK2 gene in sporadic ASD cases (9, 49) further support our notion that various ASD-associated proteins, including FLNA, functionally interact with the SHANK proteins. These findings have therapeutic implications. Benefits reported from preclinical trials using rapamycin in Pten or Tsc2 mutant animals (50, 51) raise the possibility that such therapies might be beneficial in broader groups of patients with syndromic and non-syndromic ASDs.
In summary, we developed an ASD interactome that facilitates classification of ASDs according to functional pathways and interacting proteins. The short-term utility of this interactome will be to increase our molecular diagnostic capabilities with a raft of new genes. The mid-term and long-term benefits will come from advancing pathogenesis studies to promote development of rationally designed therapeutics that could be used to treat more than one ASD.
All cDNAs encoding full-length or partial domains of bait proteins (Table S1) were cloned into DB-dest vectors using the Gateway system (Invitrogen) as described (18, 22). The constructs encoding each DB-autism fusion protein were transfected into yeast cells, MaV203 (Mat-alpha), and screened for positive interactions from 1–2 × 10^6 independent clones in human brain cDNA libraries (Invitrogen).
Whole mouse brains were freshly homogenized in buffer containing 320 mM sucrose, 5 mM HEPES (pH 7.4) and 1 mM EDTA as described (52). One mg of total protein from cytosolic (S2) fractions was incubated at 4°C for 2 hr with either anti-pan-SHANK (5 μl of antisera), anti-TSC1 (1 μl), anti-ACTN1 (5 μl) antibodies, or normal rabbit IgG (5 μg) in the TS buffer (150 mM NaCl, 10 mM Tris-HCl, pH 7.4) supplemented with 1% hemoglobin. Immune complexes were precipitated with prewashed Protein A agarose beads (Millipore) by incubating at 4°C for another 1 hr, and were washed in the TS buffer 5 times and extracted with SDS-loading buffer.
We determined the exon coordinates of the genes that were annotated in the protein interaction network, then selected a total of ~42K oligonucleotide probes from Agilent’s open resource library to design a custom microarray for CGH studies. Using the custom 4×44K chips, we performed DNA digestion, labeling and hybridization as previously described (53, 54). Since the CNV regions for NECAB2, PKM2, and FLNA exceeded the targeted gene loci, we used the BCM MGL clinical array (CMA BAC V8.1, Baylor MGL) to determine boundaries of the events (53). All the intervals of CNV events and other information referred to in this study were defined using the assembly of NCBI36/hg18 at UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway).
We used Homologene (www.ncbi.nlm.nih.gov/homologene) release 6.1 to map human genes to their mouse orthologs. Of the 593 genes in the network we were able to map 478 (81%) to unique mouse genes. We then obtained gene expression data from wild-type mouse hypothalamus, cerebellum and amygdala samples using data from prior studies (34, 35). We computed Spearman’s rank correlation coefficient for all network gene pairs and determined the median. To compare our results with a random set of genes, we sampled 1,000 iterations of an equal number of random genes from the array and computed the same median Spearman correlation measure. The distributions for the median correlation values for random genes are plotted for each brain region. The median values for our mapped network proteins are shown with three vertical lines. In addition, we computed the correlation heat-map between all pairs of genes, and we sorted the genes according to cluster analysis based on pair-wise correlations.
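The resampling procedure described above can be sketched in a few functions: compute the median pairwise Spearman correlation for a gene set, then build a null distribution from equally sized random gene sets. This is a minimal sketch under stated assumptions, not the study's code. It assumes expression is stored as one list of values per gene across the same samples, that no gene has constant expression, and that the empirical p-value is the fraction of random medians at or above the observed one. All names and the toy data are hypothetical.

```python
import random
from itertools import combinations
from statistics import median


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def median_pairwise_spearman(expr, genes):
    """Median Spearman correlation over all pairs of `genes`."""
    return median(spearman(expr[a], expr[b]) for a, b in combinations(genes, 2))


def null_medians(expr, n_genes, n_iter=1000, seed=0):
    """Medians for `n_iter` random gene sets of size `n_genes`."""
    rng = random.Random(seed)
    all_genes = sorted(expr)
    return [median_pairwise_spearman(expr, rng.sample(all_genes, n_genes))
            for _ in range(n_iter)]


def empirical_p(observed, null):
    """Fraction of null medians at least as large as the observed median."""
    return sum(m >= observed for m in null) / len(null)


# Toy expression values for hypothetical genes across four samples.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9],
        "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
observed = median_pairwise_spearman(expr, ["g1", "g2"])
null = null_medians(expr, n_genes=2, n_iter=200)
p_value = empirical_p(observed, null)
```

With real data, `expr` would hold one brain region at a time, so the comparison is repeated separately for hypothalamus, cerebellum and amygdala.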
We obtained all CNV variant data from dbVar (www.ncbi.nlm.nih.gov/dbvar/) for ASD (37) and controls (36), which contained CNV data from 419 and 1,552 samples, respectively. We calculated the overlap with the protein interaction network for each CNV in terms of the network genes, the network node degree, and the network components (Bait, Core, or Extended). Data were processed to show overlapping events per individual. We computed two summaries: overlap indicators for each person with the components of the interaction network (Bait, Core, and Extended) and the Network Connectivity Score, defined as log2 of the sum of network degree for overlapped genes. Individuals with no network overlap were treated as missing values for the Connectivity Score.
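The Network Connectivity Score as defined above (log2 of the summed network degree of the genes overlapped by an individual's CNVs, missing when there is no overlap) can be illustrated with a short sketch. The gene names, degrees and individuals below are hypothetical, and counting each gene once per individual even if it lies in several CNVs is an assumption rather than something the text states.

```python
import math


def network_connectivity_score(cnv_genes, degree):
    """log2 of the summed network degree over an individual's CNV genes.

    Returns None (missing) when no CNV gene is in the network.
    Assumption: a gene is counted once per individual.
    """
    total = sum(degree[g] for g in set(cnv_genes) if g in degree)
    return math.log2(total) if total > 0 else None


# Hypothetical node degrees and per-individual CNV gene lists.
degree = {"GENE_A": 3, "GENE_B": 5, "GENE_C": 1}
individuals = {
    "person_1": ["GENE_A", "GENE_B"],  # summed degree 8
    "person_2": ["GENE_X"],            # no network gene, so score is missing
}
scores = {pid: network_connectivity_score(genes, degree)
          for pid, genes in individuals.items()}
```

Individuals with a missing score would simply be left out when the two groups are compared with a rank-based test.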
Further details for data analysis are provided in Supplementary Materials.
Fig S1. Validation of the interaction data from the Y2H study by co-affinity purification experiments in HEK293T cells
Fig S2. Syndromic ASD proteins are highly connected to each other and have the shortest path length in the protein interaction network
Fig S3. The GO analysis for the extended ASD network
Fig S4. Chromosomal location of the overlapping genes between the protein interaction network and CNV intervals of ASD group
Fig S5. Confirmation of de novo or inherited CNVs
Fig S6. Physical interaction of FLNA with SHANK3 in vivo
Table S1. Summary of yeast-two-hybrid screening for ASD and related disorders
Table S2. The summary of protein interaction data obtained from Y2H screening and Co-AP study
Table S3. Summary of known interaction data in the literature and the overlap with this study
Table S4. Co-expression pattern of the genes in the mouse hypothalamus, cerebellum and amygdala
Table S5. Summary of the Probands with Novel CNVs
We thank L. White, L. Liles and M. Hoang at BCM microarray core; L. Lewis at BCM-GSC; and BCM-MGL for the aCGH study; A. McCall, D. Walker, M. Strivens and M. Rao for technical assistance; M. Sheng, W. Dobyns, D. Picketts, R. Gibbons, S. Dindot, A. Beaudet, D. Nelson, S-K. Lee, B. Franco, and I. Bezprozvanny for antisera and expression constructs; M. Sardiello, C. Schaaf, M. Costa-Mattioli and J. Neul for critical comments; C. Schaaf for computing IQ averages of the SSC patients used in this study; V. Brandt for comments on the manuscript; and H.Y.Z. lab members for helpful discussions.
Funding: Supported by the HHMI (H.Y.Z.), the Simons Foundation (H.Y.Z.), and the Ellison Foundation (D.E.H.; awarded to M. Vidal).
Author contributions: Y.S., C.A.S. and H.Y.Z. designed the study, evaluated the data, and wrote the manuscript. Y.S. performed all the experiments with technical support for the Y2H screen by Z.A-M. C.A.S., B.C.D and D.V.D conducted the bioinformatic analysis. D.E.H. provided the ORFeome clone and edited the manuscript.