How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to “edgetic” alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.
Over a hundred thousand genetic variants have been identified across a large number of Mendelian disorders (Amberger et al., 2011), complex traits (Hindorff et al., 2009), and cancer types (Chin et al., 2011). However, many fundamental questions regarding genotype-phenotype relationships remain unresolved (Vidal et al., 2011). One critical challenge is to distinguish causal disease mutations from non-pathogenic polymorphisms. Even when causal mutations are identified, the functional consequence of such mutations is often elusive (Sahni et al., 2013).
Genotypic information alone rarely elucidates the mechanistic insights pertaining to disease pathogenesis. Although genotype-phenotype relationships can be modeled under the assumption that most disease-associated mutations lead to complete loss of protein function, e.g. through radical changes such as protein misfolding and instability (Subramanian and Kumar, 2006) (Figure 1A), the reality is often more complex, as in the case of mutations affecting the same gene but giving rise to clinically distinguishable diseases (Zhong et al., 2009). In addition, since genes and gene products do not function in isolation but interact with each other in the context of interactome networks (Vidal et al., 2011), it is likely that many diseases result from perturbations of such complex networks (Goh et al., 2007).
Missense mutations are among the most common sequence alterations in Mendelian disorders, accounting for more than half of all reported mutations in the Human Gene Mutation Database (HGMD) (Stenson et al., 2014). In principle, missense mutations may have no functional consequences, disrupt the three-dimensional structure of the corresponding protein, or exert specific effects on particular molecular or biochemical interactions (Figure 1A), such as protein-protein interactions (PPIs), protein-DNA interactions (PDIs), or enzyme-substrate interactions, while leaving all other functional properties unperturbed. We previously reported that a considerable portion of Mendelian disease mutations could indeed be predicted computationally to cause interaction-specific, or “edgetic”, perturbations (Zhong et al., 2009). However, only a small number of genes and associated mutations were experimentally tested in that study, and the extent to which disease mutations globally lead to interaction perturbations remains to be determined.
Here we describe a multi-pronged approach to systematically decipher molecular interaction perturbations associated with missense mutations. Since chaperones and associated quality control factors (QCFs) can salvage unstable proteins by assisting with folding, and an increase in protein-chaperone interactions (PCIs) has been observed for a number of disease mutants (Whitesell and Lindquist, 2005), our systematic approach begins with characterizing PCIs for large numbers of disease-associated alleles, followed by systematic measurements of PPI and PDI profile changes caused by mutations, a strategy referred to as “edgotyping” (Figure 1B).
We provide evidence for widespread interaction perturbations across a broad spectrum of human Mendelian disorders. Our results suggest that interaction profiling helps distinguish disease-causing mutations from common variants. Furthermore, the integration of different types of molecular interactions expands our ability to understand complex genotype-phenotype relationships.
To globally characterize disease-associated alleles, we selected mutations associated with a wide range of disorders, including cancer susceptibility and heart, respiratory and neurological diseases. We retrieved from HGMD (Stenson et al., 2014) a list of ~16,400 mutations affecting over 1,200 genes for which we have a wild-type (WT) open-reading frame (ORF) clone in our human “ORFeome” collection (Yang et al., 2011) and selected up to four mutations per gene (Figure 1C; Tables S1A–B; Extended Experimental Procedures). Using properties related to RNA abundance, GO annotation and protein domains (Extended Experimental Procedures), we verified there is no significant bias between our selected genes and the rest of the human genome or all genes represented in HGMD (Figures S1B–G).
Altogether, we cloned and sequence-verified 2,890 human mutant ORFs (hmORFs), each harboring a single nucleotide change that results in an amino acid change relative to the corresponding WT ORF of 1,140 genes. To our knowledge, this human mutation ORFeome version 1.1 resource (hmORFeome1.1; Figure S1A) is the most extensive human mutation collection reported to date.
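The defining property of each hmORF, a single nucleotide change that produces a single amino acid change, can be checked computationally. The following is a minimal sketch of such a check, not the sequence-verification pipeline used for hmORFeome1.1; the function names and the toy sequences are hypothetical, and the sketch assumes in-frame ORFs of equal length written in the DNA alphabet.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def missense_change(wt_orf, mut_orf):
    """Return a label such as 'P2S' if the two ORFs differ by exactly one
    nucleotide that causes one amino acid substitution; otherwise None."""
    if len(wt_orf) != len(mut_orf) or len(wt_orf) % 3 != 0:
        return None
    diffs = [i for i, (a, b) in enumerate(zip(wt_orf, mut_orf)) if a != b]
    if len(diffs) != 1:
        return None  # zero or multiple nucleotide changes
    codon_index = diffs[0] // 3
    wt_aa = translate(wt_orf)[codon_index]
    mut_aa = translate(mut_orf)[codon_index]
    if wt_aa == mut_aa or "*" in (wt_aa, mut_aa):
        return None  # synonymous or nonsense, not missense
    return f"{wt_aa}{codon_index + 1}{mut_aa}"


# Hypothetical toy ORF: ATG CCT GAA encodes M-P-E.
print(missense_change("ATGCCTGAA", "ATGTCTGAA"))  # P2S
```

Synonymous changes, nonsense changes, and clones carrying more than one nucleotide difference all return `None` in this sketch, so only clones matching the definition above would pass.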
Using enhanced binding to a chaperone as an indicator of protein instability or misfolding, we examined how disease mutations impact protein folding and disposition. We determined the extent to which hmORF-encoded proteins and their WT counterparts interact with QCFs using a quantitative high-throughput LUMIER assay (Taipale et al., 2012; Taipale et al., 2014) (Figure 1C and Table S2A). We selected the following QCFs based on their broad specificity (Taipale et al., 2014): i) the cytoplasmic chaperones HSP90 and HSC70, ii) their co-chaperones BAG2 and CHIP/STUB1, iii) the proteasomal regulatory subunit PSMD2 (formerly known as RPN1), and iv) the endoplasmic reticulum (ER) chaperones GRP78/BIP and GRP94 (Extended Experimental Procedures). We did not survey mitochondrial chaperones since only ~7% of disease-associated gene products are predicted to localize solely in mitochondria (Huntley et al., 2015).
An increase in the interaction between a QCF and a mutant protein relative to its WT counterpart, as measured by the LUMIER assay, indicates a mutation-induced perturbation in conformational stability, often associated with compromised or complete loss of function (Taipale et al., 2012). The interaction profiles of most mutant proteins correlated with those of their WT counterparts. However, compared to a background control set, we observed a significant enrichment of mutant alleles showing increased interaction with QCFs (Figures 2A–H and S2A) but little or no enrichment for decreased interaction (Figures 2A and S2B; Extended Experimental Procedures). The interaction profiles of mutant proteins with the different cytoplasmic QCFs were highly correlated, and distinct from those with ER factors (Figure 2I). These results highlight the coordination and specificity of cellular quality control pathways. Altogether ~28% of the tested alleles exhibited increased binding to at least one of the seven QCFs tested. Although this fraction is likely a conservative estimate due to limited assay sensitivity, the strong correlation between chaperone interaction profiles (Figure 2I) suggests that the estimate would not increase substantially by assaying more chaperones. We validated several mutant-specific interactions with endogenous chaperones by co-immunoprecipitation followed by western blot, corroborating the results obtained with the LUMIER assay (Figure 2J).
We next estimated protein abundance using semi-quantitative ELISA, which provides a proxy for steady-state protein stability. Although the expression levels of mutant alleles correlated with those of their WT counterparts (Figure S2C), mutant proteins exhibiting enhanced interactions with cytoplasmic, but not ER, chaperones were detected at lower steady-state levels than their WT counterparts (P < 1.0 × 10⁻⁴; Figure 3A). The exception for ER chaperones possibly reflects retention in the ER of mutant proteins whose WT counterparts would normally be secreted and therefore not be detected by an assay that captures intracellular proteins. Interestingly, proteins encoded by recessive alleles exhibited lower abundance levels and increased binding with QCFs compared with proteins encoded by dominant alleles (Figure S2D–E). This is consistent with the hypothesis that recessive mutations are more likely to result in loss-of-function phenotypes than dominant mutations (Lesage and Brice, 2009).
To gain insight into the structural properties of mutant proteins that exhibit increased binding to QCFs, we assessed the impact of different disease mutations on predicted protein structures. The disease alleles associated with increased binding to QCFs corresponded significantly more often to mutations of residues buried in the core of the protein (Figure 3B and Table S1C), and less often to mutations in intrinsically disordered regions (Figure 3C) when compared to mutant proteins with no change in binding. Next, we estimated the relative “deleteriousness” associated with distinct genetic mutations using the PolyPhen-2 algorithm (Adzhubei et al., 2010). Mutations predicted to be deleterious by PolyPhen-2 were significantly enriched among alleles that exhibited increased binding to QCFs (Figure S2F).
Previous studies suggested that increased chaperone binding reflects a change in protein stability (Falsone et al., 2004; Taipale et al., 2012). To provide further evidence for this, we assessed protein stability in cellular lysates by measuring solubility in a cellular thermal shift assay (CeTSA). We found that the majority (5 of 6) of mutant proteins with increased chaperone binding also exhibited decreased stability as measured by CeTSA (Figures S3A–S3D). In addition, computational predictions by the FoldX program (Schymkowitz et al., 2005) suggest that mutant proteins with increased binding to QCFs are likely to be significantly less stable than their WT counterpart (Figure 3D and Table S2B). Taken together, experimental and computational analyses suggest that mutant proteins with enhanced binding to QCFs have a destabilized protein structure.
Our quantitative survey of allele-specific interactions estimates that the majority of missense disease mutations do not dramatically impact protein structure or folding (Tables S1D and S2). Therefore, they may exert their deleterious effects through other mechanisms such as perturbation of molecular interactions.
In principle, the effects of missense disease mutations on molecular interactions (Zhong et al., 2009), or “edgotype” (Sahni et al., 2013), could range from no apparent detectable change in interactions (“quasi-WT”), to partial loss of interactions (“edgetic”), to an apparent complete loss of interactions (“quasi-null”) (Figure 4A). To systematically characterize PPI perturbations associated with disease mutations and identify potential gain of interactions, we used the yeast two-hybrid (Y2H) interaction assay followed by a stringent validation assay. After autoactivator removal, we screened 2,449 mutant proteins and their 1,072 corresponding WT proteins for interactions with proteins encoded by the ~7,200 ORFs in the human ORFeome v1.1 (Rual et al., 2004). Mutant and WT proteins were then tested pairwise against all partners found both in these Y2H screens and in our human interactome map HI-II-14 (Rolland et al., 2014) (Figure 1C). Altogether, we obtained interaction profiles for 460 mutant proteins and their 220 WT counterparts and found 521 perturbed interactions out of 1,316 PPIs (Table S3A).
To validate these results, we used the orthogonal in vivo Gaussia princeps luciferase protein complementation assay (GPCA) performed in human 293T cells (Cassonnet et al., 2011) (Table S3B). Unperturbed interactions were recovered at a rate statistically indistinguishable from that of a well-documented positive reference set (PRS), similar to the interactions of the WT alleles (Braun et al., 2009; Venkatesan et al., 2009). Perturbed interactions were recovered at a rate as low as a negative control “random reference set” (RRS) (Figures 4B and S4A), demonstrating the high quality of the identified perturbations induced by disease mutations.
To analyze global and topological characteristics of gene products with edgetic, quasi-null, or quasi-WT mutations, we used the human interactome map HI-II-14 (Rolland et al., 2014). According to the studied network properties (betweenness, k-core centrality, degree, closeness), the nodes (genes) examined in our edgotyping study appear unbiased, in that their topological properties are statistically indistinguishable from other genes in the network (Figures S4B–F). Interestingly, we found that the genes carrying edgetic mutations tend to be more central than either non-edgetic genes or the rest of the network (Table S4).
Out of a total of 197 mutations, corresponding to 89 WT proteins with two or more interaction partners, our interaction profiling identified 26% as quasi-null alleles, 31% edgetic and 43% quasi-WT (Figure 4C and Table S3C). We also analyzed disease mutations annotated by ClinVar (Landrum et al., 2014), and found the distribution of quasi-null, edgetic and quasi-WT alleles was statistically indistinguishable from that of HGMD (Figure S4G). We only identified two mutations that conferred PPI gains, suggesting that gain of interactions may be a rare event in human disease.
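The edgotype classes and the two-partner requirement described above can be expressed as a small classification rule. This is a minimal sketch under stated assumptions rather than the scoring procedure used in the study: the function name, the `min_partners` parameter, and the partner labels are hypothetical, and treating any gained interaction as edgetic is an assumption of this sketch.

```python
from collections import Counter


def classify_edgotype(wt_partners, mutant_partners, min_partners=2):
    """Classify a mutant allele from its interaction profile.

    Returns 'quasi-WT', 'edgetic', 'quasi-null', or None when the WT
    protein has too few partners for a profile to be called.
    """
    wt = set(wt_partners)
    mutant = set(mutant_partners)
    if len(wt) < min_partners:
        return None
    retained = wt & mutant
    gained = mutant - wt
    if not retained and not gained:
        return "quasi-null"  # apparent complete loss of interactions
    if retained == wt and not gained:
        return "quasi-WT"    # no detectable change
    return "edgetic"         # partial loss (or, by assumption, any gain)


# Hypothetical profiles for one WT protein with partners A, B and C.
wt_profile = {"A", "B", "C"}
mutant_profiles = {
    "allele_1": {"A", "B", "C"},
    "allele_2": {"A"},
    "allele_3": set(),
}
calls = {name: classify_edgotype(wt_profile, partners)
         for name, partners in mutant_profiles.items()}
print(calls)
print(Counter(calls.values()))
```

Tallying the calls with `Counter` gives the kind of class distribution reported above, here for purely illustrative data.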
Differences between edgotype classes could be due to protein folding and/or relative expression levels. Quasi-null proteins associated significantly more with cytoplasmic but not ER chaperones, whereas edgetic and quasi-WT proteins did not significantly change their chaperone association (Figures 4D–4E, and S5A–S5E). Quasi-null proteins appeared to be poorly expressed while edgetic and quasi-WT proteins were expressed at levels similar to those of their WT controls (Figure 4F). We validated several mutant-chaperone interactions and expression profiles by co-immunoprecipitation with endogenous chaperones, followed by western blot (Figure S5F). All tested quasi-null proteins exhibited more binding to HSP90 and HSC70, although they were expressed at lower levels than their WT controls. However, the edgetic TAT-P220S protein and the quasi-WT NCF2-R395W protein did not show any detectable chaperone association. Among mutant proteins with no change in chaperone binding, edgetic (28%) and quasi-WT (57%) proteins comprised the majority, while quasi-null proteins comprised a significantly lower percentage (15%) (Figure S5G). Altogether, these results suggest that quasi-null proteins are more often unstable/misfolded and diminished in their steady-state expression levels. In contrast, edgetic and quasi-WT proteins likely exhibit normal folding and expression levels, further supporting the idea that they may cause disease through interaction perturbations or other mechanisms rather than simple loss of protein function.
Genome-wide association studies have identified hundreds of loci linked to particular disorders. However, these loci often contain several genes and multiple variants, making it challenging to distinguish causal mutations from non-pathogenic variants. We observed previously that among binary interactions found for WT proteins, disease-causing alleles were more likely to perturb interactions than non-disease variants (Rolland et al., 2014). We further investigated both disease-causing alleles from HGMD and common variants identified in healthy individuals from diverse geographical sites (1000 Genomes Project Consortium, 2012) (Table S1A) with respect to the edgetic character and chaperone binding of their protein products. Interaction profiling showed that only a small fraction of non-disease alleles lost interactions (8%, Figure 4G), a seven-fold reduction relative to disease mutations (57%; P = 1.7 × 10⁻⁹; Figure 4C). In addition, non-disease alleles on average did not alter chaperone association (Table S2A), a characteristic distinct from disease mutations annotated by HGMD (Figure 4H) or ClinVar (Figure S5H). Together, these results show that interaction perturbations can help distinguish disease-associated alleles from non-disease alleles.
To assess the predictive power of edgotyping to identify disease-causing mutations, we determined its precision and sensitivity in classifying an allele as causal based on interaction perturbation profiles. As a “gold standard” for causal alleles, we used a set of mutations annotated in HGMD as disease-causing (“DM” in Table S1A). As a negative control, we used a set of alleles most likely not associated with disease. We observed that 96% (105 of 109) of the alleles found to perturb interactions (E or QN) were disease-causing (Figure S6A). Conversely, 61% (105 of 172) of disease-causing mutations annotated by HGMD were interaction-perturbing (Figure S6B). Together, our prediction achieved a precision (96%) and sensitivity (61%) significantly higher than random expectation. It is possible that current incompleteness of interaction network maps might limit the power of edgotyping to properly classify disease-causing mutations. To evaluate this possibility, we performed a down-sampling analysis and found negligible effect on mutation classification over a broad range of network sizes (Figure S6C).
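The precision and sensitivity quoted above follow directly from the reported counts. The short sketch below reproduces the arithmetic; the function and argument names are illustrative only.

```python
def precision_and_sensitivity(true_positives, predicted_positives, actual_positives):
    """Precision = TP / all alleles called perturbing;
    sensitivity = TP / all alleles annotated as disease-causing."""
    precision = true_positives / predicted_positives
    sensitivity = true_positives / actual_positives
    return precision, sensitivity


# Counts reported in the text: 105 disease-causing alleles among the 109
# interaction-perturbing alleles, and 172 disease-causing alleles in total.
precision, sensitivity = precision_and_sensitivity(105, 109, 172)
print(f"precision = {precision:.0%}, sensitivity = {sensitivity:.0%}")
# precision = 96%, sensitivity = 61%
```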
To explore edgotypes from a structural point-of-view, we assessed the possible impact of distinct classes of mutations on protein function using PolyPhen-2 analysis (Adzhubei et al., 2010). Interaction-perturbing mutations are significantly more often predicted to be deleterious than non-interaction-perturbing mutations (Figure 5A). We next investigated whether mutations from the different classes might differ in evolutionary conservation, based on the presumption that conservation of amino acid residues is a property that generally reflects functionality (1000 Genomes Project Consortium, 2012; Subramanian and Kumar, 2006; Sunyaev, 2012). The residues affected by interaction-perturbing mutations are significantly more conserved across species compared to non-interaction-perturbing mutations (Figure S6D). However, PolyPhen and conservation analysis could not distinguish between edgetic and quasi-null mutations within the interaction-perturbing group.
Given that structural domains often mediate protein interactions, different classes of mutation might vary in their locations relative to protein domains. Interaction-perturbing mutations are indeed significantly enriched within structural domains compared to noninteraction-perturbing alleles (Figure 5B and Table S1C). In addition to structural domains, intrinsically disordered regions and linear motifs could also play a role in mediating PPIs. However, we found interaction-perturbing disease alleles to be depleted in intrinsically disordered regions (Figure S6E), and occurring in linear motifs as frequently as non-perturbing alleles (Figure S6F). These results suggest that mutations perturbing PPIs are preferentially located within structural domains. Nevertheless, none of the above properties could reliably predict whether a mutation would give rise to an edgetic or quasi-null PPI effect.
We next investigated whether edgetic and quasi-null mutations differ in their physical location within three-dimensional protein structures (Zhong et al., 2009). Edgetic mutations are significantly more enriched in structurally exposed residues compared to quasi-null mutations (Figure 5C). Consistently, edgetic mutations do not tend to cause a change in hydrophobicity, a destabilizing feature that generally disrupts protein function (Balasubramanian et al., 2005), while quasi-null mutations often lead to a decrease in hydrophobicity (Figure S6G).
We also investigated whether or not edgetic mutations are more frequently located at an interface that supports interaction with a partner protein. Starting from all available structures of co-crystal complexes in the Protein Data Bank (PDB) involving a disease gene product, we determined the relative location of each mutated residue within these structures (Extended Experimental Procedures and Table S5A). In contrast to quasi-null mutations, edgetic mutations are significantly enriched at interaction interfaces identified from the corresponding co-crystal structures (Figure 5D). Notably, edgetic mutations also exhibit a significant tendency to reside at interaction interfaces with the perturbed partners, as compared to unperturbed partners or random controls (Figure 5E). These results suggest that edgetic mutations are preferentially located at PPI interfaces, perturbing the corresponding interaction.
We hypothesized that protein interaction partners perturbed by edgetic mutations are likely to function together within the tissue known to be affected by the relevant disease. To test this, we compared gene expression patterns for perturbed and unperturbed partners in disease-relevant tissues using RNA-seq data from the Illumina Human Body Map 2.0 project. Perturbed interactors exhibit a striking tendency to be expressed in disease-relevant tissues compared with unperturbed interactors or random genes (Figures 5F and S6H; Table S5B). These results indicate that disease mutations most often perturb interactions that are functionally relevant in the particular tissue(s) affected by a specific disease.
Our edgotyping model suggests that different mutations in the same gene may result in different, pleiotropic phenotypic outcomes through perturbation of distinct interactions (Figure 6A). To test this, we compared mutation edgotype classes and the resulting disease phenotypes. Among pleiotropic genes associated with two or more diseases, mutant alleles associated with different disease manifestations were more likely to exhibit different edgotype classes of perturbed PPI profiles (Table S5C).
This is exemplified by mutations in TPM3, which encodes slow muscle alpha-tropomyosin. Three TPM3 edgetic mutations, L100M, R168G and R245G, are associated with fiber-type disproportion myopathy through an unknown mechanism (Adzhubei et al., 2010; Clarke et al., 2008) (Figure 6B). These edgetic mutations perturb interactions with 5 of the 10 partners of the WT gene product. The majority of perturbed partners are expressed in muscle, the tissue most relevant to this disease (Figure 6C). One of the disrupted interactions is that between TPM3 and troponin, which was shown to be vital for the transduction of calcium-induced signals required for muscle contraction (Gunning et al., 1990). Two other perturbed interactors, HSF2, involved in myotube regeneration (McArdle et al., 2006), and CCHCR1, required for cytoskeleton organization (Tervaniemi et al., 2012), could also be of disease relevance. In contrast to these edgetic mutations, the quasi-WT mutation M9R causes a different disease, nemaline myopathy. M9R might affect actin binding, thus leading to the formation of abnormal nemaline rods (Laing et al., 1995).
The possible disease relevance of our approach was further illustrated by edgetic mutations in EFHC1, a gene in which mutations can cause epilepsy. One perturbed partner, ZBED1, plays a role in a major cell proliferation pathway affected by EFHC1 knockouts (Yamashita et al., 2007), while another perturbed interactor, TCF4, is required for neuronal differentiation (Flora et al., 2007) (Figure 6D).
We next reasoned that mutations perturbing a greater number of interactions would be likely to have a larger impact on protein function, and hence result in more severe phenotypic effects. We used the age of disease onset as a proxy for severity and determined whether an increase in the fraction of interactions lost correlated with an increase in severity for each pair of mutations causing the same disease (as annotated by HGMD) (Figure 6E and Table S5D). We found that mutations perturbing more PPIs were associated with an earlier age of disease onset significantly more often than random expectation (Figure 6E). Although computational predictions based on PolyPhen-2 were able to distinguish between interaction-perturbing versus non-perturbing alleles (Figure 5A), they did not perform as well as our approach in predicting disease severity (Figure S6I). This limitation is consistent with the inability of PolyPhen-2 to distinguish between edgetic and quasi-null mutations (Figure 5A).
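The pairwise comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind Figure 6E: the data are hypothetical, ties are dropped as uninformative, and the one-sided binomial test against a 50% random expectation is an assumed choice of statistic.

```python
from itertools import combinations
from math import comb


def concordant_pairs(mutations):
    """Count same-disease mutation pairs in which the allele losing the
    larger fraction of PPIs also has the earlier age of onset.

    mutations: iterable of (disease, fraction_of_ppis_lost, onset_age).
    Returns (concordant, informative).
    """
    by_disease = {}
    for disease, fraction_lost, onset_age in mutations:
        by_disease.setdefault(disease, []).append((fraction_lost, onset_age))
    concordant = informative = 0
    for alleles in by_disease.values():
        for (f1, a1), (f2, a2) in combinations(alleles, 2):
            if f1 == f2 or a1 == a2:
                continue  # ties carry no ordering information
            informative += 1
            if (f1 > f2) == (a1 < a2):
                concordant += 1
    return concordant, informative


def binomial_tail(k, n, p=0.5):
    """One-sided probability of observing k or more successes out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical alleles: (disease, fraction of PPIs lost, age of onset in years).
example = [
    ("disease_A", 0.8, 5), ("disease_A", 0.2, 30), ("disease_A", 0.5, 12),
    ("disease_B", 1.0, 2), ("disease_B", 0.0, 40),
]
k, n = concordant_pairs(example)
print(k, n, binomial_tail(k, n))
```

Restricting comparisons to pairs of mutations that cause the same disease, as in the text, avoids confounding interaction loss with differences in typical onset between diseases.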
We hypothesized that mutations for which no PPI perturbation has yet been detected likely cause changes in other types of molecular interactions. As a proof-of-concept, we examined the effect of disease mutations on protein-DNA interactions (PDIs) between human transcription factors (TFs) (Reece-Hoyes et al., 2011a) and developmental enhancers (Fuxman Bass et al., in press). Our hmORFeome1.1 mutant library contains 70 TF ORFs that together harbor 173 mutations (Table S6A). A primary screen using enhanced yeast one-hybrid (eY1H) assays (Reece-Hoyes et al., 2011b) identified PDIs between 152 enhancers (Visel et al., 2007) and 28 WT TFs (Figure 1C and Extended Experimental Procedures). We then performed pairwise eY1H assays to compare the PDIs of mutant TFs and their WT counterparts (Table S6B).
Using systematic PDI profiling, we determined edgotype classes for 58 mutations in 22 TFs that bound at least two enhancers. We identified 38% of the mutations as quasi-null, 43% as edgetic (loss or gain of interaction), and 19% as quasi-WT (Figure 7A). More than 80% of TF missense disease mutations tested either abrogated DNA binding or caused partial change of PDIs. Interestingly, almost half of the mutations are edgetic, challenging the assumption that TF mutations that affect DNA binding do so in a similar fashion across their targets. Among these, a significant fraction of mutations exhibit gain of PDIs, likely because these mutations cause a reduction in DNA-binding specificity and allow greater promiscuity in target recognition.
Given that TFs interact with their DNA targets through DNA-binding domains (DBDs), we assessed whether disease mutations perturbing PDIs are enriched within DBDs. Mutations within versus outside DBDs exhibited strikingly different PDI perturbation patterns (P = 1.1 × 10⁻³; Figure 7B and Table S6C). Among quasi-null mutations, the proportion of mutations within DBDs was ~10-fold higher than outside DBD regions. These results confirm that most PDI perturbing mutations reside within the DBDs of proteins, further supporting the quality and validity of our PDI perturbation data.
Mutations within the same TF that cause different PDI changes would affect the expression of different targets, resulting in different diseases. We examined disease-causing TF mutations in pleiotropic genes associated with two or more diseases. Mutations with different PDI edgotype classes were likely to be associated with different clinical manifestations (Figure 7C), consistent with our results for PPI perturbations (Figure 6A).
Of the disease mutations for which both PPI and PDI data were available, about half did not perturb any PPIs (Figure 7D). Interestingly, for ~80% of these we did identify PDI perturbations. For instance, mutations in the TGF-β-induced transcription factor TGIF1 cause holoprosencephaly (Gripp et al., 2000). While the two mutant variants S28C and P63R are still able to bind their protein partners CTBP1 and CTBP2 (quasi-WT for PPI), both mutations completely abrogated the ability of TGIF1 to bind any of the tested DNA targets (quasi-null for PDI) (Figure S7A). Clearly, integrating different types of molecular interactions will enhance our ability to understand specific mechanisms that underlie many genetic disorders.
To gain further insights into alternative molecular interaction perturbations, we computationally examined the effect of disease mutations on protein-chemical interactions (Reva et al., 2011). We found that the frequency with which disease mutations are at protein-chemical interfaces is significantly higher than that of non-disease variants (Figure S7B). In addition, disease mutations that perturb PPIs have no discernable tendency to locate at protein-chemical interfaces (Figure S7C), suggesting that protein-protein and protein-chemical interfaces do not tend to overlap. Interestingly, ~13% of PPI non-perturbing mutations are located at protein-chemical interfaces, supporting the conclusion that these mutations could cause disease through perturbation of alternative types of molecular interactions.
We combined computational predictions and interaction profiling to optimize our performance in disease mutation stratification. Although computational methods such as PolyPhen-2 could predict interaction-perturbing alleles as deleterious (Figure 5A), they fail to explain many disease-causing mutations and misclassify them as “benign” (Figure S7D). Among these misclassified mutations, ~50% could be explained by molecular interaction perturbations (PCI, PPI or PDI). For instance, the S140F mutation in PKP2, which encodes the adhesion protein plakophilin, leads to arrhythmogenic right ventricular dysplasia (Gerull et al., 2004). While PolyPhen-2 predicts S140F as benign, the S140F mutant exhibited increased binding to the chaperones HSC70 and BAG2, and lost all the PPIs of the WT protein (Table S7A). Altogether, existing computational methods alone fail to precisely predict disease causality. Examining different types of molecular interaction perturbations is critical for a full comprehension of disease-causing mutations in humans.
In this systematic characterization of mutations across various human Mendelian disorders, we have found surprisingly widespread disease-specific perturbations of macromolecular interactions. Approximately 60% of disease-associated missense mutations perturb PPIs, among which half result in complete loss of interactions, generally caused by protein misfolding and impaired expression, and the other half lead to edgetic perturbations. Importantly, different mutations in the same gene frequently result in different interaction perturbation profiles. This strongly suggests that the “edgotype” of a mutation represents a fundamental link between genotype and phenotype.
Our systematic edgotyping strategy provides a practical approach to classifying candidate disease alleles emerging from genome-wide association studies and from sporadic and somatic mutation sequencing approaches. Edgotyping achieves a high precision in identifying candidate disease-causing mutations based on the interaction perturbations relative to WT alleles (Figure S6A). However, the overall sensitivity of an edgotyping approach is compromised by the false negative rate inherent to the assays used. We expect that a significant fraction of variants currently viewed as non-interaction-perturbing (quasi-WT) will eventually be proven to cause disease. This circumstance likely arises from the incomplete nature of current human interactome network maps (Rolland et al., 2014). Nevertheless, because edgetic mutations cannot become quasi-WT or quasi-null even as interactome maps improve, our estimate of edgetic mutations already provides a reliable lower bound for their frequency.
An alternative possibility is that quasi-WT mutations affect disease phenotypes through perturbation of different types of molecular interactions. Biological signaling is regulated at multiple levels, and various types of molecular interactions are involved (Sahni et al., 2013), as we have shown for PPI and PDI networks. In addition, protein-RNA (Lee et al., 2006) and protein-metabolite (Carpten et al., 2007) interactions have also been shown to be involved in disease. Perturbations of these alternative interaction networks will undoubtedly result in distinct disease consequences. One can envision that integration of additional types of interaction perturbation information with computational predictions will be necessary for a complete understanding of the cellular networks governing a particular disease state (Figure S7D). As a major benefit, perturbed interactions spotlight specific targets and pathways that are altered in a patient-specific context. This type of information could provide a much-needed guide in efforts to develop better diagnostic tools and more personalized medical treatments.
Using ORFs in the human ORFeome v8.1 collection as template, we PCR amplified the two DNA fragments flanking the mutations, followed by a fusion PCR to stitch the fragments together. The resulting fusion ORFs harboring the mutations were Gateway cloned into the Donor vector pDONR223 to derive Entry clones (Rual et al., 2004), which were subsequently verified by next-generation sequencing (Yang et al., 2011).
Interactions with chaperones and other QCFs were measured using a quantitative LUMIER assay (Taipale et al., 2012; Taipale et al., 2014). All wild-type and mutant allele clones were transferred via Gateway recombination into a mammalian expression vector containing a C-terminal 3xFLAG-V5 tag. Stable HEK-293T cell lines expressing luciferase-QCF fusion proteins were generated by lentiviral infection, and plasmids carrying wild-type and disease mutation alleles were transfected into the stable HEK-293T lines (Taipale et al., 2012). Following capture of FLAG-tagged proteins, luminescence was measured to determine QCF-target interaction. After luminescence measurement, FLAG-tagged mutant and wild-type proteins were detected as described (Taipale et al., 2012).
We performed a binary protein-protein interaction screen for all mutant and wild-type alleles as baits against ~7,200 human prey proteins (Rual et al., 2004). The identified interactions were combined with the known pairs catalogued by the human binary interaction dataset HI-II-14 (Rolland et al., 2014). All first-pass pairs from the primary Y2H screens were subjected to pairwise testing in which all interactors of any allele of a gene were then tested against all alleles of that gene. The resulting verified protein-protein interaction profiles of disease mutants were compared with their wild-type counterparts. We validated perturbed and unperturbed interactions from mutation-mediated interaction perturbation data (“edgotyping” data) using an orthogonal in vivo Gaussia princeps luciferase Protein Complementation Assay (GPCA). Human HEK-293T cells were co-transfected with each construct expressing complementary fragments of the Gaussia luciferase fused in frame with the tested protein pairs and luciferase activity was measured as described (Cassonnet et al., 2011).
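The comparison of mutant and wild-type interaction profiles can be sketched as follows. This is a minimal illustration under assumed definitions, not the scoring procedure used in the study: an allele is treated as quasi-WT if it retains every wild-type interaction, quasi-null if it loses all of them, and edgetic if it loses only a subset. The allele names, partner names, and the function `classify_edgotype` are hypothetical.

```python
from typing import Dict, Set


def classify_edgotype(wt_partners: Set[str], mutant_partners: Set[str]) -> str:
    """Classify a mutant allele by comparing its interaction profile to wild type.

    Assumed definitions (illustrative, not the authors' exact criteria):
      quasi-WT   : every wild-type interaction is retained
      quasi-null : every wild-type interaction is lost
      edgetic    : some, but not all, wild-type interactions are lost
    """
    if not wt_partners:
        raise ValueError("wild-type allele has no interactions to compare against")
    lost = wt_partners - mutant_partners
    if not lost:
        return "quasi-WT"
    if lost == wt_partners:
        return "quasi-null"
    return "edgetic"


# Hypothetical verified interaction profiles for the alleles of one gene.
profiles: Dict[str, Set[str]] = {
    "WT": {"P1", "P2", "P3"},
    "mutA": {"P1", "P2", "P3"},  # keeps all partners
    "mutB": {"P1"},              # loses P2 and P3 only
    "mutC": set(),               # loses every partner
}

edgotypes = {
    allele: classify_edgotype(profiles["WT"], partners)
    for allele, partners in profiles.items()
    if allele != "WT"
}
print(edgotypes)
```

Representing each profile as a set keeps the comparison to a single set difference, so testing every interactor of a gene against every allele of that gene reduces to one lookup per allele.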
An enhanced yeast one-hybrid (eY1H) assay was used to detect binary protein-DNA interactions (PDIs) between a DNA bait and a protein prey (Reece-Hoyes et al., 2011a; Reece-Hoyes et al., 2011b). DNA baits corresponding to human enhancers were retrieved from the Vista Enhancer Browser (http://enhancer.lbl.gov) (Visel et al., 2007). Protein preys were a set of TFs for which mutant clones are available in our human mutation ORFeome version 1.1. We performed pairwise eY1H assays of an arrayed collection of TF preys comprising all the wild-type TFs and their mutant clones against 152 available enhancer baits.
Disease-causing mutations were annotated using HGMD, and the deleteriousness of amino acid substitutions was predicted with the PolyPhen-2 program (Adzhubei et al., 2010). For structural features, distinct mutations were compared with respect to protein domains from the Pfam database and interaction interfaces on co-crystal structures from PDB. Tissue-specific gene expression was analyzed with normalized RNA-seq data from Human Body Map 2.0 (GSE30611). Network properties analyzed included betweenness centrality, k-core centrality, degree, and closeness centrality (de Nooy et al., 2005).
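Three of the network properties listed above (degree, closeness centrality, and k-core membership) can be illustrated on a toy undirected graph. The sketch below uses only the Python standard library and a hypothetical four-node network; it is not the software used in the study, and betweenness centrality is omitted for brevity.

```python
from collections import deque


def degree(graph):
    """Number of direct interaction partners of each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def closeness(graph, source):
    """Closeness of `source`: reachable nodes divided by summed shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def core_numbers(graph):
    """k-core number of each node, found by repeatedly peeling the lowest-degree node."""
    deg = {node: len(nbrs) for node, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: deg[n])
        k = max(k, deg[node])  # core number never decreases during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in graph[node]:
            if nbr in remaining:
                deg[nbr] -= 1
    return core


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(degree(graph))
print({node: round(closeness(graph, node), 2) for node in graph})
print(core_numbers(graph))
```

In this toy network, node C has the highest degree and closeness, while the pendant node D falls outside the 2-core formed by the triangle.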
Full details are provided in the Extended Experimental Procedures.
We thank the members of the DFCI Center for Cancer Systems Biology (CCSB) for valuable discussions and acknowledge A.A. Chen, M. Koeva and E. Guney for helpful suggestions. This work was supported by NHGRI (P50HG004233 to M.V., F.P.R. and A.-L.B.; RC4HG006066 to M.V., T.H., D.E.H., K.S.-A., L.J.W. and S.L.; and R01HG001715 to M.V., D.E.H. and F.P.R.), NSF (CCF-1219007 to Y.X.), and NSERC (RGPIN-2014-03892 to Y.X.), and the Krembil Foundation, a Canada Excellence Research Chair, an Ontario Research Fund–Research Excellence Award and a Canadian Institute for Advanced Research Fellowship awarded to F.P.R. J.I.F.B. is supported by a Pew Latin American Fellowship. M.V. is a Chercheur Qualifié Honoraire from the Fonds de la Recherche Scientifique (FRS-FNRS, Wallonia-Brussels Federation).
Supplemental Information includes Extended Experimental Procedures, seven figures and seven tables.
AUTHOR CONTRIBUTIONS
M.V., S.L., L.J.W., D.E.H. and K.S.-A. conceived the project. N.S., S.Y., M.T. and J.I.F.B. designed and performed experiments, with help from G.I.K., I.K., M.H.L., Q.Z., A.P., D.B., A.D., J.M.W., A.A.S., X.Y., An. S. and Y.J. J.C.-H., F.Y., J.P., J.W., Y.W., I.A.K. and T.H. performed computational analyses with contributions from N.S., S.Y., M.T., J.I.F.B., A.K., G.T., V.K., Am. S., Y.-Y.L., Y.S., A.S.-M., C.F., D.M.J., A.L. and B.C. V.K., Y.J., N.Y., M.E.C., M.A.C., S.S., B.B., L.J.W., B.C. and D.E.H. provided constructive feedback. M.A.C., S.S., B.B., D.E.H., A.-L.B., T.H., F.P.R., Y.X., A.J.M.W., S.L. and M.V. supervised research and provided critical advice on the study. N.S., S.Y., M.T., J.I.F.B., J.C.-H., M.A.C., B.C., D.E.H., F.P.R., Y.X., A.J.M.W., S.L. and M.V. wrote the manuscript, with contributions from other coauthors.