Spinocerebellar ataxias 6 and 7 (SCA6 and SCA7) are neurodegenerative disorders caused by expansion of CAG repeats encoding polyglutamine (polyQ) tracts in CACNA1A, the alpha1A subunit of the P/Q-type calcium channel, and ataxin-7 (ATXN7), a component of a chromatin-remodeling complex, respectively. We hypothesized that finding new protein partners for ATXN7 and CACNA1A would provide insight into the biology of their respective diseases and their relationship to other ataxia-causing proteins. We identified 118 protein interactions for CACNA1A and ATXN7 linking them to other ataxia-causing proteins and the ataxia network. To begin to understand the biological relevance of these protein interactions within the ataxia network, we used OMIM to identify diseases associated with the expanded ataxia network. We then used Medicare patient records to determine if any of these diseases co-occur with hereditary ataxia. We found that patients with ataxia are at 3.03-fold greater risk of these diseases than Medicare patients overall. One of the diseases comorbid with ataxia is macular degeneration (MD). The ataxia network is significantly (P = 7.37 × 10^−5) enriched for proteins that interact with known MD-causing proteins, forming an MD subnetwork. We found that at least two of the proteins in the MD subnetwork have altered expression in the retina of Ataxin-7^266Q/+ mice, suggesting an in vivo functional relationship with ATXN7. Together these data reveal novel protein interactions and suggest potential pathways that can contribute to the pathophysiology of ataxia, MD, and diseases comorbid with ataxia.
Spinocerebellar ataxias (SCAs) are a class of neurodegenerative disorders, characterized by the loss of balance, progressive motor dysfunction and degeneration of the cerebellar Purkinje cells (PC). Several SCAs are caused by the expansion of unstable CAG repeats that encode a polyglutamine (polyQ) tract in the respective protein (1). Outside of the CAG tract, the proteins do not share homology. In all of the proteins, the length of the polyQ tract correlates inversely with the age of disease onset (reviewed in 2). The cellular and molecular pathophysiology underlying most of the SCAs is not understood and many of the implicated proteins appear to be functionally distinct (3–14). The common pathology in these diseases, however, raised the possibility that common molecular mechanisms might underlie the pathogenesis of SCAs. To gain insight into the functions of the ataxia-causing proteins and to explore potential common molecular pathways in the SCAs, our laboratory generated a protein interaction network for 23 ataxia-causing proteins and their known interacting partners using a high-throughput yeast two-hybrid (Y2H) screen (15). The ataxia network (interactome) successfully led to the identification of shared pathways and key interactions relevant to the pathophysiology of ataxia (16–26). One of the proteins identified in the original screen and investigated further by our lab was RNA binding motif protein 17 (RBM17). RBM17 is an RNA binding protein that functions in a spliceosome complex (27,28). In vivo studies demonstrated a molecular role for RBM17 in SCA1 gain of function (18). Together these findings highlight the utility and quality of the screen and the ataxia network.
However, a few of the bait proteins failed to identify many interacting partners (15). The proteins that failed had, on average, fewer different bait fragments, and many were only full-length baits. Two of the proteins that failed to identify interactors in the original screen were CACNA1A (zero interactors) and ataxin-7 (three interactors), the disease causing proteins for SCA6 and SCA7, respectively.
Patients with SCA6 have dysarthria, oculomotor abnormalities and an ataxic gait. SCA6 is caused by an expansion of a CAG repeat in CACNA1A, a pore-forming subunit of the P/Q-type calcium channels. There are three reported isoforms of CACNA1A that result from alternative splicing at the C-terminus; however, two of these, MPI and MPc, make up 99% of the transcripts (8,29). These two splice variants differ in the splice acceptor site for the final exon such that MPI includes exon 47 and the CAG coding repeat is translated into a polyglutamine (polyQ) tract in the cytoplasmic tail of the protein, whereas MPc splices to an immediate stop codon, resulting in a shorter cytoplasmic tail lacking the polyQ tract. Since the polyQ tract is in the cytoplasmic tail of the MPI isoform, it is readily available for protein–protein interactions. Furthermore, data from Kordasiewicz et al. (30) suggest that the cytoplasmic tail of CACNA1A can be cleaved and translocated to the nucleus, possibly indicating a novel function for this protein distinct from its role as a channel subunit.
Patients with SCA7 have macular degeneration (MD) in addition to ataxia (31,32). SCA7 is caused by an expansion of a coding CAG repeat near the N-terminus of ataxin-7. Ataxin-7 is evolutionarily conserved with the yeast SAGA complex component, Sgf73, particularly in two blocks within the N-terminal portion of the respective proteins (9,33–35). Ataxin-7 is a component of a STAGA/TFTC histone acetyl transferase complex (9). In mammals, glutamine-expanded ataxin-7 interferes with the proper function of the complex and leads to misregulation of rod photoreceptor gene expression (9,36,37). Interestingly, in mouse and human tissues, ataxin-7 is localized in both the nucleus and cytoplasm and it can shuttle between the nucleus and the cytoplasm in live cells (38–47). There is currently no known functional role for ataxin-7 in the cytoplasm.
We hypothesized that identifying protein partners of CACNA1A and ataxin-7 would provide insight into SCA6 and SCA7 pathogenesis. Boxem et al. (48) demonstrated that using fragments in a Y2H screen increases the sensitivity without reducing the specificity. Therefore, we hypothesized that using partial segments of the coding sequences of both wild-type and polyQ-expanded CACNA1A and ataxin-7 would increase the likelihood of identifying interacting partners.
Additionally, we hypothesized that the expanded ataxia interactome could provide insight into ataxia pathogenesis. We present data here integrating the ataxia interactome with patient medical records to uncover molecular associations for diseases comorbid with ataxia. Furthermore, we show that some of the proteins predicted to be relevant for one comorbid disease, MD, are altered in a mouse model of SCA7 exhibiting MD.
To identify interacting partners for CACNA1A and ataxin-7, we generated 28 baits representing overlapping fragments of each protein (Fig. 1 and Supplementary Material, Table S1). The fragments of ataxin-7 were chosen based on known evolutionarily conserved or functional domains (9,38,49). CACNA1A baits were generated from the cytoplasmic tail of both isoforms. For the MPI isoform, we used multiple polyQ lengths; 11Q is common in wild-type alleles (one bait was 15Q for the wild-type length due to expansion during cloning), 23Q is disease-causing and 72Q is hyper-expanded (8,29). For clarity, throughout this manuscript, we refer to these as MPI-11Q, MPI-23Q and MPI-72Q, respectively, and MPI refers to this isoform regardless of polyQ length. All of the baits were fused to the DNA binding domain of Gal4 using the Invitrogen Gateway system and were used in a high-stringency Y2H screen against an adult human brain cDNA library. We identified 152 unique interactions with CACNA1A_MPI, CACNA1A_MPc and ataxin-7 (Supplementary Material, Table S2). For CACNA1A, we identified 59 interactors with the MPI isoform and 44 with the MPc isoform; for ataxin-7, we identified 49 interactors. Seventeen prey proteins were shared by both CACNA1A and ataxin-7, and 17 prey proteins interacted with both CACNA1A isoforms; in many instances, we identified multiple independent clones of the same protein. Thus, a total of 118 different prey proteins were identified.
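The interactor counts reported above can be reconciled by simple inclusion-exclusion. The following is a minimal sketch in Python that uses only the numbers given in the text; the variable names are ours and purely illustrative.

```python
# Interactor counts per bait class, as reported in the text.
mpi, mpc, atxn7 = 59, 44, 49
shared_between_isoforms = 17   # prey found with both MPI and MPc
shared_cacna1a_atxn7 = 17      # prey found with both CACNA1A and ataxin-7

# Every bait-class/prey pairing counts as one interaction.
unique_interactions = mpi + mpc + atxn7

# Distinct CACNA1A prey: remove the prey counted once per isoform.
cacna1a_prey = mpi + mpc - shared_between_isoforms

# Distinct prey overall: remove the prey counted for both proteins.
distinct_prey = cacna1a_prey + atxn7 - shared_cacna1a_atxn7

print(unique_interactions, cacna1a_prey, distinct_prey)  # 152 86 118
```

The intermediate value of 86 distinct CACNA1A prey is the denominator used in the isoform cross-testing described below.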
We next sought to recapitulate the interactions in mammalian cells. We therefore tagged the bait and prey proteins with GST and myc, respectively, and tested 47 of the 152 Y2H-based interactions using glutathione-sepharose affinity co-purifications (GST-APs) in HEK293T cells (15,50,51). We were able to recapitulate the interactions for 78.7% of the protein pairs tested (Fig. 2 and Supplementary Material, Table S2). This rate of reproducibility of Y2H interactions in mammalian cells is similar to that seen in other studies and confirms the high quality of the screen (15,50,51).
Given that only one of the two predominant isoforms of CACNA1A contains the disease-causing CAG tract, we hypothesized that interactions specific to the disease-causing isoform, MPI, may be particularly relevant to pathogenesis. Therefore, we further explored the CACNA1A interactions, specifically searching for proteins that interact in a polyQ length-dependent and/or isoform specific manner. In the initial Y2H screen, we identified 44 different protein interactions with MPc, the short isoform, and 59 interactions with MPI, the polyQ containing isoform. Of these, 17 were identified by both classes of baits. Since the screen was not saturated, some of the interactions may have been found only with one isoform by chance. Cross-testing all of the interactions that were unique to each isoform by a yeast mating assay against the other isoform revealed that 71 of 86 proteins interacted with both isoforms, whereas 15 of the interactions were exclusive to the MPI (glutamine-containing) isoform in yeast. None of the proteins identified was unique to MPc, consistent with MPc containing domains that are also present in MPI (Fig. 1B).
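The bookkeeping behind the cross-testing result can be checked as follows. This is a minimal sketch assuming only the counts stated in the text; the function name and its parameters are hypothetical.

```python
def cross_test_partition(mpi_total, mpc_total, shared_initial,
                         mpi_exclusive, mpc_exclusive):
    """Return (distinct prey, prey binding both isoforms after cross-testing)."""
    distinct = mpi_total + mpc_total - shared_initial
    both = distinct - mpi_exclusive - mpc_exclusive
    return distinct, both

# 59 MPI and 44 MPc interactors, 17 found with both in the initial screen;
# after the mating assay 15 remain MPI-exclusive and none is MPc-exclusive.
distinct, both = cross_test_partition(59, 44, 17, mpi_exclusive=15, mpc_exclusive=0)

# Prey reclassified from "one isoform only" to "both isoforms" by cross-testing.
newly_shared = both - 17

print(distinct, both, newly_shared)  # 86 71 54
```

Of the 69 prey initially seen with a single isoform, 54 therefore turned out to bind both, which illustrates how far an unsaturated screen can understate shared interactions.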
We then specifically tested the 15 MPI partners by GST-AP to explore these interactions in a mammalian system (Table 1, Fig. 3). Of the 15, 5 are not expressed at detectable levels in this system and therefore could not be evaluated. Of the remaining 10 that interact exclusively with MPI in the Y2H system, 6 proved to interact with both CACNA1A isoforms in mammalian cells (Fig. 3A and B). Three of these, RBM12B, LPHN1 and SIAHBP1, preferentially interact with the pathogenic MPI-23Q and hyper-expanded MPI-72Q GST fusion and also with MPc, but not with the wild-type length MPI-11Q construct. This suggests that the polyglutamine expansion within the MPI isoform may cause aberrant protein–protein interactions with MPc interacting partners. Another 3, of the 10 proteins tested (ABI1, BZRAP1, YLPM1), recapitulate the Y2H interactions and interact only with the MPI isoforms (Fig. 3C and D). Furthermore, ABI1 and YLPM1 preferentially interact with the disease-causing MPI-23Q and hyper-expanded MPI-72Q polyQ form and not with the wild-type length MPI-11Q.
To learn about the biological properties of the new ataxia network, it is necessary to analyze its topological properties, as they typically bring useful insight into the organizational principles of biological networks (52,53). We explored the network properties of the ataxia interactome, including its degree distribution, which is the distribution of the number of proteins each protein interacts with, the identification of proteins with high betweenness centrality (a measure of the importance of a node to the overall network) and enrichment of ataxia subnetworks (54). Our previously published analysis examined the degree distribution of an expanded network that included literature-curated interactions, to allow comparison of the degree distribution with that of other interactome studies of model organisms (15). Here we generated the ataxia interactome solely from the data produced by the Y2H screen we performed (Fig. 4) and did not include literature-curated interactions, to avoid biases emerging from well-studied proteins (55). This ataxia interactome has a noisy degree distribution, even with logarithmic binning, because it has sparser statistics than the literature-expanded network. Yet the distribution is still consistent with a power law, in agreement with the previously found scaling exponent of −2.2 (Supplementary Material, Fig. S1), indicating the importance of hubs in the ataxia network.
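To illustrate what a logarithmically binned degree distribution and its scaling exponent involve, the sketch below computes both for a toy edge list. It is an assumption-laden illustration rather than the procedure used for Fig. S1: the base-2 bins, the least-squares fit on log-log axes, the function names and the toy proteins are all ours.

```python
import math
from collections import Counter, defaultdict

def degree_by_protein(edges):
    """Number of distinct interaction partners per protein (self-interactions ignored)."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}

def log_binned_density(degrees):
    """Bin degrees into [1,2), [2,4), [4,8), ... and return (bin centre, density) pairs."""
    # k.bit_length() - 1 is the exact integer floor(log2 k) for k >= 1.
    counts = Counter(k.bit_length() - 1 for k in degrees.values())
    n = len(degrees)
    points = []
    for i in sorted(counts):
        width = 2 ** i                      # number of integer degrees in bin i
        centre = (2 ** i) * math.sqrt(2)    # geometric mean of the bin edges
        points.append((centre, counts[i] / (n * width)))
    return points

def log_log_slope(points):
    """Least-squares slope of log(density) against log(degree); needs >= 2 bins."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy network (hypothetical): one hub with four single-partner prey.
toy_edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4")]
points = log_binned_density(degree_by_protein(toy_edges))
slope = log_log_slope(points)
```

Dividing each bin count by the bin width is what keeps wide, sparsely populated high-degree bins comparable with narrow low-degree bins; without it the fitted slope would be biased towards shallower values.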
The proteins with the highest betweenness centrality in the interactome tend to act as bridges connecting functional modules (56). In this network, CACNA1A emerges as one of these proteins. The difference between the previously published ataxia interactome and the updated network is due to the large number of newly identified interactors for CACNA1A. This result emphasizes the importance of using multiple, overlapping protein fragments as baits for identifying interacting preys. Overall, the addition of two new hubs leaves the network properties statistically unchanged from those of the earlier network (compared in Supplementary Material, Table S3) (15).
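The notion of a bridging protein can be made concrete with a short betweenness calculation. The sketch below uses the standard breadth-first accumulation scheme (Brandes' algorithm) for an unweighted, undirected network; the text does not state which implementation was used for the analysis, and the toy network and node names here are hypothetical.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Unnormalized betweenness centrality for an undirected, unweighted network."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inwards.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

# Toy network: two triangular modules joined only through the protein "bridge".
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "bridge"),
             ("bridge", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
scores = betweenness(toy_edges)
top = max(scores, key=scores.get)
```

In the toy network every shortest path between the two modules passes through `bridge`, so it receives the highest score even though its degree is only two, which is why betweenness and degree are reported separately.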
One of the most useful analyses in terms of understanding the biology of the network is the identification of disease pathways by locating subnetworks or modules within the overall network (57–61). We searched for proteins linking pairs of ataxia-causing proteins, which we call ‘ataxia triples.’ A protein in a triple directly interacts with two different ataxia-causing proteins in the form: Ataxia-protein1–Interactor–Ataxia-protein2. All of the new triples that did not appear in the previous network included CACNA1A and/or ataxin-7 as one of the ataxia proteins, suggesting that CACNA1A and ataxin-7 are linked to other ataxia-causing proteins through multiple interacting partners. There are 116 ataxia triples in our network compared with 53 and 63 in the previous Y2H- and literature-expanded ataxia interactomes, respectively (15). We calculated the significance of the observed number of triples by randomizing the ataxia interaction network while preserving its degree distribution. With 1000 random realizations, we observed on average 18 (± 9) triples, which renders the number of triples in our network, 116, highly significant (P < 0.001). The large number of links from ataxia-causing proteins most likely explains the large number of triples compared with the randomized networks. It is also useful to compare the number of triples in networks constructed from proteins associated with a phenotypically diverse group of disorders. This control, while focusing on disease proteins, removes biases for a particular disease phenotype and permits us to analyze the extent to which the phenotype-based ataxia network has unusual topological properties. The number of triples in the ataxia interactome was significantly higher (P < 0.005) than in phenotypically diverse networks, which had a mean of 31 triples, suggesting the importance of these triples in ataxia pathology (15).
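The triple count and its degree-preserving null model can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: we assume the interactor may be any protein other than the two ataxia proteins it links, we randomize by repeated double edge swaps, and we report an empirical P-value; the text does not specify these details, and all function names are ours.

```python
import random
from collections import defaultdict

def count_triples(edges, disease_proteins):
    """Count Disease1-Interactor-Disease2 triples (unordered disease pairs)."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    total = 0
    for partners in adj.values():
        k = len(partners & disease_proteins)   # disease partners of this interactor
        total += k * (k - 1) // 2              # pairs of disease partners it links
    return total

def randomize(edges, rng, swaps_per_edge=10):
    """Degree-preserving rewiring: swap partners between randomly chosen edge pairs."""
    edge_list = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    if len(edge_list) < 2:
        return edge_list
    present = set(edge_list)
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:                 # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-interaction or a duplicate edge.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edge_list[i], edge_list[j]}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list

def triple_significance(edges, disease_proteins, n_random=1000, seed=0):
    """Return (observed triples, mean over random networks, empirical P-value)."""
    rng = random.Random(seed)
    observed = count_triples(edges, disease_proteins)
    null = [count_triples(randomize(edges, rng), disease_proteins)
            for _ in range(n_random)]
    hits = sum(1 for t in null if t >= observed)
    return observed, sum(null) / len(null), (hits + 1) / (n_random + 1)
```

With 1000 realizations, an observed count that is never matched by a randomized network gives an empirical P-value just under 0.001, which is the resolution limit of a test of this size.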
The interaction network we mapped experimentally is based on screening with proteins involved in one common phenotype, ataxia, and contains many proteins of diverse functions. We hypothesized that if the protein interactions in the ataxia network are biologically relevant and relevant to the pathogenesis of ataxia, then other diseases associated with proteins in the network are more likely to occur in patients with ataxia. Therefore, using the network data together with Medicare patient records, we asked three questions. First, we asked whether patients with ataxia are at an increased risk of having any other diseases compared with other Medicare patients; secondly, if proteins implicated in these other diseases are found in the ataxia interactome; and finally, what relationships exist between proteins implicated in the same diseases?
We first performed comorbidity analysis in the large set of Medicare patient medical history data in the form of ICD-9 codes from 13 million patients. We aimed to make the analysis as specific to the hereditary ataxias as possible. Therefore, we removed records from both our patient and control group with ICD codes corresponding to toxic effects from non-medical substances (codes 980–989). We then defined the ataxia patient population by only including ICD-9 codes corresponding to specific ataxia diagnoses therefore excluding most non-degenerative ataxias (see Materials and Methods for ICD-9 codes corresponding to ataxia). We also removed ataxia patients with documented alcoholism since chronic alcohol abuse can cause ataxia. In our medical history data set, we identified 11 265 patients who have a diagnosis of ataxia meeting our criteria and 13 022 828 controls. By looking for diseases that appear more often in ataxia patients than in the patient population as a whole, we identified diseases that are comorbid with ataxia. We measured relative risk based on the number of occurrences of any one diagnosis with ataxia, and its prevalence in the ataxia and general patient populations (see Materials and Methods). We found over 500 ICD-9 codes with a relative risk of greater than 1.00 (Supplementary Material, Table S4A). Overall, within the 99% confidence interval, the average relative risk among the ataxia patients is 1.63, meaning that patients with ataxia are 1.63 times more likely to also have any other diagnosis (a comorbid condition) than Medicare patients without an ataxia diagnosis.
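For readers unfamiliar with the measure, the sketch below shows one conventional way to compute a relative risk and its confidence interval from cohort counts. It is an assumption on our part that this ratio-of-proportions form matches the definition given in Materials and Methods; the log-normal (Katz) interval, the function names and the example diagnosis counts are likewise illustrative. Only the two group sizes are taken from the text.

```python
import math

def relative_risk(cases_ataxia, n_ataxia, cases_control, n_control):
    """Risk of a diagnosis among ataxia patients divided by the risk among controls."""
    return (cases_ataxia / n_ataxia) / (cases_control / n_control)

def relative_risk_ci(cases_ataxia, n_ataxia, cases_control, n_control, z=2.5758):
    """Log-normal (Katz) confidence interval; z = 2.5758 gives a 99% interval."""
    rr = relative_risk(cases_ataxia, n_ataxia, cases_control, n_control)
    se = math.sqrt(1 / cases_ataxia - 1 / n_ataxia
                   + 1 / cases_control - 1 / n_control)
    return rr * math.exp(-z * se), rr * math.exp(z * se)

# Group sizes reported in the text.
N_ATAXIA = 11_265
N_CONTROL = 13_022_828

# Hypothetical counts for a single ICD-9 code, for illustration only.
rr = relative_risk(250, N_ATAXIA, 100_000, N_CONTROL)
low, high = relative_risk_ci(250, N_ATAXIA, 100_000, N_CONTROL)
```

A diagnosis would be flagged as comorbid under this sketch when the lower bound of the interval exceeds 1.00, which guards against codes that appear enriched only because they are rare in the ataxia group.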
There are several built-in positive controls as well as limitations of using ICD-9 codes. For example, nystagmus (code 379.50), a known symptom of cerebellar dysfunction, has a relative risk of 25.1, meaning this diagnosis is 25.1 times more likely in a patient also diagnosed with ataxia. Conversely, despite our efforts to use strict criteria to classify patients, a limitation of this method is the inclusion of non-genetically caused ataxia due to the diagnosis/billing codes used. For example, although we removed patients with documented toxic substance exposure and alcoholism, some cases of alcoholic ataxia are possibly included and thus there is high relative risk of alcohol-induced persisting amnestic disorder (code 291.1) with a relative risk of 5.3, which is not due to a common molecular pathway, but rather to prolonged alcohol exposure. However, assessing the presumed false-positive comorbidities on a case-by-case basis introduces investigator bias. Ataxia is generally an outpatient condition and patients may have been seeking medical care for some other diagnosis, which may or may not be related to their ataxia; however, we cannot infer a priori which diagnoses are directly related.
We therefore used gene-disease associations and comorbidity analysis to relate proteins in the ataxia interaction network to disease phenotypes and to analyze whether molecular level gene-disease relationships are observed in the population (62–64). To relate the ataxia protein–protein interaction network to population disease patterns, we interrogated the ataxia network for proteins with links to disease phenotypes by searching the Online Mendelian Inheritance in Man (OMIM) database (http://www.ncbi.nlm.nih.gov/omim/) for known gene-disease associations, and we generated an overlap list of diseases that are represented in the ataxia network and appear in ataxia patients. We used this overlap list to assess the relative risk for patients diagnosed with ataxia to also have a disease whose associated protein is present in the ataxia network (Supplementary Material, Table S5). When a molecular relationship is taken into account, this risk is 3.03, which is significantly higher than the general comorbidity risk of 1.63 (Supplementary Material, Table S4B). This enrichment indicates that detailed protein interaction maps can shed light on population disease patterns. Furthermore, we can use the list of diseases that occur more often in ataxia patients and have known molecular causes to investigate the biological origin of the comorbidity relationships between diseases.
Interestingly, although not entirely unexpected, MD (code 362.50) was 2.7 times more likely to be diagnosed in ataxia patients than controls. Additionally, other similar visual diagnoses were also overrepresented in patients (Supplementary Material, Table S6). We were particularly intrigued that MD was comorbid with ataxia since it is considered a unique feature of SCA7 and we wanted to further understand the molecular relationship between ataxin-7 and other MD-causing proteins and to verify the comorbidity analysis. We hypothesized that our network was enriched for proteins associated with retinal degeneration. We therefore used the OMIM database to identify proteins associated with macular or retinal degeneration or dystrophy (see Materials and Methods for search terms). Surprisingly, of the 33 proteins that met these criteria, only fibulin 5 (FBLN5) and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1) appeared in the ataxia network. Ataxin-7 is not included in the OMIM database as an MD-associated gene, despite the fact that glutamine-expanded ataxin-7 causes MD (31,32,39,43,45).
When we did not find enrichment for retinal degeneration-associated proteins as we expected, we investigated the proteomic relationship of MD-associated proteins (FBLN5, EFEMP1 and ATXN7) within the ataxia interactome, hypothesizing that the relationships between these proteins may provide insights into retinal degeneration. We looked for MD subnetworks, finding two MD triples in the ataxia network where an MD triple is defined as three interacting proteins in the form: MD protein1–Interactor–MD protein2 and where the three MD proteins are FBLN5, EFEMP1 and/or ATXN7. Interestingly, we identified 80 MD quadruples (MD protein1–Interactor–Interactor–MD protein2) (Fig. 5A). Ataxin-7, owing to being a hub, is central to the MD network and removing it results in a collapse of the MD subnetwork (Fig. 5B).
To determine the significance of the number of observed quadruples, we measured the number of quadruples between any three proteins in the ataxia network. On average, we found 0.48 quadruples, rendering the 80 observed MD quadruples in the ataxia interaction network highly significant (P = 7.37 × 10^−5). Since ataxin-7 was a bait protein with a large number of interacting partners, a stricter control would be to find the quadruples between prey proteins and ataxia (bait) proteins. In this case, we found 2.56 such quadruples on average, again significantly fewer than the 80 we observed based on the three MD-causing proteins (P = 0.000596). Thus, this high number of observed quadruples indicates that the three proteins known to be involved in MD are highly connected. Since the proteins in the MD subnetwork are quite diverse, we examined the protein pairs in the network and found that, of the 83 interacting pairs, 17 had been validated by GST-APs in the course of our validation of this Y2H screen and the previously published data in Lim et al. (15).
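The quadruple count and the random-triplet control can be sketched in a few lines. The sketch assumes that both interactors in a quadruple lie outside the set of three target proteins and that the null model draws three proteins uniformly at random from a chosen pool; neither detail is stated in the text, and the function names and toy network are hypothetical.

```python
import itertools
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-interactions."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def count_quadruples(adj, targets):
    """Count Target1-Interactor-Interactor-Target2 paths between distinct targets."""
    targets = set(targets)
    empty = set()
    total = 0
    for t1, t2 in itertools.combinations(sorted(targets), 2):
        for first in adj.get(t1, empty) - targets:
            for second in adj.get(first, empty) - targets:
                if t2 in adj.get(second, empty):
                    total += 1
    return total

def random_triplet_counts(adj, n_samples=1000, seed=0, pool=None):
    """Quadruple counts for random sets of three proteins drawn from `pool`."""
    rng = random.Random(seed)
    nodes = sorted(pool if pool is not None else adj)
    return [count_quadruples(adj, rng.sample(nodes, 3)) for _ in range(n_samples)]

# Toy network (hypothetical): two routes link MD1 to MD2 via interactor pairs.
toy = build_adjacency([("MD1", "a"), ("a", "b"), ("b", "MD2"),
                       ("MD1", "c"), ("c", "b"), ("MD3", "d")])
observed = count_quadruples(toy, {"MD1", "MD2", "MD3"})
null = random_triplet_counts(toy, n_samples=200, seed=1)
mean_null = sum(null) / len(null)
```

Passing only bait proteins as `pool` would correspond to the stricter control described above, in which the random sets are restricted to proteins that, like ataxin-7, were screened directly.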
Given the significant enrichment of MD quadruples within the ataxia network, we hypothesized that these interactors are possible candidates for MD-causing or modifying loci or otherwise important in establishing or maintaining retinal health.
Since our screening used an adult brain cDNA library, we first tested the hypothesis that the proteins included in the MD subnetwork are also expressed in the retina. Literature searches revealed that 31 of the 32 MD quadruple proteins (or their paralogues) demonstrate retinal expression by either RNA or protein detection methods (Table 2) (65–72). The remaining protein, GFI1B, a transcriptional repressor, was used as a bait protein in the earlier interactome studies and thus is a hub within the MD subnetwork; therefore, it was essential that we validate its retinal expression. We first used commercial human retina polyA RNA for quantitative real-time PCR and detected GFI1B transcript (data not shown). We also sought evidence of GFI1B protein expression using immunofluorescence in wild-type mouse retina. We found that GFI1B is expressed most prominently in the outer nuclear layer (ONL) (Fig. 6A). Taken together, these results demonstrate that all of the proteins in the MD subnetwork are expressed in the retina.
Given that our screen was neither performed using a retina cDNA library nor saturated, and given the central importance of ataxin-7 in the MD subnetwork, we considered that other ataxin-7 interactors might also be important in retinal health. We thus expanded our literature review and found that the 27 additional ataxin-7 interactors are all expressed in the eye or retina. Overall, we show that all 59 ataxin-7 interactors or MD quadruple components are expressed in the mammalian retina; interestingly, some of the genes map to loci previously associated with familial retinal degeneration (Table 2), suggesting they may be candidates for disease genes, although overall we would predict the MD-associated loci would be genetic modifiers of retinal health (31,32,47,65–99).
We next hypothesized that some of the ataxin-7 interactors and/or MD quadruple proteins would be altered in the retina of a previously characterized SCA7 knock-in mouse model, Ataxin-7^266Q/+ (39). In a mixed genetic background, visual dysfunction begins around 5 weeks of age and ataxin-7 micro-aggregates are seen beginning at 8–10 weeks of age with more and larger inclusions by 15 weeks (39). However, as the mice have been backcrossed to a pure C57BL/6J background, the Ataxin-7^266Q/+ mice die at 10–11 weeks of age and inclusions remain very small.
To validate the functional relevance of the MD subnetwork in vivo, we examined the expression of some MD subnetwork components by immunofluorescence in 5-, 8- and 10-week-old Ataxin-7^266Q/+ retina and wild-type littermate controls. Unfortunately, in vivo mouse ataxin-7 protein studies are limited by a lack of an antibody that recognizes normal expression of either the wild-type mouse or knock-in protein, precluding co-localization studies until after protein inclusions have formed.
We therefore used RNA in situ hybridization (ISH) to uncover Atxn7 expression in the ONL, inner nuclear (INL) and ganglion cell layers (GCL) of wild-type C57BL/6J mouse retinal sections (Fig. 6B). These data support previous data from other groups using immunostaining in SCA7 patient and control retina tissue that found ataxin-7 expression in these same retinal layers (40,46,70,100). Furthermore, Cancel et al. (40) and Einum et al. (100) demonstrated ataxin-7 staining in control patient retina in the inner plexiform layer (IPL), specifically in both cytoplasm and processes of the ganglion cells, the outer plexiform layer (OPL) and also in the inner and outer segments of the rods and cones. Thus, our ISH data and patient protein data taken together demonstrate that ataxin-7 is normally expressed throughout the retina.
We next randomly selected 7 of the 59 MD subnetwork and/or ataxin-7 interacting proteins for immunofluorescence studies. We used retina from wild-type and Ataxin-7^266Q/+ littermates collected at 5, 8 and 10 weeks of age and immunostained for TRIM27, TRIM23, GRN, RAD23A, TRIM54, SIAH1 and CARD10 protein expression.
TRIM27 is a zinc finger protein that is reported to be a transcriptional repressor (101,102); however, similarly to ataxin-7, in some cell types TRIM27 is reportedly cytoplasmic (103). In mouse retinal sections, TRIM27 is strongly expressed in the ONL (Fig. 7A). Additionally, it is expressed in the outer segments of the photoreceptors, INL and GCL (data not shown). At 5 weeks, staining levels in wild-type and Ataxin-7^266Q/+ retina are approximately equivalent. However, in both genotypes, staining increases with age in the nuclear layers. The increase is more obvious in knock-in retina such that at 10 weeks the staining is many times brighter in the ONL than either earlier Ataxin-7^266Q/+ or age-matched wild-type retina. Importantly, the alteration in TRIM27 expression occurs at a protein level as Trim27 RNA is not altered in Ataxin-7^266Q/+ retina by qRT–PCR (data not shown).
TRIM23 staining is in the OPL and IPL (Fig. 7B) in both knock-in and wild-type retina. In contrast to TRIM27, which is increased, TRIM23 immunostaining is reduced in knock-in tissue, particularly in the OPL. Given that the Ataxin-7^266Q/+ mice have progressive retinal degeneration, we co-labeled with anti-calbindin, a marker of horizontal and amacrine cells, to better visualize the OPL (Fig. 7C). TRIM23 fluorescence in Ataxin-7^266Q/+ tissue was lower than the dynamic range in wild-type tissue under the same sample preparation and imaging conditions. Thus, to confirm that the reduced fluorescence was due to decreased protein levels rather than cell loss, we scaled the intensity of both the wild-type and Ataxin-7^266Q/+ images by a factor of 3× (Supplementary Material, Fig. S2). Scaling reveals that the TRIM23 pattern is indeed preserved in both Ataxin-7^266Q/+ and wild-type tissue, confirming reduced TRIM23 expression in the absence of cell loss. Additionally, by qRT–PCR, Trim23 RNA levels are equivalent in wild-type and Ataxin-7^266Q/+ retina, thus demonstrating that the reduction is at the protein level (data not shown).
We found that both GRN and RAD23A are expressed in the GCL and there are no differences between wild-type and Ataxin-7^266Q/+ retina at the time points tested (Supplementary Material, Fig. S3). Similarly, there are no differences in TRIM54, expressed in the INL and ganglion cell nuclei. SIAH1 appears similar between wild-type and Ataxin-7^266Q/+ retina, although it may be slightly brighter in wild-type tissue. Finally, CARD10, a caspase recruitment domain protein that is reportedly involved in GPCR signaling (104), exhibits strong immunostaining in the outer segments in wild-type mice but is not present in the Ataxin-7^266Q/+ mice at any age. There is also weaker CARD10 staining of the ONL, in a pattern similar to both GFI1B and TRIM27 staining. As previously reported, Ataxin-7^266Q/+ mice exhibit early photoreceptor degeneration. We therefore collected light microscope images at the same time as CARD10 imaging; the light images indicate that the difference in outer segment staining is likely attributable to photoreceptor degeneration (data not shown) (39).
Thus, the proteins we tested from the MD quadruples and other ataxin-7 interactors are all expressed in the retina, confirming previous data from other labs. Additionally, at least two, TRIM27 and TRIM23, demonstrate altered expression in Ataxin-7^266Q/+ mouse retina compared with wild-type littermates. Taken together, the GST-AP validations, literature support and in vivo expression data provide strong support for the biological relevance and importance of the MD quadruples identified in this study.
The earlier ataxia interactome provided new insight into the pathogenesis of inherited ataxias (16–26), but it lacked interactors for two SCA-causing proteins, CACNA1A and ataxin-7. We hypothesized that using many fragments of the two proteins rather than full-length proteins to rescreen an adult human brain library in a highly stringent Y2H screen would generate better coverage and yield more interactors. By using 28 bait fragments, we identified 118 interactors for CACNA1A and ataxin-7 and were able to include these interactions in the ataxia interactome leading to a more expanded ataxia-protein network. The fact that we were more successful in finding binding partners using portions of the coding region in comparison with the full-length proteins raises the possibility that fragments might expose interaction domains that are otherwise impossible to expose in a yeast system, especially if certain cell-specific modifications are required to uncover such surfaces.
CACNA1A has two predominant splice forms that are expressed at roughly equal levels in wild-type mice. However, in knock-in mice carrying a polyQ expansion of either 14, 30 or 84 glutamines, there is a shift in the ratio of expression such that the MPI isoform accounts for up to 80% of the transcripts (29). In post-mortem SCA6 patient brain samples, there was also an increased amount of the MPI isoform compared with control brains (105). It is unclear whether the increased prevalence of MPI is caused directly by the expansion and how the shift in isoform stoichiometry may alter disease pathogenesis. In this study, we used the C-termini of the two predominant CACNA1A isoforms, MPI and MPc, as bait in a Y2H screen. Additionally, for the MPI bait fragments, we used wild-type (MPI-11Q), disease-causing (MPI-23Q) and hyper-expanded (MPI-72Q) polyQ alleles. We identified interactors that are isoform specific and also interactions that seem to make the MPI isoform with an expanded polyQ tract act more similarly to the alternative, MPc splice form. Interactors that are exclusive to MPI may mediate gain-of-function effects upon the expansion of the polyQ tract, such that more available MPI results in more association of these proteins. Given the altered MPc to MPI ratio seen in both patients and mouse models of SCA6, and given that in a SCA1 mouse model a change in stoichiometry between two endogenous ataxin-1-containing complexes contributes to neuropathology, we propose that differences in the affinity of interactors for either MPI or MPc could modify the SCA6 disease phenotype (18). Overall, we suggest that the CACNA1A interactions that are isoform specific may be important in the SCA6 disease process and that the newly identified CACNA1A interacting partners may provide insight into the molecular understanding of SCA6.
Additionally, the identification of isoform-specific interactions for CACNA1A raises the possibility that future protein-interaction studies could provide additional insights when the individual isoforms are considered as nodes instead of including them together as a single protein, particularly when only one isoform is associated with a disease.
Ataxin-7 is highly evolutionarily conserved. In fact, yeast lacking the ataxin-7 homolog, Sgf73, can be rescued by wild-type ataxin-7 expression (106). Our bait constructs took into account the regions of highest homology across species and paralogues. Interestingly, baits containing only the first half of the protein or the middle section were unable to identify interacting partners. This is likely a result of two related factors. First, the Y2H system that we used is highly stringent and does not result in the high levels of overexpression seen in some other screens. Second, the two baits that succeeded in identifying interacting partners are the C-terminal fragments that contain mostly regions not conserved in yeast, whereas the fragments that failed predominantly contained the polyglutamine tract, block I and/or block II, conserved regions that are known to be required for recruitment into yeast SAGA and mammalian STAGA/TFTC complexes (9,33–35). We suspect that in yeast, the interaction of ataxin-7 with SAGA components might compromise the Y2H reporter assay. Interestingly, block III is not found in Sgf73 and may have a specific function aside from SAGA complex formation (9). Furthermore, the successful baits begin near the reported caspase-7 cleavage sites, suggesting a possible role for this C-terminal protein fragment in vivo, although to date most studies and antibodies have targeted the N-terminal cleavage product (38,43,49). Despite being limited to only two successful baits, we identified many novel ataxin-7 interactors and revealed potentially interesting candidate partners.
Since the known role for ataxin-7 is as a STAGA/TFTC component, we expected to find ataxin-7 interactors related to histone modifications and transcriptional regulation. However, since the baits which successfully identified partners do not contain the regions responsible for STAGA/TFTC complex inclusion, it is not surprising that the proteins we identified have other functions. Although ataxin-7 cytoplasmic localization and nuclear-cytoplasmic shuttling have been reported, most studies to date have focused on the role of ataxin-7 in the nucleus. The ataxin-7 interactors we identify include nuclear proteins with known roles in transcriptional regulation and other proteins that are predicted to be cytoplasmic or secreted. Wild-type ataxin-7 was also found to colocalize with BiP, a marker for the endoplasmic reticulum in neurons, strengthening the likelihood that ataxin-7 has other functions besides its role in the STAGA/TFTC complexes (40). Thus, our studies provide a unique foundation for future studies to uncover novel roles of ataxin-7.
Our studies of CACNA1A and ataxin-7 interacting partners led to an expansion of the ataxia interactome (15). We then used the updated ataxia interactome as a tool for exploring diseases comorbid with ataxia, hypothesizing that comorbid conditions would be associated with proteins found in the ataxia network. This study is the first that we are aware of in which patient medical records and a phenotype-based interactome are used together to explore the molecular basis of comorbid conditions. By interrogating 32 million patient records, we found that ataxia patients are indeed at increased risk of any comorbid diagnosis compared with the control population. Significantly, the relative risk of comorbidity increased from 1.63 to 3.03 when we considered the diseases represented in the ataxia interactome.
Importantly, the relative risk of 3.03 is likely an underestimate of the true comorbidity rate for ataxia and other diseases represented by the ataxia network, because these calculations are based on currently available data sets. First, the ataxia network is not saturated and thus some interacting proteins are missing by chance. Second, the Medicare patient records are in the form of ICD-9 codes. These codes do not correspond one-to-one with diseases, but can correspond to specific features and symptoms of diseases. For example, some ICD-9 codes are specific, such as MD (code 362.50), whereas some MD cases could instead be coded as low vision (codes 369.60, 369.4, 369.3, 369.61 and others). These potentially overlapping ICD-9 codes can decrease the power (number of patients) compared with what one may see if the overlapping codes are considered together, as we did with some of the ataxia codes. Conversely, inclusion of misdiagnosed or miscoded patients can cause false-positive associations, such as alcohol-induced persisting amnestic disorder (code 291.1). Finally, when we accounted for the liabilities in our patient records by overlapping comorbid diseases with the ataxia interactome, we were limited to gene–disease relationships recorded in OMIM. Unfortunately, OMIM records are incomplete, even for the associations already reported in the literature, as exemplified by searching for MD associations and not finding ataxin-7 among the results. Consequently, there can be a disconnect between a patient having a molecular diagnosis of, for example, SCA7 and then having medical records including some form of ataxia and some form of vision loss. Thus, there are many nuances in patient records and disease–gene relationship databases that decrease the power of using a combined molecular and bioinformatics approach to uncover molecular relationships between comorbid diseases.
Despite these limitations, our results show that this approach is fruitful for identifying these relationships and such analysis will only improve as electronic medical records are improved, and more disease-causing genes are identified and included into public databases. Furthermore, the approach and data in this study will serve as a framework for future studies that will benefit from better electronic medical data and accurate genetic diagnostic codes.
Overall, our comorbidity analysis and OMIM-based searches were limited to our ataxia interactome and 13 million patients and yet provided candidate proteins for a comorbid disease, MD. From these criteria, we found a highly significant subnetwork anchored by three MD-causing proteins. The MD subnetwork contains 32 proteins that may have a role in MD or macular health. It should be noted that ataxin-7 is central in this subnetwork. Initially, we were surprised by the fact that many of these MD subnetwork proteins are predicted to be cytoplasmic or extracellular; however, further literature searches and studies provide strong support for the biological relevance of this network. Specifically, 31 of the 32 proteins were previously reported to be expressed in the retina (65–72,75,78,81,85,87,89,98). EFEMP1, one of the MD proteins that anchor the network, is a secreted protein; however, the disease-causing mutation R345W results in the retention of the protein within the cells (107,108). In the mammalian retina, EFEMP1 and ataxin-7 are expressed in the same cell types (40,46,70,100,107,109,110). Additionally, MEGF8, another component of the subnetwork, was predicted to be in the cytoplasm or extracellular space, but recent studies have demonstrated that it co-localizes with GFI1B in the nucleus (111). Given the lack of saturation of this screen and also the lack of other MD proteins as baits in the screen, it seemed likely that some or many of the other ataxin-7 interactors that are not connected to the subnetwork are also involved in macular health; thus, we also included some of these proteins for in vivo studies. Our own studies demonstrate that randomly selected proteins from within the MD quadruples and other ataxin-7 interactors are expressed in the mouse retina and some show altered expression in a mouse model of SCA7. Thus, we suggest that further investigation of these proteins could provide valuable insight into age-related MD.
More than 35 transgenic mouse models are available for a variety of the genes in the MD quadruples. We suggest that these mice should be further investigated given that mice are able to freely move, obtain chow and mate without vision, thus home cage behavior would not be changed even if these mice have retinal defects. Therefore, it would be interesting to look at the retina and its integrity from these mice at different ages as well as potential genetic interactions between seemingly benign mutants and existing models of retinal degeneration.
In addition to SCA7, there have been several reports of human diseases and mouse models that involve both cerebellar or PC and retinal degeneration, including a possible variant of Boucher–Neuhauser syndrome and PC degeneration (pcd) mice (112–117), and these cell types express a common subset of genes (118). Our study shows a systematic underlying relationship between ataxia and MD.
Overall, our studies demonstrate that taken together disease-based interaction networks and patient medical records can reveal molecular associations underlying comorbidities in the patient population. Our data provide candidate proteins for further investigation into the molecular pathogenesis of SCA6, SCA7 and MD. Additionally, researchers investigating other diseases represented in the network and comorbid with ataxia will be able to use the data to interrogate possible molecular relationships. As researchers continue to develop interaction networks based on a common phenotype and medical records are increasingly made electronic, the power of comorbidity analysis to identify common molecular pathways in the pathogenesis of many diseases will increase too.
Y2H baits were designed based on known homology domains between species for ataxin-7 and for CACNA1A based on reports of translocation to the nucleus and splice isoforms (9,30,38,49). The baits were cloned using Invitrogen Gateway technology as previously described (15).
Baits were tagged with GST using the Invitrogen Gateway system as previously described (15,51). Preys were tagged with myc in the same manner. Prey proteins were ORFeome clones from the Human ORFeome collection [http://horfdb.dfci.harvard.edu/ and (119)] or were rescued from the yeast colonies. Briefly, for yeast rescue, the colony was grown in liquid culture and lysed. The purified plasmid was then transformed into E. coli. Since there is no bacterial selection marker that differs for the bait and prey plasmids in this system, three to five colonies were selected and sequenced to identify the prey colonies. After identification, we used the BP reaction from the Invitrogen Gateway system to transfer the prey coding sequence from the yeast-destination vector to the entry vector. Using an LR reaction, the coding sequence was subsequently transferred to the myc-vector as previously described (15,51).
GST-APs were performed in HEK293T cells. Eight-hundred nanograms of the GST-bait and myc-prey vectors were transfected into the cells using Lipofectamine 2000 and affinity purifications were performed as previously described (15). Western blots were performed as described previously (15).
The sequence data were processed as previously described (15). Briefly, the sequencing results obtained for the prey clones from the Y2H screen were submitted to BlastN and tBlastX searches to identify the protein-coding genes in the NCBI database. The sequence databases used for alignment analysis were obtained from NCBI. The protein sequence data for alignments were obtained from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/protein.fa.gz, while nucleotide data were obtained from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA/rna.fa.gz. The cross-tabulated protein and nucleotide alignment results were filtered for high quality based on E-value as well as for agreement between the nucleotide and protein result pairs. Discordant pairs and low-quality results were manually checked and repaired. Entrez and symbol identifiers were determined for each prey protein passing the alignment filter. Sequences that had poor or questionable BLAST results were discarded. The resulting data were compiled by both bait protein and bait fragment. A network (undirected graph) was constructed using the unique bait–prey protein pairs determined by the analysis (see below).
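The concordance filter described above can be illustrated with a minimal Python sketch. It assumes that each prey clone has one best nucleotide hit and one best translated hit; the `Hit` record, the clone and gene identifiers, and the E-value cutoff are all hypothetical, since no threshold or data layout is stated here.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the text does not state the E-value threshold used.
E_VALUE_CUTOFF = 1e-10


@dataclass(frozen=True)
class Hit:
    clone_id: str   # identifier of the sequenced prey clone (hypothetical)
    gene_id: str    # gene assigned by the best alignment
    e_value: float  # E-value of that alignment


def filter_concordant(nucleotide_hits, protein_hits, cutoff=E_VALUE_CUTOFF):
    """Split clones into accepted calls and clones needing manual review.

    A clone is accepted only if both searches pass the E-value cutoff
    and agree on the gene; everything else is flagged for review.
    """
    protein_by_clone = {h.clone_id: h for h in protein_hits}
    accepted = {}
    needs_review = []
    for nt in nucleotide_hits:
        pr = protein_by_clone.get(nt.clone_id)
        concordant = (
            pr is not None
            and nt.e_value <= cutoff
            and pr.e_value <= cutoff
            and nt.gene_id == pr.gene_id
        )
        if concordant:
            accepted[nt.clone_id] = nt.gene_id
        else:
            needs_review.append(nt.clone_id)
    return accepted, needs_review


# Toy input: clone_1 is concordant, clone_2 is discordant, clone_3 is low quality.
nt_hits = [
    Hit("clone_1", "GENE_A", 1e-50),
    Hit("clone_2", "GENE_B", 1e-40),
    Hit("clone_3", "GENE_C", 0.5),
]
pr_hits = [
    Hit("clone_1", "GENE_A", 1e-30),
    Hit("clone_2", "GENE_X", 1e-35),
    Hit("clone_3", "GENE_C", 1e-20),
]
accepted, needs_review = filter_concordant(nt_hits, pr_hits)
```

In this sketch, flagged clones are only listed, which mirrors the manual checking step rather than attempting to automate the repair.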
The network was generated from the published ataxia network (15) and our screen data. We also included unpublished data from further rescreening with three of the baits previously published (15). The ataxia network was drawn using Cytoscape (120).
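As an illustration of how a published network and new screen data can be combined into one undirected graph, the sketch below merges several edge lists and removes duplicate pairs regardless of bait–prey orientation. The protein names are placeholders, and the helper functions are hypothetical rather than the code used for this study. The `to_sif` helper writes the simple interaction format that Cytoscape can import, using `pp` as the conventional protein–protein edge label.

```python
def normalize(pair):
    """Order a pair so that (a, b) and (b, a) map to the same edge."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def merge_networks(*edge_lists):
    """Union of several edge lists as a set of unique undirected edges."""
    merged = set()
    for edges in edge_lists:
        merged.update(normalize(pair) for pair in edges)
    return merged


def degree(edges):
    """Number of distinct edges touching each node (self-edges count once)."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        if b != a:
            counts[b] = counts.get(b, 0) + 1
    return counts


def to_sif(edges):
    """Render edges as SIF lines ('source pp target'), sorted for stability."""
    return [f"{a}\tpp\t{b}" for a, b in sorted(edges)]


# Placeholder data: one published edge is rediscovered in reverse orientation.
published = [("BAIT_A", "PREY_1"), ("BAIT_A", "PREY_2")]
new_screen = [("PREY_1", "BAIT_A"), ("BAIT_B", "PREY_2"), ("BAIT_B", "PREY_3")]

network = merge_networks(published, new_screen)
node_degree = degree(network)
sif_lines = to_sif(network)
```

Storing each edge as an ordered tuple keeps the graph undirected without requiring a graph library, at the cost of losing which protein was the bait; that information would need a separate table if it matters downstream.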
Medicare patient ICD-9 codes were used to analyze whether the molecular level gene–disease relationships are also present in the patient population. The Medicare data set consists of over 32 million hospitalization records of 13 039 018 patients. We first removed all patients with a diagnosis of ‘toxic effects of substances chiefly nonmedical as to source’ (ICD-9 codes 980–989) from our data set. We then defined patients as ataxia patients if they had one or more of the following ataxia-related ICD-9 codes in their history: primary cerebellar degeneration (code 334.2), spinocerebellar disease, unspecified (code 334.9), cerebellar ataxia NOS (code 334.3), Friedreich's ataxia (code 334.0), cerebral ataxia (code 331.89), ataxia-telangiectasia (code 334.8) and cerebellar ataxia in diseases classified elsewhere (code 334.4), but not including patients with alcoholism (codes 303.0–303.9). This left 13 022 828 controls and 11 265 patients with some form of ataxia. These ataxia patients’ records were searched for other disease diagnoses (comorbidities). For a given disease, the relative risk, RR, is measured by
$$RR = \frac{C_{xa}}{C^{*}_{xa}}$$
where $C_{xa}$ is the number of patients that have both the disease x and ataxia (a), and $C^{*}_{xa}$ is the expected number of patients that have disease x and ataxia. The expected comorbidity is $C^{*}_{xa} = N \cdot (I_x/N)(I_a/N) = I_x I_a/N$, where $I_x$ and $I_a$ are the prevalence (number of patients) of disease x and ataxia, respectively, and $N$ is the total number of patients. The confidence intervals were measured by the Katz method (121).
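The cohort definition and the relative risk calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ICD-9 codes are handled as strings, patients with both an ataxia code and an alcoholism code are dropped rather than counted as controls, and the confidence interval uses the standard two-group Katz log formula, because the exact way the Katz method was applied to the observed/expected ratio is not detailed here. All function names and the toy counts are hypothetical.

```python
import math

# Ataxia-related ICD-9 codes listed in the text.
ATAXIA_CODES = {"334.2", "334.9", "334.3", "334.0", "331.89", "334.8", "334.4"}


def major(code):
    """Integer category of a numeric ICD-9 code, or None for E/V codes."""
    head = code.split(".")[0]
    return int(head) if head.isdigit() else None


def classify(codes):
    """Return 'ataxia', 'control', or None if the patient is excluded."""
    majors = {major(c) for c in codes}
    # Toxic effects of nonmedicinal substances (980-989): removed entirely.
    if any(m is not None and 980 <= m <= 989 for m in majors):
        return None
    has_ataxia = bool(ATAXIA_CODES & set(codes))
    # Assumption: ataxia patients with alcoholism (303.x) are dropped.
    if has_ataxia and 303 in majors:
        return None
    return "ataxia" if has_ataxia else "control"


def relative_risk(c_xa, i_x, i_a, n):
    """Observed co-occurrences divided by the expected count I_x * I_a / N."""
    expected = i_x * i_a / n
    return c_xa / expected


def katz_log_interval(a, n1, c, n2, z=1.96):
    """Two-group risk ratio (a/n1)/(c/n2) with a Katz log confidence interval."""
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Toy numbers only: 10 000 patients, 100 with ataxia, 1000 with disease x,
# 30 with both. Expected co-occurrence is 1000 * 100 / 10 000 = 10.
rr_expected = relative_risk(c_xa=30, i_x=1000, i_a=100, n=10_000)
rr_groups, ci_low, ci_high = katz_log_interval(a=30, n1=100, c=970, n2=9900)
```

With the toy counts, the observed/expected ratio is 3.0, while the two-group ratio compares disease x among ataxia patients (30 of 100) with disease x among controls (970 of 9900); the two quantities are close but not identical, which is why the choice of formulation should be stated when reporting intervals.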
Search terms were: macular corneal dystrophy, MD, macular dystrophy, maculopathy, retinal cone dystrophy, retinal degeneration, retinal dystrophy and degeneration of the retina.
Probes were designed based on Allen Brain Atlas probes that were successful in the brain. ISH experiments were performed as previously described (122–125). Primers used included a T7 promoter for the sense probe and SP6 promoter for the anti-sense. Gene-specific primers are as follows: Atxn7-for 5′-CACAGCTATGGAAGAAAATCCC-3′; Atxn7-rev 5′-AGGCTCACCGAGTGTGTTTTAT-3′.
Ataxin-7^266Q/+ and wild-type littermates were sacrificed at 5, 8 or 10 weeks in accordance with our approved animal protocol. Eyes were immediately removed and placed in 4% paraformaldehyde in 1xPBS for 1–2 days on a rotator in a cold room. Tissue was then cryoprotected in a sucrose gradient of 5, 10 and 30% sucrose in 1xPBS overnight at 4°C. The tissues were embedded in Tissue-Tek O.C.T. Compound (Sakura Finetek U.S.A., Inc., Torrance, CA 90501, USA) and stored at −80°C until sectioning. Tissue was sectioned on a cryostat at 12 μm, mounted on slides and dried overnight at room temperature. Slides were stored at −80°C until immunostaining.
Slides were brought to room temperature (RT), washed in PBS, permeabilized with 0.3% Triton X-100 in PBS and blocked for 2 h at RT with 1% serum, 1% BSA, 0.05% Triton X-100 in 1xPBS. The slides were washed in PBS and incubated with the primary antibody (1:200 unless otherwise noted) without Triton X-100 for 3 days in a humid box at 4°C. Primary antibodies were: TRIM27 (gtx102048), TRIM23 (gtx100057), CARD10 (gtx111222), SIAH1 (gtx104715), GRN (gtx100803), RAD23A (gtx100425), GeneTex Inc., Irvine, CA 92606, USA; GFI1B, D-19 (sc-8559) used at (1:50), Santa Cruz Biotechnology, Inc., Santa Cruz, CA 95060, USA; and calbindin, D-28K, clone CB-955, (c-9849) used at (1:500), Sigma-Aldrich, St Louis, MO 63103, USA. After washing, secondary antibody [goat anti-rabbit Alexa Fluor 647 (A21245), donkey anti-goat Alexa Fluor 647 (A21447) and goat anti-mouse Alexa Fluor 488 (A11001), Invitrogen Corporation, Carlsbad, CA 92008, USA] was added at (1:250) and incubated for 2 days. The slides were then washed and incubated for 5 min with DAPI (2.5 μg/ml) in PBS, washed again and mounted with Prolong Gold antifade reagent (P36934), Invitrogen Corporation. After curing, the slides were stored at −20°C until imaging. Images were taken on a Leica confocal microscope and processed using ImageJ software.
For each time point, two to three pairs of mice were collected. Each antibody was used to stain one pair of 5-, 8- and 10-week-old knock-in and wild-type mice in at least two independent experiments. Representative images are shown.
This work was supported by Award Number P30HD024064 from the Eunice Kennedy Shriver National Institute of Child Health & Human Development (to H.Y.Z.) and by the Ellison Foundation (to DFCI CCSB, M Vidal, Director). The content is solely the responsibility of the authors and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health & Human Development or the National Institutes of Health. J.J.K. was supported by a training grant from the National Eye Institute (T32 EY07102), and the research on SCA6 was supported by a generous gift from a foundation that asked that its identity not be disclosed (to H.Y.Z.). H.Y.Z. is an investigator with the Howard Hughes Medical Institute. Funding to pay the Open Access publication charges for this article was provided by Howard Hughes Medical Institute.
We thank Agnes Liang, Dr Christina Thaller and Dr Richard Atkinson for technical assistance; Dr Nicholas A. Christakis for sharing the Medicare phenotypic network and patient medical history data; and Dr Melissa Ramocki, Dr Hsiao-Tuan Chao and Dr John Fryer for input on the manuscript.
Conflict of Interest statement. None declared.