The earlier ataxia interactome provided new insight into the pathogenesis of inherited ataxias (16–26), but it lacked interactors for two SCA-causing proteins, CACNA1A and ataxin-7. We hypothesized that rescreening an adult human brain library in a highly stringent Y2H screen with many fragments of the two proteins, rather than the full-length proteins, would improve coverage and yield more interactors. Using 28 bait fragments, we identified 118 interactors for CACNA1A and ataxin-7 and incorporated these interactions into the ataxia interactome, yielding an expanded ataxia-protein network. That portions of the coding region were more successful than the full-length proteins in finding binding partners raises the possibility that fragments expose interaction domains that are otherwise inaccessible in a yeast system, especially if certain cell-specific modifications are required to uncover such surfaces.
CACNA1A has two predominant splice forms that are expressed at roughly equal levels in wild-type mice. However, in knock-in mice carrying a polyQ expansion of 14, 30 or 84 glutamines, the ratio of expression shifts such that the MPI isoform accounts for up to 80% of the transcripts (29). In post-mortem SCA6 patient brain samples, the amount of the MPI isoform was also increased compared with control brains (105). It is unclear whether the increased prevalence of MPI is caused directly by the expansion and how the shift in isoform stoichiometry may alter disease pathogenesis. In this study, we used the C-termini of the two predominant CACNA1A isoforms, MPI and MPc, as bait in a Y2H screen. Additionally, for the MPI bait fragments, we used wild-type (MPI-11Q), disease-causing (MPI-23Q) and hyper-expanded (MPI-72Q) polyQ alleles. We identified isoform-specific interactors as well as interactions that seem to make the MPI isoform with an expanded polyQ tract behave more like the alternative MPc splice form. Interactors that are exclusive to MPI may mediate gain-of-function effects upon expansion of the polyQ tract, such that more available MPI results in more association with these proteins. Given the altered MPc to MPI ratio seen in both patients and mouse models of SCA6, and given that in a SCA1 mouse model a change in stoichiometry between two endogenous ataxin-1-containing complexes contributes to neuropathology, we propose that differences in the affinity of interactors for either MPI or MPc could modify the SCA6 disease phenotype (18). Overall, we suggest that the isoform-specific CACNA1A interactions may be important in the SCA6 disease process and that the newly identified CACNA1A interacting partners may provide insight into the molecular understanding of SCA6. Additionally, the identification of isoform-specific interactions for CACNA1A raises the possibility that future protein-interaction studies could provide additional insights when individual isoforms are treated as separate nodes rather than merged into a single protein, particularly when only one isoform is associated with a disease.
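The idea of treating isoforms as separate nodes can be illustrated with a minimal sketch. It assumes a simple bait-prey pair list; the prey names (`prey_A`, `prey_B`, `prey_C`), the node labels and the `build_network` helper are hypothetical placeholders, not results or methods from this study.

```python
from collections import defaultdict

# Hypothetical bait-prey pairs; the prey names are placeholders, not screen results.
interactions = [
    ("CACNA1A-MPI", "prey_A"),
    ("CACNA1A-MPI", "prey_B"),
    ("CACNA1A-MPc", "prey_B"),
    ("CACNA1A-MPc", "prey_C"),
]

def build_network(pairs, collapse_isoforms=False):
    """Build an undirected adjacency map.

    With collapse_isoforms=True, the isoform suffix after the hyphen is dropped,
    so both isoforms become one gene-level node.
    """
    network = defaultdict(set)
    for bait, prey in pairs:
        node = bait.split("-")[0] if collapse_isoforms else bait
        network[node].add(prey)
        network[prey].add(node)
    return network

isoform_net = build_network(interactions)
gene_net = build_network(interactions, collapse_isoforms=True)

# Isoform-level nodes keep the distinction between specific and shared partners.
mpi_only = isoform_net["CACNA1A-MPI"] - isoform_net["CACNA1A-MPc"]
mpc_only = isoform_net["CACNA1A-MPc"] - isoform_net["CACNA1A-MPI"]
shared = isoform_net["CACNA1A-MPI"] & isoform_net["CACNA1A-MPc"]

print("MPI-specific:", sorted(mpi_only))
print("MPc-specific:", sorted(mpc_only))
print("Shared:", sorted(shared))
print("Gene-level node partners:", sorted(gene_net["CACNA1A"]))
```

In the collapsed network, all three partners attach to a single CACNA1A node and the isoform specificity can no longer be recovered from the graph.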
Ataxin-7 is highly evolutionarily conserved. In fact, yeast lacking the ataxin-7 homolog, Sgf73, can be rescued by wild-type ataxin-7 expression (106). Our bait constructs took into account the regions of highest homology across species and paralogues. Interestingly, baits containing only the first half of the protein or the middle section were unable to identify interacting partners. This is likely a result of two related factors. First, the Y2H system that we used is highly stringent and does not result in the high levels of overexpression seen in some other screens. Second, the two baits that succeeded in identifying interacting partners are the C-terminal fragments that contain mostly regions not conserved in yeast, whereas the fragments that failed predominantly contained the polyglutamine tract, block I and/or block II, conserved regions that are known to be required for recruitment into yeast SAGA and mammalian STAGA/TFTC complexes (9, 33–35). We suspect that in yeast, the interaction of ataxin-7 with SAGA components might compromise the Y2H reporter assay. Interestingly, block III is not found in Sgf73 and may have a specific function aside from SAGA complex formation (9). Furthermore, the successful baits begin near the reported caspase-7 cleavage sites, suggesting a possible role for this C-terminal protein fragment in vivo, although to date most studies and antibodies have targeted the N-terminal cleavage product (38, 43, 49). Despite being limited to only two successful baits, we identified many novel ataxin-7 interactors and revealed potentially interesting candidate partners.
Since the known role for ataxin-7 is as a STAGA/TFTC component, we expected to find ataxin-7 interactors related to histone modifications and transcriptional regulation. However, since the baits that successfully identified partners do not contain the regions responsible for STAGA/TFTC complex inclusion, it is not surprising that the proteins we identified have other functions. Although ataxin-7 cytoplasmic localization and nuclear-cytoplasmic shuttling have been reported, most studies to date have focused on the role of ataxin-7 in the nucleus. The ataxin-7 interactors we identified include nuclear proteins with known roles in transcriptional regulation as well as proteins that are predicted to be cytoplasmic or secreted. Wild-type ataxin-7 was also found to colocalize in neurons with BiP, a marker for the endoplasmic reticulum, strengthening the likelihood that ataxin-7 has functions besides its role in the STAGA/TFTC complexes (40). Thus, our studies provide a foundation for future work to uncover novel roles of ataxin-7.
Our studies of CACNA1A and ataxin-7 interacting partners led to an expansion of the ataxia interactome (15). We then used the updated ataxia interactome as a tool for exploring diseases comorbid with ataxia, hypothesizing that comorbid conditions would be associated with proteins found in the ataxia network. To our knowledge, this study is the first in which patient medical records and a phenotype-based interactome are used together to explore the molecular basis of comorbid conditions. By interrogating 32 million patient records, we found that ataxia patients are indeed at increased risk of any comorbid diagnosis compared with the control population. Notably, the relative risk of comorbidity increased from 1.63 to 3.03 when we considered only the diseases represented in the ataxia interactome.
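For readers unfamiliar with the measure, the sketch below shows the standard two-group risk ratio. It is an illustration under assumptions: the exact formula used in this study is not stated in this section, the function name `relative_risk` is ours, and the counts are invented for the example rather than taken from the patient records.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Risk ratio: (exposed_cases / exposed_total) / (control_cases / control_total).

    Computed by cross-multiplication so that integer counts are not rounded
    before the final division.
    """
    if exposed_total <= 0 or control_total <= 0:
        raise ValueError("group sizes must be positive")
    if control_cases <= 0:
        raise ValueError("control group needs at least one case")
    return (exposed_cases * control_total) / (control_cases * exposed_total)

# Invented counts, for illustration only (not data from this study):
# 300 of 1,000 ataxia patients and 10,000 of 100,000 controls carry the diagnosis.
rr = relative_risk(300, 1_000, 10_000, 100_000)
print(f"relative risk = {rr:.2f}")
```

A value above 1 means that the diagnosis is more frequent among ataxia patients than among controls; restricting the comparison to a subset of diagnoses changes both numerator and denominator, which is how the ratio can shift between analyses.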
Importantly, the relative risk of 3.03 is likely an underestimate of the true comorbidity rate for ataxia and other diseases represented by the ataxia network, because these calculations are based on currently available data sets. First, the ataxia network is not saturated and thus some interacting proteins are missing by chance. Second, the Medicare patient records are in the form of ICD-9 codes. These codes do not correspond one-to-one with diseases, but can correspond to specific features and symptoms of diseases. For example, some ICD-9 codes are specific, such as MD (code 362.50), whereas some MD cases could instead be coded as low vision (codes 369.60, 369.4, 369.3, 369.61 and others). Splitting patients across these potentially overlapping ICD-9 codes decreases the power (number of patients) relative to what one would see if the overlapping codes were considered together, as we did with some of the ataxia codes. Conversely, inclusion of misdiagnosed or miscoded patients can cause false-positive associations, such as alcohol-induced persisting amnestic disorder (code 291.1). Finally, when we accounted for the liabilities in our patient records by overlapping comorbid diseases with the ataxia interactome, we were limited to gene-disease relationships recorded in OMIM. Unfortunately, OMIM records are incomplete, even for associations already reported in the literature, as exemplified by searching for MD associations and not finding ataxin-7 among the results. Thus, there is a disconnect between a patient having a molecular diagnosis of, for example, SCA7 and then having medical records including some form of ataxia and some form of vision loss. Together, these nuances in patient records and disease–gene relationship databases decrease the power of a combined molecular and bioinformatics approach to uncover molecular relationships between comorbid diseases.
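The effect of overlapping codes on patient counts can be sketched as follows. This is a toy illustration, not the pipeline used in this study: the claim rows, patient identifiers, the `"unrelated"` placeholder code and the `low_vision` group name are hypothetical, and only the four low-vision codes are taken from the text above.

```python
# Low-vision codes listed in the text; the group name is illustrative.
CODE_GROUPS = {"low_vision": {"369.60", "369.4", "369.3", "369.61"}}

# Hypothetical (patient_id, icd9_code) claim rows, invented for this example.
claims = [
    ("p1", "369.60"),
    ("p2", "369.4"),
    ("p2", "369.61"),
    ("p3", "369.3"),
    ("p4", "unrelated"),
]

def patients_per_code(rows):
    """Map each code to the set of distinct patients carrying it."""
    by_code = {}
    for patient_id, code in rows:
        by_code.setdefault(code, set()).add(patient_id)
    return by_code

def patients_in_group(rows, codes):
    """Distinct patients carrying at least one code from the group."""
    return {patient_id for patient_id, code in rows if code in codes}

per_code = patients_per_code(claims)
grouped = patients_in_group(claims, CODE_GROUPS["low_vision"])

for code in sorted(CODE_GROUPS["low_vision"]):
    print(code, len(per_code.get(code, set())))
print("grouped:", len(grouped))
```

Each individual code captures a single patient here, whereas the grouped definition captures three, and a patient with two codes from the same group is counted once.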
Despite these limitations, our results show that this approach is fruitful for identifying these relationships. Such analyses will only improve as electronic medical records improve and as more disease-causing genes are identified and included in public databases. Furthermore, the approach and data in this study will serve as a framework for future studies that will benefit from better electronic medical data and accurate genetic diagnostic codes.
Overall, our comorbidity analysis and OMIM-based searches were limited to our ataxia interactome and 13 million patients and yet provided candidate proteins for a comorbid disease, MD. Using these criteria, we found a highly significant subnetwork anchored by three MD-causing proteins. The MD subnetwork contains 32 proteins that may have a role in MD or macular health. It should be noted that ataxin-7 is central in this subnetwork. Initially, we were surprised that many of these MD subnetwork proteins are predicted to be cytoplasmic or extracellular; however, further literature searches and studies provide strong support for the biological relevance of this network. Specifically, 31 of the 32 proteins were previously reported to be expressed in the retina (65–72, 75, 78, 81, 85, 87, 89, 98). EFEMP1, one of the MD proteins that anchor the network, is a secreted protein; however, the disease-causing mutation R345W results in the retention of the protein within the cells (107, 108). In the mammalian retina, EFEMP1 and ataxin-7 are expressed in the same cell types (40, 46, 70, 100, 107, 109, 110). Additionally, MEGF8, another component of the subnetwork, was predicted to be in the cytoplasm or extracellular space, but recent studies have demonstrated that it co-localizes with GFI1B in the nucleus (111). Given the lack of saturation of this screen and the lack of other MD proteins as baits in the screen, it seemed likely that some or many of the other ataxin-7 interactors that are not connected to the subnetwork are also involved in macular health; we therefore also included some of these proteins for in vivo studies. Our own studies demonstrate that randomly selected proteins from within the MD quadruples and other ataxin-7 interactors are expressed in the mouse retina and that some show altered expression in a mouse model of SCA7. Thus, we suggest that further investigation of these proteins could provide valuable insight into age-related MD.
More than 35 transgenic mouse models are available for a variety of the genes in the MD quadruples. We suggest that these mice should be investigated further: because mice are able to move freely, obtain chow and mate without vision, home cage behavior would not change even if these mice have retinal defects. It would therefore be interesting to examine the retina and its integrity in these mice at different ages, as well as potential genetic interactions between seemingly benign mutants and existing models of retinal degeneration.
In addition to SCA7, there have been several reports of human diseases and mouse models that involve both cerebellar (or PC) and retinal degeneration, including a possible variant of Boucher–Neuhauser syndrome and PC degeneration (pcd) mice (112–117), and these cell types express a common subset of genes (118). Our study shows a systematic underlying relationship between ataxia and MD.
Overall, our studies demonstrate that disease-based interaction networks and patient medical records, taken together, can reveal molecular associations underlying comorbidities in the patient population. Our data provide candidate proteins for further investigation into the molecular pathogenesis of SCA6, SCA7 and MD. Additionally, researchers investigating other diseases represented in the network and comorbid with ataxia will be able to use the data to interrogate possible molecular relationships. As researchers continue to develop interaction networks based on a common phenotype and as medical records are increasingly made electronic, the power of comorbidity analysis to identify common molecular pathways in the pathogenesis of many diseases will also increase.