Alzheimer’s and Parkinson’s diseases (AD and PD) are two common neurodegenerative diseases primarily affecting memory and motor functions, respectively. In this study, we integrated data from various sources and took a systems-biology approach to compare and contrast the molecular and network-based dysregulation associated with AD and PD, and we related these findings to known pathways of drug treatment. First, we identified genes that exhibit consistent prior evidence of association with each disease. Then, we extracted disease-specific sub-networks from a human interactome database using the associated genes as seeds. To rank the sub-networks, we used existing gene expression data from cases and controls. Comparison of the resulting disease-associated genes and networks revealed significant overlap between AD and PD. In addition, the identified sub-networks correlated with known drug intervention pathways and suggested new potential targets for intervention.
Alzheimer’s disease (AD) and Parkinson’s disease (PD) are caused by progressive degeneration and/or death of nerve cells. In AD, the patient’s memory and ability to think and carry out tasks are slowly destroyed, while PD mainly affects the patient’s physical abilities: patients lose body control and have difficulty with movement and coordination1,2. Both diseases are strongly linked with the process of aging3,4. For most patients with AD and PD, symptoms first appear after age 60. Since the average age of the population is increasing, the numbers of AD and PD patients are expected to grow rapidly: e.g., it is estimated that 5.4 million Americans had AD in 2010 and that this number will increase to 11–16 million in 20505.
To date, the FDA has approved several drugs to slow the progression of AD and PD. Most drugs attempt to prevent the breakdown of critical chemicals whose levels are decreased in patients: for example, cholinesterase inhibitors for AD patients slow the metabolic breakdown of acetylcholine, which is involved in nerve cell communication, and carbidopa for PD delays the conversion of levodopa. However, almost all of the currently available drugs are effective only for short periods (a few months to a few years). For both diseases, in order to develop more effective treatments and drugs, it is important to investigate and understand the molecular mechanisms and molecular networks that are altered in the development and progression of disease.
A large number of studies, including recent genome-wide association studies (GWAS)6–11 and traditional association, linkage, and gene expression studies, have been conducted to identify genes associated with AD and PD incidence and progression. Consequently, hundreds of potentially associated genes have been identified. A few of these genes have very strong connections with disease. In particular, APOE is linked to approximately 50% of AD patients12, and alpha-synuclein is associated with PD in members of a large Italian family13. However, for both diseases, most of the hundreds of identified genes are likely to have individually small and potentially complex effects on the development and progression of disease14,15. To uncover the underlying molecular mechanisms in these two diseases, comprehensive analysis of identified genes and their interactions within a network framework might provide important insights beyond the traditional single-gene or single-marker analyses16,17.
Network and pathway analysis are relatively new approaches to the study and identification of dysregulated components in diseases18. The underlying principle is that human diseases are caused by perturbations of the complex networks/pathways that link molecular components (such as genes and proteins) in a human cell. Pathways emphasize what is known and relatively well understood; thus the results can be easily integrated into familiar biological frameworks. On the other hand, current knowledge of pathways is incomplete, and network analysis explores new connections and links what are often perceived as distinct pathways. Network analysis has been employed to detect the networks associated with many complex diseases, such as cancer19,20, schizophrenia16 and addiction21. Recently, we integrated genome-wide mRNA expression data with protein-protein interaction (PPI) networks to detect sub-networks that are dysregulated in colon cancer and sleep disorders22,23. The aim of such studies is to identify functionally related genes that exhibit coordinate differential expression between healthy individuals and patients. Due to the computational complexity of examining the actions of multiple genes simultaneously, such studies currently focus on small sub-networks as markers of disease. However, multiple studies have demonstrated that the results are biologically meaningful and can provide testable hypotheses for further validation19,22,23. The PPI framework permits scoring of multiple gene combinations within a highly functional context, while the reduced search space limits the multiple testing correction required. These PPI sub-networks can reveal critical nodes and edges that serve as biomarkers of disease, e.g. molecular beacons of the condition, and can pinpoint critical nodes for potential functional intervention, e.g. important drug targets.
We wished to test these properties of molecular network assessment in the context of known information on AD and PD, including the known biomarkers and known targets of drug treatment.
Although AD and PD have their own unique neuropathological features, many patients with one disease later develop symptoms of the other24–28. This observation suggests the presence of common genetic variants that predispose individuals to both diseases and/or age-related similarities in disease progression. Furthermore, beyond common genetic variants, perturbation of common pathways or network connections may also be shared in AD and PD. Studies show that the cerebral accumulation of beta-amyloid is associated with AD while alpha-synuclein is linked with PD29,30. Using a transgenic mouse model and NMR spectroscopy, a recent study has suggested an interaction between beta-amyloid and alpha-synuclein31–33, providing a potential molecular connection between AD and PD. Motivated by these observations, we conducted a systematic comparative analysis to investigate the molecular mechanisms and relationship of AD and PD, taking advantage of the large amount of available molecular data for both diseases. We identified genes and networks strongly associated with AD and PD, and comparative analysis showed that AD and PD have strong connections and shared components at both the molecular and network levels.
To comprehensively investigate the molecular mechanisms of AD and PD, analyses at three levels were performed: (i) genes associated with AD and PD, (ii) networks connecting genes associated with AD and PD, and (iii) sub-networks dysregulated in AD and PD at the level of mRNA expression. This section describes the construction of genes and networks associated with each disease, and the detection of dysregulated sub-networks.
Genes associated with AD and PD (termed AD genes and PD genes, respectively) were extracted from public databases constructed using literature searches10. For the purpose of comparison, gene sets related to three other common mental disorders (autism, multiple sclerosis (MS) and schizophrenia) were also generated using a similar approach. To retain only genes with strong evidence of association, we considered only associations that are reported in at least four publications whenever possible. In the case of autism, only genes present in both databases were extracted. The data sources are summarized in Table 1, along with the number of extracted disease-associated genes for each disease.
The five resulting gene sets were then compared. To assess the significance of the overlap between the gene sets of each disease pair, we used a hypergeometric model. Namely, let N denote the total number of genes in the human genome (estimated to be around 21,000 at the time of this study). Let m and n respectively denote the number of genes associated with each disease (selected as described in the previous paragraph). If the number of genes that are common to both of these sets is k, then we computed the p-value of this overlap as the probability of observing at least k common genes by chance:

$$p = \sum_{i=k}^{\min(m,n)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$
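As an illustration, the overlap test described above can be sketched in Python as the upper tail of a hypergeometric distribution, P(X >= k). This is a minimal sketch, not the authors' code: the function name `overlap_pvalue` is hypothetical, and the inputs in the example are the gene counts reported in this paper (N = 21,000, 193 AD genes, 268 PD genes, 52 shared genes).

```python
from fractions import Fraction
from math import comb


def overlap_pvalue(N: int, m: int, n: int, k: int) -> float:
    """Probability of at least k shared genes between two random gene sets.

    N: number of genes in the genome; m, n: sizes of the two gene sets;
    k: observed overlap. Exact integer arithmetic avoids underflow in the
    tail sum; the result is converted to float only at the end.
    """
    total = comb(N, n)
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, min(m, n) + 1))
    return float(Fraction(tail, total))


# Gene counts taken from the text; the function name is an illustrative choice.
p = overlap_pvalue(21000, 193, 268, 52)
print(f"AD/PD overlap p-value: {p:.3e}")
```

With these inputs the expected overlap under the null model is only about 2.5 genes (193 × 268 / 21,000), which is why an observed overlap of 52 genes yields a very small p-value.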
To identify the pathways and enriched functional categories of genes shared between AD and PD, the commercial software Ingenuity Pathway Analysis (IPA, http://www.ingenuity.com/) was used.
Pathway analysis provides useful information on known biological processes common to diseases; however, it is limited in extracting novel information. PPI networks are useful in this respect, since they provide a comprehensive map of functional relationships among products of genes in the human genome. For this reason, we also mapped AD genes and PD genes to the human interactome, extracted the networks associated with AD and PD, and investigated the relationship of the two diseases at the network level. We acknowledge that this approach may neglect important interactions, as such databases are currently incomplete. However, we had two guiding principles in the study. First, we wished to use only well-curated interactions with few false positives. Second, we wanted the interaction set used to be “open source” so the results could be replicated by other groups. Thus, to fulfill both these criteria, we constructed networks using the human interactome downloaded from the Human Protein Reference Database (HPRD)34. The human interactome downloaded from the HPRD server (www.hprd.org, version 9.0, downloaded in September 2010) had 9,453 protein-coding genes (genes and their encoded proteins are used interchangeably in this paper) and 36,867 interactions among these gene products. We applied the Steiner tree algorithm35 to identify sub-networks that could connect AD genes (or PD genes) effectively while minimizing the number of non-AD (or non-PD) genes added to construct the sub-network. In particular, 151 out of 193 AD genes and 181 out of 268 PD genes were present in the HPRD network; thus, with the choice of HPRD, our gene sets included over 70% of the originally identified genes. These genes were used as “seeds” to construct each disease-specific network using the Steiner tree algorithm35.
The Steiner tree algorithm generates a sub-network in two steps: (a) all seed genes are connected by adding a minimal number of non-seed genes; (b) the network is simplified using the shortest paths between seed genes. This algorithm has been widely applied for the generation of various networks16,36,37. Once the Steiner tree algorithm was applied to each gene set, the resulting networks were designated the AD network and the PD network, respectively35. Finally, networks were visualized and analyzed using the Cytoscape38,39 software.
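The idea of connecting seed genes through as few non-seed genes as possible can be illustrated with a simple greedy shortest-path heuristic. This sketch is an assumption made for illustration and is not the specific Steiner tree implementation cited above: starting from one seed, it repeatedly attaches the nearest unconnected seed to the growing tree through a shortest path found by breadth-first search. The node names and edges are a made-up toy interactome, and the function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def steiner_subnetwork(edges, seeds):
    """Greedy approximation of a Steiner tree spanning the seed nodes.

    Returns (nodes, edges) of the tree. Seeds absent from the interactome
    are dropped; seeds unreachable from the first seed are left out.
    """
    adj = build_adjacency(edges)
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        return set(), set()
    tree_nodes, tree_edges = {seeds[0]}, set()
    remaining = set(seeds[1:])
    while remaining:
        remaining -= tree_nodes
        # Multi-source BFS from every node already in the tree.
        parent = {node: None for node in tree_nodes}
        queue = deque(sorted(tree_nodes))
        hit = None
        while queue:
            node = queue.popleft()
            if node in remaining:
                hit = node
                break
            for nb in sorted(adj[node]):
                if nb not in parent:
                    parent[nb] = node
                    queue.append(nb)
        if hit is None:
            break  # the remaining seeds lie in another component
        remaining.discard(hit)
        # Walk back along the shortest path and add it to the tree.
        node = hit
        while parent[node] is not None:
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
    return tree_nodes, tree_edges


# Toy interactome (hypothetical): uppercase nodes are seeds, lowercase are not.
toy_edges = [("A", "x"), ("x", "B"), ("B", "C"), ("A", "w"), ("w", "v"), ("v", "C")]
nodes, tree = steiner_subnetwork(toy_edges, ["A", "B", "C"])
print(sorted(nodes), len(tree))
```

In the toy example the three seeds are joined through a single non-seed connector, mirroring how non-seed genes enter a disease network only when they are needed to link seeds.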
The AD and PD networks were further used to identify dysregulated sub-networks in each disease by integrating network data with mRNA expression data. The aim of this procedure was to find sub-networks in which genes exhibit coordinate differential expression in the disease. Here, coordinate differential expression was assessed in terms of the ability of a group of genes to discriminate between samples from patients and healthy individuals when their expression profiles are considered together. For this purpose, we used two previously published mRNA expression data sets for AD and PD40,41. We scored sub-networks using a sub-network scoring method developed by Chuang et al.19 and searched for high-scoring sub-networks using a search algorithm (SASSy) developed by our group22,23. The principle and procedure implemented by SASSy are summarized as follows. Given a small sub-network, the “sub-network activity” for that sub-network in each sample is computed by aggregating the mRNA expression of the genes in the sub-network in that sample. Subsequently, the mutual information (MI) between phenotype (disease or control) and sub-network activity is computed as a measure of how well the sub-network activity discriminates between the two groups. MI measures the reduction of uncertainty in phenotype upon observation of the sub-network activity (aggregate mRNA-level expression) of the genes in the sub-network. Thus, a high MI score for a sub-network is an indicator of the coordinate mRNA-level dysregulation of the genes in the sub-network. SASSy exhaustively searches the sub-networks (composed of up to 5 genes) of the AD (PD) network to identify sets of genes with high MI and, unlike a heuristic algorithm19, is guaranteed to find all sub-networks with maximum MI. Since, in this study, the networks were restricted to the neighborhood of genes associated with each disease, such an exhaustive search was feasible.
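The scoring step can be sketched as follows. The details here are assumptions for illustration rather than the exact SASSy procedure: each gene is z-scored across samples, the sub-network activity is the sum of z-scores divided by the square root of the number of genes, the activity is discretized into equal-width bins (floor(log2(samples)) + 1 by default), and MI is computed in bits. The expression values and gene labels are made up.

```python
import math
from collections import Counter


def zscores(values):
    """Standardize one gene's expression across samples."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def subnetwork_activity(expression, genes):
    """Aggregate per-sample activity of a gene set (expression: gene -> values)."""
    z = [zscores(expression[g]) for g in genes]
    scale = math.sqrt(len(genes))
    return [sum(row[j] for row in z) / scale for j in range(len(z[0]))]


def mutual_information(activity, labels, bins=None):
    """MI (in bits) between binned sub-network activity and phenotype labels."""
    n = len(activity)
    if bins is None:
        bins = int(math.log2(n)) + 1
    lo, hi = min(activity), max(activity)
    width = (hi - lo) / bins or 1.0  # guard against constant activity
    binned = [min(int((a - lo) / width), bins - 1) for a in activity]
    joint = Counter(zip(binned, labels))
    p_bin, p_label = Counter(binned), Counter(labels)
    return sum(
        (count / n) * math.log2(count * n / (p_bin[b] * p_label[c]))
        for (b, c), count in joint.items()
    )


# Hypothetical toy data: 4 control and 4 disease samples, three genes.
labels = ["control"] * 4 + ["disease"] * 4
expression = {
    "g1": [1, 1, 1, 1, 5, 5, 5, 5],  # separates the two groups
    "g2": [2, 2, 2, 2, 6, 6, 6, 6],  # separates the two groups
    "g3": [1, 5, 1, 5, 1, 5, 1, 5],  # unrelated to phenotype
}
mi_informative = mutual_information(subnetwork_activity(expression, ["g1", "g2"]), labels)
mi_noise = mutual_information(subnetwork_activity(expression, ["g3"]), labels)
print(f"{mi_informative:.3f} {mi_noise:.3f}")
```

In this toy case the two coordinated genes recover the full one bit of phenotype information, while the unrelated gene scores zero, which is the contrast the MI score is meant to capture.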
Limiting the size of the sub-network to five is in fact arbitrary, and is defined only to limit the computational search time.
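To make the cost of the size limit concrete, the sketch below enumerates every connected sub-network of up to a given size and keeps all sub-networks that tie for the best score. It is an illustrative assumption, not the SASSy implementation: the function names are hypothetical, the graph is a toy example, and the scoring function is a stand-in for the MI score.

```python
def connected_subnetworks(adj, max_size):
    """Enumerate every connected node set with 1..max_size nodes, each once.

    adj maps each node to the set of its neighbors. Sets are grown one
    neighbor at a time, so every connected set of size s + 1 is reached
    from some connected set of size s.
    """
    found = set()
    frontier = {frozenset([node]) for node in adj}
    while frontier:
        found |= frontier
        grown = set()
        for nodes in frontier:
            if len(nodes) == max_size:
                continue
            for node in nodes:
                for nb in adj[node]:
                    if nb not in nodes:
                        grown.add(nodes | {nb})
        frontier = grown - found
    return found


def best_subnetworks(adj, score, max_size):
    """Return the maximum score and all sub-networks that attain it."""
    candidates = connected_subnetworks(adj, max_size)
    top = max(score(nodes) for nodes in candidates)
    return top, {nodes for nodes in candidates if score(nodes) == top}


# Toy path graph a - b - c - d (hypothetical).
toy_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
all_sets = connected_subnetworks(toy_adj, 3)

# Stand-in score: number of nodes drawn from {"b", "c"}.
top_score, winners = best_subnetworks(toy_adj, lambda s: len(s & {"b", "c"}), 3)
print(len(all_sets), top_score, len(winners))
```

The number of connected node sets grows quickly with both network size and the size limit, which is why an exhaustive search is practical only on small disease-specific networks with a small cap.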
The 193 AD genes were compared with the 268 PD genes to investigate the relationship between AD and PD. The results showed that 52 genes were shared between these two gene sets, and this overlap was highly significant by hypergeometric test (p-value < 10^-10), indicating a strong connection between AD and PD at the molecular level. Functional analyses of the 52 overlapping genes using IPA indicated that the top functions associated with these genes are “death of normal cell and neurons” (p-value < 10^-7) and “nervous system development and function” (p-value < 10^-8).
To determine whether the AD/PD association was especially significant, additional gene sets for autism, MS and schizophrenia were collected and compared in pair-wise fashion with AD, with PD and with each other. The results clearly showed that the genes confidently identified with PD and AD have the most significant overlap (Figure 1a); the p-values for pairwise overlaps for all five diseases are summarized in Figure 1b. The genes confidently identified as being associated with autism had very modest overlap with other brain diseases (no p-value < 0.01), while MS and AD (p-value < 10^-5), schizophrenia and PD (p-value < 10^-5), and schizophrenia and AD (p-value < 10^-4) had significant overlap. The results also indicated that 38 genes were associated with at least three diseases (genes clustered in the center of Figure 1a) and that 11 of these 38 genes are associated with four diseases (APOE, BDNF, GSTM1, IL1A, IL1B, IL10, MTHFR, MT-ND5, PON1, PTGS2, and SLC6A4), suggesting that these genes play a significant role in the proper functioning of the brain. Among the 38 genes, seven are mitochondrial genes (Mt-ND1, Mt-ND2, Mt-ND3, Mt-ND5, Mt-COI, Mt-L2, and Mt-DLOOP), consistent with the important role of mitochondria in brain function and dysfunction42. Five of the 38 are interleukin or interleukin-related genes (IL1A, IL1B, IL6, IL10 and IL1rn); it is well documented that IL1 is associated with behavior, neuroendocrine function, and sleep43, and dysregulated IL6 is linked with brain tumors44.
Based on the AD genes and PD genes, two networks were extracted from the HPRD interactome using the Steiner tree algorithm. The resulting network for AD contained 225 genes and 387 interactions (AD network, Figure 2a). Among the initial 193 AD genes, 151 were retained in the network (the remaining 42 genes were absent in HPRD); to connect these 151 genes the algorithm added 74 additional genes. It is interesting to note that 13 out of these 74 new genes had prior evidence of association with AD in the database; they were not included in the AD genes because they did not satisfy the criterion of being reported in at least four publications. The inclusion of these genes in the AD network provides further evidence for their functional association with other AD genes, and illustrates the power of a network-based approach. Similarly, the PD network had 273 genes and 502 interactions (Figure 2b); 185 PD genes were retained (the remaining 88 were absent in HPRD), and 5 of the 88 added genes had evidence of PD association in the databases.
Comparison of the AD network with the PD network showed another level of relationship between the two diseases: 72 genes were shared between the two networks, and the network generation process added many genes that were common to the final disease networks. Of these 72 genes, 51 were clustered into one connected network, 4 were isolated from the others, and the rest formed four networks with 2, 4, 5 and 6 genes, respectively (Figure 3). Further analysis indicated that 54 out of these 72 genes were seed genes. Comparing these 54 seed genes with the AD and PD genes showed that 44 out of 54 genes were common to the two disease gene sets. Furthermore, five of the AD genes (ACHE, APP, ATXN1, CLU and DAPK1) were included in the PD network by the Steiner tree algorithm, and five of the PD genes (APOB, CALR, CAV1, NOS1, and TFRC) were included in the AD network. In summary, more common genes were found when comparing at the network level, owing to the interactions between AD genes and PD genes.
Several non-seed genes were added into the disease networks because of their significant positions and connections to seed genes (Figure 3). For example, TP53, SRC, EGFR and ZBTB16 were not seed genes; they were included in the network because each of them had multiple interactions with seed genes. Three studies support the link between TP53 and AD45–47, while there is no evidence for the association of SRC, EGFR and ZBTB16 with AD or PD in the databases. Our findings suggest that they are likely associated with common neurodegenerative diseases.
Integrating network information with mRNA expression data can help to identify sub-networks that are the most dysregulated in disease. We expect these highly dysregulated sub-networks (of five genes in this analysis) and their immediate neighbors, to include both important biomarkers of the disease process and to identify important points for drug interventions to reverse the dysregulations and thus reverse the disease process. To identify highly dysregulated sub-networks, the algorithm SASSy has been developed and tested to identify biomarkers and therapeutic targets related to cancer and sleep disorders22,23. These results have shown that SASSy is a powerful tool to assess the coordinate dysregulation of multiple functionally connected genes.
We applied SASSy to exhaustively search for small (up to five genes) dysregulated sub-networks in AD and PD. We report results here for the sub-networks that have the maximum MI in the analysis. The AD network was analyzed using mRNA expression data from hippocampal tissue (seven AD patients vs nine controls); this generated six sub-networks with equal MI (sub-networks a–f of Figure 4)40. Several genes well known to be associated with AD were found in these sub-networks, including APOE and APP (amyloid precursor protein). Most of the genes in the sub-networks were seeds from the original disease networks (colored red). This shows that the top sub-networks, as identified by SASSy, correctly identify known markers of disease. Several genes, such as TP53 and APP, appeared multiple times, suggesting that they might mediate important crosstalk between different pathways. On the other hand, this is the first evidence for the association of AD with other genes in the sub-networks, such as TP53, SRC, PLCG1, HIF1A and NAA10. For example, NAA10 (N(alpha)-acetyltransferase 10) helps modulate HIF1A (hypoxia inducible factor 1) acetylation, thereby promoting its degradation48. These novel genes are potential targets for further experimental investigation in AD. Furthermore, six genes (APOE, CAV1, FYN, GSK3B, NQO1 and PTS2) from the AD sub-networks are also PD seed genes, which provides evidence that the most significantly dysregulated sub-networks in AD have strong connections to PD.
In the case of PD, one significant sub-network (network g of Figure 4) was found to be dysregulated using mRNA expression data from the substantia nigra region (16 PD patients and 9 healthy controls)41. Four out of five genes (ABCB1, CAV1, ESR1, and JUN) have strong evidence of association with PD. This is the first evidence that TP53 has a significant association with PD. TP53 and CAV1 are present in dysregulated sub-networks of both AD and PD, which suggests another connection between AD and PD.
We expect that our disease-based molecular networks are an informative framework for analyzing the effects of AD and PD drug treatments. We were also interested to know whether the dysregulated sub-networks identified by SASSy revealed any known drug targets or were closely associated with known targets. Approved AD and PD drugs and their targets were identified by searching the IPA database (Table 2); we then compared the components of the networks (Figure 2) with the targets of current drugs. In Table 2, the targets in bold were found in our disease-associated networks; the disease networks included most of the targets. Several nodes of the disease-associated networks link directly with more than one drug target. For example, CollagenQ (COLQ) links directly with ACHE and BCHE (targets for AD), and these nodes are also in the common sub-networks for AD and PD (see Figure 3), while GIPC1 (PDZ domain-containing protein GIPC1) is linked with both DRD2 and DRD3 (dopamine receptor targets for PD).
Although no drug target appeared in the SASSy-scored sub-networks (Figure 4), as they are limited to five nodes, several direct interacting partners of drug targets are present in the sub-networks. For example, three nodes of sub-network (a) in Figure 4 (SRC, FYN and PLCG1) are direct partners of GRIN2B (glutamate receptor, ionotropic, N-methyl D-aspartate 2B), which is a target of Memantine, a treatment for AD. APP, found in sub-networks (b) and (d) of Figure 4, is an interacting partner of ACHE, the target of Donepezil and other agents. As SASSy was able to identify important nodes that are both dysregulated in disease and directly connected to important drug targets, the sub-network nodes may include other potentially important new targets that could become the focus of future studies. For example, FYN (a proto-oncogene tyrosine-protein kinase) and SRC-related kinases, seen to be highly dysregulated in AD, are important candidates for new drug development in cancer and neuronal diseases49. IGF1R (insulin-like growth factor 1 receptor), a dysregulated target in AD (Figure 4a), is an important target for the development of inhibitors in cancer50. Interestingly, it is up-regulated in AD, and thus these agents could be explored for their effects in blocking or reversing the disease. Overall, agents mediating the functions of SRC, FYN, and IGF1R represent logical drugs for validating the molecular networks seen in this study, and these proteins are potential new targets for AD therapies.
In summary, we identified genes and sub-networks associated with AD and PD as well as sub-networks that connected the two diseases. These strong connections included three levels of evidence: significant overlap of AD and PD associated genes, large common regions among AD and PD networks, and shared nodes for highly dysregulated sub-networks. In addition, the dysregulated sub-networks identified by mutual information scoring of PPI sub-networks with gene expression data using SASSy provided powerful identification of both known and potential new biomarkers for AD and PD and potential novel drug targets for AD.
This work is supported in part by the Case Western Reserve University/Cleveland Clinic CTSA (Grant Number UL1 RR024989, including a supplement to support T1 research) from the National Center for Research Resources (NCRR), a component of the National Institutes of Health and the NIH Roadmap for Medical Research, to MRC. Support is also acknowledged from the National Science Foundation (CCF-053195 and IIS-0916102) to MK. Support for the development of SASSy is also provided by NCRR through a grant to Neo Proteomics, R43RR031932. The SASSy software tool is available exclusively through Neo Proteomics, Inc. (www.neoproteomics.net).