|Home | About | Journals | Submit | Contact Us | Français|
In order to investigate commonly disturbed genes and pathways in various brain regions of patients with Parkinson's disease (PD), microarray datasets from previous studies were collected and systematically analyzed. Different normalization methods were applied to microarray datasets from different platforms. A strategy combining gene co-expression networks and clinical information was adopted, using weighted gene co-expression network analysis (WGCNA) to screen for commonly disturbed genes in different brain regions of patients with PD. Functional enrichment analysis of commonly disturbed genes was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Co-pathway relationships were identified with Pearson's correlation coefficient tests and a hypergeometric distribution-based test. Common genes in pathway pairs were selected out and regarded as risk genes. A total of 17 microarray datasets from 7 platforms were retained for further analysis. Five gene coexpression modules were identified, containing 9,745, 736, 233, 101 and 93 genes, respectively. One module was significantly correlated with PD samples and thus the 736 genes it contained were considered to be candidate PD-associated genes. Functional enrichment analysis demonstrated that these genes were implicated in oxidative phosphorylation and PD. A total of 44 pathway pairs and 52 risk genes were revealed, and a risk gene pathway relationship network was constructed. Eight modules were identified and were revealed to be associated with PD, cancers and metabolism. A number of disturbed pathways and risk genes were unveiled in PD, and these findings may help advance understanding of PD pathogenesis.
Parkinson's disease (PD) is the second most common neurodegenerative disorder of the central nervous system, with an estimated prevalence and incidence of 2 per 100,000 people and 797 per 100,000 person-years in China (1). PD is characterized by the progressive degeneration of dopaminergic neurons in the substantia nigra pars compacta and the accumulation of Lewy bodies; α-synuclein-containing inclusions suggested to contribute to motor impairment (2,3). Clinical symptoms include bradykinesia, rigidity, tremors, gait impairment and postural instability, which adversely affect survival and quality of life in patients with PD (4,5). Thus, investigating the pathogenesis of PD and exploring effective therapeutic approaches is of great importance.
Significant effort has been applied to elucidating the molecular mechanisms underlying PD using high throughput microarray technology (6–13). Resultant data are deposited in public databases including the Gene Expression Omnibus (GEO) and ArrayExpress. For example, Zhang et al (7) analyzed gene expression profiles of three brain regions, the substantia nigra, the putamen and Brodmann area 9, demonstrating decreased expression of the proteasome endopeptidase complex NADH, but increased expression of heat shock proteins, metallothioneins, polypyrimidine tract-binding protein and synucleins in PD. Garcia-Esparcia et al (13) investigated changes in gene expression in the frontal cortex and reported downregulated cortical olfactory receptor levels (including OR2L13, OR1E1, OR2J3, OR52L1 and OR11H1) and upregulated taste receptor levels (TAS2R5 and TAS2R50) in patients with PD. However, these datasets are based on different microarray platforms, different sample sources and use small sample sizes, which make their results somewhat inconsistent and difficult to use clinically. Therefore, to discover and understand the common hallmark traits of PD, information from multiple studies should be integrated (14,15). In addition, despite the identification of several markers for the progression of PD, the relationships between genes, the expression of genes and clinical traits of PD remain unclear. Weighted gene co-expression network analysis (WGCNA) can be used to explore these relationships as previously reported (14–16).
The goal of the present study was to analyze a dataset comprising 427 samples collected from 17 publically available datasets, utilizing WGCNA to identify common PD risk genes from different brain regions and functional modules associated with clinical traits.
Gene expression data of PD, submitted prior to April 10th, 2015, were collected from ArrayExpress using the key word ‘Parkinson's disease’ under the two screening conditions (organisms: Homo sapiens; experiment type: RNA array). The following gene expression data were manually excluded: Non-PD studies, studies using cell lines, samples exposed to drugs, blood or leukocyte samples, in vitro studies and repeated data.
Annotated files of microarray platforms were also downloaded from ArrayExpress. The total number of genes for each microarray platform and the number of overlapping genes between platforms was calculated. Microarray platforms with <10,000 genes or <6,000 overlapping genes were excluded.
Raw data files (CEL files) of Affymetrix platforms (excluding Affy HUGENE; Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA) were read with the function ReadAffy of the package Aff (Affymetrix; Thermo Fisher Scientific, Inc.) in R (17). This was followed by background correction, probe level normalization and expression value calculation using the Robust Multi-array Average (RMA) algorithm (18). Microarray data from the Agilent platform (Agilent Technologies, Inc., Santa Clara, CA, USA) were read with the function read.table of R and normalized with linear models for microarray data (LIMMA) package (19). Extraction and normalization of the microarray data from Affymetrix HUGENE was achieved using the read.celfiles and RMA function of the package oligo (20). Probes were mapped into genes according to the annotation files, and probes mapping to the same gene were averaged as the final expression level of the gene. Overlapping genes of different platforms were retained for further analysis. To overcome variance in overall gene expression levels resulting from diverse pretreatments, ComBat from the package sva (21) was adopted to combine all the microarray data.
By combining gene co-expression networks and clinical information, including brain region, age, sex and sample type, a WGCNA package was adopted (22) to screen out commonly disturbed genes in different brain regions. Cluster analysis was performed on the samples, using the function hclust to exclude outliers. A height of 125 was set as the cut-off point. Cluster analysis was then performed again for the samples, using clinical information to check the distribution of clinical features among samples. Microarray platform information was also included in the analysis to examine the effect of data normalization. A gene co-expression network was constructed, from which gene co-expression modules closely associated with clinical features (i.e. commonly disturbed genes) were screened out.
Functional enrichment analysis, using biological processes from the Gene Ontology (GO) Consortium (http://david.ncifcrf.gov/) and pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (23), was performed for commonly disturbed genes using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) online tool (24). P-values were calculated using the hypergeometric distribution and adjusted by the Benjamini and Honchberg method (25). A false discovery rate (FDR) of <0.05 was set as the cut-off point.
Pathway information was downloaded from KEGG (26). Pathway expression level was determined as the median expression level of all genes in the pathway:
Where Pathik represents the expression level of pathway i in sample k. g1, g2, g3,… gn represent expression levels of all genes from pathway i in sample k. A pathway expression network was established when expression levels were determined for each pathway.
The correlation of two pathways in both PD samples and normal samples was examined with Pearson's correlation coefficient:
Where Px,y represents Pearson's correlation coefficient of pathway x and pathway y in all samples. X and Y represent the average expression levels of pathway x and pathway y in all samples. Co-pathway relationships with a Pearson correlation coefficient >0.99 were selected out.
If dysfunction of two pathways is associated with a disease, then common genes in the two pathways are likely to be involved in the disease. Therefore, disease-associated co-pathway relationships were screened using a hypergeometric distribution-based test:
Where n represents the number of common risk genes; N represents the number of total genes in both pathways and M represents the number of total risk genes in both pathways. P<0.05 was set as the threshold.
Common risk genes were acquired from pathway pairs. Risk gene pathway relationships were obtained and visualized with Cytoscape (27).
A total of 69 microarray datasets were obtained by preliminary search. Based upon the exclusion criteria, 19 datasets were used for further analysis. They were generated from 10 microarray platforms (Table I): A-AFFY-33, A-AFFY-34, A-AFFY-37, A-AFFY-41, A-AFFY-44, A-AFFY-54 (Affymetrix; Thermo Fisher Scientific, Inc.), A-MEXP-1174 (Illumina, Inc., San Diego, CA, USA), A-AGIL-28 (Agilent), A-GEOD-17047 and A-AFFY-168 (Affy HUGENE; Affymetrix; Thermo Fisher Scientific, Inc.).
Annotation files for the 10 platforms were downloaded from ArrayExpress (Table II). AFFY-34 and AFFY-41 were excluded due to containing only a small number of genes (9,659 and 8,166 genes, respectively) and few genes overlapped (2,508). MEXP-1174 was also removed as it had <7,000 overlapping genes, although it detected 10,742 genes. 7 platforms were retained: AFFY-33, AFFY-37, AFFY-44, AFFY-54, AGIL-28, GEOD-17047 and AFFY-168, containing a total of 10,898 genes.
Clinical information relating to 17 microarray datasets from 7 open access platforms was collected and processed. Clinical information was obtained for 427 cases, including 227 cases of PD and 200 controls. PD samples were obtained from 114 male patients and 54 female patients at the age of 54–94, while control samples were obtained from 97 males and 55 females at the age of 49–97. Thirteen different brain regions were covered, and detailed information is displayed in Table III.
R limma package (http://www.bioconductor.org/packages/release/bioc/html/limma.html) was applied to normalize the raw data from different microarray platforms. These data were subsequently combined and normalized. A good performance of normalization was achieved (Fig. 1).
Cluster analysis was performed to remove outliers based on the threshold value of a height cut of 125 (Fig. 2A). Three samples (GSM506020, GSM1311832 and X21_G05_P2) were removed as outliers. Cluster analysis was then conducted again with clinical information (Fig. 2B). The results indicated that brain region, gender and platform were distributed uniformly among sample types (PD or control), suggesting that a single clinical feature was not sufficient to classify samples. This justified the pre-treatment methods of the present study.
According to previous findings (28), the optimal soft thresholding power is the minimum value where scale-free fit is best optimized. Scale-free fit was set as 0.9 in the present analysis and the optimal soft-thresholding power was set as 14 (Fig. 3). Gene co-expression modules were identified using the function blockwiseModules. Five modules were obtained: Module 0 contained 9,745 genes (in grey, named MEgrey); module 1,736 genes (in turquoise, named MEturquoise); module 2,233 genes (in blue, named MEblue); module 3,101 genes (in brown, named MEbrown); and module 4,93 genes (in yellow, named MEyellow). The results of cluster analysis are displayed in Fig. 4.
Three modules (MEgray, MEblue and MEbrown) had no significant relationship with any clinical feature (Fig. 5). ‘Sex’ and ‘Batch’ revealed no significant relationship with the 5 modules (Fig. 5), demonstrating the suitability of the pre-treatment. ‘Organism part’ was significantly correlated with MEyellow (correlation coefficient 0.25, P=1×10-07; Fig. 5), confirming different expression patterns existed in different brain regions. ‘Disease’ was significantly negatively correlated with MEturquoise (correlation coefficient −0.14, P=0.003; Fig. 5), suggesting that there were commonly disturbed genes in different brain regions. Therefore, the 736 genes from MEturquoise were regarded as candidate PD-associated genes.
Candidate genes were associated with cellular respiration, protein catabolism and negative regulation of protein modification (Fig. 6A). In terms of cellular components, cytoplasm and mitochondrion were overrepresented in candidate genes (Fig. 6B). In terms of molecular function, pyrophosphatase activity and NADH dehydrogenase (quinone) activity were overrepresented (Fig. 6C). KEGG pathway enrichment indicated that these genes were involved in oxidative phosphorylation and PD-associated pathways (Fig. 6D).
A total of 44 significant pathway pairs were identified, from which 52 risk genes were revealed. The network constructed by significant pathways and risk genes can be further divided into 8 modules (Fig. 7). Module 1 was associated with PD, module 2 was associated with several cancers, and modules 3, 5, 6, 7 and 8 were associated with the metabolism of drugs, amino acids and carbohydrates (Table IV).
In the present study, a systematic analysis of 17 microarray datasets from 7 microarray platforms was performed. A total of 736 PD-associated candidate genes were revealed through functional enrichment analysis. Further co-pathway analysis identified 44 significant pathway pairs and 52 risk genes. A network containing pathway pairs and risk genes was constructed, from which 8 modules were identified. Module 1 was associated with oxidative phosphorylation and PD, while module 2 was associated with diverse cancers. Other modules were associated with the metabolism of xenobiotics, amino acids and carbohydrate.
Mitochondrial dysfunction and oxidative stress are closely associated with PD (29,30). Proteins associated with familial PD, including phosphatase and tensin homolog-induced putative kinase 1 (PINK1), Parkinsonism associated deglycase (DJ-1), synuclein alpha (SNCA) and leucine rich repeat kinase 2 (LRRK2), are either mitochondrial proteins or mitochondria-associated, and all interface with pathways involved with oxidative stress and free radical damage (31). Mitochondrial import and accumulation of α-synuclein impairs complex I (also known as NADH-ubiquinone oxidoreductase) in human dopaminergic neuronal cultures and the PD brain (32).
An increased prevalence of malignant melanoma and skin carcinoma is observed in patients with PD (33) and some degree of overlap in the underlying biochemical dysfunction of PD and cancers is known to exist (34). Genes that underlie familial forms of PD are often abnormally expressed in cancer. Functional studies implicate these genes as being associated with the maintenance of the cell cycle, so genes involved in the cell cycle might be potential targets (35).
Cytochrome P450 (CYP) enzymes are responsible for the metabolism of multiple exogenous and endogenous compounds. Brain CYPs are also involved in the progression of PD (36). Morale et al (37) reported that loss of aromatase cytochrome P450 function is a risk factor for PD, and McCann et al (38) discovered that the poor metabolizer genotype of CYP2D6 has a significant association with PD.
Some of the 52 risk genes identified in the present study have been implicated in the progression of PD, including SNCA (39). Deficiencies in complex I of the respiratory chain are frequent causes of mitochondrial diseases, and have been associated with PD (40). Several structural subunits of complex I were identified as risk genes in the present study, inlcuding NADH-ubiquinone oxidoreductase 1 alpha subcomplex subunit 5 (NDUFA5), NDUFA2, NDUFAB1 and NDUFA4. Peralta et al (41) reported that conditional ablation of NDUFA5 results in a mild chronic encephalopathy but no increase in oxidative damage. Three voltage-dependent anion channels (VADCs; VADC1, VADC2 and VADC3) were also identified as risk genes (42). They recruit Parkin to defective mitochondria, which promotes mitochondrial autophagy (42). Chu et al (43) demonstrated that VADC1 is downregulated in response to α-synuclein accumulation and aggregation and thus leads to a decrease in mitochondrial function. Glutathione S-transferase omega-1 (GSTO1) is involved in the metabolism of xenobiotics and has been previously reported to modify the age of PD onset (44,45). GSTA4 may also be involved with PD, but confirmation of this requires further studies.
Overall, a number of relevant pathways and risk genes were revealed in the present study. The risk gene pathway network produced may be useful to guide further research to illustrate the molecular mechanisms underlying PD. Some of the identified risk genes may also be potential therapeutic targets for treatment of PD, although this too requires further study.