In this study we utilize gene expression data to identify genes involved in schizophrenia. We identified 2 co-expression modules (gene networks) that consistently relate to schizophrenia status in two independent data sets. One of these modules (under-expressed in schizophrenia patients) was significantly enriched with brain-expressed genes, leading us to focus on this module. The most central (‘hub’) gene in this network,
ABCF1, is
cis-regulated, which means that nearby genetic variation indirectly/directly regulates this schizophrenia-associated gene expression network. Using data from a previously published GWAS
[33], we found that the genes represented in this network are enriched with SNPs nominally associated with schizophrenia, providing further support that this network has a primary role in schizophrenia susceptibility.
In the first test data set, we identified 12 gene co-expression networks (hereafter called modules) that were associated with schizophrenia disease status, reflecting the large number of individual genes found in the differential expression analysis. By relating modules and intramodular hub genes to disease status and SNPs, our systems genetic approach (WGCNA) alleviates the multiple-comparison problem inherent in the data: only 14 modules were tested for association instead of >23,000 genes
[25],
[34],
[35],
[36].
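The reduction in multiple-testing burden can be illustrated with a short calculation. The sketch below assumes a Bonferroni correction at a family-wise error rate of 0.05; neither the correction method nor the error rate is stated in the text, and both serve only as an illustration. The counts of 14 modules and 23,000 genes are taken from the paragraph above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05      # assumed family-wise error rate (not stated in the text)
N_MODULES = 14    # number of modules tested (from the text)
N_GENES = 23_000  # lower bound on the number of genes (from the text, ">23,000")

module_threshold = bonferroni_threshold(ALPHA, N_MODULES)
gene_threshold = bonferroni_threshold(ALPHA, N_GENES)

# How much less stringent the per-test threshold is at the module level.
fold = module_threshold / gene_threshold

print(f"per-module threshold: {module_threshold:.2e}")
print(f"per-gene threshold:   {gene_threshold:.2e}")
print(f"fold difference:      {fold:.0f}")
```

Under these assumptions, a module-level test needs to pass a threshold that is more than a thousand times less stringent than a gene-level test.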
To address the concern of expression changes induced by the effects of antipsychotics, we evaluated our modules in a second validation dataset of antipsychotic-free samples and found all 14 modules to be highly preserved between datasets, confirming previous observations that it is possible to infer stable and reproducible modules using whole blood RNA expression profiles
[37],
[38]. In addition, real-time qPCR experiments validated expression changes in the number one hub gene in the Tan schizophrenia module,
ABCF1, as well as the gene with the highest log fold change,
GZMH. Although all modules were found to be preserved, only 2 of the 12 schizophrenia-related modules remained significantly related to disease status in the validation data set of antipsychotic-free samples: the Salmon and Tan modules. The fact that 10 of the 12 modules were not associated with schizophrenia in the replication set may reflect large-scale effects of antipsychotic medication on gene expression in blood. This phenomenon warrants further investigation, since the majority of schizophrenia patients are readily treated and medication effects may therefore bias other studies likewise. Since schizophrenia is a severe mental disorder often presenting with acute psychosis, it is a major challenge to recruit patients who have not yet received antipsychotic medication. Other factors that may have affected the replication are the relatively small sample size of the second stage (40 controls and 29 antipsychotic-free cases) and the source of sampling, as no medicated patients were collected in Denmark. Due to limited availability of clinical data on the subjects, it was not possible to investigate differences in symptom profile or other parameters such as age of onset. However, there is no evidence that this has led to a systematic bias or a less representative patient group. Nevertheless, the strong conservation of the modules of co-expressed genes in the replication sample and the consistent finding of two modules associated with schizophrenia provide a strong lead for further study.
As expected, we find that many co-expression modules reflect blood cell types. Two modules (Blue and Brown) were enriched for markers of neutrophils, the most common leukocyte type. The Red module was enriched for lymphocyte markers and the Yellow module for red blood cell size markers. Finally, the Cyan module was enriched for genes related to time of blood draw. We did not find that the disease-associated modules (Tan and Salmon) relate to the studied blood cell markers. Since schizophrenia is a brain-related disorder, the question remains whether whole blood gene expression profiles can be informative for analysis. We therefore tested the two significant modules for enrichment of brain-expressed genes, on the assumption that an enriched module should be prioritized in our efforts. Interestingly, out of all of the observed modules, only the Tan module is significantly enriched for brain-expressed genes. For this reason we consider the Tan module the most likely to contain genes underlying neuropsychiatric disease. Since we do not have blood and brain samples available from the same individuals, we cannot formally correlate the expression values in these two tissue types.
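An enrichment test of this kind is commonly framed as a one-sided hypergeometric test. The following is a minimal sketch of that general technique, not necessarily the test used in this study; the function name and all gene counts are hypothetical and chosen only to show the calculation.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_annotated: int,
                           n_module: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the number of annotated (e.g. brain-expressed) genes seen when
    n_module genes are drawn without replacement from n_background genes,
    of which n_annotated carry the annotation.
    """
    max_overlap = min(n_annotated, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_module - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the study).
N_BACKGROUND = 20_000  # genes on the array
N_BRAIN = 2_000        # genes annotated as brain-expressed
N_MODULE = 200         # genes in the module of interest

# Expected overlap by chance is 200 * 2000 / 20000 = 20 genes.
p_enriched = hypergeom_enrichment_p(N_BACKGROUND, N_BRAIN, N_MODULE, 40)
p_chance = hypergeom_enrichment_p(N_BACKGROUND, N_BRAIN, N_MODULE, 20)

print(f"overlap of 40 genes: p = {p_enriched:.2e}")
print(f"overlap of 20 genes: p = {p_chance:.2f}")
```

With these made-up counts, an overlap at the chance expectation gives an unremarkable p-value, whereas doubling the overlap gives a very small one.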
The gene content and connections for the Tan schizophrenia module are visually represented in . Besides categories related to hematological function, the Tan schizophrenia module was also enriched for the Neurological Disease category (
CCL5,
PRKCQ,
PTAFR,
AKR1B1,
CD247,
IL10RA and
KHSRP). Moreover, this module was found to contain two genes previously suggested to be involved in schizophrenia, namely catechol-O-methyltransferase (
COMT) and phosphatidylinositol 4-kinase, catalytic, alpha (
PIK4CA).
COMT is involved in the degradation of endogenous catecholamines as well as in the metabolism of drugs used in many neuropsychiatric diseases. Results from association studies of the common Val158/108Met polymorphism with schizophrenia, however, remain ambiguous
[39],
[40],
[41],
[42]. The
PIK4CA gene encodes a catalytic enzyme in the phosphatidylinositol (PI) pathway, involved in the regulation of signal transduction, synaptic transmission and possibly of cell shape of neurons or oligodendrocytes. Like
COMT, this gene has been linked to psychiatric traits associated with the 22q11.2 deletion syndrome
[43],
[44], which is in turn characterized by increased prevalence of psychotic symptoms
[45],
[46].
Previous studies of blood-based gene expression in schizophrenia have been modest in size and identified a number of differentially expressed genes, but without a consistent pattern
[21],
[47],
[48]. Our relatively large sample size allows us to evaluate previous findings. One study found 123 genes to be differentially expressed in blood, of which 6 showed the same pattern in brain
[23]. Our results do not confirm differential expression of these 6 genes. A later study attempting to validate the differential expression of 7 genes from previous studies was only able to confirm the differential expression of
CXCL1
[49]. Our study did not confirm differential expression of
CXCL1, but of the other 6 genes,
S100A9 (under-expressed in schizophrenia) was found in the Yellow module, one of the modules that is negatively associated with schizophrenia in the medicated dataset.
A larger sample of 49 Japanese antipsychotic-free patients and 52 controls resulted in 792 differentially expressed probes in whole blood. A supervised artificial neural network analysis identified 14 of these as a predictor set for diagnosis with an accuracy of 91.2%
[50]. Of the 8 known genes in this list,
PGRMC1 was located in the Blue module, confirming over-expression in schizophrenia. The largest Caucasian blood-based study of schizophrenia, consisting of 32 untreated patients and 32 matched controls, found 180 differentially expressed genes
[24]. Of the under-expressed genes (97 known genes), we find 17 genes in modules negatively related to disease status in the medicated dataset. Of the over-expressed genes (79 known genes), 10 are located in modules positively related to disease status in the medicated dataset. Gene expression analysis in whole blood has also been used to investigate the association of psychosis with the ubiquitin proteasome system
[51],
[52] and a top list of 31 genes
[53]. We could only confirm down-regulation of
TCF4 (located in the Red module). Limited overlap in findings has characterized gene expression studies of schizophrenia so far, both in blood and brain. Although our study confirms a subset of genes found in previous studies, none of them are located in the Tan schizophrenia module. This may result from our large sample size and the fact that we had access to an independent antipsychotic-free sample set to validate our results. Of the genes in the Tan schizophrenia module, 18 genes (
YARS,
AARS,
MAGED1,
HSP90AB1,
AKR1B1,
NUP93,
SNX17,
DDOST,
PSMB2,
NCALD,
ACOT7,
IMP3,
SARS,
GLG1,
COMT,
MXD4,
GPR177,
ARAF) overlap with co-expression modules that were constructed using brain tissue and previously associated with schizophrenia
[15]. In addition, three genes (
ALDOC,
ENO1,
SDHA) were found to be differentially expressed in previous microarray studies examining brain (overview in
[15]).
Intramodular hub genes in disease-related modules have been found to be biologically and clinically interesting genes in several disease applications
[54],
[55]. The most connected intramodular hub gene in the Tan module is
ABCF1 (ATP-binding cassette, sub-family F (
GCN20), member 1). This gene is located within the MHC region at chromosome 6p. This gene-dense region is essential to the immune system and contains many human polymorphisms
[56]. Common variants from the MHC region have been reported to be associated with schizophrenia
[2],
[3].
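The notion of a most connected intramodular hub gene can be made concrete with a small calculation. The sketch below follows the usual WGCNA definition for an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power and a gene's intramodular connectivity is the sum of its adjacencies to the other genes in the module. The power of 6, the gene names, and the expression values are assumptions for illustration and are not taken from this study.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def intramodular_connectivity(expr: dict[str, list[float]],
                              beta: int = 6) -> dict[str, float]:
    """Sum of unsigned adjacencies |cor|^beta to all other genes in the module.

    beta is the soft-thresholding power (assumed here, not from the study).
    """
    genes = list(expr)
    connectivity = {gene: 0.0 for gene in genes}
    for i, gene_i in enumerate(genes):
        for gene_j in genes[i + 1:]:
            adjacency = abs(pearson(expr[gene_i], expr[gene_j])) ** beta
            connectivity[gene_i] += adjacency
            connectivity[gene_j] += adjacency
    return connectivity


# Toy module: hypothetical genes measured in five samples.
toy_module = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.0, 2.9, 4.2, 5.1],
    "geneC": [0.9, 2.1, 3.2, 3.9, 4.8],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}

connectivity = intramodular_connectivity(toy_module)
hub = max(connectivity, key=connectivity.get)

for gene, k in sorted(connectivity.items(), key=lambda item: -item[1]):
    print(f"{gene}: {k:.3f}")
print("hub gene:", hub)
```

In this toy module, the gene whose profile is most strongly correlated with the others receives the highest connectivity and is reported as the hub, while the weakly correlated gene contributes almost nothing because the power transformation suppresses low correlations.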
The fact that the expression of hub gene
ABCF1 was found to be
cis-regulated in a previous study suggests that a key member from the Tan module is regulated by the MHC region and in turn drives other genes in the module
[31],
[57]. The Tan module also contains a number of other genes within the MHC region: heat shock protein 90 kDa alpha (cytosolic), class B member 1 (
HSP90AB1); ring finger protein 1 (
RING1); casein kinase 2, beta polypeptide (
CSNK2B) and tubulin, beta (
TUBB).
Our results are consistent with the findings of existing GWAS for schizophrenia that highlighted involvement of the MHC region in disease susceptibility
[2],
[3]. While these GWAS included thousands of subjects, our study involved only a few hundred cases and controls. Moreover, gene expression data highlight individual genes within the MHC region that are differentially expressed in schizophrenia, whereas the association signals observed in GWAS reflect extensive patterns of linkage disequilibrium and cannot pinpoint single candidate genes.
It is striking that the robustly defined (Tan) schizophrenia module was significantly enriched with nominally significant SNPs from a GWAS. This supports the view that this module is independent of drug effects and represents primary disease effects. Previous studies found enrichment of disease-related SNPs in their co-expression modules of interest, implying that these modules represent causal effects
[58],
[59]. In addition, this approach highlights that network analyses can be used to reconstruct molecular phenotypes for identifying genetic association signals derived from pathways, rather than small effects from individual genes.
Overall, we show that gene expression profiling in whole blood provides new insight into the molecular networks that may underlie schizophrenia. By making use of an antipsychotic-free patient sample, we were able to replicate our findings and filter out large-scale medication effects. The Tan schizophrenia module that we identified is enriched with brain-expressed genes. In addition, there is enrichment of association signal in GWAS, which further supports causal involvement in disease susceptibility. Moreover, the association of genetic variants in the MHC region with the hub genes of this Tan schizophrenia module suggests that recent MHC association findings may increase schizophrenia susceptibility via altered gene expression of regulatory genes in this network. Future studies involving suitable model systems could aim to validate these causal hypotheses.