Few studies are able to access human neural tissue for studying diseases [27]. Given the difficulty of procuring human brain tissue versus the relative ease of measuring blood expression levels, a question of great practical importance is to what extent blood is a reasonable surrogate for brain in gene expression studies. Here we relate highly reproducible brain expression data from a recent meta-analysis of human brain data sets to two large blood data sets. Overall, we find that mean expression levels are only weakly preserved between three brain regions and blood (r in the range [0.24, 0.32]). Since gene expression profiles in human brain regions are organized into highly reproducible co-expression modules [12], it is important to determine which of these modules show evidence of preservation in blood. Only three out of 19 cortex modules, one out of 23 caudate nucleus modules and one out of 22 cerebellum modules show strong evidence of preservation. In blood, these five modules exhibit very similar expression patterns, as can be seen from the very high absolute correlations (|r| > 0.96) between their respective eigengenes (Figure ).
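To make the eigengene comparison concrete, the following is a minimal Python sketch rather than the R code used in this study. It assumes that a module eigengene is taken as the first principal component of the standardized module expression matrix, with its sign oriented to agree with the average expression of the module, and that both modules are measured in the same samples. The function names and the samples-by-genes array layout are illustrative assumptions.

```python
import numpy as np

def module_eigengene(expr):
    """Return the eigengene of one module.

    expr: array of shape (samples, genes) holding the expression values
    of the genes assigned to the module (layout is an assumption here).
    """
    # Standardize each gene to mean 0 and unit variance across samples.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # The first left singular vector is the first principal component
    # across samples, i.e. one value per sample.
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a singular vector is arbitrary; orient it so that it
    # correlates positively with the average standardized expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def abs_eigengene_correlation(expr_a, expr_b):
    """Absolute Pearson correlation between two module eigengenes.

    Both arrays must have their rows in the same sample order.
    """
    e_a = module_eigengene(expr_a)
    e_b = module_eigengene(expr_b)
    return abs(np.corrcoef(e_a, e_b)[0, 1])
```

An absolute correlation close to 1 indicates that the two modules follow essentially the same pattern across samples, up to sign.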
Although few modules were preserved, they tended to be relatively large: 67% of genes in the cortex network were part of one of the three preserved modules, while 41% of genes in the cerebellum network and 12% of genes in the caudate nucleus network were part of their respective preserved modules. Intramodular hub genes inside preserved modules are centrally located in both modules. The number of intramodular hubs depends on the thresholds used for the module membership measures in brain and blood. With our thresholds, 13.5% (357) of genes in the cortex network, 14.8% (305) of genes in the caudate nucleus network, and 13.8% (277) of genes in the cerebellum network were defined as preserved intramodular hub genes. Using our posted data and R software code, the reader can change the thresholds used for defining these hub genes. Our biological characterization of preserved intramodular hub genes is highly robust with respect to the chosen threshold values.
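The threshold-based definition of preserved intramodular hub genes can be sketched as follows. This is an illustrative Python sketch, not the posted R code. It assumes that module membership is the signed Pearson correlation between a gene's expression profile and the module eigengene, and that a gene counts as a preserved hub when its module membership exceeds a cut-off in both brain and blood. The function names and the cut-off values in the usage example are hypothetical and are not the values used in this study.

```python
import numpy as np

def module_membership(expr, eigengene):
    """Pearson correlation of each gene (column of expr) with the eigengene.

    expr: array of shape (samples, genes); eigengene: array of shape (samples,).
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    e = (eigengene - eigengene.mean()) / eigengene.std(ddof=1)
    # The dot product of standardized vectors divided by (n - 1)
    # is the Pearson correlation.
    return z.T @ e / (len(e) - 1)

def preserved_hub_genes(genes, mm_brain, mm_blood, brain_cut, blood_cut):
    """Genes whose module membership exceeds the cut-off in both tissues.

    The cut-offs are free parameters chosen by the user (hypothetical here).
    """
    keep = (np.asarray(mm_brain) > brain_cut) & (np.asarray(mm_blood) > blood_cut)
    return [g for g, k in zip(genes, keep) if k]

# Usage with made-up values: only gene "A" passes both cut-offs.
hubs = preserved_hub_genes(
    ["A", "B", "C"],
    mm_brain=[0.9, 0.8, 0.3],
    mm_blood=[0.7, 0.2, 0.9],
    brain_cut=0.6,
    blood_cut=0.5,
)
```

Raising or lowering either cut-off changes how many genes are retained, which is why the number of preserved hubs depends on the thresholds.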
In mice, mean expression levels of heritable genes have been found to be highly correlated between mouse hippocampus and spleen [28]. We do not find that heritable genes exhibit highly correlated mean expression levels between brain and blood (Additional file 17). However, we find that the preserved intramodular hub genes tend to be more heritable (Figure ).
The preserved CTX blue, green, and yellow modules were found to be enriched with neuronal markers, glutamatergic synapse genes, and metabolism-related genes, respectively. The preserved CN yellow module was also found to be enriched with metabolism-related genes, while the preserved CB blue module was enriched with neuronal markers and genes encoding synaptic proteins [12]. In blood, studying the enrichment with regard to brain cell type markers is not meaningful. However, one can classify blood cell types using human clusters of differentiation (CD) genes. Interestingly, the following CD molecules consistently have significant positive correlation with genes inside the preserved modules: CD58
A functional enrichment analysis of brain module preservation reveals basic functional pathways preserved between the two tissues. Figure shows that these preserved intramodular hub genes are significantly enriched for genes that play a role in infectious disease and infection mechanism, post-translational modification and RNA post-transcriptional modification. Other categories include Cell Death, Energy Production, Nucleic Acid Metabolism, Molecular Transport and Protein Trafficking (Figure ). The 36 intramodular hub genes that were preserved in all three sets exhibit several common functional themes. First, nearly 20% of these genes, including ASF1A, ATF2, DR1, HCFC1R1, HMGN4, MBD3, and RAD21, are known to play roles in modifying chromatin structure. Some of these modifications have been shown to induce transcription (e.g. ATF2, DR1, HMGN4), while others produce repressive effects (e.g. MBD3). Second, a number of other genes in the group of 36 encode signalling proteins that are thought to play roles in a wide variety of cellular processes, including ARPP-19, CSNK1G3, MAP4K5, PPP1CB, and YWHAQ. A third category of genes relates to protein trafficking and includes RAB1A, SNX2, and SNX3, while a fourth category consists of genes involved in mitochondrial function, including DLAT, SUCLA2, and YME1L1. Some of the proteins encoded by these 36 genes may physically interact, such as ATP6AP2, which associates with the transmembrane sector of vacuolar ATPases (proton pumps), and ATP6V1C1, which is a subunit of the vacuolar ATPase protein complex. Intriguingly, for a number of other genes in this group, biological functions remain to be elucidated (e.g. FAM3C, FLJ20254, LANCL1, PRNP, RABGGTB, and WRB). We note that many of these 36 preserved intramodular hub genes are expressed ubiquitously. Therefore, it is possible, perhaps even probable, that these genes are also co-expressed in other tissue types beyond brain and blood.
Their co-expression may therefore help to maintain differentiated cells in a particular state (e.g. chromatin modifying genes) in response to a particular environment (e.g. signalling genes), as well as enable other shared, basic cellular processes (e.g. protein trafficking, energy metabolism).
Our study has several strengths, including the use of multiple large data sets, carefully validated brain co-expression modules from Oldham et al. (2008), and a powerful statistical approach for evaluating module preservation.
Our study also has several limitations. First, the brain expression data were measured using the Affymetrix platform, while the blood expression data were measured using the Illumina platform. Since platform differences bias our results towards the null hypothesis of no preservation, we can be confident about preservation, but less confident about lack of preservation. The weak correlations between mean expression profiles may reflect platform differences. A second limitation is that we studied the preservation of brain modules in blood (and not vice versa). Our goal was to determine the preservation of robustly defined and well annotated brain modules. Defining blood modules and studying their preservation in brain tissue is beyond the scope of this article. A third limitation is the relatively small set of genes considered for the co-expression module preservation study. Oldham et al. (2008) applied stringent filtering criteria to construct the brain network, which greatly reduced the number of probes considered in that study. After combining probes by gene symbol and merging the brain and blood data, the co-expression module preservation study focused on 2604 CTX, 2001 CB, and 2063 CN network genes. We focused on this relatively small set of genes since their connectivity pattern in brain was found to be highly reproducible across array platforms and independent data sets (Oldham et al. 2008). We should point out, however, that our study of mean expression preservation involved 8799 genes. A fourth limitation is that we only use correlation network methodology. Many alternative co-expression network methods have been proposed in the literature [27]. We focus on WGCNA since i) this method was used in Oldham et al. (2008), ii) it is highly robust [19], and iii) it affords a geometric interpretation of network concepts [26]. An exploration of alternative procedures is beyond our scope, but we encourage readers to apply their own methods to our posted data.