Genome-wide gene expression studies may provide substantial insight into gene activities and biological pathways differing between tissues and individuals. We investigated such gene expression variation by analyzing expression profiles in brain tissues derived from eight different brain regions and from blood in 12 monkeys from a biomedically important non-human primate model, the vervet (Chlorocebus aethiops sabaeus). We characterized brain regional differences in gene expression, focusing on transcripts for which inter-individual variation of expression in brain correlates well with variation in blood from the same individuals. Using stringent criteria, we identified 29 transcripts whose expression is measurable, stable, replicable, variable between individuals, relevant to brain function and heritable. Polymorphisms identified in probe regions could, in a minority of transcripts, confound the interpretation of the observed inter-individual variation. The high heritability of levels of these transcripts in a large vervet pedigree validated our approach of focusing on transcripts that showed higher inter-individual compared with intra-individual variation. These selected transcripts are candidate expression Quantitative Trait Loci, differentially regulating transcript levels in the brain among individuals. Given the high degree of conservation of tissue expression profiles between vervets and humans, our findings may facilitate the understanding of regional and individual transcriptional variation and its genetic mechanisms in humans. The approach employed here—utilizing higher quality tissue and more precise dissection of brain regions than is usually possible in humans—may therefore provide a powerful means to investigate variation in gene expression relevant to complex brain related traits, including human neuropsychiatric diseases.
Gene expression measures provide an intermediate phenotype linking underlying genetic variation with higher order phenotypes observed at the cellular, organ or organism level. Such intermediate phenotypes are particularly important for efforts to understand the genetic basis of heritable human neurobiological traits; the etiological complexity of such traits has continued to frustrate genetic mapping efforts.
The use of gene expression profiles as intermediate phenotypes has been motivated by the demonstration that transcript levels from healthy human subjects, as measured in peripheral blood mononuclear cells and derivative lymphoblastoid cell lines, are highly heritable traits which are usually stable over time (1–3). Numerous expression Quantitative Trait Loci (eQTL) involved in natural inter-individual differences in regulation of gene activity have been detected in blood cells (4). Identification of trans-regulatory elements, which control gene expression from distant genomic locations, and of cis-regulatory elements, which act on local expression, has provided the basis for a better understanding of transcriptional regulatory processes.
Inter-individual variation in cortical gene expression has been correlated with genome-wide genotypic variation (5). It is, however, extremely difficult to assemble sufficiently large samples of human brains to enable well-powered eQTL mapping studies relevant to brain and behavioral phenotypes. Such mapping studies, therefore, depend on a systematic strategy for identifying brain-related transcripts that demonstrate high inter-individual variability and that can be assessed in readily accessible surrogate tissues. The accessibility of blood makes it an attractive substitute for brain tissues, but there is so far little direct evidence for its suitability for this purpose.
Studies of different types of acute brain injury in rodents have revealed overlapping gene expression profiles between brain and blood (6–8). In humans, the utility of blood as a surrogate for brain tissues has been supported mainly through indirect investigations of expression profiles in relation to specific neurological and psychiatric diseases (9–13). Such studies have shown that, in both brain and blood samples, expression patterns distinguish between samples drawn from patients or control individuals.
Far fewer data are available from direct comparisons of brain and blood expression patterns. One such study (14) observed a correlation of about 0.5 in transcript levels among all genome-wide transcripts expressed in both brain and whole blood, including substantial correlations for transcripts representing putative candidate genes for schizophrenia. Further studies are needed to extend the catalog of transcripts whose levels of expression are reliably concordant between brain and blood. Such information is necessary for the development of large-scale studies aiming to map eQTL that are relevant to our understanding of brain and behavioral traits.
We describe here the results of direct comparisons of transcript levels between blood and brain samples in a set of related, apparently healthy vervet monkeys, members of an extended pedigree [the Vervet Research Colony (VRC)] in a species (Chlorocebus aethiops) widely employed as a model system in biomedical research. The recent divergence time (23–25 million years) between hominoids and Old World monkeys (15) is reflected in extensive similarities in neurophysiology and neuroanatomy, and in conservation of genomic sequences (~94% sequence identity between humans and vervets) that enables the use of human-specific tools to assess gene expression in Old World monkeys.
The key factors in performing reliable comparisons between blood and brain tissue gene expression are the collection of samples under uniform, well-controlled conditions, the use of a stringent protocol of tissue dissection and sample preservation, and the use of high-quality RNA for gene expression studies. Each of these goals is more readily achievable in a non-human primate model than in human studies. Large-scale gene expression profiling of human brain samples is limited by the frequently poor quality of preserved material and by inconsistencies between specimens in terms of tissue dissection and preservation, factors which may have contributed to inconsistent or discrepant results in efforts to replicate gene expression profiling studies of neuropsychiatric phenotypes (9,16–18). Additionally, environmental variables that may substantially influence gene expression, such as medication use, are much more difficult to control for in investigations of humans than in non-human primates.
Our study has taken advantage of several features of the VRC to maximize the reliability of expression comparisons between brain and blood, and to use the information from such comparisons to design well-powered eQTL mapping studies. We were able to obtain replicate measurements of blood expression from several monkeys and then to collect multiple tissues, including brain samples, from the same monkeys under highly controlled conditions. The colony management practices minimize environmental variability between the monkeys. These features enabled us (i) to characterize the extent to which inter-individual variation in brain gene expression, for transcripts selected using stringent criteria, is reflected in peripheral blood and (ii) to hypothesize that such variation largely reflects genetic variation. Using this stringent approach, we identify an initial set of transcripts whose expression levels show low intra-individual variation and large inter-individual variation and are sufficiently heritable to make them candidate expression phenotypes for eQTL mapping.
We determined gene expression profiles across nine tissues in 12 male vervets: eight brain regions [cerebellar vermis, pulvinar, head of caudate, hippocampus, occipital pole, orbital frontal cortex, frontal pole, dorsolateral prefrontal cortex (DLPFC)] and peripheral blood. Having transcript measures from different tissues from the same individuals allowed us to evaluate, for each transcript, sources of transcript level variation within and between individuals (Fig. 1). We further focused on two classes of transcripts, characterized by high variation of expression across brain regions or by high variation between individuals. Inter-individual variation that was high relative to both spatial intra-individual variation (between brain tissues and blood) and temporal intra-individual variation (between independent blood samplings) allowed us to investigate heritable brain gene expression traits in peripheral blood. The brain and blood gene expression data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (19) and are accessible through GEO Series accession number GSE15301 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15301).
To evaluate the impact of probe-target sequence incompatibility in our data set, we compared the number of probes widely detected in vervet brain tissues (at least 80% of tissues) and in 193 human cortical samples analyzed by Myers et al. (5), using 6791 probes that are in common between the two data sets. At this threshold, 2410 probes were detected in vervet brain and 4622 probes were detected in human cortex. There was considerable overlap in the probes detected by both studies. In spite of human–vervet sequence differences, 46% (2128/4622) of the probes detected in human cortex were also detected in vervet brain, whereas 88% (2128/2410) of the probes detected in vervet brain were detected also in human cortex, showing that human–vervet sequence differences do not prevent reliable detection of a substantial fraction of the brain transcripts.
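The two overlap percentages above differ because each uses a different denominator (probes detected in human cortex versus probes detected in vervet brain). A minimal Python sketch of that calculation, assuming detection calls are stored as one boolean per sample for each probe; the data layout and the helper names `widely_detected` and `detection_overlap` are hypothetical, not taken from the study:

```python
from __future__ import annotations


def widely_detected(calls: dict[str, list[bool]], min_fraction: float = 0.8) -> set[str]:
    """Probes detected in at least `min_fraction` of samples (80% in the text)."""
    return {probe for probe, flags in calls.items()
            if flags and sum(flags) >= min_fraction * len(flags)}


def detection_overlap(vervet: set[str], human: set[str]) -> tuple[int, float, float]:
    """Return (shared probes,
    fraction of human-detected probes also detected in vervet,
    fraction of vervet-detected probes also detected in human)."""
    shared = vervet & human
    return len(shared), len(shared) / len(human), len(shared) / len(vervet)


# Hypothetical toy input: probe -> detection call in each sample.
vervet_calls = {"p1": [True, True, True, True, True],
                "p2": [True, True, True, True, False],
                "p3": [True, False, False, False, False]}
human_calls = {"p1": [True] * 5, "p3": [True] * 5, "p4": [True] * 5}
print(detection_overlap(widely_detected(vervet_calls), widely_detected(human_calls)))
```

With the counts reported in the text, the same two ratios are 2128/4622 (about 0.46) and 2128/2410 (about 0.88).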
We used the detection status of all 22 184 probes represented on the Illumina HumanRef-8 version 2 chip across the tissues sampled in tissue set one to determine relationships in gene expression between eight brain regions. Distances between tissues were estimated on the basis of the probes that showed the most striking differences in terms of number of shared detections. Most such probes were either detected in only one tissue or detected in all or most brain tissues. Similarities between all possible pairs of tested tissues are illustrated by a heat map (Fig. 2A). Cortical tissues generally show the greatest similarity within this sample set, but three genes (KREMEN1, MED13L and ZMYM6) differentiate orbital frontal cortex from frontal pole and one gene (POLE) differentiates DLPFC and frontal pole. As reflected in a corresponding dendrogram showing relations between all brain tissues (Fig. 2B), neocortical regions cluster close to hippocampus and are more distantly linked to caudate and pulvinar tissues, with respect to the number of detected samples. Cerebellar vermis and pulvinar tissues are more distant from other brain tissues in terms of the number of differentially detected transcripts on the dendrogram.
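The exact rule used to turn detection status into tissue distances is not spelled out here, so the following is only a sketch under stated assumptions. It assumes that, for each tissue and probe, we know how many of the 12 individuals showed detection, and it counts as "strikingly different" any probe whose detection counts in two tissues differ by at least a hypothetical `min_diff` individuals. Pairwise counts of this kind could feed a heat map or a hierarchical clustering such as those in Fig. 2.

```python
from __future__ import annotations

from itertools import combinations


def detection_distance(counts: dict[str, dict[str, int]],
                       min_diff: int = 10) -> dict[tuple[str, str], int]:
    """counts[tissue][probe] = number of individuals (0-12) with the probe detected.

    Distance between two tissues = number of shared probes whose detection
    counts differ by at least `min_diff` individuals (hypothetical cutoff).
    """
    distances = {}
    for a, b in combinations(sorted(counts), 2):
        common = counts[a].keys() & counts[b].keys()
        distances[(a, b)] = sum(
            1 for probe in common
            if abs(counts[a][probe] - counts[b][probe]) >= min_diff
        )
    return distances


# Hypothetical detection counts for three probes in three tissues.
example = {
    "cerebellar vermis": {"p1": 12, "p2": 0, "p3": 12},
    "caudate": {"p1": 0, "p2": 0, "p3": 12},
    "hippocampus": {"p1": 1, "p2": 11, "p3": 12},
}
for pair, d in detection_distance(example).items():
    print(pair, d)
```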
When tissues are ranked according to the number of region-specific detections (Table 1), cerebellar vermis is first, followed by head of the caudate. Cerebellar vermis shows a tissue-specific presence of 38 transcripts and absence of 54 transcripts. Transcripts with decreased detection in this tissue are significantly enriched for genes associated with developmental processes and for genes encoding calcium-binding proteins. In the head of the caudate, 22 transcripts are differentially detected. Of these transcripts, 18 have a decreased number of detections and are enriched for genes associated with neuronal activities (P = 2.97E−04). Additionally, we grouped together the three frontal regions clustering at the bottom of the tissue dendrogram (frontal pole, DLPFC and orbital frontal cortex) and compared them against all other brain tissues, identifying two transcripts (GPR120 and RASGRF2) as specifically expressed in these frontal regions.
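The enrichment P-values quoted in this section come from a functional-annotation test whose details are not given here. A common choice is a one-sided hypergeometric test; the sketch below assumes that test, which may differ from the tool actually used. N is the number of background genes, K the number annotated to a category, n the size of the tested gene list, and k the number of listed genes in the category.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k): over-representation of a category."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def depletion_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X <= k): under-representation of a category."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, min(k, n) + 1))
    return hits / comb(N, n)


# Hypothetical toy numbers: 10 genes, 5 annotated, 5 tested, all 5 annotated.
print(enrichment_p(5, 5, 5, 10))
```

`math.comb` returns 0 when asked to choose more items than exist, so impossible configurations simply contribute nothing to the sums.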
We focused on transcripts that are ubiquitously detected across tissues and individuals, as these transcripts provide a means to investigate brain-related biology using peripheral blood samples. Of the 22 184 probes on the array, we identified 2481 probes, representing 2430 genes, for which expression was detected in all 12 individuals in all eight brain regions and in blood (Supplementary Material, Table S1). The regional mean expression levels of these 2430 ubiquitously expressed genes were examined in all pairwise comparisons of brain tissues, and the number of differentially expressed probes was used as a distance metric in a hierarchical clustering analysis (Supplementary Material, Fig. S2). The relations between tested tissues based on the number of ubiquitously detected transcripts with differential expression levels are mostly concordant with relationships determined from differentially detected transcripts (previous section) and with relationships known from human studies (20). The exceptions are the clustering together of occipital pole and pulvinar, and the unexpected placement of hippocampus closer to frontal cortex than occipital cortex (Supplementary Material, Fig. S2B). Unlike the tissue grouping based on extreme differences in detection count, this tissue clustering is based on a group of ubiquitously detected transcripts that is clearly depleted of transcripts showing the most extreme differences in expression between tissues, such as those whose expression is restricted to a specific tissue.
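The "ubiquitous" criterion is strict: a probe must be detected in every one of the 108 brain and blood samples (12 individuals × 9 tissues). A minimal sketch, assuming Illumina-style detection P-values and a hypothetical cutoff `alpha = 0.01`; the cutoff actually used is not stated in this section, and the function name is illustrative.

```python
from __future__ import annotations

from itertools import product


def ubiquitous_probes(det_p: dict[str, dict[tuple[str, str], float]],
                      individuals: list[str], tissues: list[str],
                      alpha: float = 0.01) -> list[str]:
    """Probes with detection P < alpha in every (individual, tissue) sample.

    A missing sample is treated as 'not detected', so it also excludes the probe.
    """
    samples = list(product(individuals, tissues))
    return sorted(
        probe for probe, calls in det_p.items()
        if all(calls.get(s, 1.0) < alpha for s in samples)
    )


# Toy example with 2 individuals x 2 tissues instead of 12 x 9.
det_p = {
    "probe_A": {("v1", "blood"): 0.001, ("v1", "DLPFC"): 0.002,
                ("v2", "blood"): 0.000, ("v2", "DLPFC"): 0.004},
    "probe_B": {("v1", "blood"): 0.001, ("v1", "DLPFC"): 0.200,
                ("v2", "blood"): 0.000, ("v2", "DLPFC"): 0.004},
}
print(ubiquitous_probes(det_p, ["v1", "v2"], ["blood", "DLPFC"]))
```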
Exclusion of the tissue-specific transcripts generates an unexpected topography in the tissue dendrogram, possibly indicating that these transcripts determine important regional features to a greater extent than ubiquitously expressed transcripts. Nevertheless, the ubiquitously expressed transcripts may still reveal important aspects of the biology of different brain regions: as many as 1474 ubiquitously expressed genes show increased or decreased expression levels in specific brain regions, based on pairwise comparisons of mean expression values between tissues (Supplementary Material, Fig. S2A). Consistent with the detection-based distance measure, the frontal cortical regions show smaller inter-tissue differences than the other tissues tested. Among the frontal regions, a differential mean probe signal was observed for 90 probes between DLPFC and frontal pole, 92 probes between DLPFC and orbital frontal cortex, and 215 probes between frontal pole and orbital frontal cortex.
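The clustering step treats the number of differentially expressed probes between two tissues as a distance. The sketch below assumes average linkage (UPGMA), which the text does not specify, and uses as input only the three frontal-cortex distances quoted above; the full eight-tissue matrix is in Supplementary Material, Fig. S2A and is not reproduced here.

```python
from __future__ import annotations

from itertools import combinations


def average_linkage(labels: list[str], dist: dict[frozenset, float]):
    """Agglomerative clustering with average linkage (UPGMA).

    `dist` maps each unordered pair of tissues to a distance (here, the number
    of differentially expressed probes). Returns merges as (cluster_a, cluster_b, height).
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            # Average of all tissue-to-tissue distances between the two clusters.
            d = sum(dist[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))
            if best is None or d < best[2]:
                best = (x, y, d)
        x, y, d = best
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(x + y)
        merges.append((x, y, d))
    return merges


# Differentially expressed probe counts reported in the text for the frontal regions.
frontal = {
    frozenset(("DLPFC", "frontal pole")): 90,
    frozenset(("DLPFC", "orbital frontal cortex")): 92,
    frozenset(("frontal pole", "orbital frontal cortex")): 215,
}
for step in average_linkage(["DLPFC", "frontal pole", "orbital frontal cortex"], frontal):
    print(step)
```

With these inputs, DLPFC and frontal pole merge first (height 90), and orbital frontal cortex joins at the average of 92 and 215, i.e. 153.5.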
Among the 2430 ubiquitously expressed genes, we identified genes showing region-specific differential expression, and we determined functional categories that were under- and over-represented among these genes (Supplementary Material, Table S2). Consistent with the results of the differential detection measures, cerebellar vermis and head of caudate show the largest differences from other tissues in the number of genes with differential mean expression levels (790 and 383, respectively). Genes differentially expressed between cerebellar vermis and all other tissues are associated with two interrelated functions, metabolic processes of nucleic acids and mRNA transcription. These two biological processes are significantly over-represented (P = 6.4E−11 and 7.06E−03, respectively) among 533 up-regulated genes and under-represented (P = 1.00E−05 and 3.84E−02) among 257 down-regulated genes, which may indicate distinctive transcriptional regulation mechanisms acting in this brain region. In head of caudate, a group of 244 up-regulated genes includes structural proteins of the small and large subunits of cytoplasmic (21 genes) and mitochondrial (5 genes) ribosomes. As a result, the molecular function of ribosomal proteins is significantly over-represented among the genes preferentially expressed in this tissue (P = 2.16E−04), suggesting that specific protein synthesis mechanisms and regulation may be characteristic of this brain region. Among the 139 genes down-regulated in head of caudate, kinases are over-represented (P = 4.05E−02).
The 2481 probes (2430 genes) ubiquitously detected in brain and blood tissues do not include transcripts showing the most extreme differences in cross-tissue expression patterns, such as transcripts expressed exclusively in a single region. Even so, a large number of these widely detected genes (1474) show differential expression between tissues. This set of genes is clearly depleted of genes related to many brain-specific functions, including signal transduction and neurogenesis, while it is enriched (P < 0.05) for genes related to the ubiquitin proteasome system, Parkinson disease (e.g. SNCA, synuclein-alpha) and Ras pathways, as well as genes involved in various biological processes and molecular functions related to mRNA and protein metabolism (Supplementary Material, Table S3).
Over-representation of genes involved in basic biological processes is consistent with a high representation of housekeeping (HK) genes, which by definition are widely expressed in numerous human tissues and are involved in maintenance of basal cellular functions. More than 44% of recognized HK genes (255/575) are ubiquitously detected in vervet brain and blood tissues. These include genes widely used as control genes in quantitative RT–PCR, whose expression is assumed to be constant among samples (21,22). It is therefore noteworthy that, of the genes used as endogenous controls, ARHGDIA, POLR2A and RPS18 showed differential regional expression and are therefore unsuitable for tissue-to-tissue normalization. As expected, this group of HK genes is enriched for genes involved in maintenance of constitutive functions such as protein and mRNA metabolism, ribosomal activity, energy release and cytoskeletal regulation (Supplementary Material, Table S4).
Thus, among the genes broadly expressed in brain and blood tissues, we detected a considerable number of ubiquitous transcripts previously reported as HK genes with stable expression across tissues. Analysis of blood and high-quality brain tissue from precisely dissected regions revealed that the levels of almost all of these ubiquitous transcripts vary between at least one pair of tissues tested in the current study.
To identify probes showing correlated expression profiles in brain and blood tissues, we estimated, for each brain region, the Spearman rank correlation (SRC) between the paired brain and blood expression measures across individuals. Additionally, to identify probes with greater inter-individual than intra-individual variation, we used a variance components approach to estimate the percent of total variation (PV) attributable to the inter-individual (between-monkey) component, as opposed to the within-monkey (between-tissue) component. Both measures (SRC and PV) for the whole brain and blood data set are available on the Integrated Vervet Monkey Genomics website: http://genomequebec.mcgill.ca/compgen/submit_db/vervet_web, which enables searches for similarities of expression profiles between eight brain regions and blood for specific genes. Examples of brain–blood expression patterns are shown in Supplementary Material, Figure S3, for highly correlated profiles (left column), moderately correlated profiles (middle column) and poorly correlated profiles (right column).
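For a single probe and brain region, the SRC compares the ranking of the monkeys by brain expression with their ranking by blood expression. A minimal sketch that computes Spearman's coefficient as the Pearson correlation of average ranks (the standard tie-aware definition); the function names and the toy values are illustrative only.

```python
from __future__ import annotations

from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman(brain: list[float], blood: list[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(brain), average_ranks(blood))


# Hypothetical log-scale signals for one probe in five monkeys (paired by index).
brain = [7.1, 7.9, 6.8, 8.4, 7.5]
blood = [9.0, 9.6, 8.7, 10.1, 9.8]
print(round(spearman(brain, blood), 3))
```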
Among the 2481 widely detected probes, 825 show PV >0.55 and SRC >0.55 in at least one brain region. The number of brain regions in which a probe exceeded the PV or SRC threshold was skewed toward zero, with most probes excluded: it ranged from zero brain regions (PV: n = 1329; SRC: n = 1574) to all eight brain regions (PV: n = 52; SRC: n = 28). The probes exceeding these PV and SRC thresholds for each brain region are shown in Supplementary Material, Table S5. Correlated expression patterns and high inter-individual variation (according to the above criteria) in all eight brain regions were observed for 23 genes: SPOCK2, HSBP1, CLN3, CIRBP, ANXA11, PNKP, CCT2, RPS20, GGA2, SCO1, CCDC115, DDOST, DUSP11, MRPL51, DMTF1, RPL31, RPL35A, LARP5, SS18L2, TUBA1B, C9orf114, SRF, MRPS17. Even though these genes displayed correlated patterns across all brain tissues, ten of them showed regionally increased or decreased expression levels; for example, the three ribosomal genes RPS20, MRPL51 and RPL35A are up-regulated in the head of the caudate in comparison with all other brain tissues. Supplementary Material, Table S6 shows the number of probes that met the PV and SRC criteria in the comparison between blood and each brain region. Both measures consistently identified hippocampus as the brain region most dissimilar to peripheral blood with regard to transcriptional profiles, but did not show considerable variation among the other brain tissues. Probes that passed the PV and SRC thresholds for at least one brain region show similar profiles in brain and blood and have more inter-monkey than intra-monkey variation. These selected probes can therefore be further studied in blood samples to identify brain gene expression traits.
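PV is described as coming from a variance components model, but the estimator is not given. The sketch below is one possible version under explicit assumptions: a balanced design (each monkey measured once in each tissue, e.g. one brain region plus blood), tissue removed as a fixed effect, monkey treated as a random effect, and classical ANOVA moment estimators rather than likelihood-based fitting. `passes_both` mirrors the 0.55 thresholds applied to PV and SRC in the text; both function names are hypothetical.

```python
from __future__ import annotations


def percent_variation(data: list[list[float]]) -> float:
    """Fraction of variance attributable to monkeys (between-individual component).

    data[i][j] = signal of one probe in monkey i, tissue j (e.g. j=0 brain region,
    j=1 blood). Balanced design assumed; tissue means are removed as fixed effects.
    """
    g, r = len(data), len(data[0])
    grand = sum(map(sum, data)) / (g * r)
    row = [sum(values) / r for values in data]                    # monkey means
    col = [sum(data[i][j] for i in range(g)) / g for j in range(r)]  # tissue means
    ms_monkey = r * sum((m - grand) ** 2 for m in row) / (g - 1)
    ms_resid = sum((data[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(g) for j in range(r)) / ((g - 1) * (r - 1))
    var_monkey = max(0.0, (ms_monkey - ms_resid) / r)  # truncate negative estimates at 0
    total = var_monkey + ms_resid
    return var_monkey / total if total > 0 else 0.0


def passes_both(pv: float, src: float, threshold: float = 0.55) -> bool:
    """Both criteria used in the text: PV > 0.55 and SRC > 0.55."""
    return pv > threshold and src > threshold


# Hypothetical signals for one probe: [brain region, blood] in four monkeys.
data = [[7.2, 9.1], [7.9, 9.8], [6.8, 8.9], [8.3, 10.4]]
print(round(percent_variation(data), 2))
```

For each probe, PV and SRC would be computed separately for each of the eight brain-region/blood pairings, and the number of regions passing both thresholds could then be tallied.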
For investigating specific regional brain functions, genes expressed in only one or a few brain regions are of interest. Such regional specificity, however, is likely to predict a lack of expression of these transcripts in peripheral tissues, and therefore the impossibility of using correlated brain and blood expression patterns to guide studies of such transcripts in peripheral blood. Indeed, only three transcripts in our data set (LHX1, PPP1R1B and RGS9) meet both the tissue-specific detection and brain–blood correlation criteria.
To assess the reproducibility of gene expression profiling, which could be affected by either technical variation or variation in transcript levels over time, we used expression data set two, derived from 18 monkeys each sampled twice for blood. From the initial list of 22 184 probes, we identified 1880 probes that were detected in peripheral blood in all 36 replicate samples from 18 monkeys. We required detection in all replicates in order to identify the most reliable transcripts for future eQTL mapping in blood samples. The group of 1839 genes represented by these 1880 probes is greatly enriched for genes involved in protein and mRNA metabolism and various signaling pathways, as well as pathways characteristic of peripheral blood such as lymphocyte activation and inflammation (Supplementary Material, Table S7). Among the most over-represented biological processes in this set of genes are those implicated in oxidative phosphorylation, apoptosis, immunity and defense, and cell cycle, structure and motility. Nucleic acid binding, and ribosomal and cytoskeletal functions are the molecular functions most enriched among these genes.
To determine the biological reproducibility over time of the gene expression profiles for these 1880 probes, we assessed the between-monkey versus within-monkey (between duplicate samplings) variance, including sex as a fixed effect in the model. Examples of the similarity in expression signal between replicate samples are shown in Supplementary Material, Figure S4, for highly correlated replicates (left column), moderately correlated replicates (middle column) and poorly correlated replicates (right column). There were significant differences (at the 0.05 level, uncorrected for multiple testing) between males and females in blood expression for 238 of the 1880 probes that passed the detection threshold.
For each of the 18 vervets with duplicate blood expression measures, the correlation of duplicate measures across all probes was high, ranging from 0.89 to 0.99. To identify transcripts with stable levels in blood but differential expression between monkeys, we focused on transcripts with more variation between monkeys than within monkeys (the latter reflecting temporal variation of transcript levels in blood and technical reproducibility), as defined by a percent of total variance attributable to the between-monkey component (PV) greater than 0.55. Among the 1880 probes that passed the detection threshold in peripheral blood replicate samples, there were 134 probes (representing 133 genes) with PV >0.55, indicating that for these probes the majority of the total variation in probe signal was between monkeys rather than within monkeys (between replicates over time). The combination of high inter-individual variation and high intra-individual reproducibility makes these selected genes suitable candidates for genetic mapping of their eQTL.
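The per-monkey replicate agreement quoted above is a correlation, across all probes, between the two blood samplings of the same animal. The text does not say which coefficient was used; this sketch assumes Pearson's, with expression values keyed by hypothetical probe IDs.

```python
from __future__ import annotations

from math import sqrt


def replicate_correlation(first: dict[str, float], second: dict[str, float]) -> float:
    """Pearson correlation across shared probes between two samplings of one monkey."""
    probes = sorted(first.keys() & second.keys())
    x = [first[p] for p in probes]
    y = [second[p] for p in probes]
    n = len(probes)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_range(replicates: dict[str, tuple[dict[str, float], dict[str, float]]]
                      ) -> tuple[float, float]:
    """Minimum and maximum replicate correlation across monkeys."""
    values = [replicate_correlation(a, b) for a, b in replicates.values()]
    return min(values), max(values)


# Hypothetical duplicate blood measurements for two monkeys.
replicates = {
    "monkey_1": ({"p1": 7.0, "p2": 8.5, "p3": 10.2}, {"p1": 7.1, "p2": 8.4, "p3": 10.0}),
    "monkey_2": ({"p1": 6.5, "p2": 9.0, "p3": 9.8}, {"p1": 6.9, "p2": 8.8, "p3": 10.1}),
}
print(correlation_range(replicates))
```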
We merged results from data sets one and two to select expression traits for future eQTL mapping. Among the 2481 probes that passed detection thresholds in brain and blood expression data set one, we identified probes that passed both the PV and SRC thresholds for brain–blood similarity (825 probes) and the PV threshold for biological reproducibility in data set two (130 probes; Fig. 3A). We identified 53 of the 2481 probes that met both sets of criteria, i.e. that had correlated expression profiles in brain and blood (for at least one brain region), showed more inter-individual than intra-individual (between tissues) variation, and were reproducible over time in blood. Next, we limited the list to probes that met the detection threshold (detection in all 36 measurements) in the biological reproducibility data set (Fig. 3B). The resulting subset of 32 probes passing all variation, reproducibility and detection thresholds is presented in Table 2.
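The merge amounts to intersecting three probe sets. A small sketch with hypothetical set names; the comments give the set sizes reported in the text.

```python
from __future__ import annotations


def select_candidates(brain_blood: set[str], blood_stable: set[str],
                      detected_in_all_replicates: set[str]) -> tuple[set[str], set[str]]:
    """Intersect the probe lists described in the text.

    brain_blood: probes passing PV and SRC for at least one brain region (825).
    blood_stable: probes passing the PV threshold in replicate blood samples (130).
    detected_in_all_replicates: probes detected in all 36 replicate blood samples.
    """
    both = brain_blood & blood_stable            # 53 probes in the study
    final = both & detected_in_all_replicates    # 32 probes in the study (Table 2)
    return both, final


# Hypothetical probe IDs.
both, final = select_candidates({"a", "b", "c", "d"}, {"b", "c", "e"}, {"c", "e"})
print(sorted(both), sorted(final))
```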
TUBA1B met both the SRC and PV thresholds, and four additional transcripts (BAT1, C19orf62, EIF1 and SUV420H1) met the PV threshold, in comparisons between blood and every brain region, suggesting a common regulatory mechanism acting across brain tissues. Region-specific correlation of brain expression with blood was observed in caudate (EIF1), cerebellar vermis (SMOX and TSPAN14), DLPFC (SLC25A23), frontal pole (RAB5A), hippocampus (STOM) and orbital frontal cortex (ERAL1 and TFE3); no region-specific correlation was observed for occipital pole or pulvinar.
Using transcript levels measured in the peripheral blood of 347 individuals (tissue set three) from the extended vervet pedigree, whose structure is known and genetically confirmed, we estimated the heritability of the 32 selected transcripts (Fig. 3B, Table 2). Twenty-nine of these expression traits showed significant heritability (P < 0.05), 25 of them at P < 0.001, and 62.5% of the traits (20 of 32) had heritability estimates of ≥0.4. The generally high heritability of these transcripts suggests that selecting transcripts whose inter-individual variation in transcript levels is greater than their intra-individual variation identifies transcripts whose regulation has a strong genetic component. Evidence consistent with this hypothesis is provided by much lower estimates of heritability in a comparison set of 32 transcripts chosen randomly from the data set (data not shown).
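Heritability here was estimated from the full 347-animal pedigree, presumably with a variance components model that uses all relationships at once; that model is beyond a short example. As a much simpler stand-in, shown only to illustrate what heritability measures, the classical regression of offspring values on midparent values (the average of sire and dam) estimates narrow-sense heritability as the slope. This is a swapped-in textbook estimator, not the method used in this study, and the data below are hypothetical.

```python
from __future__ import annotations


def midparent_offspring_h2(trios: list[tuple[float, float, float]]) -> float:
    """Slope of offspring value on midparent value, a simple estimate of h^2.

    trios: (sire value, dam value, offspring value) for one transcript.
    """
    mid = [(sire + dam) / 2 for sire, dam, _ in trios]
    off = [offspring for _, _, offspring in trios]
    n = len(trios)
    mx, my = sum(mid) / n, sum(off) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(mid, off))
    sxx = sum((a - mx) ** 2 for a in mid)
    if sxx == 0:
        raise ValueError("midparent values have no variance")
    return sxy / sxx


# Hypothetical expression values (log scale) for sire, dam and one offspring.
trios = [(7.0, 7.4, 7.3), (8.1, 7.7, 7.8), (6.9, 7.1, 7.2), (7.6, 8.2, 7.9)]
print(round(midparent_offspring_h2(trios), 2))
```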
Polymorphisms located within a probe sequence may cause differential hybridization that mimics differential expression and may lead to inaccurate estimates of the heritability of particular transcript levels. To assess the effects of such polymorphisms on our results, we sequenced probes for 16 transcripts showing significant heritability in the vervet monkey pedigree. Ten of these probes were monomorphic. For the six probes in which we detected SNPs (TUBA1B, TMED3, BAT1, CDKN1A, C19orf62 and SMOX), we compared expression level measures between SNP genotype classes to look for correlations between signal intensity and genotype. Four of these six probes showed marked correlation between signal intensity and genotype, raising the possibility that probe hybridization properties rather than differential gene regulation are responsible for the observed inter-individual variation in these transcripts. The probe for the SMOX transcript did not show such a signal intensity–genotype correlation, most likely because of the low frequency of the minor allele. The probe for the CDKN1A transcript showed consistent differences between genotypes in seven tested tissues but not in hippocampus and pulvinar. This observation suggests that the CDKN1A probe is sensitive to transcript level, but confirmation of this interpretation will require an alternative gene expression assay such as quantitative real-time PCR.
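Comparing signal intensity across SNP genotype classes can be sketched as a one-way ANOVA F statistic per probe and tissue. This is an assumed test, since the section does not name one. Empty genotype classes are dropped, which is exactly what limits power when the minor allele is rare, as for SMOX. A P-value could be obtained from the F distribution, for example with `scipy.stats.f.sf(F, k - 1, n - k)`; the sketch stays in the standard library and returns only the statistic.

```python
from __future__ import annotations


def genotype_f_statistic(groups: dict[str, list[float]]) -> float:
    """One-way ANOVA F for probe signal across SNP genotype classes."""
    groups = {g: v for g, v in groups.items() if v}  # drop empty genotype classes
    k = len(groups)
    n = sum(len(v) for v in groups.values())
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty genotype classes and n > k")
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    grand = sum(sum(v) for v in groups.values()) / n
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    if ss_within == 0:
        return float("inf")  # classes perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-scale probe signals grouped by genotype in one tissue.
signal = {"CC": [8.1, 8.3, 7.9, 8.0], "CT": [7.4, 7.6, 7.5], "TT": []}
print(round(genotype_f_statistic(signal), 2))
```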
Fourteen of the heritable transcripts showed differential expression levels across brain regions. This group includes transcripts specifically elevated in cerebellar vermis (five transcripts), head of caudate (three) and hippocampus (one). It will be of interest to examine further the genetic determinants of regional gene expression that may be involved in specific tissue functions.
We are developing genome-wide gene expression resources for the vervet that permit investigation of variation between different tissues within an individual and between individuals. Genome-wide gene expression studies may provide comprehensive insight into gene activities and biological pathways differing between tissues and individuals. Regional differences in brain gene expression levels relate to the specific functions of brain tissues, including their involvement in diseases that distinctively affect specific brain regions, while variation in brain gene expression profiles among individuals points to possible genetic factors regulating transcript levels. Our research addressed both kinds of gene expression variation, between brain regions and between individuals, by expression profiling of brain tissues from eight brain regions and of blood from 12 vervet monkeys.
We employed the vervet monkey as a non-human primate model to secure precisely dissected, high-quality brain tissues, which are difficult to obtain from human subjects. Gene expression profiles from distinct vervet brain regions are generally comparable to profiles of equivalent human brain regions, as indicated by the similarity of the hierarchical structure of tissue dendrograms (20). On the basis of the observed overall conservation of gene expression profiles between vervet and human brain regions, we expect that identification of genes that are preferentially or exclusively expressed in specific vervet brain regions may provide a means to identify genes with region-specific functions in humans. Several of these genes are already known to be involved in region-specific functions, for example the caudate-specific transcript RGS9, which is inversely correlated with striatal dopamine metabolism (23).
The primary focus of our study was to identify correlated brain and peripheral blood transcriptional profiles that could be reproducibly measured over time. We selected stable blood biomarkers of brain gene expression that may warrant further investigation of eQTL differentially regulating transcript levels in the brain across individuals. Our approach to selecting promising candidates for mapping brain eQTL using blood as a surrogate tissue included very restrictive criteria for probe filtering. We focused on transcripts with both ubiquitous expression across tissues and much higher inter-individual than intra-individual variation, which resulted in a relatively small number of candidate transcripts (32). However, both the high proportion of heritable transcripts in this selected set (90%) and their high estimated heritabilities demonstrate the utility of this approach for identifying transcripts whose variation has a strong genetic component. These heritable expression phenotypes can be genetically mapped using the VRC pedigree, which has proven to be an efficient tool for linkage studies (24,25). It is hypothesized that heritable variations in transcript levels are more directly related to underlying genetic variation than many other nervous system traits, and therefore these molecular phenotypes are attractive candidates for genetic mapping. Indeed, several studies using recombinant inbred mouse strains have clearly demonstrated the utility of eQTL studies focused on brain tissues for mapping complex traits related to these tissues (27–30).
We consider the 32 transcripts highlighted by this study to be an initial set of candidates for brain-related eQTL mapping using peripheral blood measurements. Further investigations using larger samples and more powerful technologies for measuring gene expression are likely to identify a much larger set of transcripts suitable for eQTL analyses. We anticipate, for example, that such investigations may incorporate a substantial number of transcripts for which we found suggestive evidence of brain–blood correlation in expression levels but which we excluded because of the stringency of our selection criteria. We further emphasize that the use of such rigorous criteria precludes us from drawing inferences about the overall degree of brain–blood correlation among the transcripts assessed in this study.
By applying rigorous inclusion criteria, we identified transcripts whose stable profiles in blood also reflect their profiles in brain. These transcripts therefore provide easily accessible biomarkers of brain gene expression variation. The substantial known similarities between human and vervet in regional brain expression suggest that the transcripts identified here as stably correlated between brain and blood in vervet may be similarly informative in human studies that assay inter-individual variation of expression in peripheral blood. Information from this type of investigation of vervets or other non-human primates could be very valuable, given that reliable brain–blood gene expression comparisons in human subjects are limited by the availability and quality of matched blood and brain tissues.
Studies of expression quantitative traits using brain–blood expression profile correlation in the vervet model also have wider implications for understanding complex human neurobehavioral traits. Several quantitative variables relevant to such traits are the focus of active genetic investigation in the vervet, including variation in neuroanatomic features (26), neurochemistry (24) and disease-related behaviors such as impulsivity (27). The opportunity to investigate these phenotypes longitudinally, in conjunction with gene expression profiling, is another important advantage of such a non-human primate model. The availability of such gene expression profiles may greatly facilitate the identification of the genetic variants underlying variation in these traits.
Two types of technical issues influence the interpretation of our results: the first concerns the complexity of the tissues analyzed, and the second concerns possible incompatibilities between microarray probe sequences and the transcripts under investigation. Tissue samples derived from anatomically different brain regions, even when carefully dissected, show substantial complexity at the cellular level (28). The diversity of cell types in the brain tissues we investigated may influence the results presented here on gene expression across brain regions and peripheral blood. Different brain regions also show different levels of cellular complexity (29). Such cellular heterogeneity may particularly affect low-abundance transcripts, which are known to be important contributors to the specificity of brain-related functions (29). An alternative to studying overall regional brain expression is high-resolution analysis of specific cells isolated from tissues by laser capture microdissection (30), although this approach is not currently practical for large-scale studies.
Sequence divergence between humans and vervets results in imperfect hybridization of human probes to vervet transcripts, which may lower transcript detection. However, most of the probes that were widely detected in vervet samples were also detected in a similar proportion of human brain samples. This observation demonstrates the overall similarity of expression between human and vervet brain and indicates that such false-negative results are unlikely to substantially affect the interpretation of the findings reported here.
False-positive results may arise from genetic variants in probe-interaction sites that cause differential probe hybridization between individuals. Although commercial microarrays are designed to minimize the inclusion of known common sequence variants in such sites, several studies have shown that a substantial fraction of probes contain at least one SNP (31,32). In our study, it is not currently possible to estimate, on a genome-wide basis, the degree of vervet genetic variation in the sequences corresponding to human probes. Interpretation of gene expression profiling in the vervet will be enhanced in the near future by data from the recently initiated Vervet Genome Sequencing Project (VGSP), which is not only determining the vervet reference genome sequence but also cataloging common vervet SNPs genome-wide.
Until such data are available, targeted resequencing is required to determine whether vervet genetic variation influences particular expression results. We have shown that most of the heritable transcripts we propose as candidates for eQTL mapping show true inter-individual variation in expression rather than hybridization artifacts. Although we identified genetic variation in more than a third of the probes that we sequenced, it is not evident that this proportion differs substantially from that observed in experiments where the array and the transcripts represent the same species. Rather, the potential for false-positive results reflects an intrinsic drawback of array-based approaches for assessing gene expression. This kind of false-positive effect may be enriched among transcripts that show correlation between blood and all brain regions, because a SNP in a probe-binding site alters hybridization in every tissue from a carrier animal and so produces apparent inter-individual differences shared by blood and all brain regions. Correlations observed between blood and only one or a few brain regions are therefore less likely to reflect the effect of a SNP on probe hybridization affinity.
As in other species, we anticipate that investigation of gene expression in the vervet will evolve from array-based approaches to more sensitive and specific sequencing-based approaches. These are likely to detect a much larger number of transcripts whose expression patterns in blood and brain are correlated and which will therefore be suitable candidates for genetic mapping of brain-relevant eQTL. The high-quality tissue and RNA resource that we have established will be extremely valuable for mapping brain expression traits using blood as a surrogate tissue.
Tissue specimens were collected from vervet monkeys that were related members of the VRC pedigree. Three tissue sets were collected within the VRC. Set one consisted of brain and blood samples collected from 12 monkeys. Set two consisted of replicate blood samples collected from 18 monkeys. Set three consisted of blood samples collected from vervets over 2 years old from the entire pedigree, n = 347.
Set one, used to compare gene expression between brain and peripheral blood, was collected from 12 male vervets aged ~3 years at the time of sample collection. The average kinship coefficient among the 12 monkeys in tissue set one was 0.025 (min = 0.0035, max = 0.154). These subjects were euthanized with ketamine (10 mg/kg, intramuscular) followed by an overdose of sodium pentobarbital (30–60 mg/kg, intravenous). Circulatory perfusion with cooled isotonic saline was achieved by cardiac cannulation; after ~15 min of perfusion, the calvarium was opened, the brain was removed and coronally blocked, and samples were microdissected on a thermostatic platform held at 4–6°C. Tissue was collected from eight brain regions: cerebellar vermis, pulvinar nucleus, head of the caudate, hippocampus, frontal pole, dorsolateral prefrontal cortex (DLPFC), orbital frontal cortex and occipital pole. The localization of the sampled regions is shown in Supplementary Material, Figure S1. A sample of 20–180 mg of tissue from each brain region was collected directly into RNAlater reagent (Ambion), which immediately stabilizes the RNA profiles.
Set two, enabling gene expression analysis of biological replicate samples, was collected twice, 20 weeks apart, from the peripheral whole blood of 18 VRC animals: 12 males (5.5–7.8 years old, mean 6.4 years) and six females (4.7–7.6 years old, mean 6.1 years). For blood collection, animals were anesthetized with ketamine HCl (8–10 mg/kg, intramuscular), and blood was collected by femoral venipuncture. The average kinship coefficient among the 18 monkeys sampled twice was 0.026 (min = 0, max = 0.204).
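The average, minimum and maximum kinship coefficients reported for tissue sets one and two are pairwise summaries over a pedigree-derived kinship matrix. The source does not say how these were computed; the following is a minimal sketch using the standard tabular (recursive) method, assuming a pedigree listed so that parents precede offspring. The function names and the toy pedigree are hypothetical.

```python
from itertools import combinations
from statistics import mean

def kinship_matrix(pedigree):
    """Kinship coefficients by the tabular method.

    pedigree: list of (id, father, mother) tuples, parents listed before
    offspring; unknown parents are None (founders).
    Returns a dict mapping (id_a, id_b) -> kinship coefficient.
    """
    ids = [ind for ind, _, _ in pedigree]
    phi = {}
    for i, (ind, fa, mo) in enumerate(pedigree):
        # Self-kinship = 1/2 * (1 + inbreeding), where inbreeding is the
        # kinship between the two parents (0 if either parent is unknown).
        parents_kin = phi.get((fa, mo), 0.0) if fa and mo else 0.0
        phi[(ind, ind)] = 0.5 * (1.0 + parents_kin)
        for other in ids[:i]:
            # Kinship with an earlier individual is the mean of the two
            # parents' kinships with that individual (unknown parent -> 0).
            k = 0.5 * (phi.get((fa, other), 0.0) + phi.get((mo, other), 0.0))
            phi[(ind, other)] = phi[(other, ind)] = k
    return phi

def kinship_summary(phi, subset):
    """Mean, min and max kinship over all distinct pairs in `subset`."""
    values = [phi[(a, b)] for a, b in combinations(subset, 2)]
    return mean(values), min(values), max(values)

# Hypothetical toy pedigree: C and D are full sibs; F is C's offspring.
toy_pedigree = [
    ("A", None, None), ("B", None, None),
    ("C", "A", "B"), ("D", "A", "B"),
    ("E", None, None), ("F", "C", "E"),
]
print(kinship_summary(kinship_matrix(toy_pedigree), ["C", "D", "F"]))
```

In a real analysis the subset would be the sampled animals and the pedigree the full VRC pedigree.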
Set three, used to estimate the heritability of gene expression, was obtained from the peripheral blood of 347 monkeys aged 2 years and older. For each blood sample, 2.5 ml of peripheral blood was drawn from the femoral vein directly into a PAXgene Blood RNA Tube (PreAnalytiX) containing a solution that preserves RNA integrity.
Total RNA from whole blood preserved in PAXgene Blood RNA Tubes (PreAnalytiX) was extracted using the PAXgene Blood RNA Kit (PreAnalytiX). Total RNA from RNAlater-preserved brain tissues was isolated with the PerfectPure RNA Cell and Tissue Kit (5 Prime) from 20–40 mg of brain tissue homogenized with a rotor-stator homogenizer (Omni International). Because total RNA quality is the most important confounding factor affecting expression levels, we evaluated the RNA integrity of all tissue types (Supplementary Material, Table S8) using the Agilent 2100 Bioanalyzer with the RNA 6000 Nano Assay Kit (Agilent Technologies). Average RNA integrity numbers (RINs) for the eight brain tissue types ranged from 7.5 ± 0.4 for the head of the caudate to 8.2 ± 0.5 for the cerebellar vermis (Supplementary Material, Table S8), and all RIN values were greater than 7. These values are generally higher and less variable than those usually obtained from human post-mortem brain tissue. For example, Lipska et al. (33) reported RINs of 5.7 ± 1.0 (hippocampus) and 6.7 ± 1.3 (DLPFC) for human tissues, whereas we obtained RINs of 7.6 ± 0.4 and 7.5 ± 0.3, respectively, for the same tissue types. For blood samples, the average RIN was 9.1 ± 0.6. Total RNA concentrations were quantified with the RiboGreen RNA assay (Invitrogen).
For assessing transcript levels, we used the Illumina HumanRef-8 v2 chip, which provides genome-wide transcriptional coverage of well-characterized genes. However, because splice isoforms are poorly represented on this chip, it does not allow efficient assessment of splicing QTL (sQTL).
This chip uses 22 184 probes representing 18 189 unique human genes (20 424 unique transcripts) from the Reference Sequence (RefSeq) database, Release 17. The Illumina gene expression platform uses long, 50-mer gene-specific probes that provide both good selectivity and sensitivity (34). Whereas short oligonucleotide probes (32-mers or shorter) are very sensitive to probe–target mismatches such as those resulting from sequence variation between individuals, long cDNA probes are less influenced by such mismatches (35–39). We used intermediate-length oligonucleotides, expecting that these probes would sufficiently tolerate sequence incompatibilities between human probe sequences and vervet target transcripts, and would be more robust than shorter probes to possible allele-specific differences in hybridization efficiency caused by vervet-specific SNPs in probe-interaction sites (40).
cDNA was synthesized and in vitro transcribed into biotinylated cRNA using the Illumina TotalPrep RNA Amplification Kit (Ambion), following the manufacturer's instructions. Labeled cRNA was hybridized to the HumanRef-8 version 2 gene expression BeadChip (Illumina). The gene expression module of BeadStudio software version 3.1 (Illumina) was used for initial data processing and background correction; a gene was called detectable by BeadStudio when its detection P-value was less than 0.01. The lumi software was also used to perform a variance-stabilizing transformation that takes advantage of the technical replicates available on every Illumina microarray (usually over 30 randomly distributed beads per probe), followed by robust spline normalization and quality control of gene expression measures (41). Expression profiles from all collected samples passed the lumi sample quality checks except one pulvinar sample, which was excluded from further analysis. The brain and blood sample set used for subsequent analyses therefore consisted of 12 samples each from blood and from every brain region except the pulvinar, which was represented by 11 samples.
For quality control, gene expression data were filtered on the basis of the detection scores defined by Illumina (detection P < 0.01). We used two criteria to select probes for further analysis: (i) detection in all brain and blood samples from the 12 animals in the brain and blood data set and (ii) detection in all 36 samples from the biological replicate blood data set (two samples from each of 18 vervets). Although both criteria reduced the number of transcripts considered as candidates for brain eQTL mapping, they ensured that the selected transcripts were consistently measured across samples despite possible sequence incompatibilities in cross-species hybridization. By applying such stringent detection criteria, we also excluded many potentially interesting transcripts that were detected in some, but not all, brain regions. Because such transcripts did not show sufficient correlation between the tissues of interest and blood, we chose not to examine them further through pedigree-wide expression profiling in blood.
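The two detection criteria amount to an "all samples below threshold" filter applied separately to each data set and then intersected. A minimal sketch, assuming detection P-values are held in NumPy arrays of shape (probes × samples) with rows aligned to the same probe order in both data sets; the function names are hypothetical.

```python
import numpy as np

DETECTION_ALPHA = 0.01  # Illumina detection P-value threshold used in the text

def detected_everywhere(det_p, alpha=DETECTION_ALPHA):
    """Boolean mask of probes with detection P < alpha in every sample.

    det_p: array of shape (n_probes, n_samples).
    """
    return np.all(np.asarray(det_p) < alpha, axis=1)

def select_probes(det_p_brain_blood, det_p_replicates, alpha=DETECTION_ALPHA):
    """Probes passing both criteria.

    Criterion (i): detected in all brain and blood samples (tissue set one).
    Criterion (ii): detected in all replicate blood samples (tissue set two).
    Rows of both arrays are assumed to refer to the same probes.
    """
    return (detected_everywhere(det_p_brain_blood, alpha)
            & detected_everywhere(det_p_replicates, alpha))
```

Note that a single sample above the threshold is enough to drop a probe, which is why region-restricted transcripts are lost at this step.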
For each brain region sampled in tissue set one, we used a mixed-model ANOVA and a variance components approach to compare blood expression levels with those in each of the eight brain regions. Each analysis included 12 paired expression measurements, one from blood and one from the given brain region of each animal. The total variation in expression for any probe can be divided into a between-monkey component and a within-monkey component (between blood and brain tissue from the same monkey). Probes for which most of the variation is attributable to the between-monkey component show more inter-monkey than intra-monkey variation and therefore vary less between blood and brain tissue than between monkeys. Because overall mean expression levels may differ between brain tissues and blood, brain and blood measures were first standardized to Z-scores separately for each tissue type by subtracting the mean and dividing by the standard deviation, both computed across animals. We also estimated the SRC between expression levels in blood and expression levels in each brain region.
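The per-probe brain–blood comparison can be sketched for one probe and one brain region as follows. This is not the authors' mixed-model code: it assumes a balanced design (one blood and one brain value per monkey) and uses the classical method-of-moments estimator from a one-way random-effects ANOVA. It also assumes that SRC denotes the Spearman rank correlation coefficient, computed here from ranks without tie handling. All function names are hypothetical.

```python
import numpy as np

def zscore(x):
    """Standardize across animals (sample SD, ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def between_monkey_fraction(a, b):
    """Share of variance attributable to monkeys in a balanced one-way
    random-effects ANOVA (method-of-moments estimator).

    a, b: paired measurements (e.g. Z-scored blood and brain) per monkey.
    """
    data = np.column_stack([a, b])              # rows = monkeys, cols = tissues
    n, k = data.shape
    monkey_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = k * np.sum((monkey_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - monkey_means[:, None]) ** 2) / (n * (k - 1))
    # Negative variance estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / k, 0.0)
    return var_between / (var_between + ms_within)

def spearman(a, b):
    """Spearman rank correlation as Pearson correlation of ranks (no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def brain_blood_stats(blood, brain):
    """Z-score each tissue separately, then return
    (between-monkey variance fraction, Spearman correlation)."""
    frac = between_monkey_fraction(zscore(blood), zscore(brain))
    return frac, spearman(blood, brain)
```

Z-scoring per tissue removes the overall blood-versus-brain mean difference, so the remaining within-monkey variance reflects only disagreement in how animals are ranked and spaced across the two tissues.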
Variance components analysis was also applied in a similar manner to the biological replicate data set (tissue set two). Here, the total variation in expression for any probe can be divided into a between-monkey component and a within-monkey component (between replicates from the same monkey). Probes for which most of the variation is attributable to the between-monkey component show more inter-monkey than intra-monkey variation and therefore vary less between replicates than between monkeys. Because the biological replicate analysis included both male and female monkeys, sex was included as a fixed effect in the model.
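For the replicate data, the main difference from the brain–blood comparison is the sex fixed effect. A simplified sketch, assuming two replicates per monkey: sex is regressed out by ordinary least squares, and the method-of-moments between-monkey fraction is computed on the residuals, with two degrees of freedom removed for the intercept and sex. This two-step approach is an approximation to a joint mixed model, not the authors' implementation. The function name is hypothetical.

```python
import numpy as np

def sex_adjusted_between_fraction(rep1, rep2, is_male):
    """Between-monkey variance fraction for two replicates per monkey,
    after removing a sex fixed effect by least squares.

    rep1, rep2: expression values of one probe at the two time points
    is_male: 0/1 indicator per monkey (same order as rep1/rep2)
    """
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    y = np.concatenate([rep1, rep2])
    sex = np.concatenate([is_male, is_male]).astype(float)
    X = np.column_stack([np.ones_like(y), sex])       # intercept + sex
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = (y - X @ beta).reshape(2, -1).T           # rows = monkeys, cols = replicates
    n, k = resid.shape
    monkey_means = resid.mean(axis=1)
    # Intercept and sex are monkey-level effects, so n - 2 between-monkey df remain.
    ms_between = k * np.sum((monkey_means - resid.mean()) ** 2) / (n - 2)
    ms_within = np.sum((resid - monkey_means[:, None]) ** 2) / (n * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)
    return var_between / (var_between + ms_within)
```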
Probes that are differentially expressed in brain regions in tissue set one were determined using the limma package (42) of the Bioconductor project, a linear modeling approach that uses an empirical Bayes method in which the t-statistic standard errors have been moderated across probes. This procedure effectively borrows information from the ensemble of probes to aid with inference about each individual probe (43). The correlation of repeated measures from the same vervet was addressed using a version of a mixed model analysis that again borrows information across probes (42). All pairwise contrasts between the nine tissues (eight brain regions and blood) were assessed. The false-discovery rate procedure was used to control for multiple testing. Only probes that were detected in all twelve individuals in all available brain and blood samples were examined for differential expression.
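The limma moderated t-statistics are computed in R and are not reproduced here. The multiple-testing step can be illustrated on its own, however. The text does not name the false-discovery rate procedure, so the sketch below assumes Benjamini–Hochberg adjustment (limma's default in `topTable`), applied to P-values from the 36 pairwise contrasts among the nine tissues. The constant and function names are hypothetical.

```python
from itertools import combinations
import numpy as np

# The nine tissues of tissue set one: eight brain regions plus blood.
TISSUES = ["cerebellar_vermis", "pulvinar", "caudate_head", "hippocampus",
           "frontal_pole", "DLPFC", "orbital_frontal_cortex",
           "occipital_pole", "blood"]
PAIRWISE_CONTRASTS = list(combinations(TISSUES, 2))  # 9 choose 2 = 36

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

Probes whose adjusted value falls below the chosen FDR level for a given contrast would be called differentially expressed between that pair of tissues.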
The number of genes differentially expressed between each pair of tissues in tissue set one was used as a distance metric, and hierarchical clustering with complete linkage (44) was applied to this distance matrix to cluster tissues. This clustering analysis used only probes that were detected in all 12 individuals in all brain and blood samples. Probes that were not detected in some tissues were therefore excluded from the cluster analysis, even though they may provide useful information on differences among tissues. Further comparisons of expression in different brain tissues were therefore performed with all 22 184 probes, using as a distance metric the number of probes differentially detected between the two tissues. A probe was considered differentially detected between two tissues if the number of animals in which it was detected (by the Illumina detection criterion described above) was less than two in one tissue and nine or more in the other tissue. Probes detected in three to eight individuals in either of the two tissues being compared were not considered differentially detected.
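Both distance metrics feed the same clustering step. A minimal sketch, assuming per-probe detection counts (number of animals with detection P < 0.01) are available for each tissue. The differential-detection rule uses the thresholds stated in the text, and complete linkage is implemented naively so that the merge order is explicit. The function names and the toy distance matrix are hypothetical.

```python
import numpy as np

def differentially_detected(n_detected_a, n_detected_b, low=2, high=9):
    """Number of probes detected in fewer than `low` animals in one tissue
    and in at least `high` animals in the other (either direction)."""
    a = np.asarray(n_detected_a)
    b = np.asarray(n_detected_b)
    hits = ((a < low) & (b >= high)) | ((b < low) & (a >= high))
    return int(np.sum(hits))

def complete_linkage(dist, labels):
    """Naive agglomerative clustering with complete linkage.

    dist: symmetric (n x n) distance matrix, e.g. counts of differentially
    expressed or differentially detected probes between tissues.
    Returns the merges as (cluster_a_labels, cluster_b_labels, height).
    """
    d = np.asarray(dist, dtype=float)
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Complete linkage: largest pairwise distance between members.
                h = max(d[p, q] for p in clusters[x] for q in clusters[y])
                if best is None or h < best[0]:
                    best = (h, x, y)
        h, x, y = best
        merges.append(([labels[i] for i in clusters[x]],
                       [labels[i] for i in clusters[y]], float(h)))
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return merges

# Hypothetical distances (numbers of differentially expressed probes).
toy_labels = ["DLPFC", "frontal_pole", "caudate", "blood"]
toy_dist = [[0, 10, 200, 900], [10, 0, 220, 950],
            [200, 220, 0, 880], [900, 950, 880, 0]]
print(complete_linkage(toy_dist, toy_labels))
```

For real data, `scipy.cluster.hierarchy.linkage(..., method="complete")` on the condensed distance matrix would be the idiomatic choice; the explicit loop is used here only to keep the sketch dependency-light.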
Heritability of gene expression in blood of selected transcripts was estimated with data from 347 subjects from the extended vervet pedigree (tissue set three), using a variance component analysis method as implemented in SOLAR (45). Sex, age and sample grouping during microarray experiment (batch) were included as covariates.
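SOLAR estimates heritability by maximum likelihood under a polygenic variance-components model. The sketch below is not SOLAR. It is a simplified profile-likelihood grid search for h² under the same basic model, y = Xβ + g + e with Var(y) = σ²[h²·2Φ + (1 − h²)I], where X holds covariates such as sex, age and batch. It assumes a precomputed kinship matrix Φ for which 2Φ is positive definite. The function name and grid resolution are hypothetical choices.

```python
import numpy as np

def estimate_h2(y, X, kinship, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search ML estimate of narrow-sense heritability.

    y: expression values; X: covariate design matrix (with intercept);
    kinship: kinship coefficient matrix Phi (2*Phi assumed positive definite).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.size
    # Eigendecompose 2*Phi once; the same eigenvectors diagonalize
    # h2 * 2*Phi + (1 - h2) * I for every h2 on the grid.
    d, U = np.linalg.eigh(2.0 * np.asarray(kinship, dtype=float))
    d = np.clip(d, 0.0, None)
    ys, Xs = U.T @ y, U.T @ X
    best_h2, best_nll = None, np.inf
    for h2 in grid:
        w = h2 * d + (1.0 - h2)              # eigenvalues of the scaled covariance
        sw = np.sqrt(w)
        Xw, yw = Xs / sw[:, None], ys / sw
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)  # GLS covariate effects
        rss = float(np.sum((yw - Xw @ beta) ** 2))
        sigma2 = rss / n                     # ML estimate of the variance scale
        nll = 0.5 * (n * np.log(sigma2) + np.sum(np.log(w)))  # profiled -loglik (+const)
        if nll < best_nll:
            best_h2, best_nll = float(h2), nll
    return best_h2
```

Intuitively, trait values that are shared within families and differ between them push the estimate towards 1, whereas variation confined within families pushes it towards 0. A full analysis would also test h² against zero with a likelihood-ratio test.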
Functional category enrichment analysis was performed using the Panther classification tool (http://www.pantherdb.org/) (46). Briefly, this tool maps the gene list of interest to the Panther ontology and compares it with a selected reference gene list to identify over- and under-represented terms. The expected value for a given Panther category is the number of genes expected in the gene list on the basis of the term's incidence in the reference list. A total of 17 253 of the 18 189 HumanRef-8 v2 chip genes were represented in the Panther ontology, and we used these as the reference gene list for comparison with lists of selected genes. Binomial-test P-values were obtained for Panther classification categories for each molecular function, biological process and pathway term, and a Bonferroni correction was applied to correct for multiple testing. Owing to sequence incompatibilities in cross-species hybridization, gene lists selected on the basis of gene expression measures may be biased against rapidly evolving genes, so functional categories attributable to such genes may appear under-represented in our analysis.
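The expected-count and binomial logic described above can be sketched directly. This is an illustration of the statistic, not Panther's code. It assumes a one-sided binomial tail in the direction of the deviation from expectation and a Bonferroni factor equal to the number of terms tested. In the study the reference list would contain 17 253 genes. The function names and toy counts are hypothetical.

```python
from math import comb

def binomial_tails(k_obs, n, p):
    """Return (P[X >= k_obs], P[X <= k_obs]) for X ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(pmf[k_obs:]), sum(pmf[: k_obs + 1])

def term_enrichment(list_counts, ref_counts, n_list, n_ref):
    """Per-term binomial over/under-representation test with Bonferroni.

    list_counts / ref_counts: term -> number of annotated genes
    n_list / n_ref: sizes of the selected gene list and the reference list
    Returns term -> (observed, expected, Bonferroni-corrected P).
    """
    n_terms = len(ref_counts)
    results = {}
    for term, ref_k in ref_counts.items():
        p = ref_k / n_ref                    # term incidence in the reference list
        observed = list_counts.get(term, 0)
        expected = n_list * p
        p_over, p_under = binomial_tails(observed, n_list, p)
        # One-sided test in the direction of the observed deviation.
        p_raw = p_over if observed >= expected else p_under
        results[term] = (observed, expected, min(1.0, p_raw * n_terms))
    return results

print(term_enrichment({"termA": 5, "termB": 5},
                      {"termA": 10, "termB": 50}, n_list=10, n_ref=100))
```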
Genomic DNA from the 12 monkeys used for the brain–blood gene expression comparison was used to sequence the probe-interacting regions of 16 heritable transcripts (Table 2). PCR amplicons were sequenced using a standard BigDye Terminator v3.1 procedure.
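Once probe regions are resequenced, two things are worth flagging: sites that are polymorphic among the sequenced animals (potential sources of false-positive inter-individual variation) and fixed human–vervet differences (which may lower detection but affect all animals alike). A minimal sketch, assuming each animal's sequence is already aligned, without gaps, to the 50-mer probe in the same orientation, and that heterozygous calls use two-base IUPAC ambiguity codes. The function names and toy sequences are hypothetical.

```python
# Two-allele IUPAC ambiguity codes used for heterozygous base calls.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

def polymorphic_sites(aligned_by_animal):
    """Positions where the animals carry more than one allele.

    aligned_by_animal: dict animal -> probe-region sequence (equal lengths).
    Returns a list of (position, alleles) with alleles as a sorted string.
    """
    seqs = [s.upper() for s in aligned_by_animal.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        alleles = set()
        for s in seqs:
            # Expand heterozygous codes into their two alleles.
            alleles.update(IUPAC_HET.get(s[pos], s[pos]))
        if len(alleles) > 1:
            sites.append((pos, "".join(sorted(alleles))))
    return sites

def fixed_mismatches(human_probe, aligned_by_animal):
    """Positions where all animals share one base that differs from the probe."""
    seqs = [s.upper() for s in aligned_by_animal.values()]
    out = []
    for pos, probe_base in enumerate(human_probe.upper()):
        bases = {s[pos] for s in seqs}
        if len(bases) == 1 and probe_base not in bases:
            out.append(pos)
    return out

toy_animals = {"v1": "ACGTA", "v2": "ACGTA", "v3": "ACRTA"}
print(polymorphic_sites(toy_animals), fixed_mismatches("ACGTT", toy_animals))
```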
Financial support was provided by NIH grants: PL1NS062410, R01RR16300, RL1-MH083270, P50-MH077248 and P40 RR19963. Funding to pay the Open Access publication charges for this article was provided by NIH grant R01RR016300.
Conflict of Interest statement. None declared.