Integrating quantitative proteomic and transcriptomic datasets promises valuable insights into the molecular mechanisms of the brain. We concentrate on recent studies using mass spectrometry and microarray data to investigate transcript and protein abundance in normal and diseased neural tissues. Highlighted are dual spatial maps of these molecules obtained using voxelation of the mouse brain. We demonstrate that the relationship between transcript and protein levels displays a specific anatomical distribution, with greatest fidelity in midline structures and the hypothalamus. Genes are also identified that have strong correlations between mRNA and protein abundance. In addition, transcriptomic and proteomic analyses of mouse models of Parkinson's disease are discussed.
The mammalian brain is a complex organ exhibiting a rich variety of gene expression patterns across a broad range of cell types. In-depth multimodal study is required to understand this complex network of cells and their associated transcripts and proteins. Recent advances in the quantitative detection of mRNA and proteins on a genomic scale permit localization of these gene products onto high-quality maps of the brain. Integrating this diverse quantitative information in a spatially resolved manner will be a potent tool in unraveling the molecular mechanisms of the brain, while shedding light on the etiology and pathology of disease. For reasons of accessibility and cost, the mouse (Mus musculus) is commonly used for these studies.
A number of methods are currently being employed to map and visualize gene products in the mouse brain. A recent large-scale effort by the Allen Institute for Brain Science used in situ hybridization (ISH) to localize transcripts on sections of the mouse brain. This approach produced single-cell-resolution images of gene expression patterns in a quantitative fashion for the entire genome. Although advances in automation have increased the throughput of ISH, this widely used and informative method can still be laborious and costly. Sample variability also remains a concern: quantitative comparison of expression between genes may be problematic, since mRNA levels for each gene are not measured in the same sample. Furthermore, anatomical analysis of normal and disease models would be impractical using ISH, because the interrogation is serial in nature and cost and labor increase linearly with the number of transcripts evaluated.
The Gene Expression Nervous System Atlas (GENSAT) visualizes gene expression in the brain using large stretches (~200 kb) of the mouse genome cloned into bacterial artificial chromosomes. These clones are large enough to cover the coding regions of genes along with many of their regulatory elements. The coding sequence of a gene in a bacterial artificial chromosome is replaced with that of enhanced green fluorescent protein (EGFP), and the construct is injected into fertilized mouse eggs to create transgenic mice. Patterns of neural gene expression are then traced by localizing fluorescence in tissue sections. Like ISH, this process is performed serially and is highly labor intensive. In addition, the approach does not work well with large genes (>250 kb).
High-throughput localization of protein abundance in the brain is more difficult than that of transcripts. This is largely because proteins display diverse chemical properties and cannot be manipulated with the generic approaches that work for nucleic acids. In principle, antibodies could be used to generate protein-abundance maps. However, this would require producing antibodies against all known proteins, a daunting task.
Mass spectrometry (MS), an analytical technique that identifies compounds by ionizing them and measuring the mass-to-charge (m/z) ratios of the resulting ions, shows much promise in elucidating regulatory mechanisms of protein abundance. This method can be applied to dissected brain regions, providing crude spatial maps. An interesting approach that gives higher-resolution spatial maps uses matrix-assisted laser desorption/ionization (MALDI) MS. A fresh-frozen brain section is mounted on a stainless-steel target plate, which is then coated with a matrix solution that absorbs energy from a laser beam while protecting the proteins. The laser beam rasters across a region of interest, desorbing and ionizing peptides and proteins that are then detected and quantified by the mass spectrometer. By collecting this information at each position across the section, 2D images of peptide localization can be reconstructed at a resolution limited mainly by the spacing between sampled spots (typically 50 μm). The identity of the detected peptides can be determined using automated protein database searching. Beyond the brain, the technique has been used for relatively rapid identification of cancer-specific markers in dissected tumors.
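Once each raster spot has yielded a spectrum, the image for a single peptide amounts to the signal inside a narrow m/z window plotted by position. The following is a minimal sketch, assuming spectra have already been peak-picked into (m/z, intensity) lists keyed by integer grid coordinates; the function name, this data layout and the 0.5 m/z tolerance are illustrative assumptions rather than any instrument's file format.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: raster position (x, y) -> peak-picked spectrum.
Spectrum = List[Tuple[float, float]]  # list of (m/z, intensity) pairs


def ion_image(spectra: Dict[Tuple[int, int], Spectrum],
              target_mz: float, tol: float = 0.5) -> List[List[float]]:
    """Build a 2D image by summing intensity within +/- tol of target_mz at each spot."""
    width = max(x for x, _ in spectra) + 1
    height = max(y for _, y in spectra) + 1
    image = [[0.0] * width for _ in range(height)]
    for (x, y), spectrum in spectra.items():
        # Only peaks inside the m/z window contribute to this peptide's pixel.
        image[y][x] = sum((i for mz, i in spectrum if abs(mz - target_mz) <= tol), 0.0)
    return image
```

For example, on a 2 × 2 raster the function returns one row of summed intensities per y position. Real pipelines would typically also normalize each spectrum (for instance to total ion current) before building images, which this sketch omits.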
Significant advantages of this MS imaging approach include its unbiased nature, high multiplexing capability and ability to detect high-molecular-weight polypeptides of up to 300 kDa in size. However, data acquisition using this approach can be time-consuming. Variability in sample preparation combining the matrix and the analyte and inconsistency in laser-impact angle can also lead to preferential ionization of soluble proteins and inconsistent results. As is the case for many imaging modalities, detection sensitivity comes at the expense of resolution.
Another drawback of MALDI MS tissue imaging is that it is less sensitive than electrospray ionization, a flow-based method often paired with MS for protein detection, in detecting noncovalent interactions between proteins. Electrospray ionization is not used for imaging but may serve as a complementary measure of protein composition in dissected brain regions. Standard methods, including immunohistochemistry and immunofluorescence, can be used to validate MS tissue-imaging findings at single-cell resolution.
Automated, multidimensional fluorescence microscopy using multiepitope-ligand cartography (MELC) is a recently heralded technique for generating spatial maps of protein localization. The system incorporates repeated cycles of in situ fluorescence tagging with affinity reagents, imaging and bleaching, and is capable of localizing hundreds of proteins in tissue sections with cellular resolution. Interestingly, this method can establish hierarchical clusters of protein colocalization and organization (the toponome). However, it requires libraries of affinity-based recognition agents, such as monoclonal antibodies, which limits the discovery of new proteins. In addition, the collected data are robust but binary rather than quantitative.
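Because each MELC cycle ends up as an essentially binary call per protein, every pixel can be summarized by the set of proteins present there, and tallying these sets gives a crude view of co-localization patterns. The sketch below assumes already-registered, normalized intensity images for each protein and a single global threshold; both are simplifying assumptions, the function name is hypothetical, and it is not the MELC software's own toponome analysis.

```python
from collections import Counter
from typing import Dict, List, Tuple


def combinatorial_phenotypes(stack: Dict[str, List[List[float]]],
                             threshold: float) -> Counter:
    """Binarize each protein channel and count the per-pixel presence patterns."""
    proteins = sorted(stack)  # fixed channel order so signatures are comparable
    rows = len(stack[proteins[0]])
    cols = len(stack[proteins[0]][0])
    counts: Counter = Counter()
    for r in range(rows):
        for c in range(cols):
            # A pixel's signature is the tuple of proteins called "present" there.
            signature: Tuple[str, ...] = tuple(
                p for p in proteins if stack[p][r][c] >= threshold)
            counts[signature] += 1
    return counts
```

The most frequent signatures indicate which protein combinations co-occur most often; a hierarchy could then be built by grouping signatures that share proteins.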
Another approach that gives dual transcript and protein maps of the brain is voxelation. This discovery-driven approach involves dividing the mouse brain into spatially registered voxels, or cubes. Analyzing the voxels with microarrays or MS allows parallel reconstruction of spatial images carrying quantitative information on transcripts or proteins, respectively.
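Because each voxel carries spatial coordinates, reconstructing an image for one gene or protein amounts to placing its measured values back onto the voxel grid. A minimal sketch, assuming integer (row, column) voxel coordinates and expressing each value relative to the section-wide mean; that normalization and the function name are assumptions for illustration, not necessarily the published processing.

```python
from typing import Dict, List, Optional, Tuple


def voxel_image(values: Dict[Tuple[int, int], float],
                n_rows: int, n_cols: int) -> List[List[Optional[float]]]:
    """Place per-voxel measurements on a grid; None marks positions outside the section."""
    mean = sum(values.values()) / len(values)
    grid: List[List[Optional[float]]] = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), v in values.items():
        grid[r][c] = v / mean  # relative level: 1.0 equals the section average
    return grid
```

Positions left as None correspond to grid cells outside the irregular outline of the coronal section, so the same grid can hold a transcript map and a protein map for direct comparison.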
A recent study voxelated a coronal section (bregma = 0.02 mm; interaural = 3.82 mm) from adult C57BL/6J mouse brains at a resolution of 1 mm. For each replicate, voxels from approximately 20 mice were pooled to obtain sufficient mRNA for a linear-labeling reaction, and the labeled material was hybridized to microarrays. Three biological replicates were obtained, and gene expression levels were measured using custom cDNA microarrays with a dye-swap design. This study allowed reconstruction of 20,000 2D images of gene expression.
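A dye swap hybridizes the same pair of samples twice with the two fluorescent labels exchanged, so that gene-specific dye bias can be averaged out. A minimal sketch of the usual combination, assuming the common Cy3/Cy5 dye pair and each voxel sample compared with a common reference; the reference design and the function name are assumptions, and the study's own normalization steps are not reproduced.

```python
import math


def dye_swap_log_ratio(cy5_a: float, cy3_a: float,
                       cy5_b: float, cy3_b: float) -> float:
    """Combine a dye-swapped pair of two-color spots into one log2(sample/reference).

    Array A: sample labeled with Cy5, reference with Cy3.
    Array B: sample labeled with Cy3, reference with Cy5 (dyes swapped).
    A gene-specific dye bias adds the same term to both log ratios, so it cancels.
    """
    m_a = math.log2(cy5_a / cy3_a)  # = true log ratio + dye bias
    m_b = math.log2(cy5_b / cy3_b)  # = -true log ratio + dye bias
    return (m_a - m_b) / 2
```

For instance, a true twofold excess in the sample combined with a Cy5 signal inflated 1.5-fold gives spot intensities (3.0, 1.0) on array A and (1.5, 2.0) on array B, and the function recovers a log2 ratio of 1.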
A number of analyses confirmed that the voxelation data were of good quality. These included good concordance between replicates, a high degree of congruence between the left and right hemispheres, good agreement with voxelation analyzed by quantitative reverse-transcription PCR for selected genes, and strong spatial and quantitative concordance of the voxelation dataset with ISH-derived images from the Allen Brain Atlas.
A clustering analysis of gene expression patterns from the mouse voxelation data revealed four distinct groups of genes corresponding to different mapping patterns. The first cluster consisted of genes expressed in the cortex, the second cluster contained genes showing a shallow dorsal/ventral gradient, the third cluster showed genes expressed in the hypothalamus, while the fourth cluster revealed genes with expression in the striatum and corpus callosum. Although the resolution of this method is not sufficient to localize gene products at a cellular level, clustering algorithms can provide the means to detect unanticipated patterns of regionally specific gene expression in the voxelation data. Interestingly, the cluster of genes expressed in a dorsal/ventral gradient was not restricted to a known anatomical distribution. One of the genes in the cluster, CBS, was implicated previously in dorsal neural tube defects in humans. These observations suggest that the cluster may be involved in specifying the dorsal/ventral axis of the mammalian brain.
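The clustering method used in the study is not specified here, so the sketch below swaps in plain k-means purely as an illustration of how genes can be grouped by the shape of their voxel profiles. Each gene's profile is z-scored first so that genes with the same spatial pattern but different overall expression fall together; the function names, farthest-first seeding and the z-scoring step are assumptions.

```python
from typing import List, Sequence


def zscore(profile: Sequence[float]) -> List[float]:
    """Center and scale one gene's voxel profile so clusters reflect shape, not level."""
    n = len(profile)
    mean = sum(profile) / n
    sd = (sum((v - mean) ** 2 for v in profile) / n) ** 0.5 or 1.0  # guard flat profiles
    return [(v - mean) / sd for v in profile]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means on z-scored profiles; returns one cluster label per gene."""
    data = [zscore(p) for p in profiles]
    # Farthest-first seeding keeps the example deterministic.
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda x: min(sqdist(x, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sqdist(x, centers[j])) for x in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean profile of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With four toy genes, two peaking in the first voxels and two in the last, kmeans(profiles, 2) separates them into two groups; with real data, k = 4 would correspond to the four patterns described above.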
A parallel study with the same brain coordinates used voxelation at a resolution of 1 mm in combination with capillary liquid chromatography (LC) Fourier transform ion cyclotron resonance (FTICR) MS to create spatial maps of relative protein abundance [9,10]. LC-FTICR MS is a powerful method for determining proteomic differences between normal brains and disease models, and it makes integration of transcript and protein abundance data practical. In this investigation, trypsin-catalyzed C-terminal labeling of peptides with the ¹⁶O and ¹⁸O isotopes was used to quantify relative protein levels. FTICR MS offers superior sensitivity, mass-measurement accuracy and dynamic range compared with other MS approaches, albeit at higher cost and operational complexity.
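During trypsin-catalyzed exchange in ¹⁸O-enriched water, both C-terminal carboxyl oxygens of a peptide can be replaced with ¹⁸O, shifting its mass by about 4 Da. The light and heavy forms of the same peptide then appear as a pair of peaks whose intensity ratio reports relative abundance. A minimal sketch, assuming one sample is left in ¹⁶O and the other is fully doubly labeled; it ignores incomplete (single) labeling and the overlap of the light peptide's natural M+4 isotope with the heavy peak, both of which real quantification software corrects for. The function names are hypothetical.

```python
import math
from typing import Tuple

O16 = 15.9949146196  # monoisotopic mass of 16O, Da
O18 = 17.9991596129  # monoisotopic mass of 18O, Da
PROTON = 1.00727646688  # proton mass, Da


def labeled_pair_mz(neutral_mass: float, charge: int) -> Tuple[float, float]:
    """m/z of the 16O (light) and doubly 18O-labeled (heavy) forms of a peptide."""
    shift = 2 * (O18 - O16)  # two exchanged C-terminal oxygens, about 4.008 Da
    mz_light = (neutral_mass + charge * PROTON) / charge
    mz_heavy = (neutral_mass + shift + charge * PROTON) / charge
    return mz_light, mz_heavy


def log2_ratio(intensity_16o: float, intensity_18o: float) -> float:
    """Relative abundance of the 16O sample versus the 18O-labeled sample."""
    return math.log2(intensity_16o / intensity_18o)
```

For a doubly charged ion the pair is separated by about 2.004 m/z units, which is well within the resolving power of FTICR instruments.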
The study successfully quantitated more than 1000 relative peptide abundances. To validate the observed data, protein abundance images were compared with existing mRNA expression data from the Allen Brain Atlas and GENSAT. Although there was not a simple direct correspondence between mRNA and protein levels, good agreement was found between the datasets, lending credence to this high-throughput proteomics methodology.
Integrating data from these voxelation-based mRNA and protein-abundance maps provided the opportunity to examine the relationship between the two molecular domains across a coronal section of the brain. Permutation testing revealed a significant correlation between relative transcript and protein levels. Figure 1 shows examples of agreement between ISH images from the Allen Brain Atlas and voxelation maps obtained with both microarrays and MS.
To create a spatial map of the fidelity between transcript and protein levels across the coronal section, we used the Pearson correlation coefficient to evaluate concordance between relative transcript levels and protein abundances in each voxel (Figure 2A). The p-value for each voxel was determined empirically by resampling the protein-level and mRNA-expression data 100,000 times to create a null distribution of r-values; the proportion of this null distribution more extreme than the observed r-value was taken as the p-value. Significance is shown as -log10(p-value) after false-discovery-rate correction.
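The per-voxel test described above can be sketched as follows: compute Pearson r between the paired transcript and protein values, shuffle one of the two vectors many times to build a null distribution, and then correct the resulting p-values across voxels. This is a minimal sketch under stated assumptions: shuffling a single vector is treated as sufficient to break the pairing, "more extreme" is read two-sided as |r|, the (hits + 1)/(n + 1) form is used so that no p-value is exactly zero (a small departure from the plain proportion), and Benjamini-Hochberg is assumed as the false-discovery-rate correction. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x: Sequence[float], y: Sequence[float],
                  n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided empirical p-value for Pearson r, shuffling y to build the null."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Mapping -math.log10(q) for each voxel's adjusted value onto the voxel grid gives a significance map of the kind described for Figure 2A; the same functions apply per gene when the vectors run across voxels instead of across genes.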
The central axis of the section showed the highest correlation between mRNA and protein levels, with pronounced bilateral similarity in the hypothalamus. This pattern of transcript/protein correlation may reflect the greater nuclear heterogeneity of the hypothalamus. These findings are consistent with a recent study in 3-month-old mice showing high correlation between transcript and protein levels across individuals, measured with Affymetrix microarrays and 2D gel electrophoresis, respectively. That work also used MELC to show high stability of protein levels in the stratum pyramidale and stratum radiatum of the hippocampus, both within samples and between individuals.
Conversely, we examined the correlation of transcript and protein levels for each gene across all 71 voxels of the coronal slice. Figure 2B shows a histogram of the -log10(p-values) of these per-gene Pearson correlations. Again, p-values were determined empirically for each gene by resampling the protein and mRNA expression levels 10,000 times to obtain a null distribution of the r statistic; the p-value is the proportion of null r-values more extreme than the observed value.
Table 1 shows genes with high correlation between mRNA and protein levels across the coronal slice (corrected p ≤ 0.0068). These may represent genes subject to little translational control. By contrast, for less well-correlated genes, post-transcriptional and post-translational regulatory mechanisms, including variations in molecular stability, may contribute to discrepancies between detected mRNA and protein levels. A recent study integrating transcriptome and proteome data from mouse brain tissue at two embryonic stages showed good concordance between the two datasets when significantly differentially expressed genes and proteins were compared at the level of expression ratios (E13.5 vs E9.5). However, absolute expression values showed a weaker correlation, which could be attributed to ambiguities in classifying genes that give rise to multiple transcripts and protein variants. Thus, alternative splicing in mammalian cells adds complexity to transcriptome and proteome comparisons, and examining expression at the exon rather than the gene level could improve the quality of overlap analysis. Additionally, proteins that rank higher in the toponomic hierarchy may correlate more closely with transcript levels than those that rank lower.
Another strategy for spatially resolving neural gene and protein expression involves transcriptomic and proteomic analysis of dissected brain regions. One recent study combined microarray analysis of the striatum with LC-FTICR MS.
Two different toxicological mouse models of Parkinson's disease (PD) were used, both of which recapitulate the hallmark loss of dopaminergic neurons in the substantia nigra pars compacta. These neurons project to the striatum, and their loss is thought to underlie the characteristic akinesia, rigidity and tremor of PD. The first model used 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP), while the second employed toxic doses of methamphetamine (METH).
When relative transcript changes between treated and control samples were evaluated with Affymetrix 430A 2.0 microarrays, the MPTP-treated striata showed 34 significantly upregulated and 29 downregulated genes, while the METH-treated striata showed 51 significantly upregulated and 40 downregulated genes (all with false-discovery rate < 0.05). There was significant overlap between the two disease models, with 17 upregulated and ten downregulated genes in common (χ² = 104; df = 1; p < 0.001). This reproducibility of relative expression levels extended beyond the significantly regulated genes to all genes (r = 0.718; p < 10⁻¹⁶).
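Overlap between two regulated-gene lists is commonly tested with a 2 × 2 contingency table (in list A or not, in list B or not) against the background of all genes measured. The exact test and background used in the study are not stated here, so the sketch below is an assumed Pearson chi-square without continuity correction; because the number of genes on the array is not given, it is not meant to reproduce χ² = 104. For one degree of freedom the upper-tail probability equals erfc(√(χ²/2)). The function name is hypothetical, and all expected cell counts are assumed to be nonzero.

```python
import math
from typing import Tuple


def overlap_chi_square(n_a: int, n_b: int, n_both: int, n_total: int) -> Tuple[float, float]:
    """Pearson chi-square (df = 1, no continuity correction) for overlap of two gene lists.

    n_a, n_b: genes called in list A and list B; n_both: genes in both;
    n_total: all genes tested (the shared background).
    """
    table = [[n_both, n_a - n_both],
             [n_b - n_both, n_total - n_a - n_b + n_both]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n_total  # count expected under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

When the observed overlap equals the count expected by chance, χ² is 0 and p is 1; an overlap much larger than expected drives χ² up and p toward zero.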
The shared genes may represent a common response of the striatum to the loss of dopaminergic afferents induced by the two different neurotoxins, while the unique genes represent idiosyncratic responses to each of the drugs. The shared genes in the two PD models implicated pathways involved in oxidative stress, mitochondrial dysfunction and proapoptotic activity. By contrast, Cyc1 and Gapdh were found with increased abundance in only the MPTP-treated mice, suggesting higher levels of cell death and oxidative damage in this model.
The MS investigation used ¹⁶O/¹⁸O labeling in conjunction with LC-FTICR MS. The analysis identified approximately 4600 unique peptides corresponding to 1614 proteins. Relative protein abundances were highly reproducible between biological replicates (r = 0.94 ± 0.02 within the METH and MPTP models). There was also a high degree of similarity between the METH and MPTP models (Pearson r = 0.89 ± 0.03), suggesting that the two neurotoxins may act through common pathways. In the METH and MPTP models, 149 and 199 proteins, respectively, had log2 ratios significantly different from zero (Student's t-test, p < 0.05). After additionally requiring a relative change greater than 25% to reduce false positives, 86 of these proteins were common to both lists.
Only two proteins, GFAP and GPX4, showed significant relative changes at both the transcriptomic and proteomic levels in each of the neurotoxin models. Consistent with this, there was no significant concordance between relative protein and transcript changes for either drug (MPTP: r = 0.034, p = 0.333; METH: r = 0.042, p = 0.216). It is possible that neither arrays nor MS offer sufficient reproducibility to reliably quantitate comparatively small relative changes (<30%). Translational and post-translational regulation may also complicate the relationship between relative transcript and protein values.
By contrast, comparing absolute transcript intensities across all experiments with absolute protein levels (estimated using spectral counts) revealed a significant positive relationship (r = 0.2889; p < 10⁻²¹). These results indicate that absolute transcript abundance is a modest but significant predictor of absolute protein abundance, even in the presence of post-translational regulation.
Based on the availability of both absolute protein and mRNA levels, it was possible to examine whether codon usage affected translational efficiency. Proteins were divided into two groups, efficiently and inefficiently translated, depending on their position in the regression scatterplot relating absolute transcript and protein levels in the striatum. Figure 3 plots the percent usage difference between the efficiently and inefficiently translated proteins in the striatum for each codon, with codons ranked from most to least common. As expected, the data showed that proteins with frequent codons tended to be efficiently translated and vice versa. As transcriptome and proteome analyses improve in sensitivity and reliability, analyses such as these can give insights into the regulation of molecular and translational processes in the cell.
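The classification step above can be sketched as: fit a least-squares line relating absolute protein level to absolute transcript level, call proteins above the line efficiently translated and those on or below it inefficiently translated, then compare pooled codon percentages between the two groups. This is a minimal sketch under assumptions not stated in the text: both measures are log2-transformed, the split is by the sign of the regression residual, all values are positive (zero spectral counts would need a pseudocount), and coding sequences are in-frame uppercase strings. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def split_by_translation(mrna: Dict[str, float],
                         protein: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Fit log2(protein) ~ log2(mRNA) by least squares; split genes by residual sign."""
    genes = sorted(set(mrna) & set(protein))
    x = [math.log2(mrna[g]) for g in genes]
    y = [math.log2(protein[g]) for g in genes]
    n = len(genes)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    efficient = [g for g, b, f in zip(genes, y, fitted) if b > f]      # above the line
    inefficient = [g for g, b, f in zip(genes, y, fitted) if b <= f]   # on or below it
    return efficient, inefficient


def codon_percent(cds_list: List[str]) -> Dict[str, float]:
    """Percent usage of each codon, pooled over a set of in-frame coding sequences."""
    counts = Counter(cds[i:i + 3] for cds in cds_list for i in range(0, len(cds) - 2, 3))
    total = sum(counts.values())
    return {codon: 100 * c / total for codon, c in counts.items()}


def usage_difference(eff_cds: List[str], ineff_cds: List[str]) -> Dict[str, float]:
    """Percent usage in efficiently minus inefficiently translated sequences, per codon."""
    eff, ineff = codon_percent(eff_cds), codon_percent(ineff_cds)
    return {c: eff.get(c, 0.0) - ineff.get(c, 0.0) for c in set(eff) | set(ineff)}
```

Sorting the output of usage_difference by each codon's overall frequency would give a plot analogous to Figure 3.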
Developing technology for high-throughput detection and quantification of proteins presents greater challenges than for transcripts. Each method for spatially mapping proteins and transcripts in the brain offers compelling information but also has significant drawbacks. Integrating data from complementary imaging modalities can offer a more comprehensive view of the brain. ISH is expensive and time-consuming but offers a single-cell-resolution view of transcript expression. Immunohistochemistry for visualizing proteins has similar strengths and weaknesses, with the additional expense of generating antibodies. MALDI tissue imaging provides good resolution (spot spacing typically about 50 μm) but is slow in data acquisition, and technical problems with matrix and sample preparation, along with laser angle, can lead to less reliable results. MELC is an innovative imaging approach with cellular resolution and the power to determine toponomic hierarchies, but its protein localization data are binary. Voxelation allows high-throughput acquisition of both transcript and protein mapping data and provides intermediate resolution. For all methods, cost is the major limitation, although at different steps of the analysis: capital equipment can be a daunting expense in MS, whereas personnel and consumables are the greater burden in ISH. A judicious combination of the available methods may allow optimal in-depth study of the normal and diseased brain in the future.
Information from disease models using all mapping approaches is lacking. Databases integrating many modalities of imaging across diverse disease states at differing time points will produce a clearer picture and understanding of neural disease. Currently, these resources are independently maintained but combining their strengths can prove a potent tool for the modern biologist to develop novel hypotheses. Further efforts to integrate different sources of information must be initiated and implemented.
Using voxelation to combine gene expression data collected with microarrays and protein abundance data obtained with high-throughput LC-FTICR MS will serve as a potent method to help unravel post-transcriptional and translational control in the brain. Integrating these complementary biological datasets will help identify brain regions where translational regulation varies, leading to novel biological hypotheses. These datasets will also offer the possibility of interrogating gene expression at various stages along the continuum of disease progression. Additional advances in transcript and protein detection will be required to decrease the size of voxels and improve resolution. Given the current rapid progress in transcriptomic profiling and MS, these improvements seem likely to arrive soon.
Voxelation can be extended to chart gene expression in three dimensions. Reconstructing these data in combination with other 3D imaging modalities, such as MRI, will allow a much clearer understanding of neural signaling at the transcript and protein levels. Voxelation and precision laser microdissection of neural cellular structures can also be paired with deep-sequencing technologies, which promise greatly improved sensitivity and robustness compared with current microarray platforms. Absolute rather than relative transcript abundance can be assessed in a nearly unbiased manner, allowing detection of novel transcripts and splice isoforms without a priori knowledge of sequence information.
Proteomic approaches face a multitude of future challenges, but the field continues to evolve. Enhancements in sensitivity and dynamic range owing to better LC separations and MS instrumentation will offer increased coverage of complex proteomes. Improved bioinformatic algorithms will facilitate comparison of transcript abundance and protein levels across all imaging modalities, leading to more reliable datasets and further biological insights at the systems level. For instance, combining ISH transcript data from the Allen Brain Atlas with protein-localization data from MELC would be an intriguing prospect, as both methods provide cellular resolution. In addition, future work will incorporate functional information about genes and their co-localization in brain structures. Integrating these large datasets will lead to an increasingly systems-level view of neural biology. We can therefore expect forthcoming work in neurogenomics and neuroproteomics to yield increasingly detailed signatures of disease, leading to improved characterization, diagnosis and treatment.
Financial & competing interests disclosure Portions of the research were supported by the NIH National Center for Research Resources (RR18522 to R.D.S.) and NIH grant R01 NS050148 to Desmond Smith. ISH images downloaded from the Allen Brain Atlas [Internet]. Seattle (WA, USA): Allen Institute for Brain Science. © 2008. Available from: www.brain-map.org. Proteomic analyses were performed in the Environmental Molecular Sciences Laboratory, a US Department of Energy (DOE) national scientific user facility located at the Pacific Northwest National Laboratory (PNNL) in Richland, WA, USA. PNNL is a multiprogram national laboratory operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL01830. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.