As the first stage of a genomic convergence analysis of SCZ, we sought gene expression changes between post-mortem cerebellar cortices of 14 patients and 6 controls using shotgun, clonal mRNA sequencing, together with a rigorous and systematic analysis approach. Cerebellum was chosen for study since it has been shown to be affected in SCZ: Patients exhibit impaired sensory-motor integration, reduced cerebellar volume and grey matter, increased cerebellar blood flow and glucose uptake, both at rest and following cognitive challenge, greatest approximate conditional likelihood score in cerebellar vermis, and gene expression changes in the lateral hemispheres of the cerebellar cortex consistent with decreased GABAergic and increased glutamatergic neurotransmission
[32],
[51],
[52],
[58]–
[66]. In addition, the simple organization and lack of perturbation by antipsychotic medication of cerebellar cortical circuits relative to other brain regions affected by SCZ was appealing for the current study. Specifically, granule cells, which are important in processing sensory information, are by far the most abundant cerebellar glutamatergic neuron, comprising the majority of neurons in the brain
[67].
Cases and controls were all male and were matched reasonably well for age, race, cause of death and RNA quality metrics. Since only cases had received psychotropic medication, this variable could not be matched. There is currently considerable excitement about stratification of patients with SCZ on the basis of clinical or neuroimaging endophenotypes
[20]. Unfortunately, that was not possible in this series.
Approximately 22 million random, cDNA sequence tags were generated from each of 20 samples, providing the deepest mRNA coverage reported to date and providing an unparalleled assessment of the transcriptional complexity of the human cerebellar cortex. Sequences did not align to 1879 viral genomes, offering negative evidence of chronic viral etiology for SCZ in these patients. 85% of annotated transcripts were expressed (based on molecular counting, ), of which 63% were detected at a level of at least 1-fold coverage (equivalent to 56 reads per million or 1.1 copies per million). This pervasive transcription accords with other recent studies
[32],
[44],
[56],
[68]. In contrast to 3′ end tag sequencing, random-primed mRNA sequencing provided good coverage of coding domains (), allowing assessment of exon utilization in cerebellar cortex. Alternative splicing is thought to generate more than 5 transcripts per human locus, most of which have not yet been annotated
[56]. 85% of loci expressed at a level of at least 1-fold coverage showed evidence for utilization of unannotated exons, based on excess genomic read alignments. Proline oxidase, for example, is a very well studied gene that appeared to utilize hitherto unannotated exons in human cerebellum (). For this reason, sequences were aligned both to a reference transcript database (for estimation of expression of well-annotated transcripts) and to the human genome (for estimation of total expression of all exons, whether annotated or not). With additional improvements in library preparation and analysis software it should be possible to measure digital gene expression on an exon-by-exon basis in a strand-specific manner
[31]–
[50],
[69]. The number of transcripts with aligned reads declined greater than exponentially as a function of level of expression (). As a result, 29 million aligned reads provided an impressive linear dynamic range (4.37×log10, ). Despite lower read lengths and quality scores in this study () than currently obtained with the Illumina GA II instrument, only 9% of reads mapped non-uniquely. Finally, technical imprecision of DTE measurements was paltry (CV 3.4%). In summary, our technical assessment indicated deep mRNA sequencing to be a powerful tool for digital gene expression and empiric annotation of transcribed elements in genomes, in accord with a rapidly growing literature
[31]–
[50],
[69].
Gratifyingly, digital transcript expression values, expressed as aligned read frequencies, were amenable to very similar data transformation, quality control, pattern discovery, row-by-row modeling and annotation analyses as array hybridization datasets. This enabled direct comparison of digital transcript expression and array hybridization data metrics (–). Like array hybridization signals, aligned read frequencies benefited from log transformation as evidenced by improved Mahalanobis distances, overlaid kernel density estimates, univariate distribution results, and unsupervised principal component analysis. Aligned read frequencies had greater correlation coefficients in pair wise sample comparisons than array hybridization, and much greater ability to distinguish SCZ cases from controls, as evaluated by unsupervised principal component analysis (PCA), Ward hierarchical clustering of Pearson product-moment correlations of read frequencies, and the magnitude of the component of variance attributable to diagnosis. DTE was clearly better than array hybridization for PCA-based segregation of cases and controls. Read frequencies based upon alignment to a reference transcript database and to the human genome were almost indistinguishable by these metrics. Partitioning of variance components of clinical, sample and experimental metadata revealed the largest components of array hybridization variability to be metrics of mRNA quality, in accord with recent studies of post-mortem brain tissue
[70]. DTE was much less influenced by RNA quality. Non-diagnosis DTE variability included year of sequence generation, sequencing instrument variation and cause of death (). This is not surprising since substantial improvements occurred in sequencing instrument specifications, reagents and base-calling software during the project. Given continued, rapid evolution of generation II sequencing technologies, it is important to keep instrument specification, reagents and base-calling software relatively stable during mRNA sequencing projects to minimize non-hypothesis-related variability. Importantly, variance decomposition allowed incorporation of large clinical, sample and experimental effects as fixed effects in analyses of variance, minimizing detection of gene expression changes associated with confounding variables. Incorporation of fixed effects in array hybridization, but not DTE, analyses of variance resulted in no expression changes meeting significance cutoffs. Coefficients of correlation between aligned read frequencies, adjusted for transcript length, and array hybridization signals were 0.35 for all values and 0.46 for fold-change in genes with significant differences in expression. These values are somewhat lower than similar comparisons between array hybridization platforms and are probably reflective of the exacting RNA source
[71]. It should be noted that aligned read frequencies correlate very well with quantitative PCR results
[57],
[72]. In conclusion, mRNA sequencing appears to be superior to array hybridization for gene expression analysis and is amenable to standard statistical procedures for quality assessment and gene expression analysis.
Expression of 215 genes differed significantly between cases and controls in post-mortem cerebellar cortex by analysis of variance (). Of these, 88 were identified in genome-aligned reads and 152 in transcript-aligned reads. Only 25 genes were common to both alignments, but 96% of the 215 had congruent direction of change and correlation between genome- and transcript-aligned fold-change was 0.73. Major contributors to disparity between alignments appeared to be detection of unannotated exons in genome alignments and forced mis-alignments in incomplete transcript datasets. The latter may largely be avoided by removal of non-unique alignments from calculations. As RefSeq transcript becomes more complete, this disparity should decrease. In the interim, it is wise to align both to a well-annotated transcript dataset and to the genome.
17 of the 215 genes with altered cerebellar expression have previously shown changes in SCZ in the dorsolateral prefrontal cortex or superior temporal gyrus (APBA2, BTG1, CACNG2, CAP2, DKFZP434A0131, GABRA1, GOLGA1, HSPBP1, KIAA0256, KPNA1, RPL3, SLC35A3, SLC39A7, SRRM2, TCF4, ZNF148 and ZNF195). Despite substantial differences in gene expression in cerebellar cortex and cerebral cortex
[73], 13 genes were congruent with the direction of change reported previously. Three genes involved in GABAergic neurotransmission (GABRA6, GABRB3 and SLC6A1) showed changes in cerebellar cortex in SCZ that agreed with a previous report
[52]. In the context of cerebellar cortical function in SCZ, alteration in expression of genes may be causal, consequential, compensatory, or the result of confounding factors. Because the substantial heritability of schizophrenia appears to be polygenic
[1]–
[5], expression differences might reflect SCZ-associated nucleotide variants that alter expression in cis (e.g. eSNPs). Indeed, seven of these genes have previously shown association with SCZ in genetic association studies (CACNG2, GABRA1, GPSN2, HIRA, PSAP, RANBP5 and TCF4)
[74]. Alternatively, expression differences may represent compensatory changes in pathways and networks. GO annotation of genes with altered expression revealed over-representation of membrane association, Zinc binding or transport, regulation of transcription, Golgi apparatus and vesicle-mediated transport.
Most striking were 23 genes involved in presynaptic vesicular transport / Golgi apparatus or post-synaptic neurotransmission, 15 of which were up-regulated and 8 down-regulated. Up-regulated genes included nine involved in transport from the trans-Golgi network to the synaptic vesicle (golgi autoantigen A1, UDP-N-acetylglucosamine transporter (SLC35A3), component of oligomeric golgi complex 6, thyroid hormone receptor interactor 11, adaptor-related protein complex 1G1, ADP-ribosylation factor guanine nucleotide-exchange factor 2, vesicle docking protein p115, Rho-associated, coiled-coil containing protein kinase 1 (ROCK1) and RAB9B), one involved in synaptic vesicle exocytosis (synaptotagmin I), one involved in clathrin-mediated endocytosis (AP2 associated kinase 1 (AAK1)), one involved in homotypic fusion of early endosomes (early endosome antigen 1 (EEA1), and one involved in retrograde transport to the Golgi apparatus (vacuolar protein sorting 35 (VPS35)). Down-regulated genes included two involved in transport from the trans-Golgi network to the synaptic vesicle (syntaxin 10 and ADP-ribosylation factor related protein 1), two involved with synaptic vesicle exocytosis (synaptic vesicle glycoprotein 2A and neurochondrin), two involved in transport from the endoplasmic reticulum to the Golgi (golgi phosphoprotein 2 and synaptic glycoprotein 2) and one involved in retrograde transport to the Golgi apparatus (sorting nexin 17) (, ). Up-regulated post-synaptic membrane genes included GABRA1 (see above) and CACNG2 (calcium channel, voltage-dependent, gamma subunit 2). Several other genes involved in GABAergic neurotransmission showed non-significant changes in expression in cerebellar cortex in SCZ, in agreement with previous reports
[52]. One down-regulated post-synaptic membrane gene was ligand-gated ion channel, zinc activated 1. Most of the up-regulated genes, but none of the down-regulated genes in this set, had change in expression of greater than 30%. The largest alterations were CACNG2 (+169%), EEA1 (+88%), ROCK1 (+85%), SLC35A3 (+69%), AAK1 (64%) and GABRA1 and VPS35 (both +52%). Several previous studies have shown decreased expression of genes involved in presynaptic vesicular transport in the dorsolateral prefrontal cortex and hippocampus in SCZ
[75]–
[78]. While the molecular determinants of synaptic vesicular transport have been intensively studied, it is not yet possible to integrate these transcriptional changes into a single biological outcome. Isolated elevation of AAK1, for example, should decrease neurotransmitter endocytosis
[79], while elevated EEA1 should increase early endosomal fusion
[80]. However, activity of most of these proteins is dependent either upon post-translational modification or presence of interacting proteins. Furthermore, there is likely to be heterogeneous expression of these proteins among cerebellar cortical neurons. Nevertheless, it appears clear that presynaptic vesicular transport, Golgi function and GABAergic neurotransmission are perturbed in cerebellar cortex in SCZ, in agreement with previous analyses of gene expression in post-mortem brain in SCZ
[28],
[52],
[75] and genetic associations of nucleotide variants in synaptic vesicular transport genes with SCZ (namely CACNG2, COG2, STX1A, SNAP29, MUTED, NRG1, PLDN, NRG3, HOMER3, RTN4R, NRG2, RTN4, CPLX2, BLOC1S1, SYNJ1, HOMER1, CLINT1, SYP, BLOC1S2, SYN3, SYT11, SYNGR1, SNAPAP, STX7, BLOC1S3, DISC1, SNAP25, DLG4, CPLX1, CNO, SYN2, DTNBP1 and DAOA)
[74]. Replication of alterations in presynaptic vesicular transport and Golgi function, particularly as affecting GABAergic neurotransmission, in additional cases or other brain regions–such as prefrontal cortex–would substantiate novel molecular mechanisms underpinning SCZ and establish novel targets for therapeutic intervention.
The current study reports the first phase of a genomic convergence analysis of SCZ, which is intended to integrate linkage and expression studies. In a subsequent manuscript, we will present results of expressed nucleotide variants that differ in allele frequency between cases and controls, and integrate nucleotide variant and expression analyses. In addition, a recent study has shown substantial variation in alternative splice isoform expression and alternative polyadenylation in cerebellar cortex between normal individuals
[57], and it will be interesting to ascertain whether specific examples of alternative isoform expression show association with SCZ. Confirmation of causal components of the molecular basis of SCZ is anticipated to have significant impact on clinical practice, particularly with respect to timely diagnosis and prognostic or therapeutic categorization
[12],
[19].