Numerous genome-wide association studies (GWAS) of schizophrenia have been published in the past six years, with a number of key reports published in the last year. The studies have evolved in scale from small individual samples to large collaborative endeavors. This review aims to critically assess whether the results have improved as the sample size and scale of genetic association studies have grown.
Genome-wide genotyping and increasing sample sizes for schizophrenia association studies have led to parallel increases in the number of risk genes discovered with high statistical confidence. Nearly 20 genes or loci have surpassed the genome-wide significance threshold (p = 5 × 10⁻⁸) in a single study, and several have been replicated in more than one GWAS.
Identifying the genetic underpinnings of complex diseases offers insight into the etiological mechanisms leading to manifestation of the disease. New and more effective treatments for schizophrenia are desperately needed, and the ability to target the relevant biological processes grows with our understanding of the genes involved. As GWAS samples have grown, more genes have been identified with high confidence, and these findings have begun to provide insight into the etiological and pathophysiological foundations of this disorder.
The genetic basis of schizophrenia has long been recognized from its tendency to run in families, as well as from adoption and twin studies. The heritability (the proportion of observed phenotypic variance attributable to genetic factors) is generally accepted to be in the range of 64–81% [2, 3]. Studies attempting to identify causal genes have evolved alongside methodological progress. Initial cytogenetic investigations of large (3 Mb or greater) chromosomal deletions, duplications, and rearrangements were followed by linkage studies assessing polymorphisms that co-segregate with disease in families, albeit typically with sparse markers that implicate only large chromosomal regions. Candidate gene association studies have implicated myriad putative risk genes; however, these studies are hampered by their single-gene focus, reliance on incomplete biological information, and limited gene annotation for much of the genome. Inconsistent results raise questions about the validity of candidate gene findings: inadequate power is one possible reason for intermittent detection of a true genetic association, but candidate genes have not been strongly supported by GWAS [4, 5].
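As a textbook illustration of the quantity being estimated, heritability is the ratio of genetic to total phenotypic variance, and the classical twin design approximates it from the correlation between monozygotic (MZ) and dizygotic (DZ) twin pairs using Falconer's formula. This sketch assumes purely additive genetic effects (so that the genetic variance is entirely additive) and equally shared environments for both twin types; the estimates cited above were derived with more elaborate models, not necessarily this formula.

```latex
h^{2} = \frac{\sigma^{2}_{G}}{\sigma^{2}_{P}},
\qquad
\hat{h}^{2} \approx 2\,\left(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}\right)
```

Here σ²_G is the genetic variance, σ²_P the total phenotypic variance, and r_MZ and r_DZ the twin-pair correlations; for a binary diagnosis such as schizophrenia these are correlations on an assumed underlying liability scale.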
It has been only six years since the first schizophrenia GWAS was reported. This short span has witnessed great strides in understanding the genetic foundations of this disorder, with a number of key findings in the past year. The success of GWAS is often gauged by the number of genes discovered for a disease, but this is only one aspect of what the method has to offer. Associated genes can offer insight into the pathophysiological mechanisms giving rise to the disease phenotype and subsequently yield potential therapeutic targets. Even one GWAS finding leading to an effective treatment would arguably be a resounding success. Additionally, while individually significant single nucleotide polymorphisms (SNPs) generally explain only a small fraction of the genetic variation for a disorder, aggregating the many variants whose associations fall short of the genome-wide significance threshold (p = 5 × 10⁻⁸) can account for a much larger portion. For schizophrenia, 32–36% of the genetic liability can be explained in this manner [8, 9**]. These sub-threshold associations offer insight into the genetic architecture of complex disorders; in the case of schizophrenia, hundreds or even thousands of variants with small effect sizes appear to confer risk. Identifying these risk variants requires genome-wide genotyping and very large samples.
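One common way to aggregate sub-threshold associations is a polygenic (risk) score: each individual's risk-allele dosages are weighted by per-allele log odds ratios from a discovery GWAS and summed over all SNPs passing a liberal p-value cutoff, and the score is then tested for association with case status in an independent target sample. The Python sketch below shows only the scoring step. The function name, the 0.5 default cutoff, and the three-SNP inputs are hypothetical, and real analyses also prune SNPs in linkage disequilibrium and may differ in detail from the procedures used in [8, 9**].

```python
import math


def polygenic_score(dosages, odds_ratios, p_values, p_threshold=0.5):
    """Illustrative polygenic score for one individual.

    dosages:     risk-allele counts (0, 1 or 2) at each SNP
    odds_ratios: per-allele odds ratios from a discovery GWAS
    p_values:    discovery-sample p-values for the same SNPs
    Only SNPs with p < p_threshold contribute; each adds dosage * log(OR).
    """
    if not (len(dosages) == len(odds_ratios) == len(p_values)):
        raise ValueError("all inputs must describe the same SNPs")
    score = 0.0
    for dose, odds, p in zip(dosages, odds_ratios, p_values):
        if p < p_threshold:
            score += dose * math.log(odds)
    return score


# Hypothetical three-SNP example: none of these p-values reaches 5e-8 on its own.
dosages = [2, 1, 0]
odds_ratios = [1.10, 1.05, 1.20]
p_values = [1e-6, 0.3, 0.02]
print(polygenic_score(dosages, odds_ratios, p_values))       # all three SNPs pass p < 0.5
print(polygenic_score(dosages, odds_ratios, p_values, 0.1))  # stricter cutoff drops the second SNP
```

The score draws its signal from many individually weak associations, so the quality of its weights depends on the size of the discovery sample.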
The ever-growing number of subjects included in schizophrenia GWAS and the resultant findings invite continuous evaluation. Several prior reviews specifically address or encompass schizophrenia genetics [10–14]. This review focuses on recent large GWAS and their results, and discusses the implications of robust genetic findings for our understanding of the etiological and pathophysiological changes giving rise to schizophrenia.
GWAS offer an unbiased assessment of variation across the entire genome, with the capacity to implicate specific variants in disease risk. A genotyped SNP demonstrating disease association may itself be causal or, more likely, is inherited together with, and serves as a marker for, a causal genetic variant (i.e., it is in linkage disequilibrium with that variant). GWAS implicate far more circumscribed genomic regions than linkage studies, while surveying far more of the genome than candidate gene studies.
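Linkage disequilibrium between a genotyped marker and a nearby causal variant is commonly summarized by D, the deviation of the two-locus haplotype frequency from the product of the allele frequencies, and by its normalized form r². The Python sketch below computes both from phased haplotypes; the function name, the 0/1 allele coding, and the example haplotype counts are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) between two biallelic sites.

    haplotypes: phased haplotypes as (marker_allele, causal_allele) pairs,
                each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_marker = sum(m for m, _ in haplotypes) / n  # frequency of allele 1 at the marker
    p_causal = sum(c for _, c in haplotypes) / n  # frequency of allele 1 at the causal site
    p_both = sum(1 for m, c in haplotypes if m == 1 and c == 1) / n  # haplotype 1-1
    d = p_both - p_marker * p_causal
    denom = p_marker * (1 - p_marker) * p_causal * (1 - p_causal)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d, d * d / denom


# Hypothetical sample of 100 haplotypes: the marker allele usually travels with the causal allele.
sample = [(1, 1)] * 25 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 65
d, r2 = linkage_disequilibrium(sample)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

In this example both alleles have frequency 0.3 and co-occur on 25% of haplotypes, giving D = 0.16 and r² ≈ 0.58. An r² near 1 means the marker is a near-perfect proxy for the causal site; lower values dilute the association signal observable at the marker.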
In contrast to monogenic disorders, in which a single gene with complete or high penetrance causes the disease, nearly all complex genetic disorders involve individual genetic effects of relatively small size. Most such disorders are polygenic, involving multiple genes and multiple alleles within these genes. Targeted selection of candidate genes is hindered by incomplete genomic annotation, making the unbiased assessment of genome-wide variation an essential advance in the search for causal variants for complex disorders.
Within the GWAS framework, a few distinct study designs exist. Pooled genotyping of many DNA samples reduces cost but diminishes the ability to detect rare alleles. As genotyping costs have fallen, pooling has given way to genotyping individual samples, which also enables analyses involving haplotypes or epistasis. Individual genotyping additionally allows the detection of rare copy number variants (CNVs; e.g., deletions, duplications).
Although it is tempting to assume that high heritability makes risk genes easier to find, the two are not directly related. High heritability indicates a large contribution of genes to the disease, as opposed to environmental or other non-genetic factors, but does not imply large effect sizes for individual genes. Diseases with similar heritability can differ dramatically in genetic architecture, and the ability to discover risk variants depends on their effect size, their frequency in the population and, of course, the power of the sample. Increasing sample size should intuitively enhance power to detect associations, and this has been shown for schizophrenia as well as for other diseases and phenotypes. Compared with some traits, such as body mass index, assessed in nearly 250,000 people with 42 identified loci, or height, studied in over 180,000 subjects with 180 significant loci, the current numbers of cases in schizophrenia GWAS are low (Table 1). Increasing sample size is the surest way to reveal additional associations.
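The dependence of power on effect size, allele frequency, and sample size can be made concrete with a normal-approximation power calculation for a per-allele (allelic) case-control test at the genome-wide threshold. The Python sketch below is a simplified illustration, not the method of any cited study. It assumes Hardy-Weinberg equilibrium, a multiplicative per-allele odds ratio, and equal numbers of cases and controls in the usage loop; the function name and the values (risk-allele frequency 0.30, odds ratio 1.10, chosen to match the small effect sizes typical of schizophrenia loci) are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a per-allele case-control test (normal approximation).

    Assumes Hardy-Weinberg equilibrium, so each subject contributes two
    independent alleles, and a multiplicative per-allele odds ratio.
    """
    p0 = control_freq
    # Risk-allele frequency in cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = -NormalDist().inv_cdf(alpha / 2)  # two-sided critical value
    # Probability of exceeding z_crit in the direction of the true effect
    # (the opposite tail is negligible at genome-wide alpha).
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


for n in (2_000, 10_000, 50_000):
    power = allelic_test_power(n, n, control_freq=0.30, odds_ratio=1.10)
    print(f"{n:>6} cases + {n:>6} controls: power = {power:.2f}")
```

Running the loop shows power rising steeply with sample size for the same small effect. Dedicated power tools additionally model factors such as imperfect linkage disequilibrium with the causal variant and disease prevalence.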
Recognizing that the number of subjects is the only determinant of sample power under experimental control, many consortia have emerged to combine samples to identify genetic variation contributing to a particular disorder. With respect to schizophrenia, the International Schizophrenia Consortium (ISC), Molecular Genetics of Schizophrenia (MGS), and SGENE are three consortia that have brought together sample collections from multiple sites and additionally collaborated with each other [9**, 23**, 24**]. The Schizophrenia Psychiatric GWAS Consortium (PGC) has been the largest collaboration for this disorder to date, with an initial collection of 21,856 individuals from 17 samples and a replication collection of 29,839 subjects from 19 samples [30**]. Future GWAS of even greater numbers of subjects have been planned.
GWAS on separate samples are often folded into collaborative efforts involving numerous merged samples. Meta-analysis (combining results) and mega-analysis (combining data) yield greater power to detect disease associations by increasing the number of subjects. Merging samples, however, does have potential drawbacks. Mixing populations could dilute association signals if recombination has separated a causal variant from a genotyped marker in some of the populations. Similarly, while much of the known genomic variation is observed across many global populations, population-specific variation is substantial and associations could be masked in a mixed sample. Furthermore, the number of directly genotyped SNPs common to all studies may be very low, necessitating imputation of ungenotyped variants using a reference panel. Because variants on the sex chromosomes and mitochondrial DNA are difficult to impute, they are often removed from association analyses, potentially excluding relevant variation. Imputed SNPs significantly associated with disease should be validated by direct genotyping, adding to research cost and time. In addition to genetic concerns, differences in diagnostic assessments and ascertainment procedures across constituent sites can undercut the power gained from multi-sample analyses. While these concerns should be taken into consideration, the benefits of combining samples certainly outweigh the possible drawbacks.
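For meta-analysis (combining results), one widely used approach is a fixed-effect, inverse-variance-weighted average of the per-study log odds ratios. The Python sketch below assumes the studies are independent, estimate the same underlying effect, and report the same risk allele; it ignores between-study heterogeneity and does not address the population-mixing or imputation issues raised above. The function name and summary statistics are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study estimates.

    Returns the pooled log odds ratio, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = 2 * NormalDist().cdf(-abs(z))  # lower tail avoids 1 - cdf rounding for large |z|
    return beta, se, p


# Hypothetical summary statistics for one SNP from three independent samples.
betas = [math.log(1.12), math.log(1.08), math.log(1.10)]
ses = [0.030, 0.025, 0.040]
beta, se, p = fixed_effect_meta(betas, ses)
print(f"pooled OR = {math.exp(beta):.3f}, p = {p:.2e}")
```

Mega-analysis instead pools individual-level genotypes and fits a single model, which requires data sharing but permits uniform quality control across sites.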
Nearly 20 GWAS of schizophrenia had been published by the end of 2011 (Table 1). Although most of the early studies seemed large at the time, they were underpowered to detect genome-wide associations after appropriate correction for multiple testing, given that the effect sizes (odds ratios) of risk variants are generally below 1.2, much lower than expected. A trio of publications by the ISC, SGENE, and MGS consortia in 2009 ushered in the first genome-wide significant findings, implicating the major histocompatibility complex (MHC) region, transcription factor 4 (TCF4), and neurogranin (NRGN) [9**, 23**, 24**]. TCF4, involved in nervous system development, has since shown association in two further studies [29*, 30**], and the 11q24.2 locus containing NRGN, a protein kinase substrate that binds calmodulin, was also significant in the PGC primary sample mega-analysis [30**]. Steinberg et al (2011) added vaccinia-related kinase 2 (VRK2), a widely expressed protein kinase that is highly expressed in actively dividing cells, in addition to replicating the MHC region and TCF4 associations [29*]. The largest schizophrenia GWAS, reported by the PGC, identified five novel loci: 1p21.3 (MIR137), 2q32.3 (PCGEM1), 8p23.2 (CSMD1), 8q21.3 (MMP16), and 10q24.32-q24.33 (CNNM2/NT5C2) [30**]. Recently, two groups published the first genome-wide significant findings for schizophrenia in non-Western populations, both in Chinese samples: Yue et al (2011) reported a novel association at 11p11.2 and replicated an association in the MHC region [31*], and Shi et al (2011) found two new loci at 8p12 and 1q24.2 [32*]. Consistent findings are now beginning to emerge across studies, but because many reports contain overlapping samples, the findings are not completely independent. Several large, rare CNVs have also been associated with schizophrenia, although they are present in only a small proportion of cases.
A comprehensive overview of CNVs in schizophrenia is beyond the scope of this review, and has been thoroughly covered elsewhere [14, 34–38].
Although there is a small chance that SNPs reaching the genome-wide significance threshold of p = 5 × 10⁻⁸ are false positives (Type I error), false negatives (Type II error) are almost certain, with many legitimate causal loci likely falling short of this statistical threshold. For example, the initial GWAS report suggesting involvement of ZNF804A in schizophrenia yielded a p-value just shy of significance, 1.61 × 10⁻⁷, and the association ultimately surpassed this threshold through meta-analysis of several samples [27*]. This illustrates how larger samples may reveal significant association for tentatively implicated loci, and more such loci are likely to follow.
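The 5 × 10⁻⁸ threshold is conventionally motivated as a Bonferroni correction: a family-wise error rate of 0.05 spread over roughly one million independent common-variant tests (an approximate figure usually quoted for European-ancestry genomes). Under that convention, the initial ZNF804A result fell about threefold short:

```latex
\alpha_{\mathrm{GW}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8},
\qquad
p_{\mathrm{ZNF804A,\,initial}} = 1.61 \times 10^{-7} \approx 3.2\,\alpha_{\mathrm{GW}} > \alpha_{\mathrm{GW}}
```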
In addition to identifying specific genetic risk loci for complex genetic disorders, GWAS offer insights into the biological mechanisms giving rise to the disease. Age-related macular degeneration (AMD) is one such success story, in that GWAS identified several genes involved in complement-mediated inflammation, a pathway previously unsuspected in this disease and now targeted for therapeutics. Similarly, GWAS of inflammatory bowel disease (IBD) identified SNPs impacting the autophagy pathway and revealed this process as central to the etiology of this disease [41, 42].
Explorations of how GWAS-identified loci mediate schizophrenia risk have begun, but are, for the most part, still in their infancy. The microRNA MIR137 and four of its gene targets (TCF4, CACNA1C, CSMD1, and C10orf26) demonstrated genome-wide significance in either the schizophrenia or combined schizophrenia-bipolar disorder PGC analyses [30**], implicating a shared biological pathway in schizophrenia. MIR137 functions in regulating adult neurogenesis and neuronal maturation [43–45], suggesting it may contribute to schizophrenia via perturbed developmental processes. Known functions of its four associated targets, such as the expression of CSMD1 in the nerve growth cone, initiation of neuronal differentiation by TCF4, and the role of voltage-gated calcium channel genes such as CACNA1C in interneuron development, support this idea. Although the most replicated schizophrenia GWAS association is the MHC locus, its complex composition of many genes functioning in immunity, neurodevelopment, synaptic plasticity, and other processes has thus far hindered investigations relating the association signal in this region to disease pathophysiology.
As risk alleles for schizophrenia are discovered with increasing certainty, identifying subjects who carry these alleles opens new avenues of research into the pathophysiological mechanisms. For example, “deep phenotyping” of risk allele carriers may connect variation in the clinical phenotype with genetic subtypes. Furthermore, since the associated variants may simply be proxies for nearby causal loci, sequencing the surrounding gene regions in risk allele carriers will provide single-nucleotide resolution capable of revealing rare variants, insertions, and deletions, and may pinpoint variants conferring protein-coding or other functional changes. The cellular consequences of these genetic changes can be explored by generating induced pluripotent stem (iPS) cells, and through post-mortem brain studies of mRNA expression or protein alterations in subjects with and without the risk alleles. Generating cellular and mouse models carrying the risk variants or haplotypes and characterizing cellular, behavioral, and other processes also promises to help link GWAS risk genes to disease mechanisms.
Endophenotypes, such as heritable impairments in cognition and prepulse inhibition and neuroanatomical and functional abnormalities, also offer a starting point for delineating the functional relevance of risk genes, since they are thought to be more directly linked to genes and less affected by the environmental factors that shape clinical features. In schizophrenia patients, ZNF804A and TCF4 risk allele carriers are reported to have better cognitive functioning than non-carriers, suggesting possible subgroups with relatively spared cognition [50, 51]. The ZNF804A risk variant has also been associated, in a dosage-dependent manner, with impaired connectivity of the prefrontal cortex across hemispheres and with the hippocampus. As more risk loci are identified, clearer biological patterns should emerge.
In addition to gene discovery, numerous other investigations predicated on GWAS data also justify the collection of large samples. Heterogeneity in the clinical presentation of schizophrenia has strongly suggested variability in the etiology of this disease, but most efforts to link genes to symptom profiles have been small in scale. Analyses exploring the genetic relationship to symptom dimensions, age of onset, illness severity, and other features can now be conducted in an unbiased, genome-wide manner in samples well powered to yield high-confidence results. Also, all models of the genetic architecture of schizophrenia consistent with current knowledge involve non-additive interactions among genetic variants (i.e., epistasis). Because the number of tests is effectively multiplied, many more subjects are needed for adequate power to test epistatic interactions than single variants, and genome-wide epistasis testing becomes more tractable as sample sizes grow. Furthermore, the ways in which environmental exposures relate to the genetic predisposition to schizophrenia and its manifestation deserve greater attention, and larger samples will facilitate these studies as well.
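The multiplication of tests for epistasis can be quantified with a simple Bonferroni calculation. The Python sketch below assumes an exhaustive scan of all unordered pairs among a hypothetical one million SNPs and compares the per-test threshold and two-sided critical z value with those of a single-SNP scan; in practice, epistasis studies often restrict the search space, and the function names are illustrative.

```python
from math import comb
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical value of a standard normal test at level alpha."""
    return -NormalDist().inv_cdf(alpha / 2)


def scan_thresholds(n_snps, family_alpha=0.05):
    """Bonferroni thresholds for a single-SNP scan vs. an exhaustive pairwise scan."""
    n_pairs = comb(n_snps, 2)  # number of unordered SNP pairs
    single_alpha = family_alpha / n_snps
    pair_alpha = family_alpha / n_pairs
    return {
        "single": (n_snps, single_alpha, critical_z(single_alpha)),
        "pairwise": (n_pairs, pair_alpha, critical_z(pair_alpha)),
    }


for scan, (tests, alpha, z) in scan_thresholds(1_000_000).items():
    print(f"{scan:>8}: {tests:.2e} tests, per-test alpha {alpha:.1e}, critical z {z:.2f}")
```

Under the usual normal-approximation sample-size formula, the number of subjects needed to detect a fixed effect grows with the square of the sum of the critical z and the z value for the desired power, so the stricter pairwise threshold translates directly into larger required samples.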
With increased confidence in risk locus identification by GWAS, there is greater incentive to invest in studies of the mechanisms by which these loci effect biological changes. In addition to primary DNA sequence changes, epigenetic factors that regulate expression of risk genes, such as methylation at CpG islands or histone acetylation, can also be investigated. MicroRNAs, such as the recently associated MIR137, and other non-coding RNAs, as well as transcription factors that may act in pathogenesis “upstream” of the identified risk loci, also merit further scrutiny [30**]. Certainty in the biological pathways identified through pathway analyses will also increase, bringing us closer to connecting findings from genetics and neuroscience.
Schizophrenia association studies have incorporated genome-wide markers and an increasing number of subjects, leading to identification of numerous novel risk loci with high statistical confidence. As risk genes are discovered, research emphasis can shift from gene-finding to exploring the downstream biological mechanisms resulting in schizophrenia, and ultimately facilitate development of new treatment and prevention options. Going beyond case-control analyses to test for epistatic interactions and aggregation of associations in biological pathways will offer additional insight into the distinct etiologic pathways converging on the schizophrenia phenotype. Although some inroads have been made in connecting genetic findings to their neurobiological sequelae, this will be a long-term endeavor. GWAS have brought us closer than ever to understanding the aberrant processes leading to schizophrenia, and larger samples in conjunction with additional analytic and methodological approaches will continue to further our understanding of this disease.
Supported by NIMH grant number P50 MH080272 and the Stanley Medical Research Institute.
Conflicts of Interest
The authors have no conflicts of interest to declare.