Genome-wide experiments are rapidly changing our understanding of the molecular genetics of schizophrenia. These studies have discovered uncommon copy number variations (mainly deletions) associated with schizophrenia as well as common SNPs with alleles associated with schizophrenia. The aggregate data provide initial support for polygenic inheritance and for genetic overlap of schizophrenia with autism and with bipolar disorder. It is anticipated that as genetic discoveries accumulate, the application of a myriad of tools from systems biology will lead to a delineation of biological pathways involved in the pathophysiology of schizophrenia, and eventually to new therapies.
The aim of this chapter is to introduce the reader to the genetics of schizophrenia: its background, the status of a variety of genetic findings, new developments (which are many since our last review 1), and current and future challenges. Schizophrenia is a devastating psychiatric disorder with a median lifetime prevalence of 4.0 per 1,000 and a morbid risk of 7.2 per 1,000 2. The age at onset is typically in adolescence or early adulthood 3, with onset after the fifth decade of life and in childhood both being rare 4,5. Although the prevalence for males and females is similar 2, the course of schizophrenia is often more severe and with earlier onset for males 3,6. The standardized mortality ratio (SMR; ratio of observed deaths to expected deaths) for all-cause mortality is 2.6 for patients with schizophrenia compared to the general population 2, with excess deaths mainly from suicide during the early phase of the disorder, and later from cardiovascular complications.
Schizophrenia commonly has a chronic course albeit with fluctuating patterns, and cognitive disability. Its hallmark is psychosis, mainly characterized by positive symptoms such as hallucinations and delusions that are frequently accompanied by negative (deficit) symptoms such as reduced emotions, speech, and interest, and by disorganization symptoms such as disrupted syntax and behavior. Severe mood symptoms, up to and including manic and major depressive episodes, are present in many cases. There are no diagnostic laboratory tests for schizophrenia; instead, the diagnosis relies on clinical observation and self-report. It is then remarkable that ongoing epidemiological study over the last century using the clinical phenotype, but with variable ascertainment and assessment rules, has consistently shown the importance of genetic factors in schizophrenia.
The definition of caseness is fundamental to research design decisions. Bipolar disorder, schizoaffective disorder, and schizophrenia share some phenotypic aspects in common, both in terms of symptoms and also therapeutics, with all responding to antipsychotic drugs. Emil Kraepelin defined dementia praecox as a group of psychotic conditions with a tendency toward poor prognosis 7. He grouped under the term manic-depressive psychoses a set of conditions that included periodic and circular insanity, simple mania, and melancholia, which he thought did not result in deterioration. Kraepelin believed that dementia praecox and manic-depressive psychoses had specific and separate causes. However, reality proved to be more complex, and in 1933 Jacob Kasanin coined the term schizoaffective psychosis to refer to a disorder with mixed features of schizophrenia and affective disorder 8. Compared to the general population, family studies show that the clinically intermediate diagnosis of schizoaffective disorder is more common in families ascertained from probands with schizophrenia as well as in families ascertained from probands with bipolar disorder 9–14. The diagnostic distinction between schizophrenia or bipolar disorder and schizoaffective disorder is not reliable 15. The specific time criterion for affective symptoms relative to the schizophrenic symptoms is not well defined and varies in different modern classifications 16,17.
Our knowledge of the molecular mechanisms of schizophrenia pathophysiology remains very incomplete. False starts and research dead ends have taught the field the need for caution; the biological complexity of schizophrenia is much higher than was anticipated. This complexity also applies to simple Mendelian disorders, which, although easily analyzed by studying pedigrees, can present unexpectedly intricate biology. Yet the architecture of schizophrenia is incomparably more difficult than that of simple genetic disorders. The idea that one or a few common major gene effects explain schizophrenia was empirically tested in genome-wide linkage scans, but results mostly fell short of genome-wide significance 18. That schizophrenia is very complex should not be surprising. First, the brain is more complicated than any other organ; the number of neuronal interconnections and permutations thereof in humans is enormous (~2×10¹⁰ neocortical neurons and ~10¹⁴ synapses) 19,20, and our knowledge of the physiological basis of higher brain functions is very incomplete. Second, the absence of well-defined, focal, and specific microscopic neuropathology has contributed to making schizophrenia particularly impervious to molecular progress, but this is starting to change, as we discuss below.
Schizophrenia belongs to a group of pathologies known as complex genetic disorders. Our understanding of complex genetic disorders is still evolving as new experiments uncover novel mechanisms of disease. It is commonly thought that many genes are involved in each disorder, with each gene conferring only a small effect on the phenotype. The individual risk variants are thus without diagnostic predictive value, and any estimates of risk are likely to change as large epidemiological samples become available for analysis 21. Epistatic interactions between these genes and among their products, and interactions with environmental risk factors, are considered highly plausible. However, the study of genetic interactions utilizing genome-wide data remains largely unexplored because of the need to correct for an enormous number of statistical comparisons. Our knowledge is shifting from oligogenic models to a polygenic model of schizophrenia, but its genetic architecture still remains largely unknown. The current evidence strongly suggests that the mutation frequency spectrum comprises a mix of many common and rare mutations. The idea that complex disorders result not from abnormal function of individual genes but from dysfunction of entire molecular networks, the concept of system disorder, is making strong inroads in the literature (see, for example, 22). Whether this applies to schizophrenia is still an empirical question that remains to be addressed.
It has traditionally been assumed that changes in DNA sequence are solely responsible for the transmission of schizophrenia. However, twin studies show that it is also conceivable that an epigenetic mechanism may contribute to the transmission of schizophrenia. The possibility of a role for epigenetics, i.e., changes in phenotype not explained by DNA sequence, was raised first as an explanation of the incomplete concordance for schizophrenia in monozygotic twins (see, for example 23), but still remains little tested due to methodological difficulties 24.
The longstanding and influential belief that the incidence of schizophrenia is unaffected by place and time has recently been disproven, opening a remarkably productive period for the study of schizophrenia epidemiology. New epidemiological results show specific circumstances where risk for schizophrenia is increased, including various obstetric complications 25,26, urban birth or residence, famines, migrant status, and seasonal effects (via prenatal infections, e.g., influenza) 2. Other epidemiological evidence strongly suggests that advanced paternal age 27,28 (see more in “Darwinian Paradox”), along with cerebral hypoxia and other severe pregnancy and perinatal complications 29,30, are also environmental risk factors. Overall, the landscape of environmental risks is fertile and growing rapidly, pointing to a myriad of risk factors acting early during development. Yet the individual effects of environmental risks, even those that are biologically catastrophic such as famines 31,32, are relatively small. Although the specific pathophysiological connections between environmental risk factors and schizophrenia remain largely tentative, epidemiological findings can potentially provide strong guidance to molecular genetic experiments, e.g., screening specific genes involved in prenatal nutrition and performing serologic assays from epidemiological samples 33. While the prevalence of some other complex disorders, such as obesity 34 and diabetes 35, is rising, there is no evidence for such a rapid change for schizophrenia (a registry analysis from Denmark has even detected a possible trend towards decreased incidence 36). Finally, it is likely that additional environmental factors associated with increased risk for schizophrenia still remain to be discovered, and that an understanding of gene-environment interactions will be necessary to unravel the biology of schizophrenia.
The modern twin and adoption studies were instrumental in rejecting psychological hypotheses of schizophrenia causation 37 and became the main foundation for the search for molecular genetic risk factors.
Ernst Rüdin, who was a disciple of Kraepelin but later infamously became the main scientific leader of Nazi eugenics 38, conducted the first systematic family study for a psychiatric disorder 39. He realized that the data would not fit a model of simple monogenic Mendelian transmission, but missed the evidence for additional complexity. Many family studies of schizophrenia have been conducted since then, with the available evidence showing that the child of a parent with schizophrenia has an empirical risk elevated about tenfold over the general population risk (for a review, see 40). The risk of a disease in a type of relative compared to that in the general population is often called λ (if the risk is conferred by an allelic variant, it is further specified as an allele-specific λ). The relative risk to siblings resulting from having a proband with the illness is called λs 41. Common disorders have a smaller λs than rare disorders, even with similar overall genetic effects. For example, the respective λs values for the autosomal dominant Huntington disease (assuming population prevalence 0.0001), the autosomal recessive cystic fibrosis (assuming population prevalence 0.0004), and autism are 5,000, 625, and 60–100 42,43, whereas the λs for major adult psychiatric disorders is typically under 10 (λs is ~10 for schizophrenia). The risk for schizophrenia to a relative of an affected proband decays much more rapidly than the proportion of genes shared between them, which is also inconsistent with a simple Mendelian model 40. Still, most cases of schizophrenia in the general population are sporadic 44,45, which may seem surprising at first glance. However, assuming polygenic inheritance (which explains the molecular findings of schizophrenia better than other models, see below), for a disease with a prevalence of 1% and 90% heritability, more sporadic than familial cases are expected 44.
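As a rough illustration of how the Mendelian λs values quoted above arise, the sketch below divides the recurrence risk to a sibling by the population prevalence. The function name and the sibling risks of 0.5 (dominant, assuming one heterozygous affected parent and full penetrance) and 0.25 (recessive, assuming two carrier parents) are textbook simplifications introduced here for illustration; they are not taken from the cited studies.

```python
def sibling_lambda(sibling_risk: float, prevalence: float) -> float:
    """Relative risk to siblings: recurrence risk divided by population prevalence."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return sibling_risk / prevalence


# Assumed Mendelian sibling recurrence risks (illustrative simplifications).
huntington = sibling_lambda(0.5, 0.0001)        # autosomal dominant
cystic_fibrosis = sibling_lambda(0.25, 0.0004)  # autosomal recessive

print(round(huntington), round(cystic_fibrosis))  # prints: 5000 625
```

The same division shows why a common disorder cannot reach such values: with a prevalence near 1%, even a sibling risk of 100% would cap λs at about 100.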
Differences between monozygotic (identical) twins are attributed to the environment, and differences between dizygotic (fraternal) twins to both hereditary and environmental factors in twin studies. The concordance rate, the probability that a second twin will develop a disorder if the proband (first examined) twin has the disorder, is commonly used. Heritability is the proportion of variance explained by genetic factors. The concordance rates of schizophrenia for monozygotic twins have been found to be about 40 to 50%, and heritability estimates are around 80% 46,47. The reader should note that heritability per se is not an estimation of the cause of the disease, but rather of the cause of the variation of the disease in a particular population 48. Studies from Denmark and Finland finding concordances consistent with older studies have employed population registries 49,50, which present two major advantages 46: systematic ascertainment and an estimation of the population risk for the studied trait. Contemporary studies based on hospital registries from Germany, UK, and Japan also yielded similar results 51,52. The risk of schizophrenia and schizophrenia-related disorders is similar for the offspring of both the unaffected and the affected monozygotic twins 53,54, which suggests that the unaffected twins do carry a heritable genetic risk for schizophrenia without expressing the disease (supporting either or both, epigenetics and non-shared environments). It has recently been proposed that DNA methylation differences might be the cause of monozygotic twin discordance 55, and also might provide a mechanism for a variety of environmental risk factors for schizophrenia 33,56.
Adoption studies allow dissection of genetic from environmental contributions to a disorder in ways that twin studies cannot (see review 57, which also explores methodological strengths and weaknesses of these approaches). The high-risk adoptees approach evaluates adopted-away offspring of parents with schizophrenia to see if risk for schizophrenia (or often also schizophrenia spectrum disorders) is elevated. These studies have found an elevated risk for psychosis in such offspring, whether the parents had schizophrenia onset before or after adoption, and whether the rearing environment was foster parents or institutional 58–64. Consistent with the risk traveling with the biological rather than the adoptive relationship, it was shown that the risk was similar for offspring of mothers with schizophrenia, whether they were raised by the biological (schizophrenic) parent or an adoptive (non-schizophrenic) parent 59, and that offspring of mothers without schizophrenia did not have an increased risk when raised by psychotic adoptive parents 60. Furthermore, adoption studies can yield some insight into gene-environment interactions, for example by comparing communication deviance in adoptive parents of high-risk adoptees 65. The adoptees’ family approach starts with schizophrenic adoptees and matched control adoptees, and evaluates their adoptive and biological families for illness. These studies have shown elevated rates of schizophrenia and schizophrenia spectrum disorders in biological families of schizophrenic adoptees compared to biological families of control adoptees, coupled with low and equivalent rates in adoptive families of both types of adoptees 66–70.
Schizophrenia has long been known to be associated with decreased fertility 39,71, which is explained by the behavioral and social characteristics of schizophrenia. Fertility is substantially compromised in both genders 72,73, though more markedly in males. This fertility deficit is anticipated to grow because of the delayed marriage patterns in Western societies, while age of onset for schizophrenia has not changed. It is expected that natural selection would decrease the population frequencies of disorder genes that diminish fertility. However, the prevalence of schizophrenia remains high, much higher than for Mendelian disorders. How schizophrenia circumvents the effect of natural selection (sometimes called a “Darwinian paradox”) remains an enigma, and multiple hypotheses have been proposed (see review 74). Fananas et al 75 proposed that the relatives of schizophrenics might have a compensatory increase in fertility, but preliminary data did not replicate in larger samples 72,73,76. Lack of evidence for increased fertility in relatives of schizophrenics weakens alternative explanations, such as heterozygote advantage (either homozygote shows reduced fitness compared to the heterozygote, such as with sickle cell anemia 77) and antagonistic pleiotropy (an allele might reduce fitness for one trait while increasing fitness of a related trait) 74.
Another proposed explanation is that the clinical phenotype might have poor correlation with the underlying genetic susceptibility (i.e., genotype), and it has been suggested that endophenotypic variables (sometimes called intermediate phenotypes) such as structural and functional neuroimaging characteristics constitute a better index of the underlying gene effects than the clinical phenotype 78. There are two problems with this argument: First, a large body of genetic epidemiology is based on the clinical phenotype. Second, none of the proposed endophenotypes has been proven yet to be more heritable than the aggregate clinical phenotype 79.
It is also conceivable that the alleles conferring susceptibility to schizophrenia might be maintained in the population against negative selection by a high mutation rate 80. Advanced paternal age would then be a risk factor, as spermatogonia replicate many more times over life than oocytes, and the age of fathers is greater than expected for some autosomal dominant diseases due to new mutations 81. A study of an epidemiological sample of 87,907 individuals born in Jerusalem between 1964 and 1976 found that the relative risk of schizophrenia increased continuously with the age of the fathers, to a maximum of 2.96 in offspring of fathers aged 50 to 54 years 27. This finding has been replicated in larger samples from different populations, especially for older fathers (see review 28), and found to be a stronger effect in sporadic (family history negative) cases 82, as would be predicted for de novo mutations. As reviewed 74, paternal age effects challenge neutral and balancing selection (such as heterozygote advantage and antagonistic pleiotropy) explanations of schizophrenia’s Darwinian paradox, but are expected under a mutation-selection model 83. In addition, polygenic mutation-selection balance (where deleterious mutations have yet to go extinct: many older mutations of milder individual effect that are removed more slowly from the population, and rarer recent mutations of larger effect that have not had time to diminish over generations) is consistent with other important aspects of schizophrenia (e.g., its prevalence, reproductive fitness costs, and its expression via the body’s most complex organ with an enormous “mutational target size”) 74, and also with recent findings, detailed below, from genome-wide association studies (GWAS), especially support for the importance of polygenic inheritance.
Before the availability of GWAS, most gene association studies consisted of tests of candidate gene involvement. Close to 800 genes have been tested for association (see www.schizophreniaforum.org/res/sczgene; 84). This makes schizophrenia one of the most studied disorders through a candidate gene approach. Unfortunately, none of these genes can be considered fully established as of today. As samples frequently lacked sufficient statistical power, the problem of non-replication has been far from trivial. In a comprehensive study of some of the most cited candidate genes (e.g., DISC1, DTNBP1, NRG1, DRD2, HTR2A (5-hydroxytryptamine [serotonin] receptor 2A), and COMT), each of 14 genes was tested by genotyping a sample of 1,870 cases and 2,002 screened controls of European ancestry (EA) 85. A total of 789 single nucleotide polymorphisms (SNPs) were genotyped, including tags for common variation in each gene (tag SNPs are SNPs that are correlated with many other nearby SNPs, for which they are proxies), SNPs previously reported as associated, and SNPs located in functional domains of genes. No association was found (Figure 1), which clearly contradicts the ORs predicted from the analysis of smaller samples. (The effect size can be conceptualized as the strength of the association between a marker and the disorder, and it can be expressed as the odds ratio, OR, which is the odds for an event, here possessing a risk allele, in the risk group, i.e., cases, divided by the odds in the non-risk group, i.e., controls.) The dilemma for the field is interpreting the reasons for the abundance of positive and negative associations with candidate genes. It is likely that the use of small sample sizes and inadequate or loose statistical thresholds are behind many of the unreplicated observations. Other potential causes of false positives are multiple analyses and selective reporting 86.
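To make the odds ratio defined above concrete, here is a minimal sketch that computes an allelic OR and an approximate 95% confidence interval (Woolf's log method) from a 2×2 table of allele counts. The function names are introduced here, and the counts are hypothetical: they are only sized to match 1,870 cases and 2,002 controls (two alleles per person) and are not data from the cited study.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk / case_other) / (control_risk / control_other)


def woolf_ci(case_risk, case_other, control_risk, control_other, z=1.96):
    """Approximate 95% confidence interval for the OR on the log scale."""
    log_or = math.log(
        allelic_odds_ratio(case_risk, case_other, control_risk, control_other)
    )
    se = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical allele counts: 3,740 case alleles and 4,004 control alleles.
counts = dict(case_risk=1300, case_other=2440, control_risk=1300, control_other=2704)

odds_ratio = allelic_odds_ratio(**counts)
ci_low, ci_high = woolf_ci(**counts)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

With counts of this size, an OR near 1.1 yields an interval whose lower bound sits close to 1, which is one way to see why small effects were hard to confirm in individual candidate gene samples.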
It is possible that genetic heterogeneity in some specific cases would preclude a replication, but it would seem unlikely that this would be a robust general argument (for a detailed discussion of heterogeneity, see 87). Multiplicative epistasis (where the individual gene effects might not be detectable but the product of the effects might become detectable) is another largely unexplored possibility that could in principle explain non-replication, and of course environmental variation is another source of heterogeneity. Furthermore, very provocative work by Richter et al. 88 suggests that increased standardization (such as in experiments designed to decrease heterogeneity, which allows and frequently requires smaller sample sizes) can actually decrease reproducibility in animal behavioral experiments, challenging long-held ideas. Recent schizophrenia GWAS results (where each candidate gene is typically more comprehensively tested than in most candidate gene experiments), overall, have not supported most associations to classical candidate genes (Table 1, also see supplementary data file 3 from 89), a pattern consistent with the general results of GWAS in complex disorders (www.genome.gov/gwastudies; 90). Although some candidate genes have been replicated (e.g., APOE4 in Alzheimer’s disease), most discovered associations from GWAS were either in genes that were not previously suspected to be involved in the disease, or in regions of the genome with no obvious genes. Still, the evidence supporting some candidate genes is difficult to ignore (for example, ERBB4, the receptor for NRG1, has shown high significance in an African American (AA) GWAS sample 89). Additional research and the analysis of cumulative data, with particular attention to both quality control and statistical rigor, will be required for definitive conclusions. The interpretation of data can be treacherous.
For example, the reader should be aware that gene pathway analyses do not specifically confirm individual gene associations. While an association with ERBB4 may suggest potential involvement of NRG1, as both belong to the same biological pathway, it does not confirm, by itself, participation of NRG1 in schizophrenia, which would still require a genome-wide association signal between NRG1 and schizophrenia.
Genome-wide studies, in combination with systems biology approaches, yield comprehensive information and have been demonstrated to be more useful for dealing with complex phenotypes. In contrast to candidate gene studies, GWAS interrogate, one at a time, markers of common variation across the human genome, investigating all genes and the majority of the non-genic regions, whether they were previously implicated by pathophysiological hypotheses or not. The large number of tests in a GWAS makes the method highly susceptible to false positive hits; therefore, the estimation of an appropriate genome-wide significance threshold is fundamental. The genome-wide significance threshold, for a value of 5% significance assuming tests for all common SNPs, has been estimated to be around p<5×10⁻⁸ 91–93. Due to their more comprehensive coverage of the human genome, GWAS have been more successful than any previous approach in finding new susceptibility loci for complex disorders. According to www.genome.gov/gwastudies (as of 09/20/2009), 732 genes were reported associated with one or more complex disease phenotypes at genome-wide significant levels (p<5×10⁻⁸) 90,94, and many of these associations have already been replicated. GWAS are based on linkage disequilibrium (LD), a non-random statistical association of alleles at two or more loci, which is characteristically associated with short physical distance between genetic markers.
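The genome-wide threshold quoted above can be read as a multiple-testing correction. The sketch below uses a plain Bonferroni correction and assumes, purely for illustration, about one million effectively independent common-variant tests; that count and the function names are assumptions made here, and the cited estimates may rest on more refined methods.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: ~1,000,000 effectively independent tests across the genome.
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_ALPHA = bonferroni_threshold(0.05, N_INDEPENDENT_TESTS)


def is_genome_wide_significant(p_value: float,
                               threshold: float = GENOME_WIDE_ALPHA) -> bool:
    """True when a single-SNP p-value clears the genome-wide threshold."""
    return p_value < threshold


# Two p-values quoted later in this chapter, used only as examples.
print(is_genome_wide_significant(9.54e-9))  # MHC meta-analysis signal
print(is_genome_wide_significant(1.61e-7))  # initial ZNF804A report
```

Under this assumption the threshold works out to 5×10⁻⁸, so the first example clears it and the second does not.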
To a large extent, GWAS was made possible by the Human Genome Project (www.ornl.gov/sci/techresources/Human_Genome/home.shtml). Major improvements in SNP genotyping and DNA sequencing were spinoffs from the Human Genome Project, and microarrays made possible rapid and accurate genome-wide genotyping, resulting in a map of common genetic variation in a reference set of individuals of European, Asian, and African descent (HapMap project). The majority of the markers used for GWAS are tag SNPs; thus the most significantly associated SNP in a GWAS may reflect an indirect association (i.e., be in LD with a causative variant). The Affymetrix 6.0 and Illumina 1M SNP arrays include ~1M common SNPs and probes for analysis of copy number variants (CNVs), with their SNPs assaying ~80% of the common variation in the genome for EA samples 95. However, the estimated number of common (minor allele frequency, MAF>1%) SNPs is ~10M, and our genotyping capabilities are not yet sufficiently developed to genotype every SNP in a very large clinical sample (deep re-sequencing technology and new arrays may soon overcome this difficulty). In the meantime, imputation, the computational prediction of genotypes at non-genotyped SNPs, is used to extend GWAS map coverage 96,97. By design, the main assumption under GWAS is the common disease/common variant hypothesis (CDCV) 98,99.
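Because tag SNPs work only through LD with untyped variants, it may help to see how the strength of LD between two biallelic SNPs is commonly summarized as r². The following is a minimal sketch from haplotype and allele frequencies; the function name and the example frequencies are hypothetical.

```python
def ld_r_squared(hap_freq_ab: float, freq_a: float, freq_b: float) -> float:
    """r^2 between two biallelic loci.

    hap_freq_ab: frequency of the haplotype carrying allele A and allele B
    freq_a, freq_b: frequencies of allele A and allele B
    """
    for f in (freq_a, freq_b):
        if not 0 < f < 1:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = hap_freq_ab - freq_a * freq_b  # disequilibrium coefficient D
    return d * d / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))


# Hypothetical examples.
perfect_proxy = ld_r_squared(0.30, 0.30, 0.30)  # alleles always travel together
independent = ld_r_squared(0.09, 0.30, 0.30)    # haplotype frequency = product
partial = ld_r_squared(0.20, 0.30, 0.40)

print(round(perfect_proxy, 3), round(independent, 3), round(partial, 3))
```

An r² of 1 means one SNP is a perfect proxy for the other, which is why an association at a tag SNP cannot by itself identify the causative variant.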
Recent complex disorders GWAS show two main characteristics: First, common loci with small effects are typically reported (ORs=1.1–1.5), an empirical confirmation that a large body of epidemiological studies predicting multiple small common genetic effects for complex disorders were correct (including for schizophrenia 100,101), since loci with larger effects are rapidly eliminated from the population through selection. Second, most studies have tended to detect new susceptibility loci, and only very large samples obtained from combining studies are powered to show robust replication. This is because the power to detect one out of many possible risk loci is much larger than the power to detect specific disorder alleles 21. Furthermore, if only small effects are found, many genes would be predicted to underlie the pathophysiology of most complex genetic disorders. On the other hand, it is important to also emphasize some main GWAS limitations. The reader should be aware that the statistical power of GWAS to detect an association with rare alleles (i.e., SNPs or CNVs with MAF<1%) is very limited, that for the detection of rare variants re-sequencing is more useful than GWAS, and that the study of gene-gene interactions (epistasis), although widely expected to be a significant source of heritability, is strictly limited by the statistical power of currently existing samples contrasted to the large number of such tests.
GWAS have already yielded genome-wide significant results for schizophrenia, which we now discuss in more detail, though the reader should note that the small individual ORs do not permit prediction of caseness from specific individual susceptibility loci. Seven GWAS for schizophrenia have been published (Table 1). The sizes of the investigated discovery samples have ranged from 322 to >16,161, but even the largest studies did not yield a genome-wide significant result before combined testing of independent samples. This was not unexpected. The collective experience of GWAS for complex disorders shows that a typical susceptibility locus has an OR of 1.1–1.3, which often necessitates extremely large samples for detection. A sample with a total N of 5,334 such as the Molecular Genetics of Schizophrenia (MGS) EA sample (most investigated samples have been smaller), has adequate statistical power only to detect very common risk alleles (30–60% frequency, log additive effects) with genotypic relative risks ~1.3 89. To reach sufficient statistical power, the combined analysis of independent datasets is useful. Although the diagnostic spectrum of the final combined sample is naturally wider than for the component datasets, combining datasets has been remarkably successful for a variety of complex disorders including schizophrenia 89,102,103 (see below, meta-analysis). Different samples often were typed with different platforms, but imputation largely overcomes the limitation that many SNPs from different platforms do not overlap. These results suggest that schizophrenia, despite the very high reported heritability, is among the most complex of human genetic disorders. An additional analysis of the International Schizophrenia Consortium (ISC) and MGS samples 102 supported a polygenic model for schizophrenia susceptibility, involving a set of hundreds of genes, each with unquantified but very small individual effects 100 (see below, polygenic section). 
Finally, rapidly mounting evidence shows that cases have more rare (<1%) and large (>100kb) CNVs than controls.
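The dependence of power on effect size described above can be sketched with a normal approximation to a two-sided allelic test at the genome-wide threshold. Everything specific here is an assumption for illustration: the equal split of a total N of 5,334 into 2,667 cases and 2,667 controls, the control allele frequency of 0.3, the treatment of the odds ratio as a per-allele effect, and the function name. It is not the power calculation used in the cited study.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)
    # Each person contributes two alleles.
    se = (
        case_freq * (1 - case_freq) / (2 * n_cases)
        + control_freq * (1 - control_freq) / (2 * n_controls)
    ) ** 0.5
    z_effect = abs(case_freq - control_freq) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return norm.cdf(z_effect - z_crit)


# Assumed design: 2,667 cases and 2,667 controls, control allele frequency 0.3.
power_or_1_3 = allelic_test_power(2667, 2667, 0.30, 1.3)
power_or_1_1 = allelic_test_power(2667, 2667, 0.30, 1.1)
print(f"OR 1.3: {power_or_1_3:.2f}   OR 1.1: {power_or_1_1:.4f}")
```

Under these assumptions, power is high for an odds ratio of 1.3 and close to zero for 1.1, which is in line with the need to combine independent datasets.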
The initial attempts to map schizophrenia to the MHC started in the 1970s 104, only a few years after the discovery of the human HLA system 105. Many attempts have been made since then, and some yielded suggestive evidence (see, for example, 106), but definitive evidence of MHC involvement was only recently obtained from a combined analysis of GWAS data. Three GWAS published jointly in 2009 (ISC, MGS, and Schizophrenia Genetics Consortium, SGENE), reaching a total EA sample of 8,008 cases and 19,077 controls, performed a meta-analysis of schizophrenia GWAS for the first time 89,102,103. The meta-analysis combined the p-values for all imputed and genotyped SNPs from the most significant regions of each study. This analysis generated a genome-wide significant association at the MHC region on chromosome 6.
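One common way to combine p-values across studies is the sample-size-weighted Z-score (Stouffer) method, sketched below. The text does not say which combination method the consortia used, so this is only a stand-in to show the principle; the function name, p-values, effect directions, and sample sizes are all hypothetical.

```python
from statistics import NormalDist

_NORM = NormalDist()


def stouffer_combine(p_values, directions, sample_sizes):
    """Combine two-sided p-values with a sample-size-weighted Z-score.

    directions: +1 or -1 per study, the sign of the effect for a fixed allele
    sample_sizes: per-study sizes; weights are their square roots
    """
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p, sign, n in zip(p_values, directions, sample_sizes):
        z = sign * _NORM.inv_cdf(1 - p / 2)  # signed Z-score for this study
        weight = n ** 0.5
        weighted_sum += weight * z
        weight_sq_sum += weight * weight
    z_combined = weighted_sum / weight_sq_sum ** 0.5
    return 2 * _NORM.cdf(-abs(z_combined))  # two-sided combined p-value


# Hypothetical: three studies, each only suggestive, all in the same direction.
combined_same = stouffer_combine([1e-4, 1e-4, 1e-4], [1, 1, 1], [5000, 5000, 5000])
# Hypothetical: two equally sized studies pointing in opposite directions.
combined_opposite = stouffer_combine([1e-4, 1e-4], [1, -1], [5000, 5000])
print(combined_same < 5e-8, combined_opposite)
```

The example shows how consistent but individually sub-threshold signals can reach genome-wide significance once combined, while signals in opposite directions cancel.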
The MHC signal extends over much of the MHC region, from ~26 Mb to ~33 Mb (Figure 2). The strongest evidence for association from the meta-analysis (rs13194053, p=9.54×10⁻⁹) was observed near a cluster of histone genes and several immune-related genes, including butyrophilin subfamily 3 members A2 and A1 (BTN3A2 and BTN3A1) and protease serine 16 (PRSS16), but each individual dataset tends to have a different location for its best findings. The MHC region has a very high gene density and long-range LD blocks 107. The human genome is structured in many “blocklike” islands of LD generated by great variation in recombination rates; blocks from regions of low recombination are long and are interspersed with interblock regions of higher recombination 108,109. The location of the causative variation remains indeterminate, but it could be in one or more genes or in a nongenic region within the MHC. In the MGS sample (and presumably similarly in the other two samples), ~50% of the top 1,000 highest-ranking GWAS SNPs were intergenic, located outside the 10 kb region on either side of a gene, although many of these may represent a genic region signal due to LD. Even at the MHC locus, rs13194053, the SNP with the most significant association from the meta-analysis, is ~29 kb away from its closest gene, HIST1H2AH (histone cluster 1, H2ah) 89,102,103. A functional role for many intergenic regions would not be surprising. Many of these regions contain highly conserved sequences believed to have a regulatory function 110. The associated variants, or variants in LD with them, in intergenic regions may then alter expression of upstream or downstream genes. Moreover, most of the human genome is transcribed, with some transcripts serving as regulatory RNAs, but the function of most transcripts remains undefined 111.
The genes in the MHC region have many different biological functions, but genes with an immune function predominate. Histones regulate DNA transcription by chromatin modification through histone methylation or acetylation 112–114, and have a role as antimicrobial agents: histones disrupt the bacterial cell membrane and interfere with microbial gene expression 115. In human placenta, histones (H2A and H2B) neutralize bacterial endotoxins as part of an infection barrier 116. This raises the possibility that genetic variation in histones might underlie a differential placental susceptibility to infections, and that one or more haplotypes spanning histones might increase the susceptibility to schizophrenia through this mechanism. A Danish registry study reported an increased risk of autoimmune disorders (thyrotoxicosis, intestinal malabsorption, acquired hemolytic anemia, chronic active hepatitis, interstitial cystitis, alopecia areata, myositis, polymyalgia rheumatica, and Sjögren’s syndrome) for schizophrenics, and a history of any autoimmune disorder (of 29 evaluated) was found associated with a 45% increase in risk for schizophrenia 117. The MHC region has been implicated in many genetic disorders with immune-related abnormalities 118, including type 1 diabetes (T1D), multiple sclerosis (MS), Crohn’s disease (CD), and rheumatoid arthritis (RA), among many others per www.genome.gov/gwastudies 90. It is noteworthy that rs3800307 (found on the DRB1*03-DQA1*0501-DQB1*0201 haplotype), a SNP in complete LD (r²=1) with rs13194053, which reached genome-wide significant association with schizophrenia in the combined GWAS meta-analysis 89,102,103, is associated with T1D 119. In addition, rs3131296 at NOTCH4 is in strong LD (r²>0.73) with the classical HLA allele DRB1*03 and other SNPs that are associated with several autoimmune disorders (T1D, celiac disease, systemic lupus erythematosus, etc.), albeit with opposite alleles 103.
Finally, the MGS GWAS showed some evidence, with p=3.5×10−5 in the EA data set and p=1.9×10−6 in the EA plus AA data set, for association with schizophrenia at the chromosome 1p22.1 FAM69A-EVI-RPL5 gene cluster 89 which has been implicated in MS 120.
Other genes in the same region are involved in chromatin structure (high mobility group nucleosomal binding domain 4, HMGN4), transcriptional regulation (activator of basal transcription 1, ABT1; zinc finger protein 322A, ZNF322A; zinc finger protein 184, ZNF184), G-protein-coupled receptor signaling (FKSG83), and the nuclear pore complex (nuclear pore membrane protein 121 -like 2, POM121L2). The SGENE-plus (their GWAS set) and follow-up samples (i.e., an extended SGENE dataset which added a follow-up EA sample of 4,999 cases and 15,555 controls) analysis reported an independent association (i.e., in weak LD with rs13194053 at the histone gene cluster) at NOTCH4 (Notch homolog 4 [Drosophila], rs3131296, p=2×10−8), located at 32.28 Mb on chromosome 6, and the combined meta-analysis of SGENE-plus and follow-up samples, along with MGS and ISC samples, gave a p=2.3×10−10 there 103. See Table 2 for a list of genes mentioned in the text and their functions.
The combined analysis of SGENE-plus GWAS samples and replication samples uncovered associations with neurogranin (NRGN) and with transcription factor 4 (TCF4) that subsequently reached genome-wide significance in the combined analysis of SGENE-plus and replication samples along with ISC and MGS samples (Table 1) 103. NRGN encodes a postsynaptic protein kinase substrate that binds calmodulin, mediating N-methyl-d-aspartate (NMDA) receptor signaling that is important for learning and memory, and relevant to the proposed glutamate pathophysiology of schizophrenia 121,122. TCF4 is a neuronal transcription factor essential for brain development, specifically neurogenesis 123. Mutations in TCF4 cause Pitt–Hopkins syndrome, a neurodevelopmental disorder characterized by severe motor and mental retardation, including absent language development, microcephaly, epilepsy, and facial dysmorphisms 124–126. It is also of interest that homozygous and compound-heterozygote deletions and mutations in CNTNAP2 and NRXN1 can symptomatically resemble Pitt–Hopkins syndrome along with autistic behavior 127, and both NRXN1 (via the 2p16.3 CNV, Table 3) and CNTNAP2 (a rarer CNV 128) have previously been implicated in schizophrenia (and also autism spectrum disorders and epilepsy, as reviewed in 127). Another new schizophrenia susceptibility gene from schizophrenia GWAS is zinc finger protein 804A (ZNF804A), which was identified by a two-stage study: a GWAS discovery phase using 479 cases and 2,937 controls, followed by testing of loci with a discovery p<10−5 in 6,829 cases and 9,897 controls 129. A combined p=1.61×10−7 was obtained for SNP rs1344706 in the initial report, and the association evidence was supported in later large GWAS of schizophrenia 89,102,103. Subsequently, rs1344706 in ZNF804A was reported to be associated with altered neuronal connectivity in the dorsolateral prefrontal cortex in a functional magnetic resonance imaging study of healthy controls 130.
Many genetic variants, each with a very small effect, combined together, make substantial contributions to disorder risk under a polygenic model, first hypothesized for schizophrenia four decades ago 100. Simulations show that even a disorder with 1,000 risk loci with low mean relative risks (RR=1.04), when evaluated in a large scale (10K cases and 10K controls) GWAS would still allow prediction of individual disorder risk with an accuracy >0.75 by using 75 loci explaining ~50% of the risk variance 131. The first empirical test of the polygenic hypothesis of schizophrenia by the ISC used their GWAS (discovery data set) to define a large set of very small effect common variants as “score” alleles with increasingly liberal association significance thresholds 102. With the set of score alleles, the ISC generated an aggregate risk score for each individual in independent target GWAS data sets of schizophrenia, using the MGS EA and AA data sets as well as a UK sample 89,129. Aggregate risk scores in cases were found to be significantly higher than in controls in each of the GWAS data sets of schizophrenia, and also in GWAS data sets of bipolar disorder 132,133, but not in control GWAS data sets of non-psychiatric disorders: T1D; type 2 diabetes, T2D; hypertension; CD; RA; and coronary artery disease 102,133. Collectively, ISC concluded that thousands of common polygenic variants with very small individual effects explain about one-third of the total variation in genetic liability to schizophrenia 102. In an independent bioinformatics-based study 134, schizophrenia candidate genes selected from literature mining were found to be enriched in the list of genes with small p-values from independent schizophrenia GWAS data sets. 
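The aggregate risk score described above can be illustrated with a short calculation. This is a minimal sketch of one common formulation, assuming that each score allele is weighted by its log odds ratio from the discovery sample and that the score is averaged over non-missing SNPs; the SNP names, p-values, odds ratios, and genotypes are hypothetical, and the details may differ from the ISC analysis.

```python
import math

# Hypothetical discovery results: SNP -> (association p-value, odds ratio of the score allele).
DISCOVERY = {
    "snp1": (0.004, 1.08),
    "snp2": (0.03, 1.05),
    "snp3": (0.20, 1.03),
    "snp4": (0.45, 1.01),
}


def select_score_alleles(discovery, p_threshold):
    """Keep SNPs with discovery p below the threshold, weighted by log odds ratio."""
    return {snp: math.log(odds) for snp, (p, odds) in discovery.items() if p < p_threshold}


def aggregate_risk_score(genotype, weights):
    """Average of (score-allele count x weight) over SNPs with a non-missing genotype."""
    terms = [genotype[snp] * w for snp, w in weights.items() if genotype.get(snp) is not None]
    return sum(terms) / len(terms) if terms else float("nan")


# Hypothetical target individual: score-allele counts (0, 1, 2) or None when missing.
target_individual = {"snp1": 2, "snp2": 1, "snp3": 0, "snp4": None}

# Increasingly liberal thresholds admit more, and weaker, score alleles.
scores_by_threshold = {
    p_t: aggregate_risk_score(target_individual, select_score_alleles(DISCOVERY, p_t))
    for p_t in (0.01, 0.05, 0.5)
}
print(scores_by_threshold)
```

In an analysis of this kind, the scores of cases and controls in the independent target sample would then be compared at each threshold, for example by logistic regression.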
The polygenic model under the assumption of less common (MAF<5%) causal alleles did not fit well with the observed schizophrenia GWAS data 102; however, both simulated and empirical data indicate that the spectrum of risk alleles for common disorders includes both common and rare variants 98,135. Furthermore, despite the substantial variation in liability to schizophrenia possibly explained by polygenic variants (~30%), coupled with contributions from common and small-effect variants individually detectable in GWAS (e.g., MHC variants, TCF4, etc., above) and rare and large-effect CNVs (see below), the problem of how to explain the substantial missing heritability remains fundamental. Missing heritability here refers to heritability that is unexplained after well-powered GWAS have been conducted. Although it has been argued that the heritability of some behavioral traits and disorders may have been overestimated 136, this seems unlikely for schizophrenia given the large body of high-quality evidence that is available, and other reasons seem more plausible (see an excellent review on the topic 137).
CNVs are stretches of genomic deletions and duplications ranging from 1 kb to several Mb, and thus are likely to have larger phenotypic effects than SNPs. Only rare (<1%) and large (>100kb) CNVs have thus far been implicated in schizophrenia 138–144, as reflected by overall CNV burden and individual CNV loci. Supporting evidence for association of specific rare and large CNVs with schizophrenia is emerging at 1q21.1, 2p16.3 (NRXN1), 15q11.2, 15q13.2, 16p11.2, and 22q11.21 138–144 (Table 3). The 3 Mb deletion at 22q11.21 (22qDS) has been known to cause velocardiofacial syndrome (VCFS), and increases the risk for schizophrenia 145–147. An epidemiological study found that more than 30% of 22qDS carriers develop psychosis, about 80% of this manifesting as schizophrenia 147, which represents the largest known individual risk factor for the development of schizophrenia, besides having an identical twin with schizophrenia. The 22q11.21 CNV was the only CNV reaching genome-wide significance in some schizophrenia GWAS 142 and the only one not found in controls (case%/control% was 0.30/0.00). Each of these other CNVs was found over-represented in schizophrenia cases in at least one study (Table 3), with the supporting evidence remaining consistent for CNVs at the 1q21.1 deletion (0.24/0.02), 2p16.3 (NRXN1) deletion (0.10/0.02), 15q13.2 deletion (0.20/0.02), and 16p11.2 duplication (0.35/0.03), and plausible for the 15q11.2 deletion (0.65/0.22). All the CNVs (except for 22q11.21) initially found only in schizophrenia cases were also found in healthy controls in later studies, suggesting that the penetrance of these rare CNVs may be relatively low (Table 3). The deletions at NRXN1, which only involve this single gene, suggest that exon disruptions rather than deletions of other parts of NRXN1 are associated with schizophrenia 148. 
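The case%/control% frequencies quoted above can be turned into rough effect sizes. The sketch below computes a crude odds ratio for each locus directly from the quoted percentages. It assumes that the pooled percentages can be treated as carrier frequencies and ignores study design, sample sizes, and confidence intervals, so the values are illustrative and do not replace the study-level estimates in Table 3.

```python
# Case%/control% carrier frequencies as quoted in the text.
CNV_FREQUENCIES = {
    "1q21.1 deletion": (0.24, 0.02),
    "2p16.3 (NRXN1) deletion": (0.10, 0.02),
    "15q11.2 deletion": (0.65, 0.22),
    "15q13.2 deletion": (0.20, 0.02),
    "16p11.2 duplication": (0.35, 0.03),
    "22q11.21 deletion": (0.30, 0.00),
}


def crude_odds_ratio(case_pct, control_pct):
    """Odds ratio from carrier percentages; infinite when no control carries the CNV."""
    p_case, p_control = case_pct / 100.0, control_pct / 100.0
    if p_control == 0:
        return float("inf")
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


crude_ors = {locus: crude_odds_ratio(*pcts) for locus, pcts in CNV_FREQUENCIES.items()}
for locus, value in sorted(crude_ors.items(), key=lambda item: item[1]):
    print(f"{locus}: crude OR = {value:.1f}")
```

Under these assumptions the crude odds ratios range from about 3 for the 15q11.2 deletion to about 12 for the 1q21.1 deletion and the 16p11.2 duplication. The 22q11.21 deletion has no finite estimate because no control carriers were observed.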
With the rest of the rare and large CNVs implicated in schizophrenia spanning multiple genes, specific gene effects, including possibly genes presenting pleiotropy (see next section), will be difficult to disentangle.
Pleiotropy refers to the common phenomenon of variation in a gene simultaneously affecting different phenotypes. While examples abound in model organisms (e.g., flies 149), evidence for pleiotropy in humans is also available, such as genes for body weight and height 150, and also for disorders such as T2D 151 and prostate cancer 152. The molecular genetic overlaps between schizophrenia and bipolar disorder, and between schizophrenia and autism are consistent with pleiotropy; but shared genetic loci may actually determine an aspect (somewhat in isolation from the overall phenotype) shared by two disorders such as psychosis in schizophrenia and in bipolar disorder 153.
Although schizophrenia and bipolar disorder are classified as different psychiatric disorders in most contemporary classifications such as the Diagnostic and Statistical Manual of Mental Disorders Fourth Edition, DSM-IV 154, a distinction historically referred to as the Kraepelinian dichotomy 7, they share some similarities, such as peak onset in early adulthood, prevalence around 1% generally similar worldwide, psychotic symptoms in more than half of bipolar disorder type 1 subjects and the same treatment of psychosis (in both cases with antipsychotic medications), substance use comorbidity, increased suicide risk (and common severe mood disorder in schizophrenia), and sometimes a difficult differential diagnosis with schizoaffective disorder. Overlap of susceptibility genes has been postulated 155–157 for more circumscribed aspects such as psychosis proneness 153,158 especially for “positive symptoms” such as hallucinations and delusions, or might reflect a wider range of susceptibility to higher brain dysfunction such as pleiotropy found with CNVs (an overlapping CNV for schizophrenia and bipolar disorder has been reported for the 16p11.2 duplication in a meta-analysis with an association p=4.8×10−7 for schizophrenia and p=0.017 for bipolar disorder 159). Family studies have shown that schizoaffective disorder is more common in families ascertained from probands with schizophrenia or with bipolar disorder type 1 than in the general population 9–14. A recent meta-analysis of family studies found familial coaggregation of schizophrenia and bipolar disorder as well 160. The largest family study reported to date, comprised of almost 36,000 schizophrenia and over 40,000 bipolar disorder Swedish probands, concluded that familial coaggregation between schizophrenia and bipolar disorder was ~63% due to additive genetic effects common to both disorders 161. 
Recent GWAS SNP data strongly support a genetic overlap between schizophrenia and bipolar disorder, which were shown to share polygenic common variants with very small effect sizes (see polygenic section above) 102. A gene-wide analysis found a significant excess of genes showing associations with both disorders 162. ZNF804A and CACNA1C (calcium channel, voltage-dependent, L type, alpha 1C subunit) are two such genes that are shared by both disorders: ZNF804A was initially identified as a schizophrenia susceptibility gene, and CACNA1C was initially identified as a bipolar disorder susceptibility gene, in respective GWAS 129,163.
Interestingly, the strongest SNP association of p=4.6×10−7 in the MGS GWAS EA data set was found with centaurin gamma 2 (CENTG2, also known as AGAP1), a gene that has been implicated in autism 164. It is also noteworthy that our exploratory analysis of MGS GWAS data combining both EA and AA ancestries (3,967 cases and 3,626 controls) showed a p=1.9×10−6 association with fragile X mental retardation, autosomal homolog 1 (FXR1) 89, and the association reached genome-wide significance when the ISC and SGENE-plus GWAS datasets were included in the analysis along with MGS EA+AA (data not shown). FXR1 is a paralog of fragile X mental retardation 1 (FMR1), dysfunction of which causes the FMR syndrome that includes autism as a common feature 165. Schizophrenia and autism also share some clinical features such as social interaction and communication impairments, and some negative/deficit symptoms 166. This is more noticeable for childhood-onset schizophrenia where developmental delays may be more marked than in adult-onset schizophrenia with subtler such findings 167. However, autism remains an exclusion criterion for schizophrenia in the DSM-IV unless prominent hallucinations and delusions are present for at least a month 154. A study of 129 adults with autism spectrum disorders, ASD, found that 7% had psychotic bipolar disorder and 8% had schizophrenia or other psychotic disorders 168. The current diagnostic hierarchy, which largely treats ASD and schizophrenia as mutually exclusive, could mask some ASD/schizophrenia comorbidity since an autism case might be less likely to be diagnosed with schizophrenia in adulthood 167, even in the presence of overt and chronic psychosis. 
A two-fold increase of schizophrenia in the parents of autism cases 169, a risk ratio of 3.44 for autism when prenatal parental history of a schizophrenia-like psychosis was present in a nationwide Danish study 170, and an increased risk of schizophrenia in autism patients 168,171 suggest overlapping risk factors between schizophrenia and autism. Genetic overlap of schizophrenia and autism (and other conditions) has recently received indirect support since a number of schizophrenia cases carrying rare and large CNVs have comorbidities such as learning disabilities, mental retardation, autism and autism spectrum disorders, and seizures or epilepsy, and/or such disorders’ own CNV scans have implicated the same CNV loci (Table 3). Besides the aforementioned conditions, others are over-represented in schizophrenia cases in general, such as seizures 172,173.
After over a quarter century of molecular genetics work in schizophrenia, advances in biotechnology and statistics applied to the study of large and well characterized clinical samples have made possible the discovery of individual susceptibility loci with subsequent replication. A comprehensive discussion of what comes next after a successful GWAS is outside the scope of this manuscript. We have selected for discussion a handful of issues that have been instrumental in generating progress until now, and are a foundation for further progress. First, the social environment in which science is conducted has changed deeply in recent years. Of fundamental importance is the new openness in the field of schizophrenia genetics, in consonance with the instituted NIH policy of wide GWAS data sharing. De-identified phenotypic and genotypic data from GWAS funded by NIH are to be submitted to a centralized NIH GWAS data repository, the database of Genotypes and Phenotypes (dbGaP, www.ncbi.nlm.nih.gov/gap) hosted by the National Center for Biotechnology Information (NCBI), and studies supported by the Wellcome Trust Case-Control Consortium are also deposited in a database (WTCCC, www.wtccc.org.uk). Data, in both cases, are accessible by application to access committees. The new NIH policy (grants.nih.gov/grants/gwas) has already created extraordinary opportunities to access data from independent research groups before publication 174. For example, the MGS Genetic Association Information Network (GAIN) genotypic/phenotypic sample (www.genome.gov/19518582) has been accessed 140 times for a large diversity of genetics research projects as of 11/17/09. The Psychiatric GWAS Consortium (PGC) 175,176 continues this new tradition of openness. The PGC is comprised of five groups: schizophrenia, bipolar disorder, major depressive disorder, attention deficit hyperactivity disorder (ADHD), and autism.
A primary goal of the PGC is to perform disease specific and cross-disorder analyses from combined GWAS datasets composed of all qualifying samples for each of the disorders.
The method for following up GWAS results needs to be thorough; replication and fine mapping of associated regions are necessary for further progress (see informative review 177). The preferred approach is to combine GWAS data from independent studies, but when some of the samples do not have GWAS data, focused genotyping is still useful, although less informative. The analysis of combined data is important because most clinical samples do not carry the power to detect effect sizes typically uncovered in well-powered GWAS 178, and the estimated ORs tend to be inflated because only top-ranking associations are reported 179; a less biased estimation of ORs requires the systematic combination of GWAS and focused replication studies. Data can be meta-analyzed with a variety of methods (see comprehensive review 177). For example, three consortia 89,102,103 meta-analyzed a set of their most significant p-values (p<0.001) from their independent GWAS uncovering a genome-wide significant locus at the MHC. SNPs other than genome-wide significant (p<5×10−8) ones merit inclusion in confirmation experiments: while some genome-wide significant SNPs from a single study might not be confirmed in replication studies, other SNPs very highly ranking in the primary study, though not achieving genome-wide significance (e.g., SNPs with p<1×10−5), might surpass that threshold in a replication experiment. Association signals in an extended LD block that spans many genes (e.g., the MHC locus implicated in schizophrenia) make it hard to disentangle which gene/s is/are likely to be causal. Populations of non-European ancestry might have some non-overlapping susceptibility loci and it is fundamental to investigate these differences, as they can also be informative about different environmental risk factors. 
An important characteristic of African populations (e.g., AA) is reduced LD, which would facilitate the narrowing of the associated genomic intervals; existing limitations of CNV and SNP map coverage, and imputation of AA datasets are currently being addressed (for examples, see 180,181).
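The gain from combining studies, discussed above, can be illustrated with a fixed-effect inverse-variance meta-analysis. This is a minimal sketch of one standard method, not necessarily the procedure used by the three consortia; the per-study log odds ratios and standard errors are hypothetical, and the threshold is the genome-wide significance level of p<5×10−8 mentioned above.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold quoted in the text


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effect_meta(estimates):
    """Pool (log odds ratio, standard error) pairs by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se, two_sided_p(pooled_beta / pooled_se)


# Hypothetical results for one SNP in three independent GWAS: (log OR, standard error).
studies = [(0.15, 0.040), (0.15, 0.045), (0.15, 0.050)]

single_study_p = [two_sided_p(beta / se) for beta, se in studies]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(studies)

print("single-study p-values:", single_study_p)
print("pooled OR:", math.exp(pooled_beta), "pooled p:", pooled_p)
print("genome-wide significant after pooling:", pooled_p < GENOME_WIDE_ALPHA)
```

In this hypothetical example no single study approaches genome-wide significance, but the pooled estimate passes the threshold. This mirrors the situation in which highly ranked but sub-threshold SNPs from a primary study surpass the threshold once data are combined.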
It is important to bear in mind that given the GWAS SNP design (where SNPs are selected because they are common and are informative of many other SNPs, not because of their functional properties), in most cases the associated SNPs are probably not the causal SNPs. As previously mentioned, we have noted that in the MGS GWAS and in the combined sample (MGS, ISC, and SGENE-plus 89,102,103), the vast majority of the strongest common SNP associations were not located in coding sequences where such a signal would be easier to interpret, but are in intergenic regions (over half of these SNPs >10 kb from a gene, almost all with no clear association to any known gene, i.e., via LD) or of unclear function, e.g., intronic, but not near a splice site, or known or putative regulatory site. Although the causal SNPs should be in LD with the GWAS associated SNPs, the causative genes may be close to the statistically associated locus, but may also be farther removed, even on a different chromosome. For example if the causal variant was a trans-acting factor that regulates transcription, the regulated gene/s might be located on a different chromosome. The integration of genome-wide transcription data (expression quantitative trait loci, eQTL, currently detected by microarray expression data) and GWAS data (DNA variation data) can help close this gap by linking the GWAS statistical results and biology and is expected to lead to discoveries of mechanisms of disease susceptibility otherwise obscured to either method in isolation. The approach has been proven successful in asthma 182. 
Interestingly, within the MHC region implicated in schizophrenia, there are more than 10 cis-eQTL (cis meaning nearby on the same chromosome) in the eQTL database, which uses expression data from lymphoblastoid cell lines (LCLs) of asthma patients (Figure 2) 183, and the SNP showing the most significant association with schizophrenia, rs13194053 with p=9.54×10−9, is in strong LD (r2=0.43) with a SNP showing the strongest association with BTN3A2 expression (Figure 2).
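The basic cis-eQTL test described above can be sketched as a regression of a gene's expression level on genotype dosage. The example below is a minimal illustration using ordinary least squares with a single SNP and no covariates; the function name, dosages, and expression values are hypothetical and are not drawn from the eQTL database cited in the text.

```python
def fit_eqtl(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage (0, 1, 2)."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need matching dosage and expression values for at least 2 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx                      # change in expression per copy of the allele
    intercept = mean_y - slope * mean_x    # expected expression at dosage 0
    return slope, intercept


# Hypothetical data: six individuals, allele dosage at one SNP and expression of a nearby gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.1, 2.3]

slope, intercept = fit_eqtl(dosages, expression)
print(f"expression changes by {slope:.2f} units per allele copy (intercept {intercept:.2f})")
```

A SNP whose slope differs reliably from zero for a nearby gene is a candidate cis-eQTL. If that SNP is also in LD with a disorder-associated SNP, the gene becomes a candidate for mediating the association.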
The selection of tissue for gene expression study is critical, and brain is not always necessarily the best choice of tissue. Epidemiological evidence strongly suggests that some of the primary genetic mechanisms leading to schizophrenia might reside in other tissues than brain – for example, an autoimmune mechanism that would compromise the brain – in such a case, the symptoms of schizophrenia would still reflect brain dysfunction, but would be removed from the primary noxae (investigations of these leads remain to be thoroughly explored). A more explicit example would be if a genetic abnormality affecting an immunological response to a virus contributed to schizophrenia risk, studying the brain transcription characteristics of a neurotransmitter system would only reflect secondary (or even terminal) neural changes to the primary immune response (which might be more apparent in immune tissue such as lymphocytes).
Establishing causal mechanisms may require, in addition to statistical testing of association, the functional characterization of implicated genes and variants in simple cell models (and in model organisms) targeting phenotypes with a high probability of association with the studied disorder – among other potential advantages, in the absence of buffering effects present in multicellular systems, in vitro effects are expected to be amplified (which may make detectable an effect that is very small in the whole/intact organism). Dendrou et al. 184 studied cell-specific protein phenotypes for IL2RA, a locus associated with T1D. By taking advantage of a large collection of normal donors from whom fresh, primary cells could be analyzed (the experimental subjects can be recalled for repeated measurements; this resource is known as the Cambridge BioResource) it was very elegantly demonstrated that elevated CD25 expression is associated with IL2RA haplotypes that protect from T1D 184.
It is still premature to conclude whether the genetic architecture of schizophrenia is like mental retardation where thousands of individual genetic disorders have been cataloged, or whether some widely speculated upon, but still little investigated mechanism such as epigenetics (which influences phenotypes through the regulation of gene expression) or gene-environment interactions will explain the bulk of the missing heritability for the disorder. Basic genomics research has produced major breakthroughs during recent years such as discovery of microRNAs, long-range promoters, epigenetic factors, and variable copy number variations, and many more will probably be made as our knowledge of the genome is rapidly increasing. It should not be surprising if still unknown genetic mechanisms will, at the end, explain a substantial proportion of the heritability of schizophrenia. Nonetheless, the task of defining the spectrum of molecular genetic mechanisms in schizophrenia is now at the forefront of our field. Some immediate research efforts will, in large measure, focus on whole genome re-sequencing and genome-wide gene transcription and epigenetic analyses. Rapid progress in biotechnology 185 is making the study of rare variants in many genes or large genomic regions in larger samples increasingly feasible – proof of principle is provided by the 1,000 genomes project (www.1000genomes.org), which is designed to build a deep catalog of human genetic variation. The design of experiments aimed at fine mapping of regions of association and the precision of imputation will both benefit from this project.
It is anticipated that as genetic discoveries accumulate, the application of a myriad of tools from systems biology (e.g., genomics, transcriptomics, proteomics, etc.) will lead to a delineation of biological pathways involved in the pathophysiology of schizophrenia, and eventually to new therapies (developments in treatment still lag compared to discoveries of new genetic associations for complex disorders, see 186, but this situation is expected to change as biological research makes inroads into still purely statistical associations). There is, however, a strong temptation to accept the simplest observations (i.e., those with immediate biological connotation) as the most meaningful and the only ones that merit follow-up. For example, Mitchell and Porteous stated: “Occam’s razor and statistical probability both argue that the co-inheritance of one or just a few risk genes by any individual case is the more likely explanation for the majority of incidence” 187. They continued: “Haven’t we learnt more about disease mechanism and potential routes to the treatment of Alzheimer’s disease from the rare variant examples of amyloid beta (A4) precursor protein (APP), presenilin-1 (PS1) and presenilin-2 (PS2) than from the archetypal common variant example of apolipoprotein E, isoform 4 (ApoE4)?” These arguments appear necessarily true at first sight; however, as previously discussed 188, an explanation may superficially appear more complicated than it needs to be, but only if considered apart from its evolutionary context 189. Research in model organisms (e.g., Drosophila) shows that most phenotypes are the result of complicated genetic architectures: multiple genes, often showing pleiotropy (thus likely associated with multiple traits) and epistasis, and even single mutation effects differing with genetic background and environment 190,191, and this landscape will probably be true for complex human behavioral traits as well.
Explanations relying on single genes are unlikely to capture the fundamental complexity of most human complex traits, and all the associated genetic variation needs to be pursued to understand the pathophysiology of a complex disorder. A task of utmost importance is the integration of the spectrum of mutations found in schizophrenia into a system that takes into account constantly changing environments and evolution.