The purpose of this invited review is to summarize the state of genetic research into the etiology of schizophrenia (SCZ) and to consider options for progress. The fundamental uncertainty in SCZ genetics has always been the nature of the beast, the underlying genetic architecture. If this were known, studies using the appropriate technologies and sample sizes could be designed with an excellent chance of producing high-confidence results. Until recently, few pertinent data were available, and the field necessarily relied on speculation. However, for the first time in the complex and frustrating history of inquiry into the genetics of SCZ, we now have empirical data about the genetic basis of SCZ that implicate specific loci and that can be used to plan the next steps forward.
The major goal of SCZ genetic research is to develop a complete list of genetic loci and pathways that confer risk or protection. Given how little we know for certain about this enigmatic, clinically heterogeneous, and genetically complex disorder, even one unequivocal insight would be of immense value. The enormous advantage of genetic studies compared with virtually all other human biomarker studies is that the key element of causation is present: exposure to a genetic risk factor begins at conception and prior to disease onset, and thus temporality is satisfied.
The second but more distal goal requires completion of the first: to reduce SCZ incidence, to minimize time-to-treatment for incident cases, and to reduce morbidity and mortality in cases with SCZ. We stress that this is an ultimate and not a proximal goal.
First, generations of physicians have evaluated the family histories of probands with SCZ. It is notable that this informal surveillance network has not identified any pedigree with SCZ segregating in a Mendelian fashion. This stands in contrast to many other complex traits where Mendelian subforms have been identified. For example, small proportions of cases with breast cancer, Alzheimer's disease, and type 2 diabetes mellitus (T2DM) are caused by single gene mutations that have very high penetrance and are typified by early age of onset. Notably, genetic evaluation of childhood-onset SCZ cases has not yielded causal mutations.17 Because the clinical surveillance network has been nonsystematic, the main conclusion can only be that SCZ is unlikely to have Mendelian subforms.
Second, a large body of genetic epidemiological studies has provided the rough outlines of the genetic architecture of SCZ. There is strong but indirect evidence that SCZ is familial and highly heritable. Third, application of segregation analysis to SCZ pedigree data has been inconclusive. Consistent with inference from the clinical surveillance network, it is possible to reject a few extreme models (eg, exclusively dominant or recessive Mendelian models). However, many other genetic models are consistent with the data.
Fourth, evaluation of “microscopic” genomic changes using cytogenetic methods has been informative. The most notable finding was the identification of the 22q11.2 deletion, a rare, potent, and nonspecific risk factor for SCZ. The deletion has a genotypic relative risk of ~20 and a prevalence in cases of ~0.3%.18 This deletion does not act in a Mendelian fashion: only a minority of individuals with this deletion develop SCZ, and it increases risk for multiple other neuropsychiatric disorders along with the somatic features of velo-cardiofacial syndrome.19
Fifth, there have been over 30 genome-wide linkage studies of SCZ along with a meta-analysis.9 No genomic region emerged that both exceeded genome-wide significance and was reproducible across studies. These results are typical for complex biomedical diseases.
Sixth, hypothesis-driven candidate gene studies have been a major focus in SCZ research, with the SZGene database20 listing >1400 studies since 1965 (for comparison, there are ~2200 PubMed citations for “SCZ randomized controlled trials”). This body of work has not yielded associations that meet modern criteria for replication.21 Indeed, the nonsystematic nature of this search has led to mistakes (eg, TCF4 has strong evidence of association with SCZ and yet has multiple negative studies of the wrong variant). There are major problems with the hypothesis-driven candidate gene approach.
In sum, prior to 2008, the cytogenetic finding of an association of SCZ with the 22q11.2 deletion was the only robust and reproducible genetic association for SCZ.
GWAS have yielded a plethora of findings23,24 that meet modern criteria for replication in human genetics.21 A “primer” is available.25 Since 2005, >700 GWAS have been published; considering findings exceeding a conservative significance threshold (P < 5 × 10⁻⁸), GWAS have implicated ~1500 genetic markers for 101 human diseases and 124 biomedical traits (eg, height, body mass index [BMI], and lipid levels). GWAS have produced more etiological knowledge than virtually any other technology in the history of medicine, save for clinical microbiology and radiology.
There are 8 published SCZ GWAS of European samples that used individual-level genotyping26–33 (table 2). In addition, 2 studies included African-American samples,31,33 1 included subjects of Japanese ancestry,34 and 2 used less reliable DNA pooling methods.35,36 By current standards in human genetics, the sample sizes for virtually all of these studies are small. Therefore, the Psychiatric GWAS Consortium37 has conducted an integrated mega-analysis of all available GWAS data on European samples (Schizophrenia Psychiatric Genome-Wide Association Study Consortium, Submitted12).
In contrast to the paucity of findings from prior methods, GWAS has “worked” for SCZ in terms of identifying common genetic variation that meets modern standards for replication and significance in human genetics. These findings include:
The MHC finding has been criticized as being due to bias or artifact. However, the empirical results meet accepted criteria for replication in human genetics, the P value exceeds chance by 10000×, there are consistent effects across samples, and appropriate control for population stratification does not explain away the association. Moreover, the MHC region has not emerged in analyses of other psychiatric disorders (eg, attention-deficit hyperactivity disorder, autism, bipolar disorder, major depressive disorder, and smoking behavior) using similar analytic methods and samples from overlapping sites. The MHC region emerges in only ~25% of diseases studied using GWAS (eg, multiple sclerosis (MS), rheumatoid arthritis, and systemic lupus erythematosus [SLE]). The association of genetic variation in the MHC with SCZ thus appears robust.
For the first time, we have a direct and empirical view of the genetic architecture of SCZ. The data tell a clear story: genetic variation for SCZ exists with frequencies from very common to rare, and the risk conveyed by these variants is inversely associated with frequency. Empirical data about the allelic spectrum of risk for SCZ are depicted in Figure 1. Rare CNVs of strong effect are in the upper left, common variants of quite subtle effect are in the lower right (red and yellow dots), and a polygenic signal is also shown (turquoise dots plus a blue best-fit line).
The data that generated this graph provide an answer to a question that has bedeviled SCZ genetics for a century: is the syndromic entity SCZ a collection of rare Mendelian/Mendelian-like disorders or is it due to a large set of polygenes? SCZ is both.
The “map” of the genetic architecture of SCZ in Figure 1 is not the final draft. Larger GWAS will yield greater numbers of significant loci. More comprehensive analyses could yield more CNVs, and whole-genome, regional, and exome-sequencing efforts might add loci that are uncommon or rare.
Very large samples are essential. Empirical data from >200 different human traits provide guidance for the determinants of the “success” of a GWAS. The single most important factor is sample size: very large samples by historical standards are required (a corollary is that negative results from small studies are meaningless). This relationship is illustrated in Figure 2 for SCZ.
Combining data across samples is valid. One argument is that combining samples across different sites and countries will introduce crippling heterogeneity or bias. The data do not support this argument: there are dozens of examples where different studies have been combined to augment power. For example, height is surprisingly difficult to assess and yet a meta-analysis of 46 studies yielded compelling and coherent genetic results.39
Obey the laws of probability. Basic algebra classes include combinatorics and elementary probability. Application of these basic mathematical principles yields a conclusion of exceptional relevance to psychiatric genetics: gambling is not a strategy for progress. More specifically, everyone has to pay the price of multiple comparisons to avoid crippling type 1 error,22 integrated replication is essential,21 and underpowered studies are not worth doing. These principles are widely known but not universally appreciated.
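The price of multiple comparisons can be made concrete with a minimal sketch. It assumes roughly 1 million independent common-variant tests per genome, a conventional approximation that is not stated in this review; under that assumption, a Bonferroni correction of a 0.05 family-wise error rate reproduces the P < 5 × 10⁻⁸ threshold used elsewhere in the text. The function names are illustrative only.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def expected_false_positives(per_test_alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every tested null hypothesis is true."""
    return per_test_alpha * n_tests


# ASSUMPTION: about 1 million independent tests genome-wide (not from the review).
N_TESTS = 1_000_000

threshold = bonferroni_threshold(0.05, N_TESTS)
uncorrected = expected_false_positives(0.05, N_TESTS)
corrected = expected_false_positives(threshold, N_TESTS)

print(f"per-test threshold: {threshold:.1e}")
print(f"expected false positives at P < 0.05: {uncorrected:.0f}")
print(f"expected false positives at corrected threshold: {corrected:.2f}")
```

Under these assumptions, testing at a nominal P < 0.05 would be expected to produce tens of thousands of false positives, which is why uncorrected "hits" amount to gambling.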
Highly significant and replicated loci for SCZ typically have genotypic relative risks ~1.10. For replication of a specific association, the effect size in an initial study overestimates the actual value (ie, “winner's curse”),41 and high power (90%) is desired. For one marker, 11000 subjects are required (5500 cases and 5500 controls). Sample sizes increase to 17500 subjects for 10 markers and 24000 subjects for 100 markers. These sample sizes are more than an order of magnitude larger than historically typical for the SCZ candidate gene association field.
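The way sample size grows with the number of markers can be sketched with a standard normal-approximation power calculation for comparing allele frequencies in cases and controls. The review does not state the allele frequency or the exact method behind its figures, so everything here is an assumption: a control risk-allele frequency of 0.30, a multiplicative risk model, controls treated as representative of the population, a two-sided family-wise alpha of 0.05 split evenly across markers, and no adjustment for winner's curse. The outputs therefore need not match the numbers quoted above.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq: float, grr: float) -> float:
    """Risk-allele frequency in cases under a multiplicative model (assumption)."""
    return grr * control_freq / (grr * control_freq + (1.0 - control_freq))


def cases_required(grr: float, control_freq: float, n_markers: int = 1,
                   family_alpha: float = 0.05, power: float = 0.90) -> int:
    """Cases needed (with an equal number of controls) for an allelic test."""
    if not 0.0 < control_freq < 1.0:
        raise ValueError("control_freq must be strictly between 0 and 1")
    if grr <= 0.0 or grr == 1.0:
        raise ValueError("grr must be positive and different from 1")
    if n_markers < 1:
        raise ValueError("n_markers must be at least 1")

    alpha = family_alpha / n_markers              # Bonferroni split across markers
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)

    p0 = control_freq
    p1 = case_allele_frequency(p0, grr)
    p_bar = (p0 + p1) / 2.0

    # Two-proportion sample size formula, counted in alleles per group.
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2

    # Each subject contributes 2 alleles.
    return math.ceil(alleles_per_group / 2.0)


# ASSUMPTION: control risk-allele frequency of 0.30 (hypothetical).
for markers in (1, 10, 100):
    n = cases_required(grr=1.10, control_freq=0.30, n_markers=markers)
    print(f"{markers:>3} marker(s): {n} cases and {n} controls")
```

The qualitative point survives any reasonable choice of inputs: with a relative risk near 1.10, thousands of cases are needed for a single marker, and the requirement rises as more markers are tested.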
Hypothesis-driven approaches have not generally worked. In human complex disease genetics, empirical results have identified “usual suspects”42 such as the MHC locus for type 1 diabetes mellitus (T1DM) or APOE for Alzheimer's disease. However, these examples are infrequent, and the vast majority of high-confidence results from genetic studies point to unsuspected loci. Therefore, SCZ researchers need to question seriously our most cherished ideas about the etiology of SCZ. We recently evaluated historical candidate genes for SCZ in comparison to GWAS findings and found essentially no overlap. The hypothesis-driven candidate genes for SCZ that have been studied the most had no indication of common-variant signal (ie, COMT, DRD3, DRD2, HTR2A, NRG1, BDNF, DTNBP1, and SLC6A4) (Collins et al, Submitted10). The status of “the special gene,” DISC1, is particularly unclear: there is no common-variant signal, rare variation has not been found in resequencing of large samples, and reevaluation of the initial report suggests the name “disrupted in SCZ” is a misnomer. In this carefully assessed pedigree, the propositus had conduct disorder, and SCZ is a minor phenotype associated with the (1;11)(q42;q14.3) translocation (38% normal/other, 34% recurrent major depression, and 24% SCZ).43
In the complex and difficult history of SCZ genetics, many different technological approaches have been tried (table 1). To date, however, the only proven approaches are GWAS and assessment of CNVs. Should we push for more GWAS for SCZ?
First, as noted above, GWAS has an impressive track record of success in human complex disease genetics as well as in SCZ genetics. It has delivered where most other technologies have failed. Second, GWAS is a mature and inexpensive technology. Quality control, imputation, and analysis are readily accomplished. There are large amounts of data that can be used for comparisons. The cost can be as low as $250/sample. Third, most GWAS arrays contain single nucleotide polymorphism (SNP) content sufficient to capture the majority of common variation in European and many other world populations along with content for large CNVs.
Fourth, we have empirical data that allow prediction of what might be discovered if we were to conduct GWAS on more SCZ cases. Given that research on the genetics of SCZ is 3–4 years behind other biomedical diseases, we can look at the data for SCZ in relation to other complex human traits instead of relying on assumption-laden predictions. We used the NHGRI GWAS catalog23 to identify studies for 11 complex traits including SCZ. We reviewed 104 studies and included 72 (individual genome-wide genotyping of European subjects). Each study was reviewed at least twice to capture the number of cases analyzed and the number of genomic regions that exceeded genome-wide significance (P < 5 × 10⁻⁸). For each trait, we fit a regression line (number of regions ~ number of cases) to estimate the relation of these variables. There was a significant relation for 8 of the 11 traits.
The estimates in table 3 reveal intriguing lessons about the “mapability” of these 11 complex traits and provide a glimpse into their genetic architectures. The slope estimates (ie, the number of genome-wide significant regions per 1000 cases) vary widely, from 0.1 for BMI to over 3 for Crohn's disease and SLE. Intriguingly, SCZ is in the middle of the pack and about the same as T2DM, lung cancer, age-related macular degeneration (AMD), and MS. However, when we estimate the number of cases required before the first genome-wide significant result, SCZ is the worst of the complex diseases in table 3 (the continuous traits of height and BMI are on a different scale). This is also apparent in Figure 2 where a “hockey-stick” relation can be imagined. The major correlates of the number of significant regions are sample size and heritability (Spearman ρ both ~0.6). Because heritability is not under experimental control, increasing sample size is therefore the way to increase the yield of GWAS (on average).
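The two quantities discussed above, the slope (regions per 1000 cases) and the number of cases needed before the first genome-wide significant region, can both be read off a simple least-squares line. The following is a minimal sketch of that idea, not the authors' actual analysis; the data points are hypothetical and the function names are illustrative.

```python
from typing import Sequence, Tuple


def fit_line(cases: Sequence[float], regions: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of regions ~ cases; returns (slope, intercept)."""
    n = len(cases)
    if n < 2 or n != len(regions):
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(cases) / n
    mean_y = sum(regions) / n
    sxx = sum((x - mean_x) ** 2 for x in cases)
    if sxx == 0:
        raise ValueError("all studies have the same number of cases")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cases, regions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cases_for_first_region(slope: float, intercept: float) -> float:
    """Number of cases at which the fitted line reaches 1 significant region."""
    if slope <= 0:
        raise ValueError("slope must be positive to extrapolate")
    return (1.0 - intercept) / slope


# HYPOTHETICAL studies for one trait: (cases analyzed, significant regions).
study_cases = [5000, 10000, 15000]
study_regions = [0, 5, 10]

slope, intercept = fit_line(study_cases, study_regions)
print(f"regions per 1000 cases: {slope * 1000:.2f}")
print(f"cases before first region: {cases_for_first_region(slope, intercept):.0f}")
```

A trait with a negative intercept and a modest slope yields nothing until a threshold number of cases is reached and then accumulates findings steadily, which is the "hockey-stick" pattern described for SCZ.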
The estimates in table 3 encapsulate broad aspects of the genetic architectures of these traits (ie, the number of loci and effect size distributions as well as the inherent noisiness of phenotypic assessment). We do not see evidence that SCZ is qualitatively different save for a higher minimum number of cases to first detection (which we suspect reflects that there is no common genetic variant of strong effect as for AMD and T1DM combined with phenotype imprecision). SCZ is wonderfully typical.
What would happen if the numbers of SCZ cases were increased? In order to achieve the power of the most successful GWAS to date,39 50000 SCZ cases and 50000 controls are required (Ya44 Around 12000 SCZ cases with GWAS data are now available. If the sample size were increased to 100000, there would be 2 immediate yields: the number of loci exceeding genome-wide significance can be predicted to total 21 and, equally important, the rank order of the SNPs would greatly improve, meaning that pathway analyses would become far more reliable.
Continuing with GWAS would be one of the better bets for progress in the history of SCZ genetic research.
For many complex traits, the amount of variance due to genome-wide significant loci (R²) is a small portion of the overall heritability (<10%). Some have argued that this “missing heritability” shows that GWAS has failed and that its results do not matter.
We do not find this objection compelling. First, the goal of genetic studies of SCZ is to find pathways and loci that are strongly and robustly associated with disease. R² should not be the criterion for success because it is more relevant for individualized medicine (a distal not proximal goal). Second, we have remarkably poor heritability estimates for many complex traits, meaning that this criterion is imprecise and subject to bias. Third, if sample sizes are too small, the genome-wide significance bar is conservative, and the impact of true effects that are not quite significant is missed. This issue is compounded by the fact that the current generation of GWAS SNP arrays imperfectly assesses common genetic variation. If analyses account for these considerations, it can be seen that heritability is “hidden” rather than missing.45,27,38 Indeed, 2 independent analyses suggest that common variants account for about a third of the variance in liability for SCZ.
Finally, some have argued that GWAS findings that do not have immediately obvious functional significance are irrelevant. This is a weak argument. Appropriate experimentation is required, and the literature is replete with examples of GWAS findings of mechanistic importance that emerged only after follow-up molecular work.
There have been spectacular advances in sequencing technologies. It is now feasible to resequence all known exons and even whole genomes. Costs are likely to decline but these are expensive technologies. At the time of this writing in Q1/2011, confident sequencing of an exome costs about US $3000 and a genome costs US $12000. In the ideal situation, we would obtain genome resequencing for large collections of SCZ cases.
Does it make sense to shift direction entirely to sequencing as some have argued? It is unfortunate that sequencing has already been the focus of considerable “hype” (in contrast to GWAS where many investigators argued for conservative expectations).46 There are reasons for caution, and we need to be mindful of painfully learned lessons from the past.
Psychiatric genetics has always underestimated the necessary sample sizes by several orders of magnitude. Although sequencing can discover causes of Mendelian disorders in small samples,44 SCZ is unlikely to have Mendelian subforms. Quick successes are unlikely: sequencing efforts could prove to be far more complex than predicted, and very large samples are likely to be required. Exome and whole-genome sequencing work best if SCZ is caused exclusively or nearly so by rare variants that are not readily detected using the current generation of genotyping arrays. This strong assumption is inconsistent with empirical data.38
Sequencing is an appealing technology. However, we need to be realistic about what it might yield, and pay particular attention to its underlying assumptions and limitations. We need to be a bit jaded and worldly: many shiny new technologies have been applied to SCZ with great fanfare that ultimately failed to deliver. We should be interested, appropriately skeptical, and resist efforts to rely on a single technological approach.
The main goal of genetic studies of SCZ is to identify pathways that confer risk and protection. For the first time, there is demonstrable progress toward this end as SCZ appears to be a relatively highly polygenic disease. If this knowledge is developed far more completely, we may be able to describe the basic mechanisms that go awry in the pathogenesis of SCZ (eg, the miR-137 hypothesis described above). Such knowledge can deliver compelling biological hypotheses to fuel more refined and specific investigations into the causes of SCZ.
Genetic approaches—particularly GWAS—are working. The field needs to stay focused on what has worked in order to maximize progress.
National Institutes of Health (NIH) grants (R01 MH080403, MH077139 to PFS); NIH Building Interdisciplinary Careers in Women's Health award (K12 HD01441 to SZ); NIH grant (T32 MH076694 to ST).
Conflicts of Interest: The authors report no conflicts. Author Contributions: All authors reviewed and approved the final version of the manuscript. PFS wrote the manuscript, YK coordinated the literature review, and YK, SZ, ST, and PFS conducted the literature review.