After more than a quarter century of molecular genetic research on schizophrenia, advances in biotechnology and statistics, applied to large and well-characterized clinical samples, have made possible the discovery and subsequent replication of individual susceptibility loci. A comprehensive discussion of what comes after a successful GWAS is outside the scope of this manuscript. Instead, we have selected for discussion a handful of issues that have been instrumental in generating progress to date and that provide a foundation for further progress. First, the social environment in which science is conducted has changed profoundly in recent years. Of fundamental importance is the new openness in the field of schizophrenia genetics, in keeping with the NIH policy of wide GWAS data sharing. De-identified phenotypic and genotypic data from NIH-funded GWAS are to be submitted to a centralized NIH GWAS data repository, the database of Genotypes and Phenotypes (dbGaP, www.ncbi.nlm.nih.gov/gap), hosted by the National Center for Biotechnology Information (NCBI); data from studies supported by the Wellcome Trust Case-Control Consortium are likewise deposited in a database (WTCCC, www.wtccc.org.uk). In both cases, data are accessible by application to access committees. The NIH policy (grants.nih.gov/grants/gwas) has already created extraordinary opportunities to access data from independent research groups before publication [174]. For example, the MGS Genetic Association Information Network (GAIN) genotypic/phenotypic sample (www.genome.gov/19518582) had been accessed 140 times for a wide diversity of genetic research projects as of November 17, 2009. The Psychiatric GWAS Consortium (PGC) [175,176] continues this new tradition of openness. The PGC comprises five groups: schizophrenia, bipolar disorder, major depressive disorder, attention deficit hyperactivity disorder (ADHD), and autism. A primary goal of the PGC is to perform disease-specific and cross-disorder analyses of combined GWAS datasets composed of all qualifying samples for each disorder.
Following up GWAS results requires a thorough approach; replication and fine mapping of associated regions are necessary for further progress (see the informative review [177]). The preferred approach is to combine GWAS data from independent studies, but when some samples lack GWAS data, focused genotyping is still useful, although less informative. Analysis of combined data is important because most individual clinical samples lack the power to detect the modest effect sizes typically uncovered by well-powered GWAS [178], and because estimated ORs tend to be inflated when only the top-ranking associations are reported [179] (the so-called winner's curse); a less biased estimate of ORs requires systematically combining GWAS and focused replication studies. Data can be meta-analyzed with a variety of methods (see the comprehensive review [177]). For example, three consortia [89,102,103] meta-analyzed a set of their most significant p-values ($p<0.001$) from their independent GWAS, uncovering a genome-wide significant locus at the MHC. SNPs that do not reach genome-wide significance ($p<5\times10^{-8}$) also merit inclusion in confirmation experiments: while some genome-wide significant SNPs from a single study might not be confirmed in replication studies, other SNPs ranking very highly in the primary study without reaching genome-wide significance (e.g., SNPs with $p<1\times10^{-5}$) might surpass that threshold in a replication experiment. Association signals in an extended LD block spanning many genes (e.g., the MHC locus implicated in schizophrenia) make it hard to determine which gene or genes are likely to be causal. Populations of non-European ancestry might have some non-overlapping susceptibility loci, and it is fundamental to investigate these differences, as they can also be informative about different environmental risk factors. An important characteristic of African populations (e.g., African Americans, AA) is reduced LD, which would facilitate narrowing the associated genomic intervals; existing limitations in CNV and SNP map coverage, and in imputation for AA datasets, are currently being addressed (for examples, see [180,181]).
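As a minimal illustration of one of the meta-analytic methods referred to above, the sketch below pools per-study odds ratios for a single SNP with inverse-variance weighted fixed-effect meta-analysis on the log-OR scale. It assumes that each study reports an OR and the standard error of its log OR, that the studies are independent and estimate a common effect, and that the input numbers are hypothetical; it is not presented as the procedure used by the consortia cited above. The $5\times10^{-8}$ cutoff is the conventional genome-wide significance level mentioned in the text.

```python
import math

# Conventional genome-wide significance threshold quoted in the text.
GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(studies):
    """Inverse-variance weighted fixed-effect meta-analysis of log odds ratios.

    `studies` is a list of (odds_ratio, se_log_or) tuples, one per independent study.
    Returns (pooled_or, pooled_se_log_or, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_or / pooled_se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log_or), pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical summary statistics for one SNP in three independent samples.
    hypothetical_studies = [(1.15, 0.05), (1.10, 0.06), (1.08, 0.04)]
    pooled_or, se, z, p = fixed_effect_meta(hypothetical_studies)
    print(f"pooled OR = {pooled_or:.3f}, SE(log OR) = {se:.4f}, Z = {z:.2f}, p = {p:.2e}")
    print("genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

The pooled standard error is smaller than that of any single study, which is why combining samples increases power; a random-effects model is the usual alternative when between-study heterogeneity is expected.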
It is important to bear in mind that, given the GWAS SNP design (in which SNPs are selected because they are common and informative about many other SNPs, not because of their functional properties), in most cases the associated SNPs are probably not the causal SNPs. As mentioned previously, we have noted that in the MGS GWAS and in the combined sample (MGS, ISC, and SGENE-plus [89,102,103]), the vast majority of the strongest common SNP associations were not located in coding sequences, where such a signal would be easier to interpret. Instead, they were in intergenic regions (over half of these SNPs were >10 kb from a gene, almost none showing a clear connection to any known gene through LD) or in sequences of unclear function (e.g., intronic, but not near a splice site or a known or putative regulatory site). Although the causal SNPs should be in LD with the GWAS-associated SNPs, the causative genes may lie close to the statistically associated locus, but may also be farther away, even on a different chromosome. For example, if the causal variant affected a trans-acting factor that regulates transcription, the regulated gene or genes might be located on a different chromosome. Integrating genome-wide transcription data (expression quantitative trait loci, eQTL, currently detected with microarray expression data) with GWAS data (DNA variation data) can help close this gap by linking statistical results to biology, and is expected to reveal mechanisms of disease susceptibility that would remain hidden to either method in isolation. The approach has proved successful in asthma [182]. Interestingly, within the MHC region implicated in schizophrenia, there are more than 10 cis-eQTL (cis meaning nearby on the same chromosome) in the eQTL database, which uses expression data from lymphoblastoid cell lines (LCLs) of asthma patients [183], and the SNP showing the most significant association with schizophrenia, rs13194053 ($p=9.54\times10^{-9}$), is in moderate LD ($r^2=0.43$) with a SNP showing the strongest association with BTN3A2 expression.
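To make the LD statistic quoted above concrete, the sketch below computes $r^2$ between two biallelic SNPs from a haplotype frequency and the two allele frequencies, using $D = p_{AB} - p_A p_B$ and $r^2 = D^2 / [p_A(1-p_A)p_B(1-p_B)]$. It also applies the commonly used approximation that the expected association chi-square (non-centrality) at a tag SNP is roughly $r^2$ times that at the causal variant, one reason a GWAS signal can sit on a non-causal SNP. The function names and the frequencies in the example are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def tag_noncentrality(causal_ncp, r2):
    """Approximate expected association chi-square at a tag SNP in LD with the causal SNP."""
    return r2 * causal_ncp


if __name__ == "__main__":
    # Hypothetical frequencies: allele A at 0.2, allele B at 0.3, AB haplotype at 0.15.
    r2 = ld_r2(0.15, 0.2, 0.3)
    print(f"r^2 = {r2:.3f}")
    # With r^2 = 0.43 (the value cited above), a tag SNP carries roughly 43% of the
    # causal variant's expected chi-square, so about 1/0.43 (~2.3 times) the sample size is needed.
    print(f"tag NCP for causal NCP 40: {tag_noncentrality(40.0, 0.43):.1f}")
```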
The selection of tissue for gene expression studies is critical, and brain is not necessarily the best choice. Epidemiological evidence strongly suggests that some of the primary genetic mechanisms leading to schizophrenia might operate in tissues other than brain, for example, through an autoimmune mechanism that compromises the brain. In such a case, the symptoms of schizophrenia would still reflect brain dysfunction, but would be removed from the primary insult (these leads remain to be thoroughly explored). More concretely, if a genetic abnormality affecting the immune response to a virus contributed to schizophrenia risk, studying the transcriptional characteristics of a neurotransmitter system in brain would reveal only secondary (or even terminal) neural changes downstream of the primary immune response, which might be more apparent in immune cells such as lymphocytes.
Establishing causal mechanisms may require, in addition to statistical tests of association, functional characterization of implicated genes and variants in simple cell models (and in model organisms), targeting phenotypes with a high probability of association with the disorder under study. Among other potential advantages, in vitro effects are expected to be amplified in the absence of the buffering effects present in multicellular systems, which may make detectable an effect that is very small in the intact organism. Dendrou et al. [184] studied cell-specific protein phenotypes for IL2RA, a locus associated with type 1 diabetes (T1D). Taking advantage of a large collection of healthy donors from whom fresh primary cells could be analyzed, and who can be recalled for repeated measurements (a resource known as the Cambridge BioResource), they elegantly demonstrated that elevated expression of CD25, the protein encoded by IL2RA, is associated with IL2RA haplotypes that protect against T1D [184].
It is still premature to conclude whether the genetic architecture of schizophrenia resembles that of mental retardation, for which thousands of individual genetic disorders have been cataloged, or whether some widely speculated upon but still little investigated mechanism, such as epigenetics (which influences phenotypes through the regulation of gene expression) or gene-environment interaction, will explain the bulk of the disorder's missing heritability. Basic genomics research has produced major breakthroughs in recent years, such as the discovery of microRNAs, long-range promoters, epigenetic factors, and copy number variation, and many more will probably follow as our knowledge of the genome rapidly increases. It would not be surprising if as-yet-unknown genetic mechanisms ultimately explained a substantial proportion of the heritability of schizophrenia. Nonetheless, defining the spectrum of molecular genetic mechanisms in schizophrenia is now at the forefront of our field. Immediate research efforts will focus in large measure on whole-genome resequencing and on genome-wide analyses of gene transcription and epigenetics. Rapid progress in biotechnology [185] is making the study of rare variants across many genes or large genomic regions, in ever larger samples, increasingly feasible; proof of principle is provided by the 1000 Genomes Project (www.1000genomes.org), which is designed to build a deep catalog of human genetic variation. Both the design of experiments aimed at fine mapping regions of association and the precision of imputation will benefit from this project.
It is anticipated that, as genetic discoveries accumulate, the application of a myriad of systems biology tools (e.g., genomics, transcriptomics, proteomics) will lead to the delineation of biological pathways involved in the pathophysiology of schizophrenia, and eventually to new therapies (developments in treatment still lag behind the discovery of new genetic associations for complex disorders [186], but this situation is expected to change as biological research makes inroads into what are still purely statistical associations). There is, however, a strong temptation to accept the simplest observations (i.e., those with immediate biological connotations) as the most meaningful and the only ones meriting follow-up. For example, Mitchell and Porteous stated: “Occam’s razor and statistical probability both argue that the co-inheritance of one or just a few risk genes by any individual case is the more likely explanation for the majority of incidence” [187]. They continued: “Haven’t we learnt more about disease mechanism and potential routes to the treatment of Alzheimer’s disease from the rare variant examples of amyloid beta (A4) precursor protein (APP), presenilin-1 (PS1) and presenilin-2 (PS2) than from the archetypal common variant example of apolipoprotein E, isoform 4 (ApoE4)?” These arguments appear necessarily true at first sight; however, as previously discussed [188], an explanation may appear more complicated than it needs to be only when considered apart from its evolutionary context [189]. Research in model organisms (e.g., Drosophila) shows that most phenotypes result from complicated genetic architectures: multiple genes, often showing pleiotropy (and thus likely associated with multiple traits) and epistasis, with even the effects of single mutations differing with genetic background and environment [190,191]. This landscape will probably hold for complex human behavioral traits as well. Explanations relying on single genes are unlikely to capture the fundamental complexity of most complex human traits, and all the associated genetic variation needs to be pursued to understand the pathophysiology of a complex disorder. A task of utmost importance is the integration of the spectrum of mutations found in schizophrenia into a framework that takes into account constantly changing environments and evolution.
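The point about epistasis can be illustrated with a deliberately extreme, hypothetical two-locus penetrance model in which disease risk depends only on the combination of genotypes at two loci. Assuming allele frequencies of 0.5 at both loci, Hardy-Weinberg proportions, and no LD between the loci, the sketch below shows that every single-locus genotype has the same marginal penetrance, so a one-locus-at-a-time analysis would detect no effect even though joint risk ranges from 0 to 0.1. The penetrance values and function names are invented for illustration and are not estimates for schizophrenia.

```python
# Genotype frequencies under Hardy-Weinberg with allele frequency 0.5;
# genotypes are coded 0, 1, 2 = number of risk alleles.
GENOTYPE_FREQS = [0.25, 0.5, 0.25]

# Hypothetical "checkerboard" penetrance table: PENETRANCE[g1][g2] = P(disease | g1, g2).
PENETRANCE = [
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
]


def marginal_penetrance_locus1(penetrance, freqs):
    """P(disease | g1), averaging over locus-2 genotypes (loci assumed independent)."""
    return [sum(f2 * row[g2] for g2, f2 in enumerate(freqs)) for row in penetrance]


def marginal_penetrance_locus2(penetrance, freqs):
    """P(disease | g2), averaging over locus-1 genotypes (loci assumed independent)."""
    return [
        sum(freqs[g1] * penetrance[g1][g2] for g1 in range(len(freqs)))
        for g2 in range(len(freqs))
    ]


if __name__ == "__main__":
    print("locus 1 marginal:", marginal_penetrance_locus1(PENETRANCE, GENOTYPE_FREQS))
    print("locus 2 marginal:", marginal_penetrance_locus2(PENETRANCE, GENOTYPE_FREQS))
    # Every marginal equals 0.05: no single-locus effect is visible,
    # although joint penetrance varies between 0.0 and 0.1.
```

Real architectures are far less tidy, but the example shows why single-locus scans can understate the contribution of interacting loci.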