Schizophrenia is a heritable disorder with substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases, 6,243 controls) followed by meta-analysis with prior schizophrenia GWAS (8,832 cases, 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls, and 581 trios). In total, 22 regions met genome-wide significance (14 novel and one previously implicated in bipolar disorder). The results strongly implicate calcium signaling in the etiology of schizophrenia, and include genome-wide significant results for CACNA1C and CACNB2 whose protein products interact. We estimate that ~8,300 independent and predominantly common SNPs contribute to risk for schizophrenia and that these collectively account for most of its heritability. Common genetic variation plays an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this devastating disorder.
Schizophrenia is an idiopathic mental disorder with substantial morbidity, mortality, and personal and societal costs. 1-3 An important genetic component is indicated by a sibling recurrence risk ratio of 8.6, high heritability estimates (0.64 in a national family study, 0.81 in a meta-analysis of twin studies, and 0.23 estimated directly from common SNPs), and prior genomic findings. 4-8
Although the rationale for genomic searches is strong, there are only a handful of robust empirical findings for schizophrenia. Genome-wide linkage studies to date have been inconclusive, 9 and no compelling Mendelian variants have been identified. 8 Eight rare copy number variants of strong effect (genotypic relative risks 4-20) with consistent replication have been described (e.g., 16p11.2 and 22q11.21); however, these associations are generally not disease-specific and can also be associated with autism, mental retardation, or epilepsy. 8 Initial exome sequencing studies have not yet identified specific variants of unequivocal genome-wide significance 9-13 although larger studies are in progress. Prior GWAS for common variation have yielded statistical evidence for ~10 genomic regions 8 including the major histocompatibility complex (MHC) 14-16 along with MIR137 and targets of miR-137. 17
The prior studies contained indications that more common variant associations were likely to be discovered with larger sample sizes. 13,17,18 We therefore sought to increase substantially the number of cases using a multistage GWAS.
We analyzed genome-wide data in 5,001 schizophrenia cases and 6,243 controls from a population-based sampling frame in Sweden (N=11,244, Table 1). Most subjects (57.4%) have never been previously reported. Following genotyping and imputation with the 1000 Genomes Project Phase 1 reference panel, the genetic data consisted of allelic dosages for 9,871,789 high-quality polymorphic SNPs. Given that this imputation panel is based on >800 chromosomes of European ancestry and includes the detail afforded by genome sequencing, we anticipated increased power in finding and describing association signals. Indeed, we observed 10,201 SNPs and 187 genomic regions with P < 1×10−5 using 1000 Genomes imputation compared with 1,594 SNPs and 133 regions for HapMap3 imputation (counts include only one region from the MHC).
The resulting λGC was 1.075 and λ1000 (references 19-21) was 1.013. Quantile-quantile and Manhattan plots are given in Supplemental Figures 5-6. For association with schizophrenia, 312 SNPs met a genome-wide significance threshold of 5×10−8 (reference 22). These SNPs were in two genomic regions (Supplemental Figure 7): 241 SNPs in the MHC region (chr6:28,502,794-32,536,501, minimum P=4.07×10−11 at rs115939516) and 71 SNPs from chr2:200,715,388-201,040,981 (minimum P=3.33×10−10 at rs35220450). We replicated the MHC association reported in prior studies. 14-17 The chr2 association with schizophrenia is novel, shows highly consistent effects in the Sw1-6 genotyping batches and encompasses C2orf69, C2orf47, C2orf60, and TYW5.
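λ1000 rescales λGC to a hypothetical study of 1000 cases and 1000 controls so that inflation can be compared across sample sizes. The following is a minimal sketch, assuming the commonly used rescaling formula; the function name is illustrative and is not taken from the study's analysis pipeline.

```python
def lambda_1000(lambda_gc, n_cases, n_controls, n_ref=1000):
    """Rescale a genomic inflation factor to n_ref cases and n_ref controls.

    Assumed formula: 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (2/n_ref).
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (2.0 / n_ref)
    return 1.0 + (lambda_gc - 1.0) * scale


# Swedish sample values reported above: lambda_GC = 1.075, 5,001 cases, 6,243 controls
swedish = lambda_1000(1.075, 5001, 6243)
print(round(swedish, 4))
```

Under this formula the Swedish inputs give a value within rounding of the reported 1.013.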
We re-analyzed the PGC schizophrenia data using 1000 Genomes imputation (8,832 cases and 12,067 controls, excluding Swedish samples). 17 Five regions met genome-wide significance: the MHC locus (chr6:27,261,324-32,610,445, minimum P=2.18×10−10), AS3MT-CNNM2-NT5C2 (chr10:104,635,103-104,960,464, minimum P=4.29×10−10), MAD1L1 (chr7:2,005,747-2,098,238, minimum P=2.40×10−8), RP11-586K2.1 (chr8:89,585,639-89,760,620, minimum P=2.37×10−8), and SNPs near TCF4 (chr18:53,311,001-53,423,307, minimum P=3.00×10−8).
We then conducted a meta-analysis of the Swedish and independent PGC schizophrenia samples using the same quality control, imputation, and analysis pipeline. This GWAS meta-analysis of 13,833 schizophrenia cases and 18,310 controls (Table 1) afforded power to detect genotypic relative risks of 1.10-1.14 for reference allele frequencies 0.15-0.85 (power=0.8, α=5×10−8, log-additive model). We evaluated the comparability of the Swedish and PGC studies using sign tests: of 608 SNPs selected from the PGC results with P < 0.0001 and in approximate linkage equilibrium, 62.6% had logistic regression beta coefficients with the same sign in the Swedish results, an observation highly inconsistent with the null (P=2.2×10−10). λGC was 1.186 and λ1000 was 1.012, values consistent with a polygenic pattern of association but not gross inflation due to technical artifacts. 20 Quantile-quantile and Manhattan plots are shown in Supplemental Figure 11 and Figure 1, and genome-wide significance was exceeded by 3,538 SNPs in 12 genomic regions.
We used risk score profiling 14,17 to evaluate the capacity of 130K SNPs derived from the PGC to predict case-control status in the Swedish samples. These SNPs were selected for high-confidence and approximate linkage equilibrium but without regard to association P value. As shown in Figure 2, PGC risk scores had a highly significant capacity to predict case-control status in the independent Swedish samples (P values from 10−26 – 10−114). The increased sample size allowed improved risk profile prediction as more of the SNPs in the lower bins are replicable signals. The threshold at which the explanatory power of these risk profile SNPs plateaus has decreased with increasing sample size (PT=0.1 in Figure 2, 0.2 in the PGC report, and no plateau in the ISC study). 14,17 Although the mean risk profiles were highly significantly different between cases and controls, the distributions overlap substantially (Supplemental Figure 9) and are insufficient for diagnostic purposes (area under the receiver operating characteristic curve 0.65). However, these results strongly support the comparability of the Swedish and PGC samples and the validity of the meta-analysis.
GWAS often omit the X chromosome (chrX). This omission is problematic as chrX is approximately as large as chromosome 8 and is enriched for genes important in brain development. Using a previously described approach, we imputed genotyped chrX SNPs to the 1000 Genomes reference panel. 23 Joint analysis of all subjects as well as males and females separately revealed no association exceeding genome-wide significance. The strongest association (rs12845396, chrX:6,029,533, P=3.46×10−7) was in an intron of NLGN4X (neuroligin 4), a gene previously implicated in mental retardation and autism, and there were multiple possible signals near MECP2 (causal to Rett syndrome, P=9.3×10−6).
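The idea behind risk score profiling can be illustrated with a short sketch: for each target-sample subject, allele dosages are weighted by the discovery-sample log odds ratios and summed over SNPs passing a discovery P-value threshold PT. The SNP identifiers, odds ratios, and dosages below are hypothetical, and this is an illustration of the general technique rather than the exact procedure used here.

```python
import math


def risk_score(dosages, log_odds, p_values, p_threshold):
    """Sum dosage * log(OR) over SNPs whose discovery P is below p_threshold.

    Returns the score and the number of SNPs that contributed to it.
    """
    score = 0.0
    n_used = 0
    for snp, dose in dosages.items():
        if snp in log_odds and p_values[snp] < p_threshold:
            score += dose * log_odds[snp]
            n_used += 1
    return score, n_used


# Hypothetical discovery results (odds ratios and P values per scored allele)
discovery_or = {"snpA": 1.10, "snpB": 0.92, "snpC": 1.05}
discovery_p = {"snpA": 1e-6, "snpB": 0.03, "snpC": 0.40}
log_odds = {snp: math.log(value) for snp, value in discovery_or.items()}

# Hypothetical dosages (0 to 2) of the scored allele for one target subject
subject = {"snpA": 2.0, "snpB": 1.0, "snpC": 0.5}

# Scores at increasingly inclusive thresholds PT
for pt in (0.001, 0.1, 1.0):
    score, n_used = risk_score(subject, log_odds, discovery_p, pt)
    print(pt, n_used, round(score, 4))
```

In practice the per-subject scores at each PT would then be compared between cases and controls, for example by logistic regression with ancestry covariates.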
GWAS results generally do not lie in protein coding regions. 24 A recent report suggested that most SNPs in the NHGRI GWAS catalog 24 were in or in perfect LD with DNase I hypersensitive sites. 25 We thus evaluated whether the Sweden + PGC results had significant overlap with DNase I hypersensitive sites generated as part of the ENCODE project. 26 We did not find evidence of enrichment (Supplemental Table 8 and Supplemental Figure 10). However, this negative result is strongly qualified by the lack of DNase I hypersensitivity data directly relevant to psychiatric disorders.
We then obtained association results for SNPs in 194 genomic regions in six independent samples for a total sample size of over 21,000 cases and 38,000 controls (Table 1). The genomic regions for which replication genotypes were sought were identified using LD clumping defined by LD (r2 > 0.5) and a minimum P < 1×10−5 in the Sweden-PGC meta-analysis. Only one MHC SNP was included. The Sweden-PGC meta-analysis and replication results were highly concordant with 76.3% of the logistic regression beta coefficients having the same direction of effect (sign test P=1.5×10−17). Indeed, of the top 100 SNPs in the Sweden-PGC meta-analysis, 90% had the same sign in the replication results. This result strongly suggests that many more loci will achieve genome-wide significance with further increases in sample size.
Table 2 shows the combined results in which 24 regions reached genome-wide significance. As two pairs of these regions overlap (chr1:243Mb and chr5:152Mb), there are associations with schizophrenia in 22 genomic regions. Three additional regions nearly met genome-wide significance (rs4380187 near ZNF804A P=5.66×10−8, rs4523957 in SRR P=5.69×10−8, and rs6550435 near TRANK1 P=5.86×10−8 which also had P=9×10−6 in a bipolar disorder GWAS). 27
Of these 22 regions (Table 3), five regions have been reported previously as meeting genome-wide significance for schizophrenia (MHC, C10orf26, DPYD-MIR137, SDCCAG8, and MMP16) and two for schizophrenia, bipolar disorder, and a combined phenotype (CACNA1C and ITIH3-ITIH4). 14-17,27-29 For the remaining 15 regions, we now find genome-wide significance for a locus previously implicated only for bipolar disorder (NCAN) 30 along with 14 novel regions.
We highlight four themes from these results (see also Supplemental Table 9). First, these results implicate calcium signaling in the etiology of schizophrenia. As in prior studies of bipolar disorder and schizophrenia, 17,27,28 we found genome-wide significant support for CACNA1C (Cav1.2, P=5.2×10−12 at the intronic SNP rs1006737). Intriguingly, we identified a novel genome-wide significant association for CACNB2 (P=1.3×10−10 at the intronic SNP rs17691888) which encodes the β2 subunit of L-type calcium channels (Cav β2). A gene-set test supported the involvement of calcium channel subunits in the etiology of schizophrenia (Supplemental Table 7).
In L-type calcium channels, the α1c subunit forms the transmembrane pore, and directly interacts with the intracellular β2 subunit. 31 The β2 subunit also antagonizes an endoplasmic reticulum retention motif on the α1c subunit to facilitate transport to the plasma membrane. 32 Additional genes with genome-wide significant evidence were implicated based on membership in a proteomic network centered on Cav2 (reference 33): the protein products of ACTR1A (α-centractin), the divalent metal cation transporter CNNM2 (P=3.7×10−13, chr10:103,009,986-105,512,924), and CACNB2. A broad genomic region containing the calcium binding protein troponin C (TNNC1) also met genome-wide significance (P=1.1×10−8) as well as three calcium homeostasis modulator genes (CALHM1, CALHM2, and CALHM3 in the same chr10 region as CNNM2).
The genetics and biology of calcium channels have been the subject of considerable investigation owing to their importance in fundamental neuronal processes and human diseases. L-type voltage-gated calcium channels are involved in learning, memory, and synaptic plasticity, and CACNA1C knock-out mice show notable deficits in long term potentiation. 34-37 Calcium “channelopathies” include mutations in CACNA1C and CACNB2 that cause Brugada syndrome types 3 and 4 (OMIM #611875 and #611876). 38 In addition, Timothy syndrome (OMIM #601005), caused by mutations in CACNA1C, is a multisystem disorder including cognitive impairment and autism spectrum disorder. 39 Although Mendelian disorders are usually characterized by persistent pathological features, Mendelian calcium channelopathies can have episodic phenomena perhaps reminiscent of the episodic nature of psychotic disorders – for example, intermittent hypoglycemia and hypocalcemia in Timothy syndrome (CACNA1C), episodic ataxia (CACNA1A, CACNB4), migraine (CACNA1A), epilepsy (CACNA1H, CACNB4), periodic paralysis (CACNA1S), and malignant hyperthermia (CACNA1S, CACNA2D1). 31,39
GWAS findings for schizophrenia have converged on genome-wide significant evidence for a calcium channel functional complex that has also been implicated in bipolar disorder and autism. These genomic results support increased attention to this pathway, and suggest hypotheses for clinical translation. Multiple approved medications act at calcium channels including some antipsychotics (e.g., pimozide) along with adjuvants for treatment non-response for schizophrenia and bipolar disorder (e.g., the calcium channel blockers verapamil and nifedipine). It is possible that drugs that act on the protein products of CACNA1C and CACNB2 for a different therapeutic indication could be “re-purposed” for the treatment of schizophrenia. For example, there has been at least one clinical trial of the efficacy of isradipine in bipolar disorder (an approved antihypertensive acting at the protein product of CACNA1C, R Perlis, personal communication). In addition, given that many approved antipsychotics increase the cardiac QT interval, genetic variation in calcium channel genes might identify individuals at higher risk of sudden cardiac death. 40,41
Second, as reported previously, 14-17 the strongest association (P=9.1×10−14) with schizophrenia is in the extended MHC (chr6:25-34Mb), a region of both exceptional importance and complexity. The MHC comprises 0.3% of the genome but contains 1.5% of the genes in OMIM 42 and 6.4% of genome-wide significant SNP associations in the NHGRI GWAS catalog. 24 It is the second most gene-dense genomic region and has high LD over its extent. We speculate that these features (high gene density and strong LD) combined with the polygenicity of schizophrenia lead to the strong association but will also complicate efforts to identify causal variation. Genome-wide significant associations with schizophrenia extend over 7Mb, but Supplemental Figure 12 suggests that larger samples may resolve this association into sub-regions near TRIM26 (tripartite motif containing 26, chr6:30.1Mb) and the HLA-DRB9 unprocessed pseudogene (chr6:32.4Mb, intergenic HLA-DRA – HLA-DRB5).
Third, multiple genomic lines of evidence support a role for MIR137 in the etiology of schizophrenia. We provide increased support for a common variant association located upstream of the MIR137 transcript (P=1.7×10−12, Supplemental Figure 13). Fourteen genes in the regions in Table 3 have miR-137 target sites predicted by TargetScan (v6.2) 43 (C6orf47, HLA-DQA1, TNXB, VARS, C10orf26, CACNA1C, DPYD, CACNB2, TSSK6, NT5DC2, PITPNM2, SBNO1, ZEB2, and PRKD3). Using gene-set analysis, we evaluated whether genes with predicted miR-137 target sites were enriched for smaller association P values. We confirmed the PGC result 17 and extended the finding by showing more robust enrichment in a far larger set of genes with predicted miR-137 target sites (Supplemental Table 7). In addition, our unpublished work shows enrichment for smaller GWAS P-values in genes down-regulated following over-expression of miR-137 in human neural stem cells (Collins, in preparation). Given the role of miR-137 in fundamental neuronal processes, 44-46 these results support investigation of pathways influenced by miR-137 in regard to a role in the pathogenesis of schizophrenia.
The SNP with the strongest association to schizophrenia (rs1198588) is 39kb upstream of MIR137, and might regulate the transcription of MIR137. However, this has not been proven experimentally and there is another candidate gene in the region. rs1198588 is in an LD block that includes DPYD (169kb upstream of rs1198588), and rs1198588 is a significant local expression quantitative trait locus (eQTL) with DPYD. We note that DPYD also contains a predicted miR-137 target site. An exome sequencing study reported two putative functional de novo variants in DPYD in cases with schizophrenia. 11
Fourth, 13 of the 22 regions in Table 3 contain long intergenic non-coding RNAs (lincRNAs). lincRNAs have multiple known or suspected functions including epigenetic regulation and development. 47 Using pathway analysis, 48 there was modest enrichment (P=0.06) for smaller association P values in a conservative set of lincRNAs derived from sequencing of poly-A RNA from multiple tissues. 47 This observation is consistent with a general role for GWAS findings in the regulation of gene expression rather than alteration of protein sequence. eQTLs 49,50 overlap with SNPs implicated by GWAS over all traits 51-53 as well as for specific traits like height, adiposity, cardiovascular risk factors, chemotherapy-induced cytotoxicity, autism, schizophrenia, and Crohn's disease. 54-61 An estimated 55% of eQTL SNPs lie in DNase I hypersensitivity sites (a marker for open chromatin subject to transcriptional regulation) and 77% of SNPs implicated in GWAS are in or in high LD with SNPs in DNase I hypersensitivity sites. 25,62,63
There has been considerable debate about the genetic architecture of schizophrenia. We estimated the proportion of variance in liability to schizophrenia explained by SNPs using GCTA. 64 Traditional genetic epidemiological studies use the phenotypic resemblance of relatives to estimate the proportion of variance in liability using theoretic resemblance assumptions. GCTA uses genome-wide SNP genotypes to calculate the heritability in the population from the identity-by-state relationships for each pair of individuals. Using the PGC schizophrenia data, we previously estimated the SNP heritability of schizophrenia at 0.23 (SE 0.01) using HapMap3 imputation and assuming a population risk of 0.01. 7 Using the same imputation reference and population risk, SNP heritability was substantially higher in the Swedish samples (0.32, SE 0.03) possibly due to the greater phenotypic and genetic homogeneity in the Swedish sample compared to the PGC samples of mixed European ancestry. We obtained a similar estimate of SNP heritability using 1000 Genomes imputed data (0.33, SE 0.03, population risk 0.01). For a population risk of 0.004, 4,65 SNP heritability was 0.26 (SE 0.02) using HapMap3 and 0.27 (SE 0.02) using 1000 Genomes imputation. Partitioning of the SNP-heritability by minor allele frequency is consistent with 80% of the signal reflecting causal variants with MAF > 0.1 (Supplemental Table 5).
To complement the GCTA analyses, we also applied ABPA (approximate Bayesian polygenic analysis) 66 to the Sweden + PGC results. Compared to GCTA, ABPA yielded somewhat larger but generally congruent estimates of variance in liability to schizophrenia using HapMap3 data: 0.43 for population risk of 0.01 (95% credible interval 0.38-0.48) and 0.34 for population risk of 0.004 (95% credible interval 0.31-0.37).
The Bayesian framework used by ABPA also allows simultaneous estimation of the number of independent SNP loci that contribute to risk for schizophrenia. Here, we assume that the number of genome-wide significant SNP associations and the amount of variance they explain in the Sweden + PGC results reflect only partly the underlying genetic architecture of schizophrenia due to inadequate sample size. Using 1000 Genomes results for Sweden + PGC and assuming population risk of 0.01, we estimated that 8,300 independent SNPs contribute to the genetic basis of schizophrenia and that these SNPs account for 50% of the variance in liability to schizophrenia (95% credible intervals 6,300-10,200 for the number of SNPs and 0.45-0.54 for total variance explained). We stress that these estimates must be interpreted in the context of the assumptions of ABPA and the strengths and weaknesses of the input data. Additional analyses (not shown) indicate that most of the signal was derived from SNPs with allele frequencies > 0.1; low-frequency imputed SNPs were not generally inferred to be associated with schizophrenia. Figure 3 compares ABPA estimates of the genetic architecture of schizophrenia and four biomedical diseases. 66 There are similarities across the estimates for these complex traits as all are relatively highly polygenic, and common SNPs explain substantial proportions of variation. However, these results suggest that the genetic architecture of schizophrenia is left-shifted with greater numbers of SNPs with smaller effects.
We previously estimated the heritability of schizophrenia in Sweden to be 0.64 (95% CI 0.617-0.675) using a national pedigree sample of 9.0M individuals, 5 and a Danish national pedigree study of 2.6M individuals reported a similar estimate (0.67, 95% CI 0.65-0.71). 5,67 Using the 1000 Genomes data with population risk of 0.01, the variance in liability estimate from GCTA accounts for 52% of the heritability (0.33/0.64) and ABPA accounts for 78% of the heritability (0.50/0.64). Imprecision is inherent to these estimates and future work or the use of a twin meta-analytic estimate of the heritability of schizophrenia (0.81, 95% CI 0.73-0.90) 6 could revise these estimates downward. However, despite the use of different assumptions and methods, these estimates converge on a crucial qualitative implication: causal variants tagged by common SNPs make substantial contributions to the risk for schizophrenia.
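The fractions quoted above are simple ratios of SNP-based variance in liability to a reference heritability. The short sketch below reproduces that arithmetic using the point estimates from the text and shows how the fractions would shrink if the twin meta-analytic estimate were used as the denominator; variable and function names are illustrative.

```python
family_h2 = 0.64  # Swedish national pedigree estimate
twin_h2 = 0.81    # twin meta-analysis estimate
snp_estimates = {"GCTA": 0.33, "ABPA": 0.50}  # 1000 Genomes, population risk 0.01


def fraction_explained(snp_h2, total_h2):
    """Share of a reference heritability captured by a SNP-based estimate."""
    return snp_h2 / total_h2


for method, value in snp_estimates.items():
    vs_family = round(100 * fraction_explained(value, family_h2))
    vs_twin = round(100 * fraction_explained(value, twin_h2))
    print(method, vs_family, vs_twin)
```

Only point estimates are used here; the standard errors and credible intervals reported for each method are not propagated.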
These results provide deeper insight into the genetic architecture of schizophrenia than ever before. We find support for 22 common variant loci (14 novel) that highlight biological hypotheses for further evaluation. Some findings have immediate translational relevance. Larger studies are highly likely to uncover more common variant associations as argued elsewhere. 8,18,68,69
Common variation is an important (and perhaps predominant) genetic contributor to risk for schizophrenia. We estimated that 6,300-10,200 independent and mostly common SNPs contribute to the etiology of schizophrenia. As one gene or structural element could contain multiple independent associations, the number of genes ultimately determined to harbor causal variation for schizophrenia will be smaller, and we expect that these genes will implicate one or more biological pathways fundamental to disease risk.
Moreover, these thousands of independent loci appear to account for a considerable fraction of the heritability of schizophrenia. It is possible that the commonly used phrase “missing heritability” lacks precision. Indeed, if thousands of SNPs underlie schizophrenia, a statistical model containing a handful of SNPs is unlikely to account for more than a small fraction of the heritability. 70 Our results imply that the genetic architecture of schizophrenia is not dominated by uncommon variation. However, a balanced plan of attack should include well-powered searches for rare, private, or de novo genetic variation of strong effect given that such variants are probably more tractable to current molecular methods.
Power calculations are a fundamental component of the design of genetic studies. However, relatively extensive knowledge of genetic architecture is essential for power calculations to have maximum utility for study planning. We used the ABPA estimates of the posterior distribution of genotypic relative risks (Figure 3) to inform power calculations by estimating the numbers of independent loci that could be detected for different sample sizes (Supplemental Table 6 and Supplemental Figure 8). For example, for 60,000 schizophrenia cases and 60,000 controls, ABPA results project that hundreds of independent SNP loci would reach genome-wide significance (mean of 794 SNPs, 95% credible interval 362-1154 SNPs).
Thus, for the first time, we now have a clear path to increased knowledge about the etiology of schizophrenia via application of standard, off-the-shelf genomic technologies for elucidating the effects of common variation. We suggest that a relatively thorough enumeration of the genomic loci conferring risk for schizophrenia (the “parts list”) should be a priority for the field. 8 Identifying all loci would surely be an exercise in diminishing returns. However, we propose a goal for the field: identification of the top 2,000 loci (for example) might be sufficient to reveal confidently and clearly the biological processes that mediate risk and protection for schizophrenia. Achievement of this goal would provide a strong empirical impetus for targeted biological and genetic research into the precise molecular basis of risk for schizophrenia, stratification of at-risk populations (e.g., psychotic prodrome), and appropriate cellular measures for evaluation of novel therapeutics. As indicated by our findings, greater knowledge of the genetic basis of schizophrenia can converge on increasingly specific neurobiological hypotheses that can be prioritized for subsequent investigation.
We present here the pre-planned principal analyses for this project. In order to advance knowledge of schizophrenia, a minority of samples were included in prior reports. Genotyping was conducted in six batches (denoted Sw1-Sw6) with total sample sizes of 464, 694, 1498, 2388, 4461, and 2345. Genotypes were generated as sufficient numbers of samples accumulated from the field work in Sweden. The 2009 International Schizophrenia Consortium report contained GWAS data from the Sw1-2 subjects (N=1158, 9.8% of the sample before quality control). 14 The 2011 PGC schizophrenia paper also contained GWAS data from the Sw1-2 subjects plus ~80 SNPs from Sw3-4 in the replication phase. 17 The 2012 Bergen et al. paper had a particular focus contrasting schizophrenia with bipolar disorder and reported GWAS results from Sw1-4 (N=5044, 42.6% of the full sample). 75 Thus, of the total sample of 11,850 Swedish subjects before quality control (5,351 cases, 6,509 controls), 57.4% have never been reported previously.
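The overlap percentages follow from the batch sizes listed above. As a small check of that arithmetic (batch labels as in the text; helper names are illustrative):

```python
batch_sizes = {"Sw1": 464, "Sw2": 694, "Sw3": 1498,
               "Sw4": 2388, "Sw5": 4461, "Sw6": 2345}

total = sum(batch_sizes.values())


def batch_total(names):
    """Number of subjects (before quality control) in the named batches."""
    return sum(batch_sizes[name] for name in names)


sw1_2 = batch_total(["Sw1", "Sw2"])
sw1_4 = batch_total(["Sw1", "Sw2", "Sw3", "Sw4"])
never_reported = total - sw1_4  # subjects in Sw5 and Sw6 only

print(total)
print(round(100 * sw1_2 / total, 1))
print(round(100 * never_reported / total, 1))
```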
All procedures were approved by ethical committees at the Karolinska Institutet and University of North Carolina, and all subjects provided written informed consent (or legal guardian consent and subject assent). Sample collection was from 2005-11.
Cases with schizophrenia were identified via the Swedish Hospital Discharge Register 76,77 which captures all public and private inpatient hospitalizations. The register is complete from 1987 and augmented by psychiatric data from 1973-86. The register contains ICD discharge diagnoses 78-80 made by attending physicians for each hospitalization. 81-84 Case inclusion criteria: ≥2 hospitalizations with a discharge diagnosis of schizophrenia, both parents born in Scandinavia, and age ≥18 years. Case exclusion criteria: hospital register diagnosis of any medical or psychiatric disorder mitigating a confident diagnosis of schizophrenia as determined by expert review. This review led to removal of 3.4% of eligible cases due to the primacy of another psychiatric disorder (0.9%), a general medical condition (0.3%), or uncertainties in the Hospital Discharge Register (e.g., contiguous admissions with brief total duration, 2.2%).
The validity of this case definition of schizophrenia is described at length in the Supplement, and validity is strongly supported by clinical, epidemiological, genetic epidemiological, and genetic evidence.
Controls were selected at random from Swedish population registers with the goal of obtaining an appropriate control group and avoiding “super-normal” controls. 85 Control inclusion criteria: never hospitalized for schizophrenia or bipolar disorder (given evidence of genetic overlap with schizophrenia), 5,14,86 both parents born in Scandinavia, and age ≥18 years.
Of the potential cases and controls who were alive and contactable, refusal rates were higher for cases than for controls (46.7% versus 41.7%). However, these proportions compare favorably with modern refusal rates in epidemiology (59% for cross-sectional and 44% for case-control studies), 87,88 and in a recent large Norwegian longitudinal study (58%). 89 For cases, comorbidity with drug/alcohol abuse or dependence did not predict participation nor did any subtype of schizophrenia (e.g., paranoid or disorganized types). The sample was approximately representative of the Swedish populace in regard to county of birth (Supplemental Figure 4).
DNA was extracted from peripheral blood samples at the Karolinska Institutet Biobank. Samples were genotyped in six batches at the Broad Institute using Affymetrix 5.0 (3.9%), Affymetrix 6.0 (38.6%), and Illumina OmniExpress (57.4%) chips according to the manufacturers' protocols (Supplemental Table 3). Genotype calling, quality control, and imputation were done in four sets corresponding to data from Affymetrix 5.0 (Sw1), Affymetrix 6.0 (Sw2-4), and the OmniExpress batches (Sw5, Sw6). Genotypes were called using Birdsuite (Affymetrix) or BeadStudio (Illumina). The quality control parameters applied were: SNP missingness < 0.05 (before sample removal); subject missingness < 0.02; autosomal heterozygosity deviation; SNP missingness < 0.02 (after sample removal); difference in SNP missingness between cases and controls < 0.02; and deviation from Hardy-Weinberg equilibrium (P < 10−6 in controls or P < 10−10 in cases).
After basic quality control, 77,986 autosomal SNPs directly genotyped on all four GWAS platforms were extracted and pruned to remove SNPs in LD (r2 > 0.05) or with minor allele frequency < 0.05, leaving 39,239 SNPs suitable for robust relatedness testing and population structure analysis. Relatedness testing was done with PLINK 90 and pairs of subjects with π̂ > 0.2 were identified and one member of each relative pair removed at random. Principal component estimation was done with the same collection of SNPs. We tested 20 principal components for phenotype association (using logistic regression with batch indicator variables included as covariates) and evaluated their impact on the genome-wide test statistics using λ 19 after genome-wide association of the specified principal component, and 11 principal components were included in all association analyses.
Genotype imputation was performed using the pre-phasing/imputation stepwise approach implemented in IMPUTE2 / SHAPEIT (chunk size of 3 Mb and default parameters). 91,92 The imputation reference set consisted of 2,186 phased haplotypes from the full 1000 Genomes Project dataset (March 2012, 40,318,245 variants). Evaluation of λGC led to the removal of SNPs with control allele frequencies < 0.005 or > 0.995, imputation “info” values < 0.2, or that were genotyped only in the smallest sample set (Sw1). Given that male sex is a risk factor for schizophrenia, 93 chromosome X imputation was conducted for subjects passing QC for the autosomal analysis (excluding chrX SNPs with missingness ≥ 0.05 or HWE P < 10−6 in females). Imputation was performed separately for males and females, and gene dosages were tested for association under an additive logistic regression model using the same covariates as for the autosomal analysis. All genomic locations are given in NCBI build 37/UCSC hg19 coordinates.
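The SNP-level thresholds listed above can be read as a simple pass/fail filter. The sketch below encodes only the final SNP criteria (missingness after sample removal, case-control difference in missingness, and Hardy-Weinberg deviation); the record type and field names are hypothetical, and the subject-level filters and the initial missingness pass are omitted.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP summary statistics (hypothetical field names)."""
    missing_rate: float      # SNP missingness after sample removal
    missing_cases: float     # missingness among cases
    missing_controls: float  # missingness among controls
    hwe_p_controls: float    # Hardy-Weinberg P value in controls
    hwe_p_cases: float       # Hardy-Weinberg P value in cases


def passes_snp_qc(snp: SnpQc) -> bool:
    """Apply the SNP-level thresholds quoted in the text."""
    if snp.missing_rate >= 0.02:
        return False
    if abs(snp.missing_cases - snp.missing_controls) >= 0.02:
        return False
    # Exclude strong Hardy-Weinberg deviation in either group
    if snp.hwe_p_controls < 1e-6 or snp.hwe_p_cases < 1e-10:
        return False
    return True


good = SnpQc(0.005, 0.006, 0.004, 0.3, 0.2)
hwe_fail = SnpQc(0.005, 0.006, 0.004, 1e-8, 0.2)
print(passes_snp_qc(good), passes_snp_qc(hwe_fail))
```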
We first analyzed Swedish cases and controls (N=11,244), and then conducted a meta-analysis with the PGC results for schizophrenia to evaluate our results with respect to the world's literature (N=20,899 after removing 954 subjects from Sw1-2). 17 To maximize comparability, the Swedish samples were run through the same analytical pipeline used for the PGC samples. Association testing was carried out in PLINK using imputed SNP dosages and the principal components described above as covariates. 22 Meta-analysis was conducted using an inverse-weighted fixed effects model. 21 To evaluate the comparability of the Swedish results with those from the PGC schizophrenia study, we used sign tests and risk score profiling based on sets of carefully selected SNPs. 17
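For readers unfamiliar with fixed effects meta-analysis, the sketch below combines per-study log odds ratios for one SNP. It assumes that the weights are the inverse of each study's sampling variance, the most common form of this model; the input values are invented for illustration and the function is not taken from the analysis pipeline.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed effects combination for one SNP.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the combined beta, its standard error, Z, and a two-sided P.
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p


# Hypothetical estimates for one SNP from two independent studies
beta, se, z, p = fixed_effects_meta([0.10, 0.06], [0.02, 0.02])
print(round(beta, 4), round(se, 4), round(z, 2), p)
```

With equal standard errors the combined estimate is the simple mean of the two betas, and its standard error shrinks by a factor of the square root of two.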
Many GWAS findings implicate an extended region containing multiple significant SNPs. These are not independent associations but arise from high LD between associated SNPs. It is useful to summarize these associations in terms of the index SNP with the strongest association and the other SNPs in high linkage disequilibrium with it. To summarize GWAS findings, we used the following settings in PLINK:
--clump-p1 1e-4 --clump-p2 1e-4 --clump-r2 0.2 --clump-kb 500
to retain SNPs with association P < 0.0001 and r2 < 0.2 within 500 kb windows.
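The logic of clumping can be illustrated with a simplified greedy sketch. It is not PLINK's implementation: it assumes pairwise r2 values are supplied in a lookup table, treats all thresholds as inclusive, and uses hypothetical SNP identifiers and positions.

```python
def clump(snps, r2, p1=1e-4, p2=1e-4, r2_min=0.2, kb=500):
    """Greedy clumping sketch.

    snps: list of (snp_id, chrom, bp, p) tuples.
    r2:   dict mapping frozenset({id_a, id_b}) to the pairwise r2.
    Returns a dict mapping each index SNP to the SNPs clumped under it.
    """
    by_p = sorted(snps, key=lambda s: s[3])  # most significant first
    assigned = set()
    clumps = {}
    for snp_id, chrom, bp, p in by_p:
        if p > p1 or snp_id in assigned:
            continue  # not eligible as an index SNP
        assigned.add(snp_id)
        members = []
        for other_id, other_chrom, other_bp, other_p in by_p:
            if other_id in assigned or other_chrom != chrom:
                continue
            if abs(other_bp - bp) > kb * 1000 or other_p > p2:
                continue
            if r2.get(frozenset((snp_id, other_id)), 0.0) >= r2_min:
                members.append(other_id)
                assigned.add(other_id)
        clumps[snp_id] = members
    return clumps


# Hypothetical data: rs2 is in LD with rs1; rs3 lies outside the window.
toy_snps = [("rs1", 1, 1000, 1e-8), ("rs2", 1, 50000, 1e-6),
            ("rs3", 1, 900000, 1e-5), ("rs4", 1, 60000, 0.01)]
toy_r2 = {frozenset(("rs1", "rs2")): 0.8}
print(clump(toy_snps, toy_r2))
```

In this toy example rs1 and rs3 are reported as index SNPs, rs2 is absorbed into the rs1 clump, and rs4 is ignored because its P value exceeds both thresholds.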
We used sign tests to compare the overall patterns of results between the Swedish and PGC schizophrenia samples. We used the clumping settings above to derive a filtered set of SNPs. Due to the strong signal and high linkage disequilibrium in the MHC, only one SNP was kept from the extended MHC region. We then determined the number of SNPs whose logistic regression beta coefficient signs were the same between two independent samples. Under the null, the expectation is that 50% of the signs of these SNPs will be the same between two independent sets of results. The significance of the observed proportion was evaluated using the binomial distribution.
The significance test was done in two ways: selecting SNPs from Sw1-6 results and evaluating the signs in the independent PGC results, and by reversing the procedure (select from PGC, evaluate signs in Sw1-6). Similar results were obtained selecting SNPs for: (a) P < 1×10−5, (b) P < 1×10−6, (c) keeping one SNP every 3 Mb (effectively removing or greatly minimizing the effects of residual linkage disequilibrium).
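The sign test described above can be sketched with the exact binomial distribution. This is a minimal illustration assuming a one-sided alternative (more concordant signs than the 50% expected by chance); the text does not state whether a one-sided or two-sided test was used, and the function names and example betas are hypothetical.

```python
import math


def count_same_sign(betas_a, betas_b):
    """Count SNPs whose beta coefficients have the same sign in both samples."""
    return sum(1 for a, b in zip(betas_a, betas_b) if a * b > 0)


def sign_test_p(n_same, n_total):
    """One-sided exact P(X >= n_same) for X ~ Binomial(n_total, 0.5)."""
    if not 0 <= n_same <= n_total:
        raise ValueError("n_same must lie between 0 and n_total")
    tail = sum(math.comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical betas for four filtered SNPs in two independent samples.
discovery = [0.12, -0.08, 0.05, 0.10]
target = [0.09, -0.02, -0.01, 0.07]
n_same = count_same_sign(discovery, target)
print(n_same, sign_test_p(n_same, len(discovery)))
```

The same functions can be applied in both directions (selecting SNPs in one sample and evaluating signs in the other) by swapping the two input lists after re-selecting SNPs.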
We used RPS 14 as an alternative and complementary way to compare the overall patterns of results from the PGC schizophrenia analysis (discovery sample) with the independent Swedish results (target sample). We began by selecting high-quality, relatively independent SNPs with unambiguous directions of effect: from the PGC imputed results file, we made a subset of results containing SNPs with allele frequency 0.02-0.98 and imputation INFO scores > 0.9. We then removed SNPs in high LD via clumping (i.e., retain all SNPs with r2 < 0.25 within 500 kb windows):
--clump-p1 1 --clump-p2 1 --clump-r2 0.25 --clump-kb 500
For RPS, we wished to evaluate SNP effects across the p-value spectrum. Again, due to the strong signal and high linkage disequilibrium in the MHC, only one SNP was kept from the extended MHC region.
We used the resulting list from the PGC to calculate schizophrenia risk profile scores in the independent Swedish samples using the
function in PLINK. We did this 10 times using different subsets of the PGC SNPs selected by increasing P value thresholds. From the set of filtered SNPs from the PGC, we evaluated 10 different association P thresholds (PT): 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, and 1.0 (i.e., include all SNPs). For each of these 10 sets of SNPs derived from the PGC, the schizophrenia risk profile score (the number of schizophrenia risk alleles weighted by the logistic regression beta) was calculated for each case and control in Sw1-6. Logistic regression was then used to test whether Swedish cases had significantly different burden of schizophrenia risk alleles in comparison to controls (including ancestry principal components as covariates). To estimate the proportion of variance of case-control status in the Swedish samples accounted for by the risk profile score from the PGC, we used the difference in the Nagelkerke pseudo R2 contrasting a logistic regression model containing the risk profile score plus ancestry covariates with a logistic regression model containing the covariates alone.
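The two calculations in this paragraph can be sketched as follows. This is a simplified illustration and not the PLINK routine: it assumes the score is a plain weighted sum of risk-allele dosages, and it assumes the log-likelihoods of the fitted logistic models are already available from a separate fitting step. All function names and example values are hypothetical.

```python
import math


def profile_score(dosages, weights, p_values, p_threshold):
    """Weighted sum of risk-allele dosages over SNPs with P <= p_threshold."""
    return sum(d * w for d, w, p in zip(dosages, weights, p_values)
               if p <= p_threshold)


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke pseudo R2 from intercept-only and fitted log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def r2_gain(loglik_null, loglik_covars, loglik_full, n):
    """R2 of (score + covariates) minus R2 of the covariates-only model."""
    return (nagelkerke_r2(loglik_null, loglik_full, n)
            - nagelkerke_r2(loglik_null, loglik_covars, n))


# Hypothetical subject: three SNPs, discovery betas and P values.
thresholds = [0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
for pt in thresholds:
    print(pt, profile_score([2, 1, 0], [0.1, 0.2, 0.3], [1e-5, 0.03, 0.5], pt))
```

Looping over the ten thresholds yields one score per subject per threshold, and each set of scores is then entered into its own pair of logistic regression models.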
One way to understand polygenic associations for a complex trait is if the implicated genetic variants are in genes that comprise a biological pathway. Gene-set analysis includes evaluation of genetic variants in genes that are grouped based on their interacting role in biological pathways (biological pathway analysis) and genes that share similar cellular functions (functional gene-set analysis).
We used JAG (Joint Association of Genetic variants, http://ctglab.nl/software) to conduct gene-set analyses. This method has previously been applied to the International Schizophrenia Consortium data by Lips et al. 94 JAG tests for the association of specified gene-sets with schizophrenia using individual-level genotype data, which tends to be more powerful than using summary statistics. JAG constructs a test statistic for each gene-set and includes both self-contained and competitive tests. These two approaches evaluate different null hypotheses, and statistical significance (Pself and Pcomp) is determined using permutation. First, the self-contained test evaluates the null hypothesis that a defined set of genes is not associated with schizophrenia while accounting for some of the properties of the SNPs being studied (e.g., LD structure). Second, the competitive test evaluates whether a specific set of genes has evidence for stronger associations with schizophrenia than randomly selected sets of control genes (with the latter matched to the former using the same effective number of SNPs per gene-set). Thus, the null hypothesis of the competitive test is that these genes are not more strongly associated than a similar but randomly selected set of genes; that is, the comparison is to the average degree of association across genes. The principal comparison is the competitive test, and we present self-contained tests for completeness. Competitive gene-set tests are more appropriate for a polygenic disease like schizophrenia because they explicitly prioritize gene-sets that show a greater average degree of association, over and above the polygenic background, rather than prioritizing larger but more weakly enriched gene-sets (as self-contained tests would tend to do).
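The idea behind a competitive test can be illustrated with a generic resampling sketch. This is not JAG's procedure: it assumes one association statistic per gene, draws random gene-sets of the same size without matching on the effective number of SNPs, and uses the mean statistic as the set-level summary. The function name, arguments, and example values are hypothetical.

```python
import random


def competitive_p(observed_mean, gene_stats, set_size, n_draws=1000, seed=0):
    """Empirical P value: how often a random gene-set of the same size has a
    mean association statistic at least as large as the observed set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(gene_stats, set_size)
        if sum(sample) / set_size >= observed_mean:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical background of per-gene statistics and an observed set mean.
background = [0.5, 1.0, 1.5, 0.8, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0]
print(competitive_p(1.4, background, set_size=3))
```

Because the reference distribution is built from other genes in the same data, a gene-set is flagged only if it is more strongly associated than the polygenic background, which is the property that motivates the competitive test here.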
We obtained replication association results from six independent samples totaling 7,452 cases, 20,404 controls, and 581 trios (Supplemental Table 4). These subjects are not included in the Swedish samples or in the PGC mega-analysis. 17 The independent samples were from SGENE+, 16 CLOZUK, 29 the Irish Schizophrenia Genomics Consortium, 95 the Psychosis Endophenotype Consortium, 96 and the Multicenter Family Study. 97 After selecting for P < 1×10−5 in the Sweden and PGC meta-analysis and accounting for linkage disequilibrium, we requested association results for 194 genomic regions.
We are deeply grateful for the participation of all subjects contributing to this research, and to the collection team that worked to recruit them: Emma Flordal-Thelander, Ann-Britt Holmgren, Marie Hallin, Marie Lundin, Ann-Kristin Sundberg, Christina Pettersson, Radja Satgunanthan-Dawoud, Sonja Hassellund, Malin Rådstrom, Birgitta Ohlander, Leila Nyrén, and Isabelle Kizling. Funding support was provided by the NIMH R01 MH077139 (Sullivan), NIMH R01 MH095034 (Sklar), Stanley Center for Psychiatric Research, the Sylvan Herman Foundation, the Friedman Brain Institute at Mount Sinai School of Medicine, the Karolinska Institutet, Karolinska University Hospital, the Swedish Research Council, the Swedish County Council, the Söderström Königska Foundation, and the Netherlands Scientific Organization (NWO 645-000-003). SGENE was supported by EU Grant HEALTH-F2-2009-223423 (Project PsychCNVs). The study of the Aarhus sample was supported by grants from The Danish Strategic Research Council, H. Lundbeck A/S, The Faculty of Health Sciences at Aarhus University, Lundbeck Foundation, and The Stanley Research Foundation. The Wellcome Trust Case Control Consortium 2 project collection was funded by the Wellcome Trust (085475/B/08/Z and 085475/Z/08/Z). The funders had no role in study design, execution, analysis, and manuscript preparation.
|Prof Douglas F Levinson MD||Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA|
|Prof Pablo V Gejman MD||Psychiatry and Behavioral Sciences, NorthShore University HealthSystem and University of Chicago, Evanston, Illinois, USA|
|Dr Claudine Laurent MD PhD||Child and Adolescent Psychiatry, Pierre and Marie Curie Faculty of Medicine and Brain and Spinal Cord Institute (ICM), Paris, France|
|Prof Bryan J Mowry MD FRANZCP||Psychiatry, Queensland Brain Institute and Queensland Centre for Mental Health Research, University of Queensland, Brisbane, Queensland, Australia|
|Prof Ann E Pulver PhD||Psychiatry, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA|
|Prof Sibylle G Schwab PhD||Psychiatry, Friedrich-Alexander University, Erlangen-Nuremberg, Erlangen, Germany|
|Prof Dieter B Wildenauer PhD||Psychiatry and Clinical Neurosciences, Western Australian Institute for Medical Research & Centre for Medical Research, The University of Western Australia, Nedlands, Australia|
|Dr Frank Dudbridge PhD||Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK|
|Dr Jianxin Shi PhD||Biostatistics, National Cancer Institute, Bethesda, MD, USA|
|Prof Margot Albus MD||State Mental Hospital, Haar, Germany|
|Dr Madeline Alexander PhD||Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA|
|Prof Dominique Campion PhD||INSERM U614, University of Medicine, Rouen, France|
|Prof David Cohen MD PhD||Child and Adolescent Psychiatry, Pierre and Marie Curie Faculty of Medicine, Institute for Intelligent Systems and Robotics (ISIR), Paris, France|
|Prof Dimitris Dikeos MD||First Department of Psychiatry, University of Athens Medical School, Athens, Greece|
|Dr Jubao Duan PhD||Psychiatry and Behavioral Sciences, NorthShore University HealthSystem and University of Chicago, Evanston, Illinois, USA|
|Prof Peter Eichhammer MD PhD||Psychiatry, University of Regensburg, Regensburg, Germany|
|Stephanie Godard||Psychiatry and Genetics, INSERM, Institut de Myologie, Hôpital Pitié Salpêtrière, Paris, France|
|Dr Mark Hansen PhD||Illumina, Inc., La Jolla, California, USA|
|Prof F Bernard Lerer MD||Psychiatry, Hadassah-Hebrew University Medical Center, Jerusalem, Israel|
|Prof Kung-Yee Liang PhD||Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA|
|Prof Wolfgang Maier MD||Psychiatry, University of Bonn, Bonn, Germany|
|Prof Jacques Mallet PhD||Centre National de la Recherche Scientifique, Laboratoire de Génétique Moléculaire de la Neurotransmission et des Processus Neurodégénératifs, Hôpital Pitié Salpêtrière, Paris, France|
|Deborah A Nertney||Psychiatry, Queensland Brain Institute and Queensland Centre for Mental Health Research, University of Queensland, Brisbane, Queensland, Australia|
|Prof Gerald Nestadt MD||Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA|
|Dr Nadine Norton PhD||Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, Wales, UK|
|Prof George N Papadimitriou MD||First Department of Psychiatry, University of Athens Medical School, Athens, Greece|
|Robert Ribble||Psychiatry, VIPBG, VCU, Richmond, Virginia, USA|
|Dr Alan R Sanders MD||Psychiatry, NorthShore University HealthSystem and University of Chicago, Evanston, Illinois, USA|
|Prof Jeremy M Silverman PhD||Psychiatry, Mount Sinai School of Medicine, New York, NY and VAMC, Bronx, New York, USA|
|Prof Dermot Walsh MD||The Health Research Board, Dublin, Ireland|
|Dr Nigel M Williams PhD||Psychological Medicine, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, Wales, UK|
|Brandon Wormley||Psychiatry, VIPBG, VCU, Richmond, Virginia, USA|
URLs: Results can be downloaded from the Psychiatric Genomics Consortium website (http://pgc.unc.edu) and visualized using Ricopili (http://www.broadinstitute.org/mpg/ricopili). Genotype data are available upon application from the NIMH Genetics Repository (https://www.nimhgenetics.org).
Author Contributions: SR, COD, EAS, MF, NRW, NS, SB, SHL, ABS, ALR, BKBS, BMN, CdL, DP, DR, FB, JP, KL, MLH, MV, PH, SS, SM, SP, and PFS conducted statistical analyses. ADB, DMH, DR, ES, JS, MPM, ND, OM, PBM, ST, TS, and VG ascertained subjects. ALC, JJC, SW, YK, KX, and PFS performed bioinformatic analyses. KC, JLM, and SA managed the project. BPR, DWM, FAON, HS, JTW, KSK, MG, MJO, NC, PC, MGS, PEC, WTCCC2, APC, EB, KS, and MCOD provided replication samples and genotypes. AKK interfaced with Swedish national registers. The manuscript was written by PKEM, SM, SP, PS, CMH, and PFS. The study was designed by SP, PS, CMH, and PFS. Funding was obtained by ES, PS, CMH, and PFS.
Conflicts of Interest: Dr Sullivan was on the SAB of Expression Analysis (Durham, NC, USA). Dr Sklar is on the Board of Directors of Catalytic, Inc. The other authors report no conflicts.