Schizophrenia (SCZ) and bipolar disorder (BD) are highly heritable psychiatric disorders with overlapping susceptibility loci and symptomatology. We conducted a genome-wide association study (GWAS) of these disorders in a large Swedish sample. We report a new and independent case–control analysis of 1507 SCZ cases, 836 BD cases and 2093 controls. No single-nucleotide polymorphisms (SNPs) achieved significance in these new samples; however, combining new and previously reported SCZ samples (2111 SCZ and 2535 controls) revealed a genome-wide significant association in the major histocompatibility complex (MHC) region (rs886424, P = 4.54 × 10−8). Imputation using multiple reference panels and meta-analysis with the Psychiatric Genomics Consortium SCZ results underscored the broad, significant association in the MHC region in the full SCZ sample. We evaluated the role of copy number variants (CNVs) in these subjects. As in prior reports, deletions were enriched in SCZ, but not BD cases compared with controls. Singleton deletions were more frequent in both case groups compared with controls (SCZ: P = 0.003, BD: P = 0.013), whereas the largest CNVs (>500 kb) were significantly enriched only in SCZ cases (P = 0.0035). Two CNVs with previously reported SCZ associations were also overrepresented in this SCZ sample: 16p11.2 duplications (P = 0.0035) and 22q11 deletions (P = 0.03). These results reinforce prior reports of significant MHC and CNV associations in SCZ, but not BD.
Schizophrenia (SCZ) and bipolar disorder (BD) are severe and chronic psychiatric disorders.1 A persistent debate in psychiatry has been the extent to which SCZ and BD are distinct.2 Although the classical definitions of these illnesses are relatively discrete,3 there are multiple shared features: age of onset in late teens/early adulthood, prevalence (~1%), high heritability (59–64% in the Swedish population) and some symptomatology (e.g., psychosis, chronicity and cognitive deficits).4-9 Genetic overlap between these disorders was validated in a large Swedish study in which the shared genetic contribution was estimated to be >60%.9
Genome-wide association studies (GWAS) have discovered robust and replicable risk loci for SCZ. Genetic markers in the major histocompatibility complex (MHC) region and TCF4 have demonstrated replicated associations,10-14 and the Schizophrenia Psychiatric Genomics Consortium (PGC-SCZ) has recently identified other significant loci at 1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32–q24.33.
GWAS of BD have identified significant associations with DGKH,15 ANK3 (ref. 16) and NCAN,17 and the largest BD GWAS to date discovered significant loci in the calcium channel gene CACNA1C and in ODZ4.18 In addition, a small number of loci were jointly analyzed by the PGC in SCZ and BD. These analyses demonstrated strong contributions from both disorders to the genome-wide significant results in CACNA1C, ANK3 and the ITIH3–ITIH4 region, supporting the idea of shared genetic risk.14,18
In contrast to common genetic variants (single-nucleotide polymorphisms (SNPs)), which have been implicated in these disorders, several large, rare copy number variants (CNVs) have been associated with SCZ, although the associations are less notable for BD. Deletions at 1q21.1, 2p21–16.3, 3q29, 15q11.2, 15q13.3 and 22q11, and duplications at 7q36.3, 16p11.2 and 16p13.11 confer substantial risk for SCZ in a small percentage of cases, but the risk is not specific to SCZ. These CNVs have also been observed at elevated rates in autism, epilepsy and mental retardation.19-22 Duplications at 16p11.2 have demonstrated risk for SCZ, autism, and to some extent, BD.23-25
Few GWAS have examined SCZ and BD concurrently, and this study represents the largest combined analysis of these disorders in a genetically homogeneous population to date. In this sample, we conducted association analyses of common SNPs and explored the relationship of CNVs to BD and SCZ. We also assessed the aggregation of associated loci in biological pathways, explored common copy number polymorphisms (CNPs) in SCZ, and compared our SCZ association results with the PGC-SCZ results through meta-analysis.
Comprehensive analyses of most SCZ cases (N = 1507) and controls (N = 2093) are newly reported here, and some SCZ cases (N = 560) and controls (N = 400) from this study have been reported previously.10,34 Additionally, 44 SCZ cases and 42 controls previously removed from analyses owing to Finnish ancestry were reintegrated into the current analyses. All subjects were born in Sweden, and SCZ cases were identified via the Swedish Hospital Discharge Register containing all individuals hospitalized in Sweden since 1973. The Hospital Discharge Register has high agreement with psychiatric diagnoses.26-28 Diagnoses were established by the attending physician and confirmed in a subset of subjects by medical record review. Cases must have had at least two hospitalizations with an SCZ diagnosis. Control subjects, also selected through registers, were group-matched by age, sex and county of residence, and must not have been hospitalized with a psychiatric diagnosis. All subjects were at least 18 years old and gave written informed consent to participate. The study was approved by the Ethical Committee at Karolinska Institutet.
The previously unreported BD subjects (N = 836) were collected only in the most recent recruitment wave, through three channels. Some BD cases (N = 49) were identified using the Swedish Hospital Discharge Register, requiring at least two hospitalizations with a BD diagnosis, with confirmatory diagnostic review in a subset of subjects.29 Additional subjects were recruited from the Affective Center at St Goran Hospital in Stockholm, Sweden, following a physician’s referral for BD (N = 216). The diagnostic instrument used was a Swedish adaptation of the Affective Disorder Evaluation,30 which includes the affective module of the Structured Clinical Interview for DSM-IV.31 A further 571 BD cases were recruited from the Stockholm County catchment area. Diagnoses were made according to the DSM-IV criteria, and these cases have not been reported previously. The control subjects were the same as for the SCZ analyses described above. All ascertainment procedures were approved by the Regional Ethical Committees in Sweden.
Blood samples were obtained and DNA extracted from whole blood using standard methods at the Karolinska Institutet. Samples were genotyped in four waves (Supplementary Table 1). Most subjects (92.6%) were genotyped with Affymetrix 6.0 chips (Affymetrix, Santa Clara, CA, USA) with the remainder (7.4%) genotyped with the Affymetrix 5.0 chip. All genotyping was conducted at the Broad Institute of Harvard and MIT, and genotypes and CNVs were called using the Birdsuite algorithm.32 The quality control exclusionary measures for subjects were: genotype call rates <95%; ancestry outliers via multidimensional scaling; a randomly selected member of any pair of subjects with high relatedness (); and suspected sample error or contamination. SNPs were excluded for marked departure from Hardy–Weinberg equilibrium (P< 1 × 10−6), low minor allele frequencies (<1%), and non-random genotyping failure, inferred from the flanking haplotype background using the PLINK ‘mishap’ test (P< 1 × 10−10). Plate-based associations of P< 1 × 10−6 were taken as evidence of non-random plate failure, based on a comparison of allele frequency of each plate to all others and were removed on a plate-by-plate basis.
Following quality control, the analysis data set consisted of 745 006 SNPs genotyped in 1507 SCZ cases, 836 BD cases and 2093 controls. GWAS results for these samples have not been previously reported. Additional SCZ cases (N = 604) and controls (N = 442) from earlier collection waves of this study11,34 bring the total sample size to 2947 combined cases and 2535 controls.
Imputation was conducted using BEAGLE.33 First, as with the PGC-SCZ data, we imputed our genotypes against autosomal genotype data from HapMap3 (ref. 34) and retained only SNP dosages imputed with very high confidence (Info >0.8). Second, we imputed MHC class I and II SNP and amino acid variants.35 Following standard definitions for human leukocyte antigen (HLA) allele sequences from the EMBL-EBI Immunogenetics HLA Database, we imputed against a reference panel of 2767 unrelated founder individuals of European descent, with high-resolution MHC genotyping data for 263 HLA alleles, 3852 SNPs and 372 amino acid positions. The genomic window imputed was 29.3–33.9 Mb on chromosome 6.
All association analyses were conducted using logistic regression in PLINK.36 Multidimensional scaling was performed on the entire data set, and each collection wave was analyzed separately using the first four multidimensional scaling components as covariates, to control for population substructure. Results were then combined by meta-analysis in PLINK. Four main association analyses were conducted using the HapMap3 imputed data for: BD and SCZ separately (new subjects), SCZ (full sample), and all SCZ and BD cases combined (Supplementary Table 1). We used a genome-wide significance threshold of P<5 × 10−8.37 Imputed HLA alleles or amino acid dosages were tested for association using a logistic regression model with covariates to correct for population substructure and genotyping batch. For amino acid positions with >2 alleles, we used the omnibus test in the conditional haplotype analysis module in PLINK. Conditional analyses in the MHC region were performed by conditioning on the genotyped SNP with the lowest P-value in the full SCZ results using the condition function in PLINK, which includes the additive effect of the SNP in the association model.
Some Swedish SCZ subjects (N = 558) and controls (N = 396) were among the 17 samples included in the PGC-SCZ report,14 and we combined the individual non-Swedish PGC sample results with all Swedish SCZ results using the meta-analysis function in PLINK. This yielded P-values and odds ratios (ORs) for ~1.1 million markers in the combined sample of 11 271 cases and 14 601 controls. This is the largest SCZ GWAS meta-analysis yet reported.
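The combination of per-sample results can be illustrated with a minimal sketch of a fixed-effects, inverse-variance meta-analysis on the log odds ratio scale. This is an assumption for illustration: the text does not state which meta-analysis model was used, and the function name and inputs below are hypothetical rather than taken from PLINK.

```python
import math

def fixed_effects_meta(odds_ratios, standard_errors):
    """Pool per-study odds ratios with inverse-variance weights.

    odds_ratios: one OR per study for the same SNP and reference allele.
    standard_errors: standard errors of log(OR), one per study.
    Returns (pooled OR, pooled SE of log OR, two-sided P-value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    betas = [math.log(o) for o in odds_ratios]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_beta), pooled_se, p_value
```

Studies with smaller standard errors (typically larger samples) receive more weight, which is why a locus can reach genome-wide significance in the combined sample without doing so in any single contributing sample.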
Analyses using INRICH software38 and the set screen test implemented in PLINK39 were conducted to assess the joint action of common SNPs within pathways. For these tests, genomic regions encompassing the top results as well as surrounding SNPs in linkage disequilibrium were compared against gene pathways. Full methods are in the Supplementary Material.
We carried out polygenic scoring analyses10,14 using the PGC-SCZ stage 1 samples. After excluding Swedish samples, the training data set consisted of 9160 cases and 12 066 controls, and 82 437 SNPs. The testing data set consisted of 2111 SCZ cases or 836 BD cases, and 2535 controls, and quantitative scores were computed for each subject based on the pT (P-value threshold), the proportion of SNPs with P-values < pT in the training data set. For each SNP set defined by pT, we calculated the proportion of variance explained (R2) by subtracting the Nagelkerke’s R2 attributable to ancestry covariates alone from the R2 for polygenic scores plus covariates. Full methods are described in the Supplementary Material.
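The scoring and variance-explained steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the function names, the dictionary layout and the averaging over selected SNPs are hypothetical choices, and the log-likelihoods are assumed to come from logistic regression models fitted elsewhere.

```python
import math

def polygenic_score(dosages, training_stats, p_threshold):
    """Mean log(OR)-weighted allele dosage over SNPs with P < p_threshold.

    dosages: dict snp_id -> allele dosage (0 to 2) for one target subject.
    training_stats: dict snp_id -> (odds_ratio, p_value) from the training set.
    """
    selected = [snp for snp, (_, p) in training_stats.items()
                if p < p_threshold and snp in dosages]
    if not selected:
        return 0.0
    total = sum(math.log(training_stats[snp][0]) * dosages[snp]
                for snp in selected)
    return total / len(selected)

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's R2 from intercept-only and fitted-model log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_r2

def score_variance_explained(loglik_null, loglik_covariates, loglik_full, n):
    """R2 for score plus covariates minus R2 for covariates alone."""
    return (nagelkerke_r2(loglik_null, loglik_full, n)
            - nagelkerke_r2(loglik_null, loglik_covariates, n))
```

Subtracting the covariate-only R2 isolates the contribution of the score itself, so ancestry differences between cases and controls are not counted as polygenic signal.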
Intensity data from both SNP and structural variation probes were used to identify CNVs based on a hidden Markov model.32 Only subjects who passed SNP quality control filters were considered for CNV analyses. We excluded CNVs with >0.01 frequency, common CNV regions identified in the HapMap samples,40 and any CNVs spanning large genomic gaps (e.g., centromeres). Analyses were restricted to autosomal CNVs >100 kb, and subjects were removed for having either CNVs totaling >10 Mb or >20 total CNVs. This resulted in 4147 CNVs in 1505 SCZ cases, 834 BD cases and 2087 controls remaining for analysis.
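The size and per-subject filters described above can be sketched as follows. This is a minimal illustration with assumptions: the tuple layout and function name are hypothetical, the subject-level exclusions are applied after the size filter (the text does not specify the order), and the frequency, HapMap-region and genomic-gap exclusions are omitted.

```python
from collections import defaultdict

AUTOSOMES = {str(i) for i in range(1, 23)}
MIN_CNV_BP = 100_000        # analyses restricted to CNVs >100 kb
MAX_TOTAL_BP = 10_000_000   # subjects removed if CNVs total >10 Mb
MAX_CNV_COUNT = 20          # subjects removed if they carry >20 CNVs

def filter_cnvs(cnvs):
    """cnvs: list of (subject_id, chrom, start, end) tuples, chrom as a string."""
    # Keep autosomal CNVs longer than 100 kb.
    kept = [c for c in cnvs
            if c[1] in AUTOSOMES and c[3] - c[2] > MIN_CNV_BP]
    total_bp = defaultdict(int)
    count = defaultdict(int)
    for subject, _, start, end in kept:
        total_bp[subject] += end - start
        count[subject] += 1
    # Drop every CNV belonging to a subject with an excessive CNV load.
    excluded = {s for s in count
                if total_bp[s] > MAX_TOTAL_BP or count[s] > MAX_CNV_COUNT}
    return [c for c in kept if c[0] not in excluded]
```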
The number of events in cases and controls, the rate per subject, the proportion of cases/controls to have at least one event, the total distance spanned per person and the average event size per person were evaluated for each CNV burden test across a range of frequencies (1, 2–6, or all), and sizes (100–200 kb, 200–500 kb and >500 kb) for duplications and deletions, separately and combined in both case groups. Statistical tests were one-sided, presuming an increased CNV burden in cases, and significance was evaluated using 10 000 permutations. In addition to the number of CNVs, the proportion of genes intersecting these regions in cases versus controls was tested. Large CNVs with previously reported associations with SCZ or BD were specifically examined, and novel loci were explored. For specific CNVs, only deletions and duplications spanning the full CNV region were reported.
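A one-sided permutation test of CNV rate per subject can be sketched as below. This is an illustrative approximation, not the software used in the study: the function name, the fixed random seed and the (hits + 1) / (permutations + 1) estimator are assumptions, and only one of the burden statistics listed above (events per subject) is shown.

```python
import random

def burden_permutation_p(case_counts, control_counts, n_perm=10_000, seed=0):
    """One-sided P-value for a higher mean CNV count per subject in cases."""
    rng = random.Random(seed)
    n_cases = len(case_counts)
    pooled = list(case_counts) + list(control_counts)

    def rate_difference(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = rate_difference(pooled)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled counts reassigns case/control labels at random.
        rng.shuffle(pooled)
        if rate_difference(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because only permutations with a case excess at least as large as the observed one are counted, the test has no power to detect a higher burden in controls, which matches the one-sided design described above.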
Common CNP deletions 50–260 000 bp in length and present at >1% frequency were identified in the 1000 Genomes data.41 These CNPs were in linkage disequilibrium with 40 767 SNPs present in the HapMap3-imputed results for our full SCZ sample and were cross-referenced with these results. When multiple SNPs tagged a CNP, the SNP with the greatest R2 and lowest P-value was retained.
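The tag-SNP selection rule can be sketched in a few lines. The ordering of the two criteria is an assumption (highest R2 first, with the lowest P-value used to break ties), and the tuple layout and function name are hypothetical.

```python
def best_tag_snp(candidates):
    """Pick one tagging SNP for a CNP.

    candidates: list of (snp_id, r2_with_cnp, association_p) tuples.
    Prefers the highest R2; ties are broken by the lowest P-value.
    """
    return min(candidates, key=lambda c: (-c[1], c[2]))
```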
We performed genome-wide association analyses of SCZ, of BD, and of both disorders considered together, using a common set of controls in a large sample from Sweden. Our four main analyses covered the new, independent SCZ samples; all SCZ samples (new and previously reported); the BD samples (all new); and SCZ and BD combined. The genomic control, λ (the median observed χ2-statistic divided by its expectation under the null), showed little evidence for inflation of the test statistics. For SCZ (new and full samples) and for SCZ–BD combined, λ = 1.04, and for BD, λ = 1.05 (Supplementary Figure 1). Regions of association with P<1 × 10−5 for the four main analyses are reported in Table 1. The quantile–quantile and Manhattan plots for all analyses can be found in Supplementary Figures 1 and 2.
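The genomic control calculation defined above can be sketched as follows, assuming 1-degree-of-freedom χ2 association statistics, for which the median under the null is approximately 0.4549. The function and constant names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_NULL_MEDIAN = 0.4549364

def genomic_control_lambda(chi2_stats):
    """Median observed chi-square statistic divided by its null expectation."""
    return statistics.median(chi2_stats) / CHI2_1DF_NULL_MEDIAN
```

A value near 1, such as the 1.04 to 1.05 reported here, indicates that the bulk of the test statistics follow the null distribution, whereas values well above 1 would suggest stratification or other systematic inflation.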
In the new and previously unreported sample of 1507 SCZ cases and 2093 controls, none of the 1 105 620 imputed SNPs was genome-wide significant (Table 1). The most highly associated SNP was in an intron of GALNT13 (rs12998068, P = 1.50 × 10−6; OR = 1.40). Other noteworthy results include rs11150863 (P = 3.49 × 10−6; OR = 0.78) in an intron of RPTOR and many SNPs in the MHC region (top SNP: rs886424, P = 8.80 × 10−6; OR = 0.71).
We next combined the new and previously reported SCZ samples ascertained as part of the same study (2111 SCZ cases and 2535 controls). One SNP in the MHC region attained genome-wide significance (rs886424, P = 4.54 × 10−8; OR = 0.68). This marker was directly genotyped, and conditioning on this SNP revealed no independent signals in the 10-Mb surrounding region with P<1 × 10−4 (Supplementary Figure 3). Some evidence of heterogeneity in ORs for this region across collection waves was observed (detailed in Supplementary Text). After the MHC, the next strongest association signal was from a SNP in RGS7 (rs984402, P = 3.43 × 10−7; OR = 0.79). Association testing of 836 BD cases and 2093 controls yielded no genome-wide significant findings. The SNP with the lowest P-value, rs17746001 (P = 3.22 × 10−7; OR = 2.03), was in a gene-poor region of chromosome 4.
We next tested for association using SCZ and BD as a combined phenotype in 2947 cases (2111 SCZ and 836 BD) and 2535 controls. The smallest P-value in the combined analysis was again detected for rs17746001 (P = 2.83 × 10−7; OR = 1.75), which was also well represented in the SCZ results (P = 2.23 × 10−5; OR = 1.64). This was closely followed by a large cluster of MHC associations (top SNP: rs2524005, P = 4.95 × 10−7; OR = 0.76).
The highly polymorphic MHC region presents challenges to standard imputation methods. An imputation approach specifically tailored for this region was used to refine the signal at this locus through prediction of classical HLA alleles, and their constituent single-nucleotide variants and corresponding amino acids separately. The MHC-specific imputation in the full SCZ sample revealed an additional SNP with genome-wide significance (rs1264353, P = 3.89 × 10−8). Two classic HLA alleles were also among the top results (HLA_B*0801, P = 2.36 × 10−6, HLA_A*0101, P = 2.83 × 10−6; Supplementary Figure 4).
With the PGC samples, we conducted a meta-analysis of 11 271 SCZ cases and 14 601 controls (Table 2). The MHC region showed the most significant association, which was driven by strong signals in both the PGC-SCZ and the Swedish subjects (rs17693963, P = 3.08 × 10−11, OR = 1.24). This was followed by a chromosome 10 locus containing CNNM2, C10orf32 and other genes (rs11191580, P = 1.73 × 10−9, OR = 1.23), and a chromosome 7 locus encompassing MAD1L1 (rs12666575, P = 1.75 × 10−9, OR = 0.89), which had not attained genome-wide significance in the PGC-SCZ samples alone. Other regions of convergent, genome-wide significant association include the NT5C2 locus on chromosome 10 (rs1926034, P = 1.37 × 10−8, OR = 0.89) and a chromosome 5 region lacking annotated genes (rs7709645, P = 3.80 × 10−8, OR = 1.11), which was not previously noted in the results from either sample.
Top PGC-SCZ results did show evidence of enrichment in the SCZ full-sample results (P = 0.044). The 301 targets of MIR137 were not overrepresented in the top results for the SCZ new (P = 0.97) or full samples (P = 0.94). In the synaptic gene tests, the endocytosis-related set was significantly enriched in the full SCZ (P = 0.046) and combined SCZ–BD (P = 0.027) results. Calcium channel genes did not show enrichment in the BD or combined SCZ–BD results (P = 1). No single Gene Ontology term was significant in the SCZ–BD results following multiple testing correction; however, more genes contained within Gene Ontology categories demonstrated nominal associations of P<0.05 than expected by chance (P = 0.0055; Supplementary Table 2).
Using the PGC-SCZ sample minus the Swedish subjects as the discovery set, we demonstrated enrichment of putatively associated SCZ score alleles in both the SCZ and BD target case groups, compared with controls, for all discovery pT. In the full Swedish SCZ sample, the polygenic scores explained up to 3.6% of the variance (P<2 × 10−16 for all pT). A smaller, but still significant, proportion of variance (1.4%) was explained in the BD sample (P<5.92 × 10−4 for all pT), supporting shared genetic risk between these two disorders. These results corroborate prior reports indicating a polygenic contribution of multiple common SNPs to risk for both SCZ and BD10,14 (Supplementary Table 3).
We investigated the role of CNVs >100 kb in our new SCZ and BD samples (Table 3). Overall, deletions were enriched to a greater extent in the SCZ sample than in the BD subjects compared with controls. Deletions observed only once were more common in both case groups compared with controls (SCZ: P = 0.003; BD: P = 0.013), whereas CNVs in the largest size category (>500 kb) were significantly enriched only in SCZ (P = 0.00048). However, it is of note that the gene count in deletions >500 kb was significantly enriched in both SCZ and BD. Duplications did not show significant overrepresentation in either case group for any size or frequency examined.
Many large CNVs previously implicated in SCZ, BD, and other psychiatric and neurological conditions were also observed in this sample (Supplementary Table 4). SCZ subjects possessed significantly more duplications at the 16p11.2 locus (Supplementary Figure 5; nine cases, one control; P = 0.0035) and four SCZ cases had deletions at 22q11, and one control showed an atypical partial deletion (Supplementary Figure 6; P = 0.033). A novel, nominal association was also observed for duplications in SCZ and BD combined at 9q34.3 (31 cases, 11 controls; P = 0.0014, genome-wide P = 0.052; Supplementary Figure 7). Including the previously reported subjects (six SCZ duplications and one control) strengthened the association (P = 0.0008, genome-wide P = 0.023). However, no enrichment for duplications at this locus was observed in cases in another SCZ sample (International Schizophrenia Consortium, P = 0.37)42 or a combined SCZ–BD sample (Genetic Association Information Network, P = 0.41).43,11
CNPs represent an additional source of genetic variation that is rarely investigated, but potentially etiologically important. The top SNP in the full-sample SCZ analyses, rs886424 (P = 4.54 × 10−8), tags two CNPs (chr6:30 994 409–30 994 549, R2 = 0.91; chr6:30 338 498–30 338 570, R2 = 0.80), whereas another SNP in this region (rs3129953; P = 1.20 × 10−5) is in perfect linkage disequilibrium with a third CNP (chr6:32 667 947–32 668 383, R2 = 1). These results indicate that at least three sub-kilobase deletions segregate with SCZ-associated alleles in the MHC (Supplementary Table 5).
In a large Swedish sample, we found a genome-wide significant association in the MHC region for SCZ. This signal was attenuated by addition of subjects with BD. Our results are consistent with a growing body of research supporting MHC involvement in SCZ, but not in BD.10-12,14,44 This is the second most gene-dense region in the genome, and the association encompasses a multi-megabase region. In an effort to localize this signal, we imputed the MHC region using methods specifically tailored to this highly polymorphic region, and revealed nominal associations with HLA-B*0801 and HLA-A*0101 previously observed in the partially overlapping International Schizophrenia Consortium sample.10 Meta-analysis with the PGC-SCZ results also implies a role for the MHC in SCZ, and additional investigations of CNPs uncovered further potentially relevant genetic variation in this region. Association of the MHC region in this genetically similar Swedish SCZ sample provides additional evidence that a false positive due to population stratification is unlikely to underlie this signal.
Although the MHC region contained the only genome-wide significant finding, several other loci attained nominal significance and merit further investigation. For the SCZ analysis in the full sample, the regulator of G-protein signaling 7 (RGS7) gene, which functions in inhibition of signal transduction and synaptic vesicle exocytosis, demonstrated the second strongest association. The top SNP for the new SCZ sample analysis falls in the neuronally expressed gene GALNT13, identified as one of the genes driving the significance of the ‘glycan structures biosynthesis 1’ pathway in a prior report.45 A single SNP, rs17746001, was the most significant in both the BD and the combined SCZ–BD analyses; however, this region lacks known genes, and the association is not supported by the PGC–BD results (P = 0.083).18 The larger number of SCZ subjects confers greater power for detection of associations compared with the smaller BD sample, and the lack of genome-wide significant results for the latter should be considered in this context.
Despite the smaller sample and inherent decrease in power for BD, we generally observed similar (and higher) rates of CNVs in SCZ and BD compared with controls; however, statistical significance was more often attained in the much larger SCZ sample. Direct comparison of the two case groups revealed a significant increase for large CNVs (>500 kb) in the SCZ group (P = 0.007), but not for any other size or frequency classes. This is compatible with the growing literature implicating increased burden of large, rare CNVs in SCZ;46,42 however, the role of CNVs in BD has been much less clear. We detected a significant excess of singleton deletions in BD as previously reported,47 but our results are also consistent with a prior observation of lower CNV rates in BD for other size and frequency categories,48 or no CNV enrichment in BD.49
The most significant specific CNV we observed, 16p11.2, showed more duplication events in SCZ than controls. This is not a novel finding in SCZ,23,25 nor is this locus only implicated in SCZ. Most notably, 16p11.2 duplications are one of the few CNVs with some evidence of association with BD,23 although only one subject in our BD sample had a duplication at this locus. CNVs at this locus have also shown repeated associations with autism as well as occurrences in cases of developmental delay and mental retardation, syringomyelia, and congenital kidney and urinary abnormalities.23,24,50-52 The novel, nominally associated 160-kb duplications in the combined case group at 9q34.3 encompass one uncharacterized open-reading frame and are 60 kb from the brain-expressed gene KIAA0649, which may regulate cell proliferation.
The relative homogeneity of this Swedish sample is one of its main strengths; however, replication of these findings in other (particularly non-European) samples is important. It is becoming increasingly clear that both rare CNVs and common SNPs contribute to the etiology of SCZ, although for BD, common genetic variation appears more salient. The mechanisms of action by which the genetic associations reported here and elsewhere confer risk for SCZ and BD are yet unknown. Establishing the genetic and environmental background predisposing carriers of shared risk loci to a particular diagnosis is also of critical importance. As genetic associations are identified and validated, their biological ramifications will become increasingly important to understand as we strive toward developing better diagnostic and therapeutic strategies.
We are deeply grateful for the participation of all subjects contributing to this research, and to the collection team that worked to recruit them: Emma Flordal-Thelander, Ann-Britt Holmgren, Marie Hallin, Marie Lundin, Ann-Kristin Sundberg, Christina Pettersson, Radja Satgunanthan-Dawoud, Sonja Hassellund, Malin Rådstrom, Birgitta Ohlander, Leila Nyrén, Isabelle Kizling, Louise Frisén, Inger Röhmer, Catharina Lavebratt, Malin Kärn, Martina Wennberg and Agneta Carswärd-Kjellin. We would also like to thank Professor Martin Schalling for facilitating collection of many subjects with BD. Funding support was provided by the Stanley Center for Psychiatric Research, Broad Institute from a grant from Stanley Medical Research Institute, NIMH MH077139 (PFS), the Karolinska Institutet, Karolinska University Hospital, the Swedish Research Council, ALF grant from Swedish County Council and Söderström Königska Foundation.
CONFLICT OF INTEREST The authors declare no conflict of interest.