Y.S. and L.H. conceived and designed the study. Y.S. supervised all the experiments and data analysis. Y.S. and Z.L. conducted data analyses and drafted the manuscript. Y.S., Z.L., F.Z., D.S.C., S.S., D.R. and L.H. revised the manuscript. Y.S., G.F., Q.X., J.C., Y.X., D.L., P.W., P.Y., BX.L., W.S., G.Z. and W.J. recruited samples. Ti.W., J.J., T.L., J.S., J.C., Q.W., W.L., L.Z., H.Z., BJ.L., C.W., S.Q. and G.H. performed or contributed to the experiments. S.S., S.C., T.W., E.S., S.T., A.P., M.M.N., M.R., R.A.O., D.A.C., D.R., D.S.C., H.S. and K.S. provided the SGENE-plus data. All authors critically reviewed and approved the manuscript.
Schizophrenia is a severe mental disorder affecting ~1% of the world population, with heritability of up to 80%. To identify new common genetic risk factors, we performed a genome-wide association study (GWAS) in the Han Chinese population. The discovery sample set consisted of 3,750 patients and 6,468 healthy controls (1,578 cases and 1,592 controls from the Northern Han; 1,238 cases and 2,856 controls from the Central Han; 934 cases and 2,020 controls from the Southern Han); and we followed up the top association signals in an additional independent cohort of 4,383 cases and 4,539 controls from the Han Chinese. Meta-analysis identified genome-wide significant association of common SNPs with schizophrenia on chromosome 8p12 (rs16887244, P=1.27×10−10) and 1q24.2 (rs10489202, P=9.50×10−9). Our findings provide new insights into the pathogenesis of schizophrenia.
Schizophrenia is a severe neuropsychiatric disorder, characterized by psychotic features (delusions and hallucinations), disorganization, dysfunction in normal affective responses, and altered cognitive functioning1. Recent genome-wide studies in European ancestry populations have revealed both common low-risk SNPs and rare high-penetrance copy number variants predisposing to schizophrenia2-8. However, much of the heritability in schizophrenia remains unaccounted for, and the pathways or biological mechanisms that underlie susceptibility are still largely unknown.
To search for common variants associated with schizophrenia, we performed a genome-wide association study (GWAS) of 3,750 patients with schizophrenia and 6,468 healthy controls genotyped using Affymetrix SNP 6.0 genechips in Han Chinese from three geographic locations (Northern, Central and Southern). In total, 546,561 SNPs were used for statistical analysis after quality control (QC) in initial studies. Logistic regression was used to test the additive effects of minor allele dosage for each SNP. Potential population stratification was adjusted for using eigenvectors from principal component analysis (PCA) (Online Methods).
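The additive test described above can be illustrated with a minimal sketch. It fits logit P(case) = b0 + b1 × dosage by Newton-Raphson, where dosage is the minor allele count (0, 1 or 2). The study itself used PLINK and included principal components as covariates; the function name, the omission of covariates, and the genotype counts below are assumptions made only for illustration.

```python
import math

def fit_additive_logistic(dosages, statuses, iterations=25):
    """Fit logit P(case) = b0 + b1 * dosage; return (b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, statuses):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)   # from the inverse information matrix
    return b1, se_b1

# Hypothetical genotype counts (not from the study): dosage -> (controls, cases).
counts = {0: (50, 30), 1: (40, 50), 2: (10, 20)}
dosages, statuses = [], []
for dosage, (n_controls, n_cases) in counts.items():
    dosages += [dosage] * (n_controls + n_cases)
    statuses += [0] * n_controls + [1] * n_cases

beta, se = fit_additive_logistic(dosages, statuses)
print(f"per-allele OR = {math.exp(beta):.2f}, z = {beta / se:.2f}")
```

The exponentiated coefficient is the per-allele odds ratio, which is the quantity an additive model reports for each SNP.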
We carried out a combined analysis of 3 independent genome-wide cohorts as the discovery stage, and the most significantly associated SNPs were followed up in additional samples. The initial studies included: (i) a genome-wide association study of 1,578 schizophrenia patients and 1,592 healthy controls from the Northern Han population (Beijing and Shandong Province); (ii) a genome-wide analysis of 1,238 cases and 2,856 controls from the Central Han population (Shanghai and Anhui Province); and (iii) a genome-wide scan of 934 cases and 2,020 controls recruited from the Southern Han population (Guangdong and Guangxi Provinces). Sample descriptions and characteristics can be found in Online Methods and Supplementary Table 1. A meta-analysis (Online Methods) was carried out based on the PCA-adjusted association results of the three cohorts (Supplementary Figure 1). For all the reported SNPs, the p values of the meta-analysis are identical for fixed- and random-effects models of meta-analysis, and no heterogeneity between the groups was found (Supplementary Table 2). Though no SNP showed genome-wide significance (P<5.0×10−8), four SNPs in the chromosome 8p12 (rs1488935, Pgwas-meta=2.81×10−6) and 1q24.2 (rs10489202, Pgwas-meta=2.65×10−6; rs1060041, Pgwas-meta=4.11×10−6; rs11586522, Pgwas-meta=1.53×10−6) regions showed strong association (Pgwas-meta<5.0×10−6, Table 1, Supplementary Figure 2, Supplementary Table 2). We followed up the top SNPs in these regions (3 SNPs for 1q24.2 and 2 SNPs for 8p12) in an independent cohort of 4,383 schizophrenia cases and 4,539 healthy controls, which, together with the discovery sample set, we defined as BIOX.
We successfully replicated the association findings on 8p12 and 1q24.2, and the combined results of the initial associations and replications showed genome-wide significance (8p12: rs16887244, PBIOX-meta=1.27×10−10, G allele odds ratio (OR)=0.84, 95% confidence interval (CI)=0.79-0.88; rs1488935, PBIOX-meta=5.06×10−9, T allele OR=0.85, 95% CI=0.81-0.90; 1q24.2: rs10489202, PBIOX-meta=9.50×10−9, A allele OR=1.23, 95% CI=1.15-1.32; Table 1). To search for additional associations of common variants, we also carried out imputation analysis in the selected regions (Figure 1).
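For readers less familiar with how an allelic odds ratio and its 95% confidence interval relate to allele counts, a minimal sketch follows. It uses the standard Wald interval on the log odds ratio (Woolf's method). The reported values in Table 1 come from regression and meta-analysis rather than from this formula, and the counts below are hypothetical.

```python
import math

def allelic_odds_ratio(case_a, case_b, ctrl_a, ctrl_b, z=1.96):
    """Odds ratio for allele a versus allele b, with a Wald confidence interval."""
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts, not taken from the study.
or_value, ci_low, ci_high = allelic_odds_ratio(
    case_a=300, case_b=700, ctrl_a=250, ctrl_b=750
)
print(f"OR={or_value:.2f}, 95% CI={ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant allelic association at the 5% level.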
We further validated our top association signals by obtaining results from a GWAS of the SGENE-plus project, which aims to identify genetic variants associated with schizophrenia and to study their impact on phenotype and their interactions with environmental factors contributing to the pathogenesis of the disease; this GWAS included 3,830 cases and 14,724 controls of European ancestry. Both SNPs rs16887244 (PSGENE-plus=0.026, G allele OR=0.92, 95% CI=0.85-0.99) and rs1488935 (PSGENE-plus=0.027, T allele OR=0.92, 95% CI=0.85-0.99) showed nominal association with schizophrenia in the European ancestry population, and the direction of the effect was consistent with our findings (Supplementary Tables 2 and 3). The effect size of rs10489202 (A allele OR=1.01, 95% CI=0.94-1.09) was small in the SGENE-plus data (Supplementary Table 3). A statistical power comparison between the two datasets was also carried out (Supplementary Table 4).
On 8p12, both SNPs rs16887244 and rs1488935 demonstrated genome-wide significance in the combined analysis (Table 1, Figure 1a). Controlling for rs16887244, conditional logistic regression analysis revealed that there were no additional association signals (Supplementary Table 5). SNP rs16887244 is located in intron 1 of LSM1 (MIM: 607281); rs1488935 is located in intron 23 of WHSC1L1 (MIM: 607083), a gene which is related to the Wolf-Hirschhorn syndrome 1 candidate-like1 gene. We used data from two published expression quantitative trait loci (eQTL) datasets (derived from lymphoblastoid cell lines9,10 and the brain11) to determine whether rs16887244 is associated with the expression of the genes nearby. In the expression data of the lymphoblastoid cell lines (Supplementary Table 6), rs16887244 is nominally associated with the expression of several genes (ASH2L, LSM1, BAG4, DDHD2, and PPAPDC1B; P < 0.05), and in the expression data of the brain (Supplementary Table 7), it is nominally associated with genes ASH2L, DDHD2, PPAPDC1B and LETM2 (P < 0.05).
Notably, rs1488935 is located ~135kb upstream of FGFR1 (MIM: 136350), the fibroblast growth factor receptor 1 gene. Hippocampal FGFR1 mRNA expression has been reported to be upregulated in schizophrenia and major depression patients12, and transgenic mice expressing a dominant negative mutant [FGFR1(TK-)] from the catecholaminergic, neuron-specific tyrosine hydroxylase (TH) gene promoter have been found to exhibit a schizophrenia-like syndrome13. In a review, Terwisscha et al. found several lines of evidence supporting a role for FGFs in schizophrenia, including functional plausibility, positional and functional genetic studies, knockout mouse models, effects of using FGF in animals and humans, and associations between FGFs and environmental risk factors of schizophrenia14. Moreover, O’Donovan et al. have reported that another fibroblast growth factor receptor gene, FGFR2, is associated with schizophrenia (P=0.0009)15.
On 1q24.2, rs10489202 is located in intron 1 of BRP44, the brain protein 44 gene (Figure 1b); rs1060041 is a coding-synonymous SNP in DCAF6 (MIM: 610494); and rs11586522 is located in intron 2 of GPR161 (MIM: 612250). Controlling for the most significant signal rs10489202, logistic regression analysis indicated that no additional signal was observed in the genomic region around rs10489202 (Supplementary Table 5). In the expression data of the lymphoblastoid cell lines (Supplementary Table 6), rs10489202 is nominally associated with the expression of several genes (MPZL1, DCAF6, and TIPRL; P<0.05); however, in the expression data of the brain, we observed no association (Supplementary Table 7). Interestingly, rs10489202 is also ~140kb downstream of MPZL1 (MIM: 604376), the myelin protein zero like 1 or protein zero-related gene. Tkachev et al.16 found that MPZL1/PZR was significantly upregulated in schizophrenia patients compared with healthy controls (1.29-fold, P=0.0263). Common SNPs in the MPZL1/PZR gene were recently reported to be associated with schizophrenia in 523 case-parent triad families of Han Chinese (P=0.0017)17.
Population stratification is a major concern in GWAS that use a case-control design. It has been demonstrated using a set of GWAS datasets that there is a “north-south” population structure in China18. The PCA analysis of all our initial study samples also revealed such a structure (Supplementary Figures 3 and 4). We therefore stratified our samples into Northern, Central and Southern groups according to their geographical regions, carried out GWAS on the individual groups, and then obtained the combined association using meta-analysis.
The MHC region on chromosome 6p21.3-22.1 has previously been associated with schizophrenia in GWASs of European ancestry populations4-6, and these findings have been replicated in a large sample of the Han Chinese population19. In our initial study, no common SNP in this region met the selection criteria (Pgwas-meta<5.0×10−6). However, 149 SNPs out of 1,786 (8.34%, Pgwas-meta<0.05) showed nominal association signals with schizophrenia in the MHC region (Supplementary Table 8), and the most significant signal was observed at rs2394514 (Pgwas-meta=1.16×10−5), which is located at 6p22.1. We found no significant association at rs13194053 (refs. 4,5) and rs6932590 (ref. 6) from 6p22.1, or at rs3131296 (ref. 6) from 6p21.32, which were previously reported as genome-wide significant associations (P<5.0×10−8) with schizophrenia in European ancestry populations (derived from the NHGRI GWAS Catalog20; the full list of SNPs is given in Supplementary Table 9). We compared the allele frequencies of these SNPs between HapMap CEU and CHB samples (Supplementary Table 10). The allele frequencies for most of the SNPs clearly differ between the European and Chinese populations. In addition, we measured linkage disequilibrium (LD) r2 values of these SNPs with the 149 SNPs (in the MHC region with Pgwas-meta<0.05 in our study) in HapMap CEU and CHB samples (Supplementary Tables 11 and 12). SNP rs7749823, which is in LD with rs13194053 (r2=0.644) and rs6932590 (r2=0.623) in the European populations, shows association with schizophrenia in our study (Pgwas-meta=4.18×10−5, A allele OR=1.64, risk allele frequency (RAF)=0.97). Interestingly, haplotype analysis of the HapMap CEU data shows that the frequency of haplotype TTA of rs13194053-rs6932590-rs7749823 is 0.808.
As both T alleles of rs13194053 and rs6932590 are risk alleles for schizophrenia in the European population studies, our association signal with the A allele of rs7749823 of the three marker haplotype in the Han Chinese population is consistent with the European findings.
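The LD measure r2 used in this comparison can be computed from a two-locus haplotype frequency and the two allele frequencies. The sketch below shows the standard definition; the frequencies in the example are hypothetical and are not the HapMap values for these SNPs.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a at locus 1 and b at locus 2
    p_a, p_b: frequencies of allele a and allele b
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies for illustration only.
r2_example = ld_r_squared(p_ab=0.75, p_a=0.8, p_b=0.8)
print(f"r2 = {r2_example:.3f}")
```

When the haplotype frequency equals the product of the allele frequencies, D and r2 are both zero, which corresponds to linkage equilibrium.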
We also examined other known schizophrenia susceptibility loci (NRGN, 11q24.2; and TCF4, 18q21.2) which were reported as genome-wide significant in the published studies of schizophrenia (Supplementary Table 9). None of the SNPs within these loci show association with schizophrenia in our dataset. However, genetic heterogeneity clearly exists for schizophrenia risk variants across ethnicities (Supplementary Table 10). This reinforces the need for genome-wide association studies of schizophrenia to be carried out in diverse populations.
In summary, we have identified common genetic variants in 8p12 and 1q24.2 that are associated with schizophrenia in the Han Chinese population. Several promising candidate genes are implicated in these two regions, which makes it difficult to decide precisely which genes contain the causative variants. Nevertheless, identification of these novel common genetic risk variants that predispose to schizophrenia is an encouraging first step in a process that has the potential to be translated into improved methods for the prediction and treatment of the disease.
All schizophrenia patients analyzed in the BIOX study were interviewed by 2 independent psychiatrists, diagnosed according to DSM-IV criteria and had a 2-year history of schizophrenia. All cases met 2 criteria: preoccupation with one or more delusions and frequent auditory hallucinations; however, none of the following symptoms were prominent: disorganized speech, disorganized or catatonic behavior, or flat or inappropriate affect. The schizophrenia cases in the initial and replication stages of our study were recruited using the same diagnostic criteria.
Healthy controls were randomly selected from Chinese Han volunteers who were requested to reply to a written invitation to evaluate their medical history. Practice lists of controls were screened for potentially suitable volunteers by excluding subjects with major mental illness.
The discovery phase (BIOX GWAS) includes three datasets: the Northern Han sample set of 1,578 cases and 1,592 controls was recruited from Beijing and Shandong Province; the Central Han sample set of 1,238 cases and 2,856 controls was recruited from Shanghai and Anhui Province; and the Southern Han sample set of 934 cases and 2,020 controls was recruited from Guangdong and Guangxi Provinces. Sample descriptions can be found in Supplementary Table 1.
The replication stage (BIOX FOLLOW-UP) included 4,383 schizophrenia cases and 4,539 healthy controls recruited from Shanghai. All subjects in the replication stage were unrelated and born in Shanghai and their parents had to be local residents of Shanghai as well. This replication sample is expected to have minimal population stratification given that Shanghai used to have the strictest resident-registration system in China before the 1980s. Sample descriptions and characteristics can be found in the Supplementary Table 1.
The SGENE-plus samples included 513 patients and 471 controls from Denmark/Copenhagen; 93 schizophrenia patients and 88 unrelated controls from England; 182 schizophrenia patients and 197 controls from Finland; 1048 patients and 971 controls from Germany; 531 cases and 11,615 controls from Iceland; 84 patients and 89 controls from Italy; 693 patients and 629 controls from The Netherlands; 658 schizophrenia cases and 661 controls from Scotland. All case samples were diagnosed according to ICD-10 or DSM-IV criteria. These samples are the same as the genome-wide typed samples used in a previous Nature publication6 with two exceptions: slight changes in the Icelandic schizophrenia patients used (~10% difference), and two newly genome-wide typed groups from Netherlands and Denmark.
All participants provided written informed consent. Approval was received for our study from the local Ethics Committee of Human Genetic Resources. Other details of sample description were listed in the Supplementary Sample Information.
EDTA anti-coagulated venous blood samples were collected from all participants. Genomic DNA was extracted from peripheral blood lymphocytes by standard procedures using Flexi Gene DNA kits (Fuji), diluted to working concentrations of 50 ng/μl for SNP chip genotyping, and 10~20 ng/μl for replication genotyping.
The genome-wide scan was performed using the Affymetrix Genome-Wide Human SNP Array 6.0. Quality control (QC) filtering of the GWAS data was performed by excluding arrays with a Contrast QC < 0.4 from further data analysis. Genotype data were generated using the Birdseed algorithm21. For sample filtering, arrays that generated genotypes at < 95% of loci were excluded. For SNP filtering (after sample filtering), SNPs with call rates < 95% in either cases or controls were removed in each geographical group. SNPs with minor allele frequency (MAF) < 1% or significant deviation from Hardy-Weinberg equilibrium (HWE; P ≤ 1×10−3) in controls were also excluded. SNPs passing QC in the Northern, Central and Southern Han groups were used for further analysis. After QC, there were 546,561 SNPs remaining for the combined initial study.
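The SNP-level filters above can be sketched as follows, using the thresholds stated in the text (call rate of at least 95% in both cases and controls, MAF of at least 1%, HWE P > 1×10−3 in controls). The HWE test here is the 1-degree-of-freedom chi-square approximation, chosen for brevity; it is an assumption of this sketch and may differ from the test implemented in the software used in the study. Function names and genotype counts are hypothetical.

```python
import math

def maf_and_hwe_p(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, HWE chi-square P value) from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return min(p, q), p_value

def passes_snp_qc(call_rate_cases, call_rate_controls, control_counts):
    """Apply the call-rate, MAF and HWE filters to one SNP."""
    maf, hwe_p = maf_and_hwe_p(*control_counts)
    return (
        min(call_rate_cases, call_rate_controls) >= 0.95
        and maf >= 0.01
        and hwe_p > 1e-3
    )

# Hypothetical control genotype counts (AA, AB, BB).
print(passes_snp_qc(0.99, 0.98, (250, 500, 250)))  # genotypes in HWE proportions
print(passes_snp_qc(0.99, 0.98, (500, 0, 500)))    # no heterozygotes at all
```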
We selected all SNPs with Pinitial-meta<5.0×10−6 for the replication study, according to a previous GWAS22. To ensure that there were at least 2 SNPs included for each region, we added the 2nd most significant SNP among the 8p12 findings for replication. Genotyping for the replication study was performed using the Ligation Detection Reaction (LDR) method23,24, with technical support from Shanghai Biowing Applied Biotechnology Company.
Population substructure was evaluated using principal components analysis as implemented in the software EIGENSTRAT25,26. Twenty principal components (PCs), some of which will reflect ancestry differences among subjects, were generated for each subject. Logistic regression was used to determine whether there was a significant difference in PC scores between cases and controls; significant PCs were used as covariates in the association analysis to correct for population stratification. We identified 6 significant PCs for the Northern data set, 8 for the Central data set, and 4 for the Southern data set.
We selected the two newly identified potential susceptibility loci (1q24.2, rs10489202 ± 350kb and 8p12, rs16887244 ± 350kb) for imputation. HapMap SNPs in the 2 regions were imputed using MACH 1.0. Phased haplotypes for 90 CHB+JPT subjects (180 haplotypes) were used as the reference for imputing genotypes. Any SNP imputed with information content r2<0.3 was excluded from association analysis because of lack of power. The QC filtering criteria for imputed SNPs were the same as those for genotyped SNPs.
Association of single SNPs with schizophrenia was tested by logistic regression using PLINK27, separately for the Northern, Central and Southern data sets, correcting for principal component scores that had statistically significant differences between cases and controls. The Hardy-Weinberg equilibrium analysis was performed using PLINK, and Haploview28 was used for the genome-wide P value plot. The Q-Q plot was created using R. The regional plots were generated using LocusZoom (see URLs). In the replication study, allelic association analysis was conducted using SHEsis29. A meta-analysis based on the PCA-adjusted association results of the three cohorts in the initial GWAS was conducted in PLINK using the Mantel-Haenszel method with a random-effects model30; the inflation factor λ was 0.97. Heterogeneity across the data sets was evaluated using Cochran’s Q test. Conditional logistic regression was used to test for independent effects of an individual SNP22,31. Based on the sample sizes of the BIOX and SGENE-plus datasets, we compared the statistical power between BIOX and SGENE-plus (Supplementary Table 4; alpha=5×10−8)5,32.
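To make the relation between the fixed-effects and random-effects results and Cochran's Q concrete, a minimal sketch follows. It uses inverse-variance weighting with the DerSimonian-Laird estimate of between-cohort variance, which is a common textbook formulation and not necessarily the exact procedure applied in the study. The per-cohort log odds ratios and standard errors are hypothetical.

```python
import math

def two_sided_p(beta, se):
    """Two-sided P value from a normal approximation to beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

def meta_analysis(betas, ses):
    """Inverse-variance meta-analysis of per-cohort log odds ratios."""
    w = [1 / se ** 2 for se in ses]
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))
    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    # DerSimonian-Laird between-cohort variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    beta_random = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_random = math.sqrt(1 / sum(w_re))
    return {
        "q": q,
        "tau2": tau2,
        "p_fixed": two_sided_p(beta_fixed, se_fixed),
        "p_random": two_sided_p(beta_random, se_random),
    }

# Hypothetical log odds ratios and standard errors for three cohorts.
homogeneous = meta_analysis([-0.18, -0.17, -0.19], [0.05, 0.05, 0.06])
heterogeneous = meta_analysis([0.5, -0.5, 0.0], [0.05, 0.05, 0.05])
print(homogeneous)
print(heterogeneous)
```

When Q does not exceed its degrees of freedom, the estimated between-cohort variance is zero and the two models give the same P value, which is the situation in which fixed- and random-effects results coincide.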
Expression profiles were analyzed within two eQTL datasets (lymphoblastoid cell lines and brain). We downloaded the expression data sets from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/). The first dataset, published by Stranger and colleagues9,10, consists of gene expression profiles generated using RNA extracted from lymphoblastoid cell lines of all 210 unrelated HapMap individuals from four populations (CEU: 60 Utah residents with ancestry from northern and western Europe; CHB: 45 Han Chinese in Beijing; JPT: 45 Japanese in Tokyo; YRI: 60 Yoruba in Ibadan, Nigeria). Expression analysis was performed using Sentrix Human-6 Expression BeadChip (Illumina, San Diego, California, United States). The SNP genotypes from phase II HapMap were used in the analysis. The second eQTL dataset11 consists of gene expression profiles for frozen tissue samples obtained from four brain regions (cerebellum, pons, frontal and temporal cortices) of 143 neurologically normal subjects of European ancestry. Expression analysis was performed using Illumina HumanRef-8 Expression BeadChips, and SNP genotyping was performed using Infinium HumanHap550 BeadChips. The expression data were normalized and log transformed as described in the original reports9-11. The eQTLs were tested by linear regression of normalized expression levels on SNP genotypes (coded as the number of minor alleles at each SNP: 0, 1 or 2). For the lymphoblastoid cell lines dataset, the analyses were conducted for each population and the combined dataset. For the brain dataset, the analysis of each tissue region was performed separately.
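The eQTL test described above, a linear regression of normalized expression on minor allele count, can be sketched with ordinary least squares. The function name and the data are hypothetical; the sketch returns the slope, intercept and t statistic (with n − 2 degrees of freedom) and leaves the conversion to a P value to a statistics library.

```python
import math

def eqtl_regression(genotypes, expression):
    """Regress expression on genotype (0, 1 or 2 minor alleles) by least squares."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(genotypes, expression))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

# Hypothetical genotypes and normalized expression values for six subjects.
slope, intercept, t_stat = eqtl_regression(
    [0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.4, 1.6, 2.1, 1.9]
)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, t={t_stat:.2f}")
```

The slope is the estimated change in normalized expression per additional copy of the minor allele.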
We are deeply grateful to all the participants as well as to the doctors working on this project. The authors also thank the editors and anonymous reviewers for their valuable comments on the manuscript. This work was supported by the 863 Program (2006AA02A407, 2009AA022701), the Natural Science Foundation of China (31000553), the 973 Program (2010CB529600), the Funding Awarded to Authors for National Excellence of Doctoral Dissertation Research (201026), the Program for New Century Excellent Talents in University (NCET-09-0550), the Shanghai Municipal Health Bureau program (2008095), the Shanghai Changning Health Bureau program (2008406002), the Shanghai Municipal Commission of Science and Technology Program (09DJ1400601), the National Key Technology R&D Program (2006BAI05A09), and the Shanghai Leading Academic Discipline Project (B205).
Competing Financial Interests
The authors declare no competing financial interests.
Haploview program, http://www.broad.mit.edu/mpg/haploview
The International HapMap project, http://www.hapmap.org
NCBI GEO, http://www.ncbi.nlm.nih.gov/geo/