A recent genome wide association study reported evidence for association between rs1344706 within ZNF804A (encoding zinc finger protein 804A) and schizophrenia (P=1.61 ×10−7), and stronger evidence when the phenotype was broadened to include bipolar disorder (P=9.96 ×10−9). Here we provide additional evidence for association through meta-analysis of a larger dataset (schizophrenia/schizoaffective disorder N = 18945, schizophrenia plus bipolar disorder N =21274, controls N =38675). We also sought to better localize the association signal using a combination of de novo polymorphism discovery in exons, pooled de novo polymorphism discovery spanning the genomic sequence of the locus and high density LD mapping. Meta-analysis provided evidence for association at rs1344706 that surpasses widely accepted benchmarks of significance by several orders of magnitude for both schizophrenia (P=2.5 ×10−11, OR=1.10, 95% CI 1.07–1.14) and schizophrenia and bipolar disorder combined (P=4.1 ×10−13, OR=1.11, 95% CI 1.07–1.14). After de novo polymorphism discovery and detailed association analysis, rs1344706 remained the most strongly associated marker in the gene. The allelic association at the ZNF804A locus is now one of the most compelling in schizophrenia to date, and supports the accumulating data suggesting overlapping genetic risk between schizophrenia and bipolar disorder.
Genetic epidemiology reveals that genes account for more than 80% of the population variance in risk (1,2) of schizophrenia (SCZD[MIM 181500]). Numerous genetic associations to schizophrenia have been reported, but it remains widely disputed which of these, if any, are true associations (3). The advent of genome-wide association study (GWAS) technology has recently allowed the identification of strongly supported genetic associations for many other common phenotypes, and this is now true for schizophrenia(4,5,6,7).
Recently, we undertook a GWAS of 479 UK schizophrenia cases and 2937 controls(4) with follow up of the strongest findings in approximately 17,000 subjects. In the combined dataset of around 20,000 subjects, the SNP with the strongest evidence for association (P=1.61 ×10−7) to schizophrenia was rs1344706 within ZNF804A (encoding zinc finger protein 804A). While this falls short of a widely accepted threshold for genome wide significant association of P < 7.2 × 10−8(8), we did obtain evidence that surpasses this (P = 9.96 × 10−9) when the phenotype was broadened to include patients with bipolar disorder, a phenotype for which there is considerable overlap in clinical features and increasing epidemiological and molecular genetic evidence for shared genetic risk with schizophrenia(9). Subsequently, independent associations between schizophrenia and the same allele of rs1344706 have been reported by the International Schizophrenia Consortium(5), The Irish Case/Control Study of Schizophrenia(10), and the SGENE-plus consortium(11). In a total of over 5000 patients with psychiatric disorders, the latter(11) also reported three copy number variants (CNVs) at the ZNF804A locus: a deletion in an individual with schizophrenia, a duplication in an individual with bipolar disorder, and a duplication in an individual with anxiety disorder. This contrasted with no CNVs at the locus in almost 40,000 controls (P=0.0016). From additional reported datasets, they also noted duplication CNVs in 3 people with autism, and an additional 3 CNVs out of a total of about 12000 controls, two of whom were under 18 and were therefore not past the characteristic age at onset for schizophrenia. Overall, the CNV data suggest that rare structural variants at the ZNF804A locus may also be involved in risk of a range of psychiatric disorders, though support for this is not unequivocal.
Since our GWAS(4) was based upon fewer than 1/20th of all common SNPs in the genome, it seemed unlikely that the best associated SNP in our study was the true functional variant. In the present study, we tried to localize the association signal using a combination of de novo polymorphism discovery targeted at exons, pooled de novo polymorphism discovery spanning the genomic sequence of the locus, and high density LD mapping. rs1344706, the SNP highlighted by the original GWAS, remained the most strongly associated SNP at the locus. The evidence for association for rs1344706 was further tested by a meta-analysis of around 60,000 subjects (schizophrenia/schizoaffective disorder N = 18945, schizophrenia plus bipolar disorder N =21274, controls N =38675) derived from the original publication and additional datasets. The overall statistical support in schizophrenia surpassed genome wide significance (P=2.5 ×10−11) by more than 3 orders of magnitude, as did the support in schizophrenia and bipolar disorder combined (P=4.1 ×10−13). Moreover, the evidence for association in replication samples (i.e. after exclusion of the discovery GWAS) was also genome wide significant, making rs1344706 in ZNF804A the most robustly supported common allele for schizophrenia reported to date. The strength of the findings also strongly supports accumulating data that genetic risk for schizophrenia and bipolar disorder(12) overlap.
Coding sequences were screened for polymorphisms in 163 UK individuals with schizophrenia and 135 UK blood donor controls. Genomic sequences corresponding to all exons were extracted from the UCSC Genome Browser (March 2006), based upon Refseq transcript NM_194250. Amplicons with a maximum length of 500 bp (n=13) were designed to span coding exons plus a minimum of 72 bases of flanking sequence. Where multiple amplicons were required to cover an exon, these overlapped by a minimum of 81bp. High Resolution Melting Analysis (HRMA) was performed for 11 amplicons using a LightScanner™ (Idaho technologies) according to the manufacturer’s instructions. Potential DNA variants identified by this approach were characterised by sequencing the relevant PCR product using BigDye chemistry on a 3100 capillary sequencer (Applied Biosystems). Two amplicons failed to optimise in the presence of the HRMA detection reagent and were screened for variants by sequencing. Identified SNPs were genotyped in the CEU HapMap trios to assess their LD relationships, and to detect Mendelian inconsistencies (of which there were none).
To identify SNPs that might be eQTLs, we extracted genotype and ZNF804A lymphoblastoid expression data from the CEU HapMap samples deposited in GeneVar (http://www.sanger.ac.uk/humgen/genevar/). To analyse those data, we used the method described by the group that developed the database(13).
A two stage fatSNP study was designed to extract a high proportion of the known genetic variation spanning ZNF804A (chr2:184958600-185722680, NCBI b36), with putative functional variants given particular priority. Details of all samples used in these two stages of the study are given in our previous manuscript(4).
SNPs: We aimed to tag at r2=1 all SNPs with MAF ≥0.01 in the HapMap CEU samples plus SNPs identified by de novo polymorphism discovery of exons. This panel of SNPs was derived using Tagger (Haploview, version 4.1(14)) with forced inclusion of all synonymous and non-synonymous variants identified by de novo discovery and from public databases, and all SNPs associated with expression of ZNF804A mRNA in the GeneVar database. We also forced inclusion of SNPs genotyped in the previous GWAS(4).
Samples: In fatSNP1, the resultant set of SNPs (excluding those for which data were already available from the GWAS study) were genotyped in a subset of the GWAS samples from the earlier study(4). This fatSNP1 sample comprised 479 UK cases with schizophrenia and the 1958 birth cohort control sample (N=1445) used by the Wellcome Trust Case Control Consortium(15) that we had also used in our GWAS study. The results of fatSNP1 were used to select SNPs for follow up in additional samples using criteria detailed below.
SNPs: For follow up in fatSNP2, we identified all markers that were associated at P ≤ 1×10−4 in fatSNP stage 1, or that had surpassed that threshold in the GWAS study. We additionally followed up non-synonymous SNPs that were associated at P ≤ 0.10 in those analyses. To prune the marker set, we applied a highly conservative definition of non-redundancy, removing one of any pair of markers that were highly correlated (r2≥0.97) in the fatSNP1 sample.
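The pruning rule described above (dropping one marker from any pair with r2≥0.97) can be illustrated with a minimal sketch. This is not the procedure actually run for the study: it assumes that r2 is approximated by the squared Pearson correlation of 0/1/2 genotype dosages rather than by phased haplotype LD, that markers are visited in a caller-supplied priority order, and the marker names (`m1`, `m2`, `m3`) and genotype vectors are purely hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Assumption: genotype correlation stands in for haplotype-based LD r2.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def prune_markers(genotypes, order, threshold=0.97):
    """Greedy pruning: keep a marker only if its r2 with every marker
    already kept is below `threshold`."""
    kept = []
    for name in order:
        if all(r_squared(genotypes[name], genotypes[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical example: m2 duplicates m1 and is dropped; m3 is retained.
example_genotypes = {
    "m1": [0, 1, 2, 1, 0, 2],
    "m2": [0, 1, 2, 1, 0, 2],
    "m3": [2, 0, 1, 0, 1, 1],
}
kept_markers = prune_markers(example_genotypes, ["m1", "m2", "m3"])
```

Which member of a redundant pair survives depends entirely on the order supplied, so in practice the order would encode a priority such as strength of association or assay reliability.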
Samples: For fatSNP phase 2, we genotyped the UK National Blood Service (NBS) controls (N=1428) used by the Wellcome Trust Case Control Consortium(15) that we had also used in our GWAS study, and additional UK schizophrenia samples (N=163) for whom GWAS array data were not available, that we had used as part of the follow up sample in the previous publication(4). The combined set of UK cases (N=642) and blood donor/1958 birth cohort controls (N = 2873 available for in house genotyping, N = 2937 with Affymetrix 500K data) are subsequently referred to for brevity as the Cardiff Full sample. fatSNP2 also included 3 additional case control series from Bulgaria, Ireland (referred to as the Dublin sample) and Germany (referred to as the Munich sample), the combination of which is referred to for brevity (and consistency with the previous paper) as REP1 and comprises an additional 1664 cases and 3541 controls. All cases met DSM-IV criteria for schizophrenia; full details of these samples are given in our previous study(4). The results of the REP1 sample were combined with those of the Cardiff Full sample to derive an overall fatSNP2 P-value as described below.
Most SNP assays for the fatSNP studies were genotyped in the Cardiff laboratory using the Sequenom iPlexGold system. Additional SNPs were genotyped in Cardiff using TaqMan (on demand) probes (Applied Biosystems; rs17584522) or Amplifluor AssayArchitect (Millipore; rs12476147) using an Analyst™ AD fluorometer (Molecular Devices). Additionally, rs17584522 was genotyped by the Dublin group with TaqMan (on demand) probes using a 7900HT Sequence Detection System (Applied Biosystems). For SNPs for which we could not design a reliable assay using either method, we genotyped the HapMap CEU sample using SNaPshot chemistry (Applied Biosystems) and a 3100 capillary sequencer (Applied Biosystems). For rs5836928, a 3bp insertion deletion, we genotyped the HapMap CEU sample using fluorescently tagged primers with genotypes discriminated by size using a 3100 capillary sequencer (Applied Biosystems).
For Quality Control (QC) estimates we determined call rates for each SNP in all samples. Included in the Dublin and Munich samples were duplicated DNA samples. Within the UK, Bulgarian and Dublin samples, we included samples from the HapMap CEU population to determine genotyping congruence against the HapMap data. In the sample genotyped by the Dublin group for marker rs17584522, there were both duplicate samples and HapMap samples with which to check genotyping QC.
fatSNP statistical analyses were as in the previous study(4). We used a trend test for within-sample analysis and, to combine the data across samples, a Cochran-Mantel-Haenszel test conditioning on site, as implemented in PLINK version 1.05(16).
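For readers unfamiliar with the Cochran-Mantel-Haenszel statistic, the following is a minimal sketch of how 2×2 allele-count tables from several sites can be combined while conditioning on site. It is an illustration only and is not PLINK's implementation: it assumes one allelic 2×2 table per site, applies no continuity correction, and the counts in the example are hypothetical.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables, one per site.

    Each table is (a, b, c, d):
      a, b = risk / other allele counts in cases
      c, d = risk / other allele counts in controls
    Returns (chi-square on 1 df, P-value, Mantel-Haenszel common odds ratio).
    """
    observed = expected = variance = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        cases, controls = a + b, c + d
        risk, other = a + c, b + d
        observed += a
        expected += cases * risk / n  # E[a] under no association
        variance += cases * controls * risk * other / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, or_num / or_den


# Hypothetical allele counts for two sites
site_tables = [(620, 380, 1150, 850), (410, 290, 900, 700)]
chi2, p_value, common_or = cmh_test(site_tables)
```

Because expected counts and variances are computed within each site before summing, differences in allele frequency between sites do not by themselves generate a signal.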
After completion of fatSNP stage 2, we aimed to identify SNPs within the wider genomic regions not targeted by the exon-focussed de novo discovery phase. This was performed using 60 cases drawn randomly from the Cardiff Full sample. From these, we generated 6 pooled samples each containing DNA from ten samples. PCR primers were designed to span 381,988 bp of the genomic sequence in and around ZNF804A (figure 1). Long range PCRs were generated using the Roche Expand long template PCR system. The PCR products were gel purified after electrophoresis, and sent to Illumina Inc (San Diego) for sequencing using Solexa technology. The resultant single-end reads were assembled and analysed for putative SNPs using both DNAStar (http://www.dnastar.com) and MAQ (http://maq.sourceforge.net). We used the default parameters for DNAStar analysis. The MAQ output was filtered using previously published thresholds(18).
Putative SNPs identified after genomic resequencing were annotated according to the Illumina OPA (Oligo Pool All) design criteria, and the files were submitted for assay design (http://illumina.com). Designed SNPs were genotyped in the CEU trios by the Cardiff laboratory using the GoldenGate® Genotyping assay for VeraCode® and analysed using a BeadXpress instrument using VeraScan 1.1 (Illumina Inc). BeadStudio V3.2 (Illumina Inc) was used to make genotype calls and compile the data output.
To isolate cis-effects from trans-effects on gene expression in human brain, we measured the relative expression of each parental copy of ZNF804A in post mortem brain mRNA taken from individuals heterozygous for expressed SNP rs4667001, a marker that is in strong LD (D’=1) with rs1344706 in that brain series (D’=0.95 in our association sample). We have described the methodology and the samples (derived from cerebral cortex of 149 unrelated anonymous individuals (86M, 63F; mean age = 58, SD = 19)) in detail elsewhere(19). Since this method uses the level of mRNA expressed from one parental chromosome to control for that expressed by the other, the assay is controlled for (and therefore requires no adjustment for) common confounders that affect more standard measurements of mRNA in native human tissue (e.g. drug exposure, agonal state, post-mortem delay, pH, age etc).
To further test the evidence for association, we combined the data for rs1344706 from our earlier publication(4) with data from 4 published GWAS studies, one unpublished Swedish GWAS dataset (termed SW3, from P Sullivan/C Hultman), data from additional samples we genotyped in house, and data from a ZNF804A candidate gene study from a group who had contacted us prior to genotyping their sample(10) and whose data are therefore unbiased with regard to the result.
The numbers of cases and controls for each of the samples we included in the meta-analysis are given in supplemental table 5. Details of the samples from the published GWAS studies of the International Schizophrenia Consortium(5), the Molecular Genetics of Schizophrenia (MGS) consortium(7), the SGENE plus consortium(6), and of Lencz and colleagues(20) are given in the primary GWAS publications. The SW3 sample had been genotyped using an Affymetrix 6 chip at the Broad Institute of MIT and was part of the same on-going patient collections as samples SW1 and SW2 from the ISC data set(21). Details of the samples used in the candidate gene study of ZNF804A based upon the Irish Case/Control Study of Schizophrenia (ICCSS) sample are also available in a primary manuscript(10). It should be noted that those samples do not overlap with any of the Irish samples provided by the Dublin group.
Further to their GWAS study, the SGENE-plus and GROUP consortia obtained genotypes on additional samples from Europe and China (2865 schizophrenia cases; 4493 controls) using a Centaurus assay (Nanogen). Details of these are given in reference (11) but briefly, these comprise schizophrenia case/control samples from China (460/466), Denmark/Aarhus (236/500), Denmark/Copenhagen (513/1338), Germany/Bonn (275/510), Germany/Munich (178/320), Hungary (264/223), Norway (201/357), Russia (483/487) and Sweden (255/292).
We additionally obtained genotype data on a sample (488 cases, 540 controls) collected by the Pittsburgh group, details of which are given in reference (22). The group from Dublin (Corvin and colleagues) provided genotypes based upon a TaqMan assay for an extra set of 352 Irish cases and 178 Irish controls that had not been included in fatSNP2. The diagnostic and ascertainment practices were as for the part of their sample described in reference (4).
Subjects that overlapped between studies were removed as follows. In our previous paper, we reported data from the MGS European American sample. However, as the GWAS sample from that group was larger than we had used we include here the MGS GWAS data rather than our earlier data. The Bulgarian and Irish samples included in fatSNP2 substantially overlap with samples used by the ISC; the ISC data for those populations were excluded. A dataset from Aberdeen was included in the GWAS of both the ISC and SGENE-plus. The SGENE plus data were excluded. It should be additionally noted that the samples from Bonn, Munich, and China that were included in the SGENE-plus study(11) did not overlap with the samples we used in our previous paper(4). We did not include any data from the GWAS of Need et al. (2009)(23) as all those data are subsumed in other datasets included here. We did not include data from the CATIE study(24) as the controls from that study substantially overlap with those used by the MGS study(7).
The use of unadjusted genotype counts is not appropriate for some samples (e.g. SGENE-plus because of relatedness among the Icelandic Groups and MGS because of extensive structure in the dataset); we therefore used the inverse variance method of meta-analysis. For each study, variance was calculated based on the 2×2 contingency table counts, or the 95% confidence interval values provided by SGENE plus, the MGS, and the unpublished Swedish GWAS. Heterogeneity between studies was assessed using Cochran’s Q statistic.
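The inverse variance approach described above can be sketched as follows. This is a minimal fixed-effect illustration and not the analysis code used for the study: it assumes the per-study effect is a log odds ratio, that its variance comes either from 2×2 allele counts (Woolf estimate) or from back-calculating the standard error from a reported 95% confidence interval, and all function names and input values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_from_counts(a, b, c, d):
    """Log odds ratio and variance from a 2x2 table.

    a, b = risk / other allele counts in cases
    c, d = risk / other allele counts in controls
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return log_or, variance


def log_or_from_ci(odds_ratio, lower, upper):
    """Log odds ratio and variance recovered from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return math.log(odds_ratio), se * se


def inverse_variance_meta(studies):
    """Fixed-effect pooling of (log_or, variance) pairs.

    Returns the pooled OR, its 95% CI, a two-sided P-value,
    Cochran's Q and the degrees of freedom for Q.
    """
    weights = [1.0 / var for _, var in studies]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = pooled / se
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, studies))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * se), math.exp(pooled + Z_95 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
        "q": q,
        "df": len(studies) - 1,
    }


# Hypothetical inputs: one study supplied as counts, one as OR with 95% CI
example_studies = [
    log_or_from_counts(620, 380, 1150, 850),
    log_or_from_ci(1.12, 1.02, 1.23),
]
example_result = inverse_variance_meta(example_studies)
```

Under the null hypothesis of homogeneity, Q is referred to a chi-square distribution with (number of studies − 1) degrees of freedom; the sketch returns Q and its degrees of freedom and leaves the tail probability to a statistics library.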
Given that we had previously reported stronger evidence for association with the inclusion of bipolar samples, we added the UK bipolar dataset of the WTCCC(15) to the above meta-analysis. Data for additional bipolar samples from Iceland (n=404) and Norway (n=205) were also provided by the SGENE plus/GROUP consortia. To avoid the problem of shared controls, as before(4) the WTCCC bipolar data were combined with the UK schizophrenia cases before testing against the UK controls. Similarly, the Icelandic and Norwegian cases were combined with the respective schizophrenia cases from those countries.
We detected no synonymous or non-synonymous variants in exons 1–3. In exon 4, we detected 21 variants, 14 non-synonymous and 7 synonymous (supplemental table 1).
In the GeneVar CEU database, we found 3 SNPs in high LD (r2=1) associated with expression of ZNF804A mRNA (P=0.006). An additional 16 SNPs, including the best GWAS SNP in ZNF804A (rs1344706), were associated with expression at p<0.05 (supplemental table 2). The disease associated allele of rs1344706 was associated with higher expression.
Within the target region, 887 SNPs were listed in HapMap (Rel 23a/phase II Mar08 dbSNP b126) of which 508 had MAF≥0.01. To these, we added the genotype data for 21 SNPs from our exonic de novo polymorphism detection. From these, we identified 209 non-redundant (r2=1 in CEU) markers. Of the 209 markers, we were able to obtain genotypes for 176 in fatSNP phase 1. Overall, the genotyped markers provide coverage of 91% of target alleles with MAF ≥0.01 at r2=1 and 96% coverage at r2≥0.9. For alleles with MAF ≥0.05, the respective figures are 93% and 97%. Association results are summarized in Figure 1 and in supplemental table 3. A total of 12 markers (in addition to rs1344706) met our criteria for follow up (table 1; note that rs12613195 was included as it attained the threshold in the full GWAS sample, and that rs6726421 was taken forward as a perfect proxy for the non-synonymous SNPs described in table 1). Notably, in this sub-sample of our GWAS (fatSNP1), 3 markers yielded slightly stronger evidence for association than our original GWAS marker (table 1). Of these, two (rs1583048 and rs3931790, r2=1) were the putative eQTLs with strongest evidence for association to expression in GeneVar (rs7593816, an equally strong eQTL that is perfectly correlated with those SNPs, failed genotyping), with the schizophrenia-associated allele being associated with higher gene expression (supplemental table 2). The other SNP, rs17584522, is intronic.
Of the 12 SNPs (table 1) meeting the association criteria for follow up, we selected 8 SNPs based upon r2<0.97 (table 2). In the Cardiff Full sample, only intronic marker rs17584522 was associated with schizophrenia at a level of significance within 1 order of magnitude of that of rs1344706, the initial ‘hit-SNP’ from our earlier paper (table 2). Table 2 shows that all SNPs tested in fatSNP phase 2 are weakly-moderately correlated with rs1344706 (r2max =0.43) but are in moderately strong LD (D’min=0.7). To test whether any of these associations are independent of rs1344706, we performed forward stepwise logistic regression analysis in the Cardiff Full sample, including all fatSNP phase 2 SNPs. Only one SNP (rs12613195) was nominally significant (P=0.021) after allowing for the effects of rs1344706 (data not shown). Given the multiple testing of SNPs, we conclude that there is no convincing evidence for an association signal independent of rs1344706. Moreover, haplotype analysis based upon fatSNP phase 2 markers did not produce results more significant than that of single marker analysis (data not shown). A combined analysis of the fatSNP2 samples revealed rs1344706 as the most significantly associated SNP (P=8.31×10−6). Single locus data for each of the individual samples are given in supplemental table 4.
In each fatSNP2 sample, SNPs had call rates in cases and controls ≥97% with the exception of rs12476147 in the UK sample (92% in cases), rs12613195 in the Munich sample (96% in cases) and both rs1344706 (93% in cases) and rs17584522 in the Dublin sample (95% and 93% in cases and controls respectively) (supplemental table 4). In the UK samples, the more complete imputed genotypes for rs12476147 gave similar results to the array SNP. No marker in any of the populations studied had a HWE P-value <0.05. Of ~1300 genotype pairs (HapMap or duplicates) for the 8 independent markers genotyped in our follow-up study, we observed only 1 genotyping discrepancy. For rs17584522, which was genotyped by the Dublin group, the comparison of 72 duplicate samples and 83 HapMap samples revealed no discrepancies.
Meta-analysis of rs1344706 in all available schizophrenia datasets provided strong evidence for association to schizophrenia (P=2.54×10−11) with an estimated odds ratio of 1.10 (95% CI; 1.07, 1.14) and no evidence for heterogeneity across studies (Cochran’s Q statistic, P=0.35, 22df). A forest plot for the individual studies is given in figure 2 with further detail in supplemental table 5. A sensitivity analysis (supplemental table 6) revealed that attaining genome-wide significant evidence for association was not dependent on the inclusion of any one sample. Not unexpectedly, the least significant evidence for association (P=1.19×10−8) was observed when the discovery sample was removed, consistent with the expected inflation in the estimated effect size in that sample; notably, the genome-wide significance threshold was still surpassed after removal of this discovery sample.
For a subset of the non-fatSNP2 samples we were able to obtain data for additional fatSNP2 markers (or high quality proxies). Those data (plus for comparison the data for rs1344706 in the same subset) are presented in supplemental table 7. The significance of association at rs1344706 in the meta-analysis of these particular samples was more than two orders of magnitude better than those of the next best marker (rs17584522).
We designed 55 PCR amplimeres with 1kb overlaps (average amplimere length 7908 bases, range 6967–8362). The region spanned ~382.3kb (chr2:185163292-185545280, NCBI b36) (Figure 1). We were not successful in amplifying 5 of the amplimeres, representing a total of ~8% of the target sequence (29,341 bases).
We identified 825 putative SNPs. Of these, 245 were either present in the CEU HapMap or had been genotyped by us in that sample, leaving 580 potential variants whose LD relationships were unknown. We were able to design assays for 34% of these SNPs (n=198). The remaining 66% of putative SNPs comprised mainly sites in repetitive elements (49%), with the rest (17%) in unique sequence. We attempted to confirm 19 of those SNPs that were in repetitive elements by more routine sequencing methods (BigDye chemistry, Applied Biosystems). Of these, 16 (85%) were not confirmed, suggesting that the vast majority of the sites in highly repetitive elements were sequencing artefacts.
In the CEU sample, of the 198 SNP assays in our Illumina panel, 18 did not yield readable genotypes and an additional 8 assays had a call rate <0.95. In total, we were able to design assays (or obtain data from HapMap) for a high proportion (~71%) of the likely genuine SNPs (supplemental table 8). Of 172 SNPs we obtained new data for, only 22 (13%) with MAF ≥0.01 were not previously tagged at r2=1 and only 16 (9%) at r2=0.9. This low yield of untagged SNPs suggests further sequencing endeavours (or efforts to genotype the SNPs for which we could not design Illumina assays) are unlikely to extract much additional genetic information from the region. Information about the 172 confirmed SNPs is provided in supplemental table 9.
Based upon a combination of HapMap and our in house polymorphism discovery the target region contains a total of 651 confirmed SNPs with MAF ≥0.01. Of these, 87% are tagged at r2=1 by the markers we had already genotyped, while 94% are tagged at r2=0.9. Of the 13% of markers (n=80) that were not tagged at r2=1, we were able to impute 74% (n=59) at the recommended ‘good practice’(16) value of INFO ≥0.8 using PLINK and 60% (n=48) with an R2 score of ≥0.9 using Beagle, in the fatSNP1 sample. None was associated with schizophrenia at a threshold P<1×10−4 and none was highly correlated with rs1344706 (supplemental table 10). Including these successfully imputed SNPs as if they were directly genotyped, we estimate that our coverage of all known SNPs in the region with MAF ≥0.01 is increased to 97% (r2=1) and to 99% for r2=0.9. Importantly, of the markers (21 out of a total of 651 confirmed markers) that we could not impute, none was even moderately highly correlated with rs1344706 (max r2 =0.13) making it very unlikely they could account for the strong association signal seen at that locus.
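As a quick arithmetic check, two of the figures in the paragraph above (the 21 markers that could not be imputed and the ~97% coverage at r2=1 after imputation) follow directly from the reported counts. The variable names below are illustrative only; the counts are those given in the text.

```python
# Counts reported in the text
total_snps = 651      # confirmed SNPs with MAF >= 0.01 in the target region
untagged_snps = 80    # not tagged at r2 = 1 by genotyped markers
imputed_snps = 59     # untagged SNPs imputed at INFO >= 0.8 using PLINK

# Markers neither tagged at r2 = 1 nor successfully imputed
not_imputed = untagged_snps - imputed_snps

# Coverage at r2 = 1 when imputed SNPs are treated as directly genotyped
covered = (total_snps - untagged_snps) + imputed_snps
coverage_with_imputation = covered / total_snps

print(not_imputed)
print(f"{coverage_with_imputation:.1%}")
```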
To estimate the effectiveness of common variant discovery in the pooled sequencing analysis, we determined the number of known HapMap SNPs with a frequency >0.10, and >0.20 that were detected by pooled genomic sequencing in our sample. At these thresholds, we detected respectively 85% and 87% of known CEU HapMap SNPs. These data suggest that the efficiency of mutation discovery was good for alleles with MAF >0.10. Moreover, as discussed above, given the coverage we already had for the novel SNPs we detected, it is unlikely that detection of additional SNPs would have provided much additional genetic information with respect to common alleles.
To identify cis-acting eQTL effects on expression of ZNF804A, we assayed mRNA samples from 34 individuals heterozygous for non-synonymous SNP rs4667001 (figure 3). The G allele at this locus was associated with a 1.13 fold increase (SD =0.08) in ZNF804A expression (t-test P=2.59×10−7, unequal variance). rs4667001 has D’=1 with rs1344706 in this sample. The higher expression G allele is always in phase with risk allele rs1344706T, this haplotype representing ~75% of all rs1344706T risk alleles. The under-expressed A allele at rs4667001 resides on a mixture of haplotypes of which the majority (>70%) carry the non-risk allele rs1344706G. Thus, consistent with our analysis of the GeneVar data, the risk allele at rs1344706 is generally carried by a higher ZNF804A expression haplotype. If rs1344706 is per se the eQTL responsible for association between that SNP and ZNF804A expression, we expect only those subjects that are heterozygous for it to show unequal allelic expression (since homozygotes for rs1344706 would carry two functionally equivalent eQTL alleles). Our observed data were not compatible with this, there being no difference in the degree of differential expression in rs1344706 homozygotes compared with heterozygotes (t-test P=0.84, figure 3). This suggests that while rs1344706 is associated with expression, it is not responsible for it, or at least that it is not the only cis-acting eQTL. Similarly, analyses of the most strongly associated eQTLs from GeneVar revealed they are also associated with expression, but the comparison of homozygotes and heterozygotes for each putative eQTL suggests that none is likely to be the causal eQTL (supplemental figure 1).
Following our original GWAS (4), association between schizophrenia and rs1344706, which lies in intron 2 of ZNF804A, has been independently replicated in three studies(5,10,11). Since our study was based upon fewer than 5% of all common SNPs in the genome, we assumed that rs1344706 was unlikely to be the susceptibility variant. Thus, we undertook a fine-mapping study, the aim of which was to identify the variant that was directly responsible for association. In spite of extensive investigation, we were unable to detect a more strongly associated variant. However, meta-analysis based upon ~60,000 subjects provided very strong evidence for association between rs1344706 and schizophrenia (P=2.54×10−11) and also between rs1344706 and a combined schizophrenia/bipolar phenotype (P=4.1×10−13).
De novo polymorphism discovery based upon mutation scanning of all ZNF804A coding exons did not reveal evidence for the existence of a common non-synonymous variant (MAF≥0.01) that explains the original signal in our sample (table 1), in the fatSNP2 meta-analysis (table 2) or in the larger meta-analyses of those samples for which data were available for these SNPs (supplemental table 7). The second approach we applied was to test SNPs that, in the GeneVar database, were associated with ZNF804A expression. In fatSNP phase 1, two of these markers were more significantly associated with schizophrenia than rs1344706 (table 1). Since those two markers were in perfect LD, we followed up only one, rs1583048. As was the case for the non-synonymous variants, the evidence for association in fatSNP2 (table 2) and in the larger meta-analyses of those samples for which data were available for these SNPs (supplemental table 7) was much weaker than for rs1344706. Therefore it is likely that the signal observed at rs1583048 derives from LD (D’=1) with rs1344706. That the signals are not independent was also supported by the fact that rs1583048 was not required in the regression model.
Detailed tagging analysis of the ZNF804A locus based upon HapMap SNPs did not identify any variants that were more strongly associated in the Cardiff Full sample or in fatSNP2. Addressing the possibility that there exists a common variant that might be more strongly associated than rs1344706, but that is not present in the HapMap, we undertook sequencing across most of the genomic region. Although this uncovered a number of additional variants not present in the HapMap, these additional SNPs were well covered by the existing genotyped SNPs. Despite the fact that the vast majority (99%) of all the known variation across ZNF804A could be imputed and/or tagged at least at r2>0.9, no additional markers were more strongly associated than rs1344706. Moreover, few markers were even moderately correlated with rs1344706. In supplemental table 11, we list all markers with r2>0.2 in relation to rs1344706 and their imputed p values. If rs1344706 was only weakly or moderately correlated with a true susceptibility variant, we would expect the association signal at that second SNP to be considerably stronger than observed at rs1344706. The absence of such a signal suggests none of these moderately correlated SNPs are likely to be responsible for the signal detected at rs1344706. On this basis, we conclude rs1344706 is the most likely susceptibility variant.
Several caveats should be mentioned. These are a) we identified a number of putative SNPs that we were unable to genotype, b) ~8% of the target sequence was refractory to sequencing and c) our genomic sequencing was not 100% sensitive as we identified a high proportion of, but not all, known variation in the region. Thus, we cannot conclude with certainty that the true functional variant did not elude us, but given that it was not captured in the 651 SNPs (de novo plus HapMap) we do have good coverage of, and that very little additional genetic information was extracted by the novel SNPs we did detect through sequencing the genomic region, this does not seem likely.
At the outset, we assumed it unlikely that a GWAS would identify the true pathogenic variant. However, a recent study (5) supports the existence of very large numbers of common alleles of weak effect. Power considerations suggest few of these can be expected to achieve high levels of significance even in samples substantially larger than those we used in our discovery GWAS. However, while power to identify any one specific risk allele is low, power to identify one of many alleles is enhanced if there are very many of these to be detected. One factor dictating this will be the degree to which true risk variants are tagged by array SNPs, weak effects requiring very high LD both for detection and reliable directional replication(25). It follows that those risk alleles that by chance are included on arrays are most likely to be detected and replicated. Thus, in the context of a highly polygenic disorder with weak genetic effects and an underpowered discovery sample, those true associations that are first detected by GWAS are, as we have observed here, likely to either correspond to susceptibility alleles, or be in perfect LD with them.
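The contrast between low power for any single allele and appreciable power to detect at least one of many can be sketched as follows. The sketch assumes, for simplicity, that all risk alleles have the same per-allele power and are detected independently; neither assumption comes from the text, and the numbers used are hypothetical.

```python
def power_at_least_one(per_allele_power, n_alleles):
    """Probability that at least one of n independent alleles is detected."""
    return 1.0 - (1.0 - per_allele_power) ** n_alleles

def expected_detections(per_allele_power, n_alleles):
    """Expected number of alleles detected under the same assumptions."""
    return per_allele_power * n_alleles

if __name__ == "__main__":
    per_allele_power = 0.001  # hypothetical: 0.1% power for any one allele
    for n in (1, 100, 1000):  # hypothetical numbers of true risk alleles
        print(n,
              round(power_at_least_one(per_allele_power, n), 3),
              expected_detections(per_allele_power, n))
```

Even when the chance of detecting a given allele is negligible, the chance of detecting some allele grows quickly with the number of true risk alleles, which is the situation envisaged for a highly polygenic disorder.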
In terms of functional mechanisms, we observed that the risk allele of rs1344706 is associated with higher ZNF804A expression in our analysis of the GeneVar data. That finding is compatible with the analysis of Riley and colleagues who reported that the risk allele at rs1344706 is associated with higher ZNF804A expression in human brain(10). However, while our analysis of allele specific expression confirms that rs1344706 is generally carried on a higher expression haplotype, it does not appear to be the eQTL responsible for higher expression, suggesting that the relationship between that SNP and expression is not relevant to disease. Alternatively, there may be additional eQTLs (and susceptibility alleles) at the locus that we have failed to uncover.
In the absence of evidence for a non-synonymous SNP that explains the association or for direct effects of rs1344706 on expression, if rs1344706 is the true causal variant, its influence on gene function remains to be elucidated. It remains possible that it exerts effects through expression, but that, as with many eQTLs, these effects are cell specific(26), tissue-region specific, or specific to certain developmental phases. Further expression studies in samples not available to the authors will be required to test these hypotheses. However, the observations of deletion and partial duplication CNVs at the ZNF804A locus in a schizophrenia and a bipolar proband respectively(11) similarly suggest that simple up-regulation of ZNF804A may not be the mechanism relevant to risk of schizophrenia and other major psychiatric disorders.
The establishment of ZNF804A as a risk factor for schizophrenia and bipolar disorder is one of several successes arising from recent large-scale genetic studies of major psychiatric disorders. These have implicated common alleles for psychiatric disorders at much higher levels of confidence than previous genetic approaches. Among these, the most robust current findings in schizophrenia are ZNF804A, Neurogranin (NRGN), Transcription Factor 4 (TCF4) and a locus spanning several megabases of chromosome 6 in the region of the Major Histocompatibility Complex (4–7). In bipolar disorder, calcium channel, voltage-dependent, L type, alpha 1C subunit (CACNA1C) and ankyrin 3, node of Ranvier (ANK3) have been strongly supported as susceptibility genes (27). At least two of the specific loci, ZNF804A and CACNA1C, influence risk for both disorders (28), a finding that supports the hypothesis that schizophrenia and bipolar disorder are not aetiologically distinct. Like ZNF804A, in general, the common risk alleles identified by GWAS have small effect sizes (OR<1.25), although the associated allele at ANK3 may have a somewhat larger, but still weak effect (OR~1.45) (27). It is, however, clear that many more common risk alleles remain to be identified, a substantial component (at least 30%) of the variance in risk of schizophrenia being attributable to risk alleles of very small effect, and many of these also influence risk of bipolar disorder (5). In the case of schizophrenia, several rare susceptibility CNVs have also been detected; in contrast to common alleles, these have fairly large effect sizes (OR>3) on disease risk. However, like the common alleles, these CNVs are not specific to individual disorders as defined by widely used classification systems; the same CNVs associated with schizophrenia additionally influence risk of other neurodevelopmental disorders such as autism, epilepsy, and mental retardation(12).
Although very little of the genetic risk of either schizophrenia or bipolar disorder is currently explained, there are grounds for optimism that larger studies will reveal more about the origins of these disorders. Moreover, the existing findings already challenge current concepts of disease classification (9) and point to some pathophysiological mechanisms, for example the involvement of calcium channels in bipolar disorder (CACNA1C).
As for the pathophysiological implications of the present finding, ZNF804A is presently a protein of unknown function. The amino acid sequence contains a C2H2-type domain characteristic of the classical Zinc-Finger (ZnF) family of proteins. These were originally identified as DNA binding molecules with a role in transcription, but proteins with this classic zinc finger domain are now known to interact with many other types of molecule, including RNA and protein, and in doing so, play many roles in cellular function (29). Until ZNF804A is functionally characterized, it is not possible to propose specific cellular processes that link the current genetic finding to disease risk.
At the whole-organism level, two recent studies have associated rs1344706 with variation in function. Esslinger and colleagues (2009) reported that the schizophrenia risk allele at this locus was associated with reduced connectivity both within dorsolateral pre-frontal cortex (DLPFC) and between the right and left DLPFC, as indexed by the extent to which activation of those brain structures was temporally correlated during the N-back task (a probe for executive function). In contrast, connectivity was increased between the DLPFC and the hippocampal formation as well as between a number of other structures. More recently, Walters and colleagues found that the schizophrenia risk allele at rs1344706 was associated with better episodic and working memory in individuals with schizophrenia, but not controls, a finding they replicated in a German sample. Given that schizophrenia is associated with reduced cognitive function, association with better function seems counter-intuitive. However, given the absence of association with better function in controls, rather than being associated with cognitive ability, ZNF804A may be associated with a subtype of schizophrenia in which cognition is relatively spared. This hypothesis is in keeping with the association of this locus not just with schizophrenia but also with bipolar disorder (31). To what extent, if at all, the results from the study of Walters and colleagues reflect the observations of altered connectivity in the earlier study (30) is unclear.
Given that the function of the product of this gene is currently unknown, determining this is now a priority in understanding how this genetic association translates into pathophysiology. The identification of its binding partners, be they DNA, RNA or protein, will offer the opportunity to identify a set of further candidate genes, each of which will benefit from down-stream genetic analysis based upon a much higher prior probability than typical candidate genes for schizophrenia.
We would like to thank all the families that contributed to the sample collections we used. We also thank The MRC London Neurodegenerative Diseases Brain Bank, UK; The Stanley Medical Research Institute Brain Bank, USA; and The Karolinska Institute, Sweden, which supplied the post mortem brain tissue. This study makes use of control data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. The UK research was supported by grants from the MRC, the Wellcome Trust and by an NIMH (USA) CONTE: 2 P50 MH066392-05A1.
The following authors are included under:
Molecular Genetics of Schizophrenia Collaboration
PV Gejman (Evanston Northwestern Healthcare and Northwestern University, IL, USA), AR Sanders (Evanston Northwestern Healthcare and Northwestern University, IL, USA), J Duan (Evanston Northwestern Healthcare and Northwestern University, IL, USA), DF Levinson (Stanford University, CA, USA), NG Buccola (Louisiana State University Health Sciences Center, LA, USA), BJ Mowry (Queensland Centre for Mental Health Research, and Queensland Institute for Medical Research, Queensland, Australia), R Freedman (University of Colorado Denver, Colorado, USA), F Amin (Atlanta Veterans Affairs Medical Center and Emory University, Atlanta, USA), DW Black (University of Iowa Carver College of Medicine, IA, USA), JM Silverman (Mount Sinai School of Medicine, New York, USA), WJ Byerley (University of California at San Francisco, California, USA), CR Cloninger (Washington University, Missouri, USA).
H Stefansson (deCODE genetics, Reykjavik, Iceland), S Steinberg (deCODE genetics, Reykjavik, Iceland), E Strengman (Universiteitsweg, Utrecht, The Netherlands), T Hansen (Copenhagen University Hospital, Roskilde, Denmark), HB Rasmussen(Copenhagen University Hospital, Roskilde, Denmark), O Gustafsson (University of Oslo, Oslo, Norway), S Djurovic (University of Oslo, Oslo, Norway), I Giegling (Ludwig-Maximilians-University, Munich, Germany), M Nyegaard (Aarhus University, Arhus C, Denmark), OP Pietiläinen (Institute of Molecular Medicine, Helsinki, Finland and Wellcome Trust Sanger Institute, Cambridge UK), A Tuulio-Henriksson (National Public Health Institute, Helsinki, Finland), E Sigurdsson (National University Hospital, Reykjavik, Iceland), H Petursson (National University Hospital, Reykjavik, Iceland), B Glenthøj (Copenhagen University Hospital, Glostrup, Denmark), G Jürgens (Bispebjerg University Hospital, Copenhagen, Denmark), I Melle (University of Oslo, Oslo, Norway), M Rietschel (University of Heidelberg, Mannheim, Germany), AD Børglum (Aarhus University Hospital, Risskov, Denmark), A Ingason (deCODE genetics, Reykjavik, Iceland), U Thorsteinsdottir (deCODE genetics, Reykjavik, Iceland), A Kong (deCODE genetics, Reykjavik, Iceland), P Muglia (GlaxoSmithKline R&D,Verona, Italy), LA Kiemeney (Radboud University, Nijmegen, The Netherlands), M Ruggeri (University of Verona, Verona, Italy), S Tosato (University of Verona,Verona, Italy), TE Thorgeirsson (deCODE genetics, Reykjavik, Iceland), O Mors (Aarhus University Hospital, Risskov, Denmark), PB Mortensen (Aarhus University, Aarhus, Denmark), I Bitter (Semmelweis University, Budapest, Hungary), EG Jönsson (Karolinska Institutet and Hospital, Stockholm, Sweden), S Cichon (University of Bonn, Bonn, Germany), MM Nöthen (University of Bonn, Bonn, Germany), OA Andreassen (University of Oslo, Oslo, Norway), V Golimbet (Russian Academy of Medical Sciences, Moscow, Russia), T Li (Institute of Psychiatry, London, UK), 
T Werge (Copenhagen University Hospital, Roskilde, Denmark), RA Ophoff (UCLA, Los Angeles, USA and University Medical Center Utrecht, Utrecht, The Netherlands), D St Clair (University of Aberdeen, Aberdeen UK), DA Collier (Institute of Psychiatry, London, UK), L Peltonen (Institute of Molecular Medicine, Helsinki, Finland and Wellcome Trust Sanger Institute, Cambridge, UK), D Rujescu (Ludwig-Maximilians-University, Munich, Germany) and K Stefansson (deCODE genetics, Reykjavik, Iceland).
Genetic Risk and Outcome in Psychosis (GROUP)
RS Kahn (Rudolf Magnus Institute of Neuroscience, Utrecht, The Netherlands), DH Linszen (Academic Medical Centre University of Amsterdam, Amsterdam, The Netherlands), J van Os (Maastricht University Medical Centre, Maastricht, The Netherlands), D Wiersma (University of Groningen, Groningen, The Netherlands), R Bruggeman (University of Groningen, Groningen, The Netherlands), W Cahn (Rudolf Magnus Institute of Neuroscience, Utrecht, The Netherlands), L de Haan (Academic Medical Centre University of Amsterdam, Amsterdam, The Netherlands), L Krabbendam (Maastricht University Medical Centre, Maastricht, The Netherlands) and Inez Myin-Germeys (Maastricht University Medical Centre, Maastricht, The Netherlands).
International Schizophrenia Consortium (ISC)
Michael C. O’Donovan (Cardiff University, Cardiff, UK), George K. Kirov (Cardiff University, Cardiff, UK), Nick J. Craddock (Cardiff University, Cardiff, UK), Peter A. Holmans (Cardiff University, Cardiff, UK), Nigel M. Williams (Cardiff University, Cardiff, UK), Lyudmila Georgieva (Cardiff University, Cardiff, UK), Ivan Nikolov (Cardiff University, Cardiff, UK), N. Norton (Cardiff University, Cardiff, UK), H. Williams (Cardiff University, Cardiff, UK), Draga Toncheva (University Hospital Maichin Dom, Sofia, Bulgaria), Vihra Milanova (Alexander University Hospital, Sofia, Bulgaria), Michael J. Owen (Cardiff University, Cardiff, UK), Christina M. Hultman (Karolinska Institutet, Stockholm, Sweden and Uppsala University, Uppsala, Sweden), Paul Lichtenstein (Karolinska Institutet, Stockholm, Sweden), Emma F. Thelander (Karolinska Institutet, Stockholm, Sweden), Patrick Sullivan (University of North Carolina at Chapel Hill, North Carolina, USA), Derek W. Morris (Trinity College Dublin, Dublin, Ireland), Colm T. O’Dushlaine (Trinity College Dublin, Dublin, Ireland), Elaine Kenny (Trinity College Dublin, Dublin, Ireland), Emma M. Quinn (Trinity College Dublin, Dublin, Ireland), Michael Gill (Trinity College Dublin, Dublin, Ireland), Aiden Corvin (Trinity College Dublin, Dublin, Ireland), Andrew McQuillin (University College London, London, UK), Khalid Choudhury (University College London, London, UK), Susmita Datta (University College London, London, UK), Jonathan Pimm (University College London, London, UK), Srinivasa Thirumalai (West Berkshire NHS Trust, Reading, UK), Vinay Puri (University College London, London, UK), Robert Krasucki (University College London, London, UK), Jacob Lawrence (University College London, London, UK), Digby Quested (University of Oxford, Oxford, UK), Nicholas Bass (University College London, London, UK), Hugh Gurling (University College London, London, UK), Caroline Crombie (University of Aberdeen, Aberdeen, UK), Gillian Fraser (University of Aberdeen, Aberdeen, UK), Soh Leh Kuan (University of Aberdeen, Aberdeen, UK), Nicholas Walker (Ravenscraig Hospital, Greenock, UK), David St Clair (University of Aberdeen, Aberdeen, UK), Douglas H. R. Blackwood (University of Edinburgh, Edinburgh, UK), Walter J. Muir (University of Edinburgh, Edinburgh, UK), Kevin A. McGhee (University of Edinburgh, Edinburgh, UK), Ben Pickard (University of Edinburgh, Edinburgh, UK), Pat Malloy (University of Edinburgh, Edinburgh, UK), Alan W. Maclean (University of Edinburgh, Edinburgh, UK), Margaret Van Beck (University of Edinburgh, Edinburgh, UK), Naomi R. Wray (Queensland Institute of Medical Research, Queensland, Australia), Stuart Macgregor (Queensland Institute of Medical Research, Queensland, Australia), Peter M. Visscher (Queensland Institute of Medical Research, Queensland, Australia), Michele T. Pato (University of Southern California, California, USA), Helena Medeiros (University of Southern California, California, USA), Frank Middleton (Upstate Medical University, New York, USA), Celia Carvalho (University of Southern California, California, USA), Christopher Morley (Upstate Medical University, New York, USA), Ayman Fanous (University of Southern California, California, USA and Washington VA Medical Center, Washington, USA and Georgetown University School of Medicine, Washington DC, USA and Virginia Commonwealth University, Virginia, USA), David Conti (University of Southern California, California, USA), James A. Knowles (University of Southern California, California, USA), Carlos Paz Ferreira (Department of Psychiatry, Azores, Portugal), Antonio Macedo (University of Coimbra, Coimbra, Portugal), M. Helena Azevedo (University of Coimbra, Coimbra, Portugal), Carlos N. Pato (University of Southern California, California, USA); Massachusetts General Hospital Jennifer L. Stone (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Douglas M. Ruderfer (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Andrew N. Kirby (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Manuel A. R. Ferreira (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Mark J. Daly (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Shaun M. Purcell (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Jennifer L. Stone (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Kimberly Chambert (The Broad Institute of Harvard and MIT, Massachusetts, USA), Douglas M. Ruderfer (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Finny Kuruvilla (The Broad Institute of Harvard and MIT, Massachusetts, USA), Stacey B. Gabriel (The Broad Institute of Harvard and MIT, Massachusetts, USA), Kristin Ardlie (The Broad Institute of Harvard and MIT, Massachusetts, USA), Jennifer L. Moran (The Broad Institute of Harvard and MIT, Massachusetts, USA), Edward M. Scolnick (The Broad Institute of Harvard and MIT, Massachusetts, USA), Pamela Sklar (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA).
The authors declare no conflicts of interest.