A recent genomewide association study reported association between schizophrenia and the ZNF804A gene on chromosome 2q32.1. We attempted to replicate these findings in our Irish Case-Control Study of Schizophrenia (ICCSS) sample (N=1021 cases, 626 controls). Following consultation with the original investigators, we genotyped 3 of the most promising SNPs from the Cardiff study. We replicate association with rs1344706 (trend test one-tailed p=0.0113 with the previously associated A allele) in ZNF804A. We detect no evidence of association with rs6490121 in NOS1 (one-tailed p=0.21), and only a trend with rs9922369 in RGRIP1L (one-tailed p=0.0515). Based on these results, we completed genotyping of 11 additional LD-tagging SNPs in ZNF804A. Of 12 SNPs genotyped, 11 pass QC criteria and 4 are nominally associated, with our most significant evidence of association at rs7597593 (p=0.0013) followed by rs1344706. We observe no evidence of differential association in ZNF804A based on family history or sex of case. The associated SNP rs1344706 lies in ~30 bp of conserved mammalian sequence and the associated A allele is predicted to maintain binding sites for the brain-expressed transcription factors MYT1L and POU3F1/OCT-6. In controls, expression is significantly increased from the A allele of rs1344706 compared to the C allele. Expression is increased in schizophrenic cases compared to controls, but this difference does not achieve statistical significance. This study replicates the original reported association of ZNF804A with schizophrenia and suggests that there is a consistent link between the A allele of rs1344706, increased expression of ZNF804A and risk for schizophrenia.
Schizophrenia (MIM 181500) is a severe psychiatric illness with complex etiology and lifetime prevalence of ~1% worldwide. Family, twin and adoption studies have consistently demonstrated an important genetic component to schizophrenia, together with developmental and environmental influences1,2.
The genomewide association study (GWAS), with up to 1 million single nucleotide polymorphisms (SNPs) genotyped, is currently the most powerful, systematic and unbiased genetic approach to the study of the common disease/common variant (CDCV) hypothesis of complex disorders like schizophrenia. Since the initial suggestion by Risch and Merikangas of the power of association methods for human diseases3, GWAS have successfully identified novel associations and/or replicated prior susceptibility genes for a number of complex traits, including age-related macular degeneration4, Type 1 (ref. 5) and Type 2 (refs. 6–11) diabetes, episodic memory12, Crohn’s disease13–16, prostate cancer17–19 and obesity20.
So far, 3 independent schizophrenia GWAS using individual genotyping have been published21–23. Sample sizes (and power) are modest given the range of effect sizes and allele frequencies expected. These studies have not provided either convergence between new loci identified or strong support for loci previously identified (for example, through positional cloning). Direct replication studies of current GWAS findings have provided mixed results.
The first of these used a sample of 178 cases and 144 controls, assessed 500K SNPs and reported one experiment-wide significant association at rs4129148 (P=3.7×10^−7) in the X/Y pseudoautosomal region near the CSF2RA and IL3R genes21. Cytokines have been suggested as possible candidate genes previously, and one replication attempt supported association in IL3R24. The second used the CATIE25 sample of 738 cases and 733 controls and reported no genomewide significant results in stage 1 analysis22. The third used a multi-stage design of discovery in a newly genotyped sample of 479 cases compared with existing data from the 2937 UK controls used in the Wellcome Trust Case/Control Consortium studies26 and targeted replication in 6666 cases and 9897 controls. This study reported consistent evidence of association with 1 SNP in the zinc-finger protein ZNF804A gene23. Although the discovery sample in this study was modest, it is drawn from an Anglo-Celtic population (from Wales and the UK) broadly similar to the population from which we have sampled cases and controls (from Ireland and Northern Ireland) and the total sample size was substantial due to the use of existing control data. We therefore set out to replicate the findings from the Cardiff study in our Irish case/control sample.
Cases were ascertained from in-patient and out-patient psychiatric facilities in Ireland and Northern Ireland. Cases completed the same diagnostic instrument used in our prior family study, the Structured Clinical Interview for DSM III-R, Patient version with an expanded psychosis section27; DSM-III-R criteria were used for consistency with our family collection28. Detailed personal interviews and hospital record rating forms were completed for each proband. Subjects with a field diagnosis of schizophrenia or poor-outcome schizoaffective disorder were eligible if all four grandparents were born in Ireland or the UK. The ICCSS sample includes 1021 cases.
Controls (N=626) were recruited from donors at the Northern Ireland Blood Transfusion Service (N=554) and from the Irish national police (N=38) and army reserve (N=34). Controls were briefly screened and were eligible if they reported no history of schizophrenia and all four grandparents born in Ireland or the UK. All participants gave appropriate informed consent. Recorded sex was verified by X/Y genotypes; cases were 68% male and 32% female, controls were 55% male, 45% female. This sample has ≥78% power to detect effects with minor allele frequency (MAF) ≥20% and allelic odds ratio (OR) ≥1.3.
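The power statement above can be illustrated with a normal-approximation sketch for an allelic (2×2) comparison. The method used for the quoted figure is not stated, so the function below (`allelic_power`, a hypothetical name) is only one plausible approximation and need not reproduce the reported value. It assumes the stated MAF is the control allele frequency, a two-sided alpha of 0.05 and two independent alleles per person.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=0.05):
    """Approximate power of a two-proportion z-test on allele counts.

    Assumptions (not from the paper): `maf` is the control allele
    frequency, alleles are independent, and the test is two-sided.
    """
    nd = NormalDist()
    p0 = maf
    # Case allele frequency implied by the allelic odds ratio.
    odds = odds_ratio * p0 / (1 - p0)
    p1 = odds / (1 + odds)
    a, b = 2 * n_cases, 2 * n_controls  # allele counts
    p_bar = (a * p1 + b * p0) / (a + b)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / a + 1 / b))
    se_alt = math.sqrt(p1 * (1 - p1) / a + p0 * (1 - p0) / b)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the expected direction.
    return nd.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Sample sizes, MAF and OR taken from the text above.
print(allelic_power(1021, 626, maf=0.20, odds_ratio=1.3))
```

Power rises with the odds ratio and with allele frequencies nearer 0.5, which is why the statement is framed as a lower bound for MAF ≥20% and OR ≥1.3.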
Ireland is relatively isolated at the North Western extreme of Europe, which may be advantageous in investigating a complex genetic disorder such as schizophrenia. The current Irish population, though not a genetic isolate, is more homogeneous than the US population29–31. Fewer ancestral haplotypes increase LD and the power to detect association. Studies of Y chromosome and mtDNA point to a common genetic legacy in Ireland that probably extends back to the repopulation of the island, after the last Ice Age approximately 15,000 years ago, from population centres in the Iberian peninsula and south western France32–34. The genetic structure of the population has been minimally influenced by more recent human migrations over the last three millennia34. In the experience of the blood bank staff, non-Irish donors are very rare, in agreement with the history of minimal in-migration to Ireland, and would have been excluded on the basis of questions about their grandparents.
Family history research diagnostic criteria (FH-RDC)35 were assessed by having affected participants complete the FH-RDC interview36, the best validated instrument available, to report on psychotic illness in relatives. Detailed study of the reliability, sensitivity and specificity of the FH-RDC versus personal interview of first-degree family members in the Roscommon Family Study37 showed that between informants, the kappa±SE was 0.49±0.02; between FH-RDC and best-estimate diagnoses, the kappa±SE was 0.43±0.02. Validated against the best-estimate diagnosis (which included personal interview and hospital records in most cases), the sensitivity was 0.37 and the specificity was 0.996. The FH-RDC criteria38 and instrument36 thus miss true illness in relatives but almost never produce a false positive.
Family history information is available for 739 cases (72.4%). We defined positive family history (FH+) by report of 1 or more first-degree relatives with schizophrenia or unspecified functional psychosis. We restrict the definition to first-degree relatives because we expect greater reliability of reporting for immediate family members. Unspecified functional psychosis required positive report of one or more specific symptoms (delusions, hallucinations, incoherence, or bizarre behavior) not due to a mood disorder. Under this definition, there are 196 FH+ cases. We compare these to the most conservatively defined 478 FH− cases reporting no psychotic illness in either first or second degree relatives. This approach has ≥65% power to detect differences between the FH+ and FH− subsamples assuming the same parameters as in our case/control power calculation (MAF ≥20%, allelic OR ≥1.3 and alpha=0.05).
Following discussion with the Cardiff group, we genotyped 3 of the most promising SNPs from the report of genomewide association in the Cardiff sample, rs1344706 in ZNF804A, rs6490121 in NOS1 and rs9922369 in RGRIP1L. Based on the results in these markers, we then completed genotyping of 11 additional LD-tagging SNPs (tSNPs) in ZNF804A. We first manually included rs1344706 (associated in the original study) in the test alleles set and then selected additional tSNPs using the TAGGER39 algorithm with default criteria of r2≥0.8 and minor allele frequency (MAF)≥0.2 as implemented in Haploview 4.040 using HapMap data release 22/Phase II, Apr 07. Including rs1344706, we genotyped a total of 12 SNPs in ZNF804A. Markers are shown in Table 1; orientation and alleles are reported on the genomic (+) strand (rs1344706 is reported here as A/C, not T/G as in the original report23).
All markers were genotyped with Taqman Assays-on-Demand (Applied Biosystems, Foster City, CA). To ensure uniformity and accuracy, all reaction steps were performed using the Eppendorf 5075 automated liquid handling platform. Genotypes were called using an automated allele scoring platform41. For quality control, we exclude any individual with >50% missing genotypes. We analyzed duplicate genotype data for 35 duplicate pairs (11 cases and 24 controls) and use discordant genotypes in these duplicates to estimate genotyping error rates.
Single marker analyses were performed in SAS42 and followed the original study in assessing the Cochran-Armitage trend test. For GWAS data, the trend test is widely thought to be the most robust and broadly appropriate primary hypothesis test currently available and here it also provides the closest possible test of replication. Previous work demonstrated 1) validity of the trend statistic in the presence of Hardy Weinberg disequilibrium, and 2) asymptotic equivalence of allele and genotype-based trend statistics when Hardy Weinberg equilibrium holds43. We present uncorrected results from one-tailed tests of the three direct replication SNPs (rs1344706 in ZNF804A, rs6490121 in NOS1 and rs9922369 in RGRIP1L). We treat the remaining analyzed SNPs in ZNF804A (10 after one SNP dropped during QC) as independent (based on their selection as tags) and apply Bonferroni correction for 10 tests.
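For readers unfamiliar with the trend test, the following is a minimal sketch of the Cochran-Armitage statistic with additive weights (0, 1, 2), together with the Bonferroni adjustment for 10 tests. It is not the SAS code used in the study. The function names and the genotype counts in the usage lines are hypothetical, and the one-tailed p-value assumes the tested allele is expected to be more common in cases.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2x3 genotype table.

    `cases` and `controls` hold counts of individuals carrying 0, 1 and 2
    copies of the tested allele. Returns (z, one-tailed p, two-tailed p).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]
    k = len(weights)
    # Weighted difference between case and control genotype counts.
    t = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    var = (R * S / N) * (
        sum(w * w * n * (N - n) for w, n in zip(weights, col))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(k) for j in range(i + 1, k))
    )
    z = t / math.sqrt(var)
    p_one = 0.5 * math.erfc(z / math.sqrt(2))    # upper tail only
    p_two = math.erfc(abs(z) / math.sqrt(2))
    return z, p_one, p_two


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical genotype counts, for illustration only.
print(trend_test(cases=(150, 470, 373), controls=(100, 280, 190)))
# The rs7597593 p-value reported in this study, adjusted for 10 tests.
print(bonferroni(0.0013, 10))
```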
Haplotype analyses were performed assessing Χ2 in Haploview v4.040. The Χ2 is also appropriate given the ethnic homogeneity of our sample and the absence of observed substructure in the Irish population44. We also use HAPLOVIEW to identify significant (p<0.001) departures from Hardy-Weinberg Equilibrium (HWE) and compare linkage disequilibrium (LD) against HapMap data. Haplotypes were analyzed within blocks defined by the default confidence-interval method45. In one additional analysis undertaken on the basis of single marker association results, we analyzed rs7597593 (not included in any confidence-interval defined block) with the confidence-interval defined SNP set that includes rs1344706. We assessed empirical significance of haplotype results with 5000 permutations of case and control status.
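The permutation step can be sketched as follows. This is an illustration of the general idea, not Haploview's implementation. It assumes phased haplotype labels are already known for each chromosome and shuffles case/control labels per chromosome for brevity, whereas a full analysis would estimate haplotypes and permute the status of individuals. All names and data are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_haplotype_chi2(haps, is_case):
    """Largest chi-square over all 'this haplotype vs. the rest' tests."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    best = 0.0
    for h in set(haps):
        in_case = sum(1 for x, c in zip(haps, is_case) if x == h and c)
        in_ctrl = sum(1 for x, c in zip(haps, is_case) if x == h and not c)
        best = max(best, chi2_2x2(in_case, n_case - in_case,
                                  in_ctrl, n_ctrl - in_ctrl))
    return best


def permutation_p(haps, is_case, n_perm=5000, seed=0):
    """Empirical p-value for the maximum statistic under label shuffling."""
    rng = random.Random(seed)
    observed = max_haplotype_chi2(haps, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_haplotype_chi2(haps, labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical chromosomes: the first 50 are from cases, the rest from controls.
haps = ["TTAA"] * 30 + ["other"] * 70
is_case = [True] * 50 + [False] * 50
print(permutation_p(haps, is_case, n_perm=500))
```

Comparing each permuted maximum against the observed maximum is what makes the resulting p-value robust to testing several haplotypes at once.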
To assess potential gender differences in the association signal, we tested for SNP × Sex interaction using logistic regression, coding each SNP in an additive framework, including gender and a SNP × Sex interaction term. To test for differences in evidence for association between sets of cases defined by FH status, we stratified the sample by the presence (FH+) or absence (FH−) of first degree relatives with schizophrenia or unspecified psychosis as described above (1st degree broad definition). Genotypes at rs1344706 were missing for 23 individuals with FH data. We tested for allelic (1 df) association by assessing Χ2 between FH+ and FH− cases. Both analyses were performed using SAS42.
The Stanley Medical Research Institute (SMRI) provided genomic DNA and postmortem brain tissue samples from dorsolateral prefrontal cortex (Brodmann’s area 46) from 35 individuals with schizophrenia (SCH) and 35 controls (CON). Exclusion criteria for these samples included: (1) significant structural brain pathology, (2) history of pre-existing central nervous system disease, (3) poor RNA quality, (4) documented IQ < 70, (5) age less than 30 and (6) substance abuse within one year of death46. RNA was isolated using the Ambion miRvana kit (Applied Biosystems, Foster City, CA) according to manufacturer’s instructions. We obtained poor yields of RNA from 1 CON sample, which was excluded from further study, leaving 35 SCH and 34 CON samples.
cDNA was reverse transcribed using the Ready-To-Go Kit from Amersham Bioscience (Piscataway, NJ) following manufacturer’s instructions. ZNF804A expression was analyzed on the StepOne Plus instrument (ABI, Foster City, CA) using a Taqman quantitative real-time PCR expression assay (ABI, Foster City, CA). Each sample was assayed in triplicate and mean values from the triplicates were used for all analyses. Gene expression was normalized against two reference genes47 (HPRT and TBP), and analyzed by the 2^(−ΔΔCT) algorithm48. The 2^(−ΔΔCT) algorithm, widely used to compare differences in gene expression, is valid only if PCR efficiency of studied genes is equal48. PCR efficiencies (measured by standard curve) of all three genes assayed were very similar (ZNF804A: 91%, HPRT: 93%, TBP: 90%).
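The relative quantification step can be sketched as below. The text does not state how the two reference genes were combined, so averaging their Ct values is an assumption made here for illustration, as are the function name and all Ct values.

```python
from statistics import mean


def fold_change(target_ct, ref_cts, calib_target_ct, calib_ref_cts):
    """Relative expression by the 2^(-ddCt) method.

    Assumption (not from the paper): the reference-gene Ct values are
    averaged before normalization. Valid only when the amplification
    efficiencies of target and reference assays are about equal.
    """
    d_sample = target_ct - mean(ref_cts)        # dCt of the sample
    d_calib = calib_target_ct - mean(calib_ref_cts)  # dCt of the calibrator
    return 2 ** -(d_sample - d_calib)


# Hypothetical mean-of-triplicate Ct values: target gene, then two references.
print(fold_change(24.0, [20.0, 22.0], 25.0, [20.0, 22.0]))
```

A target Ct one cycle lower than the calibrator, with unchanged references, corresponds to a twofold higher expression estimate.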
To achieve a normal distribution, ZNF804A expression values were log transformed, and normality of the transformed data was verified by the Anderson-Darling empirical distribution function test49,50, a powerful statistic to detect departure from normality even in a relatively small (N≤100) sample. Samples (N=7, 3 SCH and 4 CON) with values more than 2 SD from the group mean were omitted from further analyses, reducing the totals to 32 SCH and 30 CON. Mean expression levels in SCH and CON groups were compared using Welch’s corrected unpaired t-test. The potential confounding effects of diagnosis, age, gender, pH, post-mortem interval, refrigerator interval, smoking and drug abuse in the analysis of ZNF804A expression were assessed by analysis of variance (ANOVA).
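The outlier rule and the group comparison can be sketched as follows. The code shows the Welch t statistic and its Welch-Satterthwaite degrees of freedom only. Computing the p-value requires a t-distribution routine, which is omitted here. Function names and expression values are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def drop_outliers(values, n_sd=2.0):
    """Keep values within n_sd sample standard deviations of the group mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def welch_t(x, y):
    """Welch's unequal-variance t statistic and approximate df."""
    a = variance(x) / len(x)
    b = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


# Hypothetical raw expression values for two groups.
sch = [1.2, 0.9, 1.5, 1.1, 1.8, 1.3, 6.0]
con = [1.0, 0.8, 1.1, 0.9, 1.2, 1.0, 0.7]
# Log transform, then apply the 2 SD rule within each group.
log_sch = drop_outliers([math.log(v) for v in sch])
log_con = drop_outliers([math.log(v) for v in con])
print(welch_t(log_sch, log_con))
```

Because the degrees of freedom depend on the two sample variances, they are generally not an integer and fall below the pooled value of n1 + n2 − 2.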
We assessed sequence around SNP rs1344706 in the UCSC Genome Browser (http://genome.ucsc.edu). Based on evidence of conservation, we sought to identify transcription-factor binding sites altered by the alternative rs1344706 alleles. A 53 bp sequence surrounding rs1344706 (chr2:185,486,646-185,486,698) was analyzed using MatInspector v126.96.36.199 and SNPinspector v2.251 (http://www.genomatix.de/). In both cases the vertebrate transcription factor matrices and optimized matrix threshold were used to reduce the incidence of false-positives.
Based on the results of these predictions, the SMRI CON samples were genotyped for rs1344706 for analysis of expression differences due to variation at this position. Genotypes were collected exactly as described above using the same instrumentation and reagents.
For the allelic test, expression values were binned by rs1344706 allele and the means of the bins were then compared52. Each individual’s expression value is used twice: for homozygotes, twice in the same bin, for heterozygotes, once in each. As a result, the variation in the homozygotes may be less than that in heterozygotes, and we therefore compare the two group means using the unpaired t-test with Welch’s correction to account for possible heteroscedastic variances.
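The binning rule can be made concrete with a short sketch. Each individual contributes its expression value once per allele carried, so homozygotes appear twice in one bin and heterozygotes once in each. The function name and the sample values are hypothetical.

```python
from statistics import mean


def bin_by_allele(samples):
    """Group expression values by allele.

    `samples` is a list of (genotype, expression) pairs, where genotype is
    a two-character string such as "AA", "AC" or "CC".
    """
    bins = {}
    for genotype, expression in samples:
        for allele in genotype:  # one entry per allele carried
            bins.setdefault(allele, []).append(expression)
    return bins


# Hypothetical log-transformed expression values, for illustration only.
samples = [("AA", 1.0), ("AC", 2.0), ("CC", 3.0)]
bins = bin_by_allele(samples)
print({allele: mean(values) for allele, values in bins.items()})
```

With N individuals the two bins together always hold 2N values, which is why the allelic comparison in a sample of 30 controls rests on 60 observations.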
A total of 28 case (2.7%) and 56 control (8.9%) samples were excluded from analysis because of >50% missing genotypes. All subsequent results are based on a total of 1563 samples (993 cases and 570 controls). Of these, 1499 (95.9%) are missing 2 or fewer genotypes (1270 samples (81.3%) missing zero, 195 (12.5%) missing one and 36 (2.3%) missing two genotypes). Higher missing data rates (3–7 genotypes) were present in 62 samples (4.0%).
The rates of sample exclusion due to missing genotypes are higher than usual due to technical issues during genotypic data collection. Degradation of a filter in a fluorescence reader reduced signal at the detector, thus artificially increasing the minimum signal necessary for detection. This attenuation of signal did not affect robustly performing samples, but did cause a higher fail rate among more weakly amplifying samples. Prior to exclusion of poorly genotyped samples, average genotyping completion by marker was 93.5% (90.5–96.4%); following exclusions, it was 97.3% (95.2–99.2%). Following identification of the problem and replacement of the filter, 4 SNPs (rs13393273, rs7597593, rs1344706 and rs4667001, including the two most strongly associated) were completely regenotyped. This repeated data collection yields a higher genotyping completion rate (mean 98.5%) and a low rate (0.52%) of discordant genotypes for these four markers.
We genotyped 14 SNPs × 35 samples in duplicate (N=490 genotype pairs); both genotypes were available for 465 (94.9%), of which 0 were discordant, estimating our overall genotyping error rate at less than 1 in 465 (<0.21%). All markers except rs1480481 (p=1.13×10^−10 and not further analyzed) satisfied our HWE cut-off for inclusion. Linkage disequilibrium patterns in this sample are virtually identical to those in HapMap data (Figure S1).
Results of single marker analyses are shown in Table 1. In the data from ZNF804A, rs1344706 (associated in the original report in discovery and combined replication samples) is also associated in the ICCSS (one-tailed p=0.0113). The same allele (A) is increased in frequency in cases compared to controls as in the original study. In total, we observe nominal evidence of association between schizophrenia and 4 of the 11 SNPs passing QC criteria. The T allele of rs7597593 yields substantially more significant evidence of association (two-tailed p=0.0013), but this SNP lacks the evidence of possible functional significance observed for rs1344706 (see below). The single marker results from rs7597593 (p=0.0013) remain significant after Bonferroni correction for 10 tests. Allele and genotype counts are shown in Table S1.
No significant difference between cases and controls was observed for rs6490121 in NOS1. Results for rs9922369 in RGRIP1L show only a trend towards significance (one-tailed p=0.0515), with the same allele as originally reported observed to be more common in cases in our study. Allele and genotype counts are shown in Table S1.
Results of haplotype analyses within confidence interval defined blocks are shown in Table 2. Compared to single marker analyses, we observe an increased association in block 3 (containing rs1344706) on a haplotype (frequency 0.188 in controls) bearing the common alleles of rs7605689 (T) and rs3931790 (T), the rare allele of rs7603001 (A) and the previously associated common (A) allele of rs1344706 (Χ2= 8.626, p<0.003). The increased significance is due to a larger case/control difference (5%) in the frequency of this TTAA haplotype compared to the difference for the A allele of rs1344706 (4%).
In a single targeted secondary haplotype analysis, we included the associated SNP rs7597593 with the 4 markers in the confidence interval defined block 3, containing rs1344706. We observe a further increase in association for the haplotype defined by the associated allele of rs7597593 (T) in conjunction with the TTAA haplotype associated above. This comparison provides the largest observed haplotypic frequency difference between cases and controls (5.2%), and is the only haplotypic comparison to remain significant after permutation testing (P=0.017).
In logistic regression tests of SNP × Sex interaction (Table S2), 8 markers in ZNF804A yield nominally significant evidence of association including all 4 that were associated in primary analyses (rs17508595, rs13393273, rs7597593 and rs1344706). The SNP × Sex interaction term was not significant for any marker tested, providing no evidence for differential association depending on the sex of the case.
Based on the single marker results above, we limited FH analyses to the most significantly associated markers, rs7597593 and rs1344706 (Table S3). Using the categorical definitions described in Methods, there are 196 FH+ and 478 conservatively defined FH− cases (no illness reported in first or second degree family members) in the ICCSS sample. A genotype at rs1344706 was missing for 23 individuals with family history (15 FH− and 8 FH+), leaving a total analyzed sample N=651 (1302 alleles), 463 FH− and 188 FH+. There is no evidence of any difference in association of rs1344706 and rs7597593 between FH+ and FH− cases (Table S3).
We assessed gene expression differences for ZNF804A in RNA from the prefrontal cortex from the SMRI SCH and CON samples. We excluded 3 SCH and 4 CON outlier samples, as described above, leaving 32 SCH and 30 CON for analysis. Expression was higher in SCH compared to CON samples, but this difference did not achieve statistical significance (Welch-corrected t=1.640, df=52, p=0.107, Figure 1A).
Inspection of the sequence around SNP rs1344706 shows that it lies in a short region of conserved mammalian sequence in intron 2 of ZNF804A. MatInspector results for the 53 bp of sequence we assessed (chr2:185,486,646-185,486,698) include the A allele of rs1344706 in the predicted binding sites for two brain-expressed transcription factors, Myt1L53,54 and POU3F1/Oct-655, both implicated in multiple CNS developmental processes, particularly in oligodendrocytes54,55. Two additional binding sites are predicted in the presence of the C allele. One of these is for the ubiquitously expressed Homez56 and the other is for Hmx2, which, while expressed in the CNS, is thought to be primarily involved in the development of the inner ear57–59 (Figure S2).
The SMRI postmortem CON samples were genotyped for rs1344706. Allele frequencies in the N=30 CON samples were A: 32/60, 0.533; C: 28/60, 0.467, and were in good agreement with data from the HapMap (A: 0.567; C: 0.433) and Irish (A: 0.607; C: 0.393) samples. Expression is significantly higher for the associated A allele (t=2.129, df=39, p=0.033, Figure 1B). The observed difference was not due to effects of potential confounder variables assessed by ANOVA (age, post-mortem interval, refrigeration interval, brain pH or smoking, F=0.829; df=12; p=0.62).
We studied ZNF804A in a case/control sample appropriate for replication of the original results. First, our sample was drawn from Ireland and should have substantial genetic overlap with the Anglo-Celtic discovery sample. Second, our ascertainment and diagnosis methods were highly homologous with those used to collect the discovery sample. Third, the ICCSS sample has substantial power to detect genetic effects.
Our study provides further support for the association of schizophrenia with common variant alleles and haplotypes in ZNF804A. We replicate the previously reported observation of genetic association between schizophrenia and the A allele of rs1344706. In an extended set of tSNPs, we also observe association with the minor T allele of rs7597593, with a haplotype containing the A allele of rs1344706, and with a haplotype containing the A allele of rs1344706 and the T allele of rs7597593. Only the single marker results from rs7597593 remain significant after Bonferroni correction, and only the results from the longer haplotype including rs7597593 remain significant after permutation testing. LD and haplotypes agree closely with prior observations in other European-descended samples. There is no evidence for differential association with ZNF804A between male and female cases or between cases with and without a positive family history of schizophrenia-like illness.
Although it seems unlikely that SNP rs1344706 directly increases risk for schizophrenia, the association and expression data collectively are consistent with possible functional effects at rs1344706 itself. The A allele shows significantly higher expression than the C allele in control dorsolateral prefrontal cortex. The A allele is associated with schizophrenia, and expression in schizophrenic case dorsolateral prefrontal cortex is increased relative to that in controls, although this last difference does not achieve statistical significance.
Bioinformatic analysis of the source of the mammalian conservation around rs1344706 suggests it may be due to the presence of transcription factor binding sites. The alleles of rs1344706 result in differential prediction of the presence of binding sites for two brain-expressed transcription factors. The Myt1L zinc finger protein is expressed in neural progenitors along with the related family member Myt1 (which is known to modulate proliferation and terminal differentiation of oligodendrocyte progenitors54). Of note, the MYT1L gene was included in a copy number variant observed in an affected individual in a recent study of structural variation in schizophrenia. The POU3F1/Oct-6 POU-domain transcription factor is involved in oligodendrocyte differentiation55 and the transition of pro-myelinating to myelinating Schwann cells60, and is also normally expressed in adult cortex and hippocampus61. The transcription factors predicted to bind to the C allele of rs1344706 are either widely expressed and not brain-specific (Homez) or involved primarily in the development of the inner ear (Hmx2), and so seem potentially less meaningful in this context.
The pattern of association in published GWAS findings for schizophrenia (and particularly their replication) has been highly variable. Although these studies have shown little support for the results of either other GWAS or prior studies, sample sizes in many cases have not been adequate to deliver reasonable power. In this case, although the discovery sample was somewhat small, the replication sample has reasonable power to detect effects in the ranges modeled. Additional results from a number of large, better powered studies and from meta-analyses of more than 10,000 cases and controls (which have substantial power to detect effects of the kind expected in schizophrenia) are expected in the near future. These results will be critical in determining which of the early results are widely supported. In conclusion, however, our results from the present study of ZNF804A and schizophrenia in a large, ethnically homogeneous sample add further support for the association of the A allele at rs1344706 with schizophrenia.
This work was supported by the National Institute of Mental Health Grant R01-MH41953 to KSK/BR. AF was supported by a Department of Veterans Affairs Merit Review award. We thank X. Chen for genotyping macros and sex determination data and the Northern Ireland Blood Transfusion Services for their invaluable assistance in control sampling.
CONFLICT OF INTEREST: The authors declare they have no biomedical financial interests or potential conflicts of interest.
Supplementary information is available at the Molecular Psychiatry website.