We have conducted a comprehensive gene-based association study of 327 genes on chromosome 20 in an Irish sample of 270 high-density schizophrenia families. This study sought to identify common variants conferring susceptibility to schizophrenia, following up reported linkage in this sample to clinical subtypes of psychotic illness
[16], as well as previous studies reporting linkage to chromosome 20. Because those clinical subtypes were derived from quantitative symptom dimensions, we also tested for association with these same dimensions. Although traditional single-marker tests failed to identify any SNPs meeting experiment-wide criteria for significance, application of gene-wide association metrics revealed two previously unimplicated loci,
R3HDML and
C20orf39, associated with depressive symptoms. Our findings support the power of gene-based association approaches. They also lend further support to previous evidence suggesting that genetic differences may underlie clinical heterogeneity in schizophrenia
[2],
[3].
One of the aims of this study was to identify genomic loci that predispose to a particular form of illness or that modify clinical presentation among affected individuals. Such genes have been described previously as “modifier” or “susceptibility-modifier” loci and are reviewed elsewhere
[2]. Of the two loci showing the strongest associations, namely
R3HDML and
C20orf39, neither appears to affect the risk of the illness itself. That is, no single variant in either gene met even nominal significance criteria (
P<0.05) for association with narrow, intermediate, or broad diagnoses of schizophrenia. These two genes would therefore fulfill our definition of modifier genes
[2]. However, the strength of evidence we observed for
R3HDML is greater than that observed for
C20orf39.
R3HDML was identified by application of the minimum
P-value approach. Among affected individuals, those carrying the minor allele (G) of the corresponding SNP, rs3761184, had higher mean depression scores. On the other hand, for
C20orf39, empirical significance was attained using the truncated product of
P-values. This makes it more difficult to identify a specific risk genotype, because the truncated product method considers the variation within a gene only jointly. In , it is apparent that those markers contributing to the truncated product for
C20orf39 comprise a block of LD distinct from the surrounding region, with the majority showing association of the minor allele with higher depression scores. Although none of the single-marker associations was individually significant after our permutation procedure, the degree of correlation between the SNPs may have been sufficient to produce an empirically significant association for
C20orf39 as a whole. In order to rule out a spurious gene-wise association due to higher LD, we analyzed a set of permutations using SNPSpD, then compared the distribution of estimated number of independent tests (SNPs) to that obtained for the actual data. If our gene-dropping simulations were found to consistently underestimate the extent of LD between adjacent markers—indicated by a larger number of independent tests—we would expect an inflation of the empiric test-statistic. Alternatively, if the observed LD within simulated datasets tended to overestimate pairwise LD, the corresponding distribution of truncated products would underestimate the empiric test-statistic. For
C20orf39, the observed SNPSpD estimate of ~26 tests did not differ significantly from the null distribution of simulated datasets, suggesting that our gene-dropping procedure faithfully conserved LD structure across our simulations. As discussed, increased gene size, especially in the presence of higher LD between markers, might also contribute to overestimation of the test statistic.
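The two gene-wide statistics discussed above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the truncation point `tau = 0.05`, the function names, and the toy P-values are assumptions made for illustration, and the simulated P-value vectors merely stand in for single-marker results obtained from gene-dropping replicates.

```python
import math


def min_p(p_values):
    """Minimum P-value statistic: the smallest single-marker P-value in the gene."""
    return min(p_values)


def truncated_product(p_values, tau=0.05):
    """Log of the product of all P-values at or below the truncation point tau.

    tau = 0.05 is an assumed truncation point, not a value taken from the study.
    Returns 0.0 (the log of an empty product) when no marker qualifies.
    """
    return sum(math.log(p) for p in p_values if p <= tau)


def empirical_p(observed_stat, null_stats):
    """Share of null statistics at least as small as the observed statistic.

    Both statistics above are 'smaller is more extreme'. The +1 terms count
    the observed data as one draw from the null distribution.
    """
    hits = sum(1 for s in null_stats if s <= observed_stat)
    return (hits + 1) / (len(null_stats) + 1)


def gene_wise_p(observed, simulated, statistic):
    """Gene-wide empirical P-value for one gene.

    observed  : single-marker P-values for the real data
    simulated : one list of single-marker P-values per simulated replicate
    statistic : min_p or truncated_product
    """
    obs = statistic(observed)
    null = [statistic(sim) for sim in simulated]
    return empirical_p(obs, null)


# Toy, made-up values for a hypothetical three-marker gene.
observed = [0.01, 0.02, 0.60]
simulated = [
    [0.50, 0.60, 0.70],
    [0.001, 0.002, 0.90],
    [0.30, 0.04, 0.20],
]
print(gene_wise_p(observed, simulated, min_p))
print(gene_wise_p(observed, simulated, truncated_product))
```

Because every replicate is scored with the same statistic as the real data, correlation among markers is accounted for only to the extent that the simulated replicates reproduce the LD structure of the real data, which is the property checked with SNPSpD above.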
To our knowledge, neither
R3HDML nor
C20orf39 has been functionally characterized to date. Both are predicted genes identified on the basis of domain homology. The
R3HDML locus encodes a putative serine protease inhibitor belonging to the CRISP family of cysteine-rich secretory proteins, and contains evolutionarily conserved exonic and intronic regions bearing greater than 90% similarity to Rhesus macaque
[32]. Interspersed within the conserved intronic sequences are numerous stretches of simple tandem repeats (e.g. (CG)n). Our SNP of interest in
R3HDML, rs3761184, falls just upstream (<50 bp) of the second exon and 150 bp downstream of one such repeat-rich region. Roles in fertilization, spermatogenesis, and pathogen response have all been proposed for CRISP proteins, but these mechanisms are not immediately supportive of
R3HDML as a schizophrenia candidate gene. However, the recent implication of a number of HLA genes in large-scale GWAS suggests that genes involved in immune-related mechanisms, such as pathogen response, could be reasonable schizophrenia candidates
[8]. The presence of specific sequence features in the vicinity of the associated SNP may warrant more thorough bioinformatic inquiry. Additionally,
R3HDML lies approximately 57 kb downstream of the
GDAP1L1 locus, which appears to encode a glutathione S-transferase (GST). Cell-culture studies have demonstrated a relationship between glutathione deficiency and oxidative stress, mechanisms frequently purported to contribute to schizophrenia pathophysiology
[33],
[34]. However,
GDAP1L1 was not significantly associated.
Our empirically significant finding for C20orf39 presents additional challenges for interpretation, given its provisional status as an “open reading frame”. Provisionally known as TMEM90B, this locus encodes a predicted transmembrane protein. Of 33 SNPs assayed within C20orf39, the nine included in the truncated product bounded a region of LD corresponding to the coding region of C20orf39. The upstream, untranslated region of C20orf39, which itself corresponds to a distinct set of ESTs, yielded no SNPs meeting local significance criteria. Whether the markers driving this association simply lie in joint linkage disequilibrium with nearby causal variation, or actually demarcate an etiologically relevant genomic region, is unknown.
Depressive symptoms, especially suicidal ideation, comprise a considerable portion of morbidity and mortality in schizophrenia
[35]. Therefore, follow-up of these two genes could be important in the search for clues to more successful identification and treatment of this clinical dimension.
As demonstrated by Moskvina
et al., polymorphisms mapping to functional elements are more likely to be associated with complex disease than intergenic variation
[31]. Despite ongoing annotation and characterization of functional elements, however, our knowledge of genomic variation, functional or otherwise, remains incomplete. This is exemplified by
C20orf39 and
R3HDML, which are novel and unannotated.
A major benefit of gene-based approaches is that they are robust to allelic and haplotypic heterogeneity across samples. This makes them particularly suited for use in replication and meta-analysis. In traditional replication of single-marker associations, the associated SNP in the discovery sample is usually assayed in all subsequent replication samples. This could inflate Type-II error in the presence of population differences in haplotype structure and allele frequencies
[36]. Complex patterns of association, whether spurious or due to genetic heterogeneity, have been the rule rather than the exception in candidate gene studies of complex disease, as demonstrated by studies of
DTNBP1 [14],
[15]. For discovery-based approaches, adoption of a gene-based strategy may be of even more immediate benefit, specifically by providing a straightforward means of multiple-test correction. Furthermore, traditional methods to correct for multiple testing, such as Bonferroni correction or the less overtly conservative SNPSpD method, may be less robust in detecting small genetic effects. However, in spite of the advantages of gene-based association studies, intergenic causative variants or variants in unrecognized genes might have been missed in this study.
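To make the contrast between the two traditional corrections concrete, the following Python sketch computes the effective number of independent tests from the eigenvalue-variance formula on which SNPSpD is based, Meff = 1 + (M − 1)(1 − Var(λ)/M), and compares the resulting threshold with Bonferroni. It is an illustration only: the function names and the toy correlation matrix are assumptions, and the variance of the eigenvalues is obtained from a trace identity rather than from an explicit eigendecomposition as the SNPSpD program performs.

```python
def effective_tests(corr):
    """Effective number of independent tests for an M x M SNP correlation matrix.

    corr is a list of lists with ones on the diagonal. The eigenvalues of a
    correlation matrix sum to M (mean 1) and their squares sum to the sum of
    squared matrix entries, so their sample variance is
    (sum of squared entries - M) / (M - 1).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_lambda / m)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test threshold under a Sidak-type correction; n_tests may be fractional."""
    return 1 - (1 - alpha) ** (1 / n_tests)


# Toy example (assumed values): two SNPs in complete LD plus one independent SNP.
toy_corr = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
m_eff = effective_tests(toy_corr)
print(m_eff)
print(bonferroni_threshold(0.05, len(toy_corr)))
print(sidak_threshold(0.05, m_eff))
```

When markers are correlated, the effective number of tests falls below the number of markers and the per-test threshold is correspondingly less stringent than Bonferroni, but both remain fixed single-marker thresholds and do not pool evidence across a gene.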
Given the poor spatial resolution of linkage and intrinsic differences between these methodologies, we are currently unable to fully relate our association findings to the results of our previously published linkage study of latent classes. However, it is notable that
R3HDML is located in a region which was linked to the “deficit syndrome” latent class, for which members were substantially more likely to fall below the median for depressive symptoms. Despite failing to demonstrate any evidence of association with a diagnosis of schizophrenia,
R3HDML may be associated with a disease subtype characterized by low levels of depression. Because subtyping precludes use of our full sample for association analysis, statistical power is insufficient to test this hypothesis. Other methods aiming to identify more clinically homogeneous subgroups have been applied to linkage analysis of schizophrenia. In a study of 168 affected sibling pairs, Hamshere and colleagues
[37] demonstrated that inclusion of major depression as a covariate yielded suggestive evidence of linkage at 20q11.21, while schizophrenia as a whole did not. Taken together, these studies are compelling in their support of 20q11 harboring genes relevant to the affective component of schizophrenia. Emerging evidence supports a role for genetic variants conferring risk of both schizophrenia and bipolar disorder
[8],
[38]. Furthermore, genome scans of both disorders have consistently implicated regions of chromosome 20
[39]–
[44]. A recent study of 383 bipolar or schizoaffective relative pairs found suggestive linkage at 20q13.31 when conditioning on the presence of mood-incongruent psychosis, furthering the argument that chromosome 20 loci may have relevance to conditions containing admixtures of mood and psychotic symptoms
[45].
The findings presented here provide additional support to published findings suggesting that schizophrenia modifier loci may exist on chromosome 20 and, more generally, that genetic differences underlie clinical heterogeneity in schizophrenia
[46]. We await replication of the observed associations between these loci and either categorically defined illness or more or less distinct subtypes or clinical dimensions. There are two main limitations relevant to this study. First, the truncated product of
P-values is particularly sensitive to patterns of LD (unpublished results), since markers could be significant only due to their LD with other significant markers. Applied to family-based analysis of extended pedigrees, the validity of gene-based testing relies on the permutation method realistically maintaining LD across simulated datasets. As discussed, for
C20orf39, the LD structure for a random sample of simulated datasets did not differ significantly from the actual data (
P>0.05). Second, our analysis of multiple symptom dimensions may increase the Type-I error rate due to multiple testing. However, as we have previously shown, these dimensions are correlated
[47], making Bonferroni correction overly conservative. It remains unclear whether the failure of traditional approaches to detect experiment-wide significant loci reflects the spurious nature of these findings or simply the limited power of this sample. Ultimately, the genotype-phenotype correlations reported herein require confirmation in independent samples for which comparable symptom measures are available. We are unaware of other family-based schizophrenia samples in which OPCRIT data are readily available. However, this is likely to be attempted in case-control samples by the Psychiatric GWAS Consortium Cross-Disorders Group
[13].