To explore the role of genetic variation in DISC1 and 10 of its interaction partners in the etiology of schizophrenia (SZ), we sequenced the coding exons and splice junctions of these genes using massively parallel 454 sequencing of pooled samples. A selection of 80 early-onset SZ patients and 80 control individuals was used as a discovery sample, resulting in the identification of 50 validated variants. These 50 variants were subsequently genotyped in the complete association sample, comprising 486 SZ patients and 514 control individuals recruited from an isolated northern Swedish population.
Six variants showed a statistically significant frequency difference between patients and controls. These include two synonymous variants, NDE1 Y279Y (P = 0.025) and TRAF3IP1 T23T (P = 0.006). Although the functional consequences of silent mutations may not be obvious, recent studies have shown that such variants can modify protein abundance, structure and/or activity through alterations in mRNA stability, splicing or translation kinetics.
The other variants showing a significant effect in two or more of the inheritance models all involve UTR or splice-site mutations, in ATF5 (rs2273890) and NDE1 (rs2075511). Interestingly, ATF5 r.871G>A is located in a potential miRNA target site and may therefore interfere with ATF5 expression and function. However, as none of these effects survived correction for multiple testing, independent replication of these findings will be required.
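The inheritance models are not named in this section; allelic, dominant and recessive comparisons are a common set and are assumed here. The sketch below builds one 2×2 table per model from genotype counts and reports an odds ratio with a Pearson chi-square P value (no continuity correction). This is an assumed test, not necessarily the one used in the study; the helper names and the genotype counts are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def compare_2x2(a, b, c, d):
    """Odds ratio and Pearson chi-square P for the table [[a, b], [c, d]].

    a, b: cases with / without the risk category
    c, d: controls with / without the risk category
    Zero cells are not handled; very rare variants usually need an exact test.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d) / (b * c), chi2_sf_1df(chi2)

def model_tables(cases, controls):
    """2x2 tables under three inheritance models (hypothetical helper).

    cases, controls: genotype counts (n_AA, n_Aa, n_aa), with 'a' the minor allele.
    Each table is (case_with, case_without, control_with, control_without).
    """
    def split(g):
        n_AA, n_Aa, n_aa = g
        return {
            "allelic": (n_Aa + 2 * n_aa, n_Aa + 2 * n_AA),  # minor vs major alleles
            "dominant": (n_Aa + n_aa, n_AA),                # carriers vs non-carriers
            "recessive": (n_aa, n_AA + n_Aa),               # minor homozygotes vs rest
        }
    case_split, control_split = split(cases), split(controls)
    return {m: case_split[m] + control_split[m] for m in case_split}

# Hypothetical genotype counts for 486 patients and 514 controls
tables = model_tables(cases=(300, 150, 36), controls=(350, 140, 24))
for model, table in tables.items():
    odds, p = compare_2x2(*table)
    print(f"{model}: OR = {odds:.2f}, P = {p:.3f}")
```

A variant would be flagged in "two or more models" by counting how many of the three P values fall below the chosen threshold.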
The observed scarcity of statistically significant main effects in our data set does not necessarily rule out involvement of the DISC1 pathway in susceptibility to SZ in our population; it may be at least partly attributable to the relatively high proportion of rare variants, whose frequencies are too low for adequately powered statistical comparisons. Indeed, over 35% of all identified coding variants (18/50) have a MAF below 1%, and 50% (25/50) have a MAF below 2%. Although the frequencies of these rare variants did not differ significantly between patients and controls, they often have odds ratios above 1.5 (or, conversely, below 0.67), and several occur in only one of the two groups. As for DISC1, none of the rare variants identified here overlapped with the 5 ultra-rare cohort-specific variants previously described by Song et al. Of the two rare variants we identified in this gene, one (W160L) was novel, and the other (E751Q) was also reported as rare by Song and colleagues (MAF < 1%). Interestingly, this variant also had an OR of ~2 in their population, consistent with our results.
To understand the potential role of the identified rare variants in SZ etiology, we evaluated the mutation burden (defined as the average number of mutations per person) in patients versus controls. Under a model in which rare mutations increase risk, we would expect a greater burden in patients than in controls. There was no difference in overall mutation burden between the genotyped patients and controls. However, when mutation burden was evaluated not as the total number of variants (including neutral polymorphisms) but within subgroups of variants defined by MAF and variant type, schizophrenia patients were 1.85 times as likely as controls to harbor rare variants (MAF < 0.01) causing amino acid substitutions, including frameshifts (empirical P = 0.018), suggesting a role for these variants in SZ etiology, at least in the northern Swedish population. Although this effect was no longer significant after stringent Bonferroni correction, it pointed us towards an even stronger effect in a subgroup of patients. Indeed, the increased burden of rare missense mutations appears to be related to age at onset: it was most pronounced in patients with an early onset, who carried a 2.75-fold higher burden of rare missense variants than controls (empirical P = 0.0004; Bonferroni-corrected P = 0.0076). This observation is in line with previous clinical, cognitive, genetic and imaging studies indicating that early-onset SZ is associated with, amongst others, greater genetic loading, an overabundance of rare CNVs affecting known genes, and increased neurodevelopmental deviance. These data emphasize the importance of studying subgroups of patients and identifying endophenotypes.
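The section does not say how the empirical P values were obtained; permuting patient/control labels is one standard approach and is assumed in the sketch below. Burden is the mean number of qualifying variants per person, as defined above. Because group sizes stay fixed during permutation, the patients' total variant count rises monotonically with the burden ratio, so it serves as the test statistic without dividing by a control burden that could be zero. The reported values also imply 0.0076 / 0.0004 = 19 Bonferroni tests; that number is inferred from the two P values, not stated in the text. Function names and data are hypothetical.

```python
import random

def burden(counts):
    """Average number of qualifying rare variants per person."""
    return sum(counts) / len(counts)

def burden_ratio(patients, controls):
    """Fold difference in burden, patients relative to controls."""
    return burden(patients) / burden(controls)

def empirical_p(patients, controls, n_perm=10_000, seed=0):
    """One-sided label-permutation P for a higher burden in patients."""
    rng = random.Random(seed)
    observed = sum(patients)          # monotone in the burden ratio
    pooled = list(patients) + list(controls)
    n = len(patients)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)           # reassign case/control labels at random
        if sum(pooled[:n]) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)

def bonferroni(p, n_tests):
    """Bonferroni-adjusted P, capped at 1."""
    return min(1.0, p * n_tests)

# Hypothetical per-person counts of rare missense variants
patients = [1] * 10 + [0] * 90
controls = [1] * 4 + [0] * 96
print(burden_ratio(patients, controls))
print(empirical_p(patients, controls))
print(bonferroni(0.0004, 19))
```

Under this correction, the uncorrected P = 0.018 for the full sample would become 0.342, consistent with the statement that it did not survive Bonferroni correction.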
In addition, our findings support the hypothesis that multiple, individually rare mutations contribute to SZ risk and, given the distribution of these variants across different genes, that such variants may also explain the allelic and locus heterogeneity typically observed in SZ. However, replication in larger samples will be required to substantiate the observed effects.
Detailed analysis of the variants contributing to the increased burden showed that 8 of the 9 rare non-synonymous mutations identified in this study were more abundant in patients than in controls. These 8 mutations are located in DISC1 (2 mutations), PDE4B (1 mutation), ATF5 (1 mutation), TRAF3IP1 (3 mutations) and ZNF365 (1 mutation) (Table S4). Although not statistically significant at the single-gene level, each of these genes shows an approximately twofold increase in mutation burden in patients versus controls (average fold increase 2.24 ± 0.51). After normalizing for the number of coding bases in each gene, DISC1 and … were found to have the highest mutation burden per base (Table S4), and may thus be considered the strongest candidates for more detailed mutation analysis in a larger sample. Indeed, only a subset of our patient population (80 of 486 individuals) was sequenced in this study, so only a fraction of all rare variants present in this population could be detected (see Supporting Text S1). Follow-up sequencing of the candidate genes in the complete patient sample may therefore uncover additional rare (non-synonymous) mutations that further contribute to the observed differences in mutation burden.
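Why sequencing only 80 of 486 patients misses rare variants can be made concrete with a simple sampling calculation: an allele of frequency p is carried at least once among N diploid individuals with probability 1 − (1 − p)^(2N), treating chromosomes as independent draws. The sketch below is an upper bound under the assumption that every allele present in the pool is actually detected; real pooled 454 detection also depends on coverage and variant-calling thresholds. The function name is hypothetical.

```python
def p_sampled(maf, n_individuals):
    """Probability that an allele with frequency `maf` occurs at least once
    among 2 * n_individuals chromosomes drawn independently at random.
    Upper bound on detection: assumes any sampled allele is called."""
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)

for maf in (0.01, 0.005, 0.001):
    print(f"MAF {maf}: 80 patients {p_sampled(maf, 80):.2f}, "
          f"486 patients {p_sampled(maf, 486):.2f}")
```

For a variant with MAF 0.1%, this gives roughly 15% when 80 patients are sequenced versus roughly 62% for all 486, illustrating why follow-up sequencing of the complete sample could reveal additional rare mutations.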
To estimate the potential risk associated with the identified missense variants, we investigated a range of protein structural and functional properties. Rather unexpectedly, none of the 22 identified missense variants was predicted to have a significant effect on any of the properties examined. This lack of predicted effects led us to observe that 8 of the 11 proteins under study contain a remarkably high proportion of intrinsically disordered regions (IDRs). Indeed, all proteins except LIS1, GRB2 and YWHAE were found by DisProt analysis to have ≥40% disordered residues (panel A). Furthermore, ~90% of the identified missense variants were located in these IDRs (Figure S4), whereas neither PAFAH1B1 (LIS1) nor … contained a single missense variant.
IDRs are protein segments that do not fold into a stable structure but remain flexible and unordered. Proteins containing IDRs can adopt different structures upon binding to different targets, and thereby exhibit functional flexibility. Disordered regions have been shown to have important physiological roles, including molecular recognition, cell regulation and signal transduction. It is therefore not surprising that protein disorder is very common in human disease: it is significantly enriched in a wide variety of disease-associated proteins, including those implicated in neurodegenerative disease, cancer, cardiovascular disease and diabetes. Furthermore, IDRs have been shown to be particularly prevalent in hub proteins and interaction networks, where their conformational flexibility is required to accommodate binding to different interaction partners. Our analyses revealed that this is also the case here: the DISC1 pathway proteins exhibit a clearly higher abundance of intrinsically disordered residues than the human proteome, a set of brain-related proteins and a set of schizophrenia-related proteins (P = 0.018, 0.013 and 0.0098, respectively) (panel B). To our knowledge, this has not previously been reported in the literature, and it may offer a new angle on the complex field of psychiatric genetics. While alterations of disordered regions may not directly change protein structure, they can readily interfere with protein function, e.g., by affecting the affinity for interaction partners or by altering the coupled binding-folding mechanism of one of the binding partners. Importantly, intrinsic disorder has been shown to be very sensitive to changes in amino acid sequence: as recently described, disordered regions appear difficult to maintain through evolution (or sequence changes), whereas helices and strands are maintained more easily. Mutations that are neutral with respect to disorder are therefore very unlikely. In a complex network such as the DISC1 pathway, it is thus conceivable that mutations in one of the proteins, or changes in its environment, could reduce its ability to recognize appropriate binding partners and lead to partial or complete collapse of the protein network.
In this study, ~90% of all identified missense variants (including the rare mutations underlying the increased burden in patients versus controls) are located in an IDR. Interestingly, some of the (rare) variants identified in this study fall within known binding regions for one or more of the interaction partners (Figure S3). For example, ATF5 R167C is located in the DISC1-binding region of ATF5; DISC1 E751Q lies within the binding sites for ATF5, LIS1 and PDE4B; and TRAF3IP1 E260K is located in the DISC1-binding region of TRAF3IP1. Although intriguing, these observations should be interpreted with caution: the reported binding regions are often quite large, so no firm conclusions can be drawn. Moreover, because not all binding regions for these interactions have been described in the literature, a complete picture cannot yet be given. Determining whether one or more of these mutations influence protein (or even pathway) function by interfering with key features of IDRs will be one of the major challenges for future work. A first clue about the potential effects of some of the variants may come from amino acid conservation. Based on evolutionary conservation scores generated by 3 different algorithms, three variants were predicted to be possibly damaging: ATF5 R167C, and DISC1 L607F (rs6675281) and S704C (rs821616). Interestingly, two of these variants (DISC1 S704C and L607F) were recently reported to have an actual functional effect. The agreement between the predicted and the already known biological consequences of DISC1 L607F and S704C supports the value of our in silico predictions for other, uncharacterized variants. This is particularly relevant for ATF5 R167C, which was also predicted to be damaging but has not been reported previously. This novel, rare mutation has an odds ratio of 2.6 (95% CI: 0.50–13.46). Further studies of this variant are warranted to clarify its relation to disease.
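An odds ratio and confidence interval such as the one quoted for ATF5 R167C come from a 2×2 table of carriers and non-carriers. The section does not state which interval was used; the sketch below assumes the common Woolf (logit) interval and applies a Haldane-Anscombe correction (adding 0.5 to every cell) when a cell is zero, as happens for variants seen in only one group. The counts are hypothetical and are not meant to reproduce the reported values.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio with a Woolf (logit) confidence interval.

    a, b: carriers / non-carriers among patients
    c, d: carriers / non-carriers among controls
    z = 1.959964 gives a two-sided 95% interval.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction keeps the log odds ratio finite
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

or_, lower, upper = odds_ratio_ci(6, 94, 3, 97)   # hypothetical counts
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With only a handful of carriers, the standard error of the log odds ratio is dominated by the small cells, which is why intervals for rare variants are wide. An interval that spans 1, as 0.50–13.46 does, means the association is not significant at the 5% level.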
To our knowledge, this is the first study to describe a comprehensive resequencing analysis of the DISC1 pathway in schizophrenia. Our results support a model of SZ pathogenesis that includes the effects of multiple rare variants residing in different vulnerable genes, which may in turn be functionally linked in pathways and networks. This model is consistent with the theory presented by Eyre-Walker, which holds that rare alleles should explain most of the variance in complex traits if there is natural selection on the trait. Based on these findings, and as also suggested by McClellan and co-workers, we argue that rare risk alleles may be revealed by research strategies that include extensive resequencing of genes previously shown to be informative (e.g., based on a chromosomal translocation, as for DISC1) and, importantly, of these genes' functional networks.
Assigning potential functional significance to identified variants is a major challenge in genomics research. In this work, a wide array of functional properties was examined to predict possible deleterious effects of the variants. Using these tools, we were able to predict several potential effects on splicing and miRNA target motifs. However, alterations of protein structure or function were hard to detect using standard in silico prediction programs, as most of the proteins encoded by our candidate genes contain large intrinsically disordered regions. Although amino acid conservation analysis may provide a first hint of potential functional effects, it does not tell the whole story, as disorder-based signaling is a complex process that depends on multiple factors, including alterations in protein context, alternative splicing and post-translational modifications. Nevertheless, in our opinion the high prevalence of IDRs in the DISC1 pathway is a fascinating finding in itself, which will hopefully encourage further research into this complex area and provide new clues to the complex etiology of SZ and other psychiatric disorders. Indeed, as evidence accumulates that many important biological functions depend directly on the disordered state, alterations of this disorder may play a crucial role in the pathogenesis of many complex diseases (including SZ), adding another level of complexity to the study of their molecular mechanisms and opening exciting new perspectives for future research.