|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: LNM DG JDF. Performed the experiments: LNM SDZ ASL. Analyzed the data: LNM JR IMC KVS JDF. Contributed reagents/materials/analysis tools: PDR MVDB WG AN LGN KFN RA. Wrote the paper: LNM PDR JR KVS JDF. Designed the software used in analysis: PDR.
In recent years, DISC1 has emerged as one of the most credible and best supported candidate genes for schizophrenia and related neuropsychiatric disorders. Furthermore, increasing evidence – both genetic and functional – indicates that many of its protein interaction partners are also involved in the development of these diseases. In this study, we applied a pooled sample 454 sequencing strategy, to explore the contribution of genetic variation in DISC1 and 10 of its interaction partners (ATF5, Grb2, FEZ1, LIS-1, PDE4B, NDE1, NDEL1, TRAF3IP1, YWHAE, and ZNF365) to schizophrenia susceptibility in an isolated northern Swedish population. Mutation burden analysis of the identified variants in a population of 486 SZ patients and 514 control individuals, revealed that non-synonymous rare variants with a MAF<0.01 were significantly more present in patients compared to controls (8.64% versus 4.7%, P=0.018), providing further evidence for the involvement of DISC1 and some of its interaction partners in psychiatric disorders. This increased burden of rare missense variants was even more striking in a subgroup of early onset patients (12.9% versus 4.7%, P=0.0004), highlighting the importance of studying subgroups of patients and identifying endophenotypes. Upon investigation of the potential functional effects associated with the identified missense variants, we found that ~90% of these variants reside in intrinsically disordered protein regions. The observed increase in mutation burden in patients provides further support for the role of the DISC1 pathway in schizophrenia. Furthermore, this study presents the first evidence supporting the involvement of mutations within intrinsically disordered protein regions in the pathogenesis of psychiatric disorders. As many important biological functions depend directly on the disordered state, alteration of this disorder in key pathways may represent an intriguing new disease mechanism for schizophrenia and related neuropsychiatric diseases. Further research into this unexplored domain will be required to elucidate the role of the identified variants in schizophrenia etiology.
Schizophrenia (SZ) is a severe psychiatric disorder affecting ~1% of the population worldwide. The disease is characterized by positive symptoms, including hallucinations, delusions and disturbances in thoughts, as well as negative symptoms such as lack of motivation and attention, asocial behavior and cognitive dysfunction. With a heritability of ~60–80% , SZ has a clear genetic component, but despite major efforts during the last decades to identify genetic risk factors, only a handful of candidate genes could be replicated independently – and even a smaller number demonstrated clear biological support.
One recent and promising exception is Disrupted in Schizophrenia 1 (DISC1), which was originally identified via a balanced t(1;11) chromosomal translocation segregating with a wide spectrum of psychiatric disorders in a large Scottish pedigree . Since its discovery, several independent linkage and association studies in diverse populations and various phenotype models – including SZ, bipolar disorder, major depression, as well as various neurophysiological, cognitive and structural traits – have confirmed the original findings, further supporting a central role for DISC1 genetic variation in conferring susceptibility to psychiatric illness – .
Nevertheless, except for the translocation no specific causal variant has yet been identified, , and most compelling evidence so far has come from functional genomic and cell biological analyses, suggesting an essential role of DISC1 in neuronal development, including adult neurogenesis and signaling –.
Research into DISC1's multiple interaction partners has proved very valuable in elucidating its biological functions, further implicating it in essential processes of brain development and adult neuronal function , , –. With over 15 in vivo confirmed protein interactors , –, and many more potential interactions (>60, identified by yeast two-hybrid screens) , –, DISC1 is considered a central ‘hub’ protein, connecting numerous functional systems, including neuronal migration, neurite outgrowth, cytoskeletal modulation and cAMP signaling, within the brain , . The observation that several of these interaction partners have been identified as independent (genetic) susceptibility factors for neuropsychiatric diseases , and the potential crosstalk with known schizophrenia risk factors such as dysbindin and neuregulin, which share multiple putative binding partners with DISC1 , indicate that not just DISC1, but a multidimensional ‘DISC1 pathway’ is involved in the etiology of psychiatric diseases.
At present, however, little is known on the role of genetic variation in the individual DISC1 interactors, or combinations of the different interaction partners, in the development of these disorders. Therefore, detailed studies of the various components of the DISC1 pathway will be essential to fully understand the role of this complex molecular network in conferring susceptibility to psychiatric diseases. Studying a set of candidate genes in convergent molecular risk networks – like the DISC1 pathway – is an attractive strategy, as it allows to investigate the cumulative effects of (moderately deleterious) mutations that, because of their molecular connection, could cause genetic impairment of pathway activity and thereby lead to phenotypic effects. Hereby, it is of particular importance to search for rare mutations (with a low population frequency (<1%), and a relatively high penetrance), as it has been suggested that these variants, or combinations thereof, could explain a substantial fraction of common disorders like SZ –.
Of course, identifying rare variants requires genotyping large populations of individuals, which is costly and time-consuming, especially when one seeks to study large candidate gene sets typically involved in complex diseases. An interesting approach to minimize cost and time, is through sample pooling that in combination with massively parallel sequencing allows the identification and simultaneous quantification of (rare) variants in multiple individuals –.
In this study, we have used multiplex PCR combined with next generation sequencing of pooled DNA, to explore the mutation burden in DISC1 and 10 of its interaction partners (ATF5, GRB2, FEZ1, LIS-1 (encoded by PAFAH1B1), PDE4B, NDE1, NDEL1, TRAF3IP1, YWHAE, and ZNF365) in schizophrenia patients versus control individuals. The candidate genes were first selected based on their convincingly reported interaction with DISC1. Next, both some more established/studied interactors – having (suggestive) evidence for involvement in psychiatric illness – and some new, potentially interesting candidates were included (Table S1) , .
454 sequence analysis was performed on 4 DNA pools and on 1 individual patient DNA sample. The average number of mapped reads per amplicon was comparable between patient and control pools (1356 versus 1368 reads/amplicon, respectively (~34 reads/individual)), justifying a comparison between the patient and control sequencing data.
The number of reads was homogeneously distributed over the different amplicon pools, except for multiplex reaction 12, where a lower number of reads was obtained (Figure S1 and Text S1, Results & Discussion).
To evaluate the performance of our pooling approach we compared the minor allele frequencies of the variants in the pooled samples (as determined by GS-FLX sequencing) with their actual frequencies for each of the validated variants. The observed and predicted frequencies correlated very well (R2=0.98) across a wide range of frequencies, demonstrating the accuracy of our DNA sample pooling (Figure S2). In addition, the incidence of false negatives was estimated by Sanger sequencing of a representative subset of amplicons (Table S3). Results indicated that the occurrence of type II errors was negligible as long as sequence coverage was sufficient (≥500 reads/amplicon). More details and additional performance measures of our experimental platform are provided in Supporting Text S1 (Results section).
Manual curation of the variants using NovoSNP 4.0, resulted in a total of 110 potential variants with a frequency ≥0.8% (in the discovery sample), of which 61 were located in the coding and untranslated regions of the target genes. These 61 variants were further validated by SNP genotyping in the complete association sample, (comprising 486 unrelated SZ patients and 514 unrelated control individuals) (57 variants), or Sanger sequencing in the original subject population (80 patient and 80 control samples) (4 variants), resulting in a final set of 50 confirmed coding and UTR variants (Table 1).
Variant frequencies of the 50 confirmed variants are listed in Table 1.
The genotype distribution of variant PAFAH1B1 r.1250C>T (rs6628) was not in HWE, and was omitted from further statistical analyses.
At the allelic level, only 1 variant, NDE1 p.Y279Y (=rs17283846), showed a significant difference between patients and control individuals, exhibiting a higher frequency in the latter (Table 1; allelic OR=0.60 (95% CI: 0.39–0.92), p=0.016). At the genotypic level, 6 statistically significant effects were found, the strongest of which was observed for TRAF3IP1 p.T23T (=rs13398676), showing a higher proportion of MAF homozygotes in patients versus controls (Table 1; OR=2.07 (95% CI: 1.2–3.5), p=0.008). However, none of these effects passed Bonferroni correction.
Interestingly, it was observed that large part of the identified coding variants represented rare mutations, with over 35% of the variants (18/50) having a MAF below 1%, and 50% (25/50) having a MAF smaller than 2%. Some of these variants were uniquely present in patients (DISC1 p.E834E, =rs41271517) or in controls (NDEL1 p.P324S; ZNF365 p.L318L).
The overall variation burden (i.e. comprising all 49 variants in HWE) did not differ between the 486 patients and 514 control individuals genotyped, neither in the complete set of candidate genes (p=0.75), nor on an individual gene basis (smallest p=0.42). However, when the variants were stratified based on MAF and variant type, several interesting effects were observed (Table 2 and and3).3). It was found that the rare mutations (MAF <0.01) were more common in the patient population compared to the controls (1.24-fold increase, P=0.246). Though not statistically significant for the assembled rare mutations (including missense, silent and UTR variants), this increased burden became significant when only the missense mutations were considered (1.85-fold increase at MAF<0.01, P=0.018). This effect gradually diminished when variants with a higher MAF were included (1.28-fold increase at MAF<0.02, P=0.112; 1.10-fold increase at MAF<0.05, P=0.479). The observed result, however, did not remain statistically significant after Bonferroni correction (16 tests, corrected P=0.29).
Yet, upon further stratification of the data according to disease onset age, we found that the observed increase in rare missense mutation burden may not be a general effect in our SZ sample, but seems to be related to the age at disease onset, being most pronounced in patients with a young onset age.
More specifically, the patient population was stratified into three groups: early-onset (≤20) (N=163), medium-onset (21 - <35) (N=266) and late-onset (35 - <60) (N=57) patients, and the burden of rare missense mutations (MAF <1%) was investigated in these groups. It was found that the early-onset patients had a particularly high burden of these rare missense variants: in this subset of patients, we found 12.9 mutant alleles per 100 individuals, versus 8.6 in the complete SZ sample and 4.7 in controls (SZ/co ratio =2.75; P=0.0004). This burden decreased to 7.9 mutant alleles per 100 individuals in the medium onset group (SZ/co ratio =1.69; P=0.076), and was zero in the late-onset group (0 mutant alleles/100 individuals; P=0.15).
The observed increased burden of rare missense mutations in early onset patients also remains significant after Bonferroni correction (19 tests, corrected P=0.0076).
For all variants, the degree of nucleotide conservation was assessed using the GERP (Genome Evolutionary Rate Profiling) score. Missense and silent mutations were further examined for potential splicing defects. Several potential effects on splicing were predicted, with varying consistencies across the different matrices (Table 4 and and5).5). These include the disruption of several potential ESE sites, and the creation of two potential new splicing motifs in DISC1 (one donor and one acceptor site), which were predicted by both of the splice-site prediction algorithms used. Neither of these two alternative splice variants corresponded to the splice isoforms described by Nakata and co-workers. . The strongest evidence – i.e., corresponding to the highest number of predictions, across multiple matrices – was found for TRAF3IP K295N (=rs12464423) (a common variant; OR=1.05, 95% CI: 0.87–1.28), and FEZ1 E358Q, a rare variant which is slightly more present in control individuals compared to patients (OR=0.64, 95% CI:0.30–1.40).
The variants causing amino acid substitutions were further evaluated by analyzing evolutionary amino acid conservation using SIFT, PolyPhen and Panther. The resulting conservation scores indicated that the majority of identified amino acid substitutions in our data set had either little or no functional effect (scored as neutral by at least two these algorithms), or required more homologous sequence data in order to make reliable predictions. Two variants however, were concordantly predicted to be potentially damaging: ATF5 p.R167C (SIFT, damaging; PolyPhen, probably damaging; Panther, possibly damaging) and DISC1 p.S704C (=rs821616) (all three algorithms: possibly damaging). In addition, one variant (DISC1 p.L607F) (=rs6675281) was predicted to be possibly damaging by SIFT and Panther, but scored as benign by PolyPhen (Table 4).
The missense mutations were also examined for putative disruptions of specific structural and functional properties using various predictors in the SNPeffect toolsuite (Reumers et al., 2008). Single amino acid replacements can affect the protein's structure and dynamics, they can disrupt functional sites or affect the cellular processing of the protein. None of the missense variants detected in this study caused significant changes in any of the properties examined. The lack of three dimensional structures of the proteins in the DISC1 pathway, or even structural models of close homologs, hindered a detailed analysis of the influence of the mutations on protein stability.
However, this scarcity of structural models is probably due to a high occurrence of intrinsically disordered regions in the proteins under study. Indeed, DisProt analysis showed that all proteins except LIS1 (encoded by PAFAH1B1), GRB2 and YWHAE have ≥40% disordered residues (Figure 1) (Peng et al., 2006). The missense variants were overrepresented in proteins with a high content of unstructured regions, with 19 of the 22 identified missense mutations (86%) residing in an unstructured region of the protein, and only 3 mutations (DISC1 L607F (=rs6675281), DISC1 S704C (=rs821616) and ZNF365 A26T (=rs7076156)) lying outside such a region (Figure S4). Notably, none of the 3 more ‘structured’ proteins contained missense mutations.
Finally, all UTR variants were analyzed for interference with predicted transcription factor binding sites, miRNA target sites. We did not find evidence for mutations affecting transcription factor binding sites, but we did identify three potential miRNA target site mutations, in the 3′ UTR of ATF5 and DISC1 (Table 6).
To explore the role of genetic variation in DISC1 and 10 of its interaction partners in the etiology of schizophrenia, we sequenced the coding exons and splice junctions of the genes using massively parallel 454 sequencing in pooled samples. A selection of 80 early onset SZ patients and 80 control individuals was used as a discovery sample, resulting in the identification of 50 validated variants (Table 1). These 50 variants were subsequently genotyped in the complete association samples comprising 486 SZ patients and 514 control individuals recruited from an isolated northern Swedish population.
Six variants were found with a statistically significant frequency difference between patients and controls. These include two synonymous mutations, NDE1 Y279Y (=rs17283846) (p=0.025) and TRAF3IP1 T23T (=rs13398676) (p=0.006). Although the functional consequences of silent mutations may not be obvious, recent studies have shown that these variants might modify protein abundance, structure and/or activity via alterations in mRNA stability , splicing , or translation kinetics. The other variants showing a significant effect in two or more of the inheritance models all involve UTR or splice site mutations: ATF5 r.871G>A, DISC1 g.169488T>C (=rs2273890) and NDE1 r.1041A>C (=rs2075511). Interestingly, ATF5 r.871G>A is located in a potential miRNA target site, and may therefore interfere with ATF5 expression and function. However, as none of these effects survived multiple testing correction, further independent replication of these findings will be required.
The observed scarcity of statistically significant main effects in our data set does not necessarily rule out the involvement of the DISC1 pathway in the susceptibility for SZ in our population, but may (at least partly) be attributed to the relatively high occurrence of rare variants, having frequencies too low to be able to run adequate statistical comparisons. Indeed, over 35% of all identified coding variants (18/50) have a MAF below 1%, and 50% (25/50) were present at a MAF smaller than 2%. Though the frequencies of these rare variants were not significantly different between patients and controls, they often have odds ratios higher than 1.5 (resp. lower than 0.67), and several are unique in one or the other group (Tables 4, ,5,5, ,6).6). As for DISC1, none of rare variants identified here overlapped with the 5 ultra-rare cohort-specific variants previously described by Song et al. . Of the two rare variants we identified in this gene, one (W160L) was completely novel, and the other (E751Q) was also reported as rare by Song and colleagues (MAF<1%). Interestingly, this variant also had an OR of ~2 in their population, analogous with our results (Table 4).
To understand the potential role of the identified rare variants in SZ etiology, we evaluated the mutation burden (defined as the average number of mutations per person) in patients versus controls. Under a model in which rare mutations increase risk, we would expect to observe a greater burden in patients compared to control individuals. There was no difference in overall mutation burden between the genotyped patients and control individuals. Yet, when mutation burden was defined not simply as the total number of variants – including neutral polymorphisms – but evaluated as subgroups of variants (based on MAF and variant type), we found that schizophrenia patients were 1.85 times as likely as controls to harbor rare variants (MAF<0.01) causing amino acid substitutions (including frameshifts) (empirical P=0.018, Table 2), indicating a role for these variants in SZ etiology in at least the northern Swedish population. Though this effect was no longer significant after stringent Bonferroni correction, it pointed us towards an even stronger effect in a subgroup of patients. Indeed, the observed increased burden of rare missense mutations seems to be related to the age at disease onset, being most pronounced in patients with a young onset age, having a 2.75-fold higher burden of rare missens variants compared to controls (empirical P=0.0004, Bonferroni corrected P=0.0076) (Table 3). This observation is in line with previous clinical, cognitive, genetic and imaging studies, implicating that early onset SZ is associated with greater genetic loading –, overabundance of rare CNVs impacting on known genes  and increased neurodevelopmental deviance , amongst others. These data emphasize the importance of studying subgroups of patients and identifying endophenotypes.
In addition, our findings support the hypothesis that multiple, individually rare mutations contribute to SZ risk  and, given the distribution of the variants across different genes, also explain the allelic and locus heterogeneity typically observed in SZ. Replication of our findings in larger sample sets will however be required to further substantiate the observed effects.
Detailed analysis of the variants contributing to the increased burden, showed that 8 out of 9 identified rare non-synonymous mutations in this study had an increased abundance in patients versus controls. These 8 mutations are located in DISC1 (2 mutations), PDE4B (1 mutation), ATF5 (1 mutation), TRAF3IP1 (3 mutations) and ZNF365 (1 mutation) (Table S4). Though not statistically significant on a single gene level, each of these genes causes an individual increase in mutation burden with a factor ~2 in patients versus controls (average fold increase 2.24±0.51). Taking into account the number of coding bases in these genes, DISC1 and ATF5 were found to have highest mutation burden per base (Table S4), and may thus be considered the strongest candidates for further detailed mutation analysis in a larger sample. Indeed, only a subset of our patient population (80 individuals out of 486) was sequenced in this study, enabling the detection of merely a fraction of all rare variants present in this population (see Supporting Text S1, Discussion section). Follow-up sequencing of the candidate genes in the complete patient sample may therefore uncover other rare (non-synonymous) mutations, possibly further contributing to the observed differences in mutation burden.
In order to estimate the potential risk associated with the identified missense variants, a range of protein structural and functional properties was investigated. Rather unexpectedly, we found that none of the 22 identified missense variants caused any significant effect on the various properties examined. This absence led us to the observation that 8 of 11 proteins under study showed a remarkably high occurrence of intrinsically disordered regions (IDRs). Indeed, all proteins except LIS1, GRB2 and YWHAE were found to have ≥40% of disordered residues by DisProt analysis (Figure 1, panel A). Furthermore, we observed that ~90% of the identified missense variants were located in these IDRs (Figure S4), while neither PAFAH1B1, GRB2, nor YWHAE contained a single missense variant.
IDRs are segments of proteins that do not definitively fold and remain flexible and unordered. These proteins take up different structures upon binding to different targets, and thereby exhibit functional flexibility –. Disordered regions of proteins have been shown to have important physiological roles, including molecular recognition, cell regulation and signal transduction . It is therefore not surprising that protein disorder turns out to be very common in human diseases – being significantly enriched in a wide variety of disease-associated proteins, including neurodegenerative disease, cancer, cardiovascular disease and diabetes –. Furthermore, it has been shown that IDRs are particularly prevalent in hub proteins and interaction networks, where their conformational flexibility is required to accommodate binding between the different interaction partners –. Interestingly, our analyses revealed that this is also the case here, with the DISC1 pathway proteins clearly exhibiting a higher abundance of intrinsically disordered residues, compared to the human proteome , as well as a set of brain and schizophrenia-related protein sequences (p=0.018; 0.013 and 0.0098, respectively) (Figure 1, panel B). This is an exciting new insight, which – to our knowledge – has never been reported in the literature, and may provide a new boost to the complex research field of psychiatric genetics. While alterations of disordered regions may not directly cause changes in protein structure, they are very well capable of interfering with the function of proteins, e.g. by affecting the affinity for interaction with other proteins, or altering the coupled binding-folding mechanism of (one of) the binding partners. Importantly, it has been shown that intrinsically disorder is very sensitive to changes in amino acid sequence; as recently described –, maintaining disordered regions through evolution (or sequence changes) appears very difficult, whereas helices and strands are maintained more easily. Neutral mutations with respect to disorder are therefore very unlikely –. Certainly in a complex network, such as the DISC1 pathway, it is very well conceivable that mutations and/or changes in one of the proteins or its environment could reduce its ability to recognize appropriate binding partners and lead to partial or complete collapse of the protein network.
In this study, ~90% of all identified missense variants (including the rare mutations underlying the increased burden in patients versus controls) are located in an IDR. Interestingly, some of the (rare) variants indentified in this study fall into known binding regions on one or more of the interactors (Figure S3). E.g., ATF5 R167C is located in the DISC1 binding region of this protein; DISC E751Q resides in the binding sites for ATF5, LIS1 and PDE4B; and TRAF3IP1 E260K is located in the DISC1 binding region of this protein. Although these observations are certainly very intriguing, they should be regarded with some caution, as the reported binding regions between the different interactors are often quite large, hence no clear conclusions can be drawn from them. Moreover, as not all of the binding regions for the different interactions have been described in literature, it is impossible to give a complete picture of this. The question whether one (or more) of these mutations might influence protein (or even pathway) function, by interfering with any of the key features associated with IDRs, will be one of the major challenges for future work. A first clue about potential effects of some of the variants may be provided by their amino acid conservation (Tables 4, ,5,5, ,6).6). Based on evolutionary conservation scores generated by 3 different algorithms, we found that three variants were predicted to be possibly damaging: ATF5 R167C, DISC1 L607F (=rs6675281) and S704C (=rs821616). Interestingly, two of these variants (DISC1 S704C and L607F) were recently reported to have an actual functional effect –. The fact that the predicted outcome for DISC1 L607F and S704C corresponds to already known biological consequences greatly underlines the value of our in silico predictions, also for other, unknown variants. This is especially interesting as to ATF5 R167C, which was also predicted to be damaging, but not previously reported. This variant corresponds to a novel, rare mutation, having an odds ratio of 2.6 (95% CI: 0.50–13.46). Further studies of this variant are warranted to clarify its relation to disease.
To our knowledge, this study is the first describing a comprehensive resequencing analysis of the DISC1 pathway in schizophrenia. Our results provide support for a model of SZ pathogenesis that includes the effects of multiple rare variants, residing in different vulnerable genes, which may in turn be functionally linked into pathways and networks. This model is consistent with the theory presented by Eyre-Walker , stating that rare alleles should explain most of the variance in complex traits if there is natural selection for the trait. Based on these findings, and as also suggested by McClellan and co-workers , we argue that rare risk alleles may be revealed by research strategies including extensive resequencing of genes previously shown to be informative (e.g. based on a chromosomal translocation, such as DISC1) and, importantly, these genes' functional network.
Assigning potential functional significance to identified variants is a major challenge in genomics research. In this work, a wide array of functional properties was examined to predict possible deleterious effects of the variants. Using these tools, we were able to predict several potential effects on splicing and miRNA target motifs. Yet, alterations of protein structure or function were hard to track down using standard in silico prediction programs, as a majority of the proteins encoded by our candidate genes contain large regions of intrinsically disordered residues. Though amino acid conservation analysis may provide a first hint of potential functional effects, it does not tell the whole story, as disorder-based signaling is a complex process, depending on multiple factors including alterations in protein context, alternative splicing and post-translational modifications , . However, in our opinion, this high prevalence of IDRs in the DISC1 pathway is a very fascinating finding in se, hopefully encouraging further research into this complex area, and providing new clues to our understanding of the complex etiology of SZ and other (psychiatric) disorders. Indeed, as an increasing amount of evidence is beginning to emerge that many important biological functions depend directly on the disordered state, alteration of this disorder may play a crucial role in the pathogenicity of many complex diseases (including SZ), thereby adding another level of complexity to the study of their molecular mechanism, and providing exciting new perspectives for future research.
In a first phase of the study we used DNA of 80 SZ patients (41 females, 39 males) and 80 control individuals (40 females, 40 males) for 454 sequencing based variant discovery. These individuals were selected on the basis of their early age at disease-onset (18.55±3.36), from a larger association sample consisting of 486 unrelated SZ patients (180 females, 306 males) and 514 unrelated control individuals (275 females, 239 males). All originated from a geographically isolated population living in the county of Västerbotten in Northern Sweden. They were all Caucasians and none were of Finnish, Norwegian or Lappish descent. All patients fulfilled the DSM-IV criteria for SZ . The mean age at disease-onset in the complete SZ sample was 24.8 (±7.3) years and the mean age at inclusion 53.1 (±15.1) years (see Supporting Text S1 (Materials and Methods section) for additional information regarding the ascertainment and assessment procedure of the patients).
The control population had a mean age of 58.0 (±13.0) years at inclusion. They originated from the same geographical area as the patients and were randomly selected from the Betula study, described in detail elsewhere (http://www.betula.su.se/en/) –. None of the controls were reported to have a diagnosis of schizophrenia based on studies of psychiatric records and/or an interview.
All participants gave written informed consent, and the study was approved by the regional Medical Ethical Committees of the universities of Umeå and Antwerp.
The patient-control sample was controlled for population stratification by the genotyping of 37 microsatellite (STR) markers via the use of standard genotyping and scoring methods. Statistical tests for population stratification were performed using the program STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html). No population substructure was observed in the association sample (data not shown).
Genomic DNA was extracted from peripheral blood using standard methods.
4 DNA pools were prepared (2 ‘patient pools’ and 2 ‘control pools’), each comprising 40 DNA samples. Hereto, an equal amount of each sample (100 ng per individual) was combined, and the resulting DNA pool was adjusted to a final concentration of 10 ng/µL.
To control the efficiency of DNA pooling, the relative abundance of 3 SNP alleles was measured by pyrosequencing and compared to the allele frequencies of the individual samples constituting the pools (Supporting Text S1 (Materials and Methods section) and Table S2).
Multiplex PCR assays were designed to amplify all coding exons and splice junctions of the 11 selected genes (totaling ~16 kb target sequence). The target sequence was covered by 155 amplicons with an average length of ~221 bp, resulting in ~34 kb of sequence. The 155 amplicons were amplified in 12 multiplex PCR reactions. Simplex PCR reactions of the amplicons showed that all except two of the primer pairs (both in ATF5) amplified the correct fragment (conversion rate =98.7%). The two failed ATF5 amplicons were omitted from further experiments. Multiplex PCR reactions were performed for each DNA pool and 1 individual patient DNA sample, also contained in one of the patient pools (Supporting Text S1 (Materials and Methods section).
Each multiplex PCR reaction was purified on a QIAquick PCR Purification column (Qiagen GmbH, Hilden, Germany), and the concentration of the eluates measured using a Nanodrop spectrophotometer (NanoDrop Technologies, Wilmington, DE). Finally, for each DNA pool (and the individual sample), the 12 purified multiplex PCR products were mixed, taking into account the concentration of each multiplex reaction and its number of constituent amplicons, to obtain an equal representation of every amplicon in the final PCR mixtures.
The final mix of 155 amplicons of each sample was sequenced using the standard amplicon sequencing protocol on a 454 GS-FLX genome sequencer (Roche Applied Science) according to the manufacturer's instructions. For each of the 4 pools, 1 lane of a 2-lane Bead Loading gasket on a 70×75 mm PicoTiterPlate was loaded, and sequenced from both directions. The individual DNA sample was sequenced using 1 lane of a 16-lane Bead Loading gasket.
The generated standard 454 flow files were analyzed using NovoSNP 4.0 (beta), an in-house developed software program for the identification of variants in resequencing experiments. In short, NovoSNP 4.0 uses the quality and height of the flow at the variation position and the neighboring flows for SNP identification. Further, it takes into account the number and ratio of reads showing the variant – thereby allowing for the analysis of pooled sequencing data – and favors variants seen in both directions. Finally, the program creates a database of all identified variants, for which the flows can be visually inspected (De Rijk P., personal communication).
All variants with a frequency ≥0.8% were examined, allowing for a secure cutoff level to detect singleton variants, which theoretically have a frequency of 1.25% in a pool of 40 samples.
Finally, all potential variants (except 4) were validated using iPLEX SNP genotyping in the complete association sample (486 patients and 514 control individuals). For technical reasons, 4 variants were genotyped by traditional Sanger-based sequencing in the original subject population (80 patient and 80 control samples) (Supporting Text S1 (Materials and Methods)).
gPlink version 2.050  (http://pngu.mgh.harvard.edu/purcell/plink/) was used to calculate genotype deviation from Hardy-Weinberg equilibrium (HWE), by an exact test , and to compare individual allele and genotype frequencies between patients and controls, by a standard χ2 test for independence.
Differences in mutation burden (defined as the average number of variant alleles per individual) between patients and controls were assessed by two-sided t-tests, using SPSS version 16.0.2 (Brussels, Belgium). The data were thereby stratified according to type (missense, silent and UTR variants) and MAF (<0.01, <0.02 and <0.05, respectively). Empirical p-values were generated using the max(T) permutation approach, based on 100000 replicates. The level of significance for all statistical tests was 0.05. When correcting for multiple testing, Bonferroni corrective measures were taken to control false positive rates. All association analyses were performed on the complete sample (i.e. including the discovery samples). Contrary to the use of patient samples only for variant discovery, inclusion of an equally large control sample in the discovery phase, does not lead to an inflation of type I errors  (see Supporting Text S1, Discussion section).
To investigate potential deleterious effects associated with the identified variants, each variant was subjected to a battery of in silico analyses, including assessment of nucleotide and amino acid conservation, effects on potential splice sites and cis-acting elements, potential disruption of miRNA target sites and predicted transcription factor binding sites, and alterations of functional and structural properties of the proteins. Finally, sequences where also examined for intrinsically disordered regions using DisProt (http://www.ist.temple.edu/disprot/Predictors.html). A detailed description of the applied methods is provided in Supporting Text S1.
Boxplot showing the distribution of the number of reads per amplicon in each multiplex PCR reaction, for control and patient pools. The observed read count is uniformly distributed across the different amplicon pools (with an average of 1362 reads/amplicon), except for multiplex reaction 12. °: outliers (values between 1.5 and 3x the interquartile range from either end of the box) *: extreme outliers (values more than 3x the interquartile range from either end of the box).
Allele frequencies estimated from pooled DNA samples (as determined by GS-FLX sequencing) versus the actual frequencies (as determined by genotyping the individual samples) in the different pools. °: outlier (corresponding to rs13398676, in SZ pool 1).
Overview of the known interaction domains between the different proteins investigated, along with the positions of the variants identified in this study. Protein lengths are given between brackets. Binding sites between two proteins are indicated along the line connecting them, with the binding site(s) on a certain protein closest to that protein (orange: binding sites on DISC1, blue: binding site on other proteins). The identified missense mutations are shown in a white area within the proteins' oval. Rare missense mutations (MAF ≤1%) are underlined, and mutations located in one of the binding sites are shown in italic. Note that the positions of many of the binding sites were not described/found in literature (indicated with ‘?’).
Schematic representation of the overall domain architecture of each of the proteins investigated, highlighting the regions of predicted disorder (orange), along with regions having known homologous domains (blue). On each protein, the identified missense mutations are indicated. Mutations lying outside a disordered region are marked with an extra line.
DISC1 interaction partners included in the study, and evidence for their involvement in psychiatric disease.
Allele frequencies of pooled DNA and individual samples, as determined by pyrosequencing.
Properties of amplicon subset selected for false negative rate estimation using Sanger sequencing.
Mutation burden of identified rare nonsynonymous mutations (MAF<1%), stratified by gene.
Supplementary Materials and Methods, Results and Discussion.
We are grateful to the patients and family members for their kind cooperation in this study and to the personnel of the Flanders Institute for Biotechnology (VIB) Genetic Service Facility (www.vibgeneticservicefacility.be). Research nurse Eva Lundberg is thankfully acknowledged for her help and expertise. Dr. Del-Favero takes responsibility for the integrity of the data and the accuracy of the data analysis and declares that all authors had full access to all the data in the study.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This research was supported by the Special Research Fund of the University of Antwerp, the Fund for Scientific Research Flanders (FWO-F), the Institute for Science and Technology–Flanders (IWT-F), the Swedish Research Council (grants 2006-4472 and 2009-5269), the Medical Faculty, Umeå University and the County Councils of Västerbotten and Norrbotten, Sweden. LNM was supported by a postdoctoral fellowship from IWT-F, and MVDB holds a PhD fellowship of the IWT-F. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.