We have used a custom 1,536-SNP array to interrogate 94 functionally relevant candidate genes for schizophrenia and identify associations with 12 heritable neurophysiological and neurocognitive endophenotypes collected as part of the Consortium on the Genetics of Schizophrenia (COGS).
Variance-component association analyses of 534 genotyped subjects from 130 families were conducted using Merlin. A novel bootstrap Total Significance Test was also developed to overcome the limitations of existing genomic multiple testing methods and robustly demonstrate the presence of significant associations in the context of complex family data and possible population stratification effects.
Associations were observed for 46 genes of potential functional significance, with 3 SNPs at p<10−4, 27 SNPs at p<10−3, and 147 SNPs at p<0.01. The bootstrap analyses confirmed that the 47 SNP-endophenotype combinations with the strongest evidence of association significantly exceeded the number expected by chance alone (p=0.001), with 93% of these findings expected to be true. Many of the genes interact on a molecular level, and eight genes (e.g., NRG1 and ERBB4) displayed evidence for pleiotropy, revealing associations with four or more endophenotypes. Our results collectively support a strong role for genes related to glutamate signaling in mediating schizophrenia susceptibility.
This study supports the use of relevant endophenotypes and the bootstrap Total Significance Test for the identification of genetic variation underlying the etiology of schizophrenia. In addition, the observation of extensive pleiotropy for some genes and singular associations for others in our data suggests alternative, independent pathways mediating pathogenesis in the “group of schizophrenias”.
Genetic factors clearly play a substantial role in the etiology of schizophrenia, as evidenced by family and twin studies that indicate a heritability of up to 80% for the disorder (1,2). Although a number of replicated linkages have been reported, implicating multiple chromosomal regions (3-5), none of these linkage findings has led to cloning of causative genes for schizophrenia. Several neurobiologically plausible candidate genes have, however, been identified (6-7).
An alternative strategy to linkage and other agnostic analysis methods that may aid in the genetic dissection of complex diseases is to interrogate candidate genes thought to be associated with both the qualitative diagnostic category and quantitative endo- or intermediate phenotypes. This neurobiologically informed strategy utilizes existing knowledge of the underlying neural substrates of the disorder and may be particularly informative in unraveling the genetic architecture of schizophrenia. As part of the Consortium on the Genetics of Schizophrenia (COGS; 8-10), we constructed a custom SNP array containing 1,536 SNPs in 94 genes of relevance to schizophrenia and related phenotypes. We utilized information regarding putatively important neurobiological systems, as well as an extensive review of published linkage, association, and model organism studies, to identify and rank genes in terms of their level of importance in understanding schizophrenia. The resulting COGS SNP Chip provides excellent coverage of many previously suggested candidate genes for schizophrenia, including AKT1, CHRNA7, COMT, DAO, DAOA, DISC1, DTNBP1, ERBB4, GRM3, GSK3B, NOS1AP, NRG1, PAFAH1B1, PPP3CC, PRODH, RELN, and RGS4 (6-7), as well as several novel genes from putatively important pathways.
We utilized the COGS SNP Chip to evaluate the genetic associations of these 94 candidate genes with 12 heritable neurophysiological and neurocognitive endophenotypes that have been shown to be characteristically impaired in schizophrenia: prepulse inhibition of the startle response, P50 suppression, the antisaccade task for eye movements, the Continuous Performance Test (degraded stimulus version), the Letter-Number Span, the California Verbal Learning Test, and six measures from the Pennsylvania Computerized Neurocognitive Battery (abstraction and mental flexibility, face memory, spatial memory, spatial processing, sensori-motor dexterity, and emotion recognition). Our goal was not only to identify singular genetic associations with the COGS endophenotypes but also to assess the degree of pleiotropy (genetic associations with multiple endophenotypes). Since genes exhibiting pleiotropic effects across several endophenotypes may have far-reaching neurobehavioral implications, these genes may be optimal candidates to serve as biomarkers for the early identification of, and intervention for, schizophrenia in at-risk populations, as well as targets for treatment via novel pharmaceutical and psychosocial therapies. To confirm the collective significance of our findings, we also developed a novel multiple testing strategy, the bootstrap Total Significance Test, which overcomes some of the limitations of similar methods currently used in genomics.
Families were ascertained through probands, identified at each of the seven COGS sites, who met the DSM-IV-TR criteria for schizophrenia based on the administration of the Diagnostic Interview for Genetic Studies (11) and the Family Interview for Genetic Studies (12). The minimal requirement for pedigree ascertainment was a schizophrenia proband, both parents, and at least one unaffected sibling. This sampling strategy provides greater potential for phenotypic contrasts between and among the siblings for quantitative genetic analyses. Additional affected and unaffected siblings were collected whenever possible. All subjects were within the a priori age range of 18-65 years and received urine toxicology screens for drugs of abuse before phenotyping (negative screens were required). The ascertainment and screening procedures, inclusion/exclusion criteria, and descriptive statistics of the sample are discussed in detail elsewhere (10). After a detailed description of study participation, written informed consent was obtained from each subject per local IRB protocols.
Each subject in the COGS sample was assessed for 12 endophenotypes, as described in detail elsewhere (14-15), all of which have been shown to be heritable (13). Prepulse inhibition was measured as the percent inhibition of the startle reflex in response to a weak prestimulus using a 60 msec prepulse to startle stimulus interval (16-18). P50 suppression was measured as the difference between the amplitudes of the P50 event-related potentials generated in response to the conditioning and test stimuli, which are presented with a 500 msec interstimulus interval (19-20). Although the ratio is the more commonly used measure of P50 suppression, we have found the difference score to be more heritable in our COGS family sample (13). The “overlap” antisaccade task of oculomotor inhibition, which requires subjects to fixate on a central target and respond to a peripheral cue by looking in the opposite direction at the same distance, was measured as the ratio of correct antisaccades to total interpretable saccades (21-22). The degraded stimulus version of the Continuous Performance Test (continuous performance), a widely used measure of deficits in sustained, focused attention with a high perceptual load, was scored as the sensitivity index d′, derived from correct target detections and incorrect responses to nontargets (23-24). The Letter-Number Span, a prototypical task for assessing working memory storage with manipulation, was measured as the correct reordering of intermixed numbers and letters. For the assessment of verbal learning and memory, we used the California Verbal Learning Test, Second Edition (verbal learning), an established list-learning test measured as the total recall score for a list of 16 verbally presented items summed over 5 trials.
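The scoring rules above can be sketched as a few small functions. This is a minimal illustration, not the COGS scoring code: the function names are hypothetical, raw inputs (startle magnitudes, P50 amplitudes, saccade and response counts) are assumed to be already extracted, prepulse inhibition uses the conventional percent reduction relative to pulse-alone trials, and d′ applies an assumed log-linear correction so that hit or false-alarm rates of 0 or 1 do not produce infinite z-values.

```python
from statistics import NormalDist


def prepulse_inhibition_pct(pulse_alone: float, prepulse_pulse: float) -> float:
    """Percent reduction in startle magnitude when the pulse is preceded by a prepulse."""
    return 100.0 * (pulse_alone - prepulse_pulse) / pulse_alone


def p50_difference(conditioning_amp: float, test_amp: float) -> float:
    """Difference score: conditioning (S1) amplitude minus test (S2) amplitude."""
    return conditioning_amp - test_amp


def p50_ratio(conditioning_amp: float, test_amp: float) -> float:
    """The more common ratio score, test / conditioning, shown for comparison."""
    return test_amp / conditioning_amp


def antisaccade_score(correct: int, interpretable: int) -> float:
    """Proportion of correct antisaccades among all interpretable saccades."""
    return correct / interpretable


def d_prime(hits: int, n_targets: int, false_alarms: int, n_nontargets: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each total) is applied
    to every rate so that perfect or zero rates still give finite z-values.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical raw values for one subject
print(prepulse_inhibition_pct(pulse_alone=200.0, prepulse_pulse=80.0))
print(p50_difference(5.0, 2.0), p50_ratio(5.0, 2.0))
print(antisaccade_score(30, 40))
print(round(d_prime(hits=90, n_targets=100, false_alarms=10, n_nontargets=100), 2))
```

The difference and ratio scores are shown side by side because the text notes that the difference score proved more heritable in this sample.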
We also employed a modified version of the University of Pennsylvania Computerized Neurocognitive Battery, excluding measures of attention and verbal and working memory, which were assessed as detailed above (25-26). The six measures evaluated using this battery are as follows: 1) The test for abstraction and mental flexibility (abstraction) presents four objects from which the subject must choose the one that does not belong. 2) An assessment of face memory requires subjects to recognize 20 previously presented target faces amongst 20 distracter faces. 3) The assessment of spatial memory uses Euclidean shapes as learning stimuli in a recognition paradigm identical to that used for face memory. 4) For an assessment of spatial processing, two lines are presented at an angle, and the corresponding lines must be identified on a simultaneously presented array. 5) The assessment of sensori-motor dexterity requires the subject to use the mouse to click as quickly as possible on a target that becomes progressively smaller. 6) An assessment of emotion recognition involves the correct identification of a variety of facial expressions of emotion. Each of these tests was scored as “efficiency”, calculated as accuracy/log10(speed) and expressed as standard equivalents (z-scores).
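The efficiency score described above can be sketched as follows. This assumes that speed is a median response time in milliseconds and that z-scores are computed against the mean and standard deviation of whatever reference group is passed in; the text specifies neither detail, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def efficiency(accuracy: float, speed_ms: float) -> float:
    """Raw efficiency = accuracy / log10(speed); slower responses lower the score."""
    if speed_ms <= 1:
        raise ValueError("speed must exceed 1 ms so that log10(speed) is positive")
    return accuracy / math.log10(speed_ms)


def to_z_scores(values: list[float]) -> list[float]:
    """Express scores as z-scores relative to the mean and SD of the supplied group."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical subjects: (number correct out of 20, median response time in ms)
subjects = [(18, 1500.0), (15, 2200.0), (20, 1300.0), (12, 2600.0)]
raw = [efficiency(acc, rt) for acc, rt in subjects]
print([round(z, 2) for z in to_z_scores(raw)])
```

Dividing by the logarithm of speed rewards accurate responding while penalizing slowness only gradually, so a modest difference in response time does not dominate the score.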
Genes of interest were identified and ranked in terms of their level of importance in understanding schizophrenia based on complementary information from a number of research domains: 1) linkage and association studies of schizophrenia and related phenotypes; 2) model organism, gene expression, and brain imaging studies; and 3) genetic networks and biological pathways of relevance to schizophrenia. We mined public databases for general information about the genes and polymorphic variants of interest, including haplotype-tagging and potentially functional SNPs and those with previous reports of association with schizophrenia or related phenotypes. These data were then combined and compared with the list of SNPs available from Illumina, Inc., to select the final 1,536 SNPs in 94 genes. A total of 1,417 haplotype-tagging SNPs obtained from the TAMAL website (Technology And Money Are Limiting, 27) were selected from Caucasian HapMap populations (28) to efficiently interrogate 86 of the genes with an r² threshold of 0.8 in our primarily (89%) Caucasian sample. We included 5 kb of flanking sequence on either side of each gene to capture nearby regulatory elements in the tagged regions. The TAGGER SNP selection algorithm (29), in aggressive tagging mode with all coding SNPs forced into the model, was used to select tagging SNPs for 76 of the genes. The Gabriel SNP selection algorithm (30), in pair-wise tagging mode, was used to select tagging SNPs for an additional ten genes to achieve sufficient gene coverage with the available SNPs. A combination of gene-spanning and putatively associated SNPs was used for the remaining eight genes because suitable tagging SNPs were not available. For CHRNA7, SNPs were selected within exons/introns 1-4 only, as the remainder of the gene cannot be screened due to a partial duplication, CHRFAM7A (31).
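To illustrate the pairwise r² criterion, the following is a generic greedy pairwise tagger: it forces a set of SNPs into the tag list (analogous to forcing coding SNPs into the model), then repeatedly adds the SNP that captures the most not-yet-captured SNPs at r² ≥ 0.8. It is a simplified sketch, not the TAGGER, Gabriel, or TAMAL implementation (TAGGER's aggressive mode, for example, also considers multimarker predictors), and the SNP identifiers and r² values in the usage lines are hypothetical.

```python
def r2_between(r2: dict, a: str, b: str) -> float:
    """Look up pairwise r^2 in either key order; a SNP tags itself perfectly."""
    if a == b:
        return 1.0
    return r2.get((a, b), r2.get((b, a), 0.0))


def greedy_pairwise_tags(snps, r2, forced=(), threshold=0.8):
    """Greedy pairwise tag selection at r^2 >= threshold (illustrative sketch).

    snps: SNP identifiers in the region; r2: {(snp_a, snp_b): r^2};
    forced: SNPs that must be tags (e.g. coding SNPs).
    """
    def captured_by(tag):
        return {s for s in snps if r2_between(r2, tag, s) >= threshold}

    tags = list(forced)
    uncaptured = set(snps)
    for tag in tags:
        uncaptured -= captured_by(tag)
    while uncaptured:
        # max() returns the first SNP (in input order) with the largest gain
        best = max(snps, key=lambda s: len(captured_by(s) & uncaptured))
        tags.append(best)
        uncaptured -= captured_by(best)
    return tags


# Hypothetical SNP IDs and r^2 values
ld = {("rs1", "rs2"): 0.90, ("rs2", "rs3"): 0.85, ("rs1", "rs3"): 0.50, ("rs3", "rs4"): 0.20}
print(greedy_pairwise_tags(["rs1", "rs2", "rs3", "rs4"], ld))
print(greedy_pairwise_tags(["rs1", "rs2", "rs3", "rs4"], ld, forced=("rs1",)))
```

Forcing a SNP in can increase the number of tags needed, which is the price of guaranteeing that coding variants are genotyped directly.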
The custom array includes 109 SNPs in 33 genes with reported evidence of association, 29 coding sequence variants in 17 genes (25 nonsynonymous and 4 synonymous), and 18 SNPs located in putative promoter regions or transcription factor binding sites. On average, there is 1 SNP per 10 kb for each gene, with variation due to linkage disequilibrium patterns, SNP availability, and similar factors. Minor allele frequencies for these SNPs ranged from 0.01 to 0.50, with an average of 0.23. The complete list of all 1,536 SNPs and 94 candidate genes included on the COGS SNP Chip, together with the supporting details from our selection process, is available in Supplemental Table 1, including rs numbers, chromosomal locations, gene information, designation of SNPs (e.g., as tagging, coding, putatively functional, or associated, including p-values and references), relevant sequence information, and minor allele frequencies for the four HapMap populations. Ingenuity Pathway Analysis (Ingenuity® Systems) was used to investigate the molecular interactions between the included genes and to provide information regarding pathway membership.
A sample of 534 subjects from 130 families was selected for genotyping based on the availability of locally collected blood at five of the seven COGS sites. For each family, both endophenotype data and DNA were available for the schizophrenia proband and at least one unaffected sibling, for a total of 130 sibling pairs discordant for schizophrenia. DNA was also available for 217 parents, 130 of whom were phenotyped. An additional 73 phenotyped siblings were included across the family set as well, 57 of whom were also genotyped and six of whom had schizophrenia. On average, cleaned endophenotype data were available for 370 (±41) subjects. This sample has >80% power to detect SNPs explaining 3% of the variance at p<0.01, 4% of the variance at p<10−3, and 5.5% of the variance at p<10−4.
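For intuition about how power scales with variance explained and significance level, the sketch below uses a deliberately crude approximation that treats the subjects as unrelated: the noncentrality parameter of a 1-df test is n·q²/(1 − q²), where q² is the fraction of variance explained. This is not the calculation behind the figures quoted above, which account for family structure, so it should not be expected to reproduce them exactly; the function name is hypothetical.

```python
from statistics import NormalDist


def approx_power(n: int, variance_explained: float, alpha: float) -> float:
    """Approximate power of a 1-df test for a SNP explaining a fraction of trait variance.

    Treats the n subjects as unrelated: the noncentrality parameter is
    n * q2 / (1 - q2), and power = P(chi2_1(ncp) > critical value), evaluated
    through the equivalent two-sided normal test.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (n * variance_explained / (1 - variance_explained)) ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Scenarios quoted in the text, with n set to the average of 370 phenotyped subjects
for q2, alpha in [(0.03, 1e-2), (0.04, 1e-3), (0.055, 1e-4)]:
    print(q2, alpha, round(approx_power(370, q2, alpha), 3))
```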
Genotyping was performed by the Biomedical Genomics Laboratory at the University of California, San Diego, using an Illumina BeadStation 500 Scanner and 20μl of genomic DNA at 50ng/μl plated on 96-well plates with three positive controls per plate. Genotype data were cleaned using Illumina’s BeadStudio v.3 software. Each subject was evaluated across all 1,536 SNPs, and 6 subjects were excluded for having poor allele call rates, defined as an average call rate <80% and a median genotype call score <0.76. Each SNP was then evaluated across all remaining subjects, and 38 SNPs were excluded for having average call rates <90% and cluster separation scores <0.05. Another 95 SNPs were eliminated following a manual examination of all SNPs with call rates >90% but cluster separation scores between 0.05 and 0.25. A total of 133 SNPs were thus removed, resulting in a 91.3% SNP assay conversion rate. An additional 0.03% of the genotypes were removed due to Mendelian inconsistencies. The final group of 1,403 passing SNPs had a genotype call rate of 99.98% (749,052 genotypes called out of a possible 749,202). Accuracy estimated from 72 replicate DNA samples genotyped across the panel indicated a 99.98% reproducibility rate (100,139 identical genotypes out of a possible 100,163). Further quality control assessments using the PLINK analysis toolset (32) identified 15 SNPs with minor allele frequencies <0.01 in the unrelated individuals (i.e., parents) and 3 SNPs with Hardy-Weinberg Equilibrium p-values <10−4. Removal of these additional SNPs resulted in the final 1,385 SNPs with minor allele frequencies approximating those observed in the Caucasian HapMap sample.
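The SNP-level filters above can be expressed as a simple predicate. This sketch assumes genotypes coded as minor-allele counts (0/1/2, with None for a missing call) and applies the call-rate, minor allele frequency, and Hardy-Weinberg thresholds quoted in the text. It omits BeadStudio's cluster separation score, which has no simple stand-alone equivalent here. PLINK's Hardy-Weinberg filter uses an exact test by default; for brevity this sketch substitutes the 1-df chi-square approximation, and computing the Hardy-Weinberg test in founders (parents) is an assumption. Function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls (None marks a missing genotype)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chisq_p(genotypes):
    """1-df chi-square Hardy-Weinberg p-value (a stand-in for PLINK's exact test)."""
    obs = [0, 0, 0]
    for g in genotypes:
        if g is not None:
            obs[g] += 1
    n = sum(obs)
    p = (2 * obs[0] + obs[1]) / (2 * n)  # frequency of the allele not counted by the code
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no test possible
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df


def passes_snp_qc(all_calls, founder_calls, min_call=0.90, min_maf=0.01, min_hwe_p=1e-4):
    """Keep a SNP only if it clears all three thresholds quoted in the text."""
    return (call_rate(all_calls) >= min_call
            and minor_allele_freq(founder_calls) >= min_maf
            and hwe_chisq_p(founder_calls) >= min_hwe_p)


good = [0] * 25 + [1] * 50 + [2] * 25
no_hets = [0] * 50 + [2] * 50
print(passes_snp_qc(good, good))        # True: common, fully called, in HWE
print(passes_snp_qc(no_hets, no_hets))  # False: no heterozygotes, fails HWE
```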
Multidimensional scaling, as implemented in PLINK, was used to assess the degree of population stratification in this sample and to validate the self-reported subject ancestries, which are not always reliable. These results suggested that subjects of Caucasian ancestry formed the largest and most genetically homogeneous group, encompassing 89% of the sample. Although the remaining subjects reported varying degrees of Hispanic, Native American, Asian, and African American ancestry, most generally clustered with the Caucasian group. To further evaluate the effects of the observed admixture, the first two principal components from the multidimensional scaling analysis were investigated as covariates, along with age and sex, via heritability analyses of the endophenotypes in SOLAR (33). All factors found to be significant covariates were incorporated in the subsequent association analyses. Bivariate genetic correlations between endophenotypes were also explored. The heritability estimates and genetic correlations obtained in these analyses were very similar to those we have previously published in a larger sample that includes the current sample, and are not reiterated here (13). Schizophrenia was not included as a covariate in these analyses, since that would effectively remove the part of the gene-endophenotype association specifically related to schizophrenia. With schizophrenia as a covariate, the analysis could not reveal significant SNP associations with an endophenotype perfectly correlated with schizophrenia status, no matter the causal pathway between genotype and endophenotype.
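A minimal sketch of classical (metric) multidimensional scaling, the kind of analysis applied to a pairwise identity-by-state (IBS) distance matrix for stratification checks. Assumptions: genotypes are 0/1/2 minor-allele counts with None for missing calls, distance is taken as 1 minus the proportion of alleles shared IBS, and the leading eigenvectors are found by plain power iteration with deflation, which is adequate for small illustrative matrices but is not how PLINK computes them. The function names are hypothetical.

```python
import math
import random


def ibs_distance(a, b):
    """1 - IBS similarity between two people, over SNPs called in both (0/1/2 codes)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(abs(x - y) for x, y in pairs) / (2 * len(pairs))


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical MDS: double-centre the squared distances, then take the top-k
    eigenvectors by power iteration with deflation (assumes those eigenvalues
    are positive and dominate in magnitude)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


genotypes = [[0, 1, 2, 1], [0, 1, 2, 2], [2, 1, 0, 0], [2, 2, 0, 0]]  # hypothetical
d = [[ibs_distance(a, b) for b in genotypes] for a in genotypes]
print(classical_mds(d, k=2))
```

Each subject's leading coordinates can then be carried forward as ancestry covariates, as was done with the first principal component in the association analyses.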
We employed the variance-component association module of the Merlin software package (v.1.1.2; 34) to assess the degree of association between the 1,385 SNPs and 12 endophenotypes in the 130 families. The association analyses were adjusted for age (all except P50 suppression), sex (prepulse inhibition, P50 suppression, verbal learning, spatial processing, emotion recognition), and ancestry, represented by the first principal component from the multidimensional scaling analysis (antisaccade, continuous performance, verbal learning, spatial memory, spatial processing). A secondary analysis was also performed in Merlin to assess the independence of multiple associations by using the most significant SNP as a covariate, thereby decreasing the significance of any other SNPs in linkage disequilibrium with it. Signals that remained at p<0.01 after this conditioning were considered independent. For comparison purposes, the DFAM module of PLINK was used to perform a discordant sibpair analysis of schizophrenia with the 1,361 autosomal SNPs. The effective number of independent SNPs tested, accounting for redundancies in linkage disequilibrium due to the inclusion of putatively functional and/or associated SNPs along with tagging SNPs and gene-spanning SNPs, was determined to be 977, with a corresponding Bonferroni correction for multiple comparisons of p=5×10−5 (35) for a given endophenotype and 4.2×10−6 for all 12 endophenotypes. The latter threshold is very conservative because of the observed between-endophenotype correlations, which complicate exact adjustment across endophenotypes. The multiple testing issue is further addressed by the Total Significance Test below.
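The Bonferroni thresholds above follow from dividing 0.05 by the effective number of tests. The value 977 is taken from the text; how it was estimated (35) is not reproduced here. The arithmetic gives about 5.1×10−5 per endophenotype and about 4.3×10−6 across all 12, close to the rounded figures quoted.

```python
ALPHA = 0.05
M_EFF = 977           # effective number of independent SNPs reported in the text
N_ENDOPHENOTYPES = 12

per_endophenotype = ALPHA / M_EFF
all_endophenotypes = ALPHA / (M_EFF * N_ENDOPHENOTYPES)
print(f"{per_endophenotype:.2e}")   # 5.12e-05
print(f"{all_endophenotypes:.2e}")  # 4.26e-06
```

Multiplying by 12 treats the endophenotypes as independent, which is why the across-endophenotype threshold is described as very conservative.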
To test whether the observed genotype-endophenotype associations significantly exceed what would be seen by chance given that there are 16,620 total tests (1,385 SNPs and 12 endophenotypes), we developed and implemented a separate, novel multiple testing strategy, the bootstrap Total Significance Test. Our strategy introduces two innovations that together overcome several limitations of existing genomic multiple testing methods.
First, we base our approach on bootstrap sampling instead of permutation sampling. The bootstrap works in settings where permutation tests cannot be applied or can be applied only with difficulty (36). Bootstrapping allows straightforward handling of family data, even with complex patterns of missing data. In contrast, permutation procedures for family data are difficult to construct and do not use all of the information in the data if genetic variants drive phenotypic differences between families (36). Bootstrapping also handles confounding variables easily, whereas permutation tests do not, because the confounder may be associated with both predictor and outcome under the null hypothesis. Importantly, this problem arises whenever covariates are included to adjust for cryptic population stratification. Bootstrapping will also work when the goal is to test an interaction in the presence of main effects.
Second, we introduce the concept of a Total Significance Test to determine whether the strongest genotype-endophenotype associations are more extreme than expected by chance alone. The Total Significance Test provides a rigorous statistical p-value that applies collectively to the strongest results in the data but is less conservative than standard p-value adjustments for multiple tests. It is also less dependent than other multiple testing methods on extremely small p-values, which are difficult to obtain with moderate sample sizes and may, even in large samples, be due more to rare sampling events or statistical flukes than to replicable biological findings. Lastly, we use the bootstrap Total Significance Test results to provide a posterior predictive value for each genotype-endophenotype association, giving a measure of how likely each detected association is to be true. When the factors above are not present and bootstrapping is not required, the Total Significance Test can also be based on permutation sampling.
We implemented the bootstrap Total Significance Test in MATLAB. Specifically, we first applied a multiple regression model to the original data for each of the 1,385 × 12 SNP-endophenotype combinations, with the endophenotype as the dependent variable and the SNP (coded as the number of copies of the minor allele) and relevant covariates (those used in the variance-component Merlin analyses) as independent predictors. Thus, the multiple regression model used in the Total Significance Test was identical to the variance-component model used in the Merlin analyses, with the sole exception that it did not model the within-family correlation structure. For each SNP-endophenotype combination in the original data, we calculated a z-statistic, Z=(B-0)/s, where B is the estimated regression coefficient corresponding to the SNP and s is its estimated standard error in the multiple regression. The value 0 is the expected value of B under the null hypothesis of no SNP-endophenotype association.
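The per-combination z-statistic can be sketched with ordinary least squares, which, like the model described above, ignores within-family correlation. This pure-Python version solves the normal equations with a small Gauss-Jordan inverse; the function names and the toy data are hypothetical, and in practice a numerical library would be used.

```python
import math


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def snp_z_statistic(y, snp, covariates):
    """Z = (B - 0) / s for the SNP term in y ~ 1 + snp + covariates (OLS).

    y: endophenotype values; snp: minor-allele counts (0/1/2);
    covariates: list of covariate columns (e.g. age), each the same length as y.
    """
    x = [[1.0, g, *(c[i] for c in covariates)] for i, g in enumerate(snp)]
    n, p = len(x), len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(beta[a] * x[i][a] for a in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)
    se = math.sqrt(sigma2 * xtx_inv[1][1])  # standard error of the SNP coefficient
    return (beta[1] - 0.0) / se             # 0 = value of B under the null


# Hypothetical data: 30 subjects, SNP effect 0.8, age covariate, alternating noise
snp = [i % 3 for i in range(30)]
age = [20.0 + i for i in range(30)]
y = [1.0 + 0.8 * g + 0.05 * a + (0.3 if i % 2 == 0 else -0.3)
     for i, (g, a) in enumerate(zip(snp, age))]
z = snp_z_statistic(y, snp, [age])
print(round(z, 2))
```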
We then simulated the same statistics under the null hypothesis by generating one group of 10,000 random bootstrap data sets (the training group) and a second group of 1,000 identically generated bootstrap data sets (the test group), following standard bootstrap theory for clustered data (37). To create each bootstrap data set, we randomly selected families from the original set of 130 families with replacement (i.e., each draw was made from all 130 families, regardless of whether a family had already been selected). Any one of the original families can therefore appear multiple times in a single bootstrap data set or not at all. Each time a family appears, all data associated with it are placed in the new data set, including all family members, their covariates, endophenotypes, and genotypes. No data are rearranged, as they would be in a permutation test. For each bootstrap data set, z-statistics were calculated as Z*=(B*-B)/s*, where B* and s* are the regression coefficient and standard error calculated from the multiple regression applied to the bootstrap data. For a bootstrap sample, the true value of the coefficient is the value estimated in the original data set, so B replaces 0 in the formula for Z. The bootstrap Z*s thus provide an empirical estimate of the joint distribution of the Zs in the original data set under the null hypothesis of no association. This is an application of standard bootstrap theory that is used, for example, in construction of the bootstrap-t confidence interval (37). The bootstrap automatically incorporates both the empirically observed family-level correlation structure and the inter-SNP correlations due to linkage disequilibrium, without relying on the assumption of a multivariate normal distribution.
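The family-level resampling and the centred statistic Z* = (B* − B)/s* can be sketched as follows. For brevity this sketch regresses the endophenotype on the SNP alone (no covariates), represents each family as a list of (genotype, endophenotype) tuples, and skips any degenerate resample in which the slope or its standard error cannot be computed; all of these are simplifications, not features of the study's MATLAB code. Function names and the toy families are hypothetical.

```python
import math
import random


def slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    rss = sum((c - my - b * (a - mx)) ** 2 for a, c in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx)


def family_bootstrap_z(families, n_boot, seed=0):
    """Resample whole families with replacement and return (B, [Z*, ...]).

    families: list of families, each a list of (genotype, endophenotype) tuples.
    """
    members = [m for fam in families for m in fam]
    b_obs, _ = slope_and_se([g for g, _ in members], [p for _, p in members])
    rng = random.Random(seed)
    z_star = []
    for _ in range(n_boot):
        drawn = rng.choices(families, k=len(families))  # a family may repeat or be absent
        sample = [m for fam in drawn for m in fam]      # members stay together, unpermuted
        try:
            b, se = slope_and_se([g for g, _ in sample], [p for _, p in sample])
            z_star.append((b - b_obs) / se)  # centred at the original estimate B, not 0
        except ZeroDivisionError:
            continue  # degenerate resample (no genotype or residual variation); skipped
    return b_obs, z_star


# Hypothetical toy data: 30 families, each with genotypes 0, 1 and 2
families = [[(g, 0.3 * g + ((3 * f + 5 * g) % 7) * 0.1) for g in (0, 1, 2)]
            for f in range(30)]
b, z_star = family_bootstrap_z(families, n_boot=200)
print(round(b, 3), len(z_star))
```

Centring at B rather than 0 is what lets the resampled statistics mimic the null distribution even when the original data contain a real association.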
To evaluate the p-value for the Total Significance Test, we then compared the test statistics for the original data to their bootstrap distributions. For each Z, we evaluated whether it was so extreme as to fall outside the range of Z*s for the same SNP-endophenotype combination in the 10,000 training bootstrap data sets. As each SNP-endophenotype combination is compared to its own distinct bootstrap distribution, an advantage of our approach is that there is no implicit assumption about exchangeability or identically distributed SNPs or endophenotypes. Let T0 be the total number of tested associations in the original data for which Z is either < minimum Z* or > maximum Z* in the 10,000 dataset training group. Similarly, let T0* be the total number for each of the 1,000 independent bootstrap data sets in the test group, also based on comparison to the 10,000 dataset training group. The collective p-value for T0 is the proportion of bootstrap test data sets for which T0* ≥ T0. The Total Significance Test addresses a different question than the “no family-wise error” criterion provided by a Bonferroni correction or a traditional permutation test. Thus, Total Significance Test p-values are not comparable to those provided by these other methods.
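The counting step and collective p-value can be sketched directly from the definitions of T0 and T0*. Here each SNP-endophenotype combination keeps its own list of training-group Z* values, so no exchangeability between combinations is assumed; the function names and the toy numbers are hypothetical.

```python
def out_of_range_count(z_values, training):
    """Count tests whose statistic lies strictly below the minimum or above the
    maximum of that test's own training-group bootstrap values."""
    return sum(1 for z, ref in zip(z_values, training) if z < min(ref) or z > max(ref))


def total_significance_p(z_obs, training, test_sets):
    """Return T0 for the original data and the proportion of test sets with T0* >= T0.

    z_obs: one statistic per SNP-endophenotype combination (original data).
    training: training[t] holds the training-group Z* values for combination t.
    test_sets: each element is one test-group bootstrap data set's Z* vector.
    """
    t0 = out_of_range_count(z_obs, training)
    t0_star = [out_of_range_count(z, training) for z in test_sets]
    return t0, sum(t >= t0 for t in t0_star) / len(t0_star)


# Toy example: 5 combinations, each with training range [-2, 2]
training = [[-2.0, -0.5, 0.0, 1.0, 2.0]] * 5
z_obs = [3.0, -3.0, 0.0, 1.0, 2.5]
test_sets = [[0.0] * 5, [3.0, 0, 0, 0, 0], [3.0, 3.0, 3.0, 0, 0], [3.0, 3.0, 3.0, 3.0, 0]]
print(total_significance_p(z_obs, training, test_sets))  # (3, 0.5)
```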
To obtain a posterior predictive value for the associations so significant as to fall outside the range of the training group, we calculated the expected number of false positives F0 at this level as the average T0* across the 1,000 test data sets. The posterior predictive value for all associations in this initial group was then calculated as (R0-F0)/R0, where R0 is the number of associations actually out of range in the original data.
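The posterior predictive value is a one-line calculation once the out-of-range counts are available: F0, the mean of the T0* counts from the bootstrap test data sets, estimates how many of the R0 observed out-of-range results would arise under the null, so (R0 − F0)/R0 is the fraction expected to be true. The function name and the counts below are hypothetical.

```python
def posterior_predictive_value(r0, test_counts):
    """(R0 - F0) / R0, where F0 is the mean out-of-range count T0* over the test sets."""
    if r0 == 0:
        raise ValueError("no out-of-range associations in the original data")
    f0 = sum(test_counts) / len(test_counts)
    return (r0 - f0) / r0


# Hypothetical counts: 10 observed out-of-range results; T0* in four test sets
print(posterior_predictive_value(10, [1, 2, 3, 2]))
```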
We then extended this approach to determine whether somewhat weaker results, those within the range of the tails of the bootstrap distribution, also significantly exceeded the results expected by chance. Let T1 and T1* refer to totals based on comparison to the training group after discarding the smallest and largest Z*s for each SNP-endophenotype combination; the subscript denotes the number of values discarded from each end. A cumulative p-value for T1 was calculated as the proportion of bootstrap data sets in the test group for which T1*≥T1, T0*≥T0, or both. T2, T3, and so on were computed in the same way, and a cumulative p-value was calculated for each, taking into account all prior tests of greater stringency. This p-value simultaneously accounts for all stronger results and therefore can only increase as the criterion is relaxed. We considered all results satisfying a cumulative, collective p-value ≤0.05 to be significant by the Total Significance Test and calculated posterior predictive values for each level analogous to those described above.
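The trimmed, cumulative version of the test can be sketched by sorting each combination's training values once and counting, at each trimming level k, the statistics that fall outside the range left after discarding the k smallest and k largest values. A test data set counts toward the level-k p-value if it matches or exceeds the observed total at level k or at any more stringent level, which is what makes the p-values non-decreasing. The significant set would then be taken from the least stringent level whose cumulative p-value is still ≤0.05. Function names and the toy numbers are hypothetical.

```python
def trimmed_counts(z_values, sorted_training, max_trim):
    """[T_0, ..., T_max_trim]: T_k counts statistics outside the training range after
    discarding the k smallest and k largest Z* values for each combination."""
    counts = []
    for k in range(max_trim + 1):
        counts.append(sum(
            1 for z, ref in zip(z_values, sorted_training)
            if z < ref[k] or z > ref[-1 - k]
        ))
    return counts


def cumulative_p_values(z_obs, training, test_sets, max_trim):
    """Cumulative Total Significance p-values for trimming levels 0..max_trim."""
    sorted_training = [sorted(ref) for ref in training]
    if any(len(ref) <= 2 * max_trim for ref in sorted_training):
        raise ValueError("too few training values for this much trimming")
    obs = trimmed_counts(z_obs, sorted_training, max_trim)
    sims = [trimmed_counts(z, sorted_training, max_trim) for z in test_sets]
    p_values = []
    for k in range(max_trim + 1):
        # a test set counts if it reaches the observed total at level k
        # or at any more stringent level j < k
        hits = sum(any(sim[j] >= obs[j] for j in range(k + 1)) for sim in sims)
        p_values.append(hits / len(sims))
    return obs, p_values


training = [[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]] * 4
z_obs = [4.0, 2.5, 1.5, 0.0]
test_sets = [[0.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0],
             [0.0, 2.5, 1.5, 1.5], [0.0, 0.0, 2.5, 2.5]]
print(cumulative_p_values(z_obs, training, test_sets, max_trim=2))
# ([1, 2, 3], [0.25, 0.5, 0.75])
```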
The results of the single-marker variance-component analyses implemented in Merlin, as shown in Figure 1 and detailed in Supplemental Table 2, revealed associations between the 12 endophenotypes and 46 of the 94 genes collectively. There were 3 SNPs with a p<10−4, 27 SNPs with a p<10−3, and 147 SNPs with a p<0.01, all of which may be of interest, given the a priori selection of these genes. There were 23 genes associated with at least one endophenotype with p<10−3 as indicated. The most significant finding in these analyses was for a SNP in NRG1 with spatial processing, which gave a p-value of 6.4×10−6 and explained 6.9% of the genetic variation in this endophenotype. Two other SNPs gave p-values <10−4 in GRIK4 (p=8.3×10−5) and CHRNA4 (p=9.0×10−5), explaining 5.4% and 4.5% of the genetic variation in antisaccade and sensori-motor dexterity, respectively. We also found evidence to support association to four nonsynonymous SNPs and one synonymous SNP: GRM1 Gly884Glu (p=1.1×10−3 for verbal learning), NRG1 Arg38Gln (p=5.6×10−4 for verbal learning), SLC18A1 Val392Leu (p=9.7×10−3 for antisaccade), TAAR6 Val265Ile (p=1.1×10−3 for continuous performance), and HTR2A Ser34Ser (p=9.0×10−3 for Letter-Number Span).
Figure 2 provides a summary of the minimum p-value observed for each gene and endophenotype, with the number of independent associations indicated, highlighting the associations of genes across endophenotypic domains. Although half of these genes were found to be associated (p<0.01) with two or more endophenotypes, eight genes in particular (CTNNA2, DISC1, ERBB4, GRID2, GRM1, NOS1AP, NRG1, and RELN) displayed extensive evidence for pleiotropy, revealing associations with four or more endophenotypes in this dataset. In contrast, other genes (e.g., GRM3) were found to be associated with a single endophenotype (e.g., P50 suppression). Bivariate analyses revealed genetic correlations between continuous performance, abstraction, spatial processing, and emotion recognition that remained significant following correction for multiple testing (data presented elsewhere; 13). The genes generally associated with all four of these endophenotypes in the Merlin analyses were CTNNA2, GRM1, and RELN.
The COGS SNP Chip includes a total of 40 genes that have shown prior allelic or haplotypic associations with schizophrenia or related endophenotypes. Specific SNPs for which evidence for association has been previously reported in the literature were included for 33 of these genes, as shown in Table 1. Although associations to schizophrenia have also been reported for DRD2 (38), DRD4 (39), GRM4 (40), NRG1 (41-45), PPP3CC (46), PRODH (47), and SLC1A2 (48), we were unable to include the relevant polymorphisms on this array due to a lack of availability. We have found evidence for association of 25 of the 40 previously associated genes (AKT1, CHRNA7, COMT, DAO, DISC1, DRD2, DRD3, ERBB4, GABRB2, GRID1, GRIK3, GRIK4, GRIN1, GRIN2B, GRM3, GRM4, HTR2A, NCAM1, NRG1, PRODH, SLC18A1, SLC1A2, SLC6A3, SP4, and TAAR6) with one or more endophenotypes, as detailed in Figure 2, including associations to 10 specific SNPs with previous reports of association to schizophrenia (see Figure 2, Table 1, and Supplemental Tables 1 and 2). The majority of the associations with specific SNPs (eight of 10) were in the same direction as in previous studies. Although this sample was not recruited for an assessment of schizophrenia and is thus not well powered for this purpose, a discordant sibpair analysis did indicate the associations of SNPs within ERBB4, HTR4, and GRM5 with schizophrenia as well (p<0.01, data not presented). We did not find evidence for association of any endophenotype to ADRBK2, BDNF, CACNG2, DAOA, DGCR2, DRD4, DTNBP1, GAD1, HTR7, NEUROG1, NOTCH4, PPP1R1B, PPP3CC, RGS4, or ZDHHC8, despite previous reports of association with schizophrenia.
As shown in Figure 3, the genes included on the COGS SNP Chip cluster into several putatively important pathways, including cell signal transduction, axonal guidance signaling, amino acid metabolism, and glutamate, serotonin, dopamine, and GABA receptor signaling. The 46 genes found to be associated with at least one endophenotype were distributed amongst all of these pathways, with notably higher concentrations of associated genes observed in the glutamate signaling pathway. Of the 16 genes tested in the glutamate pathway, 14 revealed associations with at least one endophenotype, ten of which were associated with two or more endophenotypes. Figure 4 further details the molecular interactions of a subset of the genes on the COGS SNP Chip, highlighting the interactions between many of the 46 genes associated with at least one endophenotype. These data reveal a network of genes directly or indirectly related to glutamate signaling and suggest that disturbances of this pathway may contribute to schizophrenia susceptibility.
Given that the association analyses involved 16,620 tests (1,385 SNPs and 12 endophenotypes), we expect some positive results due to chance. We therefore developed a novel multiple testing strategy, the bootstrap Total Significance Test, to evaluate whether there were more highly significant findings than would be expected by chance alone. Forty-seven of the z-statistics in the original data were entirely outside the range observed in 10,000 bootstrap training data sets, simulated under the null hypothesis. The median number of such z-statistics in the 1,000 test data sets was only 2, and in 95% of the test data sets it was at most 12. Only 1 of the 1,000 test data sets yielded at least as many out-of-range z-statistics as the 47 seen in the observed data (p=0.001). These 47 findings have an estimated posterior predictive value of 93%.
We extended the Total Significance Test sequentially to identify 292 SNP-endophenotype associations that collectively satisfied a cumulative p-value of 0.05, discarding the 40 lowest and 40 highest bootstrap values for each test in the training group. The corresponding posterior predictive value is 53%, indicating that each of these 292 findings is, on average, more likely than not to represent a true positive result. As a less stringent criterion is used and more values are trimmed from each end of the bootstrap training distribution, the posterior predictive value decreases (i.e., the corresponding results include more false positives) and results become less significant. The 292 most significant findings are summarized by gene in Table 2, along with their significance in the separate variance-component analyses (see Supplemental Table 3 for a complete list). Of the 94 genes on the array, 55 contained at least one SNP with an a posteriori chance ≥53% of being a true finding of association with at least one endophenotype. For the 12 endophenotypes, the number of significant findings ranged from 14 to 34. Negative results by this approach should not be overinterpreted, since they are based only on the most significant associations in the data. A gene's failure to reach this strict level of significance does not preclude the existence of more modest levels of association.
This study combined the analysis of 94 neurobiologically relevant genes and 12 heritable endophenotypes with schizophrenia-related deficits (13) and identified an interesting pattern of association results for 46 genes across all endophenotypes. Given the observed correlations among many of these endophenotypes (13), we expected some genes to exhibit pleiotropy and contribute to the variance in two or more endophenotypes. Additionally, some of the genes, like NRG1, have been shown to play a role in neurodevelopment and as such may affect more than one physiological or cognitive function. Even with the limited number of genes tested here, we do indeed find evidence of this pleiotropy. Of the eight genes showing extensive evidence for pleiotropy across the 12 endophenotypes, six (ERBB4, GRID2, GRM1, NOS1AP, NRG1, and RELN), all involved directly or indirectly in glutamate signaling, featured prominently, with associations to five or more endophenotypes. We also observed association with at least one endophenotype for 14 of the 16 genes tested in the glutamate signaling pathway, 10 of which were associated with two or more endophenotypes. These results are consistent with the glutamate hypothesis, which proposes that compromised NMDA receptor function contributes to the development of schizophrenia (117-118), and with the observation of a disproportionate disruption of genes in the neuregulin and glutamate pathways in schizophrenia patients (119). Collectively, these results support a strong role for genes involved in glutamate signaling in mediating schizophrenia susceptibility.
The associations of NRG1 and ERBB4 with 5 and 8 endophenotypes, respectively, add to the growing literature of human molecular genetic studies implicating these genes, which together offer a compelling picture of the importance of neuregulin-mediated ErbB4 signaling in the pathophysiology of schizophrenia and its associated heritable deficits (83,120-122). The successful use of endophenotypes for schizophrenia in model organism studies provides additional support both for the involvement of NRG1 in schizophrenia and for this strategy of gene identification. For example, murine NRG1 hypomorphs show deficits in prepulse inhibition (120). Such deficits are well documented in schizophrenia patients (123-126) and were associated with NRG1 in our analyses. Neuregulin-1 is a trophic factor that signals through the activation of ErbB receptor tyrosine kinases such as ErbB4. ErbB4 plays a crucial role in neurodevelopment and in the modulation of NMDA receptor signaling, processes often disturbed in schizophrenia (127-129). Neuregulin-mediated ErbB4 signaling has thus become an important pathway of consideration in schizophrenia research.
Custom SNP arrays, such as the COGS SNP Chip and the Addiction Array (130), have several advantages. They are affordable, flexible with regard to the inclusion of desired variants, and focused on disease specificity through strong-inference-based candidate gene selection (e.g., 131), and they may be much more feasible for use with smaller, yet well-defined, samples that are underpowered for genome-wide association studies. Although more comprehensive, genome-wide arrays are nonspecific with regard to disease and may thus lack adequate representation of specific SNPs that either have been associated with a particular disease or are thought to be biologically relevant to it. Some genes of interest, particularly smaller ones, may also be represented with insufficient coverage (e.g., SLC6A3) or not at all (e.g., DRD4) on genome-wide arrays. Large-scale analyses of candidate genes via custom arrays may therefore provide a complementary strategy to genome-wide association studies for investigators interested in specific genes and SNPs relevant to a particular disorder. This new array can serve as a publicly available resource for other investigators studying schizophrenia and related phenotypes, with the flexibility to modify the SNP list to suit the particular focus of a research group.
The novel bootstrap-based Total Significance Test developed for this study clearly demonstrates the overall significance of the association results obtained with the COGS SNP Chip across the endophenotypes. This Total Significance Test goes beyond current multiple testing methods in providing a collective test of significance for the strongest results in an entire data set (or, if desired, within an individual gene or pathway) and in addressing situations where simple permutation schemes are not available, such as family data and confounders that, by assumption, are associated with both genotype and phenotype (e.g., population stratification). Furthermore, it allows meaningful posterior predictive values to be assigned to individual test results in the context of multiple testing. Limitations of the Total Significance Test include its focus on the most significant test results, which ignores the contribution of more modest association results. In addition, it was not practical, in terms of either software development or computer time, to embed the Merlin variance-component, pedigree-based analysis within the computationally complex Total Significance Test. The bootstrap for clustered data (i.e., family data) is a well-validated statistical tool for obtaining accurate significance levels in this situation and is expected to calibrate the statistical inferences correctly despite this limitation, but a Total Significance Test that included within-family correlations in its statistical model might yield somewhat more efficient and powerful results than the present version.
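The clustered bootstrap mentioned above resamples whole families rather than individuals, so that the correlation among relatives is carried into every replicate. The following is a minimal sketch, assuming records are dictionaries with a `family` key; the function names are hypothetical, and how the null replicates for the Total Significance Test were generated from such resamples is not detailed here, so only the resampling step and a bootstrap standard error are shown.

```python
import random
from collections import defaultdict


def cluster_bootstrap(records, rng):
    """Draw one clustered bootstrap replicate by resampling whole families.

    records: list of dicts, each with a "family" key identifying its pedigree.
    Families are drawn with replacement (as many draws as there are families),
    and all members of a drawn family are kept together, so within-family
    correlation is preserved in the replicate.
    """
    families = defaultdict(list)
    for rec in records:
        families[rec["family"]].append(rec)
    family_ids = list(families)
    replicate = []
    for new_id in range(len(family_ids)):
        drawn = rng.choice(family_ids)
        # Relabel so a family drawn twice counts as two independent clusters.
        replicate.extend({**rec, "family": new_id} for rec in families[drawn])
    return replicate


def bootstrap_se(records, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of statistic(records) under family resampling."""
    rng = random.Random(seed)
    values = [statistic(cluster_bootstrap(records, rng)) for _ in range(n_boot)]
    mean = sum(values) / n_boot
    return (sum((v - mean) ** 2 for v in values) / (n_boot - 1)) ** 0.5


def mean_phenotype(recs):
    """Example statistic: mean of a hypothetical quantitative trait "y"."""
    return sum(r["y"] for r in recs) / len(recs)


records = [
    {"family": "F1", "y": 0.8}, {"family": "F1", "y": 1.1},
    {"family": "F2", "y": 2.3},
    {"family": "F3", "y": 1.7}, {"family": "F3", "y": 1.9},
]
print(bootstrap_se(records, mean_phenotype))
```

Relabeling each drawn family with a new identifier matters when a replicate is passed to a pedigree-aware analysis: a family drawn twice should be treated as two independent pedigrees, not merged into one.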
Some caveats should be noted regarding this study. First, genetic analyses of schizophrenia are replete with failures to replicate previous findings (e.g., 132), despite the striking heritability of the disorder (2). Such failures are understandable in the context of ascertainment biases, population stratification, and cohort variance due to such factors as gender, smoking, treatment, and age of onset. Here, too, we found no evidence for association with some prominent schizophrenia candidate genes, such as DAOA, DRD4, DTNBP1, PPP3CC, and RGS4 (6-7). However, we found further evidence supporting association with 25 genes previously reported to be associated with schizophrenia, including several specific SNPs whose effects were in the same direction as in the previous reports. Second, the family ascertainment scheme used in this study focused on endophenotypes associated with schizophrenia, so the sample may be underpowered for detecting genetic variants associated with the disorder itself. Moreover, as might be expected for a heterogeneous disorder like schizophrenia, not all individuals exhibit deficits across all of the endophenotypes studied. Additionally, antipsychotic medications may affect these results, although they tend to “normalize” endophenotypic scores, thus reducing, rather than increasing, the probability of significant associations. Although our sample is primarily (89%) of Caucasian ancestry, with most other subjects of partial Caucasian ancestry, we must also consider the possible confound of genetic admixture. We used multidimensional scaling components as a measure of ancestry to correct for this admixture in our analyses. We also note that allele frequencies from the Caucasian and African HapMap populations differ by an average of only 4% across our SNPs with p values <0.01 and 6% across SNPs with p values <10⁻³.
Lastly, the degree of allelic, locus, and phenotypic heterogeneity associated with complex disorders now appears to be far more extensive than previously appreciated, with substantial contributions of rare de novo genetic variants, as well as epigenetic and environmental effects, none of which were assessed in this study (133).
Thus, we have observed many interesting associations between our endophenotypes and genes thought to be of biological relevance to schizophrenia. The observation of extensive pleiotropy for some genes and singular associations for others in our data suggests alternative, independent pathways mediating schizophrenia pathogenesis. Further analyses of the genes associated with each of the endophenotypes will likely provide information regarding the underlying genetic pathways involved in schizophrenia susceptibility, as well as information regarding the interaction between these endophenotypes within the disorder. The illumination of the genetic basis of schizophrenia offers the exciting possibilities of early detection of the disorder and the identification of novel pharmacologic targets to facilitate therapeutic intervention.
The authors wish to thank all of the participants and support staff who made this study possible and Daniel R. Weinberger, M.D., for providing information on some of the included genes.