|Home | About | Journals | Submit | Contact Us | Français|
Major depressive disorder (MDD) is a common complex trait with enormous public health significance. As part of the Genetic Association Information Network (GAIN) initiative of the US Foundation for the National Institutes of Health, we conducted a genomewide association study of 435,291 SNPs genotyped in 1,738 MDD cases and 1,802 controls selected to be at low liability for MDD. Eleven of the top 200 signals localized to a 167 kb region overlapping the gene piccolo (PCLO, whose protein product localizes to the cytomatrix of the presynaptic active zone and plays an important role in monoaminergic neurotransmission in the brain) with p-values of 7.7×10−7 for rs2715148 and 1.2×10−6 for rs2522833. We undertook replication of SNPs in this region in 5 independent samples (6,079 MDD independent cases and 5,893 controls) but no SNP exceeded the replication significance threshold when all replication samples were analyzed together. However, there was heterogeneity in the replication samples, and secondary analysis of the original sample with the sample of greatest similarity yielded p=6.4×10−8 for the non-synonymous SNP rs2522833 that gives rise to a serine to alanine substitution near a C2 calcium-binding-domain of the PCLO protein. With the integrated replication effort, we present a specific hypothesis for further studies.
The defining features of major depressive disorder (MDD) are marked and persistent dysphoria plus additional cognitive signs and symptoms (anhedonia, sleep disturbance, weight/appetite changes, motor agitation/retardation, anergia, excessive guilt or worthlessness, poor concentration or indecisiveness, and recurrent thoughts of death or suicide) (1). MDD is distinct from normal sadness by its persistence (i.e., ≥2 weeks), additional signs and symptoms, and substantial associated impairment. The definition of MDD excludes other conditions typified by substantial depressive symptoms (other psychiatric disorders, drug/alcohol dependence, and somatic diseases). The lifetime prevalence of MDD is ~15% (2-4) and is twofold higher in women (5) with a course typified by recurrence of illness (6). It is associated with considerable morbidity (7-9), excess mortality from suicide and other causes (10-13), and substantial direct and indirect costs (14). A WHO study projected MDD to be the second leading cause of disability worldwide by 2020 (15).
Although there is a considerable corpus of research on the epidemiology and biological correlates of MDD, little is known for certain about its etiology. An important etiological clue may be the familial tendency of MDD and its heritability of 31-42% (16). This clue led to a number of genomewide linkage studies (Supplemental Methods) and studies of >100 theoretical or positional candidate genes. As for the use of these study designs with other biomedical disorders, their application to MDD has not been as successful as had been hoped.
It is now clear that genomewide association studies (GWAS) can be a successful tool in the genetic dissection of complex biomedical disorders (17, 18). The goal of this report is to describe a GWAS for MDD that was systematically designed to remediate a set of methodological issues common to genetic studies of MDD. Examples of these issues include small sample sizes, inhomogeneous samples in terms of ancestry and phenotyping, convenience sampling, and controls that are unaffected but not at low liability for MDD. Moreover, large-scale replication was integral to our design.
This GWAS was one of the six initial Genetic Association Information Network (GAIN) studies sponsored by the Foundation for the NIH (19). Individual phenotype and genotype data are available to researchers via application to the dbGaP repository (20). We have attempted to follow published guidelines for GWAS (21, Box 1).
The parent projects that supplied subjects for this GWAS are longitudinal studies, the Netherlands Study of Depression and Anxiety (NESDA, http://www.nesda.nl) (22) and the Netherlands Twin Registry (NTR, http://www.tweelingenregister.org) (23). Sampling and data collection characteristics of the GAIN-MDD study have been described in detail elsewhere (24).
MDD cases were mainly from NESDA, a longitudinal cohort study designed to be representative of individuals with depressive and/or anxiety disorders. Recruitment of participants for NESDA took place from 09/2004-02/2007, and ascertainment was from outpatient specialist mental health facilities and via primary care screening. Additional cases were from the population-based cohorts NEMESIS (25), ARIADNE (26), and the NTR. Regardless of recruitment setting, similar inclusion and exclusion criteria were used to select MDD cases. Inclusion criteria were a lifetime diagnosis of DSM-IV MDD (1) as diagnosed via the Composite International Diagnostic Interview psychiatric interview (27), age 18-65 years, and self-reported western European ancestry. Persons who were not fluent in Dutch and those with a primary diagnosis of schizophrenia or schizoaffective disorder, obsessive compulsive disorder, bipolar disorder, or severe substance use dependence were excluded (the etiology of MDD in these subjects may be distinct). The 1,862 cases included in GAIN were recruited from mental health care organizations (N=785), primary care (N=603), and community samples (NEMESIS N=218, ARIADNE N=96 and NTR N=160).
Control subjects were mainly from the NTR, which has collected longitudinal data from twins and their families since 1991 (total cohort of ~22,000 participants from 5,546 families). The majority of families were recruited when the twins were adolescents or young adults through city council registrations along with alternative efforts to recruit older twins. Longitudinal phenotyping includes assessment of depressive symptoms (via multiple instruments), anxiety, neuroticism, and other personality measures. Inclusion required availability of both survey data and biological samples, no report of MDD at any measurement occasion, and low genetic liability for MDD. No report of MDD was determined by specific queries about medication use or whether the subject had ever sought treatment for depression symptoms and/or via the CIDI interview. Low genetic liability for MDD was determined by the use of a factor score derived from longitudinal measures of neuroticism, anxiety, and depressive symptoms (28) (mean 0, std 0.7); controls were required never to have scored highly (≥0.65) on this factor score. Finally, controls and their parents were required to have been born in the Netherlands or western Europe. Only one control per family was selected. There were N=1,703 controls from the NTR and additional controls from NESDA (N=133 from general practice, N=24 from ARIADNE). NESDA controls had no lifetime diagnosis of MDD or an anxiety disorder as assessed by the CIDI and reported low depressive symptoms at baseline (K-10 score <16 and Inventory of Depressive Symptoms score <4) (29, 30).
If there were multiple eligible NTR controls in a family, we first matched on sex and age, and used the highest number of completed questionnaires as an additional criterion. Again, only one control per family was included.
Before the start of the NESDA and NTR biological sample collection, processing, and storage protocols were harmonized and DNA extraction was conducted concurrently in the same laboratory. For NESDA, blood sampling for the NESDA participants took place during the baseline visit (between 0830-0930) and DNA was isolated using the FlexiGene DNA AGF3000 kit (Qiagen, Valencia, CA, USA) on an AutoGenFlex 3000 workstation (Autogen, Holliston, MA, USA). For NTR, biological samples were taken in the subject's home (between 0700-1000) and DNA was extracted using the Puregene DNA isolation kit (Gentra, Minneapolis, MN, USA) for frozen whole blood samples. DNA concentrations were determined using the PicoGreen dsDNA Quantitation kit (Invitrogen Corporation, Carlsbad, CA, USA). All procedures were performed according to the manufacturer's protocols.
The NESDA and NTR studies were approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam, an Institutional Review Board certified by the US Office of Human Research Protections (IRB number IRB-2991 under Federal wide Assurance-3703; IRB/institute codes, NESDA 03-183; NTR 03-180). All subjects provided written informed consent. As part of the GAIN application process, consent forms were specifically re-reviewed for suitability for the deposit of de-identified phenotype and genotype data into the controlled-access dbGaP repository (20). NESDA and NTR subjects were informed of participation in GAIN via newsletters. Only 22 NESDA respondents refused informed consent for genetic research (1.7% of all respondents approached).
Individual genotyping was conducted by Perlegen Sciences (Mountain View, CA, USA) using a set of four proprietary, high-density oligonucleotide arrays. The SNPs on these arrays were selected to tag common variation in the HapMap European and Asian panels using previously described genotype data (31), tagging approach (32), and methodology (33). At the beginning of GAIN, all HapMap (34) samples were genotyped with the Perlegen GWAS platform. Independent review of these data by the GAIN analysis group (19) showed 99.8% agreement with prior HapMap genotypes and the mean maximum r2 between the Perlegen SNPs and HapMap phase II SNPs (31) was 0.89 for single and 0.96 for multi-marker analyses. The genotyping procedures and genotyping calling algorithms are described in the Supplemental Methods and in prior reports (35, 36). Briefly, 40 × 96-well plates were sent to Perlegen for GWAS genotyping. Genotyping was conducted blind to case-control status. Cases and controls were randomly allocated to plates and to positions within plates. Each plate contained DNA samples from 93 Dutch subjects plus 3 quality control samples. The three quality control samples included: two parents of one control on that plate (40 complete trios in total); and half the plates contained the same HapMap CEU sample (used for quality control in all GAIN projects) and half had a randomly-selected duplicate case sample. The total number of samples was 3,840 (=40 plates × 96 samples per plate) or 1,860 cases + 1,860 controls + 80 parents + 20 duplicate samples + 20 HapMap samples.
Of the 3,820 Dutch samples sent to Perlegen (excluding the 20 HapMap internal control samples), genotypes were delivered for 3,761 samples. Fifty-nine samples did not have GWAS data: 39 samples with uncertain linkage between genotype and phenotype records, 7 samples with evidence of contamination, 6 samples that failed genotyping, and 7 miscellaneous failures (2 of these were excluded as chrX and chrY genotyping data were consistent with the presence of XO and XXY sex chromosome status). After further analysis, 8 subjects were removed for excessive missing genotype data (>25%), 1 case for high genomewide homozygosity (~75%), 38 subjects whose genomewide IBS estimates were consistent with first- or second-degree relationships, and 57 additional subjects whose ancestry diverged from the remainder of the sample (see Supplemental Methods for details). After these exclusions (N=104) and removing duplicated and trio quality control sample, there were 3,540 subjects in the final analysis dataset including 1,738 cases and 1,802 controls. The principal reason for fewer cases than controls was the higher prevalence of substantial non-European ancestry. The list of subjects in the final analyses dataset is included as a Supplemental File (“mddC.fam”).
The unfiltered dataset obtained from dbGaP contained 599,156 unique SNPs. The Perlegen genotyping algorithm yielded a quality score for each individual genotype, and a more stringent quality score cutoff (≥ 10) than that used by Perlegen was applied. The SNP quality control process is described in detail in the Supplemental Methods. Briefly, to be included in the final analysis dataset, SNPs were required not to have any of the following features: gross mapping problem (37), ≥2 genotype disagreements in 40 duplicated samples, ≥2 Mendelian inheritance errors in 38 complete trio samples, minor allele frequency <0.01, or >0.05 missing genotypes in either cases or controls. A Hardy-Weinberg filter was not used as lack of fit to Hardy-Weinberg expectations can occur for valid reasons (e.g., a true association) (38) and given that 95.6% (=51,592/53,994) of SNPs with p<0.00001 from an exact test of Hardy-Weinberg equilibrium (39) in controls were already flagged for exclusion. A total of 435,291 SNPs met these criteria and were included in the final analysis dataset (included as a Supplemental File, “mddC.bim”). Additional quality control checks are described in the Supplemental Methods). Thirteen controls were genotyped in a different study using the Illumina 317K platform and, of the 82,636 SNPs common to both platforms, the genotype agreement was 99.94%.
There were three classes of SNPs – those that could be heterozygous in all subjects (chr1-22 and chrX/PAR1), those that were heterozygous in females (non-PAR chrX), and those that were hemizygous in males (non-PAR chrX and chrY). All SNPs that passed quality control checks were tested for association with MDD using 1 df Cochran-Armitage trend tests. For complex traits, it is widely believed that the contributions of individual SNPs to disease risk are often roughly additive (40) . The Cochran-Armitage trend test can be used to detect such effects. This test is usually recommended due to its robustness to the violation of the HWE assumption (41): P-values from females and males for non-PAR chrX were combined using Fisher's method (42).
Population stratification artifacts were assessed in two ways. As described elsewhere (36), including principal components as covariates in a logistic regression model can robustly control stratification effects. To do this, we identified a set of 127,688 SNPs in linkage equilibrium (43) and used the “smartpca” program in EigenSoft (44) to compute 10 principal components for each subject that were included as covariates in logistic regression models (case/control status ~ SNPi + PC1 + PC2 + … + PC10). We also used a stratified Cochran-Mantel-Haenszel test in PLINK (43) as a complementary approach.
For noteworthy associations, there were additional checks to ensure that an association was not due to experimental bias. These checks included: manual inspection of SNP cluster plots to ensure reasonable performance of the genotyping calling algorithm; evaluation of conformation to Hardy-Weinberg equilibrium in controls, cases, and overall (discussed in the Supplemental Methods); the checks for population stratification described above; evaluation of plate-specific association results to ensure that the overall association was not driven by one or a few plates; comparison of control MAFs to the HapMap EUR panel; and evaluation of the characteristics of a SNP in high linkage disequilibrium (“proxy association”) as a similar association with such a SNP decreases the chance of some forms of method artifacts.
Given the 105-107 statistical comparisons in a GWAS, small p-values are expected by chance. To control the risk of false discoveries, q-values (45, 46) were computed for all p-values for single-marker tests of association. A q-value is an estimate of the proportion of false discoveries among all significant markers, or the false discovery rate (FDR) for the corresponding p-value. The use of q-values is preferable to more traditional multiple testing controls because q-values provide a better balance between the competing goals of finding true positives versus controlling false discoveries, allow more similar comparisons across studies because proportions of false discoveries are much less dependent on the number of tests conducted and are relatively robust against the effects of correlated tests (45, 47-54). The q-value threshold for declaring significance was 0.10 (i.e., the top 10% of the significant findings are, on average, allowed to be false discoveries) (50, 55). FDR thresholds <0.10 result in a disproportionate drop in power to detect true effects.
We used two imputation approaches, the SNPMStat method of Lin et al. (56) to impute 246 additional SNPs in the PCLO region and Abecasis' MACH (v1) to impute 2,037,829 autosomal SNPs with R2 ≥ 0.5 (a cutoff that removes ~90% of SNPs with unreliable imputation results while sacrificing 2-3% of reliably imputed SNPs). Both SNPMStat and MACH gave similar results in the PCLO region. Imputed genotypes were used in secondary analyses. The HapMap2 EUR panel (31, 34) was used as reference.
Quanto (57, 58) was used to approximate statistical power given the following assumptions: two-tailed α=1×10−7 (=0.05/500,000), 1,738 cases and 1,802 controls, lifetime morbid risk of MDD of 0.15, and a log additive genetic model. For statistical power of 0.80 (β=0.20), the minimum detectable genotypic relative risks are 1.59, 1.40, and 1.35 for minor allele frequencies of 0.10, 0.25, and 0.40.
PLINK (v1.0) (43), SAS (v9.1.3) (59), R (v2.6.1) (60), HAPSTAT (v3) (61-63), MACH1, SNPMStat (56), HaploView (64), and JMP (v6) (65) were used for data management, quality control, statistical analyses, and graphics.
All genomic locations are per NCBI Build 35 (66) (UCSC hg17) (67). Pseudo-autosomal region 1 (PAR1) is assumed to be located on chrX:1-2,692,881 and chrY:1-2,692,881 and PAR2 on chrX:154,494,747-154,824,264 and chrY:57,372,174-57,701,691 (68). SNP annotations were per TAMAL (37) based chiefly on UCSC genome browser files (67), HapMap (34), and dbSNP (66).
Table 1 presents descriptive data for cases and controls. Controls had a higher proportion of males and were slightly older (and thus were farther through the period of risk for MDD). Consistent with known correlates of MDD, cases had a significantly lower educational level, less often had a partner, were more often smokers, and scored much higher on the NEO-FFI neuroticism scale.
The analysis SNP set had 435,291 SNPs including 427,049 autosomal SNPs, 7.988 SNPs on the non-PAR portions of chrX, 239 SNPs on chrXY/PAR1, 15 SNPs on chrY, and 0 SNPs on PAR2. The median SNP missingness was 0.00339 (inter-quartile range 0.00113-0.0105) and the median minor allele frequency was 0.2422 (inter-quartile range 0.1375-0.3646) with similar estimates in cases and controls. The average marker density over the genome was 1 SNP every 7,069 bases (=3,077,088,087 bases / 435,291 SNPs). The median inter-marker distance was 2,911 bases with interquartile range 966-7,374 bases and a 99th percentile of 50.1 kb.
We used the Cochran-Armitage trend test to test for association of the 435,291 SNPs in the GWAS dataset with case/control status. The estimated lambda (69) was 1.046 (similar p-value minima and lambdas were obtained using logistic regression with 10 principal components and using a stratified Cochran-Mantel-Haenszel tests based on identity-by-state clusters) (43, 44). The minimum q-value was 0.28 (i.e., if these tests were called significant, over the long-term, a minimum false discovery rate of ~28% would be incurred). As the pre-specified q-value threshold was 0.10, no SNP reached genomewide significance. The proportion of all SNPs without true effects (p0) (54) was conservatively estimated to be p0=0.9999954, consistent with the presence of ~2 SNPs with true effects in these GWAS data. The results of this MDD GWAS are included as a Supplemental File (“results.txt”) to facilitate comparisons with other studies.
Panel a of Figure 1 depicts the quantile-quantile plots (40) for these analyses. The observed p-values do not strongly depart from the p-value distribution expected by chance. Panel b of Figure 1 shows a plot of –log10(ptrend) by genomic location.
Table 2 presents the findings for the top 25 SNPs. The quality control metrics – SNP missingness, agreement with HWE, and similarity of the control MAFs to the HapMap EUR panel – for the top 25 SNPs are generally acceptable. Four of the top 25 associations are in the presynaptic cytomatrix protein piccolo (PCLO). Table 3 depicts the top 25 multi-SNP clusters (i.e., for an index SNP with association p<0.001, these clusters are additional SNPs within 250 kb of the index SNP with with r2 ≥ 0.50). The full version of this table is included as a Supplemental File (“Table3_full.xls”). PCLO is present in the top 25 clusters along with two additional multi-SNP clusters in the top 200. Other notable SNP clusters occurred in GRM7 (rank 51), DGKH (rank 83, a candidate gene for bipolar disorder) (70), DAOA (rank 124), and DRD2 (rank 226).
Although no association met genomewide significance, there were clusters of SNPs in PCLO (Figure 2). Notably, 11 of the 200 smallest p-values localized to a 167 kb segment overlapping PCLO. Interest in PCLO was increased given its expression in brain, localization to the presynaptic active zone (71), and involvement in monoamine neurotransmission, a venerable hypothesis of the etiology of MDD (72). Moreover, the third most significant SNP (rs2522833) codes for a non-synonymous amino acid change (ala-4814-ser) in PCLO near its C2A calcium binding domain (73).
We investigated possible causes of spurious associations in the PCLO region (chr7:82,032,093-82,436,848). First, these findings were not due to plate effects as inspection of plate-specific association data for these SNPs did not show any marked outliers or systematic biases. Second, review of allelic intensity cluster plots on which genotype calls were based revealed adequate performance of the Perlegen genotype calling algorithm. Third, inspection of additional quality control metrics did not suggest systematic problems with SNPs in this region. Fourth, inspection of LD matrices excluded very high LD as the sole explanation for the results (Supplemental Figure 10), and none of the genotyped SNPs had strong LD (r2 ≥ 0.8) with rs2715148 (the SNP with the smallest p-value in the PCLO region). Fifth, population stratification can cause false positive findings but this did not appear to explain the PCLO association: (a) the same 11 SNPs had p-values among the top 200 associations in unadjusted analyses as well as with adjustment via principal components and stratified analyses; and (b) for the 57 SNPs in the PCLO region, the p-values across these three types of analyses were consistent (the Spearman correlations between p-values from trend tests, logistic regression, and stratified analyses were all >0.962). Sixth, the minor allele frequencies in the control group in the PCLO region were usually quite similar to available EUR control groups suggesting that the PCLO findings were not due to an artifact of the control selection process (see below). Finally, bioinformatic investigation did not suggest that this is a problematic region to genotype as the PCLO region is not known to be under positive selection in humans (74), to contain segmental duplications (67), or common copy number variants (search of the Database of Genomic Variants yielded two rare CNVs with control frequencies of 0.12% and 0.89%) (75, 76 , 77).
We conducted additional analyses to attempt to localize the association depicted in Figure 2. Imputation (56) supported the directly typed SNP associations but did not yield an association p-value markedly more significant than any directly genotyped SNP (although 22 of the 25 most significant imputed associations in the genome were in this region). Haplotype analysis using 3-SNP sliding windows did not improve localization. Secondary analyses by sex, case ascertainment setting, and recurrent early-onset MDD (reoMDD, arguably the most heritable form of MDD) (16, 78) suggested that most of the signal was from females and from subjects with reoMDD (Supplemental Table 11). The findings for reoMDD were often stronger than the primary analyses, particularly for the most significant SNP (rs2715148) where the p-value decreased by 1.2 orders of magnitude to 9.5×10−8.
Although no finding met genomewide significance, the presence of multiple possible signals in PCLO and the plausibility of a role for PCLO in the etiology of MDD led us to attempt replication in external samples. We assembled a collection of 11,972 independent subjects (6,079 MDD cases and 5,893 controls) from seven different groups and a total of six case-control replication samples (two German samples were combined, see Supplemental Methods). As with NESDA cases, all replication cases were adults of European ancestry on whom a structured clinical interview was used to substantiate the lifetime diagnosis of DSM-IV MDD (1), and all studies excluded common MDD phenocopies (e.g., depressive symptoms due to another psychiatric disorder or a general medical condition). As with NTR controls, all replication controls were adults of European ancestry ascertained from the population, and individuals reporting MDD symptoms were excluded. We estimated statistical power using Quanto (57) (assumptions: log-additive genetic model, MDD lifetime risk 0.15, MAF=0.45 (similar to rs2522833), a genotypic relative risk of 1.14 (“shrunk” down from the observed GRR of 1.26 for rs2522833 to account for the “Winner's Curse” phenomenon) (79), and a conservative two-tailed type 1 error rate of 0.00167 (=0.05/30 replication SNPs). Statistical power was 97.2% for replication for the two SNPs genotyped in all samples (N=11,972) and 90.4% for the remaining SNPs (N=9,278). Five replication samples were genotyped for 30 SNPs using the same Sequenom iPlex SNP pool (15 SNPs were in the primary GWAS and 15 were selected to tag common variation in Europeans) (80) and one sample was successfully genotyped for two SNPs using TaqMan. The SNP selection strategy effectively cast a broad net over the region showing association in Figure 2. For the NESDA/NTR samples, agreement between the initial Perlegen genotypes in this region and independent re-genotyping was high (0.9987).
The single SNP results for MDD are depicted in Figure 3 and Table 4a. Our analytic plan dictated the combined analysis of all replication samples with the use of a one-tailed directional test. No association in the replication sample reached statistical significance after correction for multiple comparisons and SNP non-independence due to LD (ninth column in Table 4a). Similarly, haplotype analyses did not reveal significantly associated regions (Supplemental Figure 16). There were four p-values <0.05 in the replication sample but only rs10954694 also had Z-scores of the same sign in both samples. Table 4b shows the results for reoMDD, and no single SNP was significant after correction for multiple comparisons. When we repeated the MDD analyses restricted to female subjects, the observed significance levels did not become markedly stronger in any of the replication samples in contrast to the initial NESDA/NTR sample. Thus, results from analyses of all replication samples did not reach the a priori criterion for replication evidence for the involvement of PCLO in the etiology of MDD.
However, we observed, a posteriori, that there was potentially important heterogeneity in the replication samples for eight SNPs that were strongly associated in the original sample (I2≥0.4, ninth column in Table 4a). In investigating this further (Supplemental Methods), we determined that there was little evidence for genetic heterogeneity in the genotyped region for controls but, unexpectedly, there was significant heterogeneity in the cases. Principal components analysis and inspection of Table 4a and the forest plots in Figure 3 indicated that the outlier was the Australian QIMR sample. Notably, the original and QIMR samples were particularly similar in that both studies included population-based cases and controls were selected to be at low liability for MDD based on longitudinal assessments. Of the nine SNPs with p<0.05 in the QIMR sample, eight had both low p-values and Z-scores with the same sign as in the NESDA/NTR sample. As an exploratory analysis, we analyzed the original and QIMR samples jointly, and the minimum p-value was 6.4×10−8 at the non-synonymous SNP rs2522833 that gives rise to a serine to alanine substitution near the C2A calcium-binding-domain of the PCLO protein.
We conducted additional analyses of the NESDA/NTR GWAS dataset that were specified a priori but which should be considered exploratory.
(a) The network of proteins with which PCLO interacts in its role at the presynaptic cytoskeletal matrix is relatively well-characterized, and we reasoned that genes encoding these proteins might harbor risk or protective variants. We assessed this hypothesis by testing for association conditioning on the PCLO nsSNP rs2522833 (i.e., investigating whether controlling statistically for the effect of rs2522833 increases the salience of other SNP associations), assessing the minimum p-value per gene, and then comparing this list to a list of 54 genes that make proteins that interact with PCLO. This analysis did not reveal any SNPs or genes whose significance was markedly lower than without including rs2522833 in the logistic regression model. Moreover, no known PCLO interacting protein was notable on this list.
(b) We imputed genotypes for 2,037,829 autosomal SNPs using MACH with reference to HapMap CEU genotypes. The resulting lambda was 1.048, and the minimum p-value was 1.21×10−7. As noted above, 22 of the 25 most significant imputed associations were in the PCLO region. Investigation of SNP clustering that accounted for LD yielded results similar to those shown in Table 3.
(c) We assembled a list of 103 candidate genes that had been studied for association with MDD in the literature (81). Nineteen of these genes had no SNPs within its transcript and another 9 genes had inadequate coverage (>1 SNP/15 kb; Supplemental Table 17). Of the remaining 75 genes, only NOS1 (neuronal nitric oxide synthase, p=0.0006) had p<0.001. However, NOS1 (as with most genes in Supplemental Table 16) is quite large and chance is a prominent potential influence on these results.
(d) We compared the GWAS association results to a meta-analysis of gene expression data from 12 studies of post-mortem brain tissue in MDD cases compared with controls (10 frontal cortex and 2 cerebellum studies). These data are available via the Stanley Foundation (http://www.stanleygenomic.org). There were five genes with GWAS p<0.05 (all had gene expression changes significant at p 0.0004 – 0.007). The genes were: SGCG (sarcoglycan), CALD1 (caldesmon 1), EEF1A1 (eukaryotic translation elongation factor 1 alpha 1), CFLAR (CASP8 and FADD-like apoptosis regulator), and TP73L (tumor protein p73-like). There is no overlap of this list with the PCLO interactors or MDD candidate genes from the literature.
(d) Alternative models, filters, and phenotypes. (i) For reoMDD, the minimum p-value over all GWAS SNPs was at the PCLO region SNP rs2715148 (8.4×10−8) which ranked second of all SNPs using the trend test (Table 2). (ii) rs2715148 also had the smallest p-value under a dominant model of SNP action (6.2×10−6). (iii) Given the female predominance in MDD, we analyzed data from females and males separately. For female cases and controls, rs2715148 had the smallest p-value (4.0×10−7) and multiple other PCLO SNPs had p-values in the 10−5 – 10−6 range. For males, most PCLO SNPs had p > 0.05 and the minimum was in the SLC9A9 SNP rs4839627 (9.1×10−7). (iv) Again, given sex differences in MDD prevalence, we investigated SNPs on chrX and chrY more closely. The minimum p-value in chrX pseudo-autosomal region 1 was 0.02. For the non-PAR regions of chrX in females, the SNPs with the smallest p-values were rs11094388 (p=0.0003, intergenic), rs5971108 (p=0.0003, PTCHD1), rs5930667 (p=0.0004, intergenic), rs4618863 (p=0.0005, intergenic), rs2207796 (p=0.0005, in the very large gene DMD), and rs5936428 (p=0.0009, FMR2). For males, the minimum p-value on chrX was at rs10521594 (p=5.4×10−5, intergenic) and 0.22 on chrY.
Major depressive disorder (MDD) is a common complex trait of enormous public health significance. As part of the GAIN initiative of the US Foundation for the NIH (19), we conducted a genomewide association study of 435,291 SNPs genotyped in 1,738 MDD cases and 1,802 controls selected to be at low liability for MDD. Our study had numerous positive attributes including its historically large sample size, its largely population-based and longitudinal design, and relatively unbiased and dense genomewide genotyping designed to capture common variation in subjects of European ancestry.
According to our primary analysis plan, no SNP-MDD phenotype association reached genomewide significance as the minimum q-value was 0.28, greater than the pre-defined q-value threshold of 0.10. This result was not unexpected. For example, type 2 diabetes mellitus has arguably reaped the greatest harvest from GWAS (82) and yet two of the initial T2DM GWAS were unremarkable when analyzed independently (83, 84). One of the key lessons of the GWAS era is the importance of meta-analysis where its application to the primary GWAS can uncover positive findings that replicate well across studies (18, 85).
Although no locus exceeded the genomewide threshold after correction for multiple comparisons, 11 of the top 200 signals localized to a 167 kb region overlapping the gene piccolo (PCLO). The protein product of PCLO localizes to the presynaptic active zone and plays an important role in brain monoaminergic neurotransmission (86), clearly intersecting with a venerable hypothesis of the etiology of mood disorders (87). Moreover, the third most significant association was a common non-synonymous SNP near its critical C2A binding domain in PCLO (88, 89). Although it is an obvious candidate gene, we are not aware of any prior association studies of PCLO and mood disorders (PCLO is in a region of 7q implicated by linkage in autism and one autism association study has been published) (90).
We judged the intersection of this GWAS result with prior knowledge sufficient to trigger a large-scale replication effort by genotyping PCLO SNPs in 6,079 MDD independent cases and 5,893 controls. Statistical power to replicate exceeded 90% even after accounting for (79) the “Winner's Curse” phenomenon (a form of regression to the mean whereby the true genotypic relative risk is over-estimated in the initial study) (91, 92). However, in spite of the apparent a priori strength of a hypothesis of genetic variation in PCLO in the etiology of MDD, no SNP analyzed in the replication sample met appropriately rigorous criteria for replication (21). Therefore, unlike GWAS for many non-psychiatric biomedical disorders, our GWAS and replication efforts did not yield “proof beyond a reasonable doubt” level of evidence for an association between genetic variation in PCLO and MDD.
Investigation of the sources of heterogeneity in the replication samples indicated that controls were genetically similar to the original sample in the PCLO region but that cases were dissimilar. We observed, a posteriori, that both principal components derived from PCLO region genotypes in QIMR cases and effect size estimates in the QIMR replication sample tended to be similar to the original sample. This is notable because, of all the replication samples, ascertainment of QIMR subjects was most similar to the primary NESDA/NTR sample in that cases were identified from population-based sources (100% for QIMR and 60% for NESDA) rather than tertiary sources as for the other replication samples. MDD cases from clinical samples may differ from population-based cases due to selection bias (93), Berkson's bias (94, 95), differing referral filters (96), or even a different genetic basis (97) with respect to genetic variation in the PCLO region.
Joint analysis of the NESDA/NTR and QIMR samples yielded p=6.4x10-8 (uncorrected for multiple hypothesis testing) for the non-synonymous SNP rs2522833. This result suggests a specific hypothesis for future studies: an association between genetic variation in PCLO and MDD may be detected only in population-based cases. Thus, it would be premature to exclude PCLO from a role in the etiology of some forms of MDD.
Interpretation of the PCLO replication efforts is consistent with two broad possibilities. The first possibility is that genetic variation in PCLO is truly not associated with MDD. This interpretation is supported by the replication analyses (specified a priori) in which no SNP was significantly associated after correction for multiple comparisons and SNP dependence due to LD. This strict interpretation is generally viewed as “best practice” in human genetics (21) but implicitly assumes etiological homogeneity for MDD in the PCLO region. The second possibility invokes a less parsimonious model involving heterogeneity, that genetic variation in PCLO is etiologically causal to some subtypes of MDD. This interpretation is an a posteriori hypothesis consistent with the empirical results particularly in the notable differences in associations between samples, case ascertainment strategies, and indications from principal components analysis that NESDA and QIMR cases are more similar than the clinically ascertained subjects. It is notable that the control samples from each site were considerably more similar than cases from the same sites.
The tension between null a priori results and plausible a posteriori hypotheses is a core issue in psychiatric genetics. Important phenotypes like MDD are defined reliably and with reference to diagnostic schema developed principally for clinical purposes. Heterogeneous etiology of MDD is widely suspected but there are no proven ways to index heterogeneity (indeed, a prominent rationale for genetics studies is improve differential diagnosis).
Our results are consistent with prior observations of the heterogeneous nature of MDD, particularly with regard to ascertainment. Individuals who meet MDD criteria from community or primary care sources may have a more inclusive and less comorbid form of MDD (98) whereas tertiary ascertainment may yield subjects with greater comorbidity and perhaps distinctive etiology (99). In particular, it is formally possible (but unproven) that the PCLO results are accurate – genetic variation in PCLO might be causal to the types of MDD seen in community samples but other loci contribute to a distinctive type of MDD seen in tertiary care samples.
There were two MDD cases who may have had unrecognized genomic disorders (100) (possible Turner's and Klinefelter's syndromes). We speculate that small numbers of cases with MDD will have CNV-related genomic disorders that are plausibly causal to MDD. Clarification of the role of such rare variants will require larger samples.
Most of the additional exploratory analyses were unrevealing, including examination of proteins known to interact with PCLO, genotype imputation, comparison of GWAS findings with MDD candidate genes from the literature and gene expression changes in the brain in cases with MDD, and alternative genetic models, phenotype definitions, and sex-specific analyses.
We searched the SLEP compendium of psychiatric genetics findings (101) to attempt to discover overlap of our findings with those reported in the literature. First, with reference to a meta-analysis of microarray studies on the Stanley brain bank MDD and control samples, expression of CFLAR and MARCH3 were increased and LST1 and HLA-B were decreased in MDD post-mortem frontal cortex. These regions ranked 9, 232, 267, and 432 in the NESDA/NTR GWAS. Second, we looked for convergence of our findings with other GWAS of psychiatric disorders. Notable genomic locations of overlap of the top 480 regions in the present GWAS were found with GWAS for ADHD (ITIH1, S Faraone, personal communication), the WTCCC GWAS for bipolar disorder (SHFM1 and UGT2B4) (102), and a bipolar GWAS that used DNA pooling (GRM7 and DGKH) (70). Third, we looked at the minimum p-values in our study for genes that met or nearly achieved genomewide significance: the minimum p-values in our study for MAMDC1 (103) was 0.004, 0.03 for ZNF804A (104), 0.002 in ANK3 (105), and 0.03 in CACNA1C (105). These overlaps are intriguing (although the play of chance cannot be excluded), and will be formally investigated as part of our participation in the Psychiatric GWAS Consortium analyses (18).
(a) Although statistical power has been systematically underestimated in psychiatric genetics, when we began this study in Q3 2006, it was believed that statistical power would be reasonable to detect realistic genetic effects. However, the definition of “realistic” has shifted considerably since 2006 and it may be important to design studies that can detect genotypic relative risks <1.10. (b) When this study began, the coverage and performance of the Perlegen GWAS platform were among the better options available (19). The technology and pricing have evolved rapidly and superior platforms are now available. A key limitation of the Perlegen platform is its inability to assess copy number variation (106) that may be particularly salient for psychiatric disorders (107, 108). More generally, the GWAS platform might not be sufficiently “genomewide” and unbiased: the platform may have had inadequate coverage in an etiologically important region of the genome, SNPs are only one type of genetic variation, and important non-SNP genetic variation might not have been sufficiently well captured. (c) There was an imbalance in the proportion of males in cases and controls. Although it is unclear whether and how this might bias the results, it may have lead to some degree of bias. (d) Finally, GWAS are predicated upon the crucial assumption that the predominant diagnostic criteria are valid with respect to the fundamental architecture of the disorder.
We describe here a large effort to identify DNA sequence variation fundamental to MDD. Although our initial GWAS results for the PCLO region were intriguing, this highly plausible hypothesis did not find support in a large-scale replication attempt. Our hypothesis about a role of genetic variation in PCLO for MDD in population but not clinical settings emphasizes the importance of knowing the epidemiological sampling frame for a study. Finally, we hope that the model we used in this study – a cooperative international effort – will be adopted by groups studying other psychiatric disorders in order to maximize progress.
We acknowledge support from NWO: genetic basis of anxiety and depression (904-61-090); resolving cause and effect in the association between exercise and well-being (904-61-193); twin-family database for behavior genomics studies (480-04-004); twin research focusing on behavior (400-05-717), Center for Medical Systems Biology (NWO Genomics); Spinozapremie (SPI 56-464-14192); Centre for Neurogenomics and Cognitive Research (CNCR-VU); genomewide analyses of European twin and population cohorts (EU/QLRT-2001-01254); genome scan for neuroticism (NIMH R01 MH059160); Geestkracht program of ZonMW (10-000-1002); matching funds from universities and mental health care institutes involved in NESDA (GGZ Buitenamstel-Geestgronden, Rivierduinen, University Medical Center Groningen, GGZ Lentis, GGZ Friesland, GGZ Drenthe). Genotyping was funded by the Genetic Association Information Network (GAIN) of the Foundation for the US National Institutes of Health, and analysis was supported by grants from GAIN and the NIMH (MH081802). Genotype data were obtained from dbGaP (http://www.ncbi.nlm.nih.gov/dbgap, accession number phs000020.v1.p1). Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) which is financially supported by the NWO (480-05-003). Dr. Sullivan was also supported by R01s MH074027 and MH077139. Dr. Schosser was supported by an Austrian Science Fund Erwin-Schrödinger-Fellowship. We would like to express our thanks to: the GAIN Genotyping group (Dr. Gonçalo Abecasis, chair) for help with quality control; Drs. Gonçalo Abecasis and Jun Li for assistance with MACH; Dr. Shaun Purcell for PLINK; Troy Dumenil (QIMR) for expert assistance with the replication genotyping; Dr. Dina Ruano (Portuguese Foundation for Science and Technology, SFRH/BPD/28725/2006); and Drs. Pam Madden (DA012854) and Richard Todd (AA013320) for supplying some of the phenotypes used in the Australian sample. Replication genotyping of the STAR*D samples was supported by a grant from the Bowman Family Foundation and the Sidney R. Baer, Jr. Foundation. We gratefully acknowledge NARSAD for funding the PCLO follow-up genotyping.
Financial Disclosures (Prior 3 Years)
Dr. Baune has received honoraria for educational training of psychiatrists and general practitioners from Lundbeck, Astra Zeneca and Pfizer Pharmaceuticals and travel grants from Astra Zeneca, Bristol-Meyer Squibb, Janssen and Pfizer Pharmaceuticals. Dr. Fava has received: research support from Abbott Laboratories, Alkermes, Aspect Medical Systems, Astra-Zeneca, Bristol-Myers Squibb Company, Cephalon, Eli Lilly & Company, Forest Pharmaceuticals Inc., GlaxoSmithkline, J & J Pharmaceuticals, Lichtwer Pharma GmbH, Lorex Pharmaceuticals, Novartis, Organon Inc., PamLab, LLC, Pfizer Inc, Pharmavite, Roche, Sanofi-Aventis, Solvay Pharmaceuticals, Inc., Synthelabo, Wyeth-Ayerst Laboratories; advisory/consulting fees from Abbott Laboratories, Amarin, Aspect Medical Systems, Astra-Zeneca, Auspex Pharmaceuticals, Bayer AG, Best Practice Project Management, Inc., Biovail Pharmaceuticals, Inc., BrainCells, Inc. Bristol-Myers Squibb Company, Cephalon, CNS Response, Compellis, Cypress Pharmaceuticals, Dov Pharmaceuticals, Eli Lilly & Company, EPIX Pharmaceuticals, Fabre-Kramer Pharmaceuticals, Inc., Forest Pharmaceuticals Inc., GlaxoSmithkline, Grunenthal GmBH, Janssen Pharmaceutica, Jazz Pharmaceuticals, J & J Pharmaceuticals, Knoll Pharmaceutical Company, Lorex Pharmaceuticals, Lundbeck, MedAvante, Inc., Merck, Neuronetics, Novartis, Nutrition 21, Organon Inc., PamLab, LLC, Pfizer Inc, PharmaStar, Pharmavite, Precision Human Biolaboratory, Roche, Sanofi-Aventis, Sepracor, Solvay Pharmaceuticals, Inc., Somaxon, Somerset Pharmaceuticals, Synthelabo, Takeda, Tetragenex, Transcept Pharmaceuticals, Vanda Pharmaceuticals Inc, Wyeth-Ayerst Laboratories; speaking fees from Astra-Zeneca, Boehringer-Ingelheim, Bristol-Myers Squibb Company, Cephalon, Eli Lilly & Company, Forest Pharmaceuticals Inc., GlaxoSmithkline, Novartis, Organon Inc., Pfizer Inc, PharmaStar, Primedia, Reed-Elsevier, Wyeth-Ayerst Laboratories; has equity holdings inCompellis, MedAvante; and has royalty/patent, other income for patent applications for SPCD and for a combination of azapirones and bupropion in MDD, copyright royalties for the MGH CPFQ, DESS, and SAFER. Dr. Nolen has received: speaking fees from Astra Zeneca, Eli Lilly, Pfizer, Servier, Wyeth; unrestricted research funding from Astra Zeneca, Eli Lilly, GlaxoSmithKline, Wyeth; and served on advisory boards for Astra Zeneca, Cyberonics, Eli Lilly, GlaxoSmithKline, Pfizer, Servier. Dr. Perlis has received consulting fees or honoraria from AstraZeneca, Bristol Myers-Squibb, Eli Lilly, GlaxoSmithKline, Pfizer, and Proteus; he is a stockholder in Concordant Rater Systems, LLC, and the holder of a patent related to the monitoring of raters in clinical trials. Dr. Smoller has consulted to Eli Lilly, received honoraria from Hoffman-La Roche, Inc, Enterprise Analysis Corp. and MPM Capital, and has served on an advisory board for Roche Diagnostics Corporation. Dr. Sullivan has received unrestricted research support from Eli Lilly.
Patrick F. Sullivan, University of North Carolina, Chapel Hill ; Email: ude.cnu.dem@villusfp.
Eco J.C. de Geus, VU University Amsterdam ; Email: ln.uv.ysp@oce.
Gonneke Willemsen, VU University Amsterdam ; Email: email@example.com.
Michael R. James, Queensland Institute for Medical Research ; Email: firstname.lastname@example.org.
Jan H. Smit, VU University Medical Center Amsterdam ; Email: email@example.com..
Tim Zandbelt, VU University Medical Center Amsterdam ; Email: ln.abzgg@zmit.
Volker Arolt, University of Münster ; Email: firstname.lastname@example.org.
Bernhard T. Baune, James Cook University Queensland ; Email: email@example.com.
Douglas Blackwood, University of Edinburgh ; Email: firstname.lastname@example.org.
Sven Cichon, University of Bonn ; Email: email@example.com.
William L. Coventry, University of New England ; Email: ua.ude.enu@rtnvocw..
Katharina Domschke, University of Münster ; Email: firstname.lastname@example.org.
Anne Farmer, Institute of Psychiatry ; Email: email@example.com.
Maurizio Fava, Harvard Medical School ; Email: gro.srentrap@avafm.
Scott D. Gordon, Queensland Institute for Medical Research ; Email: firstname.lastname@example.org.
Qianchuan He, University of North Carolina, Chapel Hill ; Email: ude.cnu.liame@hcnaiqeh.
Andrew Heath, Washington University, St. Louis ; Email: ude.ltsuw.yrtaihcysp@ahtaeh.
Peter Heutink, VU University Medical Center Amsterdam ; Email: email@example.com.
Florian Holsboer, Max-Planck Institute of Psychiatry ; Email: ed.gpm.lkyspipm@reobsloh.
Witte J. Hoogendijk, VU University Medical Center Amsterdam ; Email: ln.abzgg@hettiw.
Jouke Jan Hottenga, VU University Amsterdam ; Email: firstname.lastname@example.org.
Yijuan Hu, University of North Carolina, Chapel Hill ; Email: ude.cnu.soib@uhy.
Martin Kohli, Max-Planck Institute of Psychiatry ; Email: ed.gpm.lkyspipm@milhok.
Danyu Lin, University of North Carolina, Chapel Hill ; Email: ude.cnu.soib@nil.
Suzanne Lucae, Max-Planck Institute of Psychiatry ; Email: ed.gpm.lkyspipm@eacul.
Donald J. MacIntyre, Royal Edinburgh Hospital ; Email: email@example.com.
Wolfgang Maier, University of Bonn ; Email: firstname.lastname@example.org.
Kevin A. McGhee, University of Edinburgh ; Email: email@example.com.
Peter McGuffin, Institute of Psychiatry ; Email: firstname.lastname@example.org.
Grant Montgomery, Queensland Institute for Medical Research ; Email: email@example.com.
Walter J. Muir, University of Edinburgh ; Email: firstname.lastname@example.org.
Willem Nolen, University Medical Center Groningen ; Email: email@example.com.
Markus M. Nöthen, University of Bonn ; Email: firstname.lastname@example.org.
Roy H. Perlis, Harvard Medical School ; Email: ude.dravrah.hgm.rghc@silrepr..
Katrina Pirlo, Institute of Psychiatry ; Email: email@example.com.
Danielle Posthuma, VU University Amsterdam ; Email: ln.uv.ysp@elleinad.
Marcella Rietschel, University of Heidelberg ; Email: firstname.lastname@example.org.
Patizia Rizzu, VU University Medical Center Amsterdam ; Email: email@example.com..
Alexandra Schosser, Institute of Psychiatry ; Email: firstname.lastname@example.org.
August B. Smit, VU University Amsterdam ; Email: email@example.com.
Jordan W. Smoller, Harvard Medical School ; Email: ude.dravrah.smh@relloms_nadroj.
Jung-Ying Tzeng, North Carolina State University ; Email: ude.uscn.tats@gneztyj..
Richard van Dyck, VU University Medical Center Amsterdam ; Email: firstname.lastname@example.org.
Matthijs Verhage, VU University Amsterdam ; Email: ln.uv.rcnc@sjihttam.
Frans G. Zitman, Leiden University Medical Center ; Email: email@example.com.
Nicholas G. Martin, Queensland Institute for Medical Research ; Email: firstname.lastname@example.org.
Naomi R. Wray, Queensland Institute for Medical Research ; Email: email@example.com.
Dorret I. Boomsma, VU University Amsterdam ; Email: ln.uv.ysp@terrod.
Brenda W.J.H. Penninx, VU University Medical Center Amsterdam ; Email: firstname.lastname@example.org.