Schizophrenia (SCZ) is a severe mental disorder marked by hallucinations, delusions, cognitive deficits and apathy with heritability estimated at 73-90%1. Inheritance patterns are complex and the number and type of genetic variants involved are not understood. Copy number variants (CNVs) have been identified in individual SCZ patients2-7 and also in neurodevelopmental disorders8-11, but large-scale genome-wide surveys have not been performed. We report such a genome-wide survey of rare CNVs in 3,391 patients with SCZ and 3,181 ancestrally-matched controls using high-density microarrays. For CNVs that were observed in less than ~1% of the sample and greater than 100kb in length, the total burden is increased in SCZ patients compared to controls (P=3×10−5; 1.15 fold increase). This effect was more pronounced for rarer, single-occurrence CNVs and for those that involved genes as opposed to those that did not. As expected, deletions were found within the region critical for velo-cardio-facial syndrome (P=0.0017, odds ratio (OR) = 21.6), which includes psychotic symptoms in 30% of patients12. Associations with SCZ were also found for large deletions on chromosome 15q13.2 (P = 0.0029, OR = 17.9) and 1q21.1 (P = 0.0076, OR = 6.6). These associations were not previously reported in the literature and remained significant after genome-wide correction. Overall, our results provide strong support for a model of SCZ pathogenesis that includes the effects of multiple rare structural variants, both genome-wide and at specific loci.
The International Schizophrenia Consortium (ISC) was established to promote rapid progress towards the identification of genetic causes underlying SCZ. The ISC comprises investigators from the University of Aberdeen, Cardiff University, the University of Edinburgh, the Karolinska Institutet, Massachusetts General Hospital, the University of North Carolina-Chapel Hill, the Queensland Institute of Medical Research, the University of Southern California, the Stanley Center for Psychiatric Research at the Broad Institute of Harvard and MIT, Trinity College Dublin and University College London.
We surveyed single nucleotide polymorphisms (SNPs) and CNVs using the Affymetrix Genome-Wide Human SNP 5.0 and 6.0 Arrays in European SCZ cases and ancestrally-matched controls (Table 1 and Supplementary Information)13. Based on the genome-wide SNP data there was no evidence of major population stratification within each site14 (data not shown). Intensity data from both SNP and CNV probes were used to identify autosomal deletions and duplications, based on a hidden Markov model (HMM)15.
This study focused on rare but highly penetrant structural variation in schizophrenia, following a natural extension of the classical medical genetic approach. Common CNVs are better identified with different algorithms and better tested for association separately13,15. Considering CNVs present in less than 1% of our total sample, there were 6,753 CNVs greater than 100kb that passed sample and CNV quality filtering (see Supplementary Information, Table S1). The median size was 182.1kb (166.3kb for deletions, 194.4kb for duplications), 39% were deletions and the median number per individual was 1. We assessed the impact of rare structural variation on SCZ risk in two ways: first, in terms of an individual's genome-wide burden and second, by searching for specific loci that were significantly associated with disease.
Structural variants have been identified for severe neurodevelopmental disorders9-11,16,17. Since it has been postulated that SCZ might, at least in part, have a developmental etiology18, we posited a role for CNVs in SCZ, as have others2-6. A number of loci have in fact been identified, including variants containing genes with neurodevelopmental roles2-5. However, a critical question is the extent to which this is a general mechanism for producing SCZ in typical clinical populations rather than cases selected for atypical phenotypic features such as very early onset or mental retardation. This motivated our primary hypothesis: that individuals with SCZ have a greater genome-wide burden of CNVs. Considering all CNVs, we observed that cases had a greater average burden than controls (1-sided, empirical P= 3×10−5 controlling for array type; Table 2). Controls on average had 0.99 CNVs per person, whereas cases showed a 1.15-fold higher rate.
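The one-sided empirical P value controlling for array type can be illustrated with a stratified permutation test. The following is a minimal sketch only: the text does not describe the permutation scheme in detail, so shuffling case/control labels within each array type is an assumption, and the function name, toy counts and labels are hypothetical rather than study data.

```python
import random
from collections import defaultdict


def burden_permutation_test(cnv_counts, is_case, stratum, n_perm=10000, seed=0):
    """One-sided empirical P for 'cases carry more CNVs than controls'.

    Labels are shuffled only within each stratum (assumed here to be the
    array type), so a difference in CNV call rate between strata cannot
    by itself produce a case/control difference.
    """
    rng = random.Random(seed)

    # Group sample indices by stratum.
    groups = defaultdict(list)
    for i, s in enumerate(stratum):
        groups[s].append(i)

    def case_total(labels):
        # Group sizes are fixed under permutation, so the total CNV count
        # in cases orders permutations exactly as the mean difference does.
        return sum(c for c, lab in zip(cnv_counts, labels) if lab)

    observed = case_total(is_case)
    hits = 0
    for _ in range(n_perm):
        permuted = list(is_case)
        for idx in groups.values():
            shuffled = [is_case[i] for i in idx]
            rng.shuffle(shuffled)
            for i, lab in zip(idx, shuffled):
                permuted[i] = lab
        if case_total(permuted) >= observed:
            hits += 1
    # Add-one correction: an empirical P value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 12 individuals typed on two array versions.
counts = [3, 4, 5, 0, 1, 2, 4, 5, 6, 1, 2, 3]
labels = [True] * 3 + [False] * 3 + [True] * 3 + [False] * 3
arrays = ["5.0"] * 6 + ["6.0"] * 6
p_value = burden_permutation_test(counts, labels, arrays, n_perm=999, seed=1)
```

Because the smallest attainable value is 1/(n_perm + 1), an empirical P on the order of 10−5 requires at least on the order of 10^5 permutations.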
We next explored this subtle, but highly statistically significant, observation of increased burden. We defined burden in two ways: as the number of CNVs an individual carries (as above), and also as the number of genes spanned by those CNVs. This second metric (the “gene-count”) in fact showed a stronger association with SCZ (1.41-fold increase, empirical P=2×10−6) than burden defined simply as the number of CNVs. Characteristics of CNV subgroups studied here are their frequency, type, size, and proximity to a gene (Tables 2 & 3; Table S2). We observed increased burden across multiple independent subgroups of CNVs, a finding that was more pronounced for rarer CNVs and those involving genes. Deletions and duplications also displayed somewhat different profiles: the association of deletions varied more noticeably with respect to CNV size and proximity to a gene, whereas duplications showed a more uniform pattern. Eight hundred ninety CNVs were observed in either a case or a control as a single occurrence. This rarest subset of CNVs would be expected to show enrichment under the model that genetic causes of SCZ are individually unique in some proportion of patients. Indeed, this set of CNVs showed a 1.45-fold increase in cases (empirical P=5×10−6). On average, 13.1% of SCZ cases possessed a deletion or duplication observed only once in the sample, in contrast to 10.4% of controls. Under a model in which very rare (occurring in under 1/1000 individuals) inherited or recurrently de novo events increase risk, we would expect to observe a greater overall burden in SCZ. Although our study was statistically under-powered to identify the actual loci involved, such variants could in theory be mapped in extremely large samples. In this intermediate group, we observed 2,465 CNVs occurring between 2 and 6 times in the total sample, for which there was an increased burden, both for number of CNVs (empirical P = 0.0013) and gene-count (empirical P = 5×10−4).
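The two burden definitions used above (number of CNVs and gene-count) can be made concrete with a small interval-overlap sketch. It assumes that a gene is counted once per individual if any of that individual's CNVs overlaps it; the exact counting rule is given in the Supplementary Information and may differ. The gene names and coordinates below are hypothetical.

```python
def gene_count(cnvs, genes):
    """Number of distinct genes overlapped by at least one CNV.

    cnvs  : list of (chrom, start, end) tuples for one individual
    genes : list of (name, chrom, start, end) tuples
    """
    hit = set()
    for chrom, c_start, c_end in cnvs:
        for name, g_chrom, g_start, g_end in genes:
            # Two closed intervals overlap when each starts before the other ends.
            if chrom == g_chrom and c_start <= g_end and g_start <= c_end:
                hit.add(name)
    return len(hit)


# Hypothetical annotation and one hypothetical individual.
genes = [
    ("GENE_A", "1", 100_000, 150_000),
    ("GENE_B", "1", 400_000, 450_000),
    ("GENE_C", "2", 100_000, 150_000),
]
person = [
    ("1", 120_000, 420_000),  # genic CNV: overlaps GENE_A and GENE_B
    ("2", 500_000, 700_000),  # non-genic CNV: overlaps no gene
]

n_cnvs = len(person)                 # burden as number of CNVs
n_genes = gene_count(person, genes)  # burden as gene-count
```

Under this definition a single large CNV spanning many genes contributes far more to the gene-count than to the CNV count, which is one way the two metrics can diverge.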
Because several known genomic disorders of the nervous system result from large CNVs that are often many hundreds of kilobases11, we additionally stratified by size of event (Table 3). Of deletions, only larger (>500kb) variants were enriched (empirical P=3×10−4) despite being the least frequent set of CNVs (N=285), displaying a 3.57-fold increase in gene-count between cases and controls (empirical P = 2×10−5). In contrast, shorter duplications showed a stronger association with disease than longer duplications, albeit with a smaller fold increase than deletions (Table 3).
In general, the gene-count definition of CNV burden yielded stronger results, particularly for deletions (gene-count P = 3×10−5 versus number P = 0.11; Table 2). In fact, dividing all CNVs into two sets, of those that intersect at least one gene and those that do not, we saw an increased burden only in the number of “genic” CNVs (P = 5×10−6; Table S2) and not for non-genic CNVs (P = 0.16). There was a similar trend for CNVs seen 2-6 times when comparing enrichment in genic and non-genic CNVs (P = 7×10−4 and 0.19) but not single-occurrence CNVs (P=6×10−4 and 6×10−4). These results may reflect biological distinctions, although they may to some extent also reflect variable performance in CNV detection for different classes of variant. We conducted a set of analyses to rule out several sources of bias and confounding in the primary genome-wide burden analysis (Tables S3, S4, S5 & S6). While, in general, low specificity and sensitivity decrease power, of concern here is potential measurement error that systematically varied between cases and controls, leading to spurious results. In this respect, an obvious concern is that both Affymetrix 5.0 and 6.0 arrays were used; as a consequence, we performed all analyses controlling for array type. As described in the Supplementary Information, the primary result was also robust to the following. First, in addition to array type, we controlled for sample collection site, genotyping plate and average probe variance. Second, sensitivity analyses showed that no single sample collection site accounted for the observations. Third, we restricted analysis to the most homogeneous 90% of the sample with respect to intra-individual probe variance. Fourth, if case/control differences in CNV burden were purely due to unmeasured confounders, we would not expect an enriched gene-count after controlling for the overall extent and rate of CNVs. 
Of note, however, after controlling for overall (genic and non-genic) CNV burden, a significantly enriched gene-count burden remained in SCZ patients.
Our large sample size further enabled us to search for specific CNV regions associated with SCZ. One locus previously reported to increase risk for SCZ is 22q11.2 (17-21Mb), at which hemizygosity occurs in 1:4000 live births12. These deletions produce a range of clinically heterogeneous phenotypes, including velo-cardio-facial syndrome (VCFS) and DiGeorge syndrome, which together are known as 22q11.2 deletion syndrome (22q11.2DS)12; approximately 30% of carriers develop psychosis12. Previous studies estimated the frequency of 22q11.2 deletions to be 0.6%-1.0% in SCZ cases, although many of these studies had technically incomplete characterization of this region19. Thus we expected to find examples of 22q11.2 deletions in our sample of 6,572 individuals. The most common form of 22q11.2DS is a 3Mb loss (~90% frequency), although a nested 1.5Mb deletion is also observed (~7%) along with infrequent (~3%) atypical deletions20. We identified 13 large deletions (>500 kb) in SCZ cases within this interval and none in controls (Table S7). Of these, six were consistent with the larger deletion, five with the shorter deletion, and two were atypical. The 11 samples with typical deletions defined an interval with the strongest association (empirical P = 0.0017; genome-wide corrected P = 0.0046; odds ratio = 21.6) (Figure 1A). Controlling for sample collection site or genotyping plate instead of array type did not change the results (Table S10). The two atypical deletions in this region overlap the distal end of the 3Mb variant. Deletion events within the region were confirmed in all 13 patients by quantitative polymerase chain reaction with three individual assays (qPCR, Tables S11 & S12 and Figure S1). Our findings provide additional evidence that hemizygosity in 22q11.2 is a rare but powerful risk factor for SCZ.
The larger 22q11.2 deletion harbors 43 genes (Table S8). Despite the efforts of many groups, the psychiatric symptoms observed in 22q11.2DS have not been ascribed to reduced copy number of any individual gene12. Variants within catechol-O-methyltransferase (COMT), an enzyme responsible for degrading catecholamines including dopamine, have been implicated in a wide variety of phenotypes, but with inconsistent results12.
Removing the 13 22q11.2DS individuals, we observed a further 271 deletions >500 kb (175 in cases and 110 in controls). Two additional regions (15q13.2 and 1q21.1) were identified that harbor a significant excess of deletions in SCZ cases after correction for multiple testing (Figure 1B and 1C; Table S7 for case descriptions). On chromosome 15 (28 – 31Mb) there were deletions in 9 cases and 0 controls (empirical P = 0.0029; genome-wide corrected P = 0.046; odds ratio = 17.9). On chromosome 1 (142.5-145.5Mb) there were 10 deletions in cases and 1 in controls (empirical P = 0.0076; genome-wide corrected P = 0.046; odds ratio = 6.6). All 20 large deletions at 15q13.2 and 1q21.1 were validated by one or more qPCR reactions (Tables S11 & S12 and Figure S1). The multiple test correction factors were small as a consequence of restricting our attention to this small class of rare variants. We did not observe any regions with a corrected P < 0.05 for either duplications or smaller (<500kb) deletions. Also of note, the primary CNV burden tests remained significant after removing individuals with a deletion at one of these three loci (number P = 1 × 10−4 and gene-count P = 3 × 10−5); for >500kb deletions specifically, the burden test remained significant for number (P = 0.02) but not for gene-count (P = 0.11).
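An odds ratio cannot be computed directly when no control carries the deletion, since the control-carrier cell of the 2×2 table is zero. One common remedy is to add 0.5 to every cell (the Haldane-Anscombe correction). The text does not state which estimator was used, so the sketch below is an assumption; it takes the sample sizes of 3,391 cases and 3,181 controls from the abstract, and the function name is hypothetical. With these inputs the corrected values round to the odds ratios reported for the three loci.

```python
def odds_ratio_corrected(case_carriers, control_carriers, n_cases, n_controls, k=0.5):
    """Odds ratio from a 2x2 table with k added to every cell.

    Adding k = 0.5 (an assumed correction, not stated in the text) keeps
    the estimate finite when a cell count is zero.
    """
    a = case_carriers + k                    # cases with the deletion
    b = n_cases - case_carriers + k          # cases without
    c = control_carriers + k                 # controls with the deletion
    d = n_controls - control_carriers + k    # controls without
    return (a * d) / (b * c)


N_CASES, N_CONTROLS = 3391, 3181

# (carriers among cases, carriers among controls), as given in the text;
# 22q11.2 uses the 11 typical deletions that define the associated interval.
loci = {
    "22q11.2": (11, 0),
    "15q13.2": (9, 0),
    "1q21.1": (10, 1),
}

odds_ratios = {
    locus: round(odds_ratio_corrected(cases, controls, N_CASES, N_CONTROLS), 1)
    for locus, (cases, controls) in loci.items()
}
# odds_ratios -> {"22q11.2": 21.6, "15q13.2": 17.9, "1q21.1": 6.6}
```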
The large deletions on chromosome 15q13.2 have not been previously associated with SCZ. This region does not include the nearby critical region for Prader-Willi/Angelman syndrome21 but is consistent with the critical region defined by recurrent deletion in cases of mental retardation (MR) with seizures recently reported17. Furthermore, our estimated breakpoints fall within the segmental duplications reported (BP4 and BP5). In the present study, evidence consistent with mildly-impaired cognition was seen in five of the nine patients with deletions and one individual also had a history of epilepsy (Table S7). This broad region has been the focus of previous genetic studies in SCZ. The alpha 7 subunit of the nicotinic acetylcholine receptor gene (CHRNA7) is a candidate based on an initial identification from linkage analysis of the P50 auditory evoked potential deficits observed in SCZ patients22,23 though there are no prior reports of large deletions associated with SCZ.
The deleted region on 1q21.1 is consistent with a previously reported de novo deletion in a patient with MR and seizures17 and two patients with autism (one de novo and one inherited)10. In the present study, three cases had mild cognitive abnormalities and one had a history of epilepsy (Table S7). The region contains 26 known genes, the majority of which are expressed in the brain (Table S8) and has previous reports of linkage24-26 but no prior reports of CNVs associated with SCZ.
Regions of highly homologous segmental duplication (SD) flank the deletions we report at 22q11.2, 15q13.2, and 1q21.1. A prominent mechanism for CNV genesis is non-allelic homologous recombination (NAHR) mediated by SDs, resulting in deletions and reciprocal duplications of the interval between SDs16,27. Neurodevelopmental and psychiatric syndromes have been associated with deletions and duplications flanked by SDs, many of which occur de novo10,11,17. Segmental duplications and NAHR mediate CNVs at 22q11.228 and may be involved in the genesis of CNVs at 15q13.214 and 1q21.1, though other mechanisms may be involved29.
While this work was under review, Walsh et al.2 reported a higher frequency of CNVs in cases (15%, 22 in 150 SCZ patients) versus controls (5%). Among the 21 autosomal case CNVs they identified, we observed overlapping control CNVs at seven loci (for example, DLG2 and PTPRM, Table S9), illustrating that large sample numbers are needed to conclude that any one particular CNV or implicated gene can cause SCZ. Our global burden analysis demonstrated that, in aggregate, single-occurrence and very rare (under ~1/1000) CNVs have increased rates in SCZ cases compared to controls, in line with Walsh et al. This suggests that at least some proportion of these rare CNVs seen in cases but not controls are likely SCZ risk factors, although, like Walsh et al., we are unable to distinguish exactly which. Some examples of possible risk CNVs that were observed multiple times only in cases include deletions at 12p11.23 (4 cases) and 16p12.2-12.1 (6 cases). These deletions were >500kb, flanked by SDs and span several brain-expressed genes. In addition, duplications in two genes relevant to neural development and growth (NOTCH1 and p21-activated kinase 7, PAK7) were found in 5 and 6 cases respectively and 0 controls. Furthermore, we identified CNVs at two recently reported loci, NRXN1 and CNTNAP23,5 (Figure S2).
The etiology of SCZ has been vigorously debated. We now have strong and replicated2 evidence that individuals with SCZ have a greater burden of structural variation across their genomes. Our data show that CNVs in at least three loci act as strong risk factors for SCZ in a minority of individuals. Thus we can now posit that some cases of SCZ are “genomic disorders”16 although we do not yet know whether the risk is specific for SCZ as opposed to a more general risk factor for neuropsychiatric or central nervous system illness.
Exactly how a subtle, 1.15-fold increase in CNV burden translates mechanistically into illness in a given patient is currently unknown. We also do not know whether common genetic variants of more subtle effect are components of the etiology of SCZ, an empirical question that we and others are addressing. Similarly, we do not know how environmental risk or protective factors might act in concert with specific CNVs or with overall burden of CNVs.
A critically important goal will be to determine the full clinical and phenotypic spectrum in carriers of these deletions. Our data provide preliminary evidence of a variable phenotype in SCZ patients who would otherwise be regarded as clinically typical. Examining the role of these variants in related psychotic disorders, such as bipolar disorder, is imperative. Further work explicating the epidemiology and mechanism of these variants in SCZ may ultimately lead to a role for them in genetic counseling and understanding disease biology.
Cases satisfied DSM-IV or ICD-10 criteria for SCZ and were broadly representative of clinical cases in contact with psychiatric services. DNA was extracted from whole blood, with approval from institutional review boards. CNVs were identified using the Birdseye package15 and analysed using PLINK v1.0314. See the Supplementary Information for detail. A list of all CNVs passing quality control is available at http://pngu.mgh.harvard.edu/isc/
We would like to thank the patients and families who have contributed their time and DNA to these studies. We would also like to thank David Altshuler and members of the Medical and Population Genetics group at the Broad Institute of Harvard and MIT for valuable discussion.
The Stanley Center for Psychiatric Research at the Broad Institute: The Sylvan C Herman Foundation (EMS); the Stanley Medical Research Institute (EMS)
Cardiff University: The Cardiff group was supported by an MRC (UK) Programme Grant and NIMH (USA) (CONTE: 2 P50 MH066392-05A1)
Karolinska Institutet: Swedish Council for Working Life and Social Research FO 184/2000; 2001-2368
Massachusetts General Hospital: Stanley Medical Research Institute (PS); MH071681 (PS); Narsad Young Investigator Award (SP)
Queensland Institute of Medical Research: Supported by the Australian National Health and Medical Research Council.
Trinity College Dublin: The Trinity College Dublin research was supported by Science Foundation Ireland, the Health Research Board (Ireland), the Stanley Medical Research Institute and the Wellcome Trust. Irish controls were supplied by Dr. Joe McPartlin from the Trinity College Biobank.
University of Aberdeen: This work was part funded by GSK and Generation Scotland, Genetics Health Initiative.
University College London: The University College London clinical and control samples were collected with support from the Neuroscience Research Charitable Trust, the Camden and Islington Mental Health and Social Care Trust, East London and City Mental Health Trust, The West Berkshire NHS Trust, The West London Mental Health Trust, Oxfordshire & Buckinghamshire Mental Health Partnership NHS Trust, South Essex Partnership NHS Foundation Trust, Gloucestershire Partnership NHS Foundation Trust, Mersey Care NHS Trust, Hampshire Partnership NHS Trust and North East London Mental Health Trust.
University of Edinburgh: The collection of the Edinburgh cohort was supported by grants from The Wellcome Trust, London and the Chief Scientist Office of the Scottish Executive.
University of North Carolina, Chapel Hill: Supported by MH074027, MH077139, and MH080403; the Sylvan C Herman Foundation (PFS); the Stanley Medical Research Institute (PFS)
University of Southern California: This work would not have been possible without the collaboration of our patients and their families. We would also like to acknowledge the support of the National Institute of Mental Health and the Department of Veterans Affairs.
Manuscript Preparation: Jennifer L Stone, Michael C O'Donovan, Hugh Gurling, George K Kirov, Douglas HR Blackwood, Aiden Corvin, Nick J Craddock, Michael Gill, Christina M Hultman, Paul Lichtenstein, Andrew McQuillin, Carlos N Pato, Douglas M Ruderfer, Michael J Owen, David St. Clair, Patrick Sullivan, Pamela Sklar, Shaun M Purcell
Data Analysis: Jennifer L Stone, Douglas M Ruderfer, Joshua Korn, George K Kirov, Stuart MacGregor, Andrew McQuillin, Derek W Morris, Colm T O'Dushlaine, Mark J Daly, Peter M Visscher, Peter A Holmans, Michael C O'Donovan, Patrick Sullivan, Pamela Sklar, Shaun M Purcell
Management Committee: Hugh Gurling, Aiden Corvin, Douglas HR Blackwood, Nick J Craddock, Michael Gill, Christina M Hultman, George K Kirov, Paul Lichtenstein, Andrew McQuillin, Michael C O'Donovan, Michael J Owen, Carlos N Pato, Shaun M Purcell, Edward M Scolnick, David St. Clair, Jennifer L Stone, Patrick Sullivan, Pamela Sklar
Stanley Center for Psychiatric Research and Broad Institute of MIT and Harvard: Shaun M Purcell1-4, Jennifer L Stone1-4, Kimberly Chambert2,3, Douglas M Ruderfer1-4, Joshua Korn3,4, Steve A McCarroll3,4, Casey Gates3, Stacey B Gabriel3, Scott Mahon3, Kristen Ardlie3, Mark J Daly2,3,4, Edward M Scolnick2,3, Pamela Sklar1-4
Cardiff University: Michael C O'Donovan5, George K Kirov5, Nick J Craddock5, Peter A Holmans5, Nigel M Williams5, Lucy Georgieva5, Ivan Nikolov5, N Norton5, H Williams5, Draga Toncheva6, Vihra Milanova7, Michael J Owen5
Karolinska Institutet/University of North Carolina at Chapel Hill: Christina M Hultman8,9, Paul Lichtenstein8, Emma F Thelander8, Patrick Sullivan10
Massachusetts General Hospital: Jennifer L. Stone1-4, Douglas M. Ruderfer1-4, Joshua Korn3,4, Steve A McCarroll3,4, Mark Daly2,3,4, Shaun M Purcell1-4, Pamela Sklar1-4
Trinity College Dublin: Derek W Morris11, Colm T O'Dushlaine11, Elaine Kenny11, John L Waddington12, Michael Gill11, Aiden Corvin11
University College London: Andrew McQuillin13, Khalid Choudhury13, Susmita Datta13, Jonathan Pimm13, Srinivasa Thirumalai14, Vinay Puri13, Robert Krasucki13, Jacob Lawrence13, Digby Quested15, Nicholas Bass13, David Curtis16, Hugh Gurling13
University of Aberdeen: Caroline Crombie17, Gillian Fraser17, Noelle Kwan18, Nicholas Walker19, David St. Clair18
University of Edinburgh: Douglas HR Blackwood20, Walter J Muir21, Kevin McGhee21, Alan W Maclean21, Margaret Van Beck20
Queensland Institute of Medical Research: Peter M Visscher22, Stuart Macgregor22
University of Southern California: Michele T. Pato23, Helena Medeiros23, Frank Middleton24, Celia Carvalho23, Christopher Morley24, Ayman Fanous23,25-27, David Conti23, James Knowles23, Carlos Paz Ferreira28, Antonio Macedo29, M. Helena Azevedo29, Carlos N. Pato23
Reprints and permissions information is available at “npg.nature.com/reprintsandpermissions”.
The authors declare no competing financial interests.