A genome scan meta-analysis (GSMA) was carried out on 32 independent genome-wide linkage scan analyses that included 3255 pedigrees with 7413 genotyped cases affected with schizophrenia (SCZ) or related disorders. The primary GSMA divided the autosomes into 120 bins, rank-ordered the bins within each study according to the most positive linkage result in each bin, summed these ranks (weighted for study size) for each bin across studies and determined the empirical probability of a given summed rank (PSR) by simulation. Suggestive evidence for linkage was observed in two single bins, on chromosomes 5q (142-168 Mb) and 2q (103-134 Mb). Genome-wide evidence for linkage was detected on chromosome 2q (119-152 Mb) when bin boundaries were shifted to the middle of the previous bins. The primary analysis met empirical criteria for ‘aggregate’ genome-wide significance, indicating that some or all of 10 bins are likely to contain loci linked to SCZ, including regions of chromosomes 1, 2q, 3q, 4q, 5q, 8p and 10q. In a secondary analysis of 22 studies of European-ancestry samples, suggestive evidence for linkage was observed on chromosome 8p (16-33 Mb). Although the newer genome-wide association methodology has greater power to detect weak associations to single common DNA sequence variants, linkage analysis can detect diverse genetic effects that segregate in families, including multiple rare variants within one locus or several weakly associated loci in the same region. Therefore, the regions supported by this meta-analysis deserve close attention in future studies.
We report here on a new genome scan meta-analysis (GSMA)1-3 of genome-wide linkage scans (GWLS) of schizophrenia (SCZ). We previously published a GSMA of 20 scans that included 1208 pedigrees with 2945 genotyped affected individuals.4 Here we analyze 32 scans that included 3255 pedigrees with 7413 affected individuals.
Genome-wide association (GWA) study methods have proven more successful than GWLS for common diseases: they detect smaller effects of common single nucleotide polymorphisms (SNPs)5 as well as some copy number variants (CNVs),6 and have identified many common disease associations.7 But an adequately powered GWLS can detect a signal from many kinds of susceptibility variants (common SNPs, multiple rare SNPs, tandem repeats and heritable structural variations) in one or more8 loci in a region. Meta-analysis can achieve much larger sample sizes than single studies. Consensus linkage regions may deserve further study using methods such as high-throughput resequencing.
Family and twin data9 suggest that most of the genetic risk to SCZ is conferred by multiple interacting loci, each causing a small increase in risk. As SCZ has similar symptoms and roughly similar prevalence throughout the world,10 a reasonable hypothesis is that at least some loci have effects in many populations. SCZ linkage findings do not replicate consistently, and although there may be some true positives in small samples from unique populations, in general, power is probably limited by inadequate sample size and differences in ascertainment, marker sets, ancestry and statistical methods. There must also be some measurement error due to limitations in diagnostic methods. Current diagnostic criteria produce the largest known estimates of heritability (twins) and of increased risk to first-degree relatives,11,12 and there is good diagnostic reliability within and across research groups.13 But diagnostic judgments are based on data from interview and medical records that vary in completeness and quality. Fortunately, diagnostic differences are usually among psychotic disorders that coaggregate in families with similar risks.14 It is not known whether a further subdivision of cases would increase power.
We present a GSMA of all 32 scan analyses, and a secondary analysis of 22 European-ancestry analyses. The new samples (some of them large) are from European, non-European and isolate populations, but are too diverse to consider a subset other than Europeans. We comment briefly below on an analysis of heterogeneity between Asian and European samples. Statistical significance has been evaluated with permutation tests and simulations. In the primary analysis, 2 bins met empirical genome-wide criteria for suggestive linkage; and a set of 10 bins (in eight chromosomal regions) met an empirical aggregate criterion for genome-wide significant linkage, indicating that loci linked to SCZ are present in some or all of these bins. Five of these bins were consistent with our previous report.4 When bin boundaries were shifted to the middle of the previous bins, one genome-wide significant result was observed on chromosome 2q, consistent with our previous report.4
We identified GWLS of SCZ (in addition to those studied previously4) through PubMed literature searches, conference abstracts and personal contacts with investigators, selecting the largest analysis for each study. Study characteristics are summarized in Table 1. Investigators of each of these studies agreed to contribute data to this analysis. All scans ascertained probands with SCZ by one of three closely related sets of diagnostic criteria. Most scans included relatives with SCZ and those with one or all subtypes of schizoaffective disorder (SA). A few scans included only SCZ cases, or included broader diagnoses in alternative models. Scans were included if informative families ascertained through a SCZ case had at least 30 genotyped SCZ and SA cases. We included only primary genome-wide analyses with few gaps in marker coverage, and did not consider subsequent ‘fine mapping’ with additional markers or pedigrees in selected regions. We excluded a small Utah study48 whose RFLP markers were difficult to place on the current map; a study of a large Costa Rican pedigree49 with too few affected cases; a small study from an Italian isolate population50 that combined SCZ and bipolar cases; and a study of isolated villages in Daghestan51 that used a very broad phenotype definition. We lacked detailed results for the Icelandic study of Stefansson et al.52 We included two earlier studies of Icelandic families for which we had complete results,15,20 and discuss below a secondary analysis substituting results from Stefansson et al. inferred from published graphs.
A companion paper38 reports on a combined analysis of studies 24-31 (Table 1). Older data for all or parts of six of those data sets were included as separate scans in our previous GSMA, so all eight are treated separately here using the newer data.38 To eliminate sample overlap, investigators provided us with data for NUH (study 28) that excluded families that were also in the US/International study (study 9); and data for sample 15 (US/Sweden) without families in sample 26 (Cardiff). The investigators determined from genotypes that there were no overlapping families in samples 10 and 22.
There were 14 studies of European-ancestry pedigrees (including partial isolates such as Finland and Ashkenazim); 8 of both European and non-European families; and 10 non-European samples, including 3 Asian (Japanese, Han Chinese and Indonesian), 2 Latino, 1 Arab, 2 Pacific Island isolates (Palau and Kosrae) and 2 entirely (study 20) or predominantly (study 6) African-American samples. Study 6 (Texas) included a few European-ancestry families for which we lacked separate data. The ALL analysis included all 32 samples; the EUR (European-ancestry) analysis included 22 samples or subsamples (Table 1). Data for chromosome X markers were available for 23 studies, so this chromosome was considered in a secondary analysis.
Linkage scores were obtained from investigators or internet postings. Data for each study were rank ordered based on the designated ‘primary’ analysis, and if this included more than one test or model, we took the most positive linkage score across tests for each chromosomal bin. From our previous GSMA,4 data were carried over unchanged for studies 1-10 in Table 1; the Utah study48 was omitted as discussed above; and nine studies were updated with new genotyping results, some with larger samples. Studies 13 and 24-31 used SNP markers from the Illumina version 4 panel; all other studies used microsatellite markers at approximately 8-10 cM density.
Marker locations were determined from the Rutgers combined linkage map (Build 36),53 or interpolated based on physical position, other maps or flanking markers. Chromosomes were divided into bins of approximately equal genetic widths—120 30-cM bins (our primary analysis), or (for secondary analyses) 179 20-cM bins and 88 40-cM bins. Six additional 30-cM bins were defined on the X chromosome (female map length). If a study had no marker in a bin (for example, the narrower 20 cM bins), we used the mean linkage score for the two flanking markers, or the single closest marker for an empty telomeric bin. Bin boundaries in cM and Mb are listed in online Supplementary Table 1.
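The per-bin summary for one study can be sketched as follows. This is a minimal illustration, not the GSMA software: it assumes markers are already placed on the common cM map, that scores are oriented so that higher means stronger evidence (P-values would first be converted, for example to -log10 P), and that bins are half-open [lo, hi) except the last. The function name `bin_scores` is hypothetical.

```python
def bin_scores(markers, bin_edges):
    """Most positive linkage score per bin for one chromosome in one study.

    markers   -- (position_cM, score) pairs; higher score = stronger evidence
    bin_edges -- ascending bin boundaries in cM, e.g. [0, 30, 60, 90]
    """
    markers = sorted(markers)
    last = len(bin_edges) - 2
    scores = []
    for i, (lo, hi) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        # Half-open bins, except that the final bin also includes its upper edge.
        inside = [s for pos, s in markers if lo <= pos < hi or (i == last and pos == hi)]
        if inside:
            scores.append(max(inside))
            continue
        left = [m for m in markers if m[0] < lo]
        right = [m for m in markers if m[0] >= hi]
        if left and right:
            # Empty interior bin: mean score of the two flanking markers.
            scores.append((left[-1][1] + right[0][1]) / 2)
        else:
            # Empty telomeric bin: score of the single closest marker.
            scores.append(left[-1][1] if left else right[0][1])
    return scores
```

For example, markers at 5, 12, 70 and 95 cM with 30 cM bins leave the 30-60 cM bin empty, so it receives the mean of the scores at 12 and 70 cM.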
GSMA is a nonparametric method to combine data generated with different maps and statistical tests.1-3 A GSMA divides the genome into approximately equal-width bins (Nbins), labeled ch.K, where ch = chromosome and K = bin number (for example, 1.4 is the fourth bin of chromosome 1). For each study, bins are ranked by their highest LOD, NPL or Z score or minimum P-value. Ranks for each sample were weighted by the square root of the number of genotyped affected cases; for comparison, the data were also analyzed without weights (results of unweighted analyses are provided in the online Supplementary Materials). Note that ‘120’ was the best rank, and higher summed ranks indicate stronger evidence for linkage (whereas in our previous report,4 ‘1’ was the best rank in a study and we reported averaged rather than summed ranks).
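The within-study ranking and weighted summing described above can be sketched as below. This is a minimal illustration under stated assumptions: ties share their average rank, and the square-root weights are normalized to a mean of 1. The normalization is our assumption; it rescales every summed rank by the same constant and therefore does not change permutation P-values. Function names are hypothetical.

```python
import math


def rank_bins(scores):
    """Rank one study's per-bin scores: the highest score gets the best
    (largest) rank, len(scores); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def summed_ranks(study_scores, n_affected):
    """Weighted summed rank per bin across studies, weighting each study by
    sqrt(number of genotyped affected cases), normalized to mean 1."""
    raw = [math.sqrt(n) for n in n_affected]
    mean_raw = sum(raw) / len(raw)
    weights = [w / mean_raw for w in raw]
    totals = [0.0] * len(study_scores[0])
    for w, scores in zip(weights, study_scores):
        for b, r in enumerate(rank_bins(scores)):
            totals[b] += w * r
    return totals
```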
We used GSMA software2 to evaluate nominal (binwise) significance by randomly permuting the ranks for bins within each study and then resumming the ranks across studies to determine the empirical summed rank probability (PSR)—the nominal probability that a bin would achieve or exceed the observed summed rank under the null hypothesis of no linkage in the bin. We also determined the empirical ordered rank P-value (POR—the probability that the kth highest summed rank achieves or exceeds an observed summed rank under the null). Thresholds for genome-wide significant and suggestive evidence for linkage54 were determined empirically (see below). For the primary analysis, these thresholds were PSR < 0.00037 and PSR < 0.0077 respectively.
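The permutation logic behind PSR and POR can be illustrated with a short sketch. It is not the GSMA program itself; it assumes that under the null hypothesis each study's bin ranks are exchangeable, so shuffling them within each study and resumming gives a draw from the null distribution. P-values are simple exceedance fractions without a +1 correction, which is our choice.

```python
import random


def gsma_pvalues(study_ranks, weights, n_perm=10_000, seed=1):
    """Empirical P_SR per bin and P_OR per order position by permutation.

    study_ranks -- one list of bin ranks per study (best bin = largest rank)
    weights     -- one weight per study (use 1.0 for an unweighted analysis)
    """
    rng = random.Random(seed)
    n_bins = len(study_ranks[0])

    def summed(rank_lists):
        return [sum(w * r[b] for w, r in zip(weights, rank_lists)) for b in range(n_bins)]

    observed = summed(study_ranks)
    observed_desc = sorted(observed, reverse=True)  # k-th highest observed summed rank
    hits_sr = [0] * n_bins
    hits_or = [0] * n_bins
    for _ in range(n_perm):
        # Null of no linkage: permute each study's ranks across its bins.
        shuffled = [rng.sample(r, len(r)) for r in study_ranks]
        sim = summed(shuffled)
        for b in range(n_bins):
            hits_sr[b] += sim[b] >= observed[b]
        for k, s in enumerate(sorted(sim, reverse=True)):
            hits_or[k] += s >= observed_desc[k]
    return [h / n_perm for h in hits_sr], [h / n_perm for h in hits_or]
```

With three studies that all rank the same bin first among six bins, the exact PSR for that bin is (1/6)^3, about 0.0046, and the permutation estimate should land close to it.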
Strong linkage signals can cover a wide genetic region in complex disorders, and often produce high summed ranks in adjacent bins,1 but GSMA can miss a weak linkage signal near the boundary of two bins. Therefore, regions showing suggestive evidence for linkage were reanalyzed using 20 and 40 cM bin widths, and using 30 cM bins starting at the midpoint of the bins used in the primary analysis (Supplementary Figure 1).
We tested for heterogeneity in study ranks between three outbred Asian studies (Taiwan, Japan, Indonesia) and the 22 EUR studies, using the nonparametric Wilcoxon rank sum test. We did not test for heterogeneity arising from an arbitrary set of studies;55 these tests detect bins where study ranks have a wider spread than expected by chance, but they have low power and do not identify the study subgroups contributing to the heterogeneity.56 We also carried out sensitivity analyses, dropping one study at a time and analyzing the remaining studies with GSMA (weighted and unweighted analyses, 30 cM bins).
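For a single bin, the Asian versus European comparison is a two-group test on that bin's per-study ranks. The sketch below computes an exact two-sided Wilcoxon rank-sum P-value by enumerating every assignment of pooled midranks to the smaller group. Whether the original analysis used an exact or an asymptotic version is not stated, so the exact form is our choice; with 3 vs 22 studies there are only C(25, 3) = 2300 assignments.

```python
from itertools import combinations


def midranks(values):
    """Ranks 1..n of the pooled values, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum test; returns (W for x, P-value)."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n_x = len(x)
    w_obs = sum(ranks[:n_x])
    expected = n_x * (len(pooled) + 1) / 2   # null mean of W
    dev_obs = abs(w_obs - expected)
    total = extreme = 0
    for idx in combinations(range(len(pooled)), n_x):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= dev_obs - 1e-9:
            extreme += 1
    return w_obs, extreme / total
```

Here `x` would hold one bin's ranks in the three Asian studies and `y` the same bin's ranks in the 22 EUR studies.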
We performed a simulation study to determine: (1) power; (2) permutation-based estimates of type I error and (3) aggregate criteria for genome-wide significance. Details are provided in Supplementary Online Materials and in a previous report.1 Briefly, genome-wide data were simulated for affected sibling pairs under the assumption of no linkage or assuming a range of genetic models (1-10 linked bins with locus-specific sibling relative risks (λS) of 1.15 or 1.3, at the edge or near the middle of chromosomes), and 1000 replicates of the 32-sample data set (or of alternative subsets of the data) were created by replacing each sample with a number of ASPs with roughly comparable linkage information.
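The study's own simulation details are in the supplementary materials and are not reproduced here. As a generic sketch of how a locus-specific λS translates into affected-sib-pair (ASP) data, the code below uses Risch's IBD-sharing probabilities under the assumptions of no dominance variance and fully informative markers, and scores each simulated sample with the ASP mean-sharing test. Both assumptions and the choice of test are ours.

```python
import math
import random


def asp_ibd_probs(lambda_s):
    """IBD-sharing probabilities (z0, z1, z2) for an affected sib pair at a locus
    with locus-specific sibling relative risk lambda_s, assuming no dominance
    variance (lambda_O = lambda_S, lambda_MZ = 2 * lambda_S - 1; Risch 1990)."""
    z0 = 0.25 / lambda_s
    z1 = 0.5
    return z0, z1, 1.0 - z0 - z1


def mean_test_z(n_pairs, lambda_s, rng):
    """Simulate fully informative IBD counts for n_pairs ASPs and return the
    mean-sharing Z score (under no linkage: mean 1 and variance 0.5 per pair)."""
    z0, z1, _ = asp_ibd_probs(lambda_s)
    shared = 0
    for _ in range(n_pairs):
        u = rng.random()
        shared += 0 if u < z0 else (1 if u < z0 + z1 else 2)
    return (shared - n_pairs) / math.sqrt(0.5 * n_pairs)


rng = random.Random(3)
for lam in (1.0, 1.15, 1.3):
    zs = [mean_test_z(400, lam, rng) for _ in range(200)]
    print(lam, round(sum(zs) / len(zs), 2))
```

For 400 pairs, the expected mean Z is 0 without linkage and about 1.8 and 3.3 for λS = 1.15 and 1.3. This illustrates why samples were matched on linkage information rather than on pedigree counts.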
For estimates of type I error, we determined empirical bin-wise thresholds for genome-wide suggestive linkage (the nominal P-value observed on average once per GSMA replicate) and for genome-wide significant linkage (the nominal P-value observed in 5% of GSMA replicates).
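Given a set of no-linkage GSMA replicates, both thresholds can be read off as order statistics. In this minimal sketch, independent uniform P-values stand in for simulated replicates; real replicates would come from simulated genome scans, in which adjacent bins are correlated. With uniform data the two thresholds should land near 1/120 and 0.05/120.

```python
import random


def empirical_thresholds(null_reps, alpha=0.05):
    """Empirical single-bin thresholds from no-linkage GSMA replicates.

    null_reps -- one list of per-bin nominal P-values per simulated replicate
    Returns (suggestive, significant): bins with P < suggestive occur on average
    once per replicate, and a fraction alpha of replicates contain at least one
    bin with P < significant (ignoring ties).
    """
    n_reps = len(null_reps)
    pooled = sorted(p for rep in null_reps for p in rep)
    suggestive = pooled[n_reps]                    # n_reps pooled values lie below it
    minima = sorted(min(rep) for rep in null_reps)
    significant = minima[round(alpha * n_reps)]    # alpha * n_reps minima lie below it
    return suggestive, significant


# Stand-in null replicates: 1000 replicates of 120 independent uniform P-values.
rng = random.Random(7)
null_reps = [[rng.random() for _ in range(120)] for _ in range(1000)]
suggestive, significant = empirical_thresholds(null_reps)
```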
We also determined criteria for aggregate genome-wide significant linkage, that is, that the total number of bins achieving a specified nominal P-value threshold was greater than that observed in 5% of genome-wide replicates with no linkage present. This criterion was determined for ALL and EUR studies by establishing a P-value threshold, P0, for which exactly 5% of unlinked replicates had N or more bins with PSR < P0. P0 was initially set at 0.05, and the number of unlinked replicates with ≥N bins achieving PSR < P0 was tabulated (for N = 1, 2, 3, ...). P0 was then reduced incrementally until exactly 5% of simulations had at least N bins with PSR < P0, for some value N. For example, for the primary weighted analysis of 32 studies, the 5% genome-wide threshold by this procedure was 10 bins with P < 0.046; for EUR studies, it was 10 bins with P < 0.048.
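The search for P0 and N described above can be sketched as follows. Rather than lowering P0 by a fixed step, this version steps P0 down through each null P-value below 0.05, so every step changes one replicate's count by one and an exact 5% match cannot be skipped. That stepping scheme is our choice, as is reporting the smallest qualifying N. Independent uniform P-values again stand in for simulated GSMA replicates.

```python
import random
from itertools import groupby


def aggregate_threshold(null_reps, alpha=0.05, p_start=0.05):
    """Largest P0 <= p_start such that exactly alpha * R of the R no-linkage
    replicates have N or more bins with P < P0, for some N (smallest such N
    is reported). Returns (P0, N), or None if no pair qualifies."""
    target = round(alpha * len(null_reps))
    counts = [sum(p < p_start for p in rep) for rep in null_reps]
    max_n = max(counts)
    hist = [0] * (max_n + 1)  # hist[k] = replicates with exactly k bins below P0
    for c in counts:
        hist[c] += 1

    def matching_n():
        for n in range(1, max_n + 1):
            if sum(hist[n:]) == target:  # replicates with >= n bins below P0
                return n
        return None

    n = matching_n()
    if n is not None:
        return p_start, n
    # Lower P0 through every null P-value below p_start; at P0 = p, bins whose
    # value equals p no longer satisfy P < P0.
    below = sorted(((p, r) for r, rep in enumerate(null_reps) for p in rep if p < p_start),
                   reverse=True)
    for p0, group in groupby(below, key=lambda item: item[0]):
        for _, r in group:
            hist[counts[r]] -= 1
            counts[r] -= 1
            hist[counts[r]] += 1
        n = matching_n()
        if n is not None:
            return p0, n
    return None


# Stand-in null replicates: 1000 replicates of 120 independent uniform P-values.
rng = random.Random(11)
null_reps = [[rng.random() for _ in range(120)] for _ in range(1000)]
p0, n_bins = aggregate_threshold(null_reps)
```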
For power calculations, we determined how often, on average, the linked bins exceeded the PSR threshold for ALL and for EUR studies, and, to investigate heterogeneity, for ALL studies assuming that linkage arose only in EUR studies.
Figure 1 illustrates P-values for all bins in the ALL and EUR analyses, and the figure legend states the empirical P-value thresholds for each analysis. For ALL studies (3255 pedigrees, 7413 genotyped cases), Table 2 lists the 10 bins (in eight nonadjacent chromosomal regions) that achieved PSR < 0.05 in the primary analysis (weighted, 30 cM bins). These 10 bins met the empirical aggregate criterion for genome-wide significant linkage (10 bins with nominal PSR < 0.046). Suggestive evidence for linkage was observed for single bins on chromosome 5q (bin 5.6, PSR = 0.0046) and chromosome 2q (bin 2.5, PSR = 0.0075). Figure 2 illustrates the relative ranks of bins in each study. (Results for all analyses are available in Online Supplementary Tables; ranks for each study are available online at www.kcl.ac.uk/mmg/ngdata.html.)
Table 3 summarizes results for 22 EUR samples or subsamples (1813 pedigrees, 4094 genotyped cases). Suggestive evidence for linkage was observed on chromosome 8p (PSR = 0.00057, close to the genome-wide threshold for 22 EUR studies of 0.00044), with a further five bins in four different regions achieving nominal significance.
On chromosome X, there were no nominally significant bins for ALL studies; there was one such bin (X.5, 130-162 cM, 119-141 Mb, PSR = 0.0247) for EUR studies.
When estimated ranks (interpolated from published graphs) for the Icelandic study of Stefansson et al.52 were substituted for the results of Gurling et al.20 and Moises et al.,15 results were similar to those shown in Table 2 except that bin 3.4 failed to achieve nominal significance. For EUR studies, results again were similar except that bins 8.2 and 2.8 both achieved suggestive significance.
Online Supplementary Figures 2-4 illustrate results of analyses using four binning schemes (20, 30 and 40 cM bins, and 30 cM bins shifted by 50%) for the chromosomes with suggestive evidence for linkage (chromosomes 2 and 5 for ALL, chromosome 8 for EUR). On chromosome 2, genome-wide significance was obtained for the shifted 30 cM bin (PSR = 0.00035, 132.2-161.6 cM, 118.7-152 Mb), and suggestive linkage evidence was seen in the adjacent bin (102.8-132.2 cM, 84.9-118.7 Mb). For the region of chromosome 5q that produced suggestive evidence for linkage in the primary analysis, the signal was similar regardless of bin width. On chromosome 8, using 20 cM bins extended the signal by approximately 5 cM in both directions; if linkage is present, this pattern could indicate a broader signal (multiple loci) or the effects of a single strong signal on adjacent bins.
In tests of differences between the 22 EUR and 3 Asian studies, none of the 120 bins reached nominal P < 0.01, and only 4 bins achieved nominal P < 0.05 (fewer than the 6 expected by chance alone among 120 bins); thus, no significant difference was observed. The strongest evidence for heterogeneity between the Asian and European studies was in bin 1.5, but this did not reach the suggestive threshold (online Supplementary Figure 5). Results of sensitivity analyses, omitting each study in turn, are shown in online Supplementary Figures 6 and 7. The largest changes were as follows: bin 8.2 (8p) reaches the suggestive linkage threshold without the Taiwan data set; bin 2.8 (2q) becomes substantially more significant (but not genome-wide significant) without the Japan data set; bin 5.6 (5q) becomes substantially more significant (but not genome-wide significant) without the Suarez et al. data set (study 21); and bin 6.3 (6pq) would join the list of nominally significant bins without either the Taiwan or the Japan data set.
Type I error was calculated for the unlinked GSMA simulations (120 000 30-cM bins) by counting the bins in which the GSMA program computed PSR values (based on permutation alone) below each theoretical threshold: 0.05 for nominal significance; the value expected once per GSMA (1/120 = 0.00833) for genome-wide suggestive linkage; and 0.05/120 = 0.00042 for genome-wide significant linkage. The GSMA program’s permutation procedure produced type I error rates that were slightly liberal for ALL (0.053, 0.0096 and 0.00051) and less liberal for EUR studies (0.051, 0.0091 and 0.00042). The source of the discrepancy is not clear, but we have used the permutation-based suggestive and significant values here for single bins (0.0077 and 0.00037 for ALL, and 0.0078 and 0.00044 for EUR studies) and an empirical threshold for aggregate genome-wide significance as described above.
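The comparison of observed and nominal error rates reduces to simple arithmetic on the thresholds and rates given above. The sketch defines the error-rate calculation for a set of unlinked bins and expresses the reported rates as observed/nominal ratios, where values above 1 indicate a liberal procedure. Names are illustrative.

```python
N_BINS = 120
# Theoretical single-bin thresholds named in the text.
THRESHOLDS = {
    "nominal": 0.05,
    "suggestive": 1 / N_BINS,      # expected once per genome-wide GSMA
    "significant": 0.05 / N_BINS,  # expected in 5% of genome-wide GSMAs
}
# Empirical type I error rates reported above for the unlinked simulations.
REPORTED = {"ALL": (0.053, 0.0096, 0.00051), "EUR": (0.051, 0.0091, 0.00042)}


def type_i_error(null_pvalues, threshold):
    """Fraction of unlinked bins whose P_SR falls below the threshold."""
    return sum(p < threshold for p in null_pvalues) / len(null_pvalues)


def inflation(rates):
    """Observed / nominal rate for each threshold; values above 1 mean liberal."""
    return [rate / t for rate, t in zip(rates, THRESHOLDS.values())]


for group, rates in REPORTED.items():
    print(group, [round(x, 2) for x in inflation(rates)])
```

The ratios come to roughly 1.06, 1.15 and 1.22 for ALL and 1.02, 1.09 and 1.01 for EUR. The excess is largest at the most stringent ALL threshold, which is why empirically derived single-bin thresholds are preferable to the theoretical ones.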
Table 4 summarizes the results of power calculations from simulation studies. Power is lower for edge bins for multipoint analyses, so we computed a weighted average of power assuming 20% edge and 80% mid-chromosome bins (see Table 4 legend). Power was excellent to detect significant linkage at multiple loci with observed population-wide λS = 1.3 (30% increased risk in sibs) across ALL studies or in EUR studies alone. For λS = 1.15, there was excellent power to detect significant linkage in multiple bins (for ALL studies), or to detect suggestive evidence for linkage in multiple bins (if limited to EUR studies); there was reduced power in the presence of substantial heterogeneity. Genetic effects limited to a defined subset (EUR) were more readily detected in a separate analysis.
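Written out, the weighted average power described above is as follows. Here $\pi_{\text{edge}}$ and $\pi_{\text{mid}}$ (notation introduced here) denote the simulated power for edge and mid-chromosome bins:

$$\bar{\pi} = 0.2\,\pi_{\text{edge}} + 0.8\,\pi_{\text{mid}}$$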
Meta-analysis of 32 GWLS, encompassing 7413 individuals affected with SCZ or related disorders in 3255 pedigrees, produced evidence for significant linkage in the genome based on empirical aggregate criteria: ten 30-cM bins achieved bin-wise PSR < 0.046. Genome-wide evidence for linkage was detected on chromosome 2q (118.7-152 Mb) in a secondary analysis that shifted bin location by 50%, consistent with the significant linkage we found in the same region in our previous GSMA of 20 studies.4 Suggestive evidence for linkage was detected in ALL studies in two single bins, on chromosomes 5q and 2q. Separate analysis of European-ancestry samples produced linkage evidence falling just short of the genome-wide significance threshold in bin 8.2 (15.7-32.7 Mb).
It is widely assumed that many loci contribute to SCZ susceptibility. Table 4 shows that for the GSMA method, power to detect at least one true linkage goes up with the number of linked loci, but the power to detect any specific locus decreases, because in a rank-ordered test there is a finite number of high ranks in each study. But if there were many loci with locus-specific λS values of 1.3 or even 1.15 (averaged across all study populations and accounting for the effects of various sources of measurement error), we would expect to detect genome-wide significant linkage in several individual bins, and suggestive evidence for linkage in many more than two bins. It would be reasonable to conclude that there are probably no loci with such strong effects worldwide or across all European populations, although they might exist within particular populations for which samples of this size are not available or within subsets of families, which we cannot yet identify in advance.
The power of linkage analysis for a locus is predicted by its contribution to the relative risk (RR) to siblings of probands (λS),57 whereas the power of association analysis is predicted by allelic or genotypic RRs, that is, the increased risk to individual carriers.5 GWA studies have produced robust association findings for many complex genetic disorders,58-60 typically with RRs of 1.1-1.5. A variant found on 10% of chromosomes conferring a ‘large’ RR of 1.5 per allele in a multiplicative model (2.25 for homozygous carriers) would produce a population-wide sibling RR of only 1.02 (undetectable by linkage). But GWA chips are unlikely to tag all pathogenic variants. For example, in NOD2 there are three rare variants, each with a frequency of less than 5%, that confer moderate risks of Crohn’s disease. Their individual effects generate only weak signals when assayed by the Affymetrix 500K chip because they are poorly tagged; but combined, they confer a sibling RR of 1.16, which is detectable in a very large linkage sample.58,61 This illustrates a strength of linkage analysis: allelic heterogeneity is difficult to detect in association studies, as the genetic effect is split across variants, but linkage can detect the pooled effect of all variants (not only SNPs) within one or more susceptibility genes or elements in a region.
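The sibling-RR figure in the example above can be checked with Risch's single-locus formula, λS = 1 + (VA/2 + VD/4)/K². Here K is the population prevalence and VA and VD are the additive and dominance variances of penetrance. This is a minimal sketch; the baseline penetrance f0 = 0.01 is an arbitrary illustrative value, since λS is unchanged when all three penetrances are scaled by the same factor.

```python
def lambda_s(p, f0, f1, f2):
    """Sibling relative risk from one biallelic locus (Risch 1990):
    lambda_S = 1 + (V_A / 2 + V_D / 4) / K**2.

    p          -- frequency of the risk allele
    f0, f1, f2 -- penetrances for 0, 1 and 2 copies of the risk allele
    """
    q = 1.0 - p
    k = q * q * f0 + 2 * p * q * f1 + p * p * f2  # population prevalence K
    a = (f2 - f0) / 2                              # homozygote effect about the midpoint
    d = f1 - (f0 + f2) / 2                         # dominance deviation
    alpha = a + d * (q - p)                        # average effect of allele substitution
    v_a = 2 * p * q * alpha ** 2                   # additive genetic variance
    v_d = (2 * p * q * d) ** 2                     # dominance genetic variance
    return 1.0 + (v_a / 2 + v_d / 4) / k ** 2


# The example in the text: risk-allele frequency 0.1, multiplicative RR 1.5 per allele.
f0 = 0.01  # hypothetical baseline penetrance; lambda_S does not depend on it
print(round(lambda_s(0.1, f0, 1.5 * f0, 1.5 ** 2 * f0), 2))  # prints 1.02
```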
Readers may be interested to know the location of SCZ candidate genes or regions in relation to bins that produced evidence for linkage here. On chromosome 1q, NOS1AP (160 Mb; previously known as CAPON),62,63 RGS4 (161.3 Mb)64,65 and UHMK1 (160.7 Mb)66 are near the telomeric edge of bin 1.6 (114.6-162.1 Mb), and the rare (usually de novo) SCZ-associated CNV is located in a different part of the same bin (145-146 Mb).67,68 On chromosome 8p, PPP3CC (22.35 Mb)69 and NRG1 (31.6-32.7 Mb)52,70 are both within bin 8.2 (15.7-32.7 Mb). On chromosome 2q, ZNF804A, which contains a SNP that produced genome-wide significant evidence for association in a combined SCZ-bipolar disorder sample,71 is in bin 2.7 at 185 Mb. This bin was not nominally significant in the primary analysis, but 185 Mb is within nominally significant bins in two secondary analyses (one that shifted bin boundaries by 50% and one that considered 40 cM bin widths). DTNBP1, DAOA, TRAR4/STX7, CHRNA7, COMT/ARVCF, DISC1, AKT1, HTR2A and DRD2 are not in bins listed in Tables 2 or 3. Many of the research groups whose samples are included here are actively investigating association of SCZ to sequence variants in genes identified in linkage regions or by the more recent genome-wide association methodology.
In conclusion, this updated GSMA of 32 SCZ studies showed significant evidence for linkage on chromosome 2q in a secondary analysis, suggestive evidence for linkage on chromosomes 5q and 2q in the primary analysis, and suggestive evidence for linkage on 8p in European samples. Genome-wide significant aggregate evidence for linkage (that is, more modestly significant results than expected by chance) was observed based on results for 10 bins in 8 nonadjacent chromosomal regions. As results from genome-wide association studies emerge, it may be important to keep in mind that there may be SCZ susceptibility loci in some of these consensus linkage regions that cannot be detected by common tag SNPs. It is likely that diverse methods will be required to identify specific DNA sequence variants and to confirm and define their role in the etiology of SCZ.
The work reported here was supported by: Medical Research Council (UK) Grants G0400960 (CML) and G9309834 (MO); National Institute of Mental Health Grants 7R01MH062276 (to DFL [Aust/US], CL [France/La Réunion], MO [Cardiff] and DW [Bonn]), 5R01MH068922 (to PG [ENH]), 5R01MH068921 (to AEP [Johns Hopkins]), 5R01MH068881 (to BR [VCU/Ireland]), MH-41953 (to KSK [VCU/Ireland]); MH63356 and MH80299 (to WB [Palau]); MH58586 (to JMS [Aust/US]); MH56242 (to VLM [US/Sweden]), MH61399 (EMW, MK [Kosrae, South Africa]), NIMH Grant MH062440, Canadian Institutes of Health Research Grants MOP-53216 and MOP-12155, a National Alliance for Research on Schizophrenia and Depression Distinguished Investigator Award and Canada Research Chair in Schizophrenia Genetics (LMB, ASB [Canada]); Australian National Health and Medical Research Council Grants 910234, 941087, and 971095 (to BJM [Aust/US]), MRC project Grant G880473N, The European Science Foundation, SANE, the Iceland Department of Health, the General Hospital Reykjavik, the Joseph Levy Charitable Foundation, the Wellcome Trust Grant 055379, The Priory Hospital, the Neuroscience Research Charitable Trust, the University of Iceland and the Icelandic Science Council (HMDG [UCL]); Warner-Lambert, Parke-Davis Pharmaceuticals Company and NIMH Grant R01-MH44245 (LEL [US/International]); the Deutsche Forschungsgemeinschaft (HWM [Kiel]; MA WM, SGS, DBW [Indonesia]); the German Israeli Foundation for Scientific Research (BL; DBW); Mammalian Genotyping Service HV48141 (DBW [Indonesia]); CREST of JST (Japan Science and Technology Agency) (TA [Japan]); Pfizer, Inc. and the SANE Foundation (LEL [Costa Rica]); the Israel Science Foundation, US-Israel Binational Science Foundation, the National Alliance for Research on Schizophrenia and Depression, and the Harry Stern Family Foundation (BL and YK [Israel]); the VA Merit Review Program (AF); recruitment of the NIMH Genetics Initiative sample was supported by NIMH Grants 5 UO1MH46318, UO1MH46289 and UO1MH46276; the Taiwan Schizophrenia Linkage Study was supported by NIMH Grant 1R01 MH59624-01 and Grant NHRI-90-8825PP, NHRI-EX91,92-9113PP from the National Health Research Institute, Taiwan, and support from the Genomic Medicine Research Program of Psychiatric Disorders, National Taiwan University Hospital; the VA Linkage Study was supported by funds from the Department of Veterans Affairs Cooperative Studies Program; the US/Mexico/Central America study was supported by a collaborative NIMH grant (‘Genetics of Schizophrenia in Latino Populations’) (MH60881 and MH60875) to ME [University of Texas Health Science Center at San Antonio], R Mendoza [University of California at Los Angeles-Harbor], HR [University of Costa Rica, San Jose, Costa Rica], A Ontiveros [Instituto de Informacion de Investigacion en Salud Mental, Monterrey, Mexico], HN [Medical and Family Research Group, Carracci SC, Mexico City, Mexico], and R Munoz [Family Health Centers of San Diego, CA].