A genomewide linkage scan was carried out in eight clinical samples of informative schizophrenia families. After all quality control checks, the analysis of 707 European-ancestry families included 1,615 affected and 1,602 unaffected genotyped individuals, and the analysis of all 807 families included 1900 affected and 1839 unaffected individuals. Multipoint linkage analysis with correction for marker-marker linkage disequilibrium was carried out with 5,861 single nucleotide polymorphisms (SNPs; Illumina 4.0 linkage map). Suggestive evidence for linkage (European families) was observed on chromosomes 8p21, 8q24.1, 9q34 and 12q24.1 in non-parametric and/or parametric analyses. In a logistic regression allele-sharing analysis of linkage allowing for intersite heterogeneity, genomewide significant evidence for linkage was observed on chromosome 10p12. Significant heterogeneity was also observed on chromosome 22q11.1. Evidence for linkage across family sets and analyses was most consistent on chromosome 8p21, with a one-lod support interval that does not include the candidate gene NRG1, suggesting that one or more other susceptibility loci might exist in the region. In this era of genomewide association and deep resequencing studies, consensus linkage regions deserve continued attention, given that linkage signals can be produced by many types of genomic variation, including any combination of multiple common or rare SNPs or copy number variants in a region.
Two major methods are currently available to scan the genome to detect disease susceptibility loci: genomewide linkage studies (GWLS) and genomewide association studies (GWAS). We report here on a GWLS of eight samples of families with multiple cases of schizophrenia (SCZ) using a dense map of single nucleotide polymorphism (SNP) markers, and the companion paper1 reports on a meta-analysis of thirty-two SCZ GWLS including the eight samples studied here.
GWLS uses hundreds or thousands of DNA markers to detect the broad regions (millions of base pairs) within which there are most likely to be disease susceptibility loci, based on the pattern of within-family correlations between marker alleles and disease. GWAS uses hundreds of thousands of SNPs that tag (serve as proxies for) most of the common SNPs in the genome, to identify small regions (tens of thousands of base pairs) likely to harbor susceptibility variants. GWAS can detect loci with much weaker genetic effects if they are due to common SNPs. For common, genetically complex disorders, GWAS have proven more successful than GWLS in producing robust and well-replicated associations.2 However, there are genetic effects for which GWLS can be more powerful, including loci with multiple rare pathogenic mutations in different families, or several different susceptibility loci in the same region.
The present study is a collaboration of seven research groups using pedigree samples collected by each group3–11 plus a publicly-available sample12, totalling over 800 pedigrees with ill individuals in constellations that are informative for linkage analysis. We previously carried out a set of studies of candidate linkage regions.13–16 We now report on a new genomewide linkage scan of the entire sample. Whereas previously around 70% of these families had been included in published linkage scans using microsatellite markers3,5,6,10,11,17–19, we have now scanned all available families using a set of almost 6,000 SNP markers genotyped with high accuracy, extracting on average around 90% of the possible linkage information from these pedigrees. In analyses of 707 European-ancestry pedigrees, significant linkage accounting for cross-site heterogeneity was observed on chromosome 10p, and suggestive evidence for linkage on chromosomes 8p, 8q and 12q, as well as 9q when non-European families were included.
The sample is described in Tables 1 and 2. Recruitment by each research group has been previously described.3–12 Here, affected cases included probands with DSM-IIIR diagnoses of SCZ and relatives with SCZ or schizoaffective disorder, which co-segregates with SCZ in families20 and is often not differentiated reliably from SCZ.21 Consensus diagnoses were based on information from semi-structured interviews, psychiatric records and family informants.
Genotyping was carried out at the Center for Inherited Disease Research (CIDR) using the Illumina GoldenGate assay22 to analyze the Illumina version 4 linkage marker set of 6,008 SNPs. SNPs were excluded by CIDR (N=53) based on internal quality control (QC) criteria, and by the investigators (N=36) for more than 3 parent-child inheritance errors or deviation from Hardy-Weinberg equilibrium at p < 0.001, leaving 5,861 autosomal or X chromosome SNPs for analyses. deCODE23 map locations were provided by Illumina. There were 0.09% missing genotypes, 0.12% Mendelian inconsistencies prior to QC checks (0.00138% in the analyzed SNPs), and 0.002% discordant genotypes in 224 blind duplicate specimens. There were 132 DNAs excluded for poor performance in genotyping of a preliminary forensic panel or the full SNP panel; and 60 for inconsistency with reported sex, Mendelian inconsistencies greater than 0.5% or sample call rates less than 98%.
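The investigator-level SNP filter described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the source does not state which Hardy-Weinberg test was applied, so a 1-df chi-square goodness-of-fit test is assumed here, and the function names are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; the study may have used
    a different (e.g. exact) test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(parent_child_errors, hwe_p, max_errors=3, hwe_alpha=0.001):
    """Apply the two exclusion criteria quoted in the text.

    A SNP is dropped for more than 3 parent-child inheritance errors
    or for HWE deviation at p < 0.001.
    """
    return parent_child_errors <= max_errors and hwe_p >= hwe_alpha


# Hypothetical genotype counts, for illustration only.
balanced = hwe_chi_square_p(25, 50, 25)   # counts exactly in HWE proportions
no_hets = hwe_chi_square_p(50, 0, 50)     # no heterozygotes at all
print(keep_snp(0, balanced), keep_snp(0, no_hets), keep_snp(4, balanced))
```

The thresholds are exposed as keyword arguments so that the same check can be rerun with other cutoffs.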
Pairwise identity-by-descent (IBD) proportions were analyzed for all pairs of subjects using PLINK24, and differences between specified and actual relationships within families were analyzed using PREST.25 As a result, 50 DNAs were excluded to resolve pairs of identical specimens, 3 families were excluded because genotypic relationships did not fit the family and 8 because the same family was found in two different samples (JHU-NIMH, JHU-ENH, ENH-NIMH or Cardiff-VCU). Pedigree structures were also corrected (e.g., half-sib vs. full-sib relationships) as required.
Because of the high accuracy of genotyping with a dense SNP map that facilitated analysis of relationships, and the enlarged samples, the present data replace previous analyses of candidate regions by this collaboration for this narrow phenotypic model.13–16
Families were assigned to predominantly European (EUR), African-American or African-European (AFR), or “other” (OTH) groups based on STRUCTURE26 analysis of 49 independent autosomal SNPs which had large allele frequency differences between ancestries (0.5–0.69 EUR vs. AFR, 0.3–0.47 for EUR vs. OTH) in this sample based on investigator-reported ancestries, or based on public databases. EUR or AFR families had an estimated 70% or more ancestry from that group, otherwise the family was considered OTH. Members of these groups had a mean 98% or 96% ancestry, respectively, from that group. Analyses were carried out for EUR families and then for ALL families (using allele frequencies estimated separately for EUR, AFR and OTH groups27).
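The family assignment rule stated above (70% or more estimated ancestry from a group, otherwise OTH) can be written as a small function. This is a sketch under assumptions: the inputs are taken to be family-level ancestry fractions already estimated by STRUCTURE, how member-level estimates were combined is not described in the text, and the function name is hypothetical.

```python
def assign_ancestry_group(eur_fraction, afr_fraction, threshold=0.70):
    """Return 'EUR', 'AFR' or 'OTH' for one family.

    eur_fraction and afr_fraction are assumed to be family-level ancestry
    estimates between 0 and 1 (hypothetical inputs, e.g. from STRUCTURE output).
    """
    if eur_fraction >= threshold:
        return "EUR"
    if afr_fraction >= threshold:
        return "AFR"
    return "OTH"


# Hypothetical families, for illustration only.
for eur, afr in [(0.98, 0.01), (0.03, 0.96), (0.55, 0.30)]:
    print(eur, afr, assign_ancestry_group(eur, afr))
```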
The planned primary multipoint linkage analysis of EUR families used the SPAIRS statistic (ZLikelihoodRatio and its Kong-Cox equivalent lod score28 under the exponential model) computed with MERLIN29 using MERLIN’s correction for LD within clusters of markers based on a threshold of r2 greater than 0.05 for consecutive pairs of markers30. Prior to analyses, unlikely genotypes were detected and excluded using MERLIN.31
Additional analyses included: (1) SPAIRS analysis of ALL families using ALLEGRO 2.032 (analyzing each ancestry subset with its own allele frequencies) with a “no-LD map” of 4,365 autosomal and X chromosome SNPs with no marker-marker r2 greater than 0.05 (because ALLEGRO cannot correct for LD); (2) the Kong-Cox exponential SALL statistic which gives more weight to larger families (results were similar and are not shown here, but are included in online supplementary files); (3) parametric heterogeneity lod score (hlod) analysis under dominant (risk allele frequency = 0.05; penetrances = 0, 0.001 and 0.001) and recessive (risk allele frequency = 0.1, penetrances = 0, 0 and 0.001) models, using MERLIN for EUR and ALLEGRO (and the no-LD map) for ALL families; (4) logistic regression analysis of IBD sharing33,34 to assess heterogeneity across sites, linkage while accounting for heterogeneity, effects of parent-of-origin of each allele and of sex of the affected pair (M-M, M-F, F-F), and interactions between linkage regions (see online Supplementary Table 5 for description).
Thresholds for significant (0.05 or fewer peaks expected genomewide per genome scan) and suggestive (less than 1 peak per scan) evidence for linkage were determined by simulation for nonparametric and parametric analyses using data generated under the assumption of no linkage. “Peaks” were defined as local maxima at least 30 cM from another peak. The empirical threshold for parametric analysis was corrected for two tests by taking the maximum result of the dominant and recessive analyses of simulated replicates at each point.
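The peak definition and the empirical thresholds described above can be illustrated with a short sketch. It is not the simulation code used in the study. The greedy rule for enforcing the 30 cM separation (keep the highest local maximum first) is an assumption, as are all function names and the toy data.

```python
def find_peaks(positions_cm, lods, min_separation_cm=30.0):
    """Local maxima of a lod curve, kept at least min_separation_cm apart.

    Assumption: when two local maxima are closer than the minimum
    separation, the higher one is kept (greedy, highest first).
    """
    n = len(lods)
    candidates = []
    for i in range(n):
        left = lods[i - 1] if i > 0 else float("-inf")
        right = lods[i + 1] if i < n - 1 else float("-inf")
        if lods[i] > left and lods[i] >= right:
            candidates.append((positions_cm[i], lods[i]))
    kept = []
    for pos, lod in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(pos - k_pos) >= min_separation_cm for k_pos, _ in kept):
            kept.append((pos, lod))
    return sorted(kept)


def combine_two_models(dominant_hlods, recessive_hlods):
    """Pointwise maximum of dominant and recessive hlod curves (two-test correction)."""
    return [max(d, r) for d, r in zip(dominant_hlods, recessive_hlods)]


def expected_peaks_per_scan(peak_lods_by_replicate, threshold):
    """Mean number of peaks at or above threshold per simulated genome scan."""
    total = sum(
        sum(1 for lod in peaks if lod >= threshold)
        for peaks in peak_lods_by_replicate
    )
    return total / len(peak_lods_by_replicate)


# Toy lod curve on one chromosome (hypothetical values).
positions = [0, 10, 20, 30, 40, 50, 60, 70]
lods = [0.1, 1.5, 0.3, 2.0, 0.4, 0.2, 1.0, 0.5]
print(find_peaks(positions, lods))

# Toy peak heights from four null replicates (hypothetical values).
replicates = [[2.0, 1.0], [3.5], [], [1.2, 0.9]]
print(expected_peaks_per_scan(replicates, 1.0))
print(expected_peaks_per_scan(replicates, 3.0))
```

Under this scheme, a suggestive threshold is a lod value at which `expected_peaks_per_scan` over the null replicates falls below 1, and a significant threshold is one at which it is 0.05 or less. For parametric analyses the replicate curves would first be passed through `combine_two_models`.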
All genotypic data for this study will be made available to qualified scientists by the NIMH Center for Genetic Studies (nimhgenetics.org).
The empirical lod or hlod thresholds for suggestive linkage were 1.94 for nonparametric and 2.21 for parametric tests, or 3.26 and 3.66 for significant linkage. Mean information content was 0.88 (SD 0.026) using MERLIN’s entropy measure (reflecting potential information with fully informative markers) and 0.908 (SD 0.028) using ALLEGRO’s exponential measure (measuring potential information given the constellation of genotyped relatives).
Figure 1 shows Kong-Cox lods and dominant and recessive hlod scores for EUR and ALL families. Table 2 lists the maximum lod and hlod scores in each analysis on each chromosome. The nonparametric analysis of EUR families, considered the primary analysis here, produced suggestive evidence for linkage on chromosome 8p21 (in EUR families, lod = 2.00, 45.9 cM; in ALL families, lod = 2.51, 46.4 cM, with the latter, larger score at 26.61 Mb). The dominant and recessive analyses were considered an alternative approach, and suggestive evidence for linkage (taking both tests into account as noted above) was observed on chromosomes 8p21, 8q24.1, 9q34 and 12q24.1 in non-parametric and/or parametric analyses (see Table 1 for details). Evidence for linkage was most consistent for chromosome 8p21 (five of the six analyses).
Table 3 shows the results of the logistic regression analysis of linkage while allowing for intersite heterogeneity, in EUR families. Highly significant genomewide evidence for linkage with heterogeneity was observed on chromosome 10p12 (see Table 3 legend for additional details). Chromosome 8p21 again produced suggestive evidence for linkage both with and without heterogeneity in this analysis. Results of tests for intersite heterogeneity (i.e., the difference between lods with and without allowing for heterogeneity) are shown in online Supplementary Table 1. Significant heterogeneity was observed on chromosome 10p (45.6 cM) and 22q11.1 (0 cM).
Supplementary online files provide details of parametric and nonparametric linkage scores, genetic location and information content for each analyzed point for each analysis in the entire sample for EUR and ALL families, as well as the full and No-LD marker maps. Online files for the companion meta-analysis paper1 provide ranked results for each of our 8 samples separately and for EUR and ALL families separately.
No significant chromosome-wide effects were observed for sex of the affected pair or parent of origin (online Supplementary Tables 3 and 4). Online Supplementary Table 5 shows results of interaction analyses for all pairs of 18 regions with Kong-Cox lod scores greater than 1. No genomewide significant interactions were observed. The online table legend includes a list of the most significant empirical interaction p-values.
Suggestive evidence for linkage was detected on chromosome 8p21 in multiple analyses: in nonparametric, dominant and recessive analyses of 707 European-ancestry families, and in nonparametric and dominant analyses of all 807 families.
This same region produced suggestive evidence for linkage (and the largest peak), in the independent Molecular Genetics of Schizophrenia (MGS) sample35 of 409 European-ancestry and African American families. Our peak results were between 45.9–46.8 cM (between rs1561817 and rs9797, 26.59–27.65 Mb; deCODE linkage map and genome build 36.3 physical locations). The MGS peak was at 43.3 cM for all families (near rs196886 at 24.79 Mb), while in European-ancestry families it was at 15.3 cM (8p23, near rs7834209 at 6.9 Mb), with a slightly smaller peak at 34.6 cM (8p21, near rs34393111 at 20.28 Mb), and suggestive evidence for linkage extended beyond our peak scores. Pulver and colleagues were the first to report preliminary36 and then strongly suggestive evidence10 for linkage of SCZ to chromosome 8p markers in much of the JHU sample that is included here. We previously reported support for 8p linkage in a study of microsatellite markers in a majority of the families in the present analysis13, consistent with results in this enlarged sample.
The most widely-studied 8p candidate gene is NRG1 (neuregulin 1), found to be associated with SCZ by Stefansson et al.37 in a linkage disequilibrium mapping study of a suggestive linkage peak observed in Icelandic families (there were two 8p peaks in that analysis, with the second one closer to ours), with supportive evidence in some datasets.38 There are several indications that, if there is linkage on chromosome 8p, it is not entirely explained by NRG1. Here, lod scores within one unit of the maximum (1-lod interval) were observed between 21.37–29.36 Mb, whereas NRG1 is between 32.53–32.74 Mb. (The 1-lod interval is a reliable confidence interval in studies of Mendelian disorders, but not for complex disorders.) In the companion meta-analysis paper1, the second “bin” on chromosome 8 (8.2, 28.1–56.2 cM, ~ 15.7–33 Mb) produced the strongest (suggestive) evidence for linkage in 22 European-ancestry datasets, and was ranked eighth in the analysis of all 32 datasets. NRG1 is at the centromeric edge of that bin (~ 55.7 cM), so one would expect that if it explained the linkage, the signal would extend equally in the centromeric and telomeric directions, but support for linkage was not observed in more centromeric bins (bin 8.3 in the primary analysis, from 56.2–84.3 cM; bin 8.4 in the “20 cM” analysis from 56.2–75 cM; or bin 8.3 in the “30 cM” shifted analysis from 42.15–70.25 cM).1 We hypothesize that there is weak linkage to SCZ on chromosome 8p, due to one or more loci in which there are multiple rare risk-associated SNPs and/or structural variants and/or multiple associated common SNPs. There are other candidate genes on 8p (see discussion in the meta-analysis paper1), but it is not yet clear what accounts for the evidence for linkage in this region.
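The 1-lod interval used above can be made concrete with a brief sketch. It assumes that the interval is read directly from the grid of analyzed points, without interpolation, and that only the contiguous stretch around the maximum is reported. The function names and the toy lod curve are hypothetical, while the two intervals compared at the end are the values quoted in the text.

```python
def one_lod_interval(positions, lods, drop=1.0):
    """Contiguous region around the maximum where lod >= max lod - drop.

    Assumption: evaluated on the analysis grid only (no interpolation
    between analyzed points).
    """
    i_max = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[i_max] - drop
    lo = i_max
    while lo > 0 and lods[lo - 1] >= cutoff:
        lo -= 1
    hi = i_max
    while hi < len(lods) - 1 and lods[hi + 1] >= cutoff:
        hi += 1
    return positions[lo], positions[hi]


def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Toy lod curve (hypothetical values, positions in Mb).
positions_mb = [20, 22, 24, 26, 28, 30, 32]
lods = [0.5, 1.2, 1.8, 2.5, 1.6, 1.0, 2.0]
print(one_lod_interval(positions_mb, lods))

# Values quoted in the text: 1-lod interval versus NRG1 location (Mb).
print(intervals_overlap((21.37, 29.36), (32.53, 32.74)))
```

In the toy curve the point at 32 Mb also lies within one lod unit of the maximum but is separated from the peak by a lower value, so it is excluded from the reported interval.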
Suggestive evidence for linkage was observed on chromosome 9q in the dominant analysis of all families. Support for this region in other analyses was modest, but not substantially different than the evidence for 8p. This region is not supported by previous linkage findings or the meta-analysis.1
Genomewide significant evidence for linkage allowing for intersite heterogeneity was observed on chromosome 10p12 at 45.6 cM (21.28 Mb). We previously reported modest evidence for heterogeneity in this region14, and in that report we also reviewed the evidence for 10p linkage reported previously in the NIMH-SGI, VCU/Ireland and part of the Bonn/Perth samples studied here. A significant signal is now seen in the present expanded sample, with a denser marker map, due to allele sharing in the Paris/CNRS, NIMH-SGI and (to a lesser degree) the VCU samples (online Supplementary Table 2). There is no indication of a high-penetrance signal from a small subset of families: the NIMH sample includes small nuclear families from the general U.S. population; and although there are some large, extended pedigrees in the Paris/CNRS sample from La Réunion Island, most of the families with positive lod scores were small families from the general French population, and no single family had a lod score (Kong-Cox, dominant or recessive) greater than 1.4. Because we combined families from eight previously collected datasets, we do not have a consistent set of clinical ratings across samples to search for a possible clinical basis for linkage heterogeneity. The 10p peak is not supported by meta-analysis1, and is far from the chromosome 10q peaks observed between 100–110 cM in two independent studies39,40.
Significant heterogeneity (but not linkage with heterogeneity) was seen on chromosome 22q at 15 Mb, adjacent to the typical region (17–21 Mb) of the 22q11 deletion syndromes whose manifestations include SCZ in approximately 20% of cases.41 This deletion was detected in less than 0.5% of SCZ cases in two recent large studies.42,43 No consistent association signals have been observed to date between SCZ and common SNPs in candidate genes within the deletion region.
Two other regions, on chromosomes 8q24.1 and 12q24.1, produced suggestive evidence for linkage in at least one analysis, both reportedly linked to mood disorders rather than SCZ. On 8q, a combined analysis of genotypes from 11 linkage scans (1,067 families) produced a nonparametric lod score of 3.40 at 134.5 Mb, just telomeric to our 1-lod interval, in an analysis of bipolar-I and bipolar-II cases, but the signal was much smaller in an analysis of only bipolar-I.44 Given that by definition only bipolar-I can include psychosis (usually in around half of cases), one would not predict that the same locus in this region would account for linkage signals to bipolar disorder and SCZ. On chromosome 12q, there have been reports of linkage to major depressive45,46 and bipolar disorders (see review by Barden et al.47) with peak locations ranging from 97.4–126.5 Mb (116–126 Mb in bipolar studies), close to our peak at 111 Mb. Neither region was supported by the SCZ linkage meta-analysis.1
In the linkage meta-analysis1, genomewide significant evidence for linkage was detected on chromosome 2q (132–162 cM, 121–152 Mb), with some support for linkage across a broad region (118–176 cM and 206–235 cM). In the present study, we see a jagged line across chromosome 2q (Figure 1), reflecting diverse peaks in different samples, although without statistically significant evidence for heterogeneity. Our largest peak was in the nonparametric EUR analysis at 206.6 cM (210.87 Mb). Thus, in our data and in the meta-analysis of 32 datasets, linkage evidence on 2q is intriguing but poorly localized. It was recently reported that a SNP in ZNF804A, at 185 Mb on 2q, produced genomewide significant evidence for association when a large collaborative SCZ association sample was combined with bipolar disorder cases from the Wellcome Trust Case Control Consortium project.48
What is the relevance of linkage studies as the field moves on to GWAS and large-scale resequencing methods? Meta-analysis provides some support for quite modest linkage signals.1 Thus, no gene is likely to have a large effect on overall population risk. In this situation, GWAS methods have better power2, but (currently) only for common SNPs. GWAS technologies can also detect some but not all copy number variants (CNVs). Recent studies suggest that rare deletions on chromosomes 1q and 15q (as well as 22q11) predispose to SCZ42,43,49, 50; and that SCZ cases also have a small but significant excess of very rare CNVs, some of which might therefore also be pathogenic. These findings support the more general hypothesis of multiple rare genomic events (SNPs, CNVs, other structural changes) influencing risk for a common disease.51–53
High-penetrance CNVs like those on 1q and 15q have effects such as mental retardation and/or autism, consistent with the observation that they reduce fertility and thus are usually de novo mutations rather than transmitted in families. But most SCZ risk variants probably have smaller effects: the risk to probands’ siblings is around 5%20, and if one allows for a small proportion of cases to be due to high-penetrance CNVs, the remaining risk should be due to lower-penetrance variants which would thus be transmitted in families. It is possible that weak SCZ linkage signals are in regions where there are multiple rare as well as common risk variants, whose aggregate frequency and effects are sufficient to produce a linkage signal, and whose effects on fertility are not too severe. We refer here both to deleterious transmitted and/or recurring sequence and structural polymorphisms with low population frequencies, and to very rare and thus very deleterious variants that segregate in different families, i.e., extreme allelic heterogeneity.
One approach to finding these variants would be high-throughput resequencing studies of linkage regions. For example, significant differences have been found in the proportions of high- and low-risk individuals carrying very rare non-synonymous coding SNPs for some diseases.54–55 This approach has not yet been attempted for schizophrenia in a large sample, thus we lack information to predict the power or optimal design of such studies. If a region in fact contained a sufficient number of rare high-risk variants to produce a linkage signal, then it might be possible to detect them via resequencing, although success would depend on the proportion of subjects or families carrying such variants, and on the extent of locus heterogeneity, i.e., if a small proportion of cases carried rare risk variants at a large number of loci in a linkage region, studies of a feasible sample size might not detect them. It is not known whether it will prove most productive to resequence exons, entire genes with their nearby regulatory regions, or entire linkage regions (given that there are likely to be relevant unannotated intergenic regulatory sequences). Family-based samples might be particularly useful for resequencing studies of linkage peaks, if rare variants were contributing to the signal. But it is also possible that, because these variants are rare precisely because they reduce fertility, they could be more easily found in case-control samples, which are also larger. In our view, multiple strategies should be attempted.
It has also been suggested that the power of GWAS can be increased by upweighting evidence for association based on linkage scores (resulting in a small downweighting of other regions).56 Whether or not this formal approach is used, it would be reasonable to consider linkage findings when selecting genes and regions for dense LD mapping and large-scale resequencing studies.
Supplementary information is available at the Molecular Psychiatry website.
The authors are grateful to the many family members who participated in the studies that recruited these samples. This work was supported by National Institute of Mental Health grants 7R01MH062276 (to D.F.L., C.L., M.O. and D.W.), 5R01MH068922 (to P.G.), 5R01MH068921 (to A.E.P.) and 5R01MH068881 (to B.R.). For the NIMH sample, data and biomaterials were collected in three projects that participated in the National Institute of Mental Health (NIMH) Schizophrenia Genetics Initiative. From 1991–97, the Principal Investigators and Co-Investigators were: Harvard University, Boston, MA, U01 MH46318, Ming T. Tsuang, M.D., Ph.D., D.Sc., Stephen Faraone, Ph.D., and John Pepple, Ph.D.; Washington University, St. Louis, MO, U01 MH46276, C. Robert Cloninger, M.D., Theodore Reich, M.D., and Dragan Svrakic, M.D.; Columbia University, New York, NY U01 MH46289, Charles Kaufmann, M.D., Dolores Malaspina, M.D., and Jill Harkavy Friedman, Ph.D. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number N01-HG-65403. The NIMH Cell Repository at Rutgers University (Drs. Douglas Fugman and Jay Tischfield) and the NIMH Center for Collaborative Genetic Studies on Mental Disorders (Dr. John Rice) made important contributions to this project.