Etiological heterogeneity and complexity have hampered attempts to identify predisposing genes for schizophrenia. We sought to minimize the number of segregating genes involved by focusing on a population isolate with elevated disease prevalence. We exploited the well-established population history, and searched for disease susceptibility loci in families from two alternative founder lineages. We studied 28 schizophrenia pedigrees (123 nuclear families) from an outlying municipality on the eastern border of Finland. We divided the families based on their genealogy and defined two routes of immigration: southern and northern. We examined the kinship coefficients and allele frequency distributions within each group, and performed a linkage analysis based on 497 microsatellite markers across the genome. A high degree of historical relatedness was demonstrated by higher sharing of alleles than predicted by the relationships we identified within the previous four generations alone, as would be expected. Between the two subpopulations, allele frequencies were significantly different, consistent with their isolated genealogies. The southern families showed some evidence of linkage to a schizophrenia locus at 4q23 (Z=3.3) near our previous finding with quantitative variation in verbal learning and memory [Paunio et al. (2004); Hum Mol Genet 13: 1693–1702], while the northern pedigrees gave their most significant evidence at 10q21 (Z=2.53). Joint analysis of families from both lineages suggested evidence of linkage only at 3p14 (Z=3.18). Thus the detailed genealogical information led us to identify distinct linkage signals for schizophrenia susceptibility loci among the three analyses we performed.
Family, twin, and adoption studies support the hypothesis that schizophrenia etiology involves a combination of environmental, hereditary, and stochastic factors [Cannon et al., 1998; Tienari et al., 2000; Tsuang, 2000]. Several chromosomal regions have been suggested to contain predisposing loci to schizophrenia based on linkage studies [cf. Levinson, 2003; O'Donovan et al., 2003] and have revealed promising putative genes like neuregulin [Stefansson et al., 2002] and dysbindin [Schwab et al., 1995]. Despite the fact that some recent genome-wide association (GWA) study follow-ups have suggested that large enough replication samples may prove to harbor putative predisposing genes with modest effect size [O'Donovan et al., 2008], GWA studies overall for schizophrenia have been confusing [Craddock et al., 2006; Crow, 2008; Kirov et al., 2008; Shifman et al., 2008]. For this reason, we feel that it is wise to continue the search for linkage in small population isolates with reduced genetic and environmental heterogeneity.
The global prevalence of this symptomatically and etiologically heterogeneous psychiatric disorder is approximately 1%. As one would expect, the prevalence varies most substantially between small populations [Böök et al., 1978; Torrey, 1987; Youssef et al., 1991; Hovatta et al., 1997]. Reduced genetic, environmental, and cultural variation can be expected likewise in small isolated populations. With detailed population records available in Finland, it is possible to reconstruct very large and deep genealogies connecting individuals or families in which schizophrenia is found. We initially began to study schizophrenia a decade ago in a small internal isolate (IS) in Northeastern Finland, in which the age-corrected lifetime risk of schizophrenia was very high (3.2%), compared with the national average (1.1%) [Hovatta et al., 1997; Haukka et al., 2001]. This led to an intensive effort to gather and evaluate all available members of families containing schizophrenia patients in the isolate, and the connection of many such families into an extended mega-pedigree, containing up to four generations of all known genealogical links [Arajärvi et al., 2006].
The power of extended human pedigrees for mapping loci influencing common diseases can be seen, for example, in previous studies localizing two genes significantly affecting susceptibility to a parasite infection in a genetically isolated Nepalese population [Williams-Blangero et al., 2002], and identifying the USF1 gene in familial combined hyperlipidemia in the Finnish population [Pajukanta et al., 2004]. In our previous schizophrenia study with neuropsychological endophenotypes [Paunio et al., 2004], we found some linkage evidence for verbal learning and memory on 4q21 (Z=3.01, Zmp=3.84, and empiric P=0.031 for delayed memory; Z=2.96, Zmp=3.4, and P=0.026 for verbal learning) and for visual working memory on 2q36 (Z=2.80, Zmp=2.08, and P=0.093). Our mega-pedigree construction revealed new details of the demographic history of the isolate, with immigration routes via two rivers running in opposite directions. Our goal was to explore whether this geographic subdivision would clarify the linkage signals by controlling for population substructure in our analysis of the genetic background of schizophrenia.
We drew study participants from a cohort consisting of all individuals born in Finland between 1940 and 1976. We screened this cohort for a history of hospitalization during the period from 1969 to 1998 due to a psychotic disorder, for anti-psychotic drug prescriptions, and for receipt of work disability pensions utilizing three nation-wide computerized databases: the Hospital Discharge Register, the Free Medicine Register, and the Pension Register. To find first-degree relatives and construct nuclear families we linked the unique identification codes of these affected individuals to the database at the National Population Register Centre. We chose families with at least one parent originating from the IS.
Our searches ascertained 446 eligible families from the isolate having at least one schizophrenia patient (code 295 for schizophrenia, schizoaffective disorder, or schizophreniform disorder). These families included 273 affected individuals with a DSM-IV diagnosis of schizophrenia (DC1), an additional 50 affected individuals with schizoaffective disorder (included in DC2), and 2,806 unaffected family members. Of these families, we were able to contact 89%, and the complete IS sample consisted of 1,073 DNA samples, 283 of them from affected individuals, all from participants who gave informed consent after a complete description of the study. Further details of this ascertainment strategy are given elsewhere [Hovatta et al., 1998; Paunio et al., 2001].
We acquired all available inpatient and outpatient records. Two psychiatrists (in case of disagreement, three) made a blind consensus diagnosis to yield the best-estimate lifetime diagnosis according to the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) criteria [American Psychiatric Association, 1994]. One of the assessors completed the Operational Criteria Checklist for Psychotic Illness (OCCPI), which consists of 90 items of psychopathology, pre-morbid functioning, and personal history [McGuffin et al., 1991].
We searched for all the genealogical links within the past four generations for all IS nuclear families to construct large pedigrees [see, e.g., Arajärvi et al., 2006]. For the current study, we divided the families according to the birth village of the grandparents within the isolate. Two distinct waterways, emptying in opposite directions, run into the Kuusamo municipality (IS) and have led to the formation of two separate subpopulations (Fig. 1). Three villages (Ala-Kitka, Vasaraperä, Posio) located on the northern waterway constitute the northern subpopulation, while four southern waterway villages (Kuusamo, Heikkilä, Lämsä, Poussu) comprise the southern subpopulation. For our family division, at least three of the four grandparents had to be born in the required subpopulation. Of the 125 nuclear families with both parents born in the isolate, 31 (25%) came from the northern and 68 (54%) from the southern villages, while only 11 families (9%) had an equal number of ancestors from both subpopulations, and 15 remained unknown. The Internal Isolate Northern subpopulation (IS-N) sample of the present study included 9 pedigrees (43 nuclear families) with 199 individuals with genotype data (62 affected). The Internal Isolate Southern subpopulation (IS-S) sample was composed of 19 pedigrees (80 nuclear families) and 330 genotyped individuals (116 affected). Seventy-seven of the families (62%: IS-S=65% and IS-N=56%) were included in our previous genome scan [Paunio et al., 2001].
We carried out a genome-wide scan using a set of 497 microsatellite markers across all the human autosomes and chromosome X from the CHLC-6 set [http://gai.nci.nih.gov/CHLC/, Sheffield et al., 1995] and added supplementary markers from the Généthon map [Dib et al., 1996]. We set up a polymerase chain reaction (PCR) using 20 ng of genomic DNA with denaturation at 95°C for 5 min, followed by 30 cycles at 95°C for 30 sec, 55°C for 30 sec, and 72°C for 60 sec. We ran the gels on an Applied Biosystems (ABI) 377 DNA sequencer, using ABI Prism 377 data collection software, and analyzed the data with ABI Prism GeneScan 2.0.2 and Genotyper 1.1.1.
We identified all genotyped pairs of second cousins from our genealogical reconstruction. To estimate allele frequencies, we randomly selected one genotyped index person with schizophrenia from each family and estimated allele frequencies from these individuals using MERLIN [Abecasis et al., 2002]. We analyzed the genotypes of all markers in the selected pairs of second cousins (n=31) to empirically estimate the proportions sharing 0, 1, and 2 alleles identical by descent (IBD; p0, p1, and p2) at each locus with the ALTERTEST option in PREST [McPeek and Sun, 2000]. We then estimated the kinship coefficient with the formula (p1×½+p2)/2 and compared it to the theoretical expectation for second cousins, 1/64 = 0.015625.
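The kinship formula above can be checked with a short calculation. The following is a minimal sketch, not the PREST implementation; the function name and the theoretical IBD proportions for second cousins (p0 = 15/16, p1 = 1/16, p2 = 0) are supplied here for illustration only.

```python
def kinship_from_ibd(p1, p2):
    """Kinship coefficient from the proportions of loci sharing 1 and 2 alleles IBD.

    Implements the formula quoted in the text: (p1 * 1/2 + p2) / 2.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0 and p1 + p2 <= 1.0):
        raise ValueError("p1 and p2 must be proportions with p1 + p2 <= 1")
    return (p1 * 0.5 + p2) / 2.0


# Theoretical second cousins (assumed values): one allele IBD at 1/16 of loci, never two.
expected_second_cousins = kinship_from_ibd(p1=1 / 16, p2=0.0)

# Ratio of an observed estimate to the theoretical expectation, as used when
# asking whether a pair shares more than its recorded relationship predicts.
def excess_sharing(observed_kinship, expected_kinship=expected_second_cousins):
    return observed_kinship / expected_kinship
```

With these theoretical proportions the function returns 1/64, the expectation against which the empirical estimates were compared.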
For testing whether allele frequencies were distributed identically in the IS-S and IS-N samples, we used a likelihood ratio test of the null hypothesis that allele frequencies were identical between populations, and evaluated the statistical significance by randomization.
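A test of this kind can be sketched as follows. This is an illustrative version only: it assumes the likelihood ratio statistic is the standard G statistic on the two-sample allele count table and that randomization shuffles whole individuals between the two groups. The software and exact procedure used in the study are not specified in the text, and all names below are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(alleles_a, alleles_b):
    """Likelihood ratio (G) statistic for equal allele frequencies in two samples."""
    count_a, count_b = Counter(alleles_a), Counter(alleles_b)
    n_a, n_b = len(alleles_a), len(alleles_b)
    total = n_a + n_b
    g = 0.0
    for allele in set(count_a) | set(count_b):
        pooled = count_a[allele] + count_b[allele]
        for observed, n in ((count_a[allele], n_a), (count_b[allele], n_b)):
            if observed > 0:  # empty cells contribute nothing to G
                expected = pooled * n / total
                g += 2.0 * observed * math.log(observed / expected)
    return g


def _flatten(genotypes):
    """Turn a list of (allele1, allele2) genotypes into a flat list of alleles."""
    return [allele for genotype in genotypes for allele in genotype]


def randomization_p_value(genotypes_a, genotypes_b, n_perm=1000, seed=0):
    """Empirical P-value obtained by shuffling individuals between the two groups."""
    rng = random.Random(seed)
    observed = g_statistic(_flatten(genotypes_a), _flatten(genotypes_b))
    pooled = list(genotypes_a) + list(genotypes_b)
    n_a = len(genotypes_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = g_statistic(_flatten(pooled[:n_a]), _flatten(pooled[n_a:]))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling individuals rather than single alleles keeps each person's two alleles together, so the null distribution does not rely on Hardy-Weinberg proportions within the pooled sample.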
To test for Hardy-Weinberg equilibrium (HWE) in the IS-S and IS-N samples, we applied Fisher's exact test as implemented in Mendel [Lange et al., 1988]. We used Mega2 to manipulate the data files [Mukhopadhyay et al., 2005].
We used two affecteds-only models for analysis, one assuming a dominant model and one assuming a recessive model, each assuming an absence of phenocopies [for reasons cf. Göring and Terwilliger, 2000]. We utilized the MLINK program of the linkage package [Lathrop et al., 1985] for the two-point LOD score analysis [Terwilliger, 2000] with marker allele frequencies estimated from our dataset for each locus. We performed linkage analysis with the complete extended pedigree structures as well as solely using the nuclear families without their historical interrelationships being specified.
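For readers unfamiliar with the statistic, the two-point LOD score compares the likelihood of the data at a recombination fraction Θ with the likelihood under free recombination (Θ=0.5). The sketch below illustrates only the textbook special case of phase-known, fully informative meioses; it is not the pedigree likelihood computed by MLINK, and the function names and the grid of Θ values are assumptions made for illustration.

```python
import math


def two_point_lod(theta, recombinants, meioses):
    """LOD score for phase-known meioses: log10 of L(theta) / L(0.5).

    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant excludes complete linkage
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if meioses - recombinants:
        log_likelihood += (meioses - recombinants) * math.log10(1.0 - theta)
    return log_likelihood - meioses * math.log10(0.5)


def max_lod(family_counts, thetas=(0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Sum LOD scores over families at each theta and return (Zmax, best theta).

    family_counts is a list of (recombinants, meioses) pairs, one per family.
    """
    totals = [
        (sum(two_point_lod(theta, r, n) for r, n in family_counts), theta)
        for theta in thetas
    ]
    return max(totals)
```

Because LOD scores are additive across independent families at a given Θ, the per-family values reported in the Results can be read as contributions to the summed score at that marker.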
By analyzing the kinship coefficients, we assessed the degree of inbreeding among schizophrenia families from the IS. The estimated kinship coefficients for the 31 pairs of apparent second cousins were 1.43 times higher than expected based on theory. Likewise, in a simulation in which we dropped markers (with allele frequency distributions as estimated from our dataset) through the genealogically reconstructed pedigree structures, the observed kinship coefficients from the real data were significantly larger than expected in each of the second cousin pairs (Fig. 2a). This is consistent with the hypothesis that there are more connections among individuals than our four-generation pedigree reconstruction implies, and these additional connections are thus likely to contribute to the linkage findings from these families.
Based on the IS population history, we anticipated that there could be significant genetic diversity between the schizophrenia families depending on whether they were from the northern or southern villages (IS-N and IS-S, respectively) (Fig. 1; cf. Materials and Methods Section).
To test whether allele frequencies differed between these subpopulations, we applied a likelihood ratio test to the allele frequencies in the IS-S and IS-N families. For this analysis we utilized genotypes from all “unrelated” founders or, when parents were not genotyped in the study, one independent sibling from each of the nuclear families, excluding relatives closer than first cousins. The allele frequency distributions of 14% of the markers differed significantly between the IS-S and IS-N families (P ≤ 0.05), roughly three times as many as would be expected by chance. We determined the P-values by a randomization test and, as expected, the distribution of P-values for all markers over the genome was skewed toward 0, with a mean value of 0.35 instead of the 0.5 expected under the null hypothesis (Fig. 2b).
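The "roughly three times" comparison follows from simple arithmetic, sketched below. The calculation assumes, for illustration only, that all 497 markers entered the test and that markers behave independently; neither assumption is stated in the text, and linked markers would make the binomial tail probability anti-conservative.

```python
import math


def binomial_upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1)
    )


N_MARKERS = 497            # assumed number of markers tested
ALPHA = 0.05               # significance threshold used in the text
OBSERVED_FRACTION = 0.14   # fraction of markers reported as significant

expected_count = ALPHA * N_MARKERS                   # markers expected below ALPHA by chance
observed_count = round(OBSERVED_FRACTION * N_MARKERS)
fold_excess = OBSERVED_FRACTION / ALPHA              # observed relative to chance expectation
tail_probability = binomial_upper_tail(observed_count, N_MARKERS, ALPHA)
```

Under these assumptions the fold excess is 2.8, in line with the "roughly three times" stated above.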
Ascertainment bias can lead to a skewed distribution of allele frequencies. Although our samples do not represent random population samples, they were ascertained in the same manner, for a particular phenotype (schizophrenia), using combined information from three different national registers (cf. Materials and Methods Section). We examined whether deviation from HWE contributed to the observed genetic differences. No major deviation emerged from the expected uniform distribution of P-values over the genome in the HWE tests (mean P-value=0.49 in IS-S and 0.48 in IS-N, neither differing significantly from expectation: P=0.31 for IS-S and 0.29 for IS-N) (Fig. 2b). Thus, the observed allelic diversity did not result from deviations from HWE in either of the samples, and the findings on the DNA level were in accordance with the population history of Kuusamo, with a limited number of founders surviving tight genetic bottlenecks.
We next analyzed the IS-S and IS-N samples separately to search for linkage evidence between the set of microsatellites and putative susceptibility loci for schizophrenia.
In the nuclear families of the IS-S sample, we observed the highest LOD score in the genome scan (3.33) with marker D4S1647 on 4q23, using schizophrenia or schizoaffective disorder (together comprising DC2) as affection status and a dominant mode of inheritance (Table I). Among 26 IS-S nuclear families informative for linkage analysis, that is, having at least two affected individuals, 13 revealed some evidence of linkage (Z>0) at the position of the best-linked marker. In the IS-S pedigrees, marker D3S1766 on 3p14.2 gave the highest two-point LOD score in the genome, with Zmax=1.95 (using schizophrenia alone, DC1, as affection status and a recessive model of inheritance). Most of the information came from the largest pedigree, H1, with 140 genotyped individuals, for which we obtained Z=2.08 at Θ=0.12. Four other pedigrees individually provided weak positive LOD scores (Z=0.07–0.39) with D3S1766. The signal on 4q23 was derived from a relatively large number of nuclear families, reflecting the contribution of genetic connections beyond the established pedigree structure to the linkage signal.
In the northern nuclear families, our maximum LOD score over the entire genome was 2.64 with marker D9S922 at 9q21.31 (DC2, recessive model of inheritance) (Table I). Among 15 informative IS-N nuclear families, only 6 revealed some evidence of linkage (Z>0) to D9S922. Additionally, D1S513 on 1p35.2 gave Zmax>2.0 in the northern nuclear families (Z=2.26; DC2, dominant model of inheritance). In the IS-N pedigrees the best genome-wide evidence of linkage was to GATA101E02 on 10q21.3 (Z=2.53, DC1, dominant model of inheritance). Six out of nine pedigrees were linked (Z>0), with the highest LOD scores for the A1 (1.59) and A2 (0.52) pedigrees (at Θ=0 for both).
Finally, we analyzed the combined data from both IS-S and IS-N families with markers from those genomic regions that had given some evidence of linkage in the previous analyses (Table II). In the combined sample, we obtained the best evidence for linkage for 3p (Zmax=3.18 with D3S1766 in the pedigrees), and the signal on 2q also increased slightly (Zmax=2.09 with D2S427 in the nuclear families). In contrast, the evidence of linkage to 4q remained restricted to the IS-S families, and linkage to 1p and 10q was restricted to the IS-N families. For 7p-q (Z=1.78 with D7S492), 9q (Z=1.80 with D9S303), and 20p (Z=1.56 with D20S477) the evidence of linkage in the complete dataset remained relatively modest. Hence, the IS-S and IS-N subpopulations shared the locus on 3p14, while evidence for a locus on 4q was detected only in families from the southern villages, and for loci on 1p and 10q only in those from the northern villages.
To search for genetic susceptibility loci for schizophrenia, we tried to minimize potential sources of genetic and environmental heterogeneity through extensive genealogical analysis of families from an isolated geographical region of Finland. Special characteristics of the Finnish subpopulations are striking. Although the first settlers came 10,000 years ago along the retreating glacial ice, the habitation of the wilderness, an internal migratory movement referred to as late settlement, only began in the 1500s from a small Southeastern area to a large geographical region. These multiple historical bottlenecks make the Finnish population exceptionally interesting for genetic studies. Late settlement was sporadically inhabited by small founder groups. The consequences of multiple founder effects and subsequent isolation in Finland are apparent in their effects on reduced allelic diversity, and are reflected by the overrepresentation of 36 rare, mostly autosomal recessive “Mendelian” disorders [Norio et al., 1973; Peltonen et al., 1999] with one major founder and a more diverse genetic background [Peltonen et al., 1999] and underrepresentation of others like cystic fibrosis or phenylketonuria [Norio et al., 1973].
The Kuusamo municipality is located in the wilderness of Northeastern Finland near the Russian border, in the late settlement region. The first Finnish pioneer moved there in 1676, and 34 families consisting of 194 individuals lived permanently in the region by 1685. The Finnish inhabitation of Kuusamo is well documented for each year and each name. The great famine of 1695–1697 killed about half of the Finns and the majority of the sparse native forager population, the Saami, in the Kuusamo region. The two distinct local water routes, along which settlers had moved upstream, emptied in opposite directions. This led to the formation of two separate subpopulations, northern and southern, consisting of altogether seven villages. When Bishop Forbus came to Kuusamo in 1718 and established parish registers, the population in the 165 houses numbered 615. In the 18th and 19th centuries, population growth in Finland was Europe's fastest, particularly so in rural Kuusamo, where the population grew from 2,000 to 17,000. The population of Kuusamo remained almost completely isolated until World War II. This exceptional history is in agreement with our previous finding of clearly more extended inter-marker linkage disequilibrium (LD) in chromosomes gathered from the Kuusamo region compared to the rest of the late or early settlement regions of Finland (Fig. 3c) [Varilo et al., 2003]. Our finding here of a significantly higher (43%) amount of allele sharing in second cousin pairs also demonstrates this phenomenon.
We scrutinized the genetic diversity of schizophrenia in the Kuusamo population by taking advantage of detailed historical analyses. We detected significant differences in allele frequencies at microsatellite markers across the genome between families from the northern and southern villages, consistent with the known population history (Fig. 3). The finding of partially divergent putative susceptibility loci here for schizophrenia, by whole genome scanning, could also reflect increased homogeneity within the subpopulations. In families originating from the southern villages, we observed the highest genome-wide evidence of linkage at 4q23, while families from the northern villages provided none, nor did our previous analysis on the families from the overall isolate [Paunio et al., 2001]. Extrapolating from our previous attempts to estimate the distribution of the LOD scores in the IS data set in these complicated pedigrees, the current finding of a LOD score of 3.3 in the IS-S families has a genome-wide significance of ~0.05 [Paunio et al., 2001]. Recently we reported that three distinct quantitative traits of verbal learning and memory, derived from data from a California Verbal Learning Test performed on 598 individuals from Finnish schizophrenia families, gave evidence of linkage in the close vicinity of the current linkage peak (best evidence at D4S2361, located 14 cM centromeric of D4S1647) [Paunio et al., 2004]. Curiously, in analysis of those traits in the families from the southern villages alone, virtually no evidence of linkage to 4q is observed, despite the substantial (41%) overlap of the two samples. Other groups have also reported linkages of schizophrenia or related traits to chromosome 4q24-32 [Levinson et al., 1998; Straub et al., 2002].
Further research on this region will elucidate whether these findings based on the clinical diagnosis of schizophrenia or quantitative neurocognitive features, considered valid intermediate phenotypes, represent the same or distinctive genetic factors, or even if they are false positives.
In families from both northern and southern villages, we observed some evidence of linkage to the short arm of chromosome 3. The improved LOD score with the marker D3S1766 (from 1.58 to 3.18) is attributable to the accuracy of the pedigree structures and the additional genotyped individuals [Paunio et al., 2004]. A recent meta-analysis of 20 schizophrenia genome scans found that 3p25.3-p22.1 reached a genome-wide significant P-value of 0.006 [Lewis et al., 2003]. Whether these findings are true positives and representative of the same or different chromosomal loci in the various samples remains to be confirmed.
Despite the current tendency of complex disease research towards large-scale GWA studies with single nucleotide polymorphisms, a powerful option for gene identification will continue to be the analysis of large families from genetically isolated populations [Varilo and Peltonen, 2005; Craddock et al., 2007]. The power per genotype of studies of large extended pedigrees is always higher for mapping genes than the power from case–control studies under all conceivable models [Williams-Blangero et al., 2002; Pajukanta et al., 2004]. Family-based study designs are effective for shared haplotype analysis among affected individuals and most probably still represent the most meaningful study design to establish initial locus findings, as in the initial characterization of neuregulin as a schizophrenia susceptibility gene in the Icelandic families [Stefansson et al., 2002]. Linkage analyses in pedigrees collected from different populations can provide numerous candidate loci for fine mapping efforts, and may eventually lead to identification of susceptibility genes and QTLs for schizophrenia. A disease-predisposing allele characterized from a small founder population may prove to be important through many populations, as was shown for variation associated to lactose intolerance in a few genealogically well defined Finnish pedigrees [Enattah et al., 2002]. Probably more often, however, distinct alleles will be identified in distinct populations. Even if the variant itself is not the most associated outside Finland, the same gene or even the genes on the same metabolic pathway may harbor additional variation in other populations, as geneticists have elucidated for numerous other disorders [Paloneva et al., 2002; Kyttälä et al., 2006; Tallila et al., 2008].
To summarize, based on well-defined population histories we divided an IS, with a particularly high age-corrected lifetime risk for schizophrenia, into two subpopulations with different demographic histories, showed that the subpopulations were genetically differentiated from each other, and performed linkage analysis on the two subpopulations separately, with markedly different results. Whether these findings indicate etiological differences or are purely stochastic events remains to be elucidated.
The authors are most grateful to the participating individuals, and to all the team workers involved in the sample collection. We wish to thank H. Juvonen, M. Muhonen, J. Suokas, K. Suominen, and J. Suvisaari for their participation in the diagnostic procedure and M. Schreck for her input on the computational issues. This work was supported in part by Millennium Pharmaceuticals, Inc., and Wyeth-Ayerst, and grant MH63749 from the National Institutes of Mental Health, and EU–SGENE–A large scale genome-wide association study of schizophrenia addressing variation in expressivity and contribution from environmental factors–LSHM-CT-2006-037761, and Academy of Finland–Centre of Excellence in Complex Disease Genetics.
Grant sponsor: Millennium Pharmaceuticals, Inc.; Grant sponsor: Wyeth-Ayerst; Grant sponsor: National Institute of Mental Health; Grant number: MH63749.