Linkage analysis is optimal for detecting rare disease alleles of large effect that co-segregate with disease in families, and has been used very successfully in investigations of rare mendelian single-gene disorders. In essence, linkage analysis looks for alleles that have been inherited by the affected offspring and not by the unaffected offspring in a family. Recombination events between the marker and the disease locus can be used to determine the most likely genetic distance between the two, because the greater the genetic distance, the more recombinations are expected. The object of linkage analysis is to estimate the recombination fraction (θ) and to test whether a deviation from the null hypothesis of no linkage (50% recombination, or θ = 1/2) is significant. The likelihood is maximized over different values of θ, and a likelihood ratio is generated for each position and converted to a LOD score by taking its base-10 logarithm. In general, a LOD score of +3 or more is considered significant evidence for linkage and a value of −2 or less is considered significant evidence for exclusion. In genome scans, however, these thresholds are raised to account for multiple testing; the specific value depends on the analytical method and the family structure employed [34]. For a parametric analysis, one recommendation for significant and suggestive results corresponds to LOD scores of 3.3 and 1.9, respectively. For a non-parametric sib-pair analysis, the corresponding recommended pointwise p values are 0.000022 (LOD of 3.6) for significant and 0.00074 (LOD of 2.2) for suggestive linkage. Alternative guidelines have been suggested by both Morton [35] and Elston [36].
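The LOD calculation described above can be illustrated with a minimal sketch. It assumes the simplest possible case, phase-known and fully informative meioses in which recombinants can be counted directly; real pedigree likelihoods are far more involved. The function names are hypothetical, and the conversion between a LOD score and a pointwise p value uses the usual large-sample one-sided approximation, which is an assumption of this sketch rather than a statement of how the cited guidelines were derived.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD at recombination fraction theta for phase-known meioses.

    LOD(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    where k is the number of recombinants among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses
    if theta == 0.0:
        # Any observed recombinant is impossible when theta is exactly 0.
        if k > 0:
            return -math.inf
        log_l_theta = 0.0
    else:
        log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # no linkage: theta = 1/2
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 50):
    """Maximize the LOD over an evenly spaced grid of theta in [0, 0.5].

    Returns (best_theta, best_lod).
    """
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best_lod, best_theta = max((lod_score(recombinants, meioses, t), t) for t in grid)
    return best_theta, best_lod


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p value for a non-negative LOD.

    Assumes 2 * ln(10) * LOD behaves like a one-sided chi-square (1 df)
    statistic, so p is the upper tail of a standard normal at its square root.
    """
    z = math.sqrt(2.0 * math.log(10.0) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    theta_hat, lod_hat = max_lod(recombinants=1, meioses=10)
    print(f"theta_hat = {theta_hat:.2f}, max LOD = {lod_hat:.2f}")
    for lod in (3.6, 2.2):
        print(f"LOD {lod}: approximate pointwise p = {lod_to_pointwise_p(lod):.1e}")
```

With one recombinant in ten meioses, the maximum falls at θ = 0.1 with a LOD of roughly 1.60, well short of the conventional threshold of 3. Under the stated approximation, LOD scores of 3.6 and 2.2 map to pointwise p values of the same order as the 0.000022 and 0.00074 quoted above.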
In complex traits, genetic heterogeneity and reduced penetrance can contribute to an inflated recombination fraction, making localization of etiological loci prone to error. The variability in the phenotype of schizophrenia and its overlap with related neuropsychiatric disorders have led to a generally held assumption of genetic heterogeneity. This assumption is further supported by the replication of specific susceptibility loci on several different chromosomes, though those loci have yet to be confirmed by the elucidation of the gene defect. An allowance for heterogeneity can be incorporated into a LOD score analysis by generating a heterogeneity LOD score (HLOD), which is maximized over different values of alpha (α), the proportion of families linked to that locus. This admixture model is implemented in the HOMOG program [37] and in GENEHUNTER [38], though the statistics are likely to be accurate only when the number of pedigrees is fairly large. Incomplete penetrance, as would be expected for schizophrenia susceptibility loci, substantially increases the required number of families [37]. These admixture tests are widely used, but their validity has recently been questioned because underlying assumptions, such as a low disease allele frequency and an accurate phenocopy rate, cannot be verified [39]. The heterogeneity test in complex traits will be validated only by the discovery of a susceptibility gene at a locus that was implicated by such an analysis. For schizophrenia, the issue of heterogeneity may be best addressed by using ethnically homogeneous samples and by prior partitioning of the families by a relevant phenotypic category or measure.
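The admixture idea behind the HLOD can be sketched in a few lines. This is an illustration of the standard admixture formula, HLOD(α, θ) = Σ log10[α · 10^LOD_i(θ) + (1 − α)], and not the code of HOMOG or GENEHUNTER; the function names and the per-family LOD values in the example are hypothetical.

```python
import math


def hlod(family_lods, alpha: float) -> float:
    """Heterogeneity LOD at a fixed theta.

    family_lods: per-family LOD scores, all evaluated at the same theta.
    alpha: assumed proportion of families linked to the locus (0..1).
    Each family contributes log10(alpha * 10**LOD_i + (1 - alpha)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha)) for lod in family_lods)


def max_hlod(lods_by_theta, steps: int = 100):
    """Grid-maximize the HLOD over alpha and over the supplied theta values.

    lods_by_theta: dict mapping theta -> list of per-family LODs at that theta.
    Returns (best_hlod, best_alpha, best_theta).
    """
    best = (-math.inf, 0.0, 0.5)
    for theta, lods in lods_by_theta.items():
        for i in range(steps + 1):
            alpha = i / steps
            candidate = (hlod(lods, alpha), alpha, theta)
            if candidate[0] > best[0]:
                best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical per-family LODs at a single theta: two families support
    # linkage and two argue against it.
    example = {0.05: [1.2, 0.9, -1.5, -2.0]}
    best_hlod, best_alpha, best_theta = max_hlod(example)
    print(f"summed LOD = {sum(example[0.05]):.2f}")
    print(f"max HLOD = {best_hlod:.2f} at alpha = {best_alpha:.2f}, theta = {best_theta}")
```

In this made-up example the summed LOD is negative, yet the HLOD is positive at an intermediate α, which shows how unlinked families can mask a signal under the homogeneity assumption. With only four families the estimate of α would be very imprecise, consistent with the caution above about small numbers of pedigrees.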
LOD score tests require certain parameters to be specified, such as disease allele frequency and penetrance, and incorrect values could generate erroneous results. Non-parametric tests were developed to avoid the need to specify this information; programs such as GENEHUNTER [38] generate non-parametric linkage (NPL) Z scores for this purpose. However, simulation studies have shown that the LOD score method is more powerful, as long as both a recessive and a dominant model are tested and penetrance is relaxed to allow for heterogeneity [40]. Non-parametric affected sib-pair (ASP) methods, such as those implemented in SIBPAIR [42] and MAPMAKER/SIBS [43], have also been employed in genome scans, but these are not thought to be as powerful as LOD score methods [44], and they provide no information with regard to locus heterogeneity. Proponents of ASP methods claim that their findings will be more generally relevant than those of extended pedigree analysis, which may identify etiological genes relevant only for that family or for a particular subgroup of families (or ethnic group). However, given the difficulties in finding etiological genes for schizophrenia, rare or otherwise, it would seem sensible to use the most powerful methods initially to maximize the chance of finding a gene, and subsequently to evaluate that gene in other sample groups.
Linkage continues to be the most likely avenue through which etiological genes for schizophrenia will be discovered. Given the difficulties in determining candidate genes for psychiatric disorders, using a method that requires no a priori knowledge of the genes involved, such as a whole-genome scan, is advantageous. Furthermore, larger, more homogeneous family collections have accelerated the pace of gene discovery by increasing power, thereby maximizing the chance of detecting true positive results and minimizing the chance of false negatives. Because few linkage findings for schizophrenia have achieved the status of significant linkage according to present guidelines [34], suggestive linkages may be substantiated by replication in additional data sets [45]. Thus, the 15 published genome scans and additional regional linkage studies of schizophrenia have been compared for possible overlap of positively linked regions [46]. However, difficulties in these comparisons arise because of differences in power, the use of different marker data sets, different analysis methods, and the inaccuracies of linkage as a mapping tool for complex traits.
Estimates of power within a data set can be helpful when evaluating and comparing results between different genome scans. However, only 5 [47] of the 15 schizophrenia scans have estimated this quantity and, of these, only 2 have calculated power under the hypothesis of heterogeneity [47]. Under homogeneity, the study of Coon et al. [47] had 70% power to detect linkage only if the marker was at the linked gene, and this value dropped to 18–39% if 20% of families were not linked to this locus. Power to detect linkage in the study of Brzustowicz [51] was >75% under all models when >90% of the families were linked, and 50–75% when 25% of families were not linked. The remaining studies, which estimated power under homogeneity only, reported power as related to λs values (sibling risk). Excellent power (>90%) was reported by Levinson et al. [48] to detect linkage for a λs of 10 and good power with either a λs of 3 (θ = 0) or a λs of 5 (θ = 0.05), but power was poor when λs was less than 3. Williams et al. [50] reported power of >95% for a λs of 3 and 70% power for a λs of 2. Lastly, assuming a θ of 0.05, Shaw et al. [49] reported a power of <60% for a λs of 2, 3, or 4. However, these estimates of power will be greatly reduced in the likely event of locus heterogeneity. In addition, a general estimate of power can be obtained by examining the sample size and study design. The question of how much variation in localization can be tolerated to constitute an overlap between studies is a topic of much debate at the present time.
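To show why power depends so strongly on λs, the following sketch computes expected identity-by-descent (IBD) sharing in affected sib pairs and an approximate power for a simple mean-sharing test. It is not the method used by any of the cited studies. It assumes a single locus, fully informative markers, θ = 0, no dominance variance (so that the probabilities of sharing 0, 1 and 2 alleles IBD are 0.25/λs, 0.5 and 0.5 − 0.25/λs), and a normal approximation; the function names, the default significance level and the sample size in the example are hypothetical.

```python
import math
from statistics import NormalDist


def asp_ibd_probs(lambda_s: float):
    """Expected IBD sharing probabilities (z0, z1, z2) for affected sib pairs.

    Assumes a single additive locus typed at theta = 0 with fully
    informative markers.
    """
    if lambda_s < 1.0:
        raise ValueError("lambda_s must be >= 1")
    z0 = 0.25 / lambda_s
    z1 = 0.5
    z2 = 0.5 - z0
    return z0, z1, z2


def mean_test_power(lambda_s: float, n_pairs: int, alpha: float = 0.000022) -> float:
    """Approximate power of a one-sided mean IBD-sharing test.

    The statistic is the average proportion of alleles shared IBD per pair
    (0, 0.5 or 1), compared with its null mean of 0.5.
    """
    z0, z1, z2 = asp_ibd_probs(lambda_s)
    mu1 = 0.5 * z1 + z2                  # expected sharing under linkage
    var1 = 0.25 * z1 + z2 - mu1 ** 2     # per-pair variance under linkage
    mu0, var0 = 0.5, 0.125               # per-pair mean and variance under no linkage
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha)
    shift = math.sqrt(n_pairs) * (mu1 - mu0)
    return 1.0 - normal.cdf((z_crit * math.sqrt(var0) - shift) / math.sqrt(var1))


if __name__ == "__main__":
    for lam in (2, 3, 5, 10):
        power = mean_test_power(lam, n_pairs=100)
        print(f"lambda_s = {lam:>2}: approximate power with 100 pairs = {power:.2f}")
```

Because expected sharing rises from 0.5 only to 0.75 − 0.25/λs, a locus with a λs of 2 shifts mean sharing to just 0.625, and power falls quickly for small λs at a fixed sample size. The sketch ignores locus heterogeneity, which, as noted above, would reduce these figures further.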