For the investigation of many inherited Mendelian diseases, researchers have used linkage analysis in families with several affected members to identify putative disease genes. Linkage analysis attempts to identify a region (locus) of a chromosome, or several regions (loci) in the genome, associated with the disease or trait by identifying which alleles at those loci segregate with the disease in families. Geneticists use genetic markers that are evenly distributed throughout the genome to reduce the number of chromosomal regions to a handful that may harbor a disease gene. Simply put, this method exploits the biological reality that in meiosis I, genes located close to each other on the same chromosome are inherited together more often than expected by chance. Genes that are far apart are less likely to be inherited together because recombination breaks up segments of the chromosome. Thus, if a set of marker alleles is segregating with the disease, those markers are assumed to be located near the disease gene (7). Using linkage analysis, scientists determine the likelihood that the loci (genetic marker and disease gene) are linked by calculating the logarithm of the odds, or lod score, which is the base-10 logarithm of a ratio of 2 likelihoods: the likelihood of the data if the loci are linked and the likelihood if the loci are not linked, that is, independent. To take into account multiple testing and the likelihood of linkage prior to considering the genetic evidence, a lod score of 3 or more is used as an indication of statistically significant linkage with a 5% chance of error, though more stringent criteria have been recommended for genome-wide scans (9). Two-point analysis (ratio of the likelihoods that 2 loci are linked) and multipoint analysis (ratio of likelihoods at each location across the genome) are the standard linkage analyses used in gene mapping. Once a location or set of locations suggestive of linkage is identified, researchers turn to finer mapping methods, using a denser set of either additional microsatellites or SNPs in the smaller region underlying the high lod score.
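The lod score calculation described above can be sketched for the simplest textbook case. This is a minimal illustration, not the method of any particular linkage package: it assumes phase-known, fully informative meioses that can each be scored as recombinant or nonrecombinant, and the function names `lod_score` and `max_lod` are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score at recombination fraction theta.

    Simplifying assumption: every meiosis is phase-known and informative,
    so the likelihood under linkage is theta**r * (1 - theta)**(n - r)
    and the likelihood under no linkage (theta = 0.5) is 0.5**n.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    nonrecombinants = meioses - recombinants
    # Work with log10 likelihoods to avoid underflow for large families.
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in (0, 0.5]; returns (best_theta, best_lod)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


if __name__ == "__main__":
    theta, lod = max_lod(recombinants=1, meioses=10)
    print(f"theta = {theta:.3f}, lod = {lod:.2f}")
```

Under these assumptions, 1 recombinant in 10 meioses gives a maximum lod score of about 1.6 at a recombination fraction of 0.1, short of the threshold of 3, whereas 2 recombinants in 20 meioses gives about 3.2. Real pedigrees with unknown phase, missing genotypes, or incomplete penetrance require full likelihood calculations over the pedigree.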
While linkage analysis remains a mainstay of gene mapping, it does have shortcomings. Both genotyping and phenotyping errors have devastating effects on the validity of the lod score. Locus heterogeneity (more than one causal gene) and clinical heterogeneity (multiple forms of the same disease with different etiologies) can also pose serious problems. A pattern of inheritance, or model, must be assumed, and the researcher must estimate the frequency and penetrance of the disease gene; in this sense, the analysis is parametric. In late-onset diseases, additional complications can arise when individuals with the putative variant allele develop the disease later in life or in a much milder form (incomplete penetrance). Therefore, linkage analysis is best suited for Mendelian disorders, not common complex genetic disorders, unless the correlation between the genotype and the phenotype is known to be very robust (10). Occasionally, a rare, high-risk allele is found in patients with a rare, familial form of a common disorder, such as Alzheimer disease. Though such findings often have implications for disease pathogenesis, the rare mutation plays only a limited role in the common sporadic forms of disease in the general population. In this series, an example is described by Bertram and Tanzi in their discussion of Alzheimer disease, in which mutations in the amyloid precursor protein and presenilin I and II lead to an overproduction of amyloid β protein, which is deposited in the brains of all patients with Alzheimer disease, regardless of the etiology (3).
While linkage analysis is arguably the most powerful method for identifying rare, high-risk alleles in Mendelian disease, many consider genetic association analysis to be the best method for identifying genetic variants related to common complex diseases (11). In contrast to linkage analysis, which involves scanning the entire genome or a very large segment, association analyses are best suited to interrogating smaller regions or segments of the genome. Association analyses are generally model free, or nonparametric, so the researcher does not have to assume a mode of inheritance. Unlike linkage analysis, which identifies marker loci, association studies determine whether a specific allele at a marker is associated with disease. Association studies can be conducted in a group of randomly selected patients and controls as well as in small families or affected sibling pairs. Thus, this approach is sometimes added to ongoing epidemiological studies or clinical trials and can be adapted for use with relatively small families. Association analyses of candidate genes underlying quantitative traits, such as body mass index in relation to obesity or blood pressure in relation to hypertension, are also feasible, as will be clear from the discussion by Majumder and Ghosh in this series (see pages 1419–1424; ref. 13).
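To make the idea of testing a specific allele in patients and controls concrete, the following is a minimal sketch of an allelic case-control test. It assumes that allele counts (two per person) can be treated as independent observations, uses a Pearson chi-square test with 1 degree of freedom and no continuity correction, and uses hypothetical names and counts throughout.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the candidate (risk) allele
    and all other alleles. Returns (chi_square, p_value, odds_ratio).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if min(margins) == 0:
        raise ValueError("every row and column must contain at least one allele")
    # Shortcut form of the Pearson statistic for a 2x2 table.
    chi_square = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi_square, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts: 100 case alleles and 100 control alleles.
    chi2, p, odds = allelic_association(60, 40, 40, 60)
    print(f"chi-square = {chi2:.2f}, P = {p:.4f}, odds ratio = {odds:.2f}")
```

A single test like this says nothing about the mode of inheritance, which is the sense in which association analysis is model free. Family-based designs use different statistics that this sketch does not cover.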
There is at least one important similarity between linkage and association analyses. Linkage analysis involves association within families, while genetic association analysis examines whether affected individuals share a common allele more often than do controls. Patients who share the variant allele may also share a common ancestor from whom the allele originated. In practice, researchers often perform both linkage and association analyses. Linkage analysis is used for the genome-wide screen to identify candidate loci. The region is subsequently narrowed using linkage disequilibrium mapping, which is reviewed by Morton in this series (see pages 1425–1430; ref. 14). Genome-wide association studies are now feasible and can provide an additional means for identifying genes related to complex disorders. This approach combines the best features of linkage with the strength of association approaches (12). Figure 1 illustrates the progression from the study of a population to the identification of a variant allele and subsequent functional analysis. Genetic epidemiologists often go back to the population in order to determine the population attributable risk, which is defined as the proportion of disease in the population that can be ascribed to the variant allele or risk factor of concern. It is based on both the relative risk (see Gordon and Finch, pages 1408–1418; ref. 15) and the prevalence of the variant in the population.
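As a brief illustration of how relative risk and variant prevalence combine, the sketch below uses one commonly used form of the population attributable risk, Levin's formula, PAR = p(RR − 1) / [1 + p(RR − 1)]. The choice of this formula and the function name are assumptions made for illustration; the text above does not specify which estimator is intended.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula for the population attributable risk.

    prevalence    -- proportion of the population carrying the variant (0 to 1)
    relative_risk -- risk of disease in carriers divided by risk in noncarriers
    """
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)


if __name__ == "__main__":
    # Hypothetical inputs: a common low-risk variant and a rare high-risk one.
    print(population_attributable_risk(0.30, 1.5))
    print(population_attributable_risk(0.001, 20.0))
```

With these hypothetical inputs, the common variant with a modest relative risk accounts for a larger share of disease in the population than the rare variant with a high relative risk, which is why both quantities are needed.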
Figure 1. Progression of gene mapping in genetic epidemiological studies. (i) Population from which the complex genetic disorder arose. (ii) One of several families included in the genome-wide scan. However, more recently, genome-wide association studies of unrelated ...
Association studies also have limitations. Because linkage disequilibrium (the cosegregation of a series of genetic markers or alleles) is sustained over only a short chromosomal segment, a large number of loci need to be tested to cover a region (or the genome, if a genome-wide association study is conducted). This increases the possibility of false-positive findings. Therefore, one cannot rely on the conventional threshold P value of 0.05. With each test, the possibility of a false-positive result increases, requiring either replication in an independent study or computer simulation (11). For complex genetic disease studies, researchers can use computer simulation of 1,000 replicates of the family collection, based on observed allele frequencies and recombination fractions, to determine the threshold for statistical significance and so reduce the possibility of false-positive results. For case-control studies, patients with disease and the comparison group of controls can differ in genetic background, introducing variables unrelated to the disease and causing a type of spurious association, or confounding, termed population stratification. Finally, the number of subjects required for these studies can be large, particularly if the heritability or relative risk of the disorder or trait is low. In this series, Gordon and Finch review both the benefits and limitations of using association analysis in family-based and population-based studies to identify genes related to complex disorders (15).
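The multiple-testing problem described above can be made concrete with a short sketch. It shows how the chance of at least one false positive grows with the number of independent tests, the Bonferroni-adjusted per-test threshold, and how an empirical threshold might be read off a set of simulated null replicates. The independence assumption, the function names, and the idea of summarizing each replicate by its largest test statistic are illustrative assumptions, not a description of any specific study's procedure.

```python
def familywise_error_rate(alpha: float, tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """Per-test P value threshold that keeps the overall error near alpha."""
    return alpha / tests


def empirical_threshold(null_max_statistics, alpha: float = 0.05):
    """Smallest simulated value exceeded by at most alpha of the replicates.

    null_max_statistics holds one value per simulated replicate generated
    under no association or linkage (for example, the largest statistic
    observed in that replicate). Ties make the result slightly conservative.
    """
    ordered = sorted(null_max_statistics)
    if not ordered:
        raise ValueError("at least one replicate is required")
    n = len(ordered)
    for index, value in enumerate(ordered):
        exceeding = n - (index + 1)  # replicates ranked above this value
        if exceeding <= alpha * n:
            return value


if __name__ == "__main__":
    print(familywise_error_rate(0.05, 100))
    print(bonferroni_threshold(0.05, 100))
```

With 100 independent tests at P = 0.05, a false positive somewhere is nearly certain. Neighboring markers are usually correlated through linkage disequilibrium, so the Bonferroni threshold tends to be conservative, which is one reason simulation-based thresholds are attractive.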