The field of psychiatric genetics has used two different methods to attempt to identify individual risk genes: linkage and association. These are fundamentally different approaches with different study designs applied, until recently, to very different research questions. Understanding both is necessary to see why association approaches have become the norm in follow-up studies of linkage regions as well as the primary current approach in genome-wide studies.
Humans are ~99.9% identical at the nucleotide level on average. Molecular genetic studies depend critically on the remaining 0.1% (~3 million nucleotides) where variation occurs between individuals, collectively known as genetic polymorphisms or markers. Linkage studies generally use short tandem repeat polymorphisms (STRs). STR alleles are differing numbers of a repeating unit of nucleotides and have specific sequence lengths and molecular weights as a result, allowing them to be separated and identified. STRs are very common and tend to be extremely polymorphic (ie, to have many alleles - where an allele is one of the possible variants that exist in a population at a particular genetic locus) and therefore to have high heterozygosity (the proportion of individuals who have two different alleles at the marker locus). This high heterozygosity is important for linkage analyses, which require a unique allele at each position on each homologous chromosome to be informative.
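The contrast in heterozygosity between marker types can be made concrete with a minimal sketch. It assumes Hardy-Weinberg proportions, under which the expected heterozygosity is one minus the sum of squared allele frequencies; the function name and the allele frequencies are hypothetical and chosen only for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical markers (illustrative frequencies, not real data).
str_marker = [0.1] * 10   # STR with ten equally common alleles
snp_marker = [0.5, 0.5]   # SNP with two equally common alleles

print(round(expected_heterozygosity(str_marker), 3))
print(round(expected_heterozygosity(snp_marker), 3))
```

Even in the best case of two equally common alleles, a biallelic marker cannot exceed a heterozygosity of 0.5, whereas a marker with many alleles can approach 1.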
In contrast, single nucleotide polymorphisms (SNPs) are changes of a single base or insertion/deletion variation up to a few nucleotides in size. SNPs generally have only two alleles, and have lower heterozygosity and lower information content. Association studies tend to use SNPs as the marker of choice, because alleles of these markers evolve more slowly than those of STRs and preserve more of the evolutionary relationships on which genetic association is based. SNPs can also be used for linkage, but about ten times as many SNPs as STRs are required to capture the linkage information.
In marker genotype data from families, new combinations of alleles at a series of markers on individual chromosomes are observed in each generation. This recombination of alleles is observed because there is at least one physical exchange of material (or crossover) between each homologous chromosome pair in every meiosis (Figure 1). Recombination between loci on different chromosomes (because of independent assortment of homologous chromosome pairs) or far apart on the same chromosome (because of crossover at meiosis) is observed 50% of the time. Linkage is observed between loci in close proximity on a chromosome because their alleles are separated by crossover less than 50% of the time.
Mendelian diseases are caused by mutations in a single gene at a single chromosomal location, so disease phenotypes can be treated as marker alleles in linkage analysis. Because these illnesses are rare, for a dominant disorder, the rare risk allele must segregate from one parent (often affected or with family history) into affected offspring, or arise as an even rarer de novo mutation. By following the segregation of marker alleles from the affected lineage into offspring, linkage between markers and phenotypes can be observed when affected offspring inherit a particular set of marker alleles (and thus a specific parental chromosomal segment) compared with their unaffected relatives.
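Evidence for linkage of this kind is conventionally summarized as a LOD score, a statistic not named in the text above. The sketch below assumes the simplest possible setting: every meiosis is fully informative and phase-known, so each can be scored as recombinant or non-recombinant. The function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio of recombination fraction theta versus 0.5.

    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # A single observed recombinant rules out complete linkage.
        if recombinants > 0:
            return float("-inf")
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + non_recombinants * math.log10(1.0 - theta))
    log_null = meioses * math.log10(0.5)  # no linkage: theta = 0.5
    return log_likelihood - log_null


# Hypothetical family data: 10 informative meioses, 1 recombinant.
for theta in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

The score is zero at a recombination fraction of 0.5 and is largest near the observed proportion of recombinants, which is the sense in which marker and phenotype are "linked" when they are separated less than 50% of the time.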
While linkage occurs in families, association is a population-based phenomenon. Genetic association studies test whether specific alleles at variable sites are more common in individuals affected by a disease (cases) than in individuals not affected by the disease (controls). This association between allele and phenotype can occur for two reasons. Either the allele being studied directly influences risk for the disorder or, more commonly, the allele is in linkage disequilibrium (LD) with the disease-predisposing allele. Linkage disequilibrium means that specific alleles at two nearby loci tend to occur together across an entire population. Linkage (the cosegregation of a chromosome region and a disease observed in families) occurs at scales of tens of millions of base pairs because of the limited number of recombinations observed in each generation of a family. Association (and LD) is seen at scales of thousands to tens of thousands of base pairs, because the number of recombinations present in the evolutionary history of a population is large, meaning that the physical distances between loci in LD must be correspondingly small if recombination is to occur rarely (if ever) between them.
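The case-control comparison described above can be sketched as a chi-square test on a 2x2 table of allele counts. This is one common way to test allelic association, not necessarily the test used in any particular study; the function name and the counts are hypothetical.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on allele counts, with P value and odds ratio.

    case_a / ctrl_a: counts of the tested allele in cases / controls.
    case_b / ctrl_b: counts of the other allele in cases / controls.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    if denominator == 0 or case_b == 0 or ctrl_a == 0:
        raise ValueError("table has an empty margin or cell")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denominator
    # Upper tail of a chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 500 cases and 500 controls, two alleles per person.
chi2, p_value, odds_ratio = allelic_association(240, 760, 200, 800)
print(round(chi2, 2), round(p_value, 3), round(odds_ratio, 2))
```

A significant result from such a test does not distinguish a directly causal allele from one that is merely in LD with the causal allele.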
LD occurs because a new allele always arises on a specific background chromosome (and its existing haplotype of marker alleles), and will, until separated by recombination, only exist in conjunction with the other alleles present on that background. Over time, the original LD (and thus the genetic association) between more distant loci decays as a result of recombination events, while the rarity of recombination between nearby loci preserves the original LD and association. Association can also be detected spuriously, eg, if observed differences in allele frequency are due to population differences rather than to true association between marker and phenotype. Association approaches are also substantially reduced in power in the presence of allelic heterogeneity (the existence of more than one risk allele at a locus), while this phenomenon has no effect on the detection of linkage.
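The decay of LD with recombination can be illustrated with a short sketch. It uses the standard coefficient D (the haplotype frequency minus the product of the two allele frequencies), the derived measure r squared, and the textbook result that D shrinks by a factor of (1 - theta) per generation in a large, randomly mating population. Those assumptions, the function names, and all numbers are illustrative rather than taken from the text.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2) for alleles A and B at two loci.

    p_ab: frequency of the haplotype carrying both A and B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


def d_after(d0, theta, generations):
    """Expected D after some generations at recombination fraction theta."""
    return d0 * (1.0 - theta) ** generations


# Hypothetical new allele B (frequency 0.1) found only on chromosomes
# carrying marker allele A (frequency 0.3), so p_ab equals p_b.
d0, r2 = ld_stats(p_ab=0.1, p_a=0.3, p_b=0.1)
print(round(d0, 3), round(r2, 3))

# Fraction of the original D remaining after 1000 generations.
for theta in (0.0001, 0.001, 0.01, 0.1):
    print(theta, d_after(1.0, theta, 1000))
```

With tightly linked loci most of the original LD survives, while for loosely linked loci it is effectively erased, which is why association is informative only over short physical distances.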
Challenges associated with gene identification in psychiatric and substance-use disorders
A number of features of psychiatric and behavioral phenotypes contribute to an overall reduction in study power. Association is generally more powerful for detecting genes of small effect [39], but the specific features of psychiatric and behavioral phenotypes also reduce the power of association studies.
First, psychiatric phenotypes are almost certainly influenced by multiple common alleles of small effect in many genes. Both linkage and association study designs are more powerful for alleles of large effect size, and are much less powerful when examining highly polygenic phenotypes. Replication studies are hampered both by the need for sample sizes larger than the discovery sample (in order to maintain power) and by stochastic sampling variation, that is, the expected variation in the extent to which any specific risk factor is present (and association detectable) in any particular sample.
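The dependence of power on effect size can be illustrated with a rough sample-size sketch for an allelic case-control comparison. It assumes the textbook normal-approximation formula for comparing two proportions, equal numbers of cases and controls, and a multiplicative allelic odds ratio; the function names, default significance level, and example values are hypothetical and not drawn from any study.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def alleles_per_group(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of alleles needed in each group (two-sided test)."""
    if odds_ratio == 1.0:
        raise ValueError("no effect to detect when the odds ratio is 1")
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Hypothetical common allele (control frequency 0.3) at several effect sizes.
# Each person carries two alleles, so individuals per group is about half.
for odds_ratio in (2.0, 1.5, 1.2, 1.1):
    n_alleles = alleles_per_group(0.3, odds_ratio)
    print(odds_ratio, n_alleles, math.ceil(n_alleles / 2))
```

Under these assumptions the required sample grows steeply as the odds ratio approaches 1, and a more stringent significance level (a smaller alpha) increases it further.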
Second, interactions between genes (GxG) or between genes and environmental variables (GxE) seem necessary to account for observed risks, but we rely heavily on analytic approaches that assess single genes. In a few cases, genes with known molecular interactions with the candidates have also generated replicated association. Environmental risk factors remain largely unknown and are difficult or very expensive to test in many samples.
Third, these phenotypes are common, so the liability alleles seem likely to be common, although increased rates of rare deletions and duplications (structural or copy number variants) in cases have been observed multiple times and suggest that rare variation may also contribute to risk in a proportion of cases. The common risk variants are expected to occur with relatively high frequency in the general population, reducing contrast between affected and unaffected individuals and reducing power. The impact of individual rare structural variants in the subset of cases where they are observed is harder to assess currently, but the observation of an aggregate increase appears robust, further increasing the apparent etiological complexity.
Fourth, the expected frequency of risk alleles and the clinical variability in presentation, course, and outcome suggest that the etiology of individual cases may be heterogeneous, derived from different specific genes or alleles between individuals. Allelic heterogeneity substantially reduces the power of association designs.
Fifth, diagnostic boundaries are difficult to draw, and choosing the best phenotype to study is complex. This last point, and the question of which phenotypes yield the strongest evidence, must be considered in some detail.