Both genetic and nongenetic risk factors, as well as interactions and correlations between them, are thought to contribute to the etiology of psychiatric and behavioral phenotypes. Genetic epidemiology consistently supports the involvement of genes in liability. Molecular genetic studies have been less successful in identifying liability genes, but recent progress suggests that a number of specific genes contributing to risk have been identified. Collectively, the results are complex and inconsistent, with no single common DNA variant in any gene influencing risk across human populations. Few specific genetic variants influencing risk have been unambiguously identified. Contemporary approaches, however, hold great promise to further elucidate liability genes and variants, as well as their potential inter-relationships with each other and with the environment. We will review the fields of genetic epidemiology and molecular genetics, providing examples from the literature to illustrate the key concepts emerging from this work.
Our knowledge of psychiatric and substance-use genetics comes from two key fields of research, both dynamic areas in rapid change. First, genetic epidemiology asks whether there is risk in excess of the population baseline in the relatives of cases, and, if so, whether the excess risk is attributable to the genetic factors or the environments they share. Beyond simply estimating heritability, genetic epidemiology has evolved to address more sophisticated questions, such as whether liability genes have the same effects across the lifespan, how they may influence multiple disorders, and how they might interact with environmental risks.
Genetic epidemiology of psychiatric and behavioral phenotypes has consistently demonstrated that: i) genetic risk factors are, in aggregate, important etiological components; ii) they cannot completely account for observed risk, meaning these phenotypes are multifactorial traits, with important nongenetic (or environmental) contributing factors; and iii) the risk alleles appear to be of small effect size and to occur in a large number of genes. Psychiatric and behavioral phenotypes are influenced by a large number of risk factors that individually are within the range of normal human variation and produce modest individual increases in risk.
The initial goal of the second major research area, molecular genetics, is to identify genes which influence these phenotypes and to identify the specific risk variants within them. There are substantial differences in DNA sequences between individuals, and gene identification methods test whether specific alleles at these variable positions are more common in affected than in unaffected individuals, most commonly with linkage studies (in families) and association studies (primarily in case/controls, but also in numerous other designs). We will discuss the underlying causes of these two genetic phenomena, the methods for detecting them, and the limitations of each.
The second goal of molecular genetics is to identify specific risk alleles and to use functional studies to elucidate how a gene functions normally, how the risk allele alters normal function, and how these alterations contribute to disease. The aim of this work is to explain the aggregate genetic risks observed through the effects of risk alleles on gene expression, protein structure and function, and/or biological processes. This area remains largely unsuccessful to date for complex traits generally.
In this review we focus on the basic methods of genetic epidemiology and molecular genetics, and provide examples, across a variety of psychiatric and substance use disorders, of questions currently being addressed. In contrast to this first section on genetic epidemiology, the sections on molecular genetics focus narrowly on schizophrenia, where there is a much longer history of molecular genetic studies, because we judged that emphasizing a single disorder would provide a more coherent example of ongoing research progress and challenges.
The most fundamental question addressed by psychiatric genetic epidemiology is whether a particular trait or disorder shows evidence for genetic influence. Both twin and adoption studies provide methods to address this question and tease apart the degree to which genetic and environmental influences are important on a given outcome. Twin studies accomplish this by comparisons of the similarity of monozygotic twins (MZs; who share 100% of their genetic variation), with dizygotic twins (DZs; who share on average just 50% of their genetic variation). Adoption studies compare similarity among adopted-apart biological relatives, who share genetic variation, but not their environments, and adoptive relatives, who share their environment, but not their genetic makeup. Through these comparisons, we can quantify the degree to which genetic influences contribute to individual differences in risk, a statistic commonly referred to as the heritability of the trait. These study designs have been applied to virtually all psychiatric disorders and to a number of related traits, yielding compelling evidence that genetic influences play a critical role in virtually all psychiatric outcomes. There is considerable variability in the magnitude of genetic influence across different disorders. On the high end are disorders such as schizophrenia, bipolar disorder, and autism, which yield heritability estimates of the order of 80% or higher. Alcohol and other drug dependence shows moderate heritability, in the range of 50% to 60%. On the lower end of the spectrum, though still showing significant evidence of genetic influence, are anxiety and depressive disorders, as well as eating disorders, which yield heritability estimates of ~30% to 40%. So, while there is variability in the magnitude of importance of genetic effects, it is widely accepted that a significant genetic component plays a role in virtually all psychiatric traits. 
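The twin comparison described above can be made concrete with the classical Falconer approximation. This review does not describe that formula, so it is used here only as an illustrative assumption: it treats twice the difference between the MZ and DZ twin correlations as a rough heritability estimate. The function name and the correlation values below are hypothetical, and published estimates typically come from structural equation models fitted to liability-scale correlations rather than from this shortcut.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Approximate variance shares from MZ and DZ twin correlations.

    Assumes additive genetic effects, equally similar shared environments
    for MZ and DZ pairs, and no assortative mating (simplifying assumptions).
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared (family) environment share
    e2 = 1.0 - r_mz            # unique environment plus measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, chosen only for illustration.
high = falconer_estimates(r_mz=0.80, r_dz=0.40)
moderate = falconer_estimates(r_mz=0.50, r_dz=0.30)

for label, est in (("high", high), ("moderate", moderate)):
    print(label, {name: round(value, 2) for name, value in est.items()})
```

The three shares sum to one by construction, which is why a larger gap between MZ and DZ similarity translates directly into a larger heritability estimate.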
It is a sign of the paradigm shift that has taken place in psychiatry that heritability estimates are no longer considered controversial, since the original studies finding evidence for genetic effects represented strong challenges to predominant views favoring environmental theories on the causation of most psychiatric conditions, ranging from schizophrenia to autism to alcohol dependence - disorders that are all now widely recognized as having genetic components.
While demonstration of heritability played an important role in altering fundamental assumptions about the etiology of psychiatric disorders, if not understood in their proper context, heritability estimates can also have a number of unfortunate side effects. Firstly, the heritability statistic created a dichotomy of genetic versus environmental influence - nature versus nurture. How much is genetic? How much is environmental? This is, as we hope to show, a somewhat arbitrary distinction. Genetic predispositions by necessity are expressed in the context of the organism's environment, and the environment can differentially affect individuals based on their unique genetic makeup. Further, many environments are not simply “imposed” on an individual; rather, individuals play an active role in selecting and shaping their environments. Accordingly, it is generally more informative to elucidate pathways of risk and show how genetic and environmental influences come together in this process, rather than trying to divide influence into that which is genetic and that which is environmental. Secondly, demonstration of heritability led to the idea that there were genes “for” a given disorder. More complex models that have examined genetic influences across multiple different conditions suggest that the Diagnostic and Statistical Manual of Mental Disorders (DSM) structure of psychiatric diagnoses often does not map onto the underlying genetic architecture of psychiatric traits. Genetic influences appear to be shared across many psychiatric conditions, and likely operate through mediating characteristics that alter risk for a number of different outcomes. Finally, static heritability estimates fail to capture the dynamic nature of genetic and environmental influences on psychiatric outcome. Heritability estimates are specific to the population under study. 
Lost in heritability estimates are potential differences across environmental conditions, across populations or gender, and across ages. Accordingly, genetic epidemiology has undergone an evolution in the kinds of questions being addressed. No longer is the question simply “Are genetic influences important on Trait X?” or even “How important are genetic influences on Trait X?”. Rather, the focus has shifted to addressing the complexities raised here, using the paradigm we have called advanced genetic epidemiology.
Parsing genetic and environmental influences into separate sources represents a necessary oversimplification, as for most traits we know about, genetic and environmental influences are inextricably intertwined. Most measures of the environment show some degree of genetic influence, illustrating the active role that individuals play in selecting and creating their social worlds.1 To the extent that these choices are shaped by an individual's genetically influenced temperaments and behavioral characteristics, an individual's environment is not purely exogenous, but rather, in some sense, is in part an extension and reflection of the individual's genotype. This concept is called gene-environment correlation or, perhaps more descriptively, genetic control of exposure to the environment. It is likely an important process in the risk associated with several psychiatric outcomes. For example, there is considerable evidence for peer deviance being associated with adolescent substance use.
However, individuals play an active role in selecting their friends, and multiple genetically informative samples have now demonstrated that a genetic predisposition toward substance use is associated with the selection of other friends who use substances.2-4 Interestingly, there is evidence that genetic effects on peer-group deviance show a strong and steady increase across development,5 suggesting that as individuals get older and have increasing opportunities to select and create their own social environment, genetic factors assume increasing importance. Another area where gene-environment correlation is known to play a significant role is in the risk pathways associated with depression. Stressful life events have been consistently associated with the manifestation of depression. However, there is evidence for genetic influence on the occurrence of stressful life events,6,7 indicating that an individual's predisposition plays a role in the likelihood that they will experience difficulties that are then associated with risk for depressive episodes. For example, research has shown that a genetic liability to major depression increases the risk for a range of stressful life events, particularly those reflecting interpersonal and romantic difficulties.8 These represent only a couple of areas where individuals are known to play an active role in shaping environmental factors that are associated with subsequent risk for psychiatric problems.
Another way that genetic and environmental influences are linked is via gene-environment interaction or, as we might prefer, genetic control of sensitivity to the environment. In these situations, genetic influences may vary in importance as a function of environmental conditions, and/or the environment may differ in importance as a function of an individual's genetic predisposition (these two conceptualizations of gene-environment interaction are indistinguishable statistically). Heritability estimates essentially average across environments; accordingly, if there is reason to believe that the importance of genetic effects might vary as a function of the environment, this information can be incorporated into the twin model to test for significant differences in heritability as a function of the environment. Substance use provides one area where gene-environment interaction effects have been found to be particularly important. Environments that exert more social control and present less opportunity to engage in substance use consistently show reduced evidence for the importance of genetic effects. In this sense, the environment is essentially constraining the expression of a predisposition toward substance use/problems. This has been demonstrated with respect to enhanced parental monitoring in adolescents,9 a more religious upbringing,10 and enhanced community stability,11 among other factors. One nice example of this can be found in an analysis of the heritability of adolescent smoking across the United States using data from the National Longitudinal Study of Adolescent Health. Genetic influences on daily smoking were lower in states with relatively high taxes on cigarettes and in those with greater controls on vending machines and cigarette advertising, again suggesting the importance of social control mechanisms in moderating the importance of genetic influences on substance use.12
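One way to picture heritability that varies with the environment is a moderation sketch in which each variance component depends linearly on a measured moderator. This is an assumption made for illustration, not necessarily the model used in the studies cited above. The function name, the path coefficients, and the moderator scale (0 = low social control, 1 = high social control) are all hypothetical.

```python
def heritability_at(m, a=(0.8, -0.3), c=(0.4, 0.2), e=(0.5, 0.0)):
    """Heritability at moderator level m under a linear moderation sketch.

    Each argument is a (baseline, slope) pair for one path; the variance
    component is the squared path, e.g. genetic variance = (a0 + a1*m)**2.
    All coefficient values are hypothetical.
    """
    va = (a[0] + a[1] * m) ** 2   # additive genetic variance
    vc = (c[0] + c[1] * m) ** 2   # shared environmental variance
    ve = (e[0] + e[1] * m) ** 2   # unique environmental variance
    return va / (va + vc + ve)


# Heritability falls as the hypothetical social-control moderator rises.
for m in (0.0, 0.5, 1.0):
    print(f"moderator={m:.1f}  heritability={heritability_at(m):.2f}")
```

With these made-up coefficients the genetic share shrinks as the moderator increases, mirroring the pattern described for more controlled environments; a single heritability estimate would average over that gradient.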
The rationale of the basic twin design can be expanded to examine the extent to which genetic and environmental factors contribute to the co-occurrence of psychiatric conditions. Comorbidity among psychiatric disorders is common, and multivariate twin studies have helped address the etiological mechanisms that contribute to these observed epidemiological patterns. A fascinating result to emerge from these studies is that psychiatric conditions with distinct clinical presentations (eg, major depression and anxiety) are not necessarily distinct genetically. For example, a study of major depression and generalized anxiety disorder found a genetic correlation of 1.0, suggesting that the same genetic influences impact depression and anxiety, but differences in environmental experiences contribute to the manifestation of different outcomes.13 An expanded study that examined the genetic and environmental architecture across seven common psychiatric and substance-use disorders found that genetic influences load broadly onto two factors that map onto internalizing disorders (depression, anxiety disorders), and externalizing disorders (alcohol and other drug dependence, childhood conduct problems, and adult antisocial behavior).14 These findings indicate that while distinguishing these disorders as “separate conditions” in the DSM may be useful for clinical purposes, these categories do not necessarily reflect differences in biological etiology. These findings, along with similar results from phenotypic analyses (eg, refs 15,16) have led some to suggest a reorganization of the “metastructure” of psychiatric disorders in DSM-V.
Another area of investigation examines whether there are differences in the importance of genetic and environmental factors at different stages of the disorder. For example, the development of substance dependence is necessarily preceded by several stages, including the initiation of the substance, the progression to regular use, and the subsequent development of problems, whether they be psychological, social, and/or physiological. Twin studies can investigate the degree to which each of these steps in the pathway of risk is influenced by genetic and/or environmental factors, and the extent to which the same or different genetic/environmental factors impact different stages. For example, data from two population-based, longitudinal Finnish twin studies found that shared environmental factors played a large role in initiation of alcohol use, and a more moderate role on frequency of use, and it was largely the same influences acting across these stages of use. However, there was no significant evidence of shared environmental influences on alcohol problems in early adulthood. Problems were largely influenced by genetic factors that overlapped with genetic influences on frequency of use.17 In a study from Virginia in male twins, similar results were found for alcohol, cannabis, and nicotine.18 In the early years of adolescence, shared environmental influences were responsible for nearly all twin resemblance for levels of intake of these psychoactive substances. However, as individuals aged, the impact of shared environment decreased and that of genetic factors increased.
Finally, there is known to be tremendous heterogeneity among individuals with psychiatric conditions. Twin studies can provide insight into whether clinical heterogeneity may reflect differences in etiological risk factors. For example, alcohol dependence with comorbid drug dependence has been found to be a particularly heritable form of the disorder,19,20 and twin studies have suggested a genetic influence on typical versus atypical forms of major depression.21
Another active area of research is the clarification of how genetic and environmental influences may change across development. A recent meta-analysis examined published studies with at least two heritability time points across adolescence and young adulthood for eight different behavioral domains. These analyses revealed significant cross-time heritability increases for externalizing behaviors, anxiety symptoms, depressive symptoms, IQ, and social attitudes, and nonsignificant increases for alcohol consumption and nicotine initiation. The only domain that showed no evidence of heritability changes across time was attention-deficit/hyperactivity disorder.22 Similarly, in a large study of >11 000 pairs of twins from four countries, the heritability of general cognitive ability was found to increase significantly and linearly from 41% in childhood (9 years) to 55% in adolescence (12 years) and to 66% in young adulthood (17 years).23 The robust finding of increases in the importance of genetic influences across development likely reflects, in part, active gene-environment correlation, as individuals increasingly select and create their own experiences based on their genetic propensities.
In addition to changes in the relative magnitude of importance of genetic and environmental influences, another dynamic change is that different genes may be acting at different time points. This is nicely illustrated in recent analyses of alcohol use problems, as assessed at five time points from ages 19 to 28 in the Dutch Twin Registry (Kendler et al, in preparation). Kendler and colleagues found strong innovation and attenuation of genetic factors across this age range - indicating that some genetic influences on alcohol problems that were evident at age 19 declined in importance across time, while new genetic influences became important starting at ages 21 and 23. Thus, although the overall heritability of alcohol problems remained fairly stable, it appeared that different genetic factors were important at different timepoints. In analyses in the TCHAD Swedish study which followed twins from ages 9 to 20 across four waves of assessment, large changes were seen in the genetic risk factors for fears and phobias24 and for symptoms of anxiety and depression,25 with particularly pronounced evidence for genetic innovation at puberty. These analyses suggest that genetic influences of many psychiatric and substance use disorders are likely to be developmentally dynamic.
Sex differences in the prevalence of psychiatric disorders, and in risk and protective factors associated with psychiatric outcomes, are widespread in epidemiology. Twin studies allow us to investigate the extent to which there are differences in the relative importance of genetic and environmental influences on outcome, and the extent to which different genes and/or environments may be important. Large-scale twin studies have suggested, for example, that the genetic risk factors for both depression26 and alcohol dependence,27 while correlated, are not entirely the same for males and females. Results from two large twin studies in the US and Sweden agree that the genetic influences of major depression are modestly stronger in women than in men.28,28
As advances in molecular genetics and statistical analysis have made it possible to conduct large-scale projects aimed at identifying the specific genes involved in susceptibility to psychiatric outcome (detailed in the next sections), some have raised questions about the continuing utility of genetic epidemiology. The argument is that heritability has now been established, which provides the foundation and justification for moving beyond twin studies, on to large-scale gene identification projects. However, as detailed in this paper, most twin studies are no longer conducted simply to test for the presence of genetic effects; rather, they focus on the more complex kinds of questions summarized above. These analyses are not only informative about the nature of etiological pathways of risk, but they can also be used to guide gene identification efforts and to further our understanding of the risk associated with specific genes as they are identified.
Currently, gene-finding efforts for psychiatric disorders (and other common, complex medical conditions) have met with limited success. Findings from genetic epidemiology can be used to inform the phenotypes used in gene-finding studies. For example, based on the twin literature (reviewed above) suggesting that much of the predisposition to alcohol dependence is via a broad externalizing factor, externalizing factor scores were created in the Collaborative Study on the Genetics of Alcoholism (COGA) sample, composed of symptoms of alcohol and other drug dependence, and childhood and adult antisocial behavior, as well as the personality traits of novelty-seeking and sensation-seeking, which also index general behavioral disinhibition. This latent externalizing factor score was then used in both linkage and association analyses, and the results were compared with those from separate analyses of the individual symptoms of each of the psychiatric disorders that went into the creation of the general externalizing score.29 The results demonstrated that this broader externalizing phenotype was useful in both linkage and association analyses, suggesting that creating phenotypes grounded in the twin literature can aid in identifying susceptibility genes. Twin data have also been used to aid in genetic association studies in the area of internalizing disorders. Using data from the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders, multivariate structural equation modeling was used to identify common genetic risk factors for major depression, generalized anxiety disorder, panic disorder, agoraphobia, social phobia, and neuroticism. Cases and controls were then identified for genetic association studies based on scoring at the extremes of the genetic factor extracted from the twin analysis, with the subsequent association analyses yielding evidence for association with the gene GAD1.30
Another area where genetic epidemiology intersects with gene identification efforts is in the characterization of risk associated with identified genes. Most major gene identification efforts for psychiatric disorders currently focus on adult psychiatric outcomes. As we identify genes that are reliably associated with these disorders, one of the next interesting research challenges will be to study how risk associated with these genes unfolds across development and in conjunction with the environment. Here, findings from genetic epidemiology can again be useful in developing hypotheses to test the risk associated with specific genes. For example, based on the twin literature suggesting that adult alcohol dependence and childhood externalizing symptoms overlap in large part due to a shared genetic predisposition,31 genes that were originally identified as associated with adult alcohol dependence (eg, GABRA2,32 CHRM2 33) have been tested for association with externalizing behavior in younger samples of children and adolescents. These studies suggest that children carrying the genetic variants associated with alcohol problems later in life display elevated rates of conduct problems earlier in development, before any association with alcohol dependence has manifested.34-36 Further, based on the twin literatures suggesting that genetic influences on externalizing behaviors are moderated by parental monitoring9 and peer deviance,37,38 further analyses demonstrated that the associations between these genes and externalizing behavior were stronger under conditions of lower parental monitoring and higher peer deviance. Characterizing the risk pathways associated with identified genes will be critical in eventually translating this information into improved prevention and intervention programs.
The field of psychiatric genetics has used two different methods to attempt to identify individual risk genes: linkage and association. These are fundamentally different approaches with different study designs applied, until recently, to very different research questions. It is important to understand both in order to understand why association approaches have become the norm in follow-up studies of linkage regions as well as the primary current approach in genome-wide studies.
Humans are ~99.9% identical at the nucleotide level on average. Molecular genetic studies depend critically on the remaining 0.1% (~3 million nucleotides) where variation occurs between individuals, collectively known as genetic polymorphisms or markers. Linkage studies generally use short tandem repeat polymorphisms (STRs). STR alleles are differing numbers of a repeating unit of nucleotides and have specific sequence lengths and molecular weights as a result, allowing them to be separated and identified. STRs are very common and tend to be extremely polymorphic (ie, to have many alleles - where an allele is one of the possible variants that exist in a population at a particular genetic locus) and therefore to have high heterozygosity (the proportion of individuals who have two different alleles at the marker locus). This high heterozygosity is important for linkage analyses, which require a unique allele at each position on each homologous chromosome to be informative.
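The link between the number of alleles at a marker and its heterozygosity can be illustrated numerically. The sketch below assumes random mating (Hardy-Weinberg proportions), under which expected heterozygosity is one minus the sum of squared allele frequencies. The function name and the allele frequencies are hypothetical and chosen only for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """Expected proportion of heterozygotes under random mating.

    Computed as 1 - sum(p_i ** 2) over the allele frequencies p_i,
    which must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical markers: a two-allele marker at its most informative
# (both alleles at 50%) versus a marker with ten equally common alleles.
two_allele = expected_heterozygosity([0.5, 0.5])
ten_allele = expected_heterozygosity([0.1] * 10)

print(f"two alleles: {two_allele:.2f}")
print(f"ten alleles: {ten_allele:.2f}")
```

A two-allele marker can never exceed 50% heterozygosity under these assumptions, whereas a highly polymorphic marker can approach 100%, which is why multi-allelic STRs are so informative for tracing chromosomes through families.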
In contrast, single nucleotide polymorphisms (SNPs) are changes of a single base or insertion/deletion variation up to a few nucleotides in size. SNPs generally have only two alleles, and have lower heterozygosity and lower information content. Association studies tend to use SNPs as the marker of choice, because alleles of these markers evolve more slowly than those of STRs and preserve more of the evolutionary relationships on which genetic association is based. SNPs can also be used for linkage, but about ten times as many SNPs as STRs are required to capture the linkage information.
In marker genotype data from families, new combinations of alleles at a series of markers on individual chromosomes are observed in each generation. This recombination of alleles is observed because there is at least one physical exchange of material (or crossover) between each homologous chromosome pair in every meiosis (Figure 1). Recombination between loci on different chromosomes (because of independent assortment of homologous chromosome pairs) or far apart on the same chromosome (because of crossover at meiosis) is observed 50% of the time. Linkage is observed between loci in close proximity on a chromosome because their alleles are separated by crossover less than 50% of the time.
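The relationship between distance along a chromosome and the chance of recombination can be sketched with Haldane's map function. This function is not part of the review and is used here as an assumption: it supposes that crossovers occur independently (no interference). The function name and the example distances are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Recombination fraction between two loci under Haldane's map function.

    Assumes crossovers occur independently along the chromosome.
    distance_morgans is the genetic map distance in Morgans (1 M = 100 cM).
    """
    if distance_morgans < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


# Nearby loci recombine rarely; distant loci approach the 50% ceiling
# expected for unlinked loci.
for d in (0.01, 0.10, 0.50, 2.00):
    theta = haldane_recombination_fraction(d)
    print(f"{d:4.2f} Morgans -> recombination fraction {theta:.3f}")
```

Under this sketch the recombination fraction rises almost one-for-one with distance for close loci and then levels off just below 50%, the value that also applies to loci on different chromosomes.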
Mendelian diseases are caused by mutations in a single gene at a single chromosomal location, so disease phenotypes can be treated as marker alleles in linkage analysis. Because these illnesses are rare, for a dominant disorder, the rare risk allele must segregate from one parent (often affected or with family history) into affected offspring, or arise as an even rarer de novo mutation. By following the segregation of marker alleles from the affected lineage into offspring, linkage between markers and phenotypes can be observed when affected offspring inherit a particular set of marker alleles (and thus a specific parental chromosomal segment) compared with their unaffected relatives.
While linkage occurs in families, association is a population-based phenomenon. Genetic association studies test whether specific alleles at variable sites are more common in individuals affected by a disease (cases) than individuals not affected by the disease (controls). This association between allele and phenotype can occur for two reasons. Either the allele being studied directly influences risk for the disorder or, more commonly, the allele is in linkage disequilibrium (LD) with the disease-predisposing allele. Linkage disequilibrium means that specific alleles at two nearby loci tend to occur together in an entire population. Linkage (the cosegregation of a chromosome region and a disease observed in families) occurs at scales of tens of millions of base pairs because of the limited number of recombinations observed in each generation of a family. Association (and LD) are seen at scales of thousands to tens of thousands of base pairs, because the number of recombinations present in the evolutionary history of a population is large, meaning that the physical distances between loci in LD must be correspondingly small if recombination is to occur rarely (if ever) between them.
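The case/control comparison of allele frequencies can be sketched as a simple allelic chi-square test on a 2x2 table of allele counts. This is one basic form of association test, shown under the assumption of independent alleles and a well-matched control group; it is not presented as the method of any particular study. The function name and the counts are hypothetical.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Chi-square test (1 df) and odds ratio for a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical sample: 500 cases and 500 controls (1000 alleles each), with
# the tested allele at 24% frequency in cases and 20% in controls.
chi2, p_value, odds_ratio = allelic_association(240, 760, 200, 800)
print(f"chi2={chi2:.2f}  p={p_value:.3f}  odds ratio={odds_ratio:.2f}")
```

Even with 1000 individuals, a four-point difference in allele frequency yields only modest evidence in this made-up example, which gives a sense of why alleles of small effect require large samples.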
LD occurs because a new allele always arises on a specific background chromosome (and its existing haplotype of marker alleles), and will, until separated by recombination, only exist in conjunction with the other alleles present on that background. Over time, the original LD (and thus the genetic association) between more distant loci decays as a result of recombination events, while the rarity of recombination between nearby loci preserves the original LD and association. Association can also be detected spuriously, eg, if observed differences in allele frequency are due to population differences rather than to true association between marker and phenotype. Association approaches are also substantially reduced in power in the presence of allelic heterogeneity (the existence of more than one risk allele at a locus), while this phenomenon has no effect on the detection of linkage.
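The origin and decay of LD can be illustrated with the standard disequilibrium coefficient D, the difference between the observed haplotype frequency and the frequency expected if the two alleles were independent. The sketch assumes a large, randomly mating population with no selection or drift, in which D shrinks by a factor of one minus the recombination fraction each generation. Function names, frequencies, recombination fractions, and the number of generations are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return D and r^2 for alleles A and B given haplotype frequency p_ab."""
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


def ld_after(d0, theta, generations):
    """Expected D after some generations at recombination fraction theta."""
    return d0 * (1.0 - theta) ** generations


# A newer allele B (5%) found only on chromosomes carrying marker allele A (30%).
d0, r2 = ld_statistics(p_ab=0.05, p_a=0.30, p_b=0.05)
print(f"initial D={d0:.3f}  r^2={r2:.3f}")

# Compare a tightly linked pair of loci with a loosely linked pair.
for theta in (0.0001, 0.1):
    remaining = ld_after(d0, theta, generations=1000) / d0
    print(f"theta={theta}: fraction of D remaining after 1000 generations = {remaining:.3g}")
```

In this sketch the tightly linked pair keeps most of its original LD after 1000 generations, while the loosely linked pair loses essentially all of it, consistent with association being detectable only over short physical distances.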
A number of features of psychiatric and behavioral phenotypes contribute to an overall reduction in study power. Association is generally more powerful than linkage for detecting genes of small effect,39 but the specific features of psychiatric and behavioral phenotypes also reduce the power of association studies.
First, psychiatric phenotypes are almost certainly influenced by multiple common alleles of small effect in many genes. Both linkage and association study designs are more powerful for alleles of large effect size, and are much less powerful when examining highly polygenic phenotypes. Replication studies are hampered by the need for sample sizes larger than the discovery sample (in order to maintain power) and by stochastic sampling variation, the expected variation in the extent to which any specific risk factor is present (and association detectable) in any particular sample.
Second, interactions between genes (GxG) or between genes and environmental variables (GxE) seem necessary to account for observed risks, but we rely heavily on analytic approaches that assess single genes. In a few cases, genes with known molecular interactions with the candidates have also generated replicated association. Environmental risk factors remain largely unknown and are difficult or very expensive to test in many samples.
Third, these phenotypes are common, so the liability alleles seem likely to be common, although increased rates of rare deletions and duplications (structural or copy number variants) in cases have been observed multiple times and suggest that rare variation may also contribute to risk in a proportion of cases. The common risk variants are expected to occur with relatively high frequency in the general population, reducing contrast between affected and unaffected individuals and reducing power. The impact of individual rare structural variants in the subset of cases where they are observed is harder to assess currently, but the observation of an aggregate increase appears robust, further increasing the apparent etiological complexity.
Fourth, the expected frequency of risk alleles and the clinical variability in presentation, course, and outcome suggest that the etiology of individual cases may be heterogeneous, derived from different specific genes or alleles between individuals. Allelic heterogeneity substantially reduces the power of association designs.
Fifth, diagnostic boundaries are difficult to draw, and the best phenotype to study is a complex choice. It is critically important to consider this last point and the phenotypes that yield the strongest evidence in some detail.
Through 2004, 25 complete or nearly complete genome scans for schizophrenia (in which about 400 individual genetic markers are genotyped at regular intervals over the entire human genome) were published (for review see refs 40,41). None provided evidence for genes of major effect. Some linkage regions were replicated in these studies, and a number of promising genes emerged from sequential linkage and association studies and multiple replication reports. We focus here on those regions with the best replication record and with evidence emerging from other contemporary studies: 22q12-q13, 8p22-p21, 6p24-p22, and 1q32-q42. Two additional regions with little support in the primary literature, 2p11.1-q21.1 and 3p25.3-p22.1, were among the most significant in a meta-analysis of schizophrenia genome scans. A number of other regions (including 5q22-q31 and 15q13-q14) have less strong summary evidence but also overlap with evidence from more recent GWAS and structural variation studies.
Chromosome 22q has been widely studied using many different designs. Primary linkage signals were observed in a few samples but have generally not been widely replicated. However, the cosegregation of a known microdeletion in the region with a phenotype in which psychosis is a common feature added significantly to interest in this region. Velo-cardio-facial syndrome (VCFS) is caused by two overlapping, recurrent deletions at 22q11. Historically, about 10% of VCFS patients were thought to present with a psychotic phenotype, but more recent studies suggest much higher rates of 25% to 29%.42,43 Conversely, preliminary results suggest that about 2% of adult-onset and 6% of childhood-onset schizophrenic patients have microdeletions in this region, in excess of the estimated general population frequency of such deletions of 0.025%.44 Interest in this region has been further increased recently by studies assessing structural variation (see below). The gene for catechol-O-methyl transferase (COMT), involved in the degradation of catecholamines, maps to this region; the enzyme is functionally polymorphic with a variable amino acid, Val158Met, affecting activity. Although COMT has been widely studied, the results from genetic studies are inconclusive, as reviewed recently.45
Studies of pedigrees from numerous different ethnic backgrounds have detected linkage to schizophrenia on 8p, as did a statistically robust meta-analysis.46 Although numerous samples support a locus on 8p, comparison between individual studies is consistent with the presence of multiple susceptibility genes, a feature of a number of linkage regions. Almost certainly the most important result on 8p so far is the widely replicated association with the neuregulin 1 (NRG1) gene in families and case/controls from Iceland.47 NRG1 is a large gene with multiple transcripts yielding distinct protein molecules. It is expressed at central nervous system synapses and is involved in the expression and activation of neurotransmitter (including glutamate) receptors. Initial replication studies48,49 detected association on haplotypes identical or closely related to those identified in the Icelandic cases; 13 additional studies in multiple populations reported association with more variation in associated alleles or haplotypes,50-62 while nine studies did not.63-71 A meta-analysis of studies of NRG1 supported involvement of the gene in schizophrenia liability, but did not provide evidence supporting association of the most prominent marker in the original studies.72 In a pattern observed for a number of the best supported schizophrenia genes, several studies have also shown association between NRG1 and bipolar disorder.62,73,74
ErbB4, encoded by the ERBB4 gene, is a receptor for NRG1 and has important roles in neurodevelopment and the modulation of NMDA receptor functioning. Both activation of ErbB4 and suppression of NMDA receptor activation by NRG1 are increased in the prefrontal cortex in individuals with schizophrenia compared with controls.75 This functional relationship prompted genetic study of ERBB4, which demonstrated association in ERBB4 and evidence of interaction with NRG1.59,76-78 Associated alleles in ERBB4 alter splice-variant expression,79 and both NRG1 and ErbB4 protein are increased in the brain in schizophrenia. These results may be of particular importance as there is a biologically plausible mechanism for gene x gene interactions, and even if the interaction is not confirmed, both genes impact the glutamatergic system (supporting the widely held view that part of the complexity may be explained by effects at the level of the pathway or system). Important tests of both interaction and system effects unbiased by candidate selection will be undertaken in the current GWAS datasets.
Chromosome 6 has a long history in genetic studies of schizophrenia with major shifts in the apparent importance of particular results. Early linkage studies observed evidence of linkage in human leukocyte antigen (HLA) genes in the major histocompatibility complex (MHC) region on chromosome 6p21.3-p22.1, but the limited genome coverage (only ~6%) and lack of replication reduced the apparent importance of these findings. The first strong evidence for linkage of schizophrenia to the 6p region came from studies of Irish families with a high density of disease.80 This study was also important because it addressed the question of diagnostic boundaries in some detail. Evidence for linkage was modest under a narrow diagnostic model, increased substantially as the diagnostic definition broadened to include psychosis spectrum disorders, and fell when the definition was broadened further to include nonspectrum disorders, in keeping with observed risks in relatives for these traits. Multiple independent studies of this region of 6p observed evidence for linkage, as did a multicenter collaborative study81 and a robust meta-analysis.46
The dystrobrevin binding protein 1 or dysbindin (DTNBP1) gene was first reported to be associated in the same Irish families.82,83 Many studies support association in DTNBP1 in samples from diverse ethnic backgrounds although the markers, alleles and haplotypes associated vary significantly from study to study: 13 studies of 15 independent samples reported significant positive association with schizophrenia (most consistently with common alleles and the highest frequency common allele haplotype),70,82-93 while 14 studies of 18 independent samples did not.61,63,85,94-104 A further four studies have also provided positive evidence for association of DTNBP1 with bipolar disorder.105-108 Although the function of DTNBP1 in brain is unknown, both RNA109 and protein110 expression is reduced in cases.
Interest in chromosome 1 in schizophrenia began with reports of a balanced (1;11) translocation segregating with serious mental illness in a large pedigree from Scotland.111 The chromosome 1 breakpoint lies at 1q42.1, and the breakpoint directly disrupts a novel gene, Disrupted in Schizophrenia 1 (DISC1).112 There are now nine positive reports of association of DISC1 with schizophrenia74,113-120 and two of association with positive symptoms,121,122 suggesting that this gene influences schizophrenia liability in the general population, as well as in the family with the chromosomal anomaly. Other rare variants in this gene besides the breakpoint have also been reported to be associated with schizophrenia,123,124 and association has been reported for additional psychiatric diagnoses (reviewed in ref 125) and for bipolar disorder.126 A smaller number of negative reports have also been published.103,127-130
Two additional chromosome regions, 5q22-q31, where association was recently reported in the interleukin-3 (IL3) gene131 and 15q13-q14, where evidence for linkage of an evoked potential abnormality common in patients132 was supported by five additional studies reporting linkage of schizophrenia to the same narrow region,133-137 show some overlap with the results of current studies discussed below. Other high-profile candidate genes such as PRODH2 on 22q138 and PPP3CC on 8p139 have not replicated well. One exception is AKT1,140 which has similar numbers of positive141-145 and negative61,103,146-149 replications.
By assaying 500 000 to 1 000 000 DNA variants in a single experiment, GWAS provide unbiased genome-wide coverage, avoiding selection of candidate genes. They use an association framework for analysis, avoiding the weaknesses of linkage in complex traits. They impose stringent criteria due to the number of tests performed (typically around P < 5 × 10⁻⁸ for genome-wide significance). They hold enormous potential to move beyond the identification of single genes (which may show small effects and be difficult to detect individually) toward the simultaneous identification of multiple genes through their interactions or involvement in systems.
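The stringency of the genome-wide threshold can be illustrated with a Bonferroni correction. The text does not state how the threshold is derived; the sketch below assumes the common rationale of roughly one million independent tests at a family-wise error rate of 0.05, and the function names are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    error rate at alpha across n_tests independent tests."""
    return alpha / n_tests

def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """True if p_value clears the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

# Assumed: about one million independent tests, alpha = 0.05.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"threshold = {threshold:.1e}")
print(f"-log10(threshold) = {-math.log10(threshold):.2f}")
```

Under this assumption, a signal with a P value of 10⁻⁷ would not reach genome-wide significance, even though it would be striking in a single-gene candidate study.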
Seven GWAS of schizophrenia have been published to date, four of which were small and underpowered. The first (320 cases, 325 controls) was of limited density as it genotyped only 25 000 SNPs in 14 000 known genes, and did not detect any association that reached genome-wide significance150; nominal association was reported in the plexin A2 (PLXNA2) gene. Only one of four samples tested in three independent studies replicated the association.151-153 The second (extremely underpowered with 178 cases, 144 controls) identified one genome-wide significant association in the X/Y pseudoautosomal region (a homologous region of the sex chromosomes where recombination can occur), near the interleukin 3 receptor (IL3R) gene.154 Cytokines have been suggested as possible candidates previously and IL3 (in the 5q linkage region) was associated with schizophrenia in one study.131 One replication attempt supported association in IL3R.155 The third, using the CATIE156 sample (738 cases, 733 controls), did not detect any genome-wide significant results in its primary analysis.157 The fourth, using a multistage design of discovery (479 cases, 2937 controls) and targeted replication (6666 cases, 9897 controls) samples, identified one genome-wide significant SNP in the zinc-finger protein transcription factor ZNF804A gene,158 but only in the meta-analysis including the original sample. One independent replication attempt supported the association of ZNF804A, and showed that expression was increased from the associated haplotype.159
Three substantially larger GWAS of schizophrenia were published in 2009, in the SGENE+ sample160 (multiple European sites, 2663 cases/13498 controls), the International Schizophrenia Consortium (ISC) sample161 (multiple European sites, 3322 cases/3587 controls) and the Molecular Genetics of Schizophrenia (MGS) sample162 (multiple US sites, European ancestry: 2681 cases/2653 controls; African ancestry: 1286 cases/973 controls), analyzed both separately and together. The one region of the genome with significant overlap in signals from the 3 studies was the MHC region on chromosome 6p21.3-p22.1, site of some of the earliest genetic evidence in schizophrenia discussed above. The SGENE+ sample detected significant association with several markers spanning the MHC region, as well as signals upstream of the neurogranin (NRGN) gene on 11q24.2 and in intron four of the transcription factor 4 (TCF4) gene on 18q21.2. The ISC sample detected association in ~450 SNPs spanning the MHC region and the myosin XVIIIB (MYO18B) gene on 22q and supported ZNF804A. The MGS sample did not detect any individual genome-wide significant signals, but detected signals in the range of 10⁻⁵ to 10⁻⁷ in the CENTG2 gene (reported deleted in autism cases163) on chromosome 2q37.2 and JARID2 (the gene adjacent to DTNBP1) in European-ancestry subjects, and in ERBB4 and NRG1 in African-American subjects.
Meta-analysis of data from all European-ancestry MGS, ISC and SGENE samples detected genome-wide significant association signals for 7 SNPs spanning 209 Kb of the MHC region. LD is high between the 7 SNPs and extends over a region of 1.5 Mb on chromosome 6p22.1, making it difficult to determine if the signal is driven by one or many genes. The genic content of this region is not limited to histocompatibility loci, and also includes genes involved in transcriptional regulation, DNA repair, chromatin structure, G-protein-coupled-receptor signaling and the nuclear pore complex.
The strongest linkage meta-analysis approach ranks 30 cM bins of the genome from most positive to least positive for each study, and then sums the ranks for each bin. Significance levels are calculated by simulation, and this method can identify regions of the genome where modest positive results occur across many studies. Results of this approach supported linkage to chromosomes 6p and 8p among the previously identified regions discussed above.46 The strongest evidence for a potential locus was on chromosome 2p11.1-q21.1, a region suggested by only a few studies and not widely followed up, and on 3p, the site of an early linkage finding that could never be replicated.
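The rank-based procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: the function names and the per-bin scores are hypothetical, studies are unweighted, ties within a study are not handled, and the null distribution treats each study's rank for a given bin as uniform and independent.

```python
import random

def rank_bins(scores):
    """Rank bins within one study: the weakest evidence gets rank 1,
    the strongest gets rank len(scores). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, bin_index in enumerate(order, start=1):
        ranks[bin_index] = rank
    return ranks

def summed_ranks(studies):
    """Sum each bin's within-study rank across all studies."""
    per_study = [rank_bins(scores) for scores in studies]
    return [sum(column) for column in zip(*per_study)]

def empirical_p_values(studies, n_sim=2000, seed=0):
    """Empirical P value per bin: how often a summed rank at least as
    large as the observed one arises when ranks are assigned at random."""
    rng = random.Random(seed)
    n_bins = len(studies[0])
    observed = summed_ranks(studies)
    null_sums = [
        sum(rng.randint(1, n_bins) for _ in studies) for _ in range(n_sim)
    ]
    return [
        (sum(1 for s in null_sums if s >= obs) + 1) / (n_sim + 1)
        for obs in observed
    ]

# Hypothetical peak linkage scores for 5 bins in 4 studies.
studies = [
    [0.2, 1.1, 2.5, 0.4, 0.9],
    [0.5, 0.3, 1.9, 1.0, 0.1],
    [1.2, 0.1, 2.2, 0.6, 0.8],
    [0.3, 0.7, 1.6, 0.2, 1.4],
]
print(summed_ranks(studies))
print(empirical_p_values(studies))
```

In the invented data, no single study shows a striking score in the third bin, yet that bin ranks highest in every study, so its summed rank is extreme. This is the sense in which the method can identify regions where modest positive results recur across many studies.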
A recent effort has been made to systematize the collection and archiving of association data from studies of schizophrenia, and to provide a framework for continuous updating of both the data and the meta-analytic results164 in the SzGene database (http://www.szgene.org/). Meta-analyses of the data contained in this resource provided support of varying degrees for 24 SNPs in 16 previously reported genes, including older candidate genes (eg, the dopamine receptor 2 [DRD2] gene), those resulting from association-based follow-up of linkage data (eg, DTNBP1), and one suggested by one of the smaller GWAS (PLXNA2). Meta-analyses of schizophrenia GWAS data from at least 15 000 cases and 15 000 controls are scheduled for completion in 2010.
The epidemiological and genetic data above seem most consistent with the common disease/common variant hypothesis of the genetic risks for complex traits, and the results of GWAS in other complex traits like type 2 diabetes provided a major validation of this model.165-168 The alternative common disease/rare variant hypothesis of genetic risks for complex traits has been proposed in schizophrenia,169 largely based on the reduction in fertility observed in cases. A key focus of research in this area has been the deletions, duplications, and inversions of a few thousand (Kb) to a few million (Mb) base pairs collectively known as structural variants, an area of intense research interest generally since 2004,170-172 reviewed in ref 173. As a class, these genomic rearrangements are common: ~360 Mb or 12% of the genome is included in structural variation.174 A few such variants occur at high frequency due to apparent selection in certain contexts,175,176 but studies of large samples consistently show that the majority of structural variants are rare (~50% detected in only one individual).174
The aggregate rate of such rare structural variants is significantly increased in individuals with schizophrenia in all four studies that have examined this question.177-180 Critically, there is substantial overlap in the regions where excess structural variation is observed, most notably on chromosomes 22q11, 15q13.3, and 1q21.1, with some evidence that neurodevelopmental genes are overrepresented,181 and more recently on 16p11.2.182 However, even considered in aggregate, structural variants are observed in only 15% of schizophrenia cases, and so cannot account for a substantial fraction of the total population risk. Because they are rare, the true impact of individual structural variants on schizophrenia is difficult to validate and interpret, although the replication of excess structural variation in cases on chromosomes 22q11, 15q13.3, and 1q21.1 is extremely encouraging.
At both the technical/molecular and statistical/conceptual levels, the science of gene discovery in complex disease genetics is moving rapidly. By the time this paper is published, new developments are sure to have arisen. As is common in science in a state of rapid flux, the direction ahead is far from clear. How will the modest but hard-fought advances obtained in more traditional positional cloning and candidate gene work integrate with the new findings from GWAS? How will the common-variant SNP-based approach inter-relate with the emerging rare-variant copy number variant findings? Will advances in phenotypic assessment or endophenotypes provide critical new insights? How will the burgeoning fields of bioinformatics, expression arrays, and proteomics impact on our gene-finding efforts?
One emerging consensus is that the field needs to move from a “gene-centric” approach toward one that considers “gene networks.” For example, many of the candidate genes discussed above are involved in glutamatergic neurotransmission, which may be an important systemic element in the etiology of schizophrenia. Although a detailed discussion of this theory is outside the scope of this summary, recent reviews of the genetic183 and neuroscience184 data and evidence from other studies highlight the positions of the gene products of NRG1, COMT, and possibly DTNBP1 among others, in the biochemical and functional pathways influencing the glutamatergic system. Many other possible networks may be involved in the etiology of schizophrenia that, if properly articulated, could aid in our gene-discovery efforts.
We have attempted in this article to review the rapidly evolving field of psychiatric genetics. In the section on genetic epidemiology, we took a conceptual approach focusing on a range of the most interesting questions now being confronted by the field, with the goal of giving the reader a “feel” for the issues. While examining a wide range of disorders, we focused on substance use and externalizing disorders because they clearly illustrated the points we wanted to make. In the section on gene-finding, we decided it would be more useful to “drill down” and illustrate our important themes by focusing on one disorder - schizophrenia.
The major theme that cuts across these two sections is the complexity of the pathways from genetic variation to psychiatric and substance use disorders. Results of the last 20 years have shown that the early, simple hypothesis of large-effect genes that directly cause psychiatric illness was seriously misplaced. We now know that multiple gene variants (as well as - for at least some disorders - genomic rearrangements) are involved at the DNA level. These genetic risk factors then act and interact with each other and with the environment in a complex developmental “dance” to produce individuals at high versus low risk of illness. It is this kind of complexity that the field is now confronting directly.
As one might hope, progress is being made in multiple ways. The field is moving downward - in a reductionist sense - to more detailed biological mechanisms at the DNA, RNA, and protein levels. These efforts are being driven by rapid technological advances. However, we are straining to develop the conceptual and analytic tools to keep pace with the information generated by these new-generation technologies. At the same time, the field is moving out into the environment to clarify the often critical inter-relationship between these two broad classes of risk factors. Equally importantly, it is moving “forward” in emphasizing the importance of time and development.
This can all be confusing and sometimes a bit overwhelming. In a desire to simplify, some, in the “glow” of the new biological tools now available, have devalued the genetic epidemiologic approaches. These approaches, they suggest, focus on “statistics” but not “real genes.” However, knowledge gained from genetic epidemiology, in addition to providing a guiding light for molecular approaches, also has its own inherent validity. Studying aggregate genetic risk factors allows us to build etiologic models that can inform prevention efforts, aid policy makers in planning for research programs, and provide critical input into revisions of psychiatric nosology.
We would like to close by emphasizing that knowledge about the role of genetic factors in the etiology of psychiatric illness can be profitably understood from several perspectives. The human mind/brain system - the organ that instantiates psychiatric illness - is surely influenced by processes occurring at the levels of basic molecular biology, neural systems and networks, and psychological, social, and cultural processes.185 A full understanding of the processes whereby genetic risks lead to the development of psychiatric disorders will surely require considering all these perspectives, each of which contributes a useful viewpoint with methodologies that have important (and different) strengths and limitations.
Danielle M. Dick, Virginia Institute of Psychiatric and Behavioral Genetics; Department of Psychiatry; Department of Human and Molecular Genetics, Virginia Commonwealth University School of Medicine, Richmond, VA, USA. Department of Psychology, Virginia Commonwealth University, Richmond, Virginia, USA