Epistasis, or interactions between genes, has long been recognized to be fundamentally important to understanding both the structure and function of genetic pathways and the evolutionary dynamics of complex genetic systems. With the advent of high throughput functional genomics and the emergence of systems approaches to biology, as well as a newfound ability to pursue the genetic basis of evolution down to specific molecular changes, there is a renewed appreciation both for the importance of studying gene interactions and for addressing these questions in a unified, quantitative manner.
To what extent can we understand the function and evolution of genetic systems by examining one gene at a time and to what extent do we need to worry about the potentially daunting number of possible interactions among the thousands or tens of thousands of genes operating within most organisms? Individual components or entire systems? Although this is an old topic of debate within genetics, the recent rush towards combining comprehensive functional genomics and systems biology with high resolution genetic mapping is now providing the necessary empirical muscle to address these issues much more thoroughly than possible in the past. At the same time, a deeper understanding of the functional basis of gene interactions is generating an exciting intersection among a wide set of genetic disciplines, ranging from protein biochemistry to evolutionary genetics. We have never been in a better position to assess the role that gene interactions play within biological systems.
It has been roughly 100 years since William Bateson invented the term “epistasis” to describe the discrepancy between the prediction of segregation ratios based on the action of individual genes and the actual outcome of a dihybrid cross1. The usage of epistasis has since expanded to describe nearly any set of complex interactions among genetic loci (Box 1). Over the years geneticists have used epistasis to describe three distinct things: the functional relationship between genes, genetic ordering of regulatory pathways, and quantitative differences of allele-specific effects (Fig. 1). Using the same word to describe subtly different phenomena has generated surprisingly little confusion in the literature, mostly because of a tendency for different areas of genetics to effectively ignore one another. This is no longer possible. Molecular geneticists are now studying how specific allelic effects traverse complex regulatory networks, while evolutionary geneticists are moving from statistical descriptions of genetic variation to identifying the specific nucleotide changes responsible for adaptive evolution. What has become clear in the century since the concept of epistasis was introduced, however, is that most of the systems that underlie cellular, developmental, and physiological function are composed of many elements that interact with one another in frequently complex ways. The challenges generated by the presence of epistasis provide a point for unification of traditionally disparate areas of research and show that this fundamental genetic concept is more relevant now than ever.
There have been many different uses of the term “epistasis” over the last 100 years, which leads to the potential for some confusion now that more biologists from different areas of genetics are increasingly looking at gene interactions. The original definition comes from William Bateson94, who was specifically concerned with the observation that in some dihybrid crosses, not all possible phenotypic classes were observed and/or that some gene combinations resulted in novel phenotypes. Some of the mutations seemed to be “stopping” or “standing above” the effects of other mutations. Such mutations were said to be epistatic (the ones being blocked, hypostatic). It was clear from these circumstances that the mutations must be interacting with one another, at least in the loose sense that they exist within pathways that both influence the same phenotype. It was therefore perhaps natural that R.A. Fisher95 used a derivative of this term, “epistacy”, to mean any statistical deviation from the additive combination of two loci in their effects on a phenotype (Box 2). Unfortunately, population geneticists rapidly adopted the term “epistasis” to apply to this second, much more general class of phenomena1, and so we are left with a situation in which geneticists studying genetic segregation of (usually) discrete phenotypes mean one thing by epistasis, whereas population and quantitative geneticists mean something slightly different. It is especially troubling that finding epistasis in one context (say during segregation in a specific cross) does not necessarily mean that there will be epistasis in the other context (say in the statistical sense). Worst of all, the opposite will frequently be true: an absence of the detection of epistasis in the statistical sense does not mean that there are no interesting interactions between loci in the stricter genetic sense12,88.
There is perhaps a bit of irony in the fact that most scientists who work with epistasis rely on context to define the type of gene interaction that they are referring to.
There are a few other conceptual barriers to generating a more unified approach to gene interactions and epistasis. One issue arises from the way that one views how organisms are assembled. Are organisms constructed, with genes and their individual effects serving as the building blocks, or do organisms come to us as wholes, with each component only being understandable within the context of the system as a whole96? The former characterizes the approach to epistasis followed by most population geneticists, who tend to build up genotypes as though each allele has a specific predetermined effect that can be perturbed in certain circumstances by an interaction term describing epistasis. This viewpoint also fits in well with a traditional mutational approach to examine the function of a gene, since the mutant and wildtype functions of genes can be examined, manipulated and combined. The second approach has been used by quantitative geneticists and others studying natural variation or complex allelic series because in this context it becomes unclear what is part and what is whole: what is the reference standard against which each allele can be tested? Since there is no such thing as a naked gene without some broader genomic context, the building block model must ultimately be left to the theoreticians, but any real data will have to be examined using an effects model (Box 2).
Here, I first review various definitions of epistasis and show what they share in common and how they differ. I then turn to how the analysis of epistatic interactions between genes can be used to tease apart the global structure of these systems, and examine the impact that epistasis has on our ability to understand the genetic basis of natural variation, especially as it pertains to genetic variation associated with disease within human populations. I finally review various models of how evolution builds complex systems and explore how recent studies of molecular evolution can be used to determine the role that epistasis plays in directing the path of evolutionary change. A review as broad as this one naturally cannot hope to cover all of the relevant literature. Luckily, there have been a number of recent, more specialized reviews that cover topics such as the evolutionary impacts of epistasis2-5, the role of epistasis in complex traits6-8, the impact of epistasis on human disease9-11, statistical issues in detecting epistasis12-17, and the use of synthetic interactions to define complex interaction networks18,19.
Over the years the disparate needs of geneticists have led to a plethora of different nuanced meanings for the term “epistasis”, all of which involve gene interactions at various scales (Box 1). Although a few have suggested using the more generic term “gene interaction” to encompass the variety of phenomena labeled as epistasis so that epistasis can retain its original, more specialized meaning1, given the history of use, this seems untenable. As will be evident below, traditional uses of epistasis to order genes within pathways have become increasingly quantitative, further obscuring boundaries of usage, and suggesting that expansion, rather than contraction, of the usage of “epistasis” is the order of the day.
In principle, detecting epistasis using Bateson’s definition is relatively straightforward because the phenotypes are qualitative and few in number. Once epistasis is made more quantitative and is expanded to include nearly any kind of genetic interaction, then things get a bit more complex. First of all, epistasis means that something different happens when a particular set of alleles from different loci are in combination than when apart. But different from what? It must be different from what we would expect if the effects of the two loci were combined independently. Here, however, the scale of measurement becomes important. Fisher defined epistasis as a deviation from the additive expectation of allelic effects95. For a haploid model, this might look something like W_xy = α_x + α_y + ε, where α_x and α_y are the individual effects of the alleles at loci x and y and ε is the deviation due to epistasis. Relationships for diploids are more complex because of the possibility of interlocus interactions with the dominance state of the other locus14. Fisher presumably chose this definition because additive linear models are very tractable from a statistical point of view. Part of the reason that Fisher did not think that epistasis was that important is that he felt that there would usually be some scale to which the phenotypic values could be transformed such that the effects would be additive.
In the late 1960s, population geneticists started using deviation from a multiplicative model of gene action as the definition of epistasis instead of the additive model. This is because the evolutionary trajectories of loci with multiplicative fitness are independent of one another. In particular, if no linkage disequilibrium is present in the ancestral population, then none will develop if fitness effects are multiplicative86. A multiplicative haploid model would look something like W_xy = α_x α_y + ε. It is a little appreciated fact, however, that if linkage disequilibrium is already present in the base population, then it can still be maintained under a multiplicative model97, so there is in fact no perfect scale with which to measure epistasis. For some traits, such as fertility, an additive scale might be natural, whereas for other traits, such as mortality, the multiplicative approach is probably more appropriate. Perhaps not surprisingly, different measures can lead to different interpretations of epistasis98. Aylor and Zeng99 discuss possible extensions to common models of epistasis that attempt to span classical and statistical frameworks.
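The dependence on scale can be made concrete with a small sketch. It assumes, purely for illustration, that fitness is measured relative to a reference (wild-type) value of 1.0 and that the single-mutant and double-mutant fitnesses below are hypothetical numbers; the function names are not from the literature cited here.

```python
def additive_epsilon(w_x, w_y, w_xy, w_ref=1.0):
    """Epistasis as the deviation from the additive expectation.

    The expectation is the reference value plus the sum of the two
    single-mutant deviations from that reference (an assumed convention).
    """
    expected = w_ref + (w_x - w_ref) + (w_y - w_ref)
    return w_xy - expected


def multiplicative_epsilon(w_x, w_y, w_xy, w_ref=1.0):
    """Epistasis as the deviation from the multiplicative expectation."""
    expected = w_ref * (w_x / w_ref) * (w_y / w_ref)
    return w_xy - expected


# Hypothetical fitness values: two deleterious single mutants and their double mutant.
w_x, w_y, w_xy = 0.8, 0.5, 0.4

print(additive_epsilon(w_x, w_y, w_xy))        # about +0.1: epistasis on the additive scale
print(multiplicative_epsilon(w_x, w_y, w_xy))  # about 0.0: no epistasis on the multiplicative scale
```

The same three measurements therefore show an interaction on one scale and none on the other, which is the sense in which there is no perfect scale for measuring epistasis.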
The fact that linkage disequilibrium can be generated by epistasis is sometimes proposed as an indicator of gene interaction, although this would frequently be expected to be a weak effect relative to other factors such as admixture, and, as we see, this expectation depends on how one measures epistasis. The stability of linkage disequilibrium depends strongly on the recombination rate, especially when linkage is tight97. Now that we have a firm idea of the genomic structure of many organisms, we know that there can often be very many genes with recombination map distances less than 0.01 or 0.05, so this might not be a trivial effect for many genes.
Much like the definitions of epistasis itself, there is a virtual menagerie of terms associated with particular forms of epistatic effects. Examples include: synergistic, diminishing, antagonistic, aggravating, ameliorating, buffering, compensatory, and reinforcing. Most of these refer to similar phenomena, which makes it difficult to keep things straight. Take for example synergistic epistasis. This occurs when an individual with a particular two-locus combination of alleles displays a phenotype beyond that expected from the individual effects of the alleles. If these are, say, deleterious mutations, then that means that the phenotype is less than expected, but for positive mutations this means that the phenotype is greater than expected. So sometimes synergistic means “extra good” and sometimes it means “extra bad”. The field would be paid a great service if all of these redundant names, whose meaning depends strongly on context, were done away with in favor of two simple terms: positive and negative epistasis100. Positive epistasis means that the phenotype is higher than expected and negative epistasis means that the phenotype is lower than expected. These terms are preferable because (1) their meaning is immediately apparent with minimal thought and (2) it turns out that it is the sign of the epistasis that matters in most evolutionary processes (such as the generation of linkage disequilibrium), not the change in relative direction of the effects of the individual loci2. The latter relationship can be addressed using another simple term, sign epistasis5. This indicates that the directions of the epistatic and individual effects differ from one another, such that the direction of selection on the individual alleles actually changes depending on the genetic context.
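The link between epistasis and linkage disequilibrium can be illustrated with a minimal two-locus haploid simulation. This is a sketch under stated assumptions, not a model taken from the cited work: selection acts on haplotypes, recombination then reduces D by a fraction r each generation, and all fitness values, the recombination rate and the number of generations are hypothetical.

```python
def linkage_disequilibrium(freqs):
    """D = x_AB * x_ab - x_Ab * x_aB for haplotype frequencies."""
    return freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]


def next_generation(freqs, fitness, r):
    """Haploid selection on haplotypes, then recombination at rate r."""
    w_bar = sum(freqs[h] * fitness[h] for h in freqs)
    selected = {h: freqs[h] * fitness[h] / w_bar for h in freqs}
    d = linkage_disequilibrium(selected)
    # Recombination moves haplotype frequencies toward linkage equilibrium by r * D.
    return {
        "AB": selected["AB"] - r * d,
        "Ab": selected["Ab"] + r * d,
        "aB": selected["aB"] + r * d,
        "ab": selected["ab"] - r * d,
    }


def final_ld(fitness, r=0.1, generations=20):
    """Start at linkage equilibrium (all haplotypes at 0.25) and return D."""
    freqs = {h: 0.25 for h in ("AB", "Ab", "aB", "ab")}
    for _ in range(generations):
        freqs = next_generation(freqs, fitness, r)
    return linkage_disequilibrium(freqs)


w_a, w_b = 0.9, 0.8  # hypothetical single-mutant fitnesses
multiplicative = {"AB": 1.0, "Ab": w_b, "aB": w_a, "ab": w_a * w_b}
epistatic = {"AB": 1.0, "Ab": w_b, "aB": w_a, "ab": 0.9}  # double mutant fitter than w_a * w_b

print(final_ld(multiplicative))  # remains at zero up to rounding error
print(final_ld(epistatic))       # greater than zero: epistasis has generated D
```

Under the multiplicative fitness set, a population that starts at linkage equilibrium stays there, whereas the epistatic fitness set builds up disequilibrium even with recombination acting against it.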
For example, if a pair of mutations lower fitness when found individually, but increase fitness when found together, then this results in an adaptive valley, which has very different functional and evolutionary implications than if the sign of the individual effects and epistasis are in the same direction (Fig. 4).
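The proposed vocabulary (positive, negative and sign epistasis) can be expressed as a short classification routine. This is a minimal sketch that assumes a multiplicative expectation and a two-locus haploid genotype set; the function name, the tolerance and the fitness values are hypothetical.

```python
def classify_epistasis(w_ref, w_x, w_y, w_xy, tol=1e-9):
    """Return (sign_of_epistasis, has_sign_epistasis).

    w_ref is the reference genotype, w_x and w_y are the single mutants,
    and w_xy is the double mutant. The expectation is multiplicative,
    which is an assumption of this sketch.
    """
    expected = w_x * w_y / w_ref
    epsilon = w_xy - expected
    if epsilon > tol:
        label = "positive"
    elif epsilon < -tol:
        label = "negative"
    else:
        label = "none"

    # Sign epistasis: the effect of a mutation changes direction
    # depending on which allele is present at the other locus.
    x_flips = (w_x - w_ref) * (w_xy - w_y) < 0
    y_flips = (w_y - w_ref) * (w_xy - w_x) < 0
    return label, x_flips or y_flips


# Adaptive valley: each mutation is deleterious alone but beneficial together.
print(classify_epistasis(1.0, 0.9, 0.9, 1.1))  # ('positive', True)

# Two deleterious mutations that are worse together than expected.
print(classify_epistasis(1.0, 0.9, 0.9, 0.7))  # ('negative', False)
```

The first case corresponds to the adaptive valley described above; the second shows negative epistasis without any change in the direction of selection on either allele.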
The last two views of epistasis are actually complementary to one another. Compositional epistasis measures the effects of allele substitution against a particular fixed genetic background, while statistical epistasis measures the average effect of allele substitution against the population average genetic background. Measures of statistical epistasis are dependent on genotype frequencies, which would seem to make them a bit ephemeral. However, Fisher would undoubtedly point out that any measure of interactions is dependent on the specific genetic context in which it is measured, so simply choosing a fixed genetic background to test for allelic effects is equivalent to setting allele frequencies at other loci to 1.0. In this way, compositional epistasis is arbitrary in its own way. There is something of a tradeoff between precision and generality here. Statistical measures often average over variable epistatic effects at many different loci, which may tend to cancel out one another2,21.
The hierarchical structure of the relationship between compositional and statistical epistasis is very similar, say, to the way that a Punnett square can be seen to be a special case of the Hardy-Weinberg condition under specific mating conditions. Some authors have referred to substitution against a fixed background as “physiological”22 or “functional”20 epistasis, but I do not favor these terms in this context because they are still essentially statistical measures and do not really tell us anything about gene function in the way that, say, a molecular biologist would use the term.
In the end, different uses of epistasis can in fact be unified under a single perspective, using the view that epistasis measures bi-allelic substitution under different genetic backgrounds: fixed or average. It is as simple—and as complex—as that.
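The fixed-versus-average distinction can be sketched with a deliberately simplified three-locus haploid example. It assumes an additive scale, a single background locus z, and a hypothetical fitness table; a full treatment of statistical epistasis would also weight by genotype frequencies at the focal loci, which this sketch omits.

```python
def pairwise_epistasis(w, z):
    """Additive-scale interaction between loci x and y on a fixed background z."""
    return w[(1, 1, z)] - w[(1, 0, z)] - w[(0, 1, z)] + w[(0, 0, z)]


def averaged_epistasis(w, p_z):
    """Average the x-y interaction over backgrounds, weighting z=1 by its frequency p_z."""
    return (1 - p_z) * pairwise_epistasis(w, 0) + p_z * pairwise_epistasis(w, 1)


# Hypothetical fitness table keyed by (allele at x, allele at y, allele at z).
w = {
    (0, 0, 0): 1.0, (1, 0, 0): 1.1, (0, 1, 0): 1.1, (1, 1, 0): 1.4,
    (0, 0, 1): 1.0, (1, 0, 1): 1.1, (0, 1, 1): 1.1, (1, 1, 1): 1.0,
}

print(pairwise_epistasis(w, 0))    # about +0.2 on background z=0
print(pairwise_epistasis(w, 1))    # about -0.2 on background z=1
print(averaged_epistasis(w, 0.5))  # about 0: the two backgrounds cancel
print(averaged_epistasis(w, 1.0))  # same as the fixed z=1 background
```

Setting the background frequency to 1.0 recovers the fixed-background measure, while intermediate frequencies can average opposing interactions away.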
One of the first characters studied by Bateson and Punnett23 that revealed a pattern of epistasis was flower color in sweet peas. As illustrated in nearly every genetics textbook, it is possible to cross two colorless (white) flowers and recover purple flowers in the offspring. The non-Mendelian segregation ratios of the F2’s in this cross (9:7) suggest that two complementary genes are interacting with one another (Fig. 2). Our modern interpretation is that these genes produce enzymes that operate within the anthocyanin pathway such that a mutation in either gene can disrupt flower color. In general, the fact that the phenotype of an individual depends strongly on the specific combination of alleles at two or more loci suggests immediately that this dependency must be saying something about the nature of the functional interaction between these loci (Fig. 1). Although many anecdotal cases were collected towards the beginning of the twentieth century, this idea was exploited most fully by Beadle and Tatum and their students during the advent of biochemical genetics24. Separate knockouts that each disrupt a particular function can be crossed together in order to observe the nature of the interaction and thereby order the genetic pathway25. This can be used most effectively in organisms in which large numbers of mutants can be generated, crossed and phenotyped26. For instance, in C. elegans epistasis analysis has been used extensively to order dozens of genes into pathways affecting diverse traits such as sex determination27, the development of the vulva28, and entry into the dauer resting stage29. In each case, ordering of the regulatory pathways genetically preceded molecular characterization and provided a strong set of functional models or hypotheses that could then (1) be used to make predictions about the probable functions of the identified genes and (2) be tested by the molecular characterization of the gene products.
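The 9:7 ratio follows directly from enumerating the F2 of a dihybrid cross in which pigment requires at least one functional allele at each of two loci. The sketch below uses C/c and P/p as illustrative allele labels (an assumption, not taken from the text) and treats the uppercase allele at each locus as dominant and functional.

```python
from collections import Counter
from itertools import product


def f2_phenotype_counts():
    """Enumerate the 16 gamete combinations from a CcPp x CcPp cross."""
    gametes = ["".join(g) for g in product("Cc", "Pp")]  # CP, Cp, cP, cp
    counts = Counter()
    for egg, pollen in product(gametes, repeat=2):
        has_c = "C" in (egg[0], pollen[0])  # at least one functional allele at locus 1
        has_p = "P" in (egg[1], pollen[1])  # at least one functional allele at locus 2
        counts["purple" if has_c and has_p else "white"] += 1
    return counts


counts = f2_phenotype_counts()
print(counts["purple"], counts["white"])  # 9 7
```

Either homozygous recessive locus is enough to block pigment, so three of the four classes of the standard 9:3:3:1 ratio collapse into a single white class.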
The vast majority of studies that have used epistasis to analyze the structure of genetic pathways have used a small set of genes that had previously been identified to influence the trait of interest using single mutant analysis. However, the entire premise of epistasis is that genetic interactions can generate novel phenotypes when found in combination with one another. How better to discover such interactions than by looking for interactions between randomly selected genes? Even better, why not conduct a systematic study of the possible pair-wise interactions between all genes? The problem here, of course, is one of scale. The number of pair-wise interactions between genes grows at roughly the square of the number of genes [n(n-1) to be exact, where n is the number of loci; n(n+1) if the parental strains are included; half these numbers if reciprocal interactions do not need to be tested]. A daunting, but doable task for the 190 interactions resulting from twenty genes, but something nearing impossible for the over 18 million possible interactions for every gene in the yeast genome. Despite this challenge, such comprehensive approaches are beginning to be executed, spurred on by the availability of comprehensive deletion and/or knockdown libraries and high throughput maintenance and screening techniques.
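The scaling argument is easy to check numerically. In the sketch below the function name is hypothetical, and the figure of 6,000 yeast genes is an assumed round number rather than a value given in the text.

```python
def pairwise_interactions(n, reciprocal=False):
    """Number of two-gene combinations among n genes.

    With reciprocal=True, A-with-B and B-with-A are counted separately,
    giving n(n-1); otherwise each unordered pair is counted once.
    """
    ordered = n * (n - 1)
    return ordered if reciprocal else ordered // 2


print(pairwise_interactions(20))                    # 190
print(pairwise_interactions(26, reciprocal=True))   # 650
print(pairwise_interactions(6000))                  # 17997000, i.e. roughly 18 million
```

Doubling the number of genes roughly quadruples the number of pairs, which is why genome-wide screens depend on high throughput methods.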
Among the first studies of this type was that of Tong et al.30, who used a high throughput analysis approach (synthetic genetic array analysis) to examine the interactions of eight different deletion mutants against an array of ~4,700 other deletion backgrounds. They later expanded this query set to 132 different genes31. The interactions revealed here define a network of ~1000 genes involving ~4000 interactions, with most interactions tightly clumped within self-similar functional clusters. These results can go a long way towards helping us understand how complex gene networks can build robustness into cellular systems32, but because they are based on a growth/no growth criterion, they are largely qualitative33. A nice example of how classical compositional epistasis can be combined with a more quantitative approach is St. Onge et al.’s34 examination of the genetic interaction system influencing the resistance of yeast to the mutagen methyl methanesulfonate (MMS). Using 26 mutants known to influence MMS resistance, St. Onge et al. constructed all 650 possible double-deletion strains. Of these, 10 interactions generated synthetic lethality, while 67 were classified as “aggravating interactions” (negative epistasis; Box 2) and 45 were classified as “alleviating interactions” (positive epistasis) based on a multiplicative model. The interactions within this latter class, which provide greater than expected resistance to MMS, were used to generate a functional interaction map among the loci (Fig. 2). Most known genetic pathways were recovered, in addition to a few novel connections. Interestingly, nine of the ten deletion combinations that yielded essentially the same growth characteristics both in the single and double mutants (a “coequal” relationship) involved direct interactions between protein subunits, suggesting that it might be possible to connect particular epistatic outcomes with specific forms of functional interaction.
There is no reason to expect all forms of epistasis to be revealed simply by the absence of a gene (certainly a sledgehammer approach for perturbing complex systems). For example, Kroll et al.35 devised a method for looking for interactions induced after systematically overexpressing genes. Using this approach, Sopko et al.36 found that about 15% of a set of 5280 yeast genes show a growth defect when overexpressed, with most of the overexpression effects not matching the phenotypes of their corresponding deletions. Testing a deletion of the cyclin-dependent kinase pho85 over the overexpression library revealed 65 synthetic interactions, most of which were previously unknown. There is of course an endless combination of knockouts, overexpression, natural, and induced alleles that can be used in combination to probe a given library or array. The fact that the specific interaction results obtained depend on the nature of the “probe” (deletion, overexpression, etc.) in the few studies that have been conducted thus far is perhaps not surprising, but does indicate that the overall structure of the interaction network is likely to be complex and allele-specific. There is also the possibility of something like a “network uncertainty principle”: perturbing one part of the network is likely to cause changes in the nature of the interactions between other elements of the network37.
The existence of comprehensive deletion libraries and high throughput screening methods has made yeast a particularly powerful system for systematically dissecting epistatic interaction networks (reviews in Refs 18,19). Such approaches are unfortunately unlikely to be widely applicable, so what about other organisms? Again, the problem here is one of scale. How do we generate and score so many possible combinations? Perhaps the cleverest approach for getting around this problem in multicellular organisms has been the use of RNAi libraries to knock down (rather than knock out) genes in a systematic fashion. Lehner et al.38 used 37 strains of C. elegans with mutations in cell signaling components to “query” an RNAi library of ~1750 genes involved in signal transduction, transcriptional regulation and chromatin remodeling. The query was executed by raising each strain on bacteria expressing double stranded RNA of the target gene of interest and observing how many of these ~65,000 combinations resulted in disrupted growth and/or reproduction. This yielded a genetic network of 349 interactions involving 162 genes. While the number of interactions may seem low versus the total number possible, it is on the same order as that observed in yeast, which displays around 0.6% of interactions for non-essential genes31, with the frequency of interactions for essential genes being much higher at 3.3%39. Despite these gross similarities, it appears that the structure of the interaction networks between yeast and worms is quite divergent, with perhaps fewer than 5% of the interactions shared in common40.
The obvious next step for these analyses is to link the network structure revealed by epistasis analysis to information obtained from other methods, such as yeast two-hybrid, chromatin immunoprecipitation, and gene expression assays to build a comprehensive map of the full interactome41-44. Because of the scale of experiments required, a fruitful approach for now appears to be to concentrate on a large, but finite set of genes known to be involved in a well defined biological process. For example, Collins et al.45 used all pair-wise interactions of 743 genes known to influence chromosomal processes such as DNA repair, transcriptional regulation, and chromatid segregation in yeast. This allowed them to place their interaction results within the context of already well-explored systems, such as the biochemistry of the multiprotein Mediator transcriptional coactivation complex. In this case, epistatic interactions corresponded well to known physical interactions among proteins, yet allowed novel interactions to be detected above what would otherwise be a chaotic set of more than half a million potential interactions.
A continuing challenge for the future is figuring out ways of overcoming the inherent scaling problem of the combinatorial growth in the number of possible genetic interactions. Further, now that we are beginning to look at pair-wise interactions, how do we know that third or higher order interactions will not also be relevant20? Returning to the theme of this section, however, high throughput screens by themselves are not going to explain the basis for why the interaction exists in the first place. Interaction networks are simply hypotheses that need to be rigorously tested using other functional approaches. In this, epistasis analysis has proven to be a valuable tool whose use is sure to continue to grow.
The presence of epistasis can greatly obscure the mapping between genotype and phenotype. In contrast to mutation based studies, which start with a known genetic lesion and then ask how a specific locus interacts with other loci, the goal of complex trait analysis, or quantitative trait locus (QTL) mapping, is to start with a given set of phenotypes and to then identify the genes responsible for generating differences among individuals within a population. Some of the issues that arise in QTL mapping in the presence of gene interactions are nicely illustrated by a recent study of the genetic basis of the response to selection on body size in chickens by Carlborg et al.46 (Fig. 3). After roughly 40 generations of selection, males in the low weight line weighed six times less than males in the high weight line. There must surely be a strong genetic signal in such a difference. Yet, after examining many marker loci for their individual effects, only one QTL (prosaically named Growth9) appeared to have an effect, and the signal for that was itself weak. However, by looking for epistatic relationships among the markers, the authors were able to identify five additional genomic regions with significant effects on growth, each of which only showed its effect when on the high-growth Growth9 background (Fig. 3). Together, this loose network of epistatic genes accounted for 45% of the difference among the selected lines, an overall effect of 3.3 phenotypic standard deviations. The “individual” effect of Growth9 was in fact completely accounted for via its epistatic interactions with the other QTL. A very similar pattern of modules of interacting QTL has been identified as influencing obesity in mice47. Although the genes underlying these QTL still need to be identified, it is clear that the vast majority of the genetic information in this system would have been missed without also looking for possible interactions.
Similar hidden effects are undoubtedly lurking within natural populations as well. For example, Ehrenreich et al.48 used association mapping at 36 candidate loci to investigate the genetic basis of natural variation in shoot branch architecture within populations of Arabidopsis thaliana. They were able to identify three loci with significant associations with morphology in the wild, but none of these loci were implicated in a standard QTL mapping experiment involving recombinant inbred lines. Interestingly, however, these loci did exhibit significant epistatic relationships among one another within these lines. The authors conclude that epistasis may be prevalent within these populations. If nothing else, this study illustrates that moving between association and QTL mapping studies can be complicated by genotype-specific patterns of epistasis. Association mapping in natural populations will be based on statistical epistasis whereas QTL mapping between two inbred lines draws closer to compositional epistasis49,50 (although the total number of segregating backgrounds is still huge). Similar interactions underlying complex traits have been found in odor sensing behavior in Drosophila51, growth and yield in tomatoes52, the Arabidopsis metabolome53, the skeletal architecture of mice54, and in large scale studies of yeast growth55, morphology56, and gene expression57,58.
From mutational studies, we know that epistasis in the classical sense is ubiquitous because genes interact in hierarchical systems to generate biological function. For quantitative genetics and the genetics of complex traits, however, it is not the total scaffold of biological function that matters, but the residual variation segregating within natural populations that determines differences among individuals. Traditionally, quantitative genetics has focused on aggregate measures, such as genetic variance and heritability, to estimate genetic effects. There seems to be little evidence that epistatic variance plays an important role in most populations59, although epistasis at individual loci can make significant contributions to additive variance60. Now that we are beginning to be able to dissect the specific genetic basis of complex traits, will epistasis have a larger role to play? The answer is that we still do not really know. This is partially because, even after several decades of work in this area, identifying the causal basis of individual variation in complex traits has remained fairly elusive and partially because the statistical issues involved in estimating large numbers of potential interaction effects have limited the power of most existing studies. Both of these barriers are beginning to fall.
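The comparison of interaction frequencies can be reproduced from the numbers quoted above. The helper below is a hypothetical convenience function; only the counts (37 query strains, ~1750 RNAi targets, 349 interactions) come from the text.

```python
def interaction_frequency(n_interactions, n_queries, n_targets):
    """Fraction of tested query-by-target combinations scored as interactions."""
    n_tested = n_queries * n_targets
    return n_tested, n_interactions / n_tested


n_tested, frequency = interaction_frequency(349, 37, 1750)
print(n_tested)                   # 64750 combinations, i.e. roughly 65,000
print(f"{frequency:.2%}")         # about half a percent
```

The resulting frequency of roughly 0.5% sits close to the 0.6% reported for non-essential yeast genes, which is the basis for describing the two as being on the same order.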
By its very nature, epistasis is a property of whole genotypes. Epistatic effects are therefore most clearly revealed in the eccentricities of particular individuals. It is perhaps therefore not surprising that some of the best examples of epistasis are emerging from an area in which the focus on the individual reigns above all else: human health. Here we have the most complex of complex traits. Part of the motivation for the recent enthusiasm for looking for genetic interactions underlying human disease is the sense that previous failures to identify, and especially to replicate, significant individual genetic effects might be driven by underlying complexity generated by epistasis6,11. Indeed, epistatic shielding of disease alleles is one possible explanation for their persistence within populations61. Given the rapid increase in the size and precision of human association studies, we are now entering an era in which we can rigorously address the hypothesis of whether previous problems are a function of limitations in the data or truly the result of genetic complexity62.
There are numerous cases of epistasis appearing as a statistical feature of association studies of human disease. A few recent examples include coronary artery disease63, diabetes64, bipolar affective disorder65, and autism66. Unfortunately, in only a few cases has the functional basis of these potential interactions been revealed. One interesting exception involves genetic interactions underlying the autoimmune disease multiple sclerosis. Here, Gregersen et al.67 found evidence that natural selection might be maintaining linkage disequilibrium between several different histocompatibility loci (DR2a and DR2b; Fig. 3) known to be associated with multiple sclerosis. Linkage disequilibrium can be generated by strong epistasis among adjacent loci (Box 2). To test this idea, Gregersen et al. genetically engineered mice to produce the appropriate human immune proteins and found that mice producing the DR2b protein were highly susceptible to disease, whereas those producing DR2a did not progress towards disease at all. Then, in the critical test, mice expressing both genes had an overall reduced susceptibility to disease, suggesting that DR2a modulates the impact of DR2b. One possible model for this interaction is that DR2b stimulates the production of T-cells sensitive to the antigen that ends up inducing MS, whereas DR2a suppresses or even leads to the death of these cells68 (Fig. 3). Such an interaction could help to explain why these negative effects could be segregating within human populations in the first place: under most conditions the influence of these two factors balances out, presumably to generate a heightened response to real pathogens. Multiple sclerosis is a complex disease with a fairly weak genetic signal69, and the epistatic effect of these two immune genes has yet to be tested in humans because natural recombinants between DR2a and DR2b have yet to be observed68.
These issues only serve to highlight how difficult it can be to identify the genes underlying complex diseases, let alone the extra complications that can arise when epistasis between two loci dramatically affects disease penetrance.
As the field moves toward whole genome association mapping, the problem of scale that pervades all interaction tests becomes intense. The total number of tests required suggests that very stringent significance thresholds will be needed to control against false positives, but this in turn means that the only epistatic effects detected will be very strong ones and/or that sample sizes will need to be very large70. Indeed, larger studies are more prone to detect epistasis than smaller ones6. Suggestions to limit testing to only those QTL with significant main effects71 are probably ill-advised because (1) epistatic interactions with the largest relative effect sizes will be those with small main effects and (2) we already know that epistasis is frequently detected in the absence of main effects50,72. Carlborg and Haley6 advocate a sequential approach where potentially interesting QTL are first identified using a high false discovery rate, and then used for subsequent tests for genetic interactions. Specialized breeding designs can be used to increase the resolution and ability to detect complex interactions13,73. As above, the most productive approaches are probably going to involve coupling mapping strategies with other functional assays in an effort to focus on the interactions that are most likely to matter in practice42,43. While there is certainly strong evidence that epistasis can be important in determining variation in natural and human populations, only further detailed studies will tell us whether this is a widespread or limited phenomenon.
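The problem of scale described above can be made concrete with a small calculation. The sketch below is illustrative only: it counts the pairwise interaction tests among a set of markers and applies a simple Bonferroni correction. The marker counts, the 0.05 family-wise error rate, and the function names are assumptions chosen for illustration, not values or methods taken from the studies cited here.

```python
from math import comb


def pairwise_tests(n_markers):
    """Number of distinct two-locus interaction tests among n_markers."""
    return comb(n_markers, 2)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction.

    alpha is an assumed family-wise error rate, not a value from the text.
    """
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical marker counts, from a small candidate panel to a
    # genome-wide scan.
    for n_markers in (100, 10_000, 500_000):
        n_tests = pairwise_tests(n_markers)
        threshold = bonferroni_threshold(n_tests)
        print(f"{n_markers:>7} markers: {n_tests:.3e} pairwise tests, "
              f"per-test threshold {threshold:.1e}")
```

Because the number of pairs grows roughly as the square of the number of markers, the per-test threshold shrinks correspondingly. Bonferroni is used here only because it is the simplest correction to state; less conservative procedures exist.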
Where does all of this epistasis come from in the first place? Is there something about the evolution of genetic systems that yields epistasis as a by-product? Because evolutionary change is predicated on the current state of a genetic system, functional epistasis is in fact an extremely likely outcome of the evolutionary process. Since future changes are built upon past changes, the “tinkering” nature of evolution74 has the potential to build somewhat baroque systems. As solutions to one functional problem become fixed within an evolutionary lineage, future functional changes will frequently be built by adding additional elements to these existing systems, as for example when new effector molecules attach themselves to the backbone of an existing signal transduction pathway. This will be true whether or not epistatic variation is present or important while selection is operating (Fig. 4).
Under this view, evolving genetic systems are something of a house of cards. Removing one central component can bring the whole house down (i.e., be epistatic to many other genes) more because of the overall structural dependence induced by historical contingency than because it is a result of some intricately pieced together machine75. Indeed, Crow76 has conjectured that alleles with more severe effects, such as knockouts, will be more likely to display epistasis than alleles with more subtle genetic effects because larger perturbations are more likely to disrupt the overall structure of the genetic system. Thus, the fact that perturbation approaches, as outlined above, commonly reveal epistasis does not necessarily mean that the alleles responsible for evolutionary change also tend to be epistatic. Each allelic difference, including those generated via induced mutations, needs to be evaluated in its own light. Although epistasis is usually portrayed as a property of a given locus, as we have seen, it is really a property of individual alleles at multiple loci. Unfortunately, allelic variation for epistatic effects has yet to be studied in a systematic fashion2.
It is important to remember that most models of the evolution of genetic systems (such as those depicted in Figure 4) represent very simple metaphors of complex genetic phenomena. One of the problems in this approach is representing complex, multidimensional processes as three-dimensional cartoons. It is clear that taking these kinds of cartoons too literally can lead to a limited view of possible evolutionary dynamics, such as neglecting the possibility of complex ridges connecting regions of high fitness. Fisher’s77 view was that the evolutionary process is so multidimensional that there will always be some axis along which selection can move a population, such that adaptive valleys, even if they exist, will be very localized in their impacts. Kauffman78 has emphasized the opposite, showing that the number of valleys can rapidly increase with increasing dimensions. To a large extent, this is an empirical question, albeit one that is extremely difficult to address adequately.
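The contrast between these two views can be explored with a toy model. The sketch below is not Kauffman's or Fisher's own formulation: it assumes biallelic loci, single-mutation neighbors, and two deliberately extreme landscapes (purely additive fitness, and fitness assigned at random to every genotype). It counts local fitness peaks, since more than one peak implies at least one intervening valley. All function names are hypothetical.

```python
import itertools
import random


def count_local_peaks(n_loci, fitness):
    """Count genotypes that are fitter than all of their one-mutation neighbors."""
    peaks = 0
    for genotype in itertools.product((0, 1), repeat=n_loci):
        w = fitness(genotype)
        neighbors = (
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(n_loci)
        )
        if all(fitness(nb) < w for nb in neighbors):
            peaks += 1
    return peaks


def additive_landscape(n_loci, rng):
    """No epistasis: each locus contributes a fixed positive effect."""
    effects = [rng.uniform(0.1, 1.0) for _ in range(n_loci)]
    return lambda g: sum(e * allele for e, allele in zip(effects, g))


def random_landscape(n_loci, rng):
    """Maximal epistasis (assumed): every genotype gets an independent fitness."""
    table = {g: rng.random() for g in itertools.product((0, 1), repeat=n_loci)}
    return table.__getitem__


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    for n_loci in (2, 4, 6, 8, 10):
        additive = count_local_peaks(n_loci, additive_landscape(n_loci, rng))
        rugged = count_local_peaks(n_loci, random_landscape(n_loci, rng))
        print(f"{n_loci:>2} loci: additive peaks = {additive}, random peaks = {rugged}")
```

Without epistasis there is always a single peak regardless of the number of loci, whereas on the fully random landscape the number of peaks tends to grow as loci are added. Real genetic systems presumably lie somewhere between these extremes, which is why the question remains empirical.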
The bottom line here is that epistasis and genetic interactions are an inevitable consequence of the evolutionary process, no matter how it is conceived. This means that functional biologists have to confront the reality of complex genetic systems no matter what their ultimate cause. This is the exquisite—and sometimes frustrating—result of 3.5 billion years of descent with modification.
Epistasis can have an important influence on a number of evolutionary phenomena, including the genetic divergence between species79, the evolution of sexual reproduction4, and the evolution of the structure of genetic systems80. One of the more interesting long term questions in evolutionary biology is whether or not epistasis determines the path of evolutionary change. Although the focus here has traditionally been on interactions between disparate loci, currently the best systems for investigating this question are derived from functional studies of interactions operating within individual proteins (Box 3). Thus far, these studies81-85 have shown that epistasis can play a strong role in limiting the possible paths that evolution can take, but not in limiting its eventual outcome. Of course this might be due in part to the fact that inaccessible evolutionary outcomes may never be observed, but this in itself is an important result. These studies have been especially valuable in helping to build a bridge between the functional analysis of epistasis that has characterized molecular genetics and the long term impact of epistasis on genetic change that has characterized much of the debate in evolutionary biology.
One of the best systems for rigorously testing the functional and evolutionary consequences of epistasis is in the within-locus interactions that characterize protein folding and activity. The nicest example of this thus far comes from Ortlund et al.’s82 investigation of the evolution of novel function in vertebrate steroid receptors. The first step here was to use phylogenetic methods to reconstruct the inferred ancestral protein sequence that predates the separate evolution of the mineralocorticoid and glucocorticoid steroid receptors and to test its function81. It turns out that the ancestral protein is fairly promiscuous and interacts with a variety of steroid ligands, even with ligands not present within the ancestral organism at that time. Specialization therefore occurred via the evolution of a glucocorticoid-specific receptor from a more general mineralocorticoid ancestor, which was achieved via changes at two interacting sites, S106P and L111Q. By itself, S106P essentially destroys receptor function, while L111Q by itself has little functional effect. Together, however, S106P changes the architecture of the protein in a way that allows L111Q to move down to form a novel hydrogen bond with cortisol, which is a clear case of functional epistasis (a). Ortlund et al. call these two changes “group X”. Three more amino acid changes (group Y) are needed to yield the final specificity to cortisol, but these substitutions destabilize the protein. They must therefore be preceded by two other amino acid changes (group Z) that stabilize the perturbation in protein structure induced by changes in the X and Y groups. Ortlund et al. call the Z group substitutions “permissive” mutations, because they appear to have little effect on receptor function, but are a critical first step for allowing the other functional changes to occur.
There is another permissive mutation, Y27R (b), which precedes all of these other changes and which generates a novel cation-π interaction (replacing a weaker hydrogen bond) that stabilizes portions of the structure that would have otherwise been destabilized by the subsequent changes.
Together, these structural interactions create a specific order in which the evolutionary substitutions must occur. There are a number of possible pathways for these changes (c), but only a few are functionally viable because the so-called “conformational epistasis” generated by structural failure of the protein limits the evolutionary options. Here the evolution is from a generalized response in the ancestor (AncGR1) to the hormones aldosterone (green), cortisol (purple) and DOC (orange) to specificity to cortisol alone (+XYZ). In this example we have a direct tie between specific amino acid changes, epistatic interactions generated by their influence on protein structure, and the impact that these interactions have on subsequent evolutionary change. Figure reprinted with permission from Ref 82.
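The way an ordering constraint prunes the set of evolutionary pathways can be sketched by simple enumeration. The example below treats the X, Y and Z groups as three single steps and encodes only the one constraint stated explicitly above, that the permissive Z changes must precede the destabilizing Y changes. This is a deliberate simplification and an assumption of this sketch: the structural analysis in Ref 82 considers individual substitutions and measured receptor function, and the names used here are hypothetical.

```python
from itertools import permutations

# Substitution groups, each treated as a single step (a simplification).
GROUPS = ("X", "Y", "Z")

# (earlier, later) pairs: the only constraint encoded is that the permissive
# Z changes must occur before the destabilizing Y changes.
MUST_PRECEDE = [("Z", "Y")]


def is_viable(order, constraints=MUST_PRECEDE):
    """True if every 'earlier' group appears before its 'later' group."""
    return all(order.index(a) < order.index(b) for a, b in constraints)


def viable_orders(groups=GROUPS, constraints=MUST_PRECEDE):
    """All orderings of the groups that satisfy every constraint."""
    return [o for o in permutations(groups) if is_viable(o, constraints)]


if __name__ == "__main__":
    all_orders = list(permutations(GROUPS))
    allowed = viable_orders()
    print(f"{len(allowed)} of {len(all_orders)} orderings are viable:")
    for order in allowed:
        print("  " + " -> ".join(order))
```

With this single constraint, three of the six possible orderings remain. Each additional constraint passed in through `constraints` removes further pathways, which is the sense in which epistasis limits the routes available without necessarily changing the endpoint.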
One consequence of a systematic search for gene interactions is that the consequences of linkage may tend to be overlooked. As seen in the case of the histocompatibility loci in multiple sclerosis, linkage can facilitate the maintenance of epistatic interactions (and vice versa)86 and could help explain how molecular complexity evolves. Such linkage is self-evident when looking at evolution of protein function, but recent analysis of patterns of gene regulation suggests that there can be very complex patterns of gene regulation within localized genomic regions87 that may be the result of similar types of evolutionary constraints. We need to look at interactions between promoters and coding genes, micro RNAs, chromatin remodeling, and other factors that Bateson would never have dreamed of, as being parts of epistatic networks whose evolutionary dynamics may be guided by complex sets of genetic interactions and their genomic relationship with one another.
It should be apparent that the global analysis of gene interaction patterns bears a striking resemblance to what is now called “systems biology”88. One of the central questions in this field is whether there are emergent properties of complex systems that are not predicted from looking at individual system components, yet are essential for understanding the function of the system as a whole. From an evolutionary standpoint, we might also add questions such as whether the structure of the system has evolved to facilitate these properties (e.g., robustness, modularity, evolvability33,80,89). The answers to these questions will rely on our ability to expand the use of epistasis in two directions. First, as has already been occurring in a few model systems, we need to explore more of the potential interaction space through high throughput screens of genetic interactions, transcriptional regulation, protein modification and interaction, and phenotypes. Second, we need to complete the unification of classical and statistical views of gene interaction by encouraging molecular biologists to continue to become more quantitative in their measures of genetic outcomes and evolutionary geneticists to become more mechanistic in their interpretations of evolutionary change. As this occurs, all sides of epistasis (Figure 1) should become unified through the metaphor of quantitative flow across a genetic network. This approach can be used to predict the emergence of epistasis in the traditional sense90, can facilitate the use of knockout and gain of function studies to test system-level predictions91-93, and can help direct tests that should lead to the functional nature of the interactions themselves. This quantitative detail can then be used to understand the implications of these interactions from systems and evolutionary viewpoints in order to understand the broader population level consequences of epistasis for generating differences among individuals. 
This is neither reductionistic nor holistic, but a powerful combination of the two. The overwhelming combinatorics of the problem is a major issue (and at some point insurmountable), so progress will ultimately need to be based on strong hypotheses generated from functional information. Given recent work in this area, it is likely that for the next 100 years the concept of epistasis will be even more central to biology than it has been over the last 100.
This work was initiated while the author was a sabbatical visitor at the Gulbenkian Institute of Science. I gratefully acknowledge their support. I also deeply appreciate input from Hopi Hoekstra, Sally Otto, Joe Thornton, and three anonymous reviewers, as well as a long term synergistic interaction with Mike Whitlock. This work was supported by a fellowship from the Guggenheim Foundation and by grants from the National Institutes of Health and National Science Foundation.
Patrick Phillips received his Ph.D. in evolutionary biology from the University of Chicago and did his postdoctoral work with James Crow in the Laboratory of Genetics at the University of Wisconsin, Madison. He is currently a professor of biology in the Center for Ecology and Evolutionary Biology and the Department of Biology at the University of Oregon, where his lab focuses on theoretical and empirical studies of complex traits, using the nematode C. elegans and its relatives to pursue the molecular genetics of the genotype-phenotype map for traits such as reproductive success, sexual interactions, longevity, and the behavioral response to temperature and chemicals.
Competing interests statement The author declares no competing financial interests.