That regulatory evolution is important in generating phenotypic diversity was suggested soon after the discovery of gene regulation. In the past few decades, studies in animals have provided a number of examples in which phenotypic changes can be traced back to specific alterations in transcriptional regulation. Recent advances in DNA sequencing technology and functional genomics have stimulated a new wave of investigation in simple model organisms. In particular, several genome-wide comparative analyses of transcriptional circuits across different yeast species have been performed. These studies have revealed that transcription networks are remarkably plastic: large scale rewiring in which target genes move in and out of regulons through changes in cis-regulatory sequences appears to be a general phenomenon. Transcription factor substitution and the formation of new combinatorial interactions are also important contributors to the rewiring. In several cases, a transition through intermediates with redundant regulatory programs has been suggested as a mechanism through which rewiring can occur without a loss in fitness. Because the basic features of transcriptional regulation are deeply conserved, we speculate that large scale rewiring may underlie the evolution of complex phenotypes in multi-cellular organisms; if so, such rewiring may leave traceable changes in the genome from which the genetic basis of functional innovation can be detected.
One and a half centuries have passed since Darwin wrote in The Origin of Species “our ignorance of the laws of variation is profound”. Despite enormous progress in understanding these “laws”, we are still grappling with important aspects of this issue. For many genetic circuits that have evolved to different forms in extant species, little is known about the evolutionary pathways connecting the ancestral and modern forms. Our ignorance of the laws of variation and of evolutionary pathways makes it impossible to predict (in a probabilistic sense) evolutionary outcomes.
Addressing these challenges will necessarily require a deep understanding of how gene regulatory networks have evolved. Genes do not function in isolation; rather, they interact with each other to form complex networks that respond to environmental inputs and developmental programs. Such networks determine the complex relationship between genotype and phenotype, and may severely constrain the possible variations observed in nature. As a consequence, the emergence of complex phenotypes that distinguish one species from the next likely requires coordinated changes of many network components, as well as the regulatory relationships between them.
Organisms devote a significant fraction of their genomes to producing proteins and RNAs, and to cis-regulatory sequences that specify when and where the expression of each gene should be turned on or off. Although there are many forms of this regulation, we will focus on transcription networks, in particular sequence-specific DNA-binding proteins and the cis-regulatory sequences they recognize.
The importance of regulatory evolution in driving phenotypic diversity was recognized soon after the discovery of gene regulation. Jacob and Monod [1,2] speculated on the role of operator sequence mutations in evolution. Britten and Davidson  proposed that repetitive sequences may drive evolutionary novelty by reshaping the genomic regulatory program. King and Wilson  argued, based on the observation that homologous protein sequences in human and chimp are very similar, that “regulatory mutations may account for their biological differences”. In the past decade, studies of single genes in animals have demonstrated that changes in transcriptional regulation can underlie important morphological and physiological changes (see  for a review). These examples include lactase persistence in human subpopulations [6,7]; the reduction in pelvic armor of freshwater stickleback fish relative to their marine form ; changes in insect wing morphology and coloring pattern [9,10]; and differences in trichomes among flies . It has been argued, based on examples and on theoretical grounds, that certain types of phenotypic change are more likely to result from cis-regulatory changes than from coding changes. Consistent with the important roles transcriptional regulation plays during development, it was observed that regulatory mutations occur with high frequency in morphological changes [12,13].
Recent advances in DNA sequencing technology and functional genomics have led to new investigations into the evolution of transcriptional networks in simple model organisms. For several reasons, yeasts have turned out to be particularly useful for such investigations. First, the genomes of a large number of yeast species, covering a wide range of evolutionary distances, have been sequenced, making it possible to carry out detailed and informative comparative sequence analysis. Second, it is relatively simple to make genetic manipulations in many yeast species. Third, the small size of the genome and well-defined regulatory regions allow accurate mapping of cis-regulatory sequences via functional genomics and bioinformatics. Fourth, because yeasts do not undergo complex developmental programs, their transcription circuits are often simpler than those of animals and plants. Finally, the relatively short generation times make it feasible to carry out in vitro evolution experiments under controlled environments [14–16].
In the last few years, a number of transcriptional circuits have been characterized in different yeast species. These studies have led to some new insights into the evolution of transcription networks. From these studies, it has become clear that transcription networks are surprisingly plastic, with large-scale rewiring being common. A number of recent reviews, each with a different perspective, have been devoted to this topic [17–19]. Here, we describe some new findings and some emerging themes. Rather than providing a comprehensive account, we will focus on a few selected examples to illustrate a common phenomenon or potentially general mechanism. We first describe a few scenarios for network rewiring (using examples from yeasts and, in some cases, animals), followed by a discussion of potential evolutionary pathways that may connect the ancestral form to the extant forms. At the end, we speculate on what we have learned from yeast that may help us to understand evolution of transcription networks in general.
In the past few years, a number of studies have used a combination of global gene expression profiling, genome-wide chromatin immunoprecipitation (ChIP) followed by microarray or DNA sequencing, and bioinformatic analysis to characterize and compare transcriptional circuits in different fungal species. The transcriptional circuits analyzed are responsible for regulating a wide range of biological processes, including ribosomal gene expression, galactose metabolism, amino acid biosynthesis, cell-cycle control, and cell-type control. A general observation from these studies is that the transcriptional circuits are plastic over evolutionary time, leading to significant differences among the modern species.
Transcription networks can be rewired through cis-regulatory mutations, for example mutations that create or destroy a binding site, through protein-coding (trans) changes that either alter the binding specificity of a transcription factor or change its interaction with other co-factors, or by a combination of the two. In the following, we describe a few general scenarios of rewiring from the perspective of a regulon. We use ‘regulon’ to refer to a set of target genes directly recognized and thereby regulated by a transcription factor or combination of transcription factors. In a given species, the target genes of a regulon often have related functions and exhibit coherent expression under a number of different conditions.
In this scenario, the network evolution is mainly driven by changes in cis-regulatory sequences of target genes (Figure 1). Studies both in yeasts and in animals have provided many examples of this. The conservation of the transcription factor itself can be detected by simple sequence analysis; the conservation of its binding specificity can be detected by comparative ChIP analysis in different species, or through de novo motif discovery in the promoters of orthologous regulon members [20,21]. Borneman et al.  analysed Ste12 and Tec1, two transcription factors known to cooperatively regulate pseudohyphal growth, in three closely related species — Saccharomyces cerevisiae, Saccharomyces mikatae, and Saccharomyces bayanus, separated by ~20 million years of divergence — and found that only one-third of transcription factor target connections seen in one species are also conserved in the other two.
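The notion of target-connection conservation across species can be made concrete with a small set computation. The sketch below is illustrative only: the gene identifiers are hypothetical ortholog labels rather than data from Borneman et al., and the function `conserved_fraction` is an assumed, simplified measure (the share of one species' targets that are also bound in every other species), which may differ from the statistic used in the original study.

```python
def conserved_fraction(targets_by_species, reference):
    """Fraction of the reference species' targets that are bound in every species.

    targets_by_species maps a species name to the set of ortholog identifiers
    bound by the transcription factor in that species (hypothetical input format).
    """
    ref_targets = targets_by_species[reference]
    if not ref_targets:
        return 0.0
    # Targets present in all species, including the reference.
    shared = set.intersection(*targets_by_species.values())
    return len(shared) / len(ref_targets)


# Hypothetical ortholog-group labels, chosen only to illustrate the calculation.
targets = {
    "S. cerevisiae": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "S. mikatae": {"g1", "g2", "g3", "g7"},
    "S. bayanus": {"g1", "g2", "g4", "g8", "g9"},
}

fraction = conserved_fraction(targets, "S. cerevisiae")
print(f"Conserved in all three species: {fraction:.2f}")
```

In this toy input, two of the six reference targets are bound in all three species, so the conserved fraction is one-third; real ChIP data would additionally require ortholog mapping and a binding threshold, neither of which is modeled here.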
Similar observations have been made in comparisons of mice and humans. For example, a comparative analysis of the binding profiles of four liver-specific transcription factors in mouse and human hepatocytes revealed that a large fraction of the binding events are species-specific; that is, a gene bound by a transcription factor in one species is not necessarily bound by the orthologous factor in the other species. Such changes of regulon membership are mainly due to cis-regulatory mutations: a human chromosome in mouse hepatocytes can recapitulate most of the binding patterns observed in human hepatocytes, arguing that the ‘trans environment’ is conserved. A recent intra-species comparison of binding of RNA polymerase II and Nfkb in several humans also found significant differences that are associated with single-nucleotide polymorphisms (SNPs) and genomic structural variants.
In another recent study, Bradley et al.  analyzed the genome-wide binding patterns of six transcription factors involved in initiating segmentation in two closely related fly species. They found that quantitative variation in binding is common and is attributable to the gain and loss of cognate recognition sequences for the factors.
From all these studies, it is clear that a considerable amount of cis-regulatory sequence variation exists between closely related species and even among individuals of the same species. Such cis-regulatory variation is a major contributor to the divergence of gene expression patterns, as was also demonstrated by several studies that compared the allele-specific expression of an inter- (or intra-) species hybrid to that of its parents [27–29].
Comparative gene expression profiling and sequence analysis have revealed examples where the regulon structure and expression pattern are conserved but the factors that regulate them have changed: the substitution of one factor for another has occurred (Figure 2). An early example was observed in the transcription circuits regulating mating type. A special set of genes, called the a-specific genes, are expressed in a-cells but not in α-cells (a and α cells are the two mating forms). Tsong et al. [30,31] found that the regulation of a-specific genes is implemented differently in S. cerevisiae compared to Candida albicans: in C. albicans, a-specific genes are turned off by default in α cells and induced by the transcriptional activator a2 in a cells; in S. cerevisiae, a-specific genes are on by default in a cells, and turned off by the transcriptional repressor α2 in α cells. Thus, there has been a handoff from one regulator to another (in this case, the two regulators are structurally unrelated), and the form of control has changed from positive (in the ancestor) to negative (in modern S. cerevisiae). The overall output of the circuit, however, has remained the same: a-specific genes are expressed in a cells but not in α or a/α cells.
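The equivalence of the two control logics for haploid cells can be illustrated with a minimal Boolean sketch. This is a deliberate simplification, not a model from Tsong et al.: it assumes only that the activator a2 is present in a cells and the repressor α2 is present in α cells, it ignores a/α cells and all cofactors, and the function names are hypothetical.

```python
def asg_on_activator_circuit(cell_type):
    """C. albicans-like logic: a-specific genes are off by default and
    switched on by the activator a2, assumed present only in a cells."""
    a2_present = cell_type == "a"
    return a2_present


def asg_on_repressor_circuit(cell_type):
    """S. cerevisiae-like logic: a-specific genes are on by default and
    switched off by the repressor alpha2, assumed present only in alpha cells."""
    alpha2_present = cell_type == "alpha"
    return not alpha2_present


# Compare the output of the two circuits in each haploid cell type.
outputs = {
    cell_type: (asg_on_activator_circuit(cell_type), asg_on_repressor_circuit(cell_type))
    for cell_type in ("a", "alpha")
}

for cell_type, (activator_out, repressor_out) in outputs.items():
    print(cell_type, activator_out, repressor_out)
```

Both functions return the same on/off state for each haploid cell type even though one relies on positive and the other on negative control, which is the sense in which the circuit output is conserved while its wiring differs.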
Transcription factor substitution has also been observed in the regulation of highly conserved metabolic pathways. A remarkable example is the rewiring of the transcriptional circuitry regulating the expression of ribosomal protein genes. Given their high abundance and important functions, it is not surprising that these genes are tightly co-regulated [32,33]. However, the transcriptional circuits that regulate such a highly conserved cellular machine turn out to be plastic, with large-scale rewiring having occurred in different species. Earlier bioinformatic analyses of ribosomal gene promoters identified different enriched motifs in different species [20,21], suggesting that they may be regulated by different regulators. Using a combination of genetics, expression profiling, and ChIP-chip analysis, Hogues et al. established that, in C. albicans, the ribosomal genes are controlled by Tbf1 in conjunction with Cbf1, while it is known that in S. cerevisiae Rap1 is the major regulator of these genes [35,36]. Motif analysis across yeast lineages suggests that the regulation by Cbf1–Tbf1 is the ancestral mode, while regulation by Rap1 is an innovation in the S. cerevisiae branch.
Lavoie et al. recently performed a systematic analysis of a set of regulators known to be involved in ribosomal gene regulation in either S. cerevisiae or C. albicans, and mapped the genomic locations of the orthologous factors in both species. This study not only confirmed the handoff of ribosomal genes from one set of regulators (Tbf1/Cbf1) to another (Hmo1/Rap1), but also revealed a broad range of reorganization in which a factor lost control of one set of genes but gained control of another set of genes with a different function. For example, Tbf1 in S. cerevisiae lost control of ribosomal genes but gained control of cell-cycle and telomere-related genes.
Another example of transcription factor substitution in a highly conserved metabolic system came from the study of the regulation of galactose metabolism. In S. cerevisiae, the presence of galactose (and the absence of glucose) induces the transcription of genes that produce galactose metabolism enzymes via the transcription factor Gal4 binding to its well characterized cis-regulatory sequence. In C. albicans, the same enzymes are induced by galactose, but regulated through a different cis-regulatory sequence recognized by an as yet unknown transcriptional regulator that is not Gal4. The C. albicans Gal4 ortholog has, in turn, been co-opted to regulate genes unrelated to galactose metabolism [38,39].
Combinatorial regulation is a common theme in eukaryotic transcriptional circuits, as transcription factors often work in different combinations to regulate different sets of genes under different conditions. Many combinatorial interactions are due to direct protein–protein contacts between sequence-specific DNA binding proteins. These interactions are often much weaker than the protein–DNA interactions. It is therefore not surprising that changes in the interactions between transcription factors play an important role in transcriptional rewiring. Comparative analysis of mating type control indicates that the handoff in the regulation of the a-specific genes (discussed above) involves the formation of a new combinatorial interaction between α2 and the general regulatory protein Mcm1 [30,31].
Analysis of the full Mcm1 circuit across species provided more evidence for transcriptional rewiring via changes in combinatorial interactions. In S. cerevisiae, Mcm1 is constitutively expressed and works with different partners to regulate different biological processes, including mating type specification, cell cycle, and arginine metabolism. To investigate the evolution of regulons defined by Mcm1 and its partners, Tuch et al.  performed ChIP-chip analysis in three different species — S. cerevisiae, Kluyveromyces lactis and C. albicans — and found large-scale turnover of target genes within many regulons. In addition, new regulons appear to have formed by new combinatorial interactions along several different branches of the yeast lineage. For example, it was found that most ribosomal protein genes in K. lactis are bound by Mcm1, and Mcm1 binding sites are positioned with fixed orientation and preferred distance to the Rap1 binding sites, suggesting that Rap1 and Mcm1 have formed a new interaction in K. lactis.
The formation of new (and the breaking of old) interactions between transcription factors may be a general mechanism for rewiring transcriptional circuits, as it could ‘jump start’ the rewiring of a set of genes while maintaining their coordinated regulation. After a new interaction forms, the circuit can be improved through cis-regulatory changes target gene by target gene (Figure 3).
In a systematic analysis of physical interactions between transcription factors in humans and mice, Ravasi et al.  found several hundred interactions in each species, with only half of them present in both. Although it is unclear to what extent these differences contribute to the differences in the transcription networks in the two species, the results support the idea that combinatorial interactions can change considerably over evolutionary timescales. ‘Trans-changes’ that alter combinatorial regulation were also observed in an intra-species comparison. In a recent analysis of the binding of the transcription factor Ste12 in the segregants of a cross between two diverged S. cerevisiae strains, Zheng et al.  found extensive variations among individuals that were mapped to both cis and trans changes. Two genes (one encoding a transcription factor) that vary in different strains and modulate Ste12 binding to the promoters of a number of targets were identified.
For simplicity, we divided observed wiring changes into three basic types. In reality, these mechanisms probably work in concert. For example, transcription factor substitution with conserved regulation (scenario 2) may transition through an intermediate with redundant regulation, which could be facilitated by the formation of a new interaction (scenario 3). We also note that it is often difficult, from the available data, to accurately classify a rewiring event. For example, if a target gene loses a cis-regulatory sequence, it could have no consequence (the site was not functional), it could signify that the gene moved out of the regulon (scenario 1), or it could mean that the gene remained in the regulon but underwent transcription factor substitution (scenario 2). These possibilities can be resolved by direct experimentation, but this type of additional analysis is typically absent from genome-wide studies.
Once the transcription circuits in different species have been described and differences identified, it is often possible to infer the ancestral circuit through comparative genome analysis across many fungal lineages. An important and challenging question concerns the possible evolutionary pathways that connect the ancestral circuit to the extant circuits. If evolutionary pathways for a number of cases can be inferred, it may be possible to derive some general rules of transcription network evolution.
Inferring evolutionary pathways is challenging even for the evolution of a single protein, as the number of possible pathways increases exponentially with the number of mutations. For example, Weinreich et al.  analyzed the evolution of antibiotic resistance in the bacterial protein β-lactamase, which requires five point mutations. They enumerated all possible pathways and showed that only a small number of pathways have no fitness barrier, suggesting that the actual evolutionary pathway may be severely constrained. For network rewiring the problem is even more challenging as such rewiring often involves both trans-changes and cis-changes in a large number of genes; in many cases the full range of changes is unknown.
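The idea of enumerating mutational pathways and asking which of them avoid a fitness barrier can be sketched in a few lines. With five mutations there are 5! = 120 possible orderings. The code below is a toy illustration under stated assumptions: the two fitness functions (`additive` and `epistatic`) are hypothetical and are not the measured β-lactamase landscape of Weinreich et al., and a path is counted as accessible only if fitness strictly increases at every step.

```python
from itertools import permutations


def accessible_paths(fitness, n_mutations):
    """Count orderings of n mutations along which fitness rises at every step.

    fitness is a function from a frozenset of mutation indices to a number.
    Returns (accessible, total).
    """
    total = 0
    accessible = 0
    for order in permutations(range(n_mutations)):
        total += 1
        genotype = frozenset()
        barrier_free = True
        for mutation in order:
            next_genotype = genotype | {mutation}
            if fitness(next_genotype) <= fitness(genotype):
                barrier_free = False  # this step does not increase fitness
                break
            genotype = next_genotype
        if barrier_free:
            accessible += 1
    return accessible, total


def additive(genotype):
    """Hypothetical landscape: every mutation adds the same benefit."""
    return len(genotype)


def epistatic(genotype):
    """Hypothetical landscape with sign epistasis: mutation 0 is deleterious
    unless mutation 1 is already present."""
    score = len(genotype)
    if 0 in genotype and 1 not in genotype:
        score -= 2
    return score


print(accessible_paths(additive, 5))
print(accessible_paths(epistatic, 5))
```

On the additive landscape every ordering is accessible, whereas the single epistatic interaction removes every ordering in which mutation 0 precedes mutation 1. More extensive epistasis would prune the set of accessible paths further, which is the qualitative point of the example in the text.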
For most of the examples of rewiring described above, the evolutionary pathways are very difficult to infer, as few traces of the intermediates remain in modern species. In a few cases, however, an extant species appears to have retained a circuit that resembles, at least in some regard, a transition intermediate. If information about potential intermediates is combined with the constraint that the evolutionary pathway should have no severe fitness barrier, it is possible to suggest some plausible pathways. For the change in regulation of the a-specific genes discussed above, it was suggested that the evolutionary pathway proceeded through intermediates with redundant regulation such that the regulatory logic was maintained throughout the transition.
Rewiring through a redundant intermediate was also suggested for the regulation of ribosomal genes. From a bioinformatic analysis of the cis-regulatory sequences in the promoters of ribosomal genes across species, Tanay et al.  suggested that the regulation switched from a homo-D motif in Schizosaccharomyces pombe to the Rap1 motif in S. cerevisiae through a redundant intermediate where both binding sites are present in close proximity, as still observed in several extant species.
Evolutionary novelty can clearly arise from the rewiring of gene regulatory networks that produce new expression patterns. Here we have described some examples in which comparative analyses of transcriptional circuits in different yeast species have revealed surprising patterns of large-scale rewiring, and have allowed changes in whole networks (rather than in the regulation of single genes) to be monitored. Compared to the evolution of a single protein, the evolution of transcription networks has a number of distinct features. At the level of cis-regulatory architecture, many different configurations of the promoters/enhancers can confer the same spatial-temporal expression pattern; thus, for a given expression pattern, the number of possible solutions is enormous. For example, studies in flies showed that stabilizing selection can maintain the same expression pattern while still allowing for considerable drift in cis-regulatory sequence [44–46]. Similar observations have been made in a wide range of species (see  for a review).
At the network level, some regulatory tasks can be accomplished by different network architectures/topologies [47–49]. As a simple example, condition-specific expression can be achieved either by an activator or a repressor . When the combinatorial regulation by many transcription factors is included, the potential solution space for a specific regulatory task can be enormous. Although it is clear that different species can use different solutions to accomplish the same regulatory task, in most cases little is known about the pathways through which these solutions have evolved from the same ancestral circuit (if indeed, they are homologous) and the selective pressures (if any) underlying the different pathways.
Deciphering the pathways of network evolution is daunting, as the fitness landscape is complex with many potential fitness barriers between possible solutions. In addition, transcription networks are highly connected, so changing one part probably affects other parts. Although the yeast studies have hinted at a few possible evolutionary pathways, they have raised some difficult questions. For example, we have discussed that transition through redundant intermediates may reduce fitness barriers because the regulon members remain connected by at least one factor (Figure 2). However, in the intermediate states where some of the target genes of the regulon are controlled by one factor and others by two (Figure 2), the quantitative balance of the regulon would be expected to be disrupted. Is this expectation correct? And, if so, how important is its consideration? We also note that at each step of rewiring, the fitness cost probably depends on previous steps. Similar to protein evolution, epistasis between regulon members may severely constrain possible mutational paths and lead to a preferred order of rewiring. At present, we do not know how to rigorously apply this idea to cases of network evolution.
In addition to cis-regulatory changes, it is expected that many large-scale rewirings will also involve trans-changes that allow a regulator to respond to different environmental inputs, to make new combinatorial connections, and to acquire and lose target genes. However, such trans-changes may have pleiotropic effects as a result of the connectivity of the network. In the scheme illustrated in Figure 3, the recruitment of TF2 to the targets of TF1 via the evolution of new interactions may lead to crosstalk between TF1 and the original targets of TF2. How would such a problem be resolved? One possibility is that the advantage of rewiring simply outweighs the cost due to the crosstalk. Once the rewiring is completed, the crosstalk can be eliminated by further adjustments (for example, passing the original TF2 targets to a new factor). Alternatively, if the functions of the two regulons are related, the cross-regulation may be nearly neutral and perhaps even advantageous. In that case, one would predict that transcription factor substitution is not random but instead typically happens between factors regulating coupled cellular processes.
Much of the large-scale rewiring observed in yeasts is likely to be adaptive, although the selective pressures that might underlie such changes are not known with any certainty. For example, it is hard to imagine that the rewiring of a large number of ribosomal gene promoters was due purely to random drift. In some cases, as in the a-specific genes, different circuits seem to yield identical logic, at least qualitatively. However, there may still be quantitative differences, for example, in the dynamic range of the regulation or the speed of the response. Nevertheless, inferences of adaptive network evolution should be treated with caution: the vast number of possible solutions can facilitate network evolution through non-adaptive processes. By analogy to thermodynamics, the many possible solutions can contribute a large ‘entropy’, making network drift unavoidable. It is therefore likely that some of the rewiring that has been observed may simply arise through the non-adaptive process of genetic drift [50,51]. There is certainly no justification to assume, without additional evidence, that a change in a transcription circuit is adaptive. It remains a great challenge to understand the connectivity and the fitness landscape of the space of seemingly equivalent solutions.
Gene and genome duplication can be an important driving force for evolutionary novelty (see  for a review). For example, Bridgham et al. showed that the evolution of a specific hormone/receptor pair originated from the duplication of an ancient receptor with broad specificity. An ancestor of S. cerevisiae underwent a whole-genome duplication, and recent genome-wide studies indicate that this event may have provided opportunities for the evolution of novel regulation patterns through the divergence of the promoters of the duplicate gene pairs [54,55]. For example, the S. cerevisiae Gal3 (a co-inducer of the GAL genes) and Gal1 (the first enzyme in the pathway that converts galactose to glucose) arose from the same ancestral gene as a result of the whole-genome duplication. Hittinger et al. provided convincing evidence that the divergence of the two promoters of these genes (coupled with diversification of their coding sequences) allowed high basal level expression of the inducer and tight regulation of the galactokinase, requirements that cannot be simultaneously satisfied by the single ancestral promoter.
A recent study of ribosomal gene regulation has provided an example of duplication and functional divergence of a regulator. Wapinski et al.  showed that two ribosomal gene regulators in S. cerevisiae, the activator Ifh1 and the repressor Crf1, were derived from the duplication and subsequent specialization of an ancestral gene. The retention of Crf1 seems to correlate with the retention of the duplicated ribosomal genes, and the authors argued that the specialization of Crf1 as a repressor allowed tighter control of ribosomal genes when the expression burden is high under stress conditions. These examples show that divergence in cis or trans elements after gene duplication can lead to new patterns of gene expression. Although genome duplication may facilitate large-scale changes, it is not a prerequisite, as several examples have been documented in pre-duplication lineages. Genome duplications have also occurred in plant and animal lineages [58–63], and it will be of great interest to understand how these have reshaped the corresponding transcription networks.
In the yeast studies mentioned above, one typically starts with a transcriptional circuit that is characterized in one species and maps the differences across species in a relatively unbiased way. This approach is complementary to studies (particularly in animals) where a known phenotypic difference (intra- or inter-species) is traced to genetic changes at specific loci. One advantage of the unbiased approach is that it gives a global view of the rewiring of the circuit. An important limitation is that, unless explicitly investigated, the range of phenotypic differences produced by the rewiring is unknown. Nevertheless, because the basic components of transcription circuits are conserved, the abundance of network rewiring observed in fungi raises the possibility that this phenomenon also applies to higher eukaryotes. Perhaps more importantly, if such large-scale rewiring exists, is it necessary for the evolution of complex phenotypes?
We do not know the answers, but there are some intriguing hints. A recent study  of the evolution of wing color patterns in flies found that the elaborate spot pattern of Drosophila guttifera evolved via coordinated cis and trans changes: cis-regulatory changes of yellow and other pigmentation genes to put them under the control of the transcription factor Wingless, and trans changes leading to the co-option of Wingless expression at new sites, utilizing the pre-existing positional information. In another example, Konopka et al. analyzed the transcriptional targets of the human and chimp versions of the transcription factor Foxp2 in human neuronal cells. Foxp2 has been implicated in the development of human speech and language (mutations of Foxp2 cause a severe speech and language disorder [65,66]), and the gene shows signs of accelerated evolution in the human lineage [67,68]. Although the human and chimp versions of the protein differ by only two amino acids, Konopka et al. found that they induce expression of different sets of genes; moreover, the expression differences correlate with genes differentially expressed in the human and chimp brain. It is unclear whether the trans-changes (the two amino acid mutations) are sufficient to explain the human-specific gene expression or whether coordinated cis-changes are also required, but it may turn out that large-scale rewiring involving both trans- and cis-changes is needed for the human-specific expression pattern.
The insights gained from the yeast studies may prove particularly valuable in understanding the basic constraints and possible pathways underlying transcription network rewiring. The massive circuit rewiring, the handover of regulation from one transcription factor to another through redundant intermediates, and the evolution of new combinatorial interactions followed by sweeping changes of promoter elements are likely to be general hallmarks of circuit rewiring in many species. If so, such pathways of rewiring may leave traceable evidence from which we can detect the genetic basis of functional innovation in specific lineages.
We thank Xin He and Chris Fuller for helpful comments and critical reading of the manuscript. This work was supported by grants from the National Institutes of Health to H.L. and A.D.J., and the Packard Fellowship in Science and Engineering to H.L.