We examine how different transcriptional network structures can evolve from an ancestral network. By characterizing how the ancestral mode of gene regulation for genes specific to a-type cells in yeast species evolved from an activating paradigm to a repressing one, we show that regulatory protein modularity, conversion of one cis-regulatory sequence to another, distribution of binding energy among protein-protein and protein-DNA interactions, and exploitation of ancestral network features all contribute to the evolution of a novel regulatory mode. The formation of this derived mode of regulation did not disrupt the ancestral mode and thereby created a hybrid regulatory state where both means of transcription regulation (ancestral and derived) contribute to the conserved expression pattern of the network. Finally, we show how this hybrid regulatory state has resolved in different ways in different lineages to generate the diversity of regulatory network structures observed in modern species.
In many organisms, gene regulatory networks have been shown to undergo significant divergence over evolutionary time (reviewed by Carroll, 2005; Davidson and Erwin, 2006; Doebley and Lukens, 1998; Tuch et al., 2008; Wohlbach et al., 2009; Wray, 2007). In the simplest cases, the gain or loss of a cis-regulatory sequence upstream of a single gene can produce changes in coloration, losses of ancestral anatomical features, or altered ability to digest sugars (Chan et al., 2010; Gompel et al., 2005; Tishkoff et al., 2007). Yet, it seems likely that the evolution of complex biological innovations requires concerted evolution across entire networks of genes (Lavoie et al., 2010; Lynch et al., 2011; Tuch et al., 2008). Two considerations suggest that network evolution requires mechanisms in addition to the loss and gain of single cis-regulatory sequences. First, the adaptive value of acquiring coordinated expression of a large set of genes may not be realized until all or at least a large fraction of the gene set acquires the new regulatory input. Second, expression of only a portion of the gene network could be detrimental to the fitness of the organism, for example, through the non-stoichiometric expression of components of a protein complex.
To understand the molecular events that underlie changes in the regulation of groups of genes, we investigated a transcriptional network that determines cell-type in a wide variety of fungal species. This network—composed of the a-specific genes (asgs) and their regulators—underwent a major circuit rewiring in the hemiascomycete yeasts (Tsong et al., 2003; Tsong et al., 2006). This group of yeasts includes Saccharomyces cerevisiae (the baker’s yeast), Kluyveromyces lactis (a dairy yeast), Candida albicans (the most common human fungal pathogen), and over 30 additional genome-sequenced species (Figure 1A). This lineage has been estimated to represent at least 300 million years of evolutionary time (Taylor and Berbee, 2006). Virtually all yeast species in the hemiascomycete lineage exist in three cell types—the mating competent a and α cells and the product of their mating, the a/α cell (Figure 1B). Mating cell-type is controlled by transcriptional regulators that are encoded at the mating-type (MAT) locus (Herskowitz, 1989). These regulators control the expression of genes that are responsible for the specialized properties of each of the three cell types. The asgs are a group of seven to ten genes (depending on the species) whose key regulatory characteristic is that they are expressed in the a cell-type but not in the α and a/α cell-types (Galgoczy et al., 2004; Herskowitz, 1989; Tsong et al., 2003) (Figure 1B). The asgs encode proteins (e.g. α mating pheromone receptor, a mating pheromone, agglutinins and exporters) that are necessary for the specific properties of a cells (Herskowitz, 1989; Madhani, 2007).
In principle, there are two ways that the asgs could be expressed in a cells but not in the other two cell types: (1) the asgs could be activated by a regulatory protein present only in a cells or (2) the asgs could be repressed by a regulator made only in α and a/α cells. In fact, both schemes are observed, the latter in S. cerevisiae and the former in C. albicans (Strathern et al., 1981; Tsong et al., 2003). In C. albicans, the HMG domain protein a2 binds to and activates the asgs. In S. cerevisiae, the homeodomain protein α2 binds to and represses the asgs (Johnson and Herskowitz, 1985). We previously showed that the activation mode of regulation (by a2) was present in the ancestor of C. albicans and S. cerevisiae and that the switch to the repression mode (mediated by α2) occurred along the branch to S. cerevisiae (Tsong et al., 2006). Indeed, the gene encoding the a2 protein was lost from the genome in an ancestor of S. cerevisiae (Butler et al., 2004) (Figure 1C).
Here we define the evolutionary path for the switch in regulation of the asg network using a combination of bioinformatic analysis, direct experiments in the yeasts Kluyveromyces wickerhamii, Kluyveromyces lactis, and Lachancea kluyveri, ancestral protein reconstruction, and trans-species reporter gene analysis in S. cerevisiae. Our principal conclusions are as follows: First, regulatory protein modularity was crucial for the change in network regulation. In particular, protein modularity accounts for the cooption of an existing repressor for a new function (repression of the asgs) while maintaining its ancestral function. Second, the cooperative binding of transcriptional regulators facilitated the gain of the repression mode of regulation across this gene set by stabilizing early evolutionary intermediates. Third, the conversion of one cis-regulatory sequence into another occurred through an “intermediate” cis-regulatory sequence that was recognized by regulators of both the ancestral and derived regulatory modes. Fourth, the evolution of asg repression in the common ancestor of K. lactis and S. cerevisiae did not disrupt the ancestral (positive) mode of regulation, and thereby formed a “hybrid” regulatory state (Tsong et al., 2006). Finally, we show that once the hybrid regulatory network formed, it resolved in different ways along the branches to the modern yeast species: in S. cerevisiae the ancestral form was discarded, leaving only the derived form; in K. lactis the derived form was inactivated, reverting to the ancestral mode of regulation; in L. kluyveri and K. wickerhamii, aspects of the hybrid regulatory state have been maintained. Because the regulatory proteins studied here are conserved in all eukaryotes, the evolution of asg regulation can serve as a model for understanding the molecular mechanisms underlying the extraordinary flexibility of transcriptional circuits over evolutionary time.
We first determined when repression of the asgs arose during evolution. To do this, we moved the asg regulatory sequences (from the conserved asg STE2) and the α2 proteins from a variety of species into S. cerevisiae and determined their abilities to support repression (Figure 2A). In S. cerevisiae, α2 binds asg cis-regulatory sequences cooperatively with a MADS-box transcription regulator, Mcm1 (Figure 1C). Both proteins bind with high affinity to DNA sequences, and their cooperative binding results from a relatively weak protein-protein interaction (Tan and Richmond, 1998; Vershon and Johnson, 1993). The cis-regulatory sequence consists of an Mcm1 homodimer site flanked by two α2 binding sites (Keleher et al., 1988). Removal of any of these four binding sites from an a-specific cis-regulatory sequence, or disruption of the protein-protein interaction, severely compromises repression (Smith and Johnson, 1994; Vershon and Johnson, 1993).
The STE2 cis-regulatory sequences from species that branch from the S. cerevisiae lineage prior to the loss of the a2 gene—such as Zygosaccharomyces rouxii, K. lactis, and Ashbya gossypii—supported levels of α2 repression comparable to the S. cerevisiae site (Figure 2A). STE2 cis-regulatory sequences taken from the Candida clade (C. albicans and Pichia membranifaciens) and the out-group species Yarrowia lipolytica failed to support repression in this assay (Figure 2A), consistent with the inference that in C. albicans and the C. albicans-S. cerevisiae ancestor, α2 does not repress the asgs (Tsong et al., 2006).
Full-length α2 ORFs from 8 species were fused to the S. cerevisiae α2 promoter and integrated into the genome in single copy (Figure 2B). α2 orthologs from species within the Kluyveromyces group repressed the asg reporter to levels comparable to those observed for the S. cerevisiae protein (Figure 2B). In addition, the α2 ortholog of a species (Z. rouxii) that branches within the Saccharomyces group, but prior to the loss of a2 (Figure 1A), efficiently repressed the asg reporter (Figure 2B). In contrast, α2 orthologs from Candida clade species failed to repress the reporter. The C. albicans α2 protein also failed to repress the C. albicans asg cis-regulatory sequence (Figure 2C). These results show that changes in both the asg cis-regulatory sequences and the α2 protein were necessary for the switch in regulation and that the gain of α2 repression of the asgs clearly preceded the loss of the a2 gene.
The clear trend from these experiments is that asg cis-regulatory sequences and α2 proteins from the Saccharomyces and Kluyveromyces clades (Figure 1A) are competent to bring about repression, whereas those outside these clades are not. However, there is an important exception to this observed pattern. The K. lactis α2 protein failed to repress in this assay even though its STE2 cis-regulatory sequence is competent to bring about repression in this same assay (Figure 2B). To rule out the trivial possibility that α2 was misfolded or poorly expressed, we carried out a series of control experiments (Figure S1A). We will return to this unique feature of K. lactis later in this paper.
To investigate the molecular events that gave rise to α2 repression of the asgs, we considered first the contribution of trans changes (coding sequence mutations in α2 or Mcm1). To identify regions of the α2 protein that may have been critical for the gain of α2-mediated repression, we quantified the levels of conservation across the α2 protein (Figure 3B). The α2 protein sequences from the hemiascomycete yeasts were divided into two groups: those that diverged prior to and those that diverged after the gain of α2 repression of the asgs. In Figure 3B, high scores indicate conservation of those residues in the species group, whereas low scores indicate unconserved regions. Regions where the scores for the two groups are dissimilar reflect positions within α2 that experienced different levels of purifying selection in these two groups.
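The two-group comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score used here (one minus normalized Shannon entropy per alignment column) is a common choice but is not necessarily the measure behind Figure 3B, and the function names, the `min_gap` cutoff, and the toy aligned sequences are all hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 - normalized Shannon entropy for one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # log2(20) is the maximum entropy for a 20-letter amino acid alphabet.
    return 1.0 - entropy / math.log2(20)


def conservation_trace(aligned_seqs):
    """Score every column of a set of equal-length, pre-aligned sequences."""
    return [column_conservation(col) for col in zip(*aligned_seqs)]


def divergent_positions(group_a, group_b, min_gap=0.2):
    """Indices where the two groups' conservation scores differ by at least min_gap."""
    trace_a = conservation_trace(group_a)
    trace_b = conservation_trace(group_b)
    return [i for i, (a, b) in enumerate(zip(trace_a, trace_b)) if abs(a - b) >= min_gap]


# Toy, made-up alignments (NOT real alpha2 sequences).
post_gain_group = ["MKLV", "MKLV", "MKLV"]  # hypothetical: diverged after the gain
pre_gain_group = ["MKAV", "MRSV", "MQTV"]   # hypothetical: diverged before the gain

print(divergent_positions(post_gain_group, pre_gain_group))
```

Columns that are invariant in one group but variable in the other are the ones flagged, which mirrors the logic used to single out candidate regions for further testing.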
Much of the α2 protein has similar levels of conservation between the clades. This includes the 60 amino acid homeodomain (which mediates the sequence specific DNA-binding) (Hall and Johnson, 1987) and the 15 amino acid region of α2 that interacts with a1 (Mak and Johnson, 1993). DNA-binding and the interaction with a1 are functions of α2 that are required in all the clades considered, and their high sequence conservation reflects their high functional conservation. The α2 conservation traces diverged at two regions within the α2 protein, regions 1 and 3 (Figure 3A–C). Both regions displayed high levels of conservation in the Saccharomyces-Kluyveromyces lineages and low levels in the Candida lineage, implicating these regions in the evolution of α2 repression of asgs. In fact, both regions are critical for α2 repression of the asgs in S. cerevisiae; region 1 is responsible for recruiting the general repressor Tup1 (Komachi et al., 1994), and region 3 forms the interaction with Mcm1 (Tan and Richmond, 1998; Vershon and Johnson, 1993). The importance of the evolution of the Mcm1 interaction region in α2 (region 3) to the evolution of asg repression is consistent with previous work using structural homology modeling (Tsong et al., 2006).
To test these predictions directly, we designed a series of genetic swaps between the C. albicans and S. cerevisiae α2 proteins. The S. cerevisiae α2 protein can be divided into five functional and structural regions (Figure 3A). We individually replaced each of these five regions of S. cerevisiae α2 with the homologous region of the C. albicans α2 protein and integrated (in single copy) the fusion proteins driven by the S. cerevisiae α2 promoter (Figure 3D). The ability of the modified α2 protein to repress expression was monitored using a reporter with an S. cerevisiae asg or haploid specific gene cis-regulatory site in the promoter.
As predicted by the bioinformatic analysis, replacement of S. cerevisiae region 1 (Tup1 interaction) or region 3 (Mcm1 interaction) by the equivalent C. albicans sequences eliminated asg repression. Also as predicted, the region 3 swap left intact the protein’s capacity for repression of the haploid specific genes. In contrast, the α2 functional region 1 swap protein (Tup1 interaction) failed to repress either the asg reporter or the haploid specific gene reporter (Figure 3D). Replacing either functional region 1 or 3 with aligning sequence from another species (Pichia pastoris) that diverged prior to the gain of α2 repression at the asgs gave similar results (Figure S1B). These observations show that the gain of asg repression required the creation of two new functional regions within α2—a region that interacts with Mcm1 and a region that interacts with Tup1. In contrast to these two regions, the rest of the S. cerevisiae α2 protein sequence could be swapped for the homologous sequence from C. albicans α2 without a substantial effect on asg repression (Figure 3D).
Was the acquisition of the Tup1 and Mcm1 interaction regions sufficient for α2 to acquire the capability to repress the asgs? We swapped these functional regions from S. cerevisiae α2 into the C. albicans α2 protein and measured the ability of these hybrids to repress an asg reporter. Neither region alone “rescued” the C. albicans protein; however, swapping both regions into C. albicans α2 together conferred the ability to repress the asg reporter onto the hybrid protein (Figure 3E). These results demonstrate that the failure of the C. albicans α2 protein to repress the asg reporter in S. cerevisiae reflects the inability of the protein to productively interact with both Tup1 and Mcm1. Consistent with this conclusion, swapping both of these regions into another Candida-group α2 protein (this one from P. pastoris) also conferred the ability to repress the asgs onto that hybrid protein (Figure S1C). In summary, while two regions of α2 (regions 4 & 5) have been functionally conserved over large evolutionary distances (Figure 3B & D), two other regions (regions 1 & 3) evolved more recently in the ancestor of the Saccharomyces/Kluyveromyces groups (Figure 3B–C). These two recent additions are sufficient for α2 to gain its new function. This analysis illustrates how the evolutionary history of the α2 protein gave rise to its modular structural organization.
We also determined whether changes in Mcm1—the binding partner of α2—contributed to the evolution of asg repression. To do this, we relied on ancestral gene reconstruction, an approach proven useful for testing evolutionary predictions (Thornton, 2004). The strategy depends on accurate protein alignments of the ortholog group of interest, followed by the calculation of amino acid probabilities at each position within the ancestral protein using a species or gene tree as a guide (Figure S2). Given the strong conservation of the Mcm1 MADS-box domain, all amino acid positions could be reconstructed within this domain with high accuracy in each ancestral protein. We synthesized a series of ancestral Mcm1 proteins and replaced the endogenous S. cerevisiae Mcm1 with them. Ancestral Mcm1 proteins dating back to the divergence of S. cerevisiae-C. albicans supported repression at levels equivalent to the modern S. cerevisiae Mcm1 (Figure S2). Thus, the gain of a new interaction between α2 and Mcm1 did not require changes in Mcm1. Instead, it appears that the evolution of the new protein-protein interaction was one-sided, with all the changes occurring in a short module of α2.
Although the evolution of new protein-protein interaction modules in α2 was critical for the rewiring of the asg network, the cis-regulatory sequences of the asgs also evolved to become efficiently recognized by the α2 protein (Figure 2A). The similarities and differences between the a2-regulated (ancestral) and α2-regulated (derived) asg cis-regulatory sequences have been described (Tsong et al., 2006). The most striking similarities are the presence of a binding site for Mcm1 and the close relationship between the cis-regulatory sequences recognized by a2 and α2. Despite belonging to different transcription regulator superfamilies (HMG domain for a2 versus homeodomain for α2), both proteins recognize a core TGT sequence, with the outer nucleotides differing in their respective binding sites (Figure 3G). A major difference between the two regulatory sequences is in their symmetries. The C. albicans a2-regulated asg binding sequence contains information specifying a2 binding on only one side of Mcm1. The S. cerevisiae α2 binding sequence, however, contains information on both sides of the Mcm1 binding site, specifying the binding of an α2 monomer on either side (Johnson and Herskowitz, 1985).
In our next set of experiments, we examined in more detail the differences between the a2 and α2 recognition sequences and how the ancestral a2 site evolved to be recognized by α2. We found that S. cerevisiae α2 could repress Kluyveromyces group species asg cis-regulatory sequences even though they varied significantly from the S. cerevisiae sites (Figure 3F). In fact, α2 efficiently repressed asg cis-regulatory sequences (such as Z. rouxii STE6 and K. lactis STE2) that contained precise a2 binding sites, as assessed by the Position Specific Scoring Matrix for a2 in the Candida clade (Figure 3G). In contrast, each asg cis-regulatory sequence from a Candida group species failed to be repressed by S. cerevisiae α2 (Figure 3F), even when α2 was overexpressed (Figure S3). Thus, the ancestral asg cis-regulatory sequences (recognized by a2) must have been converted to sites recognized by α2 along the Saccharomyces-Kluyveromyces lineage. To determine the minimum number of mutations necessary to convert an a2 site to a functional α2 site, we mutated three positions (positions 6, 26 and 27) in the C. albicans RAM2 cis-regulatory site to their counterparts in the S. cerevisiae consensus sequence. Mutation of two of these nucleotides generated a construct that could be repressed by S. cerevisiae α2 (Figure 3H). Neither of these positions is highly constrained within the Candida group (Figure 3F–G). This conversion could occur without compromising the ancestral, positive regulatory mode because both proteins recognize the same core sequence (TGT). Specific bases to the “left” of the core are required for efficient a2 binding while specific bases to the “right” are required by α2 (Figure 3F).
From these experiments we conclude that (1) Candida clade a-specific cis-regulatory sequences are recognized efficiently by a2, but not α2, (2) a small number of mutations (≤ 2) can convert an a2 site to an α2 site, and (3) these mutations occurred at positions that were likely under weak constraint in the ancestor.
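Scoring a candidate site against a Position Specific Scoring Matrix, as used above to assess a2 sites, can be sketched as follows. This is a minimal illustration, assuming a log-odds matrix built from aligned example sites with a uniform background and a pseudocount; the example sites are invented around a TGT core and are not the a2 or α2 matrices of Figure 3G, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per position) from equal-length example sites."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores for a site of the same width as the matrix."""
    return sum(column[base] for column, base in zip(pssm, site))


def best_window(pssm, sequence):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(pssm)
    windows = range(len(sequence) - width + 1)
    return max((score_site(pssm, sequence[i:i + width]), i) for i in windows)


# Invented example sites sharing a TGT core (placeholders, not measured binding sites).
hypothetical_sites = ["ATGTA", "CTGTA", "ATGTT", "ATGTA"]
pssm = build_pssm(hypothetical_sites)

print(score_site(pssm, "ATGTA") > score_site(pssm, "ATCCA"))
print(best_window(pssm, "GGGATGTACCC"))
```

In this framing, a site that scores well against two different matrices at once corresponds to the kind of intermediate sequence recognized by both regulators.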
It is simple to envision how a couple of mutations could “convert” a single ancestral asg cis-regulatory sequence into a sequence that can be recognized by α2. However, there are at least 7 asgs in each species. And, as we discussed above, targeting of α2 to asg cis-regulatory sequences also required the evolution of a new protein-protein interaction with Mcm1. How, then, did all of the gains required for this novel regulatory scheme arise? Did the Mcm1-α2 interaction evolve before or after the cis-regulatory changes? Or, did these events occur in concert?
To explore these questions, we mimicked two possible and extreme intermediate states in this evolutionary transition: the presence of the α2-Mcm1 protein-protein interaction without the cis-regulatory changes and the cis-regulatory changes without the α2-Mcm1 interaction. To create the first state, we replaced the S. cerevisiae asg reporter with an asg cis-regulatory sequence from the Candida clade (C. albicans RAM2). For the second state, we compromised the region of the S. cerevisiae α2 protein that binds Mcm1 by substituting it with the aligning sequence in the C. albicans protein. When the C. albicans RAM2 cis-regulatory sequence was tested with wild-type S. cerevisiae α2, we did not observe repression, even when α2 was overexpressed. However, when the Mcm1 interaction region was disrupted but the S. cerevisiae cis-regulatory sequence was used, we did observe repression when α2 was overexpressed (Figure 4A).
We next determined how the α2 protein lacking the Mcm1 interaction region could still repress an asg reporter, albeit weakly. In principle, either the “ancestral” α2 could bind the asg reporter independently of Mcm1 or Mcm1 could stabilize ancestral α2 binding through non-specific protein-protein interactions. To distinguish between the models, we tested for repression of an a-specific cis-regulatory sequence in which the Mcm1 cis-regulatory site was destroyed by mutation (Figure 4B). (Mcm1, an essential protein, cannot be deleted from the cell.) Using this reporter, overexpression of a modified α2 protein that lacks the Mcm1 interaction region failed to show any detectable repression (Figure 4B). Thus, it appears that the second model best accounts for our results: even before the evolution of a specific Mcm1-interaction region, binding of the “ancestral” α2 was stabilized by its proximity to Mcm1. These results suggest a model where the effects of fortuitous cis-mutations, which stabilized α2 binding to DNA, would have been amplified by the contribution of non-specific interactions with Mcm1 during the earliest steps in the evolution of α2 repression at the asgs.
We hypothesize that once a more optimized Mcm1-α2 protein interaction formed, α2 could have occupied cis-regulatory sequences that deviate from its preferred sequences. These types of sites may have occurred in intermediates and we modeled such an intermediate by mutating a single, key base pair in the S. cerevisiae STE2 cis-regulatory sequence. Even with a mutated α2 binding site, we find that when α2 is overexpressed, it can mediate repression, but only if the Mcm1 interaction region of α2 is present (Figure 4C). Thus, a protein-protein interaction with Mcm1 can stabilize the binding of α2 to imperfect cis-regulatory sequences; such sequences may have been present in early, evolutionary intermediates.
If these ideas are correct, then the changes in cis-regulatory sequences and the evolution of this new protein-protein interaction are linked and must have evolved together. An attractive feature of this co-evolution model is that the interaction energy needed for the α2 and Mcm1 proteins to occupy an asg cis-regulatory sequence can be distributed between the protein-protein and protein-DNA interactions, enabling all the asgs to come under weak influence by α2 and then tuned individually through changes in each gene’s cis-regulatory sequence.
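The idea that binding energy can be shared between protein-DNA and protein-protein contacts can be illustrated with a simple equilibrium occupancy calculation. The sketch below assumes that the two free-energy contributions are additive and that a single-site binding isotherm applies; every number in it is a hypothetical placeholder rather than a measured value, and the function name is invented for illustration.

```python
import math

RT = 0.593  # kcal/mol at roughly 25 degrees C


def occupancy(dg_dna, dg_protein, conc_molar):
    """Fraction of sites occupied, treating DNA and Mcm1 contacts as additive free energies.

    dg_dna and dg_protein are binding free energies in kcal/mol (more negative = tighter).
    """
    dg_total = dg_dna + dg_protein
    kd = math.exp(dg_total / RT)  # dissociation constant in M (1 M standard state)
    return conc_molar / (conc_molar + kd)


conc = 1e-8  # hypothetical regulator concentration, 10 nM

weak_site_alone = occupancy(-8.0, 0.0, conc)       # imperfect site, no partner contact
weak_site_with_mcm1 = occupancy(-8.0, -3.0, conc)  # same site plus a partner contact
strong_site_alone = occupancy(-11.0, 0.0, conc)    # optimized site, no partner contact

print(weak_site_alone, weak_site_with_mcm1, strong_site_alone)
```

Under these assumptions, a weak site supplemented by a protein-protein contact reaches the same occupancy as a stronger site bound alone, which is the sense in which the required energy can be distributed between the two kinds of interaction.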
The experiments described here and by Tsong et al. (2006) indicate that the control of asg expression passed through a hybrid regulatory state in which positive control by a2 and negative control by α2 operated together. One can envision two, non-mutually exclusive types of such hybrid regulation. In the first, a given asg would be both repressed by α2 in α cells and activated by a2 in a cells. In the second, regulation would be at the network level; some asgs would be activated by a2 in a cells and other asgs would be repressed by α2 in α cells. Both types of hybrid regulation would ensure that each asg is expressed only in a cells. We next investigated the possibility that some form of hybrid regulation still exists in modern species. We chose to examine L. kluyveri and K. wickerhamii because both have an intact a2 gene (Butler et al., 2004), and the α2 protein of both species is able to repress a S. cerevisiae asg cis-regulatory site (Figures 1A and 2B).
In L. kluyveri, a genome-wide ChIP of a2 was performed in a cells (Figure 5A, C, E and S4). Ten peaks of a2 binding met our enrichment cut-offs, and six of these peaks were upstream of genes whose orthologs are asgs in either C. albicans or S. cerevisiae (AGA2, ASG7, AXL1, BAR1, STE2, and STE6) (Galgoczy et al., 2004; Tsong et al., 2003). To determine if these genes and the genes associated with the remaining four peaks are expressed in an a-specific pattern, RT-qPCR was performed using wild-type a cells and wild-type α cells (Figure S5A). We also tested the gene RAM1 because RAM1 is an asg in C. albicans (Tsong et al., 2003), and its peak of a2 binding fell just below our significance threshold. Using these data, we defined the following nine genes as L. kluyveri asgs: AGA1, AGA2, ASG7, AXL1, BAR1, RAM1, STE2, STE6, and STE14. Two of these genes, STE14 and AGA1, are asgs in L. kluyveri but not in either S. cerevisiae or C. albicans; the others are asgs in at least two of the three species. (Three genes associated with a2 binding in L. kluyveri (ELA1, TID3, and SAKL0E14784g) did not show asg expression under any condition we tested and were excluded from further tests.) Transcript levels of all nine L. kluyveri asgs were decreased when a2 was deleted (ΔMATa2), indicating that a2 activates these genes by binding to their cis-regulatory sequences (Figure 5G).
Next, genome-wide ChIP of myc-tagged α2 in α cells was used to ascertain its role, if any, in the regulation of asgs in L. kluyveri (Figure 5B, D, F and Figure S4). In α cells, binding peaks were observed upstream of two genes—the asgs AGA1 and AGA2 (Figure 5B and D). These peaks are centered over the same region of DNA as the a2 binding peaks observed in a cells, showing that the two regulators associate with the same region of DNA but in different cell types. This result is consistent with the analysis described above showing that the two regulators have overlapping DNA binding specificities and each forms a protein interaction with Mcm1 (Figure 3G). To test whether AGA1 and AGA2 are repressed by α2, we performed RT-qPCR in wild-type α cells and in α2-deletion α cells (ΔMATα2) (Figure 5H). The transcript abundance of both of these genes increased, indicating that α2 represses these genes in α cells. The remaining seven asgs were also tested by RT-qPCR and determined not to be targets of α2 repression in these conditions (Figure 5H). Taken together, these results indicate that all nine of the L. kluyveri asgs are targets of direct a2 activation in a cells and that two of them are also targets of direct α2 repression in α cells. Thus, in L. kluyveri, two of the asgs are regulated in a hybrid fashion. The results also show that, for these two genes, a2 and α2 act through association with the same DNA sequence in the two cell types.
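The logic used to assign a regulatory mode to each gene from the two deletion experiments can be written out explicitly. The following is a hedged sketch only: the log2 fold-change threshold, the gene labels, and the numeric values are hypothetical placeholders and are not data from Figure 5, and the function name is invented for illustration.

```python
def classify_mode(log2fc_a2_deleted, log2fc_alpha2_deleted, threshold=1.0):
    """Assign a regulatory mode from two deletion experiments.

    log2fc_a2_deleted: log2(a2-deletion a cells / wild-type a cells)
    log2fc_alpha2_deleted: log2(alpha2-deletion alpha cells / wild-type alpha cells)
    The threshold is an arbitrary illustrative cutoff.
    """
    activated_by_a2 = log2fc_a2_deleted <= -threshold        # expression falls without a2
    repressed_by_alpha2 = log2fc_alpha2_deleted >= threshold  # expression rises without alpha2
    if activated_by_a2 and repressed_by_alpha2:
        return "hybrid"
    if activated_by_a2:
        return "a2-activated"
    if repressed_by_alpha2:
        return "alpha2-repressed"
    return "unregulated"


# Placeholder values for four made-up genes (not measurements).
hypothetical_measurements = {
    "geneA": (-3.0, 2.5),
    "geneB": (-2.0, 0.1),
    "geneC": (0.2, 1.8),
    "geneD": (0.0, -0.3),
}

modes = {gene: classify_mode(*fc) for gene, fc in hypothetical_measurements.items()}
print(modes)
```

A gene is called hybrid only when both criteria are met, which corresponds to the first type of hybrid regulation described earlier, where a single gene is both activated by a2 and repressed by α2.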
The other species chosen for this analysis, K. wickerhamii, is described in Figure S6. The results indicate that at least two asgs are regulated in a hybrid fashion in K. wickerhamii. We note that the genes that are hybrid-regulated in K. wickerhamii are not the same genes that are hybrid-regulated in L. kluyveri (summarized in Figure 7C).
In addition to changes in the overall form of regulation, we find that the asg network has gained and lost individual target genes over the hemiascomycete lineage. We believe this can be accounted for by the formation and destruction of cis-regulatory sequences. For instance, we found that STE14 is an asg in L. kluyveri but not in the other species examined and that AXL1 is an asg in many species but not S. cerevisiae (Figure 7C; Tables S2 and S3; Booth et al., 2010; Galgoczy et al., 2004; Tsong et al., 2003).
The dairy yeast K. lactis diverged from S. cerevisiae after the gain of asg repression, and it retains many of the cis and trans characteristics indicative of a hybrid form of regulation where both a2 and α2 are active (Tsong et al., 2006). Yet, as noted above, the K. lactis α2 protein is unable to repress the asgs when moved into S. cerevisiae (Figure 2B–C).
To determine whether α2 represses the asgs in K. lactis itself, we utilized gene expression profiling to compare transcript levels of wild-type a and wild-type α cells to Δa2 a cells and Δα2 α cells, respectively. Deletion of α2 in α cells did not have an effect on transcript levels of any of the K. lactis asgs (Figure 6E and Figure S5B) nor did it affect the expression of other genes in K. lactis (data not shown). We confirmed this result by measuring transcript levels of asgs by RT-qPCR (data not shown). In contrast, deleting a2 in a cells resulted in decreased expression of nearly all of the K. lactis asgs (Figure 6E). Consistent with these results, a2 was found to be bound upstream of the K. lactis asgs (Figure 6A, C and data not shown) but α2 binding was not detected at the asgs or any other gene in α cells (Figure 6B, D and data not shown). (As a control, K. lactis α2 binding is observed at the haploid specific genes when α2 and a1 are expressed together (Booth et al., 2010).) Thus, although K. lactis has many of the hallmarks of hybrid regulation (in particular, its asg cis-regulatory sequences support repression by S. cerevisiae α2; Figure 2A), α2 does not repress the asgs in this species.
Comparison of the α2 sequences from multiple species pointed to a likely cause of the inability of the K. lactis α2 to repress the asgs: amino acid residue 136 in K. lactis is an asparagine, but in all repressing-competent α2 proteins it is a small, hydrophobic residue, either a valine or leucine (Figure 3C). This position has been shown to be important for the interaction between α2 and Mcm1 (Mead et al., 1996; Tan and Richmond, 1998). Using the S. cerevisiae reporter assay, we tested this idea explicitly and found that mutating this single residue in the K. lactis α2 protein to a valine (N136V) restored its function as a repressor (Figure 6G). The simplest interpretation of these observations is that the K. lactis α2 protein recently acquired a mutation that compromised its ability to interact with Mcm1 thereby destroying the derived (repression) mode of asg regulation and reverting to the ancestral (positive) mode. The evolutionary path by which this amino acid substitution likely occurred is explored in detail in Figure S7.
The regulation of a set of cell-type specific genes, the asgs, has changed over evolutionary time in the hemiascomycete branch of the fungal lineage. Based on data from numerous approaches, we describe the likely evolutionary path for the change in the mechanism by which the asgs are regulated. We provide strong experimental evidence for an intermediate hybrid regulatory state in which a2 and α2 both participated in the cell-type regulation of the asgs, and we show that this hybrid state resolved in several distinct ways along the lineages to modern species, generating a diversity of network structures (summarized in Figure 7A).
The gain of α2 repression at the asgs required that α2 navigate a constrained regulatory landscape. As a result, this evolutionary path exploited multiple features of the existing network that both stabilized early intermediates and limited the number of mutations required to evolve this new function. We also show that protein modularity minimized the pleiotropy of the evolved features of the new regulatory mode. This work provides both a mechanistic account of how a particular transcription regulator evolved a new function and insights into the molecular origins of the extraordinary flexibility of transcriptional regulatory network architectures that appear across modern species.
In this discussion we first outline the key features of the ancestral network that were exploited (that is, exaptations) in the evolution of α2-repression of the asgs. We next discuss the concerted changes in the cis-regulatory sequences and the trans regulators that enabled formation of the new mode of regulation. Third, we consider the consequences of the intermediate hybrid regulatory state and its role in the network diversity observed in modern species. Finally, we discuss the relative importance of adaptation and neutral drift to the diversification of gene regulatory networks.
Several key features of the derived form of regulation (repression of the asgs) were in place prior to its evolution. For instance, the new mode of regulation requires that the repressor be expressed in α and a/α cells, but not in a cells. For α2, this is true for virtually every species in the hemiascomycetes and reflects its deeply conserved function: it forms a heterodimer with a1 to regulate the haploid specific genes in a/α cells (Booth et al., 2010; Strathern et al., 1981; Tsong et al., 2003). Thus, the expression pattern necessary for α2 to act as a repressor of the asgs was already present in the ancestor.
In contrast to the popular model whereby new cis-regulatory sequences arise de novo in unused regions of promoters, α2 exploited features of the existing asg cis-regulatory sequences (Tsong et al., 2006). The monomers of a2 and α2 have related DNA-binding specificities (Figure 3G) despite belonging to different transcription regulator families (HMG box vs. homeodomain, respectively). This intrinsic overlap in DNA-binding specificities minimized the number of cis-regulatory mutations required for the transition: only two point mutations are required to convert an optimal a2 recognition sequence to an optimal α2 recognition sequence (Figure 3H). Moreover, we have shown that sequences exist in modern species that are efficiently recognized by both proteins (Figures 5, S4 and S6), thus further reducing the potential fitness barriers to this transition.
In addition to the exploitation of a2 cis-sequences, the binding of α2 to the ancestral sequences was stabilized by the presence of a neighboring DNA-bound protein, Mcm1. We provide evidence for a model where the ancestral presence of Mcm1 at the cis-regulatory sites of the asgs stabilized α2 DNA binding in early evolutionary intermediates through weak, relatively non-specific protein-protein contacts (Figure 4A and B). Subsequently, the protein-protein interaction became stronger and more specific through changes in the α2 protein, which stabilized the binding of Mcm1 and α2 to each other and to DNA. We have shown that the evolution of this specific interaction between Mcm1 and α2 was asymmetric: the α2 protein underwent numerous changes in a previously unconstrained region allowing it to recognize an existing surface of the ancestral Mcm1; therefore, no changes were necessary in Mcm1 (Figure 3B–E). Thus, from the earliest steps in this evolutionary transition, the interaction energy necessary to stabilize α2 binding was shared out between protein-protein and protein-DNA contacts. The exploitation of ancestral cis and trans features strongly guided the evolutionary trajectory of α2 (through stabilizing early intermediates) by minimizing the number of changes necessary.
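The idea that binding energy can be distributed between protein-DNA and protein-protein contacts can be illustrated with a simple equilibrium sketch. The following is only a toy calculation under standard single-site binding assumptions; the function names and all numerical values (dissociation constant, interaction free energy, protein concentration) are hypothetical and are not measurements from this study.

```python
import math

# Approximate value of RT at 25 °C, in kcal/mol.
RT_KCAL_PER_MOL = 0.593


def effective_kd(kd_dna_nM, ddg_pp_kcal):
    """Effective dissociation constant for DNA binding when a neighbouring
    DNA-bound protein contributes a contact of free energy ddg_pp_kcal
    (negative values are favourable and lower the effective Kd)."""
    return kd_dna_nM * math.exp(ddg_pp_kcal / RT_KCAL_PER_MOL)


def occupancy(protein_nM, kd_nM):
    """Fractional occupancy of a single site at equilibrium."""
    return protein_nM / (protein_nM + kd_nM)


# Hypothetical numbers: a weak protein-DNA interaction on its own...
weak_site_kd = 1000.0  # nM, assumed
alone = occupancy(50.0, weak_site_kd)

# ...and the same site with a modest protein-protein contact (assumed -1.5 kcal/mol).
assisted = occupancy(50.0, effective_kd(weak_site_kd, -1.5))
```

In this toy setting, even a modest favourable protein-protein contact raises occupancy at a suboptimal DNA site, which is the qualitative point made above about stabilization of early intermediates by Mcm1.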
Although several key network features needed for the evolution of α2-repression of the asgs were already present in the ancestor, changes in both the cis-regulatory sequences and the α2 protein needed to occur for efficient asg repression. The gain and loss of cis-regulatory sequences are readily acknowledged as major contributors to evolutionary novelty, but changes in the transcription regulators themselves are often described as less prevalent, particularly in the absence of gene duplication (Carroll, 2005; Wray, 2007). For example, it is frequently said that changes in transcription regulators will tend to be rare because they are pleiotropic—affecting the regulation of many genes simultaneously and likely disrupting existing networks.
The gain of function of α2 described here occurred within the context of a pre-existing, deeply conserved regulatory landscape: the regulation of the haploid specific genes by the a1-α2 heterodimer (Booth et al., 2010; Herskowitz, 1989; Hull and Johnson, 1999). The modularity of the α2 protein made it possible to gain a new function (repression of the asgs) without compromising its ancestral function (repression of the haploid specific genes). Indeed, it seems likely that the only permissible evolutionary trajectories for the α2 protein to gain a new function would require that its ancestral function be preserved. How did this occur?
Two regions of the α2 protein (the DNA-binding homeodomain and the a1 interaction region) are needed for its ancestral function and are preserved, in sequence and function, through stabilizing selection across the entire hemiascomycete lineage (Figure 3B and D). The protein modules that more recently evolved to make asg repression possible (regions 1 and 3, Figure 3B, C, and E) are short stretches of roughly 10 amino acids that developed within unconstrained regions of the ancestral protein (Figure 3B and C). The evolution of short, linear protein interaction regions spatially isolated from the ancestral functions bypassed the potential pleiotropic constraints on regulator evolution. We note that the gain of new functional modules in unused portions of the ancestral protein is akin to the acquisition of new cis-regulatory sequences at unconstrained positions in non-coding sequence. More generally, the modular structure of modern transcription regulators is likely the result of the sequential addition of new functions in previously unconstrained regions of the proteins, as described here.
As we have described, the path to the gain of α2-repression of the asgs occurred while the ancestral form of a-specific regulation (activation by a2) was still extant (Tsong et al., 2006). Thus, both forms of regulation existed together in the ancestor of the Kluyveromyces and Saccharomyces clades. We propose that this hybrid regulatory intermediate made possible the subsequent diversification of the asg regulatory network architectures without a loss in regulation. Based on evidence from several modern species, we found that the hybrid regulatory state has diversified (resolved) in three directions: reversion to the ancestral activation mode (as in K. lactis), resolution to the derived repression mode (as in the lineage leading to S. cerevisiae), and retention of the hybrid state (as in L. kluyveri and K. wickerhamii).
We suggest that hybrid regulatory states, such as the state described here, represent ‘high potential states’ for evolutionary change, as they have the ability to resolve in several directions without destroying the overall logic of regulation (Figure 7B). Akin to gene duplication, the formation of a hybrid regulatory state generates a partially redundant intermediate that allows for diversification without a loss of the original function or regulatory logic (Tanay et al., 2005). Within the hybrid regulatory state, network reversion remains a permissible evolutionary trajectory. The reversion to an ancestral regulatory mode that we have described in K. lactis is not a strict molecular reversal. Instead, the K. lactis α2 protein acquired a mutation that inactivates the derived function while maintaining its ancestral function, the repression of haploid specific genes as a heterodimer with a1.
Our results also show that, over the evolutionary time period considered in this paper, a subset of asgs moved in and out of the network through the gains and losses of cis-regulatory sequences (summarized in Figure 7C). Although some genes are expressed a-specifically in all species (e.g. those encoding pheromones and pheromone receptors), others are not. This implies that for the asgs to undergo a transition from one regulatory mode to another, not all genes within the network would need to experience this switch in regulation. The looser requirements for the regulation of some genes in a network may facilitate changes in the mode of regulation of a network, as not all genes would have to be carried along during the initial phases of the switch.
Selection can only act on the output of a transcription regulatory network; if an evolutionary path exists between different regulatory architectures with near-identical spatial pattern, dynamic range, and kinetics of expression, then the network can be predicted to drift between these different solutions over evolutionary time (Lynch, 2007). The hybrid state we have described spawned a range of evolutionary outcomes (activation, repression or hybrid), each with a different regulatory circuit architecture. In all cases, however, the overall logic of regulation (asgs ON in a cells and OFF in the other two cell types) has been preserved. It is possible that each of the different forms of regulation we observed produces a different dynamic range or kinetics of expression and that these qualities have been selected for on a gene-by-gene basis as different yeast species diversified. However, we favor the simpler model where the regulatory diversification following the formation of the hybrid regulatory state occurred largely through neutral, non-adaptive drift. In other words, the network could drift between states where the dynamic range of regulation generally remained the same but the relative contributions of the ancestral and derived modes differed through the strengthening and weakening of protein-protein and protein-DNA interactions. The range of network structures observed in modern species would simply reflect the “breathing” of the hybrid regulatory network.
In contrast to the neutral model we favor for network diversification from the hybrid state, we currently favor the idea that the formation of the hybrid state was itself adaptive. For one thing, the gain of asg repression to form the hybrid state required a reasonably large number of mutational events, both in cis and trans. For instance, the gain of two new protein interaction modules within α2 (one for Tup1 and one for Mcm1) involved more than two dozen amino acid changes. It seems unlikely that such a large number of amino acid changes, together producing a new biochemical function, could have reached fixation without directional selection. We cannot know for certain what adaptive value the invention of asg repression had, if any, for the ancestor of the Kluyveromyces and Saccharomyces clades. However, in the supplemental text, we discuss a possible scenario in which the gain of repression at this gene set may have been a necessary regulatory response to another newly evolved trait in this ancestor, the gain of silent mating cassettes (Butler et al., 2004). These arguments are not conclusive, but they are consistent with the idea that positive selection played a role in the gain of α2 repression of the asgs and the formation of the hybrid intermediate, and that the subsequent circuit diversification was nonadaptive.
Irrespective of the potential role of selection, a hybrid regulatory state can be short-lived (as in the ancestor of S. cerevisiae) or exceedingly long-lived (as in L. kluyveri and K. wickerhamii). We propose that the creation of hybrid regulatory states serves as a general model to rationalize the many examples of network-wide transcriptional regulatory divergence that have been observed among species.
Orthologs of experimentally identified asgs (Galgoczy et al., 2004; Tsong et al., 2003) were identified and confirmed using BLAST. To identify a Position Specific Scoring Matrix (PSSM) for α2-repression (derived), we submitted to MEME the 600 base pairs upstream of the asgs from S. cerevisiae, Saccharomyces mikatae, Saccharomyces paradoxus, and Saccharomyces bayanus. Similarly, sequences from C. albicans, Candida dubliniensis, and Candida tropicalis were used to calculate a PSSM for a2-activation (ancestral). The 600 base pairs upstream of each asg were scanned using MAST (Bailey et al., 2009) to identify the asg cis-regulatory sequences in all genome-sequenced hemiascomycetes. See Extended Experimental Procedures for details.
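The general principle of scanning an upstream region with a PSSM can be sketched as follows. This is a minimal illustration and not the MEME or MAST implementation, which additionally perform motif discovery and statistical scoring; the function names, the uniform background, the pseudocount, and the toy sequences are all assumptions made for the example and do not represent real asg motifs.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from aligned,
    equal-length binding sites, assuming a uniform background."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return a list of (offset, score)."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical aligned sites and upstream sequence (toy data only).
toy_sites = ["CATGTA", "CATGTT", "CATGAA", "CTTGTA"]
toy_pssm = build_pssm(toy_sites)
toy_upstream = "GGGCATGTACCC"
best_offset, best_score = max(scan(toy_upstream, toy_pssm), key=lambda hit: hit[1])
```

In practice, the scan would be applied to both strands of each 600 base pair upstream region and hits would be filtered by a significance threshold, as dedicated tools do.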
A complete list of all strains used in this study can be found in Table S5. The primers used to generate and confirm these strains are listed in Table S6. For details regarding strain and plasmid construction see Extended Experimental Procedures.
β-galactosidase assays were performed using a standard protocol (Guarente and Ptashne, 1981). Strains were grown in selective media to maintain transformed plasmids. For each strain, colonies were grown overnight, diluted, and allowed to reach late log phase. Cells were harvested and permeabilized, and activation assays were performed.
α2 orthologs were aligned using MUSCLE (Edgar, 2004). The genetic diversity spanned by the Saccharomyces-Kluyveromyces and Candida clades is similar (Taylor and Berbee, 2006); however, we removed from our analysis a subset of closely related sequences from the Saccharomyces-Kluyveromyces species to normalize the levels of conservation between the two groups. The displayed amino-acid conservation was calculated using the PAM250 amino-acid substitution matrix (Henikoff and Henikoff, 1992). The displayed curve (Figure 3B) has been smoothed by averaging each conservation score with the scores of adjacent residues. See Extended Experimental Procedures for details.
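A per-column conservation score followed by neighbor averaging can be sketched as below. This is an illustrative outline rather than the procedure used for Figure 3B: the identity-based `toy_score` function is a stand-in for a substitution matrix lookup, the mean-of-pairs scoring scheme and the smoothing window (`half_width=1`) are assumptions, gaps are not handled, and the alignment is toy data.

```python
from itertools import combinations


def toy_score(a, b):
    """Stand-in for a substitution matrix lookup: 1 for identity, 0 otherwise."""
    return 1.0 if a == b else 0.0


def column_conservation(alignment, score=toy_score):
    """Mean pairwise score for each column of an alignment
    (a list of equal-length strings, one per ortholog)."""
    n_cols = len(alignment[0])
    pairs = list(combinations(range(len(alignment)), 2))
    scores = []
    for col in range(n_cols):
        total = sum(score(alignment[i][col], alignment[j][col]) for i, j in pairs)
        scores.append(total / len(pairs))
    return scores


def smooth(scores, half_width=1):
    """Average each score with its neighbors within half_width positions;
    windows are truncated at the ends of the sequence."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_width)
        hi = min(len(scores), i + half_width + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical three-sequence alignment (toy data only).
toy_alignment = ["MKVL", "MKIL", "MRVL"]
raw = column_conservation(toy_alignment)
smoothed = smooth(raw)
```

Replacing `toy_score` with a lookup into a real substitution matrix would give graded scores for conservative substitutions instead of the all-or-nothing identity score used here.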
RNA was isolated from yeast cultures using hot phenol/chloroform extraction. cDNA was prepared using SuperScript II (Invitrogen). Additional details can be found in the Extended Experimental Procedures.
K. lactis cDNA was hybridized to a custom Agilent array. All data have been deposited in NCBI GEO under accession number GSE39027. cDNA labeling, hybridization and data analysis are described in the Extended Experimental Procedures.
C-terminally myc-tagged a2 and α2 proteins were created for ChIP. Tagged (experimental) and untagged (control) strains were grown, harvested and lysed. Chromatin was precipitated with commercially available anti-myc or anti-HA antibodies. The DNA was amplified, labeled and competitively hybridized to custom Agilent tiling oligonucleotide arrays. Display, analysis and identification of binding events were performed with MochiView (Homann and Johnson, 2010). Details are found in the Extended Experimental Procedures. Data have been deposited in NCBI GEO under accession numbers GSE38919 for K. lactis and GSE39007 for L. kluyveri.
A complete list of all primers used for qPCR is found in Table S6.
We are grateful to Chiraj Dalal, Clarissa Nobile, and Brian Tuch for critically reading the manuscript and Jennifer Garcia, Victor Hanson-Smith, Oliver Homann, Quinn Mitrovich, and Brian Tuch for their generous and helpful advice concerning experimental and computational procedures. We thank Peter Philippsen for kindly sharing genomic DNA and sequences from Ashbya gossypii. This work was supported by R01 GM057049 from the National Institutes of Health.
C.R.B. performed reporter assays and the Mcm1 ancestral gene reconstruction. L.N.B. performed experiments in L. kluyveri and K. wickerhamii. L.N.B. and T.R.S. performed K. lactis experiments. Data were analyzed and computational experiments performed by C.R.B., L.N.B., and T.R.S. All authors contributed to the design of the study and wrote the paper.
The authors declare no conflict of interest.