Regulatory divergence is likely a major driving force in evolution. Comparative genomics is being increasingly used to infer the evolution of gene regulation. Ascomycota fungi are uniquely suited among eukaryotes for regulatory evolution studies, due to broad phylogenetic scope, many sequenced genomes, and tractability of genomic analysis. Here we review recent advances in the identification of the contribution of cis and trans factors to expression divergence. Whereas current strategies have led to the discovery of surprising signatures and mechanisms, we still understand very little about the adaptive role of regulatory evolution. Empirical studies including experimental evolution, comparative functional genomics and hybrid and engineered strains are showing early promise toward deciphering the contribution of regulatory divergence to adaptation.
Divergence in the regulatory mechanisms that control gene expression has been repeatedly postulated to play a major role in evolution. Regulatory differences between species are known across a wide range of taxa, including bacteria, fungi, flies [3,4], and mammals. However, the mechanisms through which regulatory systems evolve are still poorly understood, and in most cases the adaptive importance of regulatory changes is unknown.
Comparative genomics approaches based on whole-genome sequences of diverse organisms are being increasingly used to infer the evolution of gene regulation. These studies rely on two main strategies: (1) computational comparison of cis-regulatory organization between promoters of orthologous genes, and (2) comparative functional analysis of mRNA profiles and TF-promoter interactions measured across different organisms. The latter empirical approach, while less prevalent, is gaining increasing attention and beginning to shed light on the relation between sequence evolution, changes in gene expression and adaptation.
The Ascomycota fungi are a particularly suitable group for studies of regulatory evolution. A large number of eukaryotic species with characterized life styles belong to this monophyletic group, which spans at least 300 million years of evolution (Figure 1A). These include two extensively studied model organisms, Saccharomyces cerevisiae and Schizosaccharomyces pombe, as well as important human pathogens, such as Candida albicans. Many organisms in this group are easy to grow in the lab, and are amenable to genetic manipulation and environmental perturbations, allowing us to effectively delineate the molecular mechanisms underlying biological responses. Ascomycota genomes are small and compact enough to be computationally tractable, while still having many of the hallmarks of a eukaryotic system, thus providing an excellent model.
An unprecedented amount of genomics knowledge has been accrued on Ascomycota. On the one hand, the molecular systems in the model organism S. cerevisiae have been studied using a wide range of genomics tools, from extensive transcription profiling studies (over 2000 profiles available), through single-cell proteomics, to large-scale screens of protein and genetic interactions [8,9]. On the other hand, sequencing and extensive analysis of over 100 genomes have delineated functional elements in specific genomes as well as global phylogenetic trends. In particular, a whole genome duplication (WGD) event has occurred in the phylogeny [10,11], and sequenced genomes are available from before and after this important event.
Finally, strong evidence suggests that regulatory changes were associated with divergence in major physiological responses among Ascomycota. For example, although central carbon metabolism follows the same general outline in all yeasts, important biochemical, genetic and regulatory variations exist. Some species, including S. cerevisiae and close relatives, follow a respiro-fermentative growth during aerobic growth on glucose (characterized by high glucose uptake, high ethanol secretion rate and low biomass yield); whereas other species (e.g. Kluyveromyces) favor respiratory growth in the same conditions (low glucose uptake and high biomass yield). A shift to a respiro-fermentative lifestyle has occurred more than once in the phylogeny, most notably following the whole genome duplication event [12,13] and independently in Schizosaccharomyces . These metabolic differences are also accompanied by divergence of gene regulation, including the introduction of a host of glucose-dependent repressive mechanisms on respiratory metabolism, the differential transcriptional regulation of isozymes , and the repression of mitochondrial biogenesis genes in log phase growth [15,16].
In this review, we focus on recent advances made in understanding the evolution of gene regulation in Ascomycota, from short evolutionary timescales (hundreds of generations to 5 million years) that are typical of intra-species variation, to long timescales spanning tens of millions of years and involving extensive adaptive radiation and speciation. We examine conservation and divergence at three levels of study. First, we wish to characterize and quantify the key evolutionary signatures that are observed in these species. Second, we wish to understand the molecular mechanisms, in cis and in trans, underlying these signatures. Finally, we wish to understand the relative role of neutral changes and selection in shaping conservation and divergence of regulatory systems. As we show, whereas current strategies have led to the discovery of surprising signatures and mechanisms, we still understand very little about the adaptive role of regulatory evolution. Empirical studies including experimental evolution, comparative functional genomics and hybrid and engineered strains are showing early promise toward deciphering the contribution of regulatory divergence to adaptation.
Expression profiles collected across organisms allow us to determine the extent of conservation or divergence in the mRNA levels and regulation across orthologous genes. Within Ascomycota, large compendia of mRNA profiles exist for the model organisms S. cerevisiae, S. pombe and C. albicans, whereas smaller datasets are available for other Ascomycota (e.g. other Saccharomyces, C. glabrata, K. lactis, and some Euascomycota [20,21]) as well as different S. cerevisiae strains [22–26]. Using such profiles, together with a good mapping of groups of orthologous genes, we can determine the degree of expression divergence (ED), a quantitative measure of the differences in the expression of a pair of orthologs between two species.
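To make the notion of an ED score concrete, the following is a minimal sketch of one possible way to compute it. The text does not specify a formula, so the choice here (one minus the Pearson correlation of the two orthologs' profiles across matched conditions) is an assumption for illustration only; the gene names and log-ratio values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


def expression_divergence(profile_a, profile_b):
    """Assumed ED score: 1 - r.

    0 means the two orthologs respond identically across conditions;
    2 means their responses are perfectly anti-correlated.
    """
    return 1.0 - pearson(profile_a, profile_b)


# Hypothetical log2 expression ratios over four matched conditions,
# one profile per species for each ortholog pair.
orthologs = {
    "growth_gene": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "stress_gene": ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]),
}
ed = {name: expression_divergence(a, b) for name, (a, b) in orthologs.items()}
```

A correlation-based score compares the shape of the response rather than absolute mRNA levels, so it requires that both species be profiled under comparable conditions; other definitions (for example, distances on normalized levels) would be equally legitimate.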
Interestingly, divergence in the expression of gene orthologs follows a broad functional and evolutionary dichotomy [25–28]: genes with conserved expression typically encode proteins involved in growth control and general metabolism (‘growth branch’), whereas those with divergent expression are often subtelomeric, responsive to external and internal signals (e.g. stress response) and are nonessential. This dichotomy in variation is preserved at multiple levels: from variation in isogenic cells in a population , through genetic variants of S. cerevisiae [24,26,30], to different species in the sensu stricto clade . Furthermore, it is also reflected by concomitant constraints on copy number variation at great phylogenetic distances : genes from the low-ED ‘growth’ branch have few duplication and loss events, whereas those in the high-ED ‘stress and metabolism’ end are volatile, and experience substantial variation in copy number between species. For low-ED genes, this suggests a strong selective pressure and functional constraint on the specific amount of gene products in the cell. For high-ED genes, it is tempting to conversely suggest a pressure for flexibility in gene regulation. However, this conclusion must be interpreted with care as we discuss below.
Both cis and trans regulatory mutations/polymorphisms can contribute to expression divergence. A genetic change can affect expression directly in cis, by altering transcription factor binding sites in the promoter region, changing chromatin organization or affecting mRNA stability, or indirectly, by modifying the activity of the gene product and causing expression changes through feed-back control. Alternatively, a polymorphism in one gene can affect the expression of other genes in trans (Figure 1B).
Many cis-regulatory elements are conserved in closely related species. In some cases, the specific site and its location in the promoter is conserved, a feature exploited for motif identification using alignments of orthologous regulatory regions [31,32]. In other cases, gain and loss of cis-regulatory motifs, and the potential for corresponding changes in transcription factor binding, occur on relatively short time scales (on the order of 5 – 20 my), both within and between species [28,33–36]. Doniger et al. [33,34] estimated that, of the lineage-specific binding-site losses within sensu stricto Saccharomyces, over half correspond to newly emerged binding sites in the same regulatory regions. Turnover of one binding site in a promoter for a functionally equivalent one can explain how gene expression can be maintained despite change in regulatory sequences.
In other cases the apparent loss of a binding site corresponds to loss of TF control and a change in gene expression pattern. For example, most species have enriched Rapid Growth Elements (RGE, AATTTT) upstream of all ribosomal proteins (RP), but post-WGD species that can decouple fermentation from respiration have lost the RGE sites upstream of mitochondrial RP genes , consistent with the loss of coregulation of mitochondrial function and cell growth.
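The kind of comparison behind such observations can be sketched as a simple motif scan over orthologous promoters. This is an illustrative sketch only: the promoter sequences and species labels below are invented, and real analyses use position weight matrices and background models rather than exact string matching. Only the RGE consensus (AATTTT) is taken from the text.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_motif(promoter, motif):
    """Count exact motif matches on both strands, allowing overlaps.

    A window matching the motif or its reverse complement counts once.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(
        1 for i in range(len(promoter) - k + 1) if promoter[i:i + k] in targets
    )


RGE = "AATTTT"  # consensus quoted in the text

# Hypothetical promoter fragments of one orthologous gene in two species.
promoters = {
    "species_with_site": "GCGCAATTTTGCGCAAAATTGC",
    "species_without_site": "GCGCGCGCGCGCGCATGCGCGC",
}
hits = {name: count_motif(seq, RGE) for name, seq in promoters.items()}
```

Here the first fragment carries one forward and one reverse-strand match, while the second carries none, which is the pattern one would look for when asking whether a lineage has lost a site upstream of a given gene class.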
More generally, several promoter components can affect the expression plasticity of a gene, including point mutations in binding sites, sequence features affecting its chromatin structure [16,17,27], and the presence of unstable tandem repeats. For example, the dichotomy between high- and low-ED genes discussed above also corresponds to distinct chromatin organization and transcriptional mechanisms. The promoters of genes with conserved expression (low-ED) have well-positioned nucleosomes, and most of their regulatory elements are situated within a substantial nucleosome-free region (NFR). Their transcription is TATA-independent and they are less susceptible to chromatin remodeling. Conversely, high-ED genes are associated with promoters with more distributed nucleosomes; their transcription is TATA-dependent and more sensitive to chromatin remodeling. Furthermore, the sensitivity of a gene’s expression level to mutation increases in the presence of a TATA box. Interestingly, recent studies have shown that promoters of high-ED genes are also associated with the presence of unstable tandem repeats, and that changes in such repeats may drive changes in nucleosome organization and gene expression. The promoters of these genes are also enriched for TF binding sites, resulting in more potential for combinatorial interactions, which have been proposed to enhance evolutionary divergence.
Notably, these promoter features are associated with expression variability both between isogenic cells and between genetically distinct strains and species, suggesting that both short-term ‘responsiveness’ of gene expression and long term evolvability may be inter-twined through promoter organization. However, it is unclear if the latter is the result of direct selection or is simply a by-product of the type of regulation required to respond to environmental stimuli.
Changes in trans-factors contribute to expression divergence through either a change in the factor’s responsiveness to upstream signals, binding to newly emerging sites upstream of new targets, or through the factor’s ability to bind different ‘non-canonical’ sites. There are several known cases where changes in a TF’s binding preferences co-evolved with changes in the regulatory sequences upstream of orthologous targets. For example, in vitro binding studies showed that the ancestral Rpn4 TF bound a wider set of sequences than the modern-day S. cerevisiae protein. The change in sequence specificity corresponds to changes in motif usage upstream of the target proteasome genes – S. cerevisiae targets no longer contain sites that the TF cannot bind, even though these are prominently upstream of C. albicans proteasome genes . It is unclear whether the co-evolution of Rpn4 specificity and its targets’ upstream motifs was driven by selection or emerged simply through neutral drift in one followed by co-evolution of the other.
In other cases, the specificity of the trans-factor remains unchanged, but the factor acquires new targets under its control. For example, Borneman et al. used ChIP-chip to examine binding of two TFs, Ste12p and Tec1p, in S. cerevisiae, S. mikatae, and S. bayanus, estimated to have diverged 20 mya. TF binding events were conserved across all three species in only 20% of promoters, suggesting substantial gain or loss of individual targets. In many cases the loss of TF binding correlated with loss of the binding site. However, in a number of cases TF binding occurred despite the absence of an identifiable underlying DNA motif, confirming that TFs can bind non-canonical sites. Such ‘promiscuous’ binding may be important for acquisition of new targets, since weak TF binding to a non-canonical sequence followed by selection for optimal binding could support the emergence of a recognizable TF element. Similar divergence (15% between S. cerevisiae, K. lactis, and C. albicans) and promiscuity were observed for the direct targets of the Mcm1 transcription factor, suggesting a general trend.
Duplication and divergence of trans factors can have a large impact on expression divergence. For instance, the yeast-specific AP-1 (YAP) family of bZIP TFs, a class of factors conserved from yeast to human, clearly illustrates the special role of TF duplication in trans-divergence. Changes in specificity among the eight paralogous Yap transcription factors of S. cerevisiae are attributable to both differences in DNA binding motifs and variation in the regulatory domains that mediate response to a variety of stresses. Other factors that could contribute to changes in specificity include cooperative binding with other TFs, TF homo- or heterodimerization, or different DNA binding kinetics.
Consistent with the major impact of promoter chromatin organization on expression divergence, several recent studies have shown that chromatin remodeling factors can have a major impact on expression divergence. For example, the changes in expression accompanying the perturbation (mutation or deletion) of various chromatin regulators revealed that many high-ED genes are markedly regulated at the chromatin level . Furthermore, much of the divergence in expression level between wild and lab strains of S. cerevisiae can be explained by trans differences in chromatin modifiers , as we discuss below.
It can be challenging to reconcile this substantial evolutionary diversity in cis- and trans-factors controlling the expression of individual genes with the functional organization of regulatory networks into co-regulated modules (‘regulons’) of functionally related genes [43–45]. Comparative studies from bacteria to yeast to human have established that modules of co-expressed genes can be highly conserved. However, how are multiple evolutionary events coordinated across dozens and hundreds of genes to sustain regulons? In some cases, changes occur in trans, thus conserving co-expression while diverging the mRNA levels of all transcripts in a module, while cis changes may primarily serve to tune membership in modules. In addition, as we discuss in a separate review  multiple forms of functional redundancy also allow for more complex divergence of regulatory mechanism while maintaining module identity.
While the examples above are instructive, they are insufficient to assess the relative importance of distinct mechanisms to expression divergence. Two types of studies have distinguished the magnitude of cis- and trans- effects on expression divergence. Within-species studies rely on expression quantitative trait loci (eQTL) analysis using segregants from a cross between distinct strains [23,42,47,48] and monitoring allele-specific expression in intraspecific hybrids [30,49]; between-species studies distinguish cis- and trans-effects by comparing interspecific hybrids to the individual species. Since these strategies rely on crosses or hybrids they are limited to the < 20 Mya scale .
The most extensive mapping of cis- and trans- effects has been conducted with a cross between a lab (BY) and a wine (RM) strain of S. cerevisiae, which differ substantially in gene expression, likely due to adaptation to different niches [22,23]. Extensive eQTL analysis has shown that a large fraction (70%) of the variation in gene expression among segregants in this cross can be attributed to trans-effects. Interestingly, many of these trans-effects may involve variation in chromatin modifiers, consistent with their mechanistic role in affecting the expression of high-ED genes, as discussed above. Notably, analysis of allele-specific expression in intraspecific hybrids of two strains can provide more mechanistic information. For example, regulatory variation that acts directly in cis would result in an allele-specific expression pattern. Indeed, in the majority of cases in which the expression level of a gene is linked to its own locus based on the segregant analysis, its expression pattern was allele-specific in the BY and RM hybrid, indicating direct cis action due to alterations in the promoter sequence.
The effect of genetic variation on gene expression phenotypes often depends on environmental conditions. A recent study estimated the effects of gene-environment interaction (GEI) on transcript abundance by profiling expression in the segregants of the BY and RM cross in both glucose and ethanol . Numerous loci demonstrated GEI as defined by having opposite effects in glucose and ethanol. These corresponded to polymorphisms that influence trans-factors. Furthermore, genes affected by GEI were nearly twice as numerous as those with genetic-only effects  and were enriched for loci exhibiting the hallmarks of high ED genes. However, an important factor that was not considered in these studies, is transient changes in gene expression as cells transition between environments, a common ecological scenario. Indeed, a recent study in S. cerevisiae strains of different genetic backgrounds responding to heat shock found that half of the transcripts only showed GEI effects during the transition between environments but not in acclimated cells . Transcripts with persistent GEI were enriched for classic high ED genes as in previous studies, whereas those displaying transient GEI were enriched for essential genes .
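The opposite-effect criterion for GEI described above can be sketched as follows. This is a toy illustration under stated assumptions: the effect of a locus is taken to be the difference in mean expression between segregants carrying the RM and BY alleles, the `min_effect` cutoff is a hypothetical stand-in for the statistical testing a real eQTL analysis would use, and all expression values are invented.

```python
from statistics import mean


def allele_effect(expression, genotypes):
    """Mean expression of RM-allele segregants minus BY-allele segregants."""
    rm = [e for e, g in zip(expression, genotypes) if g == "RM"]
    by = [e for e, g in zip(expression, genotypes) if g == "BY"]
    if not rm or not by:
        raise ValueError("need at least one segregant per allele")
    return mean(rm) - mean(by)


def shows_opposite_effect(effect_glucose, effect_ethanol, min_effect=0.5):
    """True if the locus has a sizeable effect in both conditions
    and the two effects have opposite signs.

    min_effect is a hypothetical cutoff, not a value from the text.
    """
    sizeable = (
        abs(effect_glucose) >= min_effect and abs(effect_ethanol) >= min_effect
    )
    return sizeable and effect_glucose * effect_ethanol < 0


# Hypothetical genotypes at one locus and one transcript's log2 levels
# in four segregants, measured in each carbon source.
genotypes = ["BY", "BY", "RM", "RM"]
glucose = [1.0, 1.0, 3.0, 3.0]  # RM allele raises expression
ethanol = [3.0, 3.0, 1.0, 1.0]  # RM allele lowers expression

effect_glc = allele_effect(glucose, genotypes)
effect_eth = allele_effect(ethanol, genotypes)
is_gei = shows_opposite_effect(effect_glc, effect_eth)
```

A locus whose effect merely changes in magnitude between conditions would not be flagged by this strict sign test, which is why the criterion is best read as a conservative definition of GEI.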
The emerging field of population genomics represents further fertile ground for distinguishing the role of cis and trans variation within a species. Two recent studies determined the whole genome sequence of S. cerevisiae and S. paradoxus strains from a large variety of sources and locations [52,53]. This repository of natural variation represents a powerful tool to dissect the genetic basis of regulatory variation underlying natural phenotypic diversity. For example, a recent study has shown that variation in sporulation efficiency between a strain isolated from an oak tree and a vineyard strain is due to allelic variation in the genes encoding the transcription factors Ime1, Rme1 and Rsf1. In this case, the interactions between alleles (epistasis) affecting transcription and hence sporulation efficiency were non-additive and complex.
Hybrids between closely related species offer a complementary approach to quantify the relative contributions of cis- and trans-factors to expression divergence. Such studies compare the expression of orthologs in each species alone to that measured for each ortholog (‘allele’) when they share a common cell in a hybrid. A recent study using an inter-species hybrid between S. cerevisiae and S. paradoxus found that cis effects dominate variation in gene expression . This is consistent with previous reports in flies  and mammals , and is in contrast to the larger contribution of trans-factors to intra-species variation [23,30,47]. In contrast, trans effects were condition-specific, primarily attributable to differential responses to sensory signals and not to variation in direct transcriptional regulators. This observation is consistent with the prominence of trans effects in GEI studies in S. cerevisiae strains [26,50,51].
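The logic of the hybrid comparison can be sketched in a few lines. Because both alleles experience the same trans environment inside the hybrid, their expression ratio there is commonly taken to reflect cis differences, and whatever remains of the between-species ratio is attributed to trans. The sketch below assumes this log-ratio decomposition; the `threshold` cutoff is a hypothetical simplification (published analyses rely on statistical tests on replicate measurements), and all expression values are invented.

```python
from math import log2


def cis_trans_split(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split the between-species log2 expression ratio into cis and trans.

    parent_a / parent_b: expression of the ortholog in each species alone.
    hybrid_a / hybrid_b: expression of each allele within the hybrid.
    """
    total = log2(parent_a / parent_b)
    cis = log2(hybrid_a / hybrid_b)  # alleles share one trans environment
    trans = total - cis              # remainder of the parental difference
    return {"total": total, "cis": cis, "trans": trans}


def classify(parts, threshold=0.5):
    """Label a gene by which components exceed a hypothetical cutoff."""
    has_cis = abs(parts["cis"]) >= threshold
    has_trans = abs(parts["trans"]) >= threshold
    if has_cis and has_trans:
        return "cis+trans"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "conserved"


# Hypothetical expression levels: (parent_a, parent_b, hybrid_a, hybrid_b).
genes = {
    "gene_cis": (8.0, 2.0, 8.0, 2.0),     # difference persists in the hybrid
    "gene_trans": (8.0, 2.0, 4.0, 4.0),   # difference vanishes in the hybrid
    "gene_both": (16.0, 2.0, 4.0, 2.0),   # difference only partly persists
    "gene_same": (4.0, 4.0, 4.0, 4.0),    # no difference anywhere
}
labels = {name: classify(cis_trans_split(*vals)) for name, vals in genes.items()}
```

The same decomposition applies to intraspecific hybrids of two strains, where an allele-specific pattern in the hybrid points to a cis-acting variant.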
What may explain the prevalence of cis variation between species and the high levels of intra-specific trans variation? One clue comes from a recent study  showing that trans variation is more subject to dominance effects than cis variation. Thus, variation due to trans-regulatory alleles is biased toward greater deviation from an additive contribution when affecting a complex phenotype than is cis variation. On shorter timescales, the pervasive pleiotropic effects and the much higher rate by which trans-variation is produced can account for the gene expression variation within populations. Over longer timescales, purifying selection could purge trans-regulatory variation. Conversely, although cis-variation is produced at a slower rate, positive selection may act more efficiently to fix cis changes due to their higher additivity and weaker pleiotropic effects. Notably, population genetic modeling of the evolutionary forces affecting the pattern of variation for the cis-regulatory QTL in the RM and BY cross  concluded that purifying selection against mildly deleterious alleles is the dominant force governing cis-regulatory evolution and found evidence that positive selection has played a role in the evolution of major trans acting QTLs.
What is the adaptive significance of regulatory divergence? How can we distinguish adaptive changes from regulatory neutrality and drift? In some cases, regulatory changes are clearly coupled to other adaptive changes in lifestyle. For example, studies comparing regulatory modules between C. albicans and S. cerevisiae showed how the specific loss of an ancestral cis-regulatory element from the promoters of mitochondrial ribosomal protein genes has changed their chromatin organization and decoupled their regulation from cell growth in respiro-fermentative species that no longer rely on respiration for growth in high glucose.
In many other cases, it is unclear whether the regulatory change is adaptive or neutral. For example, Tsong et al. compared the mating transcriptional modules in C. albicans, K. lactis and S. cerevisiae, and reconstructed a series of cis- and trans- regulatory changes that have resulted in a transition from an activator-based control to a repressor-based regulation of the mating response . Since the overall regulatory logic was unchanged by this transition, one possibility is that it is a result of neutral “regulatory drift” rather than an adaptive change .
Experimental evolutionary approaches have the potential to disentangle adaptive changes from cases where alternate mechanisms have evolved to perform the same function. Most evolutionary studies infer the trajectory of evolution from sampling variation in extant populations and thus are limited in their ability to address evolutionary dynamics. In contrast, experimental evolution studies [58–60] can observe adaptation in real time and under known selective pressures at short time scales. Recent advances in genomic technologies, in particular the advent of rapid and cheap re-sequencing, allow us to efficiently identify all the genetic changes that have occurred in evolved lines subjected to different selective pressures. By quantifying the effects of one or more genetic changes on growth and other measures of fitness and function, we can distinguish adaptive from neutral regulatory architectures. Finally, by analyzing multiple lines evolved in parallel under the same selective pressure we can assess the range of possible evolutionary trajectories. Notably, the exceptional genetic tractability of S. cerevisiae has rendered it an excellent model for experimental evolution studies [60,62].
This power has been recently demonstrated in a study of chemostat cultures subjected to either glucose, sulfate or phosphate limitation for ~200 generations. The genetic variation in each of the evolved strains was assessed using tiling microarrays. In addition to point mutations, the spectrum of genetic alterations included frequent genomic amplifications and rearrangements as well as retrotransposition events. Retrospective analysis of the observed frequencies of mutations over the course of evolution in the chemostat suggested that these mutations originated in the batch phase growth of the cultures prior to chemostat inoculation. When comparing multiple strains evolved under each of the selective pressures, Gresham et al. observed several distinct genotypic and phenotypic evolutionary trajectories in the glucose- or phosphate-limited environments, whereas a single trajectory dominated in the sulfate-limited populations. In all cases, adaptation to nutrient limitation resulted in massive remodeling of global gene expression. Importantly, even distinct genetic changes often led to convergent mRNA profiles. For example, in several populations that were independently evolved in glucose-limited conditions, HXT genes encoding high affinity glucose transporters were amplified, while in another population a retrotransposition event within the MTH1 gene (a negative regulator of glucose sensing) is the likely cause of the observed increase in the expression of several HXT genes. Furthermore, a number of other mutated loci in clones evolved in glucose-limited conditions have known roles as key regulators in carbon metabolism, suggesting a major role for trans-regulation. This is consistent with the observation that trans effects are predominant in intra-specific expression divergence, suggesting that trans-regulatory change may be a major evolutionary strategy for adaptation on shorter timescales.
While novel genomic technologies will continue to fuel the great advances that have been made in the study of the evolution of gene regulation, the relative roles of selective forces driving divergence versus neutral drift remain largely theoretical. Purifying selection can be effectively invoked for the conservation of cis-regulatory elements in closely related species, and for the low ED of genes involved in general growth processes. However, it is unclear how much of the increased divergence of high-ED genes is due to direct positive selection, and how much is a by-product of regulatory mechanisms required for environmental responsiveness. Understanding how changes in regulatory control relate to upstream changes in signal sensing and processing may shed light on this question, and recent studies on the evolution of signal transduction in Ascomycota [64,65] will be instrumental in this endeavor.
Most studies to date have focused on the effect of trans and cis changes on transcription initiation, but divergence in mRNA levels can also be affected by changes in cis- and trans-factors that impact transcription elongation or termination and mRNA processing and stability. Furthermore, a recent comparative study has discovered that RNA interference (an RNA-silencing pathway), while absent in S. cerevisiae, is present in many other budding yeasts such as S. castellii, C. albicans and K. polysporus. Finally, gene expression is influenced by processes that are downstream of transcription such as nuclear export, translation initiation, elongation and termination, and protein degradation. For example, a recent study found that a genetic change in Mkt1, a protein that affects P-body sequestration of mRNAs encoding mitochondrial proteins, is responsible for a major change in expression between a wild and a lab strain of S. cerevisiae. Furthermore, a proteomics study in the segregants from the BY × RM cross showed that loci influencing protein abundance differed from those that affected transcript levels, highlighting the importance of direct analysis of the proteome. Evolutionary studies that address these additional levels of regulation are scarce. However, emerging technologies that allow genome-wide investigation of translational control (e.g., Ingolia et al.) and proteomic profiling hold promise for understanding the contribution of post-transcriptional regulation to evolutionary divergence in gene expression.
The foremost advantage of Ascomycota for studies of evolution of gene expression is in the facility of experiments in both model- and non-model organisms. We discern three major trends towards empirical studies of regulatory evolution. Comparative functional genomics follows in the footsteps of sequencing studies by measuring the transcriptional responses and molecular mechanisms across a set of extant species in a phylogeny, and uses these measurements and phenotypic differences to infer the history of gene regulation. Forward evolution studies focus on the immediate impact of selection, by following traces of strains collected along an experiment, and use sequencing, genetics and molecular profiles to infer regulatory evolution in ‘real time’. Finally, engineered strains and hybrids allow us to test evolutionary hypotheses and quantify the contribution of distinct factors to regulatory changes. These range in increasing evolutionary distance from eQTL mapping in segregants from crosses of distinct strains of the same species [23,42,47,48], to hybrids between species , to engineered strains swapping molecular elements from distant species for their endogenous orthologs or introducing ‘random’ engineered variation . Together with elaborate phenotyping these should allow us to decipher the functional and adaptive implications of regulatory variation.
We thank Audrey P. Gasch, Dana J. Wohlbach, the anonymous reviewers and members of the Regev lab for fruitful discussions. AR was supported by the Howard Hughes Medical Institute, the Human Frontiers Science Program, an NIH PIONEER award, a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, and the Sloan Foundation.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.