The Ascomycete yeasts present one of the most promising systems for comparative functional genomics. Fungi have been densely sampled by a number of sequencing projects, covering an enormous range of divergence. Genome sequence analyses of the Saccharomyces yeasts and related species have been used to establish the history of gene duplication, conservation at binding sites, and co-evolution of binding sites with regulators. Thus, a range of evolutionary phenomena can be studied in these species based on their genomic sequence. However, sequence conservation is not always completely predictive of functional conservation. As just one example, we recently reported that only a subset of conserved promoter motifs actually drive periodic gene expression over the cell cycle in two closely related species.
Most of the experimental characterization of gene function has been performed in a small number of model fungal systems, which can provide an anchor for these broad genome sequencing surveys. These species include Saccharomyces cerevisiae, Neurospora crassa, Candida albicans, and Schizosaccharomyces pombe, along with several other emerging models such as Ashbya gossypii. Comparative studies between these species, which by some estimates cover a billion years of divergence, have been informative. Analyses of gene expression changes over growth, the cell cycle, and stress treatments have highlighted both similarities and differences in ortholog expression. Unfortunately, the ability to link divergence in the expression of individual genes with the causative molecular factors has been limited because of the vast evolutionary distances involved.
Experimental protocols developed in the model systems are often readily portable to less well-studied sister species, allowing us to choose species well placed to identify and study functional divergence. Comparisons of gene expression across particular species with interesting characteristics can not only highlight how patterns of gene expression change over evolutionary time, but can also uncover genes with particular functions. A comparison among xylose-metabolizing yeast species, for example, coupled sequence analysis with gene expression profiling to identify important genes by their presence in the genomes of interest and their induction during growth on xylose. Follow-up studies in S. cerevisiae confirmed these associations.
Studies in the sensu stricto yeasts have also been particularly informative because of these species' close evolutionary relationship to S. cerevisiae. These species cover a range of conservation, have high-quality annotated reference genomes, and are becoming even more attractive as the sequences of many strains within each species become available through new high-throughput sequencing tools. Furthermore, their ability to form interspecific hybrids leverages the resources available in S. cerevisiae and allows tests of gene function and regulation in shared cellular environments. One recent expression-based, genome-wide characterization used S. cerevisiae microarrays to measure the gene expression consequences of heat shock stress and mating induction in three other yeast species. Those data suggest that expression divergence can occur relatively rapidly and is correlated with gene function, though relatively uncorrelated with sequence conservation. Because S. cerevisiae arrays were used, that study could not examine more divergent species. To broaden these studies to more divergent yeasts, species-specific arrays must be used, as has been done, for example, for Candida glabrata. Most importantly, because these studies covered only a small number of treatments, conclusions about the evolution of gene function and regulation have been difficult to generalize.
To address these challenges, we previously developed a computational framework to identify a set of experiments that could best characterize gene function in a naive species. Based on available expression data in the S. cerevisiae literature, we identified and carried out a set of 304 experiments over 46 conditions in the sensu stricto species S. bayanus var. uvarum. By choosing only the most informative experiments from the vast S. cerevisiae literature, we were able to survey a large phenotypic space at high accuracy with a modest amount of experimentation.
To compare these expression datasets more carefully, we developed a statistical metric, Local Network Similarity (LNS), to assess correlation patterns of orthologs. This metric is general and robust: it can be used for analysis of individual matched datasets without the need to assume identical response times for the two species, or for integrated analysis of diverse compendia of experimental or genetic perturbations. Using the LNS metric to compare our large S. bayanus expression compendium with a collection of published S. cerevisiae expression data, we show that gene expression networks are largely conserved between the species, though much less so than in within-species comparisons constructed from different conditions. Furthermore, we find strong and statistically significant evidence for a correlation between the divergence of expression and the divergence of open reading frame sequence, which previous studies using more limited datasets failed to detect. Despite this general pattern of conservation, we observe that a quarter of orthologs exhibit condition-specific differences in expression, and 4% show strong differences in global co-expression patterns. Genes involved in the same functional groups share similar divergence patterns, indicating that genes in the same pathway or process may share divergence characteristics. In sum, our wide-ranging survey of expression profiles and generic metric of expression divergence allowed us to identify both global and local aspects of regulatory evolution and relate these to sequence divergence.
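The idea of comparing the correlation patterns of orthologs, rather than their raw expression profiles, can be illustrated with a short sketch. The exact LNS definition is not given in this section, so the code below is only one plausible reading, under stated assumptions: each gene's co-expression neighborhood is taken to be its vector of Fisher z-transformed Pearson correlations with all other genes, computed separately within each species, and the score for a gene is the Pearson correlation between its two neighborhood vectors. The function name `local_network_similarity` and the input layout (rows are matched orthologs in the same order) are hypothetical choices made for illustration.

```python
import numpy as np


def local_network_similarity(expr_a, expr_b):
    """Illustrative per-gene network similarity (an assumption, not the paper's exact LNS).

    expr_a, expr_b: arrays of shape (n_genes, n_conditions). Row i of each array
    must be the same ortholog; the number of conditions may differ between species.
    Genes with constant expression give NaN correlations and should be filtered first.
    """
    def coexpression_z(expr):
        corr = np.corrcoef(expr)                   # gene-by-gene Pearson correlation
        np.fill_diagonal(corr, 0.0)                # self-correlation is excluded below anyway
        corr = np.clip(corr, -0.999999, 0.999999)  # keep arctanh finite
        return np.arctanh(corr)                    # Fisher z-transform

    z_a = coexpression_z(np.asarray(expr_a, dtype=float))
    z_b = coexpression_z(np.asarray(expr_b, dtype=float))
    n_genes = z_a.shape[0]
    if z_b.shape[0] != n_genes:
        raise ValueError("both species must list the same orthologs in the same order")

    scores = np.empty(n_genes)
    others = np.arange(n_genes)
    for g in range(n_genes):
        mask = others != g                         # compare neighborhoods over all other genes
        scores[g] = np.corrcoef(z_a[g, mask], z_b[g, mask])[0, 1]
    return scores
```

Because each species' correlations are computed within its own dataset, the two compendia need not contain the same conditions or time points, which is consistent with the property described above. A low or negative score flags an ortholog whose co-expression partners differ between the species.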